Performance Benchmarks
Comprehensive guide to performance benchmarking and regression testing
Overview
Rivellum ships a benchmarking suite for tracking performance metrics and detecting regressions. The benchmark framework measures throughput (TPS), latency percentiles, PoUW operation rates, and ZK proof overhead across four canonical scenarios.
Quick Start
Running Benchmarks
# Run all scenarios
cargo run --release -p rivellum-bench run --scenario all
# Run a specific scenario
cargo run --release -p rivellum-bench run --scenario solo-transfers
# CI smoke mode (reduced load)
cargo run --release -p rivellum-bench run --ci-smoke
Comparing Against Baseline
# Compare latest results against baseline
cargo run --release -p rivellum-bench compare \
  --baseline bench-baselines/baseline.json \
  --current bench-results/latest.json \
  --fail-on-regression
Exporting Results
# Export to HTML
cargo run --release -p rivellum-bench export \
  --input bench-results/latest.json \
  --output reports/benchmark.html \
  --format html
# Export to CSV
cargo run --release -p rivellum-bench export \
  --input bench-results/latest.json \
  --output reports/benchmark.csv \
  --format csv
Benchmark Scenarios
1. Solo Transfers
Scenario ID: solo-transfers
Pure transfer transactions with no contracts or PoUW. Measures baseline transaction processing performance.
Expected Performance:
- TPS: ~500
- P95 Latency: ~25ms
2. Mixed Contracts
Scenario ID: mixed-contracts
50% transfers, 50% simple contract calls. Measures performance under mixed workload.
Expected Performance:
- TPS: ~350
- P95 Latency: ~38ms
3. PoUW Heavy
Scenario ID: pouw-heavy
Transactions with Proof-of-Useful-Work challenges enabled. Measures PoUW verification overhead.
Expected Performance:
- TPS: ~200
- P95 Latency: ~65ms
- PoUW Ops/s: ~100
4. ZK Enabled
Scenario ID: zk-enabled
Transfers with ZK privacy proofs enabled. Measures ZK proof generation and verification overhead.
Expected Performance:
- TPS: ~150
- P95 Latency: ~95ms
- ZK Overhead: ~25%
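The expected TPS and latency figures above can be roughly cross-checked with Little's Law (throughput ≈ in-flight requests / latency). A minimal sketch, assuming the default load-generator concurrency of 10 from the Run Options and a mean latency somewhat below the quoted P95; real runs will deviate:

```rust
/// Rough throughput estimate via Little's Law: with `concurrency`
/// requests in flight and a mean latency of `mean_latency_ms`,
/// sustained throughput is about concurrency / latency.
/// Sanity-check sketch only; not part of rivellum-bench itself.
fn estimated_tps(concurrency: f64, mean_latency_ms: f64) -> f64 {
    concurrency / (mean_latency_ms / 1000.0)
}

fn main() {
    // With concurrency 10 and a ~20 ms mean latency (a bit below the
    // ~25 ms P95 for solo-transfers), the estimate lands near ~500 TPS.
    let tps = estimated_tps(10.0, 20.0);
    println!("estimated TPS: {tps:.0}"); // prints "estimated TPS: 500"
}
```

If a measured TPS is far from this back-of-the-envelope figure, the load generator (rather than the node) may be the bottleneck.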
Micro-Benchmarks
Criterion-based micro-benchmarks are available for low-level operations:
# Run all micro-benchmarks
cargo bench
# Run specific benchmark suite
cargo bench --bench crypto_bench
cargo bench --bench intent_bench
cargo bench --bench execution_bench
Available Micro-Benchmarks
crypto_bench:
- Keypair generation
- Signature creation
- Signature verification
- Address generation
- State root hashing (10, 100, 1000 items)
intent_bench:
- Intent parsing
- Intent serialization
- Intent validation
- Batch intent parsing (10, 50, 100 intents)
execution_bench:
- Transfer execution
- Execution trace generation
- Batch execution (10, 50, 100 transactions)
- Mock ZK proof generation
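To make concrete what the state-root-hashing micro-benchmark measures, here is a manual timing sketch using the standard library's `DefaultHasher` as a stand-in. The real `crypto_bench` runs under Criterion against Rivellum's own hash function; everything below is illustrative:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::Instant;

/// Stand-in for state-root hashing: fold `items` into one digest.
/// Illustrative only; the real benchmark hashes actual state entries.
fn hash_items(items: &[u64]) -> u64 {
    let mut h = DefaultHasher::new();
    for item in items {
        item.hash(&mut h);
    }
    h.finish()
}

fn main() {
    // Mirror the 10 / 100 / 1000 item sizes used by crypto_bench.
    for &n in &[10usize, 100, 1000] {
        let items: Vec<u64> = (0..n as u64).collect();
        let iterations: u32 = 10_000;
        let mut acc = 0u64; // consume results so the loop isn't optimized away
        let start = Instant::now();
        for _ in 0..iterations {
            acc ^= hash_items(&items);
        }
        let per_iter = start.elapsed() / iterations;
        println!("{n:>5} items: {per_iter:?} per hash (acc={acc})");
    }
}
```

Criterion adds warm-up, statistical sampling, and outlier rejection on top of this basic measure-in-a-loop idea.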
Baseline Management
Creating a New Baseline
When performance improvements are validated and merged, update the baseline:
# Run benchmarks
cargo run --release -p rivellum-bench run \
--scenario all \
--output bench-results/new-baseline.json
# Review results
cargo run --release -p rivellum-bench export \
--input bench-results/new-baseline.json \
--output reports/review.html \
--format html
# Replace baseline (after review)
cp bench-results/new-baseline.json bench-baselines/baseline.json
git add bench-baselines/baseline.json
git commit -m "chore: update performance baseline"
Baseline Update Guidelines
Update baselines when:
- Intentional performance optimizations are merged
- Infrastructure changes affect baseline performance
- New hardware configurations are adopted
DO NOT update baselines to hide regressions.
Regression Detection
Thresholds
Default regression thresholds:
- TPS Decrease: -10% (fail if TPS drops more than 10%)
- P95 Latency Increase: +15% (fail if P95 increases more than 15%)
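The threshold check amounts to a percent-change comparison in each direction. A minimal sketch of the logic, not the actual `compare.rs` implementation:

```rust
/// Percent change from `baseline` to `current` (positive = increase).
fn pct_change(baseline: f64, current: f64) -> f64 {
    (current - baseline) / baseline * 100.0
}

/// Returns true if either metric regressed past its threshold,
/// mirroring the default -10% TPS / +15% P95 rules.
/// Sketch only; not the actual compare.rs implementation.
fn is_regression(
    baseline_tps: f64, current_tps: f64, tps_threshold: f64,
    baseline_p95_ms: f64, current_p95_ms: f64, p95_threshold: f64,
) -> bool {
    let tps_drop = -pct_change(baseline_tps, current_tps); // a drop is positive
    let p95_rise = pct_change(baseline_p95_ms, current_p95_ms);
    tps_drop > tps_threshold || p95_rise > p95_threshold
}

fn main() {
    // 500 -> 430 TPS is a 14% drop: fails the 10% TPS threshold.
    assert!(is_regression(500.0, 430.0, 10.0, 25.0, 26.0, 15.0));
    // 500 -> 470 TPS (6% drop) and 25 -> 27 ms (8% rise): passes.
    assert!(!is_regression(500.0, 470.0, 10.0, 25.0, 27.0, 15.0));
    println!("threshold checks behave as expected");
}
```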
Custom Thresholds
cargo run --release -p rivellum-bench compare \
--baseline bench-baselines/baseline.json \
--current bench-results/latest.json \
--tps-threshold 5.0 \
--p95-threshold 10.0 \
--fail-on-regression
CI Integration
Benchmarks run automatically in CI with --ci-smoke mode (reduced load):
# .github/workflows/ci.yml
- name: Run benchmark smoke tests
  run: |
    ./target/release/rivellum-bench run --ci-smoke --output bench-results/ci-run.json

- name: Compare against baselines
  run: |
    ./target/release/rivellum-bench compare \
      --baseline bench-baselines/baseline.json \
      --current bench-results/ci-run.json \
      --fail-on-regression
Performance Dashboard
View real-time benchmark results in the Portal:
http://localhost:3001/performance
The dashboard displays:
- Current vs. baseline TPS and latency metrics
- Latency percentile charts (P50, P95, P99)
- PoUW and ZK overhead metrics
- Regression indicators
Architecture
Macro-Benchmark Flow
- Scenario Selection: Choose from registry or run all
- Load Generation: Submit transactions via HTTP to running node
- Metrics Collection: Fetch the /metrics endpoint for node statistics
- Result Calculation: Compute TPS, latency percentiles, overhead
- JSON Output: Save structured results for comparison
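The result-calculation step boils down to sorting latency samples and reading off nearest-rank percentiles. A simplified sketch; the real `metrics.rs` may interpolate and handle edge cases differently:

```rust
/// Nearest-rank percentile: the value at rank ceil(p/100 * n), 1-indexed.
/// Integer arithmetic avoids floating-point edge cases at the boundary.
fn percentile(samples: &mut [f64], p: usize) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = samples.len();
    let rank = ((p * n + 99) / 100).max(1); // ceiling division, at least rank 1
    samples[rank - 1]
}

/// TPS is just completed transactions over wall-clock seconds.
fn tps(tx_count: u64, elapsed_secs: f64) -> f64 {
    tx_count as f64 / elapsed_secs
}

fn main() {
    let mut lat: Vec<f64> = (1..=100).map(|i| i as f64).collect();
    // With samples 1..=100 ms, the nearest-rank P95 is the 95th value.
    assert_eq!(percentile(&mut lat, 95), 95.0);
    // 1000 transactions completed in 2 seconds = 500 TPS.
    assert_eq!(tps(1000, 2.0), 500.0);
    println!("P95 = 95 ms, TPS = 500");
}
```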
Micro-Benchmark Flow
- Criterion Setup: Configure benchmark groups and parameters
- Warm-up: Run iterations to stabilize performance
- Measurement: Collect timing samples
- Analysis: Statistical analysis with outlier detection
- HTML Report: Generate detailed criterion reports
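The phases above can be made concrete with a tiny manual harness. Criterion performs all of this with far more statistical rigor (adaptive sample counts, outlier detection, regression fitting); this sketch only illustrates the flow:

```rust
use std::time::{Duration, Instant};

/// Minimal warm-up/measure loop illustrating the micro-benchmark phases.
/// Illustrative sketch; Criterion does this properly.
fn bench<F: FnMut()>(mut op: F, warmup: u32, samples: usize) -> Vec<Duration> {
    for _ in 0..warmup {
        op(); // warm-up: stabilize caches, branch predictors, allocator
    }
    let mut timings = Vec::with_capacity(samples);
    for _ in 0..samples {
        let t = Instant::now();
        op();
        timings.push(t.elapsed()); // measurement: one sample per run
    }
    timings.sort(); // analysis: sorted samples make percentiles trivial
    timings
}

fn main() {
    let mut acc = 0u64;
    let timings = bench(
        || acc = acc.wrapping_mul(6364136223846793005).wrapping_add(1),
        1_000, // warm-up iterations
        100,   // measured samples
    );
    let median = timings[timings.len() / 2];
    println!("median over {} samples: {median:?}", timings.len());
}
```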
File Structure
rivellum/
├── crates/
│   └── rivellum-bench/
│       ├── src/
│       │   ├── lib.rs        # Core types and traits
│       │   ├── main.rs       # CLI entry point
│       │   ├── scenarios/    # Macro-benchmark scenarios
│       │   ├── metrics.rs    # Metrics collection
│       │   ├── compare.rs    # Regression detection
│       │   └── export.rs     # Result export (CSV/HTML)
│       └── benches/          # Criterion micro-benchmarks
├── bench-baselines/
│   └── baseline.json         # Reference baseline
└── bench-results/
    └── latest.json           # Most recent run
Best Practices
Running Benchmarks
- Clean Environment: Close unnecessary applications
- Consistent Hardware: Use same machine for comparisons
- Warm-up: Allow node to stabilize before benchmarking
- Multiple Runs: Run 3-5 times and average results for important measurements
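For the multiple-runs recommendation, a mean plus a spread check is enough to decide whether results are stable. A small sketch; the 5% stability threshold is illustrative, not a project rule:

```rust
/// Mean and relative spread (max - min as % of mean) over repeated runs.
/// Sketch for aggregating 3-5 TPS results; the threshold is illustrative.
fn aggregate(tps_runs: &[f64]) -> (f64, f64) {
    let mean = tps_runs.iter().sum::<f64>() / tps_runs.len() as f64;
    let max = tps_runs.iter().cloned().fold(f64::MIN, f64::max);
    let min = tps_runs.iter().cloned().fold(f64::MAX, f64::min);
    (mean, (max - min) / mean * 100.0)
}

fn main() {
    let runs = [498.0, 505.0, 497.0]; // three hypothetical solo-transfers runs
    let (mean, spread_pct) = aggregate(&runs);
    println!("mean {mean:.1} TPS, spread {spread_pct:.1}%");
    // A spread above ~5% suggests a noisy environment; re-run before
    // trusting any baseline comparison.
    assert!(spread_pct < 5.0);
}
```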
Interpreting Results
- TPS: Higher is better (more transactions per second)
- Latency: Lower is better (faster response times)
- P95/P99: Focus on tail latencies for user experience
- Overhead: Measure cost of features (PoUW, ZK)
Debugging Regressions
If CI detects a regression:
- Review the comparison output in CI artifacts
- Run benchmarks locally to reproduce
- Use git bisect to find the offending commit
- Profile the regressed code path
- Fix or justify the regression
Command Reference
rivellum-bench CLI
rivellum-bench 0.1.0
Rivellum performance benchmark tool

USAGE:
    rivellum-bench <SUBCOMMAND>

SUBCOMMANDS:
    run        Run benchmark scenarios
    compare    Compare results against baseline
    export     Export results to CSV or HTML
    list       List available scenarios
    help       Print this message or the help of the given subcommand(s)
Run Options
rivellum-bench-run
Run benchmark scenarios

USAGE:
    rivellum-bench run [OPTIONS]

OPTIONS:
    -s, --scenario <SCENARIO>          Scenario to run [default: all]
    -n, --node-url <NODE_URL>          Node URL [default: http://localhost:8080]
    -t, --tx-count <TX_COUNT>          Number of transactions [default: 1000]
    -c, --concurrency <CONCURRENCY>    Concurrent load generators [default: 10]
    -o, --output <OUTPUT>              Output file [default: bench-results/latest.json]
        --ci-smoke                     Enable CI smoke mode (reduced load)
Compare Options
rivellum-bench-compare
Compare results against baseline

USAGE:
    rivellum-bench compare [OPTIONS]

OPTIONS:
    -b, --baseline <BASELINE>    Baseline file [default: bench-baselines/baseline.json]
    -c, --current <CURRENT>      Current results file [default: bench-results/latest.json]
        --tps-threshold <TPS>    TPS regression threshold % [default: 10.0]
        --p95-threshold <P95>    P95 regression threshold % [default: 15.0]
        --fail-on-regression     Exit with error if regressions detected
Troubleshooting
Node Not Running
Error: Failed to fetch metrics: connection refused
Solution: Start a node before running benchmarks:
cargo run --release -p rivellum-node -- --config config/test-config.toml
Low TPS Results
Possible causes:
- Node running in debug mode (use --release)
- Insufficient hardware resources
- Network latency (use localhost)
- Concurrent processes consuming CPU
Baseline File Missing
Error: Failed to load baseline results: No such file or directory
Solution: Create initial baseline:
cargo run --release -p rivellum-bench run --scenario all --output bench-baselines/baseline.json
Future Enhancements
Planned improvements:
- Multi-node cluster benchmarks
- Continuous performance tracking dashboard
- Flamegraph integration for profiling
- Historical trend analysis
- Automated baseline updates on main branch
- Real ZK proof benchmarks (vs. mock)