rivellum Backup and Restore Operations Guide
Overview
This guide covers state management, backup strategies, and disaster recovery procedures for rivellum nodes. The snapshot and pruning infrastructure enables operators to:
- Create point-in-time snapshots of blockchain state
- Restore from snapshots for disaster recovery or testing
- Prune old ledger entries to manage disk usage on long-running nodes
- Verify snapshot integrity through chain ID and state root checks
Table of Contents
- Snapshot Management
- Pruning Configuration
- Backup Strategies
- Disaster Recovery
- Best Practices
- Troubleshooting
- Advanced Topics
Snapshot Management
What is a Snapshot?
A snapshot is a complete copy of the blockchain state at a specific height, including:
- State Database (state.db/) - All account balances, nonces, and contract state
- Metadata (snapshot_meta.json) - Height, state root, timestamp, chain ID, version
Snapshots do not include the ledger (transaction history), which can be replayed from genesis or other nodes.
Creating Snapshots
Basic Snapshot Creation
# Create a snapshot of the current state
rivellum-node snapshot create --output ./snapshots/snapshot-2024-01-15
# With a description
rivellum-node snapshot create \
--output ./snapshots/mainnet-snapshot-height-1000000 \
--description "Mainnet snapshot at height 1M"
Output Structure
snapshot-2024-01-15/
├── state.db/              # Copy of RocksDB/sled state database
│   ├── CURRENT
│   ├── MANIFEST-*
│   └── *.sst
└── snapshot_meta.json     # Metadata file
Snapshot Metadata
The snapshot_meta.json file contains:
{
"height": 1000000,
"state_root": "0x1234...",
"created_at_ms": 1705334400000,
"ledger_path": "/data/ledger.log",
"chain_id": "rivellum-mainnet",
"version": 1,
"description": "Mainnet snapshot at height 1M"
}
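Because the metadata is what ties a state database back to a chain and height, it is worth validating before trusting a snapshot. A minimal sketch in Python, assuming only the snapshot_meta.json layout shown above (these helper names are illustrative, not part of the rivellum CLI):

```python
import json
from pathlib import Path

REQUIRED_FIELDS = {"height", "state_root", "created_at_ms", "chain_id", "version"}

def load_snapshot_meta(snapshot_dir: str) -> dict:
    """Load snapshot_meta.json and check that the required fields are present."""
    meta_path = Path(snapshot_dir) / "snapshot_meta.json"
    meta = json.loads(meta_path.read_text())
    missing = REQUIRED_FIELDS - meta.keys()
    if missing:
        raise ValueError(f"snapshot metadata missing fields: {sorted(missing)}")
    return meta

def check_chain_id(meta: dict, expected_chain_id: str) -> None:
    """Mirror the node's chain ID verification: refuse cross-chain restores."""
    if meta["chain_id"] != expected_chain_id:
        raise ValueError(
            f"Chain ID mismatch! Snapshot has '{meta['chain_id']}' "
            f"but config expects '{expected_chain_id}'"
        )
```

Running this against a snapshot directory before a restore catches the two most common failure modes early: a truncated copy (missing fields) and a snapshot taken from the wrong network.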
Listing Snapshots
# List all snapshots in default directory (./snapshots)
rivellum-node snapshot list
# List snapshots in a specific directory
rivellum-node snapshot list --dir /backups/rivellum/snapshots
Example Output:
┌─────────────────────────────────────────────────────────────┐
│                 rivellum AVAILABLE SNAPSHOTS                │
└─────────────────────────────────────────────────────────────┘
Found 2 snapshot(s) in: /backups/rivellum/snapshots
─────────────────────────────────────────────────────────────
Path: /backups/rivellum/snapshots/snapshot-2024-01-15
Height: 1000000
State Root: StateRoot(0x1234...)
Chain ID: rivellum-mainnet
Created: 1705334400000 ms since epoch
Description: Mainnet snapshot at height 1M
─────────────────────────────────────────────────────────────
Path: /backups/rivellum/snapshots/snapshot-2024-01-20
Height: 1050000
State Root: StateRoot(0x5678...)
Chain ID: rivellum-mainnet
Created: 1705766400000 ms since epoch
Restoring from Snapshots
Basic Restore
# Restore from a snapshot (with chain ID verification)
rivellum-node snapshot restore --input ./snapshots/snapshot-2024-01-15
What happens during restore:
- Loads snapshot metadata
- Verifies chain ID matches node config (unless --no-verify)
- Creates backup of existing state: data_dir/state_backup_{timestamp}/
- Copies snapshot state.db to data_dir/state.db
- Reports success with snapshot details
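The restore steps above can be sketched as a small script. This is a hedged illustration of the flow, not the node's actual implementation; the paths and the state_backup_ naming simply follow the conventions described in this guide:

```python
import json
import shutil
import time
from pathlib import Path

def restore_snapshot(snapshot_dir, data_dir, expected_chain_id=None):
    """Sketch of the restore flow: verify, back up existing state, copy snapshot in."""
    snapshot_dir, data_dir = Path(snapshot_dir), Path(data_dir)
    meta = json.loads((snapshot_dir / "snapshot_meta.json").read_text())

    # Chain ID verification (the step skipped when --no-verify is passed)
    if expected_chain_id is not None and meta["chain_id"] != expected_chain_id:
        raise ValueError(f"Chain ID mismatch: {meta['chain_id']} != {expected_chain_id}")

    # Back up any existing state database before overwriting it
    state_db = data_dir / "state.db"
    if state_db.exists():
        state_db.rename(data_dir / f"state_backup_{int(time.time())}")

    # Copy the snapshot's state database into place
    shutil.copytree(snapshot_dir / "state.db", state_db)
    return state_db
```

Note that the backup-before-overwrite step is what makes the "Accidental Deletion" recovery scenario later in this guide possible.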
Restore Without Verification (DANGEROUS)
# Skip chain ID verification (for testing or cross-chain recovery)
rivellum-node snapshot restore \
--input ./snapshots/snapshot-2024-01-15 \
--no-verify
⚠️ Warning: Only use --no-verify if you understand the risks. Restoring a snapshot from a different chain can lead to inconsistent state.
Post-Restore Steps
After restoration, you may need to:
- Replay ledger - If you have a ledger.log file, replay it to catch up to current height
- Sync from peers - Connect to network peers to download missing blocks
- Verify state root - Check that the state root matches your expectations
Pruning Configuration
Overview
Pruning automatically removes old ledger entries to manage disk usage. This is critical for long-running nodes that would otherwise accumulate hundreds of GB of transaction history.
Configuration
Add to your config/default.toml:
[pruning]
# Enable automatic pruning
enabled = true
# Keep last N ledger entries (default: 10000)
# This determines how much history is retained
keep_last_entries = 10000
# Prune every N seconds (default: 3600 = 1 hour)
pruning_interval_secs = 3600
# Require snapshot before pruning (default: true)
# Safety feature: ensures you have a snapshot before deleting history
require_snapshot = true
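keep_last_entries only translates into a time window through your intent rate, so it helps to do the arithmetic before picking a value. A back-of-the-envelope helper (illustrative only, not part of the CLI):

```python
def retention_window_secs(keep_last_entries: int, intents_per_sec: float) -> float:
    """Approximate how much wall-clock history keep_last_entries retains."""
    return keep_last_entries / intents_per_sec

# At 1 intent every 10 seconds (0.1/s), 10000 entries cover 100000 s,
# roughly 1.16 days -- which is where the "~1 day of history" figure
# in the conservative profile below comes from.
```

If your network runs at a higher intent rate, scale keep_last_entries up proportionally to keep the same retention window.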
Pruning Modes
Conservative (Recommended)
[pruning]
enabled = true
keep_last_entries = 10000 # ~1 day of history at 1 intent/10s
pruning_interval_secs = 3600 # Prune hourly
require_snapshot = true # Safety on
Best for: Production mainnet nodes
Aggressive (High Throughput)
[pruning]
enabled = true
keep_last_entries = 1000 # ~2 hours of history
pruning_interval_secs = 600 # Prune every 10 minutes
require_snapshot = true
Best for: Testnet nodes, nodes with limited disk space
Archive Node (No Pruning)
[pruning]
enabled = false
Best for: Explorers, auditing, research nodes
Pruning Safety Features
- Require Snapshot - If require_snapshot = true, pruning will fail if no recent snapshot exists
- Retention Window - keep_last_entries ensures you always have recent history
- Atomic Operations - Pruning operations are atomic per entry
- Logging - All pruning operations are logged for auditing
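The interplay of the snapshot gate and the retention window can be sketched as a single decision function. This is an assumption about how such a gate might be structured, not the node's actual pruning code:

```python
def prune_plan(total_entries: int, keep_last_entries: int,
               has_recent_snapshot: bool, require_snapshot: bool = True) -> int:
    """Return how many of the oldest ledger entries may be pruned.

    Raises if require_snapshot is set and no recent snapshot exists,
    mirroring the safety feature described above.
    """
    if require_snapshot and not has_recent_snapshot:
        raise RuntimeError("pruning refused: no recent snapshot (require_snapshot = true)")
    # Never prune into the retention window
    return max(0, total_entries - keep_last_entries)
```

The max(0, ...) clamp is what guarantees the retention window: when the ledger is shorter than keep_last_entries, nothing is pruned.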
Backup Strategies
Recommended Backup Schedule
Production Mainnet
| Frequency | Type | Retention | Storage |
|---|---|---|---|
| Daily | Snapshot | 30 days | S3/GCS |
| Weekly | Snapshot | 1 year | S3 Glacier |
| Monthly | Snapshot | Permanent | Cold storage |
Testnet
| Frequency | Type | Retention | Storage |
|---|---|---|---|
| Weekly | Snapshot | 4 weeks | Local disk |
Automated Snapshot Creation
Using Cron (Linux)
# Add to crontab: Daily snapshot at 2 AM
# (crontab entries must fit on a single line; % must be escaped)
0 2 * * * /usr/local/bin/rivellum-node snapshot create --config /etc/rivellum/config.toml --output /backups/snapshot-$(date +\%Y-\%m-\%d) --description "Daily automated snapshot"
Using Task Scheduler (Windows)
# PowerShell script: daily_snapshot.ps1
$date = Get-Date -Format "yyyy-MM-dd"
$output = "C:\backups\rivellum\snapshot-$date"
& "C:\Program Files\rivellum\rivellum-node.exe" snapshot create `
--output $output `
--description "Daily automated snapshot"
Schedule via Task Scheduler:
- Trigger: Daily at 2:00 AM
- Action: Run PowerShell script
- Run whether user is logged on or not
Cloud Storage Integration
Upload to AWS S3
#!/bin/bash
# backup_to_s3.sh
SNAPSHOT_DIR="/backups/snapshot-$(date +%Y-%m-%d)"
S3_BUCKET="s3://my-rivellum-backups"
# Create snapshot
rivellum-node snapshot create --output "$SNAPSHOT_DIR"
# Upload to S3
aws s3 sync "$SNAPSHOT_DIR" "$S3_BUCKET/snapshots/$(basename $SNAPSHOT_DIR)"
# Clean up old local snapshots (keep last 7 days)
find /backups -name "snapshot-*" -type d -mtime +7 -exec rm -rf {} \;
Upload to Google Cloud Storage
#!/bin/bash
# backup_to_gcs.sh
SNAPSHOT_DIR="/backups/snapshot-$(date +%Y-%m-%d)"
GCS_BUCKET="gs://my-rivellum-backups"
# Create snapshot
rivellum-node snapshot create --output "$SNAPSHOT_DIR"
# Upload to GCS
gsutil -m rsync -r "$SNAPSHOT_DIR" "$GCS_BUCKET/snapshots/$(basename $SNAPSHOT_DIR)"
Storage Recommendations
Snapshot Size Estimation
- Empty State: ~10 MB
- Small Network (<10k accounts): ~100 MB
- Medium Network (~1M accounts): ~5-10 GB
- Large Network (>10M accounts): ~50-100 GB
Plan for 2-3x growth over 1 year.
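Combining the schedule table with the size estimates gives a rough upper bound on total snapshot storage. A sketch, with the assumption that "permanent" monthly copies are counted for one year of planning (monthly_kept=12 is my assumption, not from the guide):

```python
def retained_storage_gb(snapshot_gb: float,
                        daily_kept: int = 30,
                        weekly_kept: int = 52,
                        monthly_kept: int = 12,   # assumption: plan one year ahead
                        growth_factor: float = 3.0) -> float:
    """Upper bound on storage for a daily/weekly/monthly snapshot schedule,
    with headroom for the 2-3x yearly growth noted above."""
    copies = daily_kept + weekly_kept + monthly_kept
    return snapshot_gb * copies * growth_factor
```

For a medium network (~10 GB per snapshot) this lands in the tens of terabytes at the worst-case growth factor, which is why tiering older copies to Glacier/Nearline matters.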
Storage Providers
| Provider | Use Case | Cost (approx) |
|---|---|---|
| Local Disk | Fast access, recent snapshots | Hardware cost |
| AWS S3 Standard | Active snapshots (30 days) | $0.023/GB/month |
| AWS S3 Glacier | Long-term archives | $0.004/GB/month |
| GCS Standard | Active snapshots | $0.020/GB/month |
| GCS Nearline | Monthly archives | $0.010/GB/month |
| Backblaze B2 | Budget option | $0.005/GB/month |
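To compare providers, the approximate per-GB rates from the table above can be plugged into a trivial cost helper (the rates are the table's rough figures, not quotes):

```python
# Approximate $/GB/month, copied from the provider table above
RATES_PER_GB_MONTH = {
    "s3_standard": 0.023,
    "s3_glacier": 0.004,
    "gcs_standard": 0.020,
    "gcs_nearline": 0.010,
    "backblaze_b2": 0.005,
}

def monthly_cost_usd(gb_stored: float, provider: str) -> float:
    """Monthly storage cost at the approximate published rate."""
    return gb_stored * RATES_PER_GB_MONTH[provider]
```

For example, 100 GB of archives costs roughly $0.40/month on Glacier versus $2.30/month on S3 Standard; request and retrieval fees (not modeled here) can dominate for cold tiers, so check current pricing before committing.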
Disaster Recovery
Scenarios and Solutions
Scenario 1: Corrupted State Database
Symptoms:
- Node crashes on startup
- RocksDB corruption errors
- Inconsistent state root
Recovery:
# 1. Stop the node (if running)
systemctl stop rivellum-node
# 2. Backup corrupted state (for forensics)
mv /data/rivellum/state.db /data/rivellum/state.db.corrupted
# 3. Restore from most recent snapshot
rivellum-node snapshot restore \
--input /backups/snapshot-2024-01-20
# 4. Replay ledger to catch up (if available)
rivellum-node run --replay-ledger
# 5. Restart node
systemctl start rivellum-node
Scenario 2: Disk Failure
Symptoms:
- Disk I/O errors
- Data directory inaccessible
Recovery:
# 1. Replace failed disk and mount at /data
# 2. Download latest snapshot from cloud storage
aws s3 sync s3://my-backups/snapshot-2024-01-20 /backups/snapshot-2024-01-20
# 3. Restore snapshot
rivellum-node snapshot restore \
--input /backups/snapshot-2024-01-20
# 4. Rejoin network and sync
rivellum-node run --config /etc/rivellum/config.toml
Scenario 3: Accidental Deletion
Symptoms:
- State directory deleted
- Ledger missing
Recovery:
# 1. Check for automatic backups (created during restore)
ls -la /data/rivellum/state_backup_*
# 2. Restore from latest backup
mv /data/rivellum/state_backup_1705766400 /data/rivellum/state.db
# 3. Restart node
systemctl restart rivellum-node
Scenario 4: Wrong Chain Restored
Symptoms:
- Chain ID mismatch errors
- Genesis doesn't match network
Recovery:
# 1. Identify correct snapshot for your chain
rivellum-node snapshot list --dir /backups
# 2. Restore with correct snapshot
rivellum-node snapshot restore --input /backups/mainnet-snapshot-X
# 3. Verify chain ID in config matches
grep chain_id /etc/rivellum/config.toml
# Should output: chain_id = "rivellum-mainnet"
Recovery Time Objectives (RTO)
| Scenario | RTO | Notes |
|---|---|---|
| Corrupted state (local snapshot) | 5-10 minutes | Restore + verify |
| Disk failure (cloud snapshot) | 30-60 minutes | Download + restore |
| Complete node rebuild | 2-4 hours | Install + restore + sync |
Testing Disaster Recovery
Monthly DR Test:
# 1. Create test environment
mkdir -p /tmp/dr-test/data
# 2. Restore production snapshot to test location
rivellum_DATA_DIR=/tmp/dr-test/data \
rivellum-node snapshot restore --input /backups/latest
# 3. Verify state
rivellum_DATA_DIR=/tmp/dr-test/data \
rivellum-node validate-genesis /etc/rivellum/genesis.json
# 4. Clean up
rm -rf /tmp/dr-test
Best Practices
Snapshot Management
- Label Descriptively - Use meaningful descriptions with height and date
  rivellum-node snapshot create \
    --output ./snapshot-mainnet-h1000000-2024-01-15 \
    --description "Mainnet snapshot at height 1M before upgrade"
- Verify After Creation - Always check metadata after creating a snapshot
  cat ./snapshot-*/snapshot_meta.json | jq .
- Test Restores - Periodically test restoration in a non-production environment
- Automate - Use cron/task scheduler for regular snapshots
Pruning
- Start Conservative - Begin with large keep_last_entries (10000+)
- Monitor Disk Usage - Track disk growth and adjust pruning accordingly
- Keep Snapshots - Always set require_snapshot = true in production
- Log Pruning - Review logs to ensure pruning runs as expected
Storage
- 3-2-1 Rule - 3 copies, 2 different media, 1 offsite
  - Copy 1: Local snapshots (fast access)
  - Copy 2: Network-attached storage (NAS)
  - Copy 3: Cloud storage (S3/GCS)
- Encrypt Backups - Encrypt snapshots before uploading to cloud
  tar -czf - ./snapshot-2024-01-15 | \
    gpg --encrypt --recipient ops@rivellum.io > snapshot.tar.gz.gpg
- Version Snapshots - Keep multiple versions for rollback options
Operational
- Document Procedures - Maintain runbooks for common scenarios
- Alert on Failures - Monitor snapshot creation and alert on failures
- Capacity Planning - Estimate storage needs 6-12 months ahead
- Access Control - Restrict snapshot restore to authorized operators
Troubleshooting
Common Issues
"Chain ID mismatch" Error
Problem:
Error: Chain ID mismatch! Snapshot has 'rivellum-testnet' but config expects 'rivellum-mainnet'
Solution:
- Verify you have the correct snapshot for your network
- Use --no-verify only if intentionally switching chains (testing only)
- Check chain_id in config/default.toml
"Source state database not found"
Problem:
Error: Source state database not found: /data/rivellum/state.db
Solution:
- Ensure node is stopped before creating snapshot
- Verify data_dir in config points to correct location
- Check if state database was moved or deleted
Snapshot Restoration Hangs
Problem: Restore command appears frozen
Solution:
- Large snapshots take time (10+ GB can take 5-10 minutes)
- Check disk I/O with iostat -x 1 (Linux) or Task Manager (Windows)
- Ensure destination has enough free space (2x snapshot size)
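The "2x snapshot size" free-space check can be automated before kicking off a long restore. A small sketch using only the standard library (the helper name and the 2x headroom default are taken from the guidance above):

```python
import shutil
from pathlib import Path

def has_room_for_restore(snapshot_dir: str, dest_dir: str, headroom: float = 2.0) -> bool:
    """Check that dest_dir has at least headroom x the snapshot's size free."""
    snap_bytes = sum(
        p.stat().st_size for p in Path(snapshot_dir).rglob("*") if p.is_file()
    )
    free_bytes = shutil.disk_usage(dest_dir).free
    return free_bytes >= headroom * snap_bytes
```

Running this before a restore turns a mid-copy "disk full" failure into an upfront, actionable error.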
Pruning Not Running
Problem: Ledger keeps growing despite pruning enabled
Solution:
- Check logs for pruning errors: grep -i prune /var/log/rivellum/node.log
- Verify enabled = true in the [pruning] config section
- Ensure require_snapshot is satisfied (create a snapshot if needed)
- Check that pruning_interval_secs has elapsed
Debugging Commands
# Check current state database size
du -sh /data/rivellum/state.db
# Check ledger size
wc -l /data/rivellum/ledger.log
# Verify snapshot metadata
cat /backups/snapshot-*/snapshot_meta.json | jq .
# Check disk space
df -h /data
# Monitor pruning in real-time (Linux)
tail -f /var/log/rivellum/node.log | grep -i prune
Advanced Topics
Cross-Chain Snapshots
Snapshots can be used to bootstrap testnets from mainnet state:
# 1. Create mainnet snapshot
rivellum-node --config mainnet.toml snapshot create --output /tmp/mainnet-snap
# 2. Restore to testnet (with verification disabled)
rivellum-node --config testnet.toml snapshot restore \
--input /tmp/mainnet-snap \
--no-verify
# 3. Modify chain_id in metadata if needed
# Edit testnet config to match
Incremental Snapshots (Future Feature)
Planned for future releases: incremental snapshots that only capture state changes since last full snapshot.
Snapshot Compression
To save storage space:
# Create and compress
rivellum-node snapshot create --output /tmp/snapshot
tar -czf snapshot-$(date +%Y-%m-%d).tar.gz /tmp/snapshot
# Restore from compressed
tar -xzf snapshot-2024-01-15.tar.gz
rivellum-node snapshot restore --input ./snapshot-2024-01-15
Last Updated: 2024-01-26
Version: 1.0.0