rivellum Backup and Restore Operations Guide
Overview
This guide covers state management, backup strategies, and disaster recovery procedures for rivellum nodes. The snapshot and pruning infrastructure enables operators to:
- Create point-in-time snapshots of blockchain state
- Restore from snapshots for disaster recovery or testing
- Prune old ledger entries to manage disk usage on long-running nodes
- Verify snapshot integrity through chain ID and state root checks
Table of Contents
- Snapshot Management
- Pruning Configuration
- Backup Strategies
- Disaster Recovery
- Best Practices
- Troubleshooting
- Advanced Topics
Snapshot Management
What is a Snapshot?
A snapshot is a complete copy of the blockchain state at a specific height, including:
- State Database (state.db/) - All account balances, nonces, and contract state
- Metadata (snapshot_meta.json) - Height, state root, timestamp, chain ID, version
Snapshots do not include the ledger (transaction history), which can be replayed from genesis or other nodes.
Creating Snapshots
Basic Snapshot Creation
# Create a snapshot of the current state
rivellum-node snapshot create --output ./snapshots/snapshot-2024-01-15
# With a description
rivellum-node snapshot create \
--output ./snapshots/mainnet-snapshot-height-1000000 \
--description "Mainnet snapshot at height 1M"
Output Structure
snapshot-2024-01-15/
├── state.db/              # Copy of RocksDB/sled state database
│   ├── CURRENT
│   ├── MANIFEST-*
│   └── *.sst
└── snapshot_meta.json     # Metadata file
Snapshot Metadata
The snapshot_meta.json file contains:
{
"height": 1000000,
"state_root": "0x1234...",
"created_at_ms": 1705334400000,
"ledger_path": "/data/ledger.log",
"chain_id": "rivellum-mainnet",
"version": 1,
"description": "Mainnet snapshot at height 1M"
}
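Because the metadata is what ties a state database back to a chain and height, it is worth validating before trusting a snapshot. A minimal sketch in Python, assuming only the snapshot_meta.json layout shown above (these helper names are illustrative, not part of the rivellum CLI):

```python
import json
from pathlib import Path

REQUIRED_FIELDS = {"height", "state_root", "created_at_ms", "chain_id", "version"}

def load_snapshot_meta(snapshot_dir: str) -> dict:
    """Load snapshot_meta.json and check that the required fields are present."""
    meta_path = Path(snapshot_dir) / "snapshot_meta.json"
    meta = json.loads(meta_path.read_text())
    missing = REQUIRED_FIELDS - meta.keys()
    if missing:
        raise ValueError(f"snapshot metadata missing fields: {sorted(missing)}")
    return meta

def check_chain_id(meta: dict, expected_chain_id: str) -> None:
    """Mirror the node's chain ID verification: refuse cross-chain restores."""
    if meta["chain_id"] != expected_chain_id:
        raise ValueError(
            f"Chain ID mismatch! Snapshot has '{meta['chain_id']}' "
            f"but config expects '{expected_chain_id}'"
        )
```

Running this against a snapshot directory before a restore catches the two most common failure modes early: a truncated copy (missing fields) and a snapshot taken from the wrong network.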
Listing Snapshots
# List all snapshots in default directory (./snapshots)
rivellum-node snapshot list
# List snapshots in a specific directory
rivellum-node snapshot list --dir /backups/rivellum/snapshots
Example Output:
┌─────────────────────────────────────────────────────────────┐
│                 rivellum AVAILABLE SNAPSHOTS                │
└─────────────────────────────────────────────────────────────┘
Found 2 snapshot(s) in: /backups/rivellum/snapshots
─────────────────────────────────────────────────────────────
Path: /backups/rivellum/snapshots/snapshot-2024-01-15
Height: 1000000
State Root: StateRoot(0x1234...)
Chain ID: rivellum-mainnet
Created: 1705334400000 ms since epoch
Description: Mainnet snapshot at height 1M
─────────────────────────────────────────────────────────────
Path: /backups/rivellum/snapshots/snapshot-2024-01-20
Height: 1050000
State Root: StateRoot(0x5678...)
Chain ID: rivellum-mainnet
Created: 1705766400000 ms since epoch
Restoring from Snapshots
Basic Restore
# Restore from a snapshot (with chain ID verification)
rivellum-node snapshot restore --input ./snapshots/snapshot-2024-01-15
What happens during restore:
- Loads snapshot metadata
- Verifies chain ID matches node config (unless --no-verify)
- Creates backup of existing state: data_dir/state_backup_{timestamp}/
- Copies snapshot state.db to data_dir/state.db
- Reports success with snapshot details
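The restore steps above can be sketched as a small script. This is a hedged illustration of the flow, not the node's actual implementation; the paths and the state_backup_ naming simply follow the conventions described in this guide:

```python
import json
import shutil
import time
from pathlib import Path

def restore_snapshot(snapshot_dir, data_dir, expected_chain_id=None):
    """Sketch of the restore flow: verify, back up existing state, copy snapshot in."""
    snapshot_dir, data_dir = Path(snapshot_dir), Path(data_dir)
    meta = json.loads((snapshot_dir / "snapshot_meta.json").read_text())

    # Chain ID verification (the step skipped when --no-verify is passed)
    if expected_chain_id is not None and meta["chain_id"] != expected_chain_id:
        raise ValueError(f"Chain ID mismatch: {meta['chain_id']} != {expected_chain_id}")

    # Back up any existing state database before overwriting it
    state_db = data_dir / "state.db"
    if state_db.exists():
        state_db.rename(data_dir / f"state_backup_{int(time.time())}")

    # Copy the snapshot's state database into place
    shutil.copytree(snapshot_dir / "state.db", state_db)
    return state_db
```

Note that the backup-before-overwrite step is what makes the "Accidental Deletion" recovery scenario later in this guide possible.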
Restore Without Verification (DANGEROUS)
# Skip chain ID verification (for testing or cross-chain recovery)
rivellum-node snapshot restore \
--input ./snapshots/snapshot-2024-01-15 \
--no-verify
⚠️ Warning: Only use --no-verify if you understand the risks. Restoring a snapshot from a different chain can lead to inconsistent state.
Post-Restore Steps
After restoration, you may need to:
- Replay ledger - If you have a ledger.log file, replay it to catch up to current height
- Sync from peers - Connect to network peers to download missing blocks
- Verify state root - Check that the state root matches your expectations
Pruning Configuration
Overview
Pruning automatically removes old ledger entries to manage disk usage. This is critical for long-running nodes that would otherwise accumulate hundreds of GB of transaction history.
Configuration
Add to your config/default.toml:
[pruning]
# Enable automatic pruning
enabled = true
# Keep last N ledger entries (default: 10000)
# This determines how much history is retained
keep_last_entries = 10000
# Prune every N seconds (default: 3600 = 1 hour)
pruning_interval_secs = 3600
# Require snapshot before pruning (default: true)
# Safety feature: ensures you have a snapshot before deleting history
require_snapshot = true
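keep_last_entries only translates into a time window through your intent rate, so it helps to do the arithmetic before picking a value. A back-of-the-envelope helper (illustrative only, not part of the CLI):

```python
def retention_window_secs(keep_last_entries: int, intents_per_sec: float) -> float:
    """Approximate how much wall-clock history keep_last_entries retains."""
    return keep_last_entries / intents_per_sec

# At 1 intent every 10 seconds (0.1/s), 10000 entries cover 100000 s,
# roughly 1.16 days -- which is where the "~1 day of history" figure
# in the conservative profile below comes from.
```

If your network runs at a higher intent rate, scale keep_last_entries up proportionally to keep the same retention window.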
Pruning Modes
Conservative (Recommended)
[pruning]
enabled = true
keep_last_entries = 10000 # ~1 day of history at 1 intent/10s
pruning_interval_secs = 3600 # Prune hourly
require_snapshot = true # Safety on
Best for: Production mainnet nodes
Aggressive (High Throughput)
[pruning]
enabled = true
keep_last_entries = 1000 # ~2 hours of history
pruning_interval_secs = 600 # Prune every 10 minutes
require_snapshot = true
Best for: Testnet nodes, nodes with limited disk space
Archive Node (No Pruning)
[pruning]
enabled = false
Best for: Explorers, auditing, research nodes
Pruning Safety Features
- Require Snapshot - If require_snapshot = true, pruning will fail if no recent snapshot exists
- Retention Window - keep_last_entries ensures you always have recent history
- Atomic Operations - Pruning operations are atomic per entry
- Logging - All pruning operations are logged for auditing
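The interplay of the snapshot gate and the retention window can be sketched as a single decision function. This is an assumption about how such a gate might be structured, not the node's actual pruning code:

```python
def prune_plan(total_entries: int, keep_last_entries: int,
               has_recent_snapshot: bool, require_snapshot: bool = True) -> int:
    """Return how many of the oldest ledger entries may be pruned.

    Raises if require_snapshot is set and no recent snapshot exists,
    mirroring the safety feature described above.
    """
    if require_snapshot and not has_recent_snapshot:
        raise RuntimeError("pruning refused: no recent snapshot (require_snapshot = true)")
    # Never prune into the retention window
    return max(0, total_entries - keep_last_entries)
```

The max(0, ...) clamp is what guarantees the retention window: when the ledger is shorter than keep_last_entries, nothing is pruned.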
Backup Strategies
Recommended Backup Schedule
Production Mainnet
| Frequency | Type | Retention | Storage |
|---|---|---|---|
| Daily | Snapshot | 30 days | S3/GCS |
| Weekly | Snapshot | 1 year | S3 Glacier |
| Monthly | Snapshot | Permanent | Cold storage |
Testnet
| Frequency | Type | Retention | Storage |
|---|---|---|---|
| Weekly | Snapshot | 4 weeks | Local disk |
Automated Snapshot Creation
Using Cron (Linux)
# Add to crontab: Daily snapshot at 2 AM
# (crontab entries must fit on a single line; % must be escaped)
0 2 * * * /usr/local/bin/rivellum-node snapshot create --config /etc/rivellum/config.toml --output /backups/snapshot-$(date +\%Y-\%m-\%d) --description "Daily automated snapshot"
Using Task Scheduler (Windows)
# PowerShell script: daily_snapshot.ps1
$date = Get-Date -Format "yyyy-MM-dd"
$output = "C:\backups\rivellum\snapshot-$date"
& "C:\Program Files\rivellum\rivellum-node.exe" snapshot create `
--output $output `
--description "Daily automated snapshot"
Schedule via Task Scheduler:
- Trigger: Daily at 2:00 AM
- Action: Run PowerShell script
- Run whether user is logged on or not
Cloud Storage Integration
Upload to AWS S3
#!/bin/bash
# backup_to_s3.sh
SNAPSHOT_DIR="/backups/snapshot-$(date +%Y-%m-%d)"
S3_BUCKET="s3://my-rivellum-backups"
# Create snapshot
rivellum-node snapshot create --output "$SNAPSHOT_DIR"
# Upload to S3
aws s3 sync "$SNAPSHOT_DIR" "$S3_BUCKET/snapshots/$(basename $SNAPSHOT_DIR)"
# Clean up old local snapshots (keep last 7 days)
find /backups -name "snapshot-*" -type d -mtime +7 -exec rm -rf {} \;
Upload to Google Cloud Storage
#!/bin/bash
# backup_to_gcs.sh
SNAPSHOT_DIR="/backups/snapshot-$(date +%Y-%m-%d)"
GCS_BUCKET="gs://my-rivellum-backups"
# Create snapshot
rivellum-node snapshot create --output "$SNAPSHOT_DIR"
# Upload to GCS
gsutil -m rsync -r "$SNAPSHOT_DIR" "$GCS_BUCKET/snapshots/$(basename $SNAPSHOT_DIR)"
Storage Recommendations
Snapshot Size Estimation
- Empty State: ~10 MB
- Small Network (<10k accounts): ~100 MB
- Medium Network (~1M accounts): ~5-10 GB
- Large Network (>10M accounts): ~50-100 GB
Plan for 2-3x growth over 1 year.
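Combining the schedule table with the size estimates gives a rough upper bound on total snapshot storage. A sketch, with the assumption that "permanent" monthly copies are counted for one year of planning (monthly_kept=12 is my assumption, not from the guide):

```python
def retained_storage_gb(snapshot_gb: float,
                        daily_kept: int = 30,
                        weekly_kept: int = 52,
                        monthly_kept: int = 12,   # assumption: plan one year ahead
                        growth_factor: float = 3.0) -> float:
    """Upper bound on storage for a daily/weekly/monthly snapshot schedule,
    with headroom for the 2-3x yearly growth noted above."""
    copies = daily_kept + weekly_kept + monthly_kept
    return snapshot_gb * copies * growth_factor
```

For a medium network (~10 GB per snapshot) this lands in the tens of terabytes at the worst-case growth factor, which is why tiering older copies to Glacier/Nearline matters.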
Storage Providers
| Provider | Use Case | Cost (approx) |
|---|---|---|
| Local Disk | Fast access, recent snapshots | Hardware cost |
| AWS S3 Standard | Active snapshots (30 days) | $0.023/GB/month |
| AWS S3 Glacier | Long-term archives | $0.004/GB/month |
| GCS Standard | Active snapshots | $0.020/GB/month |
| GCS Nearline | Monthly archives | $0.010/GB/month |
| Backblaze B2 | Budget option | $0.005/GB/month |
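To compare providers, the approximate per-GB rates from the table above can be plugged into a trivial cost helper (the rates are the table's rough figures, not quotes):

```python
# Approximate $/GB/month, copied from the provider table above
RATES_PER_GB_MONTH = {
    "s3_standard": 0.023,
    "s3_glacier": 0.004,
    "gcs_standard": 0.020,
    "gcs_nearline": 0.010,
    "backblaze_b2": 0.005,
}

def monthly_cost_usd(gb_stored: float, provider: str) -> float:
    """Monthly storage cost at the approximate published rate."""
    return gb_stored * RATES_PER_GB_MONTH[provider]
```

For example, 100 GB of archives costs roughly $0.40/month on Glacier versus $2.30/month on S3 Standard; request and retrieval fees (not modeled here) can dominate for cold tiers, so check current pricing before committing.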
Disaster Recovery
Scenarios and Solutions
Scenario 1: Corrupted State Database
Symptoms:
- Node crashes on startup
- RocksDB corruption errors
- Inconsistent state root
Recovery:
# 1. Stop the node (if running)
systemctl stop rivellum-node
# 2. Backup corrupted state (for forensics)
mv /data/rivellum/state.db /data/rivellum/state.db.corrupted
# 3. Restore from most recent snapshot
rivellum-node snapshot restore \
--input /backups/snapshot-2024-01-20
# 4. Replay ledger to catch up (if available)
rivellum-node run --replay-ledger
# 5. Restart node
systemctl start rivellum-node
Scenario 2: Disk Failure
Symptoms:
- Disk I/O errors
- Data directory inaccessible
Recovery:
# 1. Replace failed disk and mount at /data
# 2. Download latest snapshot from cloud storage
aws s3 sync s3://my-backups/snapshot-2024-01-20 /backups/snapshot-2024-01-20
# 3. Restore snapshot
rivellum-node snapshot restore \
--input /backups/snapshot-2024-01-20
# 4. Rejoin network and sync
rivellum-node run --config /etc/rivellum/config.toml
Scenario 3: Accidental Deletion
Symptoms:
- State directory deleted
- Ledger missing
Recovery:
# 1. Check for automatic backups (created during restore)
ls -la /data/rivellum/state_backup_*
# 2. Restore from latest backup
mv /data/rivellum/state_backup_1705766400 /data/rivellum/state.db
# 3. Restart node
systemctl restart rivellum-node
Scenario 4: Wrong Chain Restored
Symptoms:
- Chain ID mismatch errors
- Genesis doesn't match network
Recovery:
# 1. Identify correct snapshot for your chain
rivellum-node snapshot list --dir /backups
# 2. Restore with correct snapshot
rivellum-node snapshot restore --input /backups/mainnet-snapshot-X
# 3. Verify chain ID in config matches
grep chain_id /etc/rivellum/config.toml
# Should output: chain_id = "rivellum-mainnet"
Recovery Time Objectives (RTO)
| Scenario | RTO | Notes |
|---|---|---|
| Corrupted state (local snapshot) | 5-10 minutes | Restore + verify |
| Disk failure (cloud snapshot) | 30-60 minutes | Download + restore |
| Complete node rebuild | 2-4 hours | Install + restore + sync |
Testing Disaster Recovery
Monthly DR Test:
# 1. Create test environment
mkdir -p /tmp/dr-test/data
# 2. Restore production snapshot to test location
rivellum_DATA_DIR=/tmp/dr-test/data \
rivellum-node snapshot restore --input /backups/latest
# 3. Verify state
rivellum_DATA_DIR=/tmp/dr-test/data \
rivellum-node validate-genesis /etc/rivellum/genesis.json
# 4. Clean up
rm -rf /tmp/dr-test
Best Practices
Snapshot Management
- Label Descriptively - Use meaningful descriptions with height and date
  rivellum-node snapshot create \
    --output ./snapshot-mainnet-h1000000-2024-01-15 \
    --description "Mainnet snapshot at height 1M before upgrade"
- Verify After Creation - Always check metadata after creating a snapshot
  cat ./snapshot-*/snapshot_meta.json | jq .
- Test Restores - Periodically test restoration in a non-production environment
- Automate - Use cron/task scheduler for regular snapshots
Pruning
- Start Conservative - Begin with large keep_last_entries (10000+)
- Monitor Disk Usage - Track disk growth and adjust pruning accordingly
- Keep Snapshots - Always set require_snapshot = true in production
- Log Pruning - Review logs to ensure pruning runs as expected
Storage
- 3-2-1 Rule - 3 copies, 2 different media, 1 offsite
  - Copy 1: Local snapshots (fast access)
  - Copy 2: Network-attached storage (NAS)
  - Copy 3: Cloud storage (S3/GCS)
- Encrypt Backups - Encrypt snapshots before uploading to cloud
  tar -czf - ./snapshot-2024-01-15 | \
    gpg --encrypt --recipient ops@rivellum.io > snapshot.tar.gz.gpg
- Version Snapshots - Keep multiple versions for rollback options
Operational
- Document Procedures - Maintain runbooks for common scenarios
- Alert on Failures - Monitor snapshot creation and alert on failures
- Capacity Planning - Estimate storage needs 6-12 months ahead
- Access Control - Restrict snapshot restore to authorized operators
Troubleshooting
Common Issues
"Chain ID mismatch" Error
Problem:
Error: Chain ID mismatch! Snapshot has 'rivellum-testnet' but config expects 'rivellum-mainnet'
Solution:
- Verify you have the correct snapshot for your network
- Use --no-verify only if intentionally switching chains (testing only)
- Check chain_id in config/default.toml
"Source state database not found"
Problem:
Error: Source state database not found: /data/rivellum/state.db
Solution:
- Ensure node is stopped before creating snapshot
- Verify data_dir in config points to correct location
- Check if state database was moved or deleted
Snapshot Restoration Hangs
Problem: Restore command appears frozen
Solution:
- Large snapshots take time (10+ GB can take 5-10 minutes)
- Check disk I/O with iostat -x 1 (Linux) or Task Manager (Windows)
- Ensure destination has enough free space (2x snapshot size)
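The "2x snapshot size" free-space check can be automated before kicking off a long restore. A small sketch using only the standard library (the helper name and the 2x headroom default are taken from the guidance above):

```python
import shutil
from pathlib import Path

def has_room_for_restore(snapshot_dir: str, dest_dir: str, headroom: float = 2.0) -> bool:
    """Check that dest_dir has at least headroom x the snapshot's size free."""
    snap_bytes = sum(
        p.stat().st_size for p in Path(snapshot_dir).rglob("*") if p.is_file()
    )
    free_bytes = shutil.disk_usage(dest_dir).free
    return free_bytes >= headroom * snap_bytes
```

Running this before a restore turns a mid-copy "disk full" failure into an upfront, actionable error.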
Pruning Not Running
Problem: Ledger keeps growing despite pruning enabled
Solution:
- Check logs for pruning errors: grep -i prune /var/log/rivellum/node.log
- Verify enabled = true in the [pruning] config section
- Ensure require_snapshot is satisfied (create a snapshot if needed)
- Check that pruning_interval_secs has elapsed
Debugging Commands
# Check current state database size
du -sh /data/rivellum/state.db
# Check ledger size
wc -l /data/rivellum/ledger.log
# Verify snapshot metadata
cat /backups/snapshot-*/snapshot_meta.json | jq .
# Check disk space
df -h /data
# Monitor pruning in real-time (Linux)
tail -f /var/log/rivellum/node.log | grep -i prune
Advanced Topics
Cross-Chain Snapshots
Snapshots can be used to bootstrap testnets from mainnet state:
# 1. Create mainnet snapshot
rivellum-node --config mainnet.toml snapshot create --output /tmp/mainnet-snap
# 2. Restore to testnet (with verification disabled)
rivellum-node --config testnet.toml snapshot restore \
--input /tmp/mainnet-snap \
--no-verify
# 3. Modify chain_id in metadata if needed
# Edit testnet config to match
Incremental Snapshots (Future Feature)
Planned for future releases: incremental snapshots that only capture state changes since last full snapshot.
Snapshot Compression
To save storage space:
# Create and compress
rivellum-node snapshot create --output /tmp/snapshot
tar -czf snapshot-$(date +%Y-%m-%d).tar.gz /tmp/snapshot
# Restore from compressed
tar -xzf snapshot-2024-01-15.tar.gz
rivellum-node snapshot restore --input ./snapshot-2024-01-15
Last Updated: 2024-01-26
Version: 1.0.0