Database Design
Overview
SupaScan uses ClickHouse as its primary analytical database, chosen for its columnar storage, exceptional performance on time-series data, and real-time analytics capabilities. The system draws on multiple data sources, including RPC nodes, archival nodes, and custom indexing, to provide comprehensive Solana blockchain analysis.
Architecture Overview
Data Sources and Storage Strategy
graph TD
A[Solana RPC Nodes] --> B[Real-time Indexer]
C[Archival Nodes] --> D[Historical Data Processor]
E[Custom Parsers] --> F[Specialized Data Extractor]
B --> G[ClickHouse Cluster]
D --> G
F --> G
G --> H[Hot Storage - Recent Data]
G --> I[Warm Storage - Historical Data]
G --> J[Cold Storage - Archived Data]
H --> K[Real-time Analytics]
I --> L[Historical Analysis]
J --> M[Long-term Storage]
ClickHouse Cluster Configuration
graph TD
A[Load Balancer] --> B[ClickHouse Shard 1]
A --> C[ClickHouse Shard 2]
A --> D[ClickHouse Shard 3]
B --> E[Replica 1.1]
B --> F[Replica 1.2]
C --> G[Replica 2.1]
C --> H[Replica 2.2]
D --> I[Replica 3.1]
D --> J[Replica 3.2]
K[ZooKeeper] --> B
K --> C
K --> D
Schema Design
Core Tables Structure
erDiagram
TRANSACTIONS ||--o{ TOKEN_TRANSFERS : contains
TRANSACTIONS ||--o{ TOKEN_SWAPS : contains
TOKENS ||--o{ TOKEN_TRANSFERS : involved_in
TOKENS ||--o{ TOKEN_SWAPS : traded_in
WALLETS ||--o{ TRANSACTIONS : participates_in
DEX_POOLS ||--o{ TOKEN_SWAPS : facilitates
TRANSACTIONS {
string signature PK
uint64 slot
uint64 block_time
uint64 fee
uint8 success
array accounts
array instructions
array logs
string signer
string program_id
}
TOKEN_TRANSFERS {
string signature FK
string token_mint
string from_address
string to_address
uint256 amount
uint8 decimals
float64 ui_amount
}
TOKEN_SWAPS {
string signature FK
string dex_protocol
string token_in
string token_out
uint256 amount_in
uint256 amount_out
float64 price_impact
}
TOKENS {
string mint_address PK
string symbol
string name
uint8 decimals
uint256 total_supply
string metadata_uri
}
WALLETS {
string address PK
uint64 first_seen
uint64 last_activity
uint64 transaction_count
float64 total_volume
}
Table Definitions
1. transactions
Core table storing all Solana transactions with optimized partitioning.
CREATE TABLE transactions (
signature String,
slot UInt64,
block_time UInt64,
fee UInt64,
success UInt8,
accounts Array(String),
instructions Array(String),
logs Array(String),
signer String,
program_id String,
compute_units_consumed UInt32,
-- Partitioning by slot for optimal performance
date Date MATERIALIZED toDate(block_time)
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/transactions', '{replica}')
PARTITION BY toYYYYMM(date)
ORDER BY (slot, signature)
SETTINGS index_granularity = 8192;
Purpose: Stores all Solana transactions with full details.
Key Points:
- Partitioned by month for efficient time-based queries
- Ordered by slot for optimal range scans
- Replicated across cluster for high availability
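To make the partitioning behaviour concrete, here is a small sketch (plain JavaScript; the helper names are ours, not ClickHouse APIs) of how `toYYYYMM` buckets rows by month and why a date predicate lets whole partitions be skipped:

```javascript
// ClickHouse derives the MATERIALIZED `date` column from block_time (unix
// seconds), and PARTITION BY toYYYYMM(date) groups rows into monthly parts.
function toYYYYMM(blockTime) {
  const d = new Date(blockTime * 1000);
  return d.getUTCFullYear() * 100 + (d.getUTCMonth() + 1);
}

// A date predicate in WHERE lets ClickHouse skip entire partitions
// ("partition pruning"): only months overlapping the window are read.
function partitionsToScan(fromTime, toTime, partitions) {
  const lo = toYYYYMM(fromTime);
  const hi = toYYYYMM(toTime);
  return partitions.filter((p) => p >= lo && p <= hi);
}
```

Numeric comparison on YYYYMM values works across year boundaries (202312 < 202401), which is why this encoding is convenient as a partition key.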
2. token_transfers
SPL token transfer events with decimal handling.
CREATE TABLE token_transfers (
signature String,
slot UInt64,
block_time UInt64,
token_mint String,
from_address String,
to_address String,
amount UInt256,
decimals UInt8,
ui_amount Float64,
date Date MATERIALIZED toDate(block_time)
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/token_transfers', '{replica}')
PARTITION BY toYYYYMM(date)
ORDER BY (token_mint, block_time, signature)
SETTINGS index_granularity = 8192;
Purpose: Tracks all SPL token transfers with proper decimal handling.
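The `ui_amount` column is derived from the raw amount and the mint's decimals. A sketch of that conversion (JavaScript; BigInt is used because raw SPL amounts can exceed 2^53):

```javascript
// Convert a raw SPL token amount (base units) into the human-readable
// ui_amount by scaling down by 10^decimals. BigInt keeps the integer
// arithmetic exact before the final float conversion.
function toUiAmount(rawAmount, decimals) {
  const base = 10n ** BigInt(decimals);
  const whole = rawAmount / base;   // integer part, exact
  const frac = rawAmount % base;    // fractional remainder, exact
  return Number(whole) + Number(frac) / Number(base);
}
```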
3. token_swaps
DEX swap events across multiple protocols.
CREATE TABLE token_swaps (
signature String,
slot UInt64,
block_time UInt64,
dex_protocol String,
token_in String,
token_out String,
amount_in UInt256,
amount_out UInt256,
price_impact Float64,
pool_address String,
date Date MATERIALIZED toDate(block_time)
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/token_swaps', '{replica}')
PARTITION BY toYYYYMM(date)
ORDER BY (dex_protocol, block_time, signature)
SETTINGS index_granularity = 8192;
Purpose: Tracks DEX swaps from Raydium, Meteora, PumpFun, etc.
4. tokens
Token metadata and supply information.
CREATE TABLE tokens (
mint_address String,
symbol String,
name String,
decimals UInt8,
total_supply UInt256,
metadata_uri String,
created_at UInt64,
creator String,
is_verified UInt8 DEFAULT 0
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/tokens', '{replica}')
ORDER BY mint_address
SETTINGS index_granularity = 8192;
Purpose: Token registry with metadata and verification status.
5. wallets
Wallet activity and statistics.
CREATE TABLE wallets (
address String,
first_seen UInt64,
last_activity UInt64,
transaction_count UInt64,
total_volume Float64,
unique_tokens UInt32,
is_contract UInt8 DEFAULT 0
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/wallets', '{replica}')
ORDER BY address
SETTINGS index_granularity = 8192;
Purpose: Wallet profiles and activity metrics.
6. webhook_subscriptions
User webhook configurations.
CREATE TABLE webhook_subscriptions (
webhook_id String,
user_id String,
webhook_url String,
filters String, -- JSON filters
status String,
created_at DateTime DEFAULT now(),
last_triggered DateTime
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/webhooks', '{replica}')
ORDER BY (user_id, created_at)
SETTINGS index_granularity = 8192;
Purpose: Manages user webhook subscriptions and filters.
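The `filters` column stores JSON, but its exact shape is not specified here; assuming a simple field-to-allowed-values map, the dispatcher's matching step might look like this sketch:

```javascript
// Hypothetical filter shape: each key names a field on an event and lists
// accepted values, e.g. '{"dex_protocol":["Raydium","Meteora"]}'.
// Returns true only if the event satisfies every filter entry.
function matchesFilters(event, filtersJson) {
  const filters = JSON.parse(filtersJson);
  return Object.entries(filters).every(([field, allowed]) =>
    allowed.includes(event[field])
  );
}
```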
Data Sources and Integration
RPC Node Strategy
SupaScan uses multiple RPC endpoints for redundancy and performance:
const rpcConfig = {
primary: 'https://api.mainnet-beta.solana.com',
fallbacks: [
'https://solana-api.projectserum.com',
'https://rpc.ankr.com/solana',
'https://solana-mainnet.g.alchemy.com/v2/demo'
],
archival: [
'https://api.mainnet-beta.solana.com', // Full historical data
'https://solana-mainnet.phantom.app' // Extended history
]
};
Key Benefits:
- Load Distribution: Spread requests across multiple providers
- Fault Tolerance: Automatic failover if primary fails
- Rate Limit Management: Distribute load to avoid limits
- Historical Access: Dedicated archival nodes for old data
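A minimal sketch of the failover behaviour (JavaScript; `callRpc` is an injected placeholder, not a real client method):

```javascript
// Try each endpoint in order (primary first, then fallbacks) and return
// the first successful result. Only throws if every endpoint fails.
async function withFailover(endpoints, callRpc) {
  let lastError;
  for (const url of endpoints) {
    try {
      return await callRpc(url);   // first endpoint that answers wins
    } catch (err) {
      lastError = err;             // remember the failure, try the next one
    }
  }
  throw lastError;                 // all endpoints failed
}
```

In practice this would be combined with per-endpoint rate-limit tracking so that load is rotated rather than always hitting the primary.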
Data Ingestion Pipeline
sequenceDiagram
participant RPC as RPC Nodes
participant Parser as Data Parser
participant Validator as Data Validator
participant ClickHouse as ClickHouse Cluster
participant Cache as Redis Cache
RPC->>Parser: Raw block data
Parser->>Parser: Extract transactions
Parser->>Parser: Parse token transfers
Parser->>Parser: Detect DEX swaps
Parser->>Validator: Structured data
Validator->>Validator: Validate schema
Validator->>Validator: Check business logic
Validator->>ClickHouse: Insert batch
ClickHouse->>Cache: Update cache
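The Parser stage above can be sketched as a pure function from a raw block to table rows (JavaScript; the block and instruction shapes are simplified stand-ins for real RPC responses, and instruction decoding is stubbed):

```javascript
// Split one block's transactions into rows destined for the `transactions`
// and `token_transfers` tables. Real parsing inspects program ids and
// instruction data; here a pre-tagged `type` field stands in for that.
function parseBlock(block) {
  const transactions = [];
  const tokenTransfers = [];
  for (const tx of block.transactions) {
    transactions.push({
      signature: tx.signature,
      slot: block.slot,
      block_time: block.blockTime,
      success: tx.err === null ? 1 : 0,   // matches the UInt8 success column
    });
    for (const ix of tx.instructions) {
      if (ix.type === 'spl-transfer') {    // simplified detection
        tokenTransfers.push({ signature: tx.signature, ...ix.info });
      }
    }
  }
  return { transactions, tokenTransfers };
}
```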
ClickHouse Optimizations
Compression and Storage
-- Compression is configured per column via codecs, not a table-level setting
ALTER TABLE transactions MODIFY COLUMN logs Array(String) CODEC(LZ4HC);
ALTER TABLE token_transfers MODIFY COLUMN token_mint String CODEC(ZSTD);
-- Control when data parts switch to the wide on-disk format
ALTER TABLE transactions MODIFY SETTING
min_bytes_for_wide_part = 100000000,
min_rows_for_wide_part = 1000000;
Materialized Views for Analytics
-- Real-time token volume aggregation
CREATE MATERIALIZED VIEW token_volume_1h
ENGINE = SummingMergeTree()
ORDER BY (token_mint, hour)
AS SELECT
token_mint,
toStartOfHour(toDateTime(block_time)) as hour,
sum(ui_amount) as volume, -- in token units; USD needs a price join
count() as transfer_count
FROM token_transfers
GROUP BY token_mint, hour;
-- Top wallets by activity
CREATE MATERIALIZED VIEW top_wallets_daily
ENGINE = SummingMergeTree() -- partial counts from each insert block are summed on merge
ORDER BY (date, address)
AS SELECT
toDate(block_time) as date,
signer as address,
count() as tx_count,
sum(fee) as total_fees
FROM transactions
GROUP BY date, address;
Sparse Indexes for Performance
-- Bloom filter for token lookups
ALTER TABLE token_transfers ADD INDEX idx_token_mint token_mint TYPE bloom_filter GRANULARITY 1;
-- MinMax index for time ranges
ALTER TABLE transactions ADD INDEX idx_block_time block_time TYPE minmax GRANULARITY 1;
-- Bloom filter for membership checks on array columns (e.g. has(accounts, addr))
ALTER TABLE transactions ADD INDEX idx_accounts accounts TYPE bloom_filter GRANULARITY 1;
Query Patterns and Performance
Common Analytics Queries
-- Most liquid tokens (24h)
SELECT
token_mint,
sum(ui_amount) as volume_24h,
count() as transfer_count
FROM token_transfers
WHERE block_time > now() - INTERVAL 24 HOUR
GROUP BY token_mint
ORDER BY volume_24h DESC
LIMIT 100;
-- Wallet trading patterns
SELECT
signer,
count() as tx_count,
countIf(success = 1) as successful_txs,
sum(fee) as total_fees,
uniqExact(program_id) as unique_programs
FROM transactions
WHERE block_time > now() - INTERVAL 7 DAY
GROUP BY signer
HAVING tx_count > 100
ORDER BY tx_count DESC;
Time-Series Optimizations
-- Efficient time-based queries using partitioning
SELECT
toStartOfHour(toDateTime(block_time)) as hour,
count() as tx_count,
sum(fee) as total_fees
FROM transactions
WHERE date >= today() - 7 -- Uses partition pruning
GROUP BY hour
ORDER BY hour;
-- Token price impact analysis
SELECT
dex_protocol,
avg(price_impact) as avg_impact,
quantile(0.95)(price_impact) as p95_impact
FROM token_swaps
WHERE block_time > now() - INTERVAL 24 HOUR
GROUP BY dex_protocol;
Data Retention and Archival
TTL Policies
-- Automatic data retention (TTL expressions must be Date/DateTime, so cast block_time)
ALTER TABLE token_transfers MODIFY TTL toDateTime(block_time) + INTERVAL 1 YEAR;
ALTER TABLE token_swaps MODIFY TTL toDateTime(block_time) + INTERVAL 6 MONTH;
-- A table has a single TTL clause, so combine tiering and deletion for transactions:
-- move rows to cold storage after 1 year, delete them after 2
ALTER TABLE transactions MODIFY TTL
toDateTime(block_time) + INTERVAL 1 YEAR TO VOLUME 'cold_storage',
toDateTime(block_time) + INTERVAL 2 YEAR DELETE;
Sharding Strategy
-- One Distributed table routes inserts and fans out reads across the cluster;
-- the sharding key (slot here) determines which shard stores each row
CREATE TABLE transactions_distributed AS transactions
ENGINE = Distributed('supascan_cluster', 'supascan', 'transactions', slot);
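With equal shard weights, the Distributed engine effectively routes each row by `sharding_key % shard_count`. A sketch of that routing for a slot-based key (JavaScript; an approximation of the engine's behaviour, not its actual code):

```javascript
// Approximate the Distributed engine's routing: with equal weights, a row
// lands on shard (sharding_key % number_of_shards). BigInt keeps large
// slot numbers exact.
function shardForSlot(slot, shardCount) {
  return Number(BigInt(slot) % BigInt(shardCount));
}
```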
Monitoring and Maintenance
Health Checks
-- Inspect cluster topology and membership
SELECT
cluster,
shard_num,
replica_num,
host_name,
is_local
FROM system.clusters
WHERE cluster = 'supascan_cluster';
-- Check replication lag
SELECT
table,
replica_name,
is_leader,
log_pointer,
absolute_delay
FROM system.replicas
WHERE database = 'supascan';
Performance Monitoring
-- Query performance analysis
SELECT
query,
query_duration_ms,
read_rows,
read_bytes,
memory_usage
FROM system.query_log
WHERE event_date = today() AND type = 'QueryFinish'
ORDER BY query_duration_ms DESC
LIMIT 10;
-- Table size monitoring
SELECT
table,
formatReadableSize(sum(bytes)) as size,
count() as parts
FROM system.parts
WHERE database = 'supascan' AND active
GROUP BY table
ORDER BY sum(bytes) DESC;
Best Practices
1. Use Appropriate Data Types
- UInt64 for slots and timestamps
- UInt256 for large token amounts
- String for addresses and signatures
- Array(String) for account lists
2. Optimize for Time-Series
- Always partition by date
- Use time-based ORDER BY clauses
- Leverage partition pruning in WHERE clauses
3. Batch Operations
- Insert data in batches of 1000-10000 rows
- Use async inserts for better throughput
- Compress data before transmission
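A sketch of the batching advice (JavaScript; `insertFn` stands in for the actual ClickHouse client insert call, and a real batcher would also flush on a timer and at shutdown):

```javascript
// Buffer rows and flush them as one bulk insert once the batch reaches a
// target size, instead of issuing many single-row INSERTs.
function createBatcher(insertFn, batchSize = 5000) {
  let rows = [];
  return {
    push(row) {
      rows.push(row);
      if (rows.length >= batchSize) this.flush();
    },
    flush() {
      if (rows.length === 0) return;
      insertFn(rows);   // one bulk INSERT for the whole buffer
      rows = [];
    },
  };
}
```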
4. Monitor Resource Usage
- Track memory consumption per query
- Monitor disk I/O patterns
- Set appropriate limits for concurrent queries
5. Backup and Recovery
- Regular automated backups
- Test recovery procedures
- Maintain multiple backup copies