SupaScan Architecture
System Overview
SupaScan is built on a high-performance, distributed architecture designed for real-time blockchain data indexing and analytics. The system processes millions of Solana transactions per second and provides sub-second query response times.
graph TB
subgraph "External Services"
SOL[Solana RPC Nodes]
TW[Twitter API]
TG[Telegram API]
REDIS[(Redis Cache)]
end
subgraph "Data Ingestion Layer"
PARSER[Blockchain Parsers]
INDEXER[Indexing Engine]
VALIDATOR[Data Validator]
QUEUE[Message Queue]
end
subgraph "Storage Layer"
CH[(ClickHouse Cluster)]
REPLICA[(Read Replicas)]
BACKUP[(Backup Storage)]
end
subgraph "API Layer"
REST[REST API Server]
SQL[SQL API Gateway]
WEBHOOK[Webhook Service]
AUTH[Auth Service]
end
subgraph "Application Layer"
PORTAL[Web Portal]
SUPAAPPS[SupaApps]
ANALYTICS[Analytics Engine]
ALERTS[Alert System]
end
SOL --> PARSER
PARSER --> INDEXER
INDEXER --> VALIDATOR
VALIDATOR --> QUEUE
QUEUE --> CH
CH --> REPLICA
CH --> BACKUP
REST --> CH
SQL --> CH
WEBHOOK --> CH
PORTAL --> REST
SUPAAPPS --> REST
ANALYTICS --> SQL
ALERTS --> WEBHOOK
TW --> ANALYTICS
TG --> ALERTS
REDIS --> REST
Core Design Principles
1. Real-time Processing
SupaScan is designed for ultra-low latency data processing:
- Stream Processing: Continuous data ingestion from Solana RPC
- Sub-second Indexing: New blocks indexed within 100ms
- Real-time Analytics: Live dashboards and alerts
- Event-driven Architecture: Immediate response to blockchain events
2. Horizontal Scalability
The system scales to handle massive data volumes:
- Distributed ClickHouse: Multi-node cluster for petabyte-scale data
- Microservices: Independent scaling of each component
- Load Balancing: Automatic traffic distribution
- Auto-scaling: Dynamic resource allocation based on load
3. Data Consistency
Ensures data integrity across the entire system:
- ACID Transactions: Critical operations are atomic
- Event Sourcing: Complete audit trail of all changes
- Data Validation: Multi-layer validation before storage
- Conflict Resolution: Handles concurrent updates gracefully
4. High Availability
Built for 99.99% uptime:
- Multi-region Deployment: Geographic redundancy
- Automatic Failover: Seamless switching to backup systems
- Health Monitoring: Continuous system health checks
- Graceful Degradation: Partial functionality during outages
ClickHouse Database Architecture
Cluster Configuration
SupaScan uses a distributed ClickHouse cluster optimized for Solana blockchain data:
graph TB
subgraph "ClickHouse Cluster"
subgraph "Shard 1"
CH1[ClickHouse Node 1]
CH2[ClickHouse Node 2]
end
subgraph "Shard 2"
CH3[ClickHouse Node 3]
CH4[ClickHouse Node 4]
end
subgraph "Shard 3"
CH5[ClickHouse Node 5]
CH6[ClickHouse Node 6]
end
ZK[ZooKeeper Cluster]
LB[Load Balancer]
end
LB --> CH1
LB --> CH2
LB --> CH3
LB --> CH4
LB --> CH5
LB --> CH6
CH1 <--> ZK
CH2 <--> ZK
CH3 <--> ZK
CH4 <--> ZK
CH5 <--> ZK
CH6 <--> ZK
Table Schema Design
Transactions Table
CREATE TABLE transactions (
signature String,
slot UInt64,
block_time UInt64,
fee UInt64,
success UInt8,
accounts Array(String),
instructions Array(String),
logs Array(String),
signer String,
program_id String,
-- Partitioning by slot for optimal performance
date Date MATERIALIZED toDate(block_time)
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/transactions', '{replica}')
PARTITION BY toYYYYMM(date)
ORDER BY (slot, signature)
SETTINGS index_granularity = 8192;
Token Transfers Table
CREATE TABLE token_transfers (
signature String,
slot UInt64,
block_time UInt64,
token_mint String,
from_address String,
to_address String,
amount UInt256,
decimals UInt8,
ui_amount Float64,
date Date MATERIALIZED toDate(block_time)
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/token_transfers', '{replica}')
PARTITION BY toYYYYMM(date)
ORDER BY (token_mint, slot, signature)
SETTINGS index_granularity = 8192;
Token Swaps Table
CREATE TABLE token_swaps (
signature String,
slot UInt64,
block_time UInt64,
wallet String,
token_in String,
token_out String,
amount_in UInt256,
amount_out UInt256,
price_impact Float64,
fee UInt64,
dex_protocol String,
pool_address String,
date Date MATERIALIZED toDate(block_time)
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/token_swaps', '{replica}')
PARTITION BY toYYYYMM(date)
ORDER BY (dex_protocol, slot, signature)
SETTINGS index_granularity = 8192;
ClickHouse Optimizations
1. Data Compression
-- Using LZ4HC compression for optimal balance
ALTER TABLE transactions MODIFY SETTINGS compression = 'lz4hc';
2. Materialized Views for Aggregations
CREATE MATERIALIZED VIEW token_volume_hourly
ENGINE = SummingMergeTree()
ORDER BY (token_mint, hour)
AS SELECT
token_mint,
toStartOfHour(block_time) as hour,
sum(amount_in) as volume_in,
sum(amount_out) as volume_out,
count() as swap_count
FROM token_swaps
GROUP BY token_mint, hour;
3. Sparse Indexes
-- Custom indexes for common query patterns
ALTER TABLE transactions ADD INDEX idx_program_id program_id TYPE bloom_filter GRANULARITY 1;
ALTER TABLE token_transfers ADD INDEX idx_token_mint token_mint TYPE bloom_filter GRANULARITY 1;
4. TTL Policies
-- Automatic data retention
ALTER TABLE transactions MODIFY TTL date + INTERVAL 1 YEAR;
ALTER TABLE token_transfers MODIFY TTL date + INTERVAL 2 YEAR;
Sharding Strategy
By Slot Range
-- Distribute data across shards by slot ranges
-- Shard 1: slots 0-100M
-- Shard 2: slots 100M-200M
-- Shard 3: slots 200M-300M
By Token Mint Hash
-- Distribute token data by hash of mint address
-- Ensures even distribution and co-location of related data
Replication Strategy
3x Replication
- Each shard has 3 replicas for fault tolerance
- Automatic failover on node failure
- Read queries distributed across replicas
- Write queries replicated to all replicas
Consistency Model
- Eventual Consistency: Reads may be slightly behind writes
- Strong Consistency: Critical operations use synchronous replication
- Conflict Resolution: Last-write-wins with timestamp ordering
Component Architecture
classDiagram
class IndexingEngine {
+solanaClient: SolanaClient
+clickhouseClient: ClickHouseClient
+parseBlock()
+indexTransactions()
+validateData()
}
class ClickHouseCluster {
+shards: Array[Shard]
+replicas: Array[Replica]
+zookeeper: ZooKeeper
+executeQuery()
+insertData()
}
class DataValidator {
+validateTransaction()
+validateTokenTransfer()
+validateSwap()
+checkConsistency()
}
class AnalyticsEngine {
+clickhouseClient: ClickHouseClient
+calculateMetrics()
+generateReports()
+detectPatterns()
}
class WebhookService {
+clickhouseClient: ClickHouseClient
+evaluateFilters()
+sendNotification()
+manageSubscriptions()
}
IndexingEngine --> ClickHouseCluster
DataValidator --> ClickHouseCluster
AnalyticsEngine --> ClickHouseCluster
WebhookService --> ClickHouseCluster
Data Flow
Blockchain Indexing Flow
sequenceDiagram
participant Solana
participant Parser
participant Validator
participant Queue
participant ClickHouse
participant Analytics
Solana->>Parser: New block (slot N)
Parser->>Parser: Parse transactions
Parser->>Parser: Extract token transfers
Parser->>Parser: Extract swaps
Parser->>Validator: Validate data
Validator-->>Parser: Data valid
Parser->>Queue: Batch insert
Queue->>ClickHouse: Bulk insert
ClickHouse-->>Queue: Success
Queue->>Analytics: Trigger calculations
Analytics->>ClickHouse: Update materialized views
Real-time Query Flow
sequenceDiagram
participant User
participant API
participant LoadBalancer
participant ClickHouse
participant Cache
participant Response
User->>API: SQL Query
API->>Cache: Check cache
alt Cache hit
Cache-->>API: Cached result
else Cache miss
API->>LoadBalancer: Route query
LoadBalancer->>ClickHouse: Execute query
ClickHouse-->>LoadBalancer: Query result
LoadBalancer-->>API: Result
API->>Cache: Store result
end
API-->>User: Response
Webhook Alert Flow
sequenceDiagram
participant NewToken
participant Indexer
participant FilterEngine
participant WebhookService
participant UserEndpoint
NewToken->>Indexer: Token created
Indexer->>ClickHouse: Store token data
Indexer->>FilterEngine: Check filters
FilterEngine->>ClickHouse: Query active webhooks
ClickHouse-->>FilterEngine: Matching webhooks
FilterEngine->>WebhookService: Trigger alerts
WebhookService->>UserEndpoint: Send webhook
UserEndpoint-->>WebhookService: Acknowledgment
Solana Indexing Process
Block Processing Pipeline
1. Block Fetching
class BlockFetcher {
async fetchBlock(slot: number): Promise<Block> {
// Fetch block from multiple RPC endpoints for redundancy
const rpcEndpoints = this.getHealthyRpcEndpoints();
const block = await this.fetchWithRetry(rpcEndpoints, slot);
return this.validateBlock(block);
}
}
2. Transaction Parsing
class TransactionParser {
parseTransaction(tx: Transaction): ParsedTransaction {
return {
signature: tx.signature,
slot: tx.slot,
blockTime: tx.blockTime,
fee: tx.meta.fee,
success: tx.meta.err === null,
accounts: tx.transaction.message.accountKeys,
instructions: this.parseInstructions(tx.transaction.message.instructions),
logs: tx.meta.logMessages
};
}
}
3. Token Transfer Extraction
class TokenTransferExtractor {
extractTransfers(tx: ParsedTransaction): TokenTransfer[] {
const transfers: TokenTransfer[] = [];
for (const instruction of tx.instructions) {
if (this.isTokenTransfer(instruction)) {
transfers.push({
signature: tx.signature,
slot: tx.slot,
blockTime: tx.blockTime,
tokenMint: this.getTokenMint(instruction),
fromAddress: this.getFromAddress(instruction),
toAddress: this.getToAddress(instruction),
amount: this.getAmount(instruction),
decimals: this.getDecimals(instruction)
});
}
}
return transfers;
}
}
4. DEX Swap Detection
class DEXSwapDetector {
detectSwaps(tx: ParsedTransaction): DEXSwap[] {
const swaps: DEXSwap[] = [];
// Check for Raydium swaps
if (this.isRaydiumSwap(tx)) {
swaps.push(this.parseRaydiumSwap(tx));
}
// Check for Meteora swaps
if (this.isMeteoraSwap(tx)) {
swaps.push(this.parseMeteoraSwap(tx));
}
// Check for PumpFun swaps
if (this.isPumpFunSwap(tx)) {
swaps.push(this.parsePumpFunSwap(tx));
}
return swaps;
}
}
Data Validation
1. Schema Validation
class SchemaValidator {
validateTransaction(tx: ParsedTransaction): ValidationResult {
const errors: string[] = [];
if (!tx.signature || tx.signature.length !== 88) {
errors.push('Invalid signature format');
}
if (tx.slot <= 0) {
errors.push('Invalid slot number');
}
if (tx.fee < 0) {
errors.push('Invalid fee amount');
}
return {
isValid: errors.length === 0,
errors
};
}
}
2. Business Logic Validation
class BusinessValidator {
validateTokenTransfer(transfer: TokenTransfer): ValidationResult {
const errors: string[] = [];
// Check for duplicate transfers
if (await this.isDuplicateTransfer(transfer)) {
errors.push('Duplicate transfer detected');
}
// Validate token mint exists
if (!await this.tokenExists(transfer.tokenMint)) {
errors.push('Token mint does not exist');
}
// Check for suspicious amounts
if (this.isSuspiciousAmount(transfer.amount)) {
errors.push('Suspicious transfer amount');
}
return {
isValid: errors.length === 0,
errors
};
}
}
Performance Optimizations
1. Batch Processing
class BatchProcessor {
private batchSize = 1000;
private batchTimeout = 100; // ms
async processBatch(items: any[]): Promise<void> {
const batches = this.chunkArray(items, this.batchSize);
await Promise.all(
batches.map(batch => this.processBatchChunk(batch))
);
}
}
2. Connection Pooling
class ClickHouseConnectionPool {
private pool: Connection[];
private maxConnections = 100;
async getConnection(): Promise<Connection> {
if (this.pool.length > 0) {
return this.pool.pop()!;
}
return this.createNewConnection();
}
releaseConnection(conn: Connection): void {
if (this.pool.length < this.maxConnections) {
this.pool.push(conn);
} else {
conn.close();
}
}
}
3. Query Optimization
-- Use appropriate indexes
CREATE INDEX idx_token_transfers_mint_time
ON token_transfers (token_mint, block_time);
-- Use materialized views for common aggregations
CREATE MATERIALIZED VIEW daily_token_volume
ENGINE = SummingMergeTree()
ORDER BY (token_mint, date)
AS SELECT
token_mint,
toDate(block_time) as date,
sum(amount) as daily_volume
FROM token_transfers
GROUP BY token_mint, date;
-- Use projection for specific query patterns
ALTER TABLE token_swaps ADD PROJECTION projection_wallet_swaps
(
SELECT wallet, dex_protocol, sum(amount_in) as total_volume
GROUP BY wallet, dex_protocol
);
SupaScan Module Structure
supascan/
├── src/
│ ├── index.ts # Main application entry
│ ├── config/
│ │ ├── clickhouse.ts # ClickHouse configuration
│ │ ├── solana.ts # Solana RPC configuration
│ │ └── redis.ts # Redis configuration
│ ├── services/
│ │ ├── indexing/
│ │ │ ├── blockFetcher.ts # Block fetching service
│ │ │ ├── transactionParser.ts # Transaction parsing
│ │ │ ├── tokenExtractor.ts # Token transfer extraction
│ │ │ └── swapDetector.ts # DEX swap detection
│ │ ├── database/
│ │ │ ├── clickhouse.ts # ClickHouse client
│ │ │ ├── connectionPool.ts # Connection pooling
│ │ │ └── migrations.ts # Schema migrations
│ │ ├── analytics/
│ │ │ ├── metricsCalculator.ts # Real-time metrics
│ │ │ ├── patternDetector.ts # Pattern detection
│ │ │ └── reportGenerator.ts # Report generation
│ │ └── webhooks/
│ │ ├── filterEngine.ts # Webhook filtering
│ │ ├── notificationService.ts # Notification delivery
│ │ └── subscriptionManager.ts # Webhook management
│ ├── api/
│ │ ├── rest/
│ │ │ ├── transactions.ts # Transaction endpoints
│ │ │ ├── tokens.ts # Token endpoints
│ │ │ └── wallets.ts # Wallet endpoints
│ │ ├── sql/
│ │ │ ├── queryExecutor.ts # SQL query execution
│ │ │ └── queryValidator.ts # Query validation
│ │ └── webhooks/
│ │ ├── alertManager.ts # Alert management
│ │ └── payloadBuilder.ts # Payload construction
│ ├── supapps/
│ │ ├── walletProfiler.ts # Wallet analysis app
│ │ ├── kolWatch.ts # Influencer tracking
│ │ ├── pnlDetective.ts # P&L analysis
│ │ ├── rebirthMonitor.ts # Token revival detection
│ │ ├── farmerClassifier.ts # Airdrop farmer detection
│ │ └── apeDetector.ts # Early meme coin detection
│ ├── utils/
│ │ ├── validators.ts # Data validation
│ │ ├── formatters.ts # Data formatting
│ │ ├── retry.ts # Retry mechanisms
│ │ └── logger.ts # Logging configuration
│ └── types/
│ ├── blockchain.ts # Blockchain data types
│ ├── clickhouse.ts # ClickHouse types
│ └── api.ts # API types
├── migrations/
│ ├── 001_create_transactions.sql
│ ├── 002_create_token_transfers.sql
│ ├── 003_create_token_swaps.sql
│ └── 004_create_materialized_views.sql
├── docker/
│ ├── clickhouse/
│ │ ├── config.xml
│ │ └── users.xml
│ ├── redis/
│ │ └── redis.conf
│ └── docker-compose.yml
└── docs/
├── api/
├── clickhouse/
└── deployment/
Key Design Patterns
1. Event Sourcing Pattern
Used for maintaining complete audit trail:
class EventStore {
async appendEvent(streamId: string, event: DomainEvent): Promise<void> {
await this.clickhouse.insert('events', {
stream_id: streamId,
event_type: event.type,
event_data: JSON.stringify(event.data),
timestamp: Date.now(),
version: await this.getNextVersion(streamId)
});
}
}
2. CQRS Pattern
Separate read and write models:
// Write side - optimized for indexing
class TransactionWriteModel {
async indexTransaction(tx: Transaction): Promise<void> {
await this.clickhouse.insert('transactions', tx);
}
}
// Read side - optimized for queries
class TransactionReadModel {
async getTransactionsByWallet(wallet: string): Promise<Transaction[]> {
return this.clickhouse.query(`
SELECT * FROM transactions
WHERE has(accounts, '${wallet}')
ORDER BY slot DESC
`);
}
}
3. Repository Pattern
Abstract data access:
interface TransactionRepository {
save(transaction: Transaction): Promise<void>;
findBySignature(signature: string): Promise<Transaction | null>;
findByWallet(wallet: string, limit: number): Promise<Transaction[]>;
}
class ClickHouseTransactionRepository implements TransactionRepository {
constructor(private clickhouse: ClickHouseClient) {}
async save(transaction: Transaction): Promise<void> {
await this.clickhouse.insert('transactions', transaction);
}
}
4. Observer Pattern
Used for real-time notifications:
class TokenCreationObserver {
private subscribers: TokenCreationSubscriber[] = [];
subscribe(subscriber: TokenCreationSubscriber): void {
this.subscribers.push(subscriber);
}
notify(tokenCreation: TokenCreation): void {
this.subscribers.forEach(sub => sub.onTokenCreated(tokenCreation));
}
}
Performance Optimizations
ClickHouse Specific Optimizations
1. Data Compression
-- Use LZ4HC for optimal compression ratio
ALTER TABLE transactions MODIFY SETTINGS compression = 'lz4hc';
-- Use ZSTD for maximum compression
ALTER TABLE historical_data MODIFY SETTINGS compression = 'zstd';
2. Partitioning Strategy
-- Partition by month for optimal query performance
PARTITION BY toYYYYMM(date)
-- Partition by token mint hash for even distribution
PARTITION BY cityHash64(token_mint) % 12
3. Materialized Views
-- Pre-aggregated data for common queries
CREATE MATERIALIZED VIEW hourly_token_volume
ENGINE = SummingMergeTree()
ORDER BY (token_mint, hour)
AS SELECT
token_mint,
toStartOfHour(block_time) as hour,
sum(amount) as volume,
count() as transfer_count
FROM token_transfers
GROUP BY token_mint, hour;
4. Sparse Indexes
-- Bloom filter indexes for high-cardinality columns
ALTER TABLE transactions ADD INDEX idx_program_id
program_id TYPE bloom_filter GRANULARITY 1;
-- MinMax indexes for range queries
ALTER TABLE token_transfers ADD INDEX idx_amount
amount TYPE minmax GRANULARITY 1;
Caching Strategy
1. Multi-Level Caching
class CacheManager {
private l1Cache = new Map(); // In-memory cache
private l2Cache: Redis; // Distributed cache
private l3Cache: ClickHouse; // Database cache
async get<T>(key: string): Promise<T | null> {
// L1 Cache (fastest)
if (this.l1Cache.has(key)) {
return this.l1Cache.get(key);
}
// L2 Cache (fast)
const l2Value = await this.l2Cache.get(key);
if (l2Value) {
this.l1Cache.set(key, l2Value);
return l2Value;
}
// L3 Cache (slower but persistent)
const l3Value = await this.l3Cache.query(`SELECT * FROM cache WHERE key = '${key}'`);
if (l3Value) {
await this.l2Cache.set(key, l3Value, 3600); // 1 hour TTL
this.l1Cache.set(key, l3Value);
return l3Value;
}
return null;
}
}
2. Cache Warming
class CacheWarmer {
async warmPopularQueries(): Promise<void> {
const popularQueries = [
'SELECT * FROM token_swaps WHERE dex_protocol = "raydium" ORDER BY block_time DESC LIMIT 100',
'SELECT token_mint, sum(amount) as volume FROM token_transfers WHERE block_time > now() - INTERVAL 24 HOUR GROUP BY token_mint ORDER BY volume DESC LIMIT 50'
];
await Promise.all(
popularQueries.map(query => this.cacheManager.get(query))
);
}
}
Security Architecture
Data Access Control
class AccessControl {
async checkPermission(userId: string, resource: string, action: string): Promise<boolean> {
const user = await this.getUser(userId);
const permissions = await this.getUserPermissions(userId);
return permissions.some(p =>
p.resource === resource &&
p.actions.includes(action) &&
p.expiresAt > Date.now()
);
}
}
Query Sanitization
class QuerySanitizer {
sanitizeQuery(query: string): string {
// Remove dangerous SQL keywords
const dangerousKeywords = ['DROP', 'DELETE', 'UPDATE', 'INSERT', 'ALTER'];
let sanitized = query;
dangerousKeywords.forEach(keyword => {
const regex = new RegExp(`\\b${keyword}\\b`, 'gi');
sanitized = sanitized.replace(regex, '');
});
return sanitized;
}
}
Rate Limiting
class RateLimiter {
private limits = new Map<string, { count: number, resetTime: number }>();
async checkLimit(userId: string, limit: number, windowMs: number): Promise<boolean> {
const now = Date.now();
const userLimit = this.limits.get(userId);
if (!userLimit || now > userLimit.resetTime) {
this.limits.set(userId, { count: 1, resetTime: now + windowMs });
return true;
}
if (userLimit.count >= limit) {
return false;
}
userLimit.count++;
return true;
}
}
Monitoring and Observability
Health Checks
class HealthChecker {
async checkClickHouse(): Promise<HealthStatus> {
try {
await this.clickhouse.query('SELECT 1');
return { status: 'healthy', latency: Date.now() - start };
} catch (error) {
return { status: 'unhealthy', error: error.message };
}
}
async checkSolanaRPC(): Promise<HealthStatus> {
try {
const slot = await this.solana.getSlot();
return { status: 'healthy', currentSlot: slot };
} catch (error) {
return { status: 'unhealthy', error: error.message };
}
}
}
Metrics Collection
class MetricsCollector {
private metrics = new Map<string, number>();
incrementCounter(name: string, value: number = 1): void {
this.metrics.set(name, (this.metrics.get(name) || 0) + value);
}
recordHistogram(name: string, value: number): void {
// Record histogram data for latency, size, etc.
}
getMetrics(): Record<string, number> {
return Object.fromEntries(this.metrics);
}
}