Database

QuestDB

Overview

QuestDB is a high-performance, open-source SQL database specialized for time-series data. It implements the PostgreSQL wire protocol, so it can be accessed through existing PostgreSQL client libraries and tools. With its purely columnar storage engine, QuestDB delivers exceptional performance in financial, IoT, and monitoring workloads.
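
As a minimal illustration of the wire-protocol compatibility, the sketch below connects with the standard psycopg driver on QuestDB's default PostgreSQL port (8812) with the stock credentials; host, port, and credentials are the defaults and may differ in your deployment. Fuller examples follow in the Code Examples section.

# Python: connect over the PostgreSQL wire protocol (minimal sketch, default settings assumed)
import psycopg

with psycopg.connect(
    host="localhost", port=8812, user="admin", password="quest", dbname="qdb"
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT now()")  # ordinary SQL runs unchanged
        print(cur.fetchone())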

Details

Key Features

  • Ultra-fast Ingestion: Millions of rows per second via the InfluxDB Line Protocol (see the ingestion sketch after this list)
  • PostgreSQL Compatibility: High compatibility with existing tools through PostgreSQL Wire Protocol support
  • Pure Columnar: Columnar storage engine optimized for time-series analytics
  • SQL Extensions: Time-series specific SQL functions (SAMPLE BY, LATEST ON)
  • Zero-copy: Efficient data access through memory-mapped files
  • ZFS Compression: Added in v8.0 (2024); QuestDB reports roughly 6x storage savings together with improved query performance
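
To make the ingestion path concrete, the sketch below writes rows over the InfluxDB Line Protocol to the default ILP TCP port (9009) using only the standard library; the table and column names are illustrative, and in practice the official QuestDB client libraries are usually preferred over hand-built ILP strings.

# Python: minimal InfluxDB Line Protocol (ILP) ingestion sketch over TCP port 9009
# Format per line: table,symbol_columns field_columns timestamp_in_nanoseconds
import socket
import time

rows = [
    f"cpu_metrics,cpu_id=cpu-001 usage=45.2,temperature=68.5 {time.time_ns()}",
    f"cpu_metrics,cpu_id=cpu-002 usage=52.1,temperature=71.2 {time.time_ns()}",
]

with socket.create_connection(("localhost", 9009)) as sock:
    # QuestDB creates the table and any missing columns on the fly (schema-less ingestion)
    sock.sendall(("\n".join(rows) + "\n").encode("utf-8"))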

Architecture

  • Schema-less Ingestion: Dynamic table creation and column addition
  • Partitioning: Automatic time-based partitioning (see the retention sketch after this list)
  • Indexing: High-efficiency indexing with SYMBOL data type
  • Parallel Processing: Parallel query execution leveraging multi-core CPUs
  • WAL (Write-Ahead Log): Data durability and replication capabilities
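
Because partitions are aligned to time, retention can be handled by dropping whole partitions rather than deleting rows. The sketch below issues QuestDB's ALTER TABLE ... DROP PARTITION WHERE statement over the PostgreSQL wire protocol; the table name and the 30-day window are illustrative.

# Python: retention sketch - drop partitions older than 30 days (illustrative table and window)
import psycopg

with psycopg.connect(
    host="localhost", port=8812, user="admin", password="quest", dbname="qdb"
) as conn:
    # Dropping whole time partitions is far cheaper than row-level deletes
    conn.execute(
        "ALTER TABLE cpu_metrics DROP PARTITION "
        "WHERE timestamp < dateadd('d', -30, now())"
    )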

Advantages and Disadvantages

Advantages

  • Exceptional Performance: Vendor benchmarks report ingestion 10-150x faster than TimescaleDB and about 6.5x faster than InfluxDB
  • PostgreSQL Compatibility: Easy migration from existing PostgreSQL applications
  • Low Learning Curve: Intuitive operations with standard SQL plus time-series specific functions
  • Lightweight Operations: High performance without complex configuration
  • Rich Client Support: Support for Java, Python, Node.js, C#, Go, Rust, and more
  • Real-time Analytics: Ultra-fast queries enable real-time dashboard construction

Disadvantages

  • Limited PostgreSQL Features: Some PostgreSQL-specific features are not supported
  • Transaction Constraints: Transaction support is more limited than PostgreSQL's; complex multi-table transaction patterns are not available
  • Data Type Limitations: Not all PostgreSQL data types are supported
  • Metadata Queries: Some metadata queries for DB tools may not work properly

Key Links

  • Official site: https://questdb.io/
  • Documentation: https://questdb.io/docs/
  • GitHub repository: https://github.com/questdb/questdb

Code Examples

Installation & Setup

# Start with Docker
docker run -p 9000:9000 -p 9009:9009 -p 8812:8812 questdb/questdb

# macOS with Homebrew
brew install questdb
questdb start

# Linux (.tar.gz)
wget https://github.com/questdb/questdb/releases/download/7.4.2/questdb-7.4.2-rt-linux-amd64.tar.gz
tar -xzf questdb-*.tar.gz
cd questdb-*
./questdb.sh start
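
Once the server is running, a quick way to verify the installation is the REST API on port 9000: the /exec endpoint accepts SQL as a query-string parameter and returns JSON. A minimal sketch using only the standard library:

# Python: post-install check via the HTTP REST API (/exec on port 9000)
import json
import urllib.parse
import urllib.request

sql = urllib.parse.quote("SELECT now()")
with urllib.request.urlopen(f"http://localhost:9000/exec?query={sql}") as resp:
    print(json.load(resp))  # prints the JSON result set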

Basic Operations (Data Insertion & Queries)

-- Create table
CREATE TABLE cpu_metrics (
    timestamp TIMESTAMP,
    cpu_id SYMBOL,
    usage DOUBLE,
    temperature DOUBLE
) timestamp(timestamp) PARTITION BY DAY;

-- Insert data (PostgreSQL Wire Protocol)
INSERT INTO cpu_metrics VALUES 
    (now(), 'cpu-001', 45.2, 68.5),
    (now(), 'cpu-002', 52.1, 71.2);

-- Time-series query (SAMPLE BY)
SELECT timestamp, cpu_id, avg(usage) as avg_usage
FROM cpu_metrics 
WHERE timestamp >= dateadd('h', -24, now())
SAMPLE BY 1h;

-- Get latest values (LATEST ON)
SELECT * FROM cpu_metrics 
LATEST ON timestamp PARTITION BY cpu_id;

Data Modeling

-- IoT sensor data table
CREATE TABLE sensor_readings (
    ts TIMESTAMP,
    sensor_id SYMBOL CAPACITY 10000 CACHE,
    location SYMBOL,
    temperature DOUBLE,
    humidity DOUBLE,
    pressure DOUBLE
) timestamp(ts) PARTITION BY DAY;

-- Financial data model
CREATE TABLE trades (
    timestamp TIMESTAMP,
    symbol SYMBOL,
    price DOUBLE,
    volume LONG,
    side SYMBOL
) timestamp(timestamp) PARTITION BY DAY;

-- Add index (QuestDB indexes SYMBOL columns via ALTER TABLE, not CREATE INDEX)
ALTER TABLE trades ALTER COLUMN symbol ADD INDEX;

Performance Optimization

-- SYMBOL type optimization
CREATE TABLE optimized_table (
    ts TIMESTAMP,
    device SYMBOL CAPACITY 1000000 NOCACHE,  -- high-cardinality column (many unique values)
    status SYMBOL CACHE,                     -- low-cardinality column (few distinct values)
    value DOUBLE
) timestamp(ts) PARTITION BY DAY;

-- Batch insertion optimization
INSERT BATCH 10000 INTO metrics 
SELECT * FROM source_table;

-- O3 (out-of-order ingestion) configuration
ALTER TABLE metrics SET PARAM o3MaxLag = 10s;

Practical Examples

// Java (PostgreSQL JDBC)
import java.sql.*;
import java.util.Properties;

public class QuestDBExample {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://localhost:8812/qdb";
        Properties props = new Properties();
        props.setProperty("user", "admin");
        props.setProperty("password", "quest");
        
        try (Connection conn = DriverManager.getConnection(url, props)) {
            // Time-series data query
            String query = """
                SELECT timestamp, symbol, avg(price) as avg_price
                FROM trades
                WHERE timestamp >= dateadd('d', -7, now())
                SAMPLE BY 1h
                """;
            
            try (PreparedStatement ps = conn.prepareStatement(query);
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("Time: %s, Symbol: %s, Price: %.2f%n",
                        rs.getTimestamp("timestamp"),
                        rs.getString("symbol"),
                        rs.getDouble("avg_price"));
                }
            }
        }
    }
}

# Python (psycopg3)
import asyncio
import psycopg

async def fetch_metrics():
    async with await psycopg.AsyncConnection.connect(
        host='localhost', port=8812, user='admin', 
        password='quest', dbname='qdb'
    ) as conn:
        async with conn.cursor() as cur:
            await cur.execute("""
                SELECT ts, sensor_id, avg(temperature) as avg_temp
                FROM sensor_readings 
                WHERE ts >= dateadd('d', -1, now())
                SAMPLE BY 10m
            """)
            async for row in cur:
                print(f"Sensor: {row[1]}, Avg Temp: {row[2]:.1f}°C")

asyncio.run(fetch_metrics())

Best Practices

-- 1. Partitioning strategy
-- High-frequency data: daily, low-frequency: monthly
CREATE TABLE high_freq_data (...) PARTITION BY DAY;
CREATE TABLE low_freq_data (...) PARTITION BY MONTH;

-- 2. SYMBOL optimization
-- Set expected unique value count in advance
CREATE TABLE events (
    ts TIMESTAMP,
    event_type SYMBOL CAPACITY 100 CACHE,      -- Few types
    user_id SYMBOL CAPACITY 1000000 NOCACHE,   -- Many users
    data VARCHAR
) timestamp(ts);

-- 3. Query performance
-- Always specify time range in WHERE clause
SELECT * FROM metrics 
WHERE ts BETWEEN '2024-01-01' AND '2024-01-02'
AND sensor_id = 'temp-001';

-- 4. Leverage bulk operations
-- Fast CSV import with COPY (the CSV must live under the server's import root, cairo.sql.copy.root)
COPY trades FROM '/path/to/trades.csv' WITH HEADER true;
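
When the data originates in the application rather than in a CSV file, a common pattern is to batch parameterized inserts into a single transaction over the PostgreSQL wire protocol (or to use the ILP endpoint for the highest rates). A minimal sketch, assuming the trades table defined above:

# Python: batch several parameterized INSERTs into one transaction (sketch)
from datetime import datetime, timezone

import psycopg

rows = [
    (datetime.now(timezone.utc), "BTC-USD", 64000.5, 2, "buy"),
    (datetime.now(timezone.utc), "ETH-USD", 3100.2, 10, "sell"),
]

with psycopg.connect(
    host="localhost", port=8812, user="admin", password="quest", dbname="qdb"
) as conn:
    with conn.cursor() as cur:
        for row in rows:
            cur.execute("INSERT INTO trades VALUES (%s, %s, %s, %s, %s)", row)
    # the transaction is committed once, when the connection block exits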