Database

OpenTSDB

Overview

OpenTSDB is a distributed, scalable time-series database built on top of Apache HBase. First developed at StumbleUpon in 2010, it can efficiently store and serve billions of data points. It is optimized for systems that handle large volumes of time-series data, such as IoT devices, server monitoring, and application metrics.

Details

Key Features

  • Horizontal Scalability: Leverages HBase's distributed architecture to scale near-linearly by simply adding nodes
  • High Cardinality: Flexible data identification through tags (key-value pairs)
  • Raw Data Retention: Stores original data without aggregation by default
  • Powerful Query Capabilities: Complex aggregation and filtering via HTTP API or command line
  • Plugin System: Supports custom extensions for authentication, search, real-time distribution, and more
  • Command Line Tools: Comprehensive toolset for data import, querying, UID management, and more

Architecture

OpenTSDB consists of three main components:

  • tcollector: An agent deployed on each server to periodically collect and forward metrics data
  • TSD (Time Series Daemon): Receives data, stores it in HBase, and handles query processing
  • HBase: Acts as the backend storage system

Data Model

Time-series data is identified by the following elements:

  • Metric Name: The measurement target (e.g., cpu.usage, memory.free)
  • Tags: Key=value pairs for identification (e.g., host=web01, region=us-east)
  • Timestamp: The time of the data point
  • Value: The measured value

For efficient storage, string names are mapped to unique binary IDs (UIDs).
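UID mappings can be seeded ahead of time through the HTTP API's /api/uid/assign endpoint, which registers metric names, tag keys, and tag values before any data points reference them. A minimal sketch of building that request in Python (the helper name is illustrative):

```python
import json

def build_assign_request(metrics=(), tagks=(), tagvs=()):
    """Build the JSON body for OpenTSDB's /api/uid/assign endpoint,
    which pre-assigns UIDs to metric names, tag keys, and tag values."""
    body = {}
    if metrics:
        body["metric"] = list(metrics)
    if tagks:
        body["tagk"] = list(tagks)
    if tagvs:
        body["tagv"] = list(tagvs)
    return body

# POSTing this body to http://localhost:4242/api/uid/assign registers the
# names up front, before any writes reference them:
payload = build_assign_request(metrics=["cpu.usage"], tagks=["host"], tagvs=["web01"])
print(json.dumps(payload))
```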

Pros and Cons

Pros

  • Exceptional Scalability: Capable of handling millions of writes per second on a sufficiently large cluster
  • Mature HBase Ecosystem: Rich operational knowledge and tooling
  • Flexible Tag System: Supports high cardinality
  • Long-term Storage: No limits on data retention period
  • Open Source: Free to use with an active community
  • RESTful API: Easy integration and querying
  • Plugin Support: High customizability

Cons

  • Complex Setup: Requires HBase cluster construction and operation
  • High Hardware Requirements: Minimum 4GB RAM recommended, production requires even more resources
  • Steep Learning Curve: Knowledge of both HBase and OpenTSDB required
  • Real-time Constraints: Batch-oriented, unsuitable for extremely low-latency use cases
  • No SQL Support: Need to learn proprietary query syntax
  • Aging Design: Predates newer time-series databases such as InfluxDB and TimescaleDB

Code Examples

Installation and Setup

Prerequisites

# Java 8 or higher required
java -version

# HBase 0.94 or higher required
# ZooKeeper cluster must be set up beforehand

Simple Setup Using Docker

# Start OpenTSDB container
docker run -d \
  --name opentsdb \
  -p 4242:4242 \
  petergrace/opentsdb-docker

# Access web interface
# http://localhost:4242

Configuration File (/etc/opentsdb/opentsdb.conf)

# HBase connection settings
tsd.storage.hbase.zk_quorum = localhost:2181
tsd.storage.hbase.zk_basedir = /hbase

# Port settings
tsd.network.port = 4242

# Automatic creation of new metrics
tsd.core.auto_create_metrics = true

# Track time-series metadata in real time
tsd.core.meta.enable_realtime_ts = true

Basic Operations (Data Insertion and Querying)

Data Insertion (HTTP API)

# Insert single data point
curl -X POST http://localhost:4242/api/put \
  -H "Content-Type: application/json" \
  -d '{
    "metric": "cpu.usage",
    "timestamp": 1609459200,
    "value": 45.2,
    "tags": {
      "host": "web01",
      "region": "us-east"
    }
  }'

# Bulk insertion of multiple data points
curl -X POST http://localhost:4242/api/put \
  -H "Content-Type: application/json" \
  -d '[
    {
      "metric": "memory.usage",
      "timestamp": 1609459200,
      "value": 78.5,
      "tags": {"host": "web01", "type": "physical"}
    },
    {
      "metric": "memory.usage", 
      "timestamp": 1609459260,
      "value": 79.1,
      "tags": {"host": "web01", "type": "physical"}
    }
  ]'

Insertion via Telnet Interface

# Connect with telnet client
telnet localhost 4242

# Data format: put <metric> <timestamp> <value> <tag1=value1> [<tag2=value2>...]
put cpu.usage 1609459200 45.2 host=web01 region=us-east
put memory.free 1609459260 2048 host=web01 type=available
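The same line protocol can be driven from code by writing put commands over a plain TCP socket. A hedged sketch in Python (the function names are illustrative):

```python
import socket

def format_put(metric, timestamp, value, tags):
    """Render one data point in OpenTSDB's telnet-style line protocol:
    put <metric> <timestamp> <value> <tag1=value1> [<tag2=value2> ...]"""
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_str}"

def send_points(lines, host="localhost", port=4242):
    """Open a TCP connection to a TSD and stream newline-terminated put commands."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(("\n".join(lines) + "\n").encode())

line = format_put("cpu.usage", 1609459200, 45.2, {"host": "web01", "region": "us-east"})
print(line)  # put cpu.usage 1609459200 45.2 host=web01 region=us-east
```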

Data Querying

# Basic query
curl "http://localhost:4242/api/query?start=1h-ago&m=avg:cpu.usage{host=web01}"

# Multiple metrics query
curl "http://localhost:4242/api/query?start=1d-ago&m=avg:cpu.usage{host=*}&m=avg:memory.usage{host=*}"

# Query using the rate function (per-second change of a counter)
curl "http://localhost:4242/api/query?start=6h-ago&m=avg:rate:network.bytes.in{interface=eth0}"
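Query responses arrive as a JSON array of series objects, each carrying the metric name, resolved tags, aggregateTags, and a dps map from Unix-timestamp strings to values. A small Python sketch of picking out the newest point (the sample payload is illustrative):

```python
def latest_value(series):
    """Extract the most recent (timestamp, value) pair from one series object.
    'dps' maps Unix-timestamp strings to numeric values."""
    ts = max(series["dps"], key=int)
    return int(ts), series["dps"][ts]

# Shape of a typical /api/query response (values are illustrative):
sample_response = [
    {
        "metric": "cpu.usage",
        "tags": {"host": "web01"},
        "aggregateTags": [],
        "dps": {"1609459200": 45.2, "1609459260": 47.8},
    }
]

for series in sample_response:
    ts, value = latest_value(series)
    print(series["metric"], series["tags"], ts, value)
```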

Data Modeling

Effective Tag Design

// Good example: Appropriate tag granularity
{
  "metric": "http.requests",
  "tags": {
    "method": "GET",
    "status": "200", 
    "endpoint": "api",
    "datacenter": "us-east-1"
  }
}

// Example to avoid: Too high cardinality
{
  "metric": "http.requests",
  "tags": {
    "user_id": "12345",  // User IDs could reach millions
    "request_id": "abc123" // Request IDs grow infinitely
  }
}
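The rule of thumb behind the examples above: total series cardinality is roughly the product of the distinct values of each tag, so a single unbounded tag dominates everything else. A quick estimator:

```python
from math import prod

def estimated_series(metric_count, tag_value_counts):
    """Rough upper bound on time-series count: each metric can combine with
    every combination of its tag values, so cardinality multiplies."""
    return metric_count * prod(tag_value_counts)

# 1 metric tagged with method (4 values), status (5), endpoint (20), datacenter (3):
print(estimated_series(1, [4, 5, 20, 3]))   # 1200 series, manageable

# The same metric tagged with user_id (1,000,000 values) explodes on its own:
print(estimated_series(1, [1_000_000]))     # 1000000 series
```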

Metric Naming Conventions

# Hierarchical naming (dot-separated)
system.cpu.usage
system.memory.free
app.response_time
database.connections.active
cache.hits_per_second

# Category prefixes
prod.web.cpu.usage
dev.api.response_time
monitoring.alerts.count

Metrics Collection

Application Integration Example (Java)

// NOTE: net.opentsdb.client is an illustrative client API; OpenTSDB does not
// ship an official Java client, so adapt this to the HTTP library of your choice.
import net.opentsdb.client.*;

import java.util.HashMap;
import java.util.Map;

public class MetricsCollector {
    private OpenTSDBClient client;
    
    public MetricsCollector() {
        this.client = new OpenTSDBClient("localhost", 4242);
    }
    
    public void recordMetric(String metric, double value, Map<String, String> tags) {
        DataPoint point = new DataPoint(metric, System.currentTimeMillis() / 1000, value, tags);
        client.put(point);
    }
    
    // Example: Send CPU usage
    public void sendCpuUsage() {
        Map<String, String> tags = new HashMap<>();
        tags.put("host", "app-server-01");
        tags.put("environment", "production");
        
        double cpuUsage = getCpuUsage(); // Get CPU usage from system
        recordMetric("system.cpu.usage", cpuUsage, tags);
    }
}

Python Client Example

import requests
import time
import json

class OpenTSDBClient:
    def __init__(self, host='localhost', port=4242):
        self.base_url = f"http://{host}:{port}"
    
    def put(self, metric, value, tags, timestamp=None):
        if timestamp is None:
            timestamp = int(time.time())
        
        data = {
            "metric": metric,
            "timestamp": timestamp,
            "value": value,
            "tags": tags
        }
        
        response = requests.post(
            f"{self.base_url}/api/put",
            json=data,
            headers={'Content-Type': 'application/json'}
        )
        return response.status_code == 204

# Usage example
client = OpenTSDBClient()
client.put("temperature", 23.5, {"sensor": "room1", "building": "office"})
client.put("humidity", 65.2, {"sensor": "room1", "building": "office"})
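Since /api/put accepts a JSON array (as in the bulk example earlier), a client can buffer points locally and flush them in batches to cut HTTP overhead. A standard-library-only sketch; the class name and batch size are illustrative:

```python
import json
import time
import urllib.request

def chunked(items, size):
    """Split a list into consecutive slices of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

class BufferedTSDBClient:
    """Buffer data points in memory and POST them to /api/put in batches."""

    def __init__(self, host="localhost", port=4242, batch_size=50):
        self.url = f"http://{host}:{port}/api/put"
        self.batch_size = batch_size
        self.buffer = []

    def put(self, metric, value, tags, timestamp=None):
        self.buffer.append({
            "metric": metric,
            "timestamp": timestamp if timestamp is not None else int(time.time()),
            "value": value,
            "tags": tags,
        })
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        for batch in chunked(self.buffer, self.batch_size):
            req = urllib.request.Request(
                self.url,
                data=json.dumps(batch).encode(),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req)  # TSD replies 204 No Content on success
        self.buffer = []
```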

Practical Examples

Building System Monitoring Dashboard

# Grafana integration settings
# Configure OpenTSDB data source
curl -X POST http://admin:admin@localhost:3000/api/datasources \
  -H "Content-Type: application/json" \
  -d '{
    "name": "OpenTSDB",
    "type": "opentsdb",
    "url": "http://localhost:4242",
    "access": "proxy"
  }'

Alert Configuration

# Alerts using Nagios plugin
./check_tsd -H localhost -p 4242 -m cpu.usage -t host=web01 -w 80 -c 90

# Custom alert script
curl -s "http://localhost:4242/api/query?start=5m-ago&m=avg:cpu.usage{host=web01}" | \
jq -r '.[] | .dps | to_entries | sort_by(.key | tonumber) | .[-1].value
       | if . > 90 then "CRITICAL: CPU > 90%" else "OK" end'
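The same threshold check can live in Python rather than jq, which makes warning and critical levels easier to manage. A sketch over the /api/query response shape (sample data is illustrative):

```python
def check_threshold(series_list, warn=80.0, crit=90.0):
    """Evaluate the latest value of each series against warning/critical
    thresholds, in the style of a Nagios-compatible check."""
    results = []
    for series in series_list:
        latest = series["dps"][max(series["dps"], key=int)]
        if latest > crit:
            results.append(("CRITICAL", series["metric"], latest))
        elif latest > warn:
            results.append(("WARNING", series["metric"], latest))
        else:
            results.append(("OK", series["metric"], latest))
    return results

sample = [{"metric": "cpu.usage", "tags": {"host": "web01"},
           "dps": {"1609459200": 85.0, "1609459260": 92.5}}]
print(check_threshold(sample))  # [('CRITICAL', 'cpu.usage', 92.5)]
```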

Best Practices

Performance Optimization

# Configuration file optimization
# Adjust the write flush interval (milliseconds)
tsd.storage.flush_interval = 1000

# Enable row compaction (merges data points into wider rows)
tsd.storage.enable_compaction = true

# Adjust cache size
tsd.core.meta.cache.enable = true
tsd.core.meta.cache.size = 1000000

Operations Monitoring

# Monitor OpenTSDB itself
curl "http://localhost:4242/api/stats"

# Check HBase status
echo "status" | hbase shell

# Log monitoring
tail -f /var/log/opentsdb/opentsdb.log

Data Retention Policy

# Delete old data (HBase TTL settings)
echo "alter 'tsdb', {NAME => 'f', TTL => 31536000}" | hbase shell  # Retain for 1 year

# Run data compaction
tsdb fsck --fix --resolve-duplicates --compact

Scaling Strategy

# Load balancing configuration for multiple TSD instances
# HAProxy configuration example
backend opentsdb_cluster
    balance roundrobin
    server tsd1 192.168.1.10:4242 check
    server tsd2 192.168.1.11:4242 check
    server tsd3 192.168.1.12:4242 check
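Client-side rotation can complement (or stand in for) the HAProxy layer: TSD instances are stateless, so a writer can simply cycle through the instance list. A minimal sketch:

```python
from itertools import cycle

class RoundRobinEndpoints:
    """Rotate write/query traffic across several TSD instances, a
    client-side alternative to an external load balancer."""

    def __init__(self, hosts):
        self._cycle = cycle(hosts)

    def next_url(self):
        """Return the /api/put URL of the next TSD in rotation."""
        host = next(self._cycle)
        return f"http://{host}/api/put"

eps = RoundRobinEndpoints(["192.168.1.10:4242", "192.168.1.11:4242"])
print(eps.next_url())  # http://192.168.1.10:4242/api/put
print(eps.next_url())  # http://192.168.1.11:4242/api/put
print(eps.next_url())  # http://192.168.1.10:4242/api/put
```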