Database
InfluxDB
Overview
InfluxDB is an open-source NoSQL database specialized for time series data. Designed as a scalable datastore for metrics, events, and real-time analytics, it efficiently stores and queries time series data from IoT sensors, application monitoring, business metrics, and more.
Details
InfluxDB was first released by InfluxData in 2013. Optimized for the characteristics of time series data (time-ordered, write-heavy, range query-centric), it serves as the core component of the TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor). It provides its own functional query language, Flux, alongside the SQL-like InfluxQL, plus HTTP APIs, and achieves high write and query performance.
Key features of InfluxDB:
- Purpose-built for time series data
- High-speed write and read performance
- Purpose-built query languages (Flux and the SQL-like InfluxQL)
- Schema-less design
- Automatic data retention management
- High-precision timestamps
- Tag and field data model (see the line protocol sketch after this list)
- Horizontal scaling (commercial Enterprise edition)
- Real-time aggregation and downsampling
- RESTful HTTP API
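These features come together in line protocol, the plain-text write format: tags are indexed key=value metadata, fields carry the measured values, and the trailing timestamp defaults to nanosecond precision. A minimal annotated sketch (values are illustrative):
# measurement,tag_set field_set timestamp
# tags: indexed key=value pairs (always strings)
# fields: the actual values (float, integer, string, or boolean)
# timestamp: nanoseconds since the Unix epoch by default
temperature,location=room1,sensor=DHT22 value=23.5 1640995200000000000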
Advantages and Disadvantages
Advantages
- High performance: Optimized high-speed writes and reads for time series data
- Ease of use: the SQL-like InfluxQL keeps the learning curve low for simple queries
- Automatic management: Automated data retention and downsampling
- Rich functionality: Statistical functions, time window aggregation, forecasting
- Ecosystem: Complete solution with TICK stack
- APIs: Language-agnostic access via HTTP API
- Visualization: High compatibility with visualization tools like Grafana
Disadvantages
- Specialized use: Not suitable for non-time series data
- Memory usage: memory consumption grows with series cardinality and can become large
- Complexity: Complex configuration for advanced features
- Learning curve: Need to learn Flux query language
- Licensing: Enterprise features are commercial
Key Links
- Official site: https://www.influxdata.com/
- Documentation: https://docs.influxdata.com/influxdb/
- Source code: https://github.com/influxdata/influxdb
Code Examples
Installation & Setup
# Run with Docker (recommended)
docker run -d --name influxdb \
-p 8086:8086 \
-v influxdb-storage:/var/lib/influxdb2 \
-e DOCKER_INFLUXDB_INIT_MODE=setup \
-e DOCKER_INFLUXDB_INIT_USERNAME=admin \
-e DOCKER_INFLUXDB_INIT_PASSWORD=password123 \
-e DOCKER_INFLUXDB_INIT_ORG=myorg \
-e DOCKER_INFLUXDB_INIT_BUCKET=mybucket \
influxdb:2.7
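Once the container is running, the 2.x /health endpoint offers a quick readiness check:
# Verify the instance is up (returns JSON with "status": "pass")
curl http://localhost:8086/health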
# Ubuntu/Debian
wget -qO- https://repos.influxdata.com/influxdata-archive_compat.key | sudo apt-key add -
echo "deb https://repos.influxdata.com/ubuntu stable main" | sudo tee /etc/apt/sources.list.d/influxdb.list
sudo apt update && sudo apt install influxdb2
# Red Hat/CentOS
sudo tee /etc/yum.repos.d/influxdb.repo << EOF
[influxdb]
name = InfluxDB Repository - RHEL \$releasever
baseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdata-archive_compat.key
EOF
sudo yum install influxdb2
# macOS (Homebrew)
brew install influxdb
# Start service
sudo systemctl start influxdb
sudo systemctl enable influxdb
# Initial setup
influx setup
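The YOUR_TOKEN placeholder used throughout the examples below must be an API token; one way to create an all-access token with the 2.x CLI:
# Create an all-access token for the organization (the token is printed once)
influx auth create --org myorg --all-access
# List existing tokens
influx auth list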
Basic Operations (HTTP API)
# Create organization and bucket
curl -X POST "http://localhost:8086/api/v2/orgs" \
-H "Authorization: Token YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "myorg"
}'
curl -X POST "http://localhost:8086/api/v2/buckets" \
-H "Authorization: Token YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "sensors",
"orgID": "YOUR_ORG_ID",
"retentionRules": [
{
"type": "expire",
"everySeconds": 2592000
}
]
}'
# Write data (Line Protocol)
curl -X POST "http://localhost:8086/api/v2/write?org=myorg&bucket=sensors" \
-H "Authorization: Token YOUR_TOKEN" \
-H "Content-Type: text/plain" \
-d 'temperature,location=room1,sensor=DHT22 value=23.5 1640995200000000000
humidity,location=room1,sensor=DHT22 value=65.2 1640995200000000000
cpu_usage,host=server1,region=us-east value=85.3 1640995260000000000'
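The write endpoint interprets timestamps as nanoseconds unless told otherwise; the precision query parameter (s, ms, us, or ns) lets clients send shorter second-resolution timestamps. A sketch with illustrative values:
# Write with second-precision timestamps
curl -X POST "http://localhost:8086/api/v2/write?org=myorg&bucket=sensors&precision=s" \
-H "Authorization: Token YOUR_TOKEN" \
-H "Content-Type: text/plain" \
-d 'temperature,location=room1 value=23.7 1640995500'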
# Read data (Flux query)
curl -X POST "http://localhost:8086/api/v2/query?org=myorg" \
-H "Authorization: Token YOUR_TOKEN" \
-H "Content-Type: application/vnd.flux" \
-d 'from(bucket: "sensors")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "temperature")
|> filter(fn: (r) => r.location == "room1")'
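Query results come back as annotated CSV; an explicit Accept header requests CSV form, matching the documented curl usage:
# Request annotated CSV output explicitly
curl -X POST "http://localhost:8086/api/v2/query?org=myorg" \
-H "Authorization: Token YOUR_TOKEN" \
-H "Content-Type: application/vnd.flux" \
-H "Accept: application/csv" \
-d 'from(bucket: "sensors") |> range(start: -1h) |> last()'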
CLI Operations
# Configure InfluxDB CLI
influx config create \
--config-name myconfig \
--host-url http://localhost:8086 \
--org myorg \
--token YOUR_TOKEN \
--active
# Write data
influx write \
--bucket sensors \
--precision s \
'temperature,location=room2 value=24.1 1640995320'
# Write data from file
cat > data.txt << EOF
temperature,location=room1 value=23.5 1640995200
temperature,location=room2 value=24.1 1640995260
humidity,location=room1 value=65.2 1640995200
humidity,location=room2 value=67.8 1640995260
EOF
influx write --bucket sensors --precision s --file data.txt
# Execute Flux query
influx query 'from(bucket: "sensors")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "temperature")
|> mean()'
# List buckets
influx bucket list
# List organizations
influx org list
Flux Query Language
// Basic range query
from(bucket: "sensors")
|> range(start: -24h)
|> filter(fn: (r) => r._measurement == "temperature")
|> filter(fn: (r) => r.location == "room1")
// Aggregation and grouping
from(bucket: "sensors")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "cpu_usage")
|> group(columns: ["host"])
|> mean()
// Time window aggregation
from(bucket: "sensors")
|> range(start: -6h)
|> filter(fn: (r) => r._measurement == "temperature")
|> aggregateWindow(every: 10m, fn: mean)
// Join multiple measurements
temp = from(bucket: "sensors")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "temperature")
humidity = from(bucket: "sensors")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "humidity")
join(tables: {temp: temp, humidity: humidity}, on: ["_time", "location"])
// Statistical functions
from(bucket: "sensors")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "cpu_usage")
|> group(columns: ["host"])
|> aggregateWindow(every: 5m, fn: mean)
|> quantile(q: 0.95)
// Data transformation
from(bucket: "sensors")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "temperature")
|> map(fn: (r) => ({
r with
_value: (r._value * 9.0 / 5.0) + 32.0,
unit: "°F"
}))
Data Retention and Downsampling
# Task for automatic downsampling
cat > downsample.flux << EOF
option task = {name: "downsample-5m", every: 1h}
from(bucket: "sensors")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "temperature")
|> aggregateWindow(every: 5m, fn: mean)
|> to(bucket: "sensors_5m")
EOF
influx task create --file downsample.flux
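After creating the task, the task subcommands confirm it is registered and show its run history (YOUR_TASK_ID is a placeholder for the ID printed by influx task list):
# Verify the task and inspect its runs
influx task list
influx task run list --task-id YOUR_TASK_ID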
# Set data retention period
influx bucket update \
--id YOUR_BUCKET_ID \
--retention 720h # 30 days
# Delete old data
influx delete \
--bucket sensors \
--start 2023-01-01T00:00:00Z \
--stop 2023-01-31T23:59:59Z \
--predicate '_measurement="old_data"'
Practical Examples
// IoT sensor monitoring
from(bucket: "iot")
|> range(start: -15m)
|> filter(fn: (r) => r._measurement == "sensor_data")
|> filter(fn: (r) => r._field == "temperature")
|> group(columns: ["device_id"])
|> aggregateWindow(every: 1m, fn: last)
|> map(fn: (r) => ({
r with
alert: if r._value > 30.0 then "HIGH" else "NORMAL"
}))
// Application performance monitoring
from(bucket: "metrics")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "http_requests")
|> filter(fn: (r) => r._field == "response_time")
|> group(columns: ["endpoint", "status_code"])
|> aggregateWindow(every: 5m, fn: mean)
|> filter(fn: (r) => r._value > 1000.0) // Response time > 1 second
// Business metrics analysis
sales = from(bucket: "business")
|> range(start: -30d)
|> filter(fn: (r) => r._measurement == "sales")
|> filter(fn: (r) => r._field == "amount")
|> aggregateWindow(every: 1d, fn: sum)
revenue = from(bucket: "business")
|> range(start: -30d)
|> filter(fn: (r) => r._measurement == "revenue")
|> filter(fn: (r) => r._field == "total")
|> aggregateWindow(every: 1d, fn: sum)
join(tables: {sales: sales, revenue: revenue}, on: ["_time"])
|> map(fn: (r) => ({
_time: r._time,
avg_order_value: r._value_revenue / r._value_sales // join() suffixes shared columns with the table names
}))
// Forecasting
from(bucket: "sensors")
|> range(start: -7d)
|> filter(fn: (r) => r._measurement == "energy_consumption")
|> aggregateWindow(every: 1h, fn: mean)
|> holtWinters(n: 24, seasonality: 24, interval: 1h) // 24-hour forecast
Python Client
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS
import datetime
# Client connection
client = InfluxDBClient(
url="http://localhost:8086",
token="YOUR_TOKEN",
org="myorg"
)
# Write data
write_api = client.write_api(write_options=SYNCHRONOUS)
# Create and write point
point = Point("temperature") \
.tag("location", "room1") \
.tag("sensor", "DHT22") \
.field("value", 23.5) \
.time(datetime.datetime.utcnow())
write_api.write(bucket="sensors", record=point)
# Bulk write
points = []
for i in range(100):
    point = Point("cpu_usage") \
        .tag("host", f"server{i%5}") \
        .field("value", 50 + i % 40) \
        .time(datetime.datetime.utcnow() - datetime.timedelta(minutes=i))
    points.append(point)
write_api.write(bucket="metrics", record=points)
# Read data
query_api = client.query_api()
query = '''
from(bucket: "sensors")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "temperature")
|> filter(fn: (r) => r.location == "room1")
'''
result = query_api.query(query)
for table in result:
    for record in table.records:
        print(f"Time: {record.get_time()}, Value: {record.get_value()}")
# Convert to pandas DataFrame
df = query_api.query_data_frame(query)
print(df.head())
# Close client
client.close()
Configuration & Optimization
# influxdb.conf key settings (InfluxDB 1.x TOML format)
[http]
bind-address = ":8086"
auth-enabled = true
[meta]
dir = "/var/lib/influxdb/meta"
retention-autocreate = true
[data]
dir = "/var/lib/influxdb/data"
wal-dir = "/var/lib/influxdb/wal"
series-id-set-cache-size = 100
[coordinator]
write-timeout = "10s"
max-concurrent-queries = 0
[retention]
enabled = true
check-interval = "30m"
[shard-precreation]
enabled = true
check-interval = "10m"
advance-period = "30m"
[monitor]
store-enabled = true
store-database = "_internal"
[subscriber]
enabled = true
http-timeout = "30s"
[continuous_queries]
enabled = true
log-enabled = true
run-interval = "1s"
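Note that the TOML above applies to InfluxDB 1.x. InfluxDB 2.x is configured through influxd flags, a config file, or INFLUXD_-prefixed environment variables; a minimal sketch using documented 2.x options (paths shown are the typical Linux package locations):
# InfluxDB 2.x equivalents as environment variables (one per influxd option)
export INFLUXD_HTTP_BIND_ADDRESS=":8086"
export INFLUXD_BOLT_PATH="/var/lib/influxdb2/influxd.bolt"
export INFLUXD_ENGINE_PATH="/var/lib/influxdb2/engine"
export INFLUXD_QUERY_CONCURRENCY=10
influxd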
Monitoring and Maintenance
# System statistics
influx query 'from(bucket: "_monitoring")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "influxdb_database")
|> last()'
# Performance monitoring
curl "http://localhost:8086/metrics"
# Backup
influx backup /path/to/backup
# Restore
influx restore /path/to/backup
# Data integrity check (storage inspection moved to the influxd binary in 2.x)
influxd inspect verify-seriesfile
# TSM file information (path under the 2.x engine directory; BUCKET_ID and SHARD_ID are placeholders)
influxd inspect dump-tsm /var/lib/influxdb2/engine/data/BUCKET_ID/autogen/SHARD_ID/000000001-000000001.tsm