Grafana

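A widely used open-source monitoring and visualization platform. Its main features are dynamic dashboards, machine learning integration, and support for many data sources. It has been named a Leader in the Gartner Magic Quadrant.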

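Overview

Grafana is an open-source monitoring and visualization platform built around dynamic dashboards, machine learning integration, and broad data source support, and it has been recognized as a Leader in the Gartner Magic Quadrant. With a reported adoption rate of 94%, it continues to evolve through AI-assisted insights and observability as code.

Details

Torkel Ödegaard started developing Grafana in 2014. Beyond the reported 94% adoption rate and the Gartner Magic Quadrant Leader recognition, it holds a user rating of 4.6 out of 5.0. Versions 11.3 and later add further enhancements, and a major update is planned for 12.0.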

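Key technical features

  • Multiple data sources: Prometheus, InfluxDB, Elasticsearch, and many others
  • Rich visualization: a wide range of panel types and customization options
  • Alerting: alert rules and notification management
  • Plugin architecture: an extensible plugin ecosystem
  • Provisioning: dashboards and data sources managed as code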

Use cases

  • System monitoring dashboards
  • Application performance monitoring (APM)
  • Business metrics visualization
  • IoT data monitoring
  • Log analysis and troubleshooting

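Advantages and disadvantages

Advantages

  • Popularity: a reported 94% adoption rate and a large community
  • Many data sources: more than 80 data source plugins
  • Intuitive UI: dashboards are straightforward to build
  • Plugin ecosystem: extensibility through a large set of plugins
  • Enterprise features: advanced capabilities in the commercial edition
  • Open source: a capable monitoring platform available free of charge

Disadvantages

  • Data source dependency: Grafana does not store metrics itself, so an external data source is required
  • Performance: rendering load grows with large data volumes
  • Configuration complexity: advanced features require more involved setup
  • Security management: every installed plugin adds to the security surface that must be managed
  • Commercial feature limits: some features are available only in the Enterprise edition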

Examples

Basic Grafana configuration

# grafana.ini
[server]
# Protocol (http, https, h2, socket)
protocol = http
http_addr = 0.0.0.0
http_port = 3000
domain = localhost
root_url = %(protocol)s://%(domain)s:%(http_port)s/

# Security
[security]
admin_user = admin
# Example credentials only; replace both values before exposing the instance.
admin_password = admin123
secret_key = your_secret_key_here
disable_gravatar = false
allow_embedding = true

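# Database settings
[database]
type = sqlite3
# SQLite only uses `path`; the connection options below apply when type is mysql or postgres.
;host = 127.0.0.1:3306
;name = grafana
;user = root
;password = password
;ssl_mode = disable
path = grafana.db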

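# Session settings
# Note: [session] is read only by older Grafana releases (5.x and earlier); later
# releases use token-based login sessions and ignore this section.
[session]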
provider = file
provider_config = sessions
cookie_name = grafana_sess
cookie_secure = false
session_life_time = 86400

# Analytics settings
[analytics]
reporting_enabled = true
check_for_updates = true
google_analytics_ua_id = 

# Alerting settings
# Legacy alerting ([alerting]) was removed in Grafana 11, so Grafana-managed
# alerting is configured under [unified_alerting] instead.
[unified_alerting]
enabled = true
execute_alerts = true

# SMTP settings
[smtp]
enabled = false
host = localhost:587
user = 
password = 
from_address = admin@grafana.localhost
from_name = Grafana

# Logging settings
[log]
mode = console file
level = info
filters = rendering:debug

# Performance settings
[dashboards]
versions_to_keep = 20
min_refresh_interval = 5s

# Plugin settings
[plugins]
enable_alpha = false
app_tls_skip_verify_insecure = false
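
Any option in grafana.ini can also be overridden with an environment variable named GF_<SECTION>_<KEY>, in upper case, with periods and dashes in the section name replaced by underscores. The following is a minimal sketch of that naming rule, assuming Python 3.9 or later; `SAMPLE_INI`, `env_override_name`, and `list_overrides` are hypothetical names introduced for illustration and are not part of Grafana.

```python
import configparser

# Hypothetical excerpt of grafana.ini, reduced to a few keys for illustration.
SAMPLE_INI = """
[server]
protocol = http
domain = localhost
http_port = 3000
root_url = %(protocol)s://%(domain)s:%(http_port)s/

[security]
admin_user = admin
"""


def env_override_name(section: str, key: str) -> str:
    """Build the GF_<SECTION>_<KEY> variable name for one ini option."""
    normalized_section = section.replace(".", "_").replace("-", "_")
    return f"GF_{normalized_section}_{key}".upper()


def list_overrides(ini_text: str) -> dict[str, str]:
    """Map each option in the ini text to its override variable and raw value."""
    # interpolation=None keeps %(name)s placeholders untouched instead of expanding them.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    overrides = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            overrides[env_override_name(section, key)] = value
    return overrides


if __name__ == "__main__":
    for name, value in list_overrides(SAMPLE_INI).items():
        print(f"{name}={value}")
```

This is the same convention that container deployments rely on when they set variables such as GF_SECURITY_ADMIN_USER instead of editing the file.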

Data source configuration (Prometheus)

# datasources.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus-uid  # fixed uid so dashboards can reference this data source
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
    jsonData:
      httpMethod: POST
      httpHeaderName1: Authorization  # header name paired with httpHeaderValue1 below
      prometheusType: Prometheus
      prometheusVersion: 2.45.0
      queryTimeout: 60s
      timeInterval: 15s
      exemplarTraceIdDestinations:
        - name: trace_id
          datasourceUid: jaeger-uid
    secureJsonData:
      httpHeaderValue1: Bearer your_token_here

  - name: Loki
    type: loki
    uid: loki-uid  # referenced by tracesToLogs in the Jaeger data source
    access: proxy
    url: http://loki:3100
    jsonData:
      maxLines: 1000
      derivedFields:
        - matcherRegex: "trace_id=(\\w+)"
          name: TraceID
          url: "$${__value.raw}"
          datasourceUid: jaeger-uid

  - name: Jaeger
    type: jaeger
    access: proxy
    url: http://jaeger:16686  # Jaeger query service port (14268 is the collector)
    uid: jaeger-uid
    jsonData:
      tracesToLogs:
        datasourceUid: loki-uid
        tags: ['job', 'instance', 'pod', 'namespace']
        mappedTags: [{ key: 'service.name', value: 'service' }]
        mapTagNamesEnabled: true
        filterByTraceID: true
        filterBySpanID: false

  - name: InfluxDB
    type: influxdb
    access: proxy
    url: http://influxdb:8086
    database: telegraf
    user: admin
    secureJsonData:
      password: admin  # placeholder credential; secureJsonData values are stored encrypted
    jsonData:
      httpMode: GET
      keepCookies: []
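
Links between provisioned data sources (exemplars, derived fields, trace-to-logs) are expressed through `datasourceUid`, and they only resolve when the target data source declares a matching `uid`. The sketch below checks for references that point nowhere. It works on an already parsed structure, and `SAMPLE_DATASOURCES` is a reduced, hypothetical example rather than a copy of any real file; `find_dangling_uids` is likewise an illustrative helper, not a Grafana API.

```python
# Hypothetical, reduced data source list as it might look after YAML parsing.
SAMPLE_DATASOURCES = [
    {
        "name": "Prometheus",
        "jsonData": {
            "exemplarTraceIdDestinations": [
                {"name": "trace_id", "datasourceUid": "jaeger-uid"}
            ]
        },
    },
    {"name": "Loki", "jsonData": {"maxLines": 1000}},  # no uid declared
    {
        "name": "Jaeger",
        "uid": "jaeger-uid",
        "jsonData": {"tracesToLogs": {"datasourceUid": "loki-uid"}},
    },
]


def find_dangling_uids(datasources: list[dict]) -> list[tuple[str, str]]:
    """Return (data source name, missing uid) pairs for unresolved references."""
    defined = {ds["uid"] for ds in datasources if "uid" in ds}
    dangling: list[tuple[str, str]] = []

    def walk(node, owner: str) -> None:
        # Recurse through nested dicts and lists looking for datasourceUid keys.
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "datasourceUid" and value not in defined:
                    dangling.append((owner, value))
                else:
                    walk(value, owner)
        elif isinstance(node, list):
            for item in node:
                walk(item, owner)

    for ds in datasources:
        walk(ds.get("jsonData", {}), ds["name"])
    return dangling


if __name__ == "__main__":
    for owner, uid in find_dangling_uids(SAMPLE_DATASOURCES):
        print(f"{owner} references undefined uid {uid!r}")
```

In the sample, the Jaeger entry points at `loki-uid`, which no entry declares, so the check reports it.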

Dashboard provisioning configuration

# dashboards.yml
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /etc/grafana/dashboards

  - name: 'system-monitoring'
    orgId: 1
    folder: 'System Monitoring'
    folderUid: 'system-monitoring'
    type: file
    disableDeletion: true
    updateIntervalSeconds: 30
    allowUiUpdates: false
    options:
      path: /etc/grafana/dashboards/system
      # foldersFromFilesStructure is left out: it requires folder and folderUid to be unset.

  - name: 'application-monitoring'
    orgId: 1
    folder: 'Application Monitoring'
    folderUid: 'app-monitoring'
    type: file
    options:
      path: /etc/grafana/dashboards/applications

Sample dashboard configuration

{
  "annotations": {
    "list": [
      {
        "datasource": {
          "type": "prometheus",
          "uid": "prometheus-uid"
        },
        "enable": true,
        "expr": "increase(deployment_version_change[1m]) > 0",
        "iconColor": "red",
        "name": "Deployments",
        "titleFormat": "Deployment",
        "textFormat": "New version deployed"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 1,
  "id": null,
  "links": [
    {
      "asDropdown": false,
      "icon": "external link",
      "includeVars": true,
      "keepTime": true,
      "tags": ["kubernetes"],
      "targetBlank": true,
      "title": "Kubernetes Dashboards",
      "type": "dashboards"
    }
  ],
  "panels": [
    {
      "datasource": {
        "type": "prometheus",
        "uid": "prometheus-uid"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "percent"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 0
      },
      "id": 1,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom"
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "prometheus-uid"
          },
          "expr": "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
          "interval": "",
          "legendFormat": "{{instance}}",
          "refId": "A"
        }
      ],
      "title": "CPU Usage",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "prometheus-uid"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "custom": {
            "align": "auto",
            "displayMode": "auto",
            "inspect": false
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          }
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "Status"
            },
            "properties": [
              {
                "id": "custom.displayMode",
                "value": "color-background"
              },
              {
                "id": "mappings",
                "value": [
                  {
                    "options": {
                      "0": {
                        "color": "red",
                        "index": 0,
                        "text": "Down"
                      },
                      "1": {
                        "color": "green",
                        "index": 1,
                        "text": "Up"
                      }
                    },
                    "type": "value"
                  }
                ]
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 0
      },
      "id": 2,
      "options": {
        "showHeader": true
      },
      "pluginVersion": "8.3.3",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "prometheus-uid"
          },
          "expr": "up",
          "format": "table",
          "instant": true,
          "refId": "A"
        }
      ],
      "title": "Service Status",
      "transformations": [
        {
          "id": "organize",
          "options": {
            "excludeByName": {
              "__name__": true,
              "Time": true
            },
            "indexByName": {},
            "renameByName": {
              "Value": "Status",
              "instance": "Instance",
              "job": "Job"
            }
          }
        }
      ],
      "type": "table"
    }
  ],
  "refresh": "30s",
  "schemaVersion": 34,
  "style": "dark",
  "tags": ["monitoring", "infrastructure"],
  "templating": {
    "list": [
      {
        "current": {
          "selected": false,
          "text": "All",
          "value": "$__all"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "prometheus-uid"
        },
        "definition": "label_values(up, instance)",
        "hide": 0,
        "includeAll": true,
        "label": "Instance",
        "multi": true,
        "name": "instance",
        "options": [],
        "query": {
          "query": "label_values(up, instance)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "type": "query"
      }
    ]
  },
  "time": {
    "from": "now-1h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "Infrastructure Overview",
  "uid": "infrastructure-overview",
  "version": 1,
  "weekStart": ""
}
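
Before a dashboard JSON file goes into a provisioned directory, a few structural checks can catch common mistakes: duplicate panel ids, panels that extend past Grafana's 24-column grid, and query targets without a `refId`. The following is a rough sketch of such a check, not an official validator; `lint_dashboard` and `SAMPLE_DASHBOARD` are hypothetical names, and the sample is a heavily reduced dashboard.

```python
import json

GRID_COLUMNS = 24  # Grafana lays panels out on a grid that is 24 columns wide

# Hypothetical, heavily reduced dashboard used only to exercise the checks.
SAMPLE_DASHBOARD = json.loads("""
{
  "title": "Infrastructure Overview",
  "panels": [
    {"id": 1, "title": "CPU Usage",
     "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
     "targets": [{"expr": "up", "refId": "A"}]},
    {"id": 2, "title": "Service Status",
     "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
     "targets": [{"expr": "up", "refId": "A"}]}
  ]
}
""")


def lint_dashboard(dashboard: dict) -> list[str]:
    """Return human-readable problems found in a dashboard definition."""
    problems = []
    seen_ids = set()
    for panel in dashboard.get("panels", []):
        title = panel.get("title", "<untitled>")
        panel_id = panel.get("id")
        if panel_id is not None:
            if panel_id in seen_ids:
                problems.append(f"{title}: duplicate panel id {panel_id}")
            seen_ids.add(panel_id)
        pos = panel.get("gridPos", {})
        if pos.get("x", 0) + pos.get("w", 0) > GRID_COLUMNS:
            problems.append(f"{title}: exceeds the {GRID_COLUMNS}-column grid")
        for target in panel.get("targets", []):
            if "refId" not in target:
                problems.append(f"{title}: target without refId")
    return problems


if __name__ == "__main__":
    print(lint_dashboard(SAMPLE_DASHBOARD) or "no problems found")
```

Two panels of width 12 at x = 0 and x = 12 fill one row exactly, which is why the sample passes.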

Alert rule configuration

# alert-rules.yml
# Prometheus rule-file format: load it through rule_files in prometheus.yml.
groups:
  - name: system.rules
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% for more than 5 minutes on {{ $labels.instance }}"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
          service: system
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 90% on {{ $labels.instance }}"

      - alert: DiskSpaceLow
        expr: 100 - ((node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes) > 85
        for: 5m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "Low disk space"
          description: "Disk usage is above 85% on {{ $labels.instance }}"

      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
          service: "{{ $labels.job }}"
        annotations:
          summary: "Service is down"
          description: "{{ $labels.job }} on {{ $labels.instance }} has been down for more than 1 minute"
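
The HighCPUUsage expression subtracts the average idle fraction from 100. A worked version of that arithmetic may make it easier to follow. The sketch below is a simplification: the real `rate()` function also handles counter resets and extrapolates to the window boundaries, which is ignored here, and the sample values are invented for illustration.

```python
# Each sample is (timestamp_seconds, cumulative_idle_seconds) for one CPU core.
Sample = tuple[float, float]

# Invented values: two cores observed over a 300-second (5m) window.
SAMPLES_BY_CPU: dict[str, tuple[Sample, Sample]] = {
    "cpu0": ((0.0, 1000.0), (300.0, 1150.0)),  # idle for 150 s of 300 s
    "cpu1": ((0.0, 2000.0), (300.0, 2225.0)),  # idle for 225 s of 300 s
}


def counter_rate(first: Sample, last: Sample) -> float:
    """Per-second increase of a counter between two samples (no reset handling)."""
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)


def cpu_usage_percent(samples_by_cpu: dict[str, tuple[Sample, Sample]]) -> float:
    """Mirror 100 - avg(rate(idle)) * 100 for the cores of a single instance."""
    idle_rates = [counter_rate(first, last) for first, last in samples_by_cpu.values()]
    average_idle = sum(idle_rates) / len(idle_rates)
    return 100 - average_idle * 100


if __name__ == "__main__":
    usage = cpu_usage_percent(SAMPLES_BY_CPU)
    print(f"CPU usage: {usage}% (alert threshold: 80%)")
```

With idle fractions of 0.5 and 0.75, the average is 0.625, so usage is 37.5% and the rule would not fire.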

Notification configuration

{
  "name": "Slack Notifications",
  "type": "slack",
  "settings": {
    "url": "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
    "channel": "#alerts",
    "username": "Grafana",
    "iconEmoji": ":exclamation:",
    "iconUrl": "",
    "title": "{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}",
    "text": "{{ range .Alerts }}{{ .Annotations.description }}{{ end }}"
  }
}
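
The `title` and `text` templates loop over every alert in the notification and emit one annotation per alert with nothing in between, so grouped alerts run together in the message. The following Python sketch imitates that behaviour to show the effect. It is only an approximation of the Go template semantics, and `render_range` and the sample alerts are hypothetical.

```python
# Hypothetical alerts shaped like the data a notification template receives.
SAMPLE_ALERTS = [
    {"annotations": {"summary": "High CPU usage detected"}},
    {"annotations": {"summary": "Service is down"}},
]


def render_range(alerts: list[dict], annotation: str, separator: str = "") -> str:
    """Join one annotation across all alerts, like a range loop without separators."""
    values = [alert.get("annotations", {}).get(annotation, "") for alert in alerts]
    return separator.join(values)


if __name__ == "__main__":
    # Without a separator the two summaries are concatenated directly.
    print(render_range(SAMPLE_ALERTS, "summary"))
    # Emitting a newline inside the loop keeps one alert per line.
    print(render_range(SAMPLE_ALERTS, "summary", separator="\n"))
```

Adding a newline or delimiter inside the template's range block has the same effect as the `separator` argument here.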

Docker Compose configuration

version: '3.8'

services:
  grafana:
    image: grafana/grafana:11.3.0
    container_name: grafana
    restart: unless-stopped
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin123
      - GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-simple-json-datasource,grafana-worldmap-panel
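      # GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS expects a comma-separated list of
      # plugin IDs, not a boolean; set it only when you load an unsigned plugin.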
      - GF_FEATURE_TOGGLES_ENABLE=publicDashboards
    ports:
      - "3000:3000"
    volumes:
      - grafana-storage:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
      - ./grafana/dashboards:/etc/grafana/dashboards
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:v2.45.0
    container_name: prometheus
    restart: unless-stopped
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=15d'
      - '--web.enable-lifecycle'
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/etc/prometheus
      - prometheus-storage:/prometheus
    networks:
      - monitoring

volumes:
  grafana-storage:
  prometheus-storage:

networks:
  monitoring:
    driver: bridge
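
After the stack starts, Grafana's `/api/health` endpoint can be polled to find out when the server is ready; its JSON response includes a `database` field that reads `ok` on a healthy instance. The following is a minimal readiness check using only the standard library. The function names, the base URL, and the retry settings are assumptions chosen for illustration.

```python
import json
import time
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True when a health response body reports a working database."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("database") == "ok"


def fetch_health(base_url: str, timeout: float = 2.0) -> str:
    """Fetch the raw body of <base_url>/api/health."""
    with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as response:
        return response.read().decode("utf-8")


def wait_until_healthy(base_url: str, attempts: int = 10, delay: float = 1.0,
                       fetch=fetch_health) -> bool:
    """Poll the health endpoint until it reports ok or the attempts run out."""
    for _ in range(attempts):
        try:
            if is_healthy(fetch(base_url)):
                return True
        except OSError:
            # Connection refused, timeouts, and HTTP errors all derive from OSError.
            pass
        time.sleep(delay)
    return False

# Example call, assuming the port mapping 3000:3000:
#   wait_until_healthy("http://localhost:3000")
```

The `fetch` parameter exists so the polling logic can be exercised without a running server.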

Custom plugin development example

// module.ts: registers the custom panel plugin and its options
import { PanelPlugin } from '@grafana/data';
import { SimpleOptions } from './types';
import { SimplePanel } from './SimplePanel';

export const plugin = new PanelPlugin<SimpleOptions>(SimplePanel).setPanelOptions((builder) => {
  return builder
    .addTextInput({
      path: 'text',
      name: 'Simple text option',
      description: 'Description of panel option',
      defaultValue: 'Default value of text input option',
    })
    .addBooleanSwitch({
      path: 'showSeriesCount',
      name: 'Show series counter',
      defaultValue: false,
    })
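    .addRadio({
      path: 'seriesCountSize',
      name: 'Series counter size',
      defaultValue: 'sm',
      settings: {
        options: [
          { value: 'sm', label: 'Small' },
          { value: 'md', label: 'Medium' },
          { value: 'lg', label: 'Large' },
        ],
      },
      // Only show the size selector when the counter itself is enabled.
      showIf: (config) => config.showSeriesCount,
    });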
});

// SimplePanel.tsx: the panel component
import React from 'react';
import { PanelProps } from '@grafana/data';
import { SimpleOptions } from './types';

interface Props extends PanelProps<SimpleOptions> {}

export const SimplePanel: React.FC<Props> = ({ options, data, width, height }) => {
  return (
    <div
      style={{
        width,
        height,
        display: 'flex',
        flexDirection: 'column',
        justifyContent: 'center',
        alignItems: 'center',
      }}
    >
      <div>Simple Panel</div>
      <div>{options.text}</div>
      {options.showSeriesCount && <div>Number of series: {data.series.length}</div>}
    </div>
  );
};

Terraform configuration example

# Terraform provider configuration
terraform {
  required_providers {
    grafana = {
      source  = "grafana/grafana"
      version = "~> 1.40.0"
    }
  }
}

provider "grafana" {
  url  = "http://localhost:3000"
  auth = "admin:admin123"
}

# Create the data source
resource "grafana_data_source" "prometheus" {
  type       = "prometheus"
  name       = "Prometheus"
  url        = "http://prometheus:9090"
  is_default = true

  json_data_encoded = jsonencode({
    httpMethod   = "POST"
    queryTimeout = "60s"
    timeInterval = "15s"
  })
}

# Create the folder
resource "grafana_folder" "monitoring" {
  title = "Infrastructure Monitoring"
}

# Create the dashboard
resource "grafana_dashboard" "system_overview" {
  folder      = grafana_folder.monitoring.id
  config_json = file("${path.module}/dashboards/system-overview.json")
}

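# Webhook URL for Slack, supplied at plan/apply time
variable "slack_webhook_url" {
  type      = string
  sensitive = true
}

# Alert contact point (replaces legacy notification channels, which Grafana 11 no longer supports)
resource "grafana_contact_point" "slack" {
  name = "slack-alerts"

  slack {
    url        = var.slack_webhook_url
    recipient  = "#alerts"
    username   = "Grafana"
    icon_emoji = ":exclamation:"
  }
}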

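# Alert rule group
resource "grafana_rule_group" "system_alerts" {
  name             = "system.rules"
  folder_uid       = grafana_folder.monitoring.uid
  interval_seconds = 60

  rule {
    name      = "HighCPUUsage"
    condition = "B" # the expression below decides when the rule fires
    for       = "5m"

    # Query A: CPU usage per instance, evaluated as an instant query
    data {
      ref_id         = "A"
      datasource_uid = grafana_data_source.prometheus.uid

      relative_time_range {
        from = 600
        to   = 0
      }

      model = jsonencode({
        expr    = "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"
        instant = true
        refId   = "A"
      })
    }

    # Expression B: fire when query A exceeds 80%
    data {
      ref_id         = "B"
      datasource_uid = "__expr__"

      relative_time_range {
        from = 0
        to   = 0
      }

      model = jsonencode({
        type       = "math"
        expression = "$A > 80"
        refId      = "B"
      })
    }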

    annotations = {
      summary     = "High CPU usage detected"
      description = "CPU usage is above 80% for more than 5 minutes on {{ $labels.instance }}"
    }

    labels = {
      severity = "warning"
      service  = "system"
    }
  }
}