Grafana
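The most popular open-source monitoring and visualization platform. Its hallmarks are dynamic dashboards, machine learning integration, and support for a wide range of data sources. Recognized as a Leader in the Gartner Magic Quadrant.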
Monitoring server
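Overview
Grafana is the most popular open-source monitoring and visualization platform. It features dynamic dashboards, machine learning integration, and support for a wide range of data sources, and it is recognized as a Leader in the Gartner Magic Quadrant. With a reported adoption rate of 94%, it holds a dominant share and continues to evolve as a next-generation monitoring platform through AI-powered insights and observability as code.
Details
Development of Grafana was started by Torkel Ödegaard in 2014, and it maintains a dominant share today with a reported 94% adoption rate. It has earned Leader recognition in the Gartner Magic Quadrant and a 4.6/5.0 rating, and it continues to evolve with AI-powered insights and observability as code. Versions 11.3 and later bring further enhancements, and a major update is planned for 12.0.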
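Key technical features
- Multiple data sources: broad support including Prometheus, InfluxDB, and Elasticsearch
- Rich visualization: a wide choice of panel types and customization options
- Alerting: advanced alert rules and notification management
- Plugin architecture: an extensible plugin ecosystem
- Provisioning: dashboards and data sources managed as code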
Use cases
- System monitoring dashboards
- Application performance monitoring (APM)
- Business metrics visualization
- IoT data monitoring
- Log analysis and troubleshooting
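Advantages and disadvantages
Advantages
- Overwhelming popularity: a reported 94% adoption rate and a large community
- Many data sources: 80+ data source plugins
- Intuitive UI: user-friendly dashboard creation
- Plugin ecosystem: extensibility through a large catalog of plugins
- Enterprise features: advanced capabilities in the commercial edition
- Open source: a free, full-featured monitoring platform
Disadvantages
- Data source dependency: Grafana does not store metric data itself, so an external data source is required
- Performance: rendering load grows with large data volumes
- Configuration complexity: advanced features require intricate configuration
- Security management: many plugins mean more surface to keep secure
- Commercial feature limits: some features are available only in the Enterprise edition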
Examples
Basic Grafana configuration
# grafana.ini
[server]
# Protocol (http, https, h2, socket)
protocol = http
http_addr = 0.0.0.0
http_port = 3000
domain = localhost
root_url = %(protocol)s://%(domain)s:%(http_port)s/
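# Security
[security]
admin_user = admin
# Placeholder credentials for local testing only; change them before exposing Grafana
admin_password = admin123
secret_key = your_secret_key_here
disable_gravatar = false
# Lets other sites render Grafana inside iframes; leave false unless embedding is needed
allow_embedding = true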
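# Database settings
[database]
type = sqlite3
# With sqlite3 only `path` is used; host, name, user, password and ssl_mode
# apply to mysql/postgres and have no effect here
host = 127.0.0.1:3306
name = grafana
user = root
password = password
ssl_mode = disable
path = grafana.db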
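# Session settings
# Legacy section: Grafana 6 and later use auth tokens instead of server-side
# sessions, so current versions appear to ignore these keys
[session]
provider = file
provider_config = sessions
cookie_name = grafana_sess
cookie_secure = false
session_life_time = 86400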
# Analytics settings
[analytics]
reporting_enabled = true
check_for_updates = true
google_analytics_ua_id =
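# Alerting settings
# These keys belong to legacy alerting, which was removed in Grafana 11;
# drop this section there and configure [unified_alerting] instead
[alerting]
enabled = true
execute_alerts = true
error_or_timeout = alerting
nodata_or_nullvalues = no_data
concurrent_render_limit = 5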
# SMTP settings
[smtp]
enabled = false
host = localhost:587
user =
password =
# Placeholder sender address (the original value was lost to email obfuscation)
from_address = grafana@example.com
from_name = Grafana
# Log settings
[log]
mode = console file
level = info
filters = rendering:debug
# Performance settings
[dashboards]
versions_to_keep = 20
min_refresh_interval = 5s
# Plugin settings
[plugins]
enable_alpha = false
app_tls_skip_verify_insecure = false
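The `root_url` value in the `[server]` section is assembled from other keys through `%(name)s` references. As a rough way to preview the expanded value, the sketch below uses Python's `configparser`, whose default interpolation happens to accept the same `%(name)s` syntax. It is an illustration only, not how Grafana itself reads the file; `SERVER_INI` is a trimmed copy of the section and `expand_root_url` is a hypothetical helper.

```python
import configparser

# Trimmed copy of the [server] section from grafana.ini
SERVER_INI = """
[server]
protocol = http
http_addr = 0.0.0.0
http_port = 3000
domain = localhost
root_url = %(protocol)s://%(domain)s:%(http_port)s/
"""


def expand_root_url(ini_text: str) -> str:
    """Return root_url with its %(name)s references resolved."""
    parser = configparser.ConfigParser()  # BasicInterpolation is the default
    parser.read_string(ini_text)
    return parser["server"]["root_url"]


print(expand_root_url(SERVER_INI))  # prints http://localhost:3000/
```

Changing `protocol`, `domain`, or `http_port` changes the result without touching `root_url`, which is the point of writing it as a template.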
Data source settings (Prometheus, Loki, Jaeger, InfluxDB)
# datasources.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    # Fixed uid so dashboards and other data sources can reference this entry
    uid: prometheus-uid
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
    jsonData:
      httpMethod: POST
      prometheusType: Prometheus
      prometheusVersion: 2.45.0
      queryTimeout: 60s
      timeInterval: 15s
      # Header name paired with httpHeaderValue1 below
      httpHeaderName1: Authorization
      exemplarTraceIdDestinations:
        - name: trace_id
          datasourceUid: jaeger-uid
    secureJsonData:
      httpHeaderValue1: Bearer your_token_here
  - name: Loki
    type: loki
    # Referenced by the Jaeger tracesToLogs setting
    uid: loki-uid
    access: proxy
    url: http://loki:3100
    jsonData:
      maxLines: 1000
      derivedFields:
        - matcherRegex: "trace_id=(\\w+)"
          name: TraceID
          # "$$" escapes "$" so provisioning does not expand it as an environment variable
          url: "$${__value.raw}"
          datasourceUid: jaeger-uid
  - name: Jaeger
    type: jaeger
    access: proxy
    # 16686 is the Jaeger query port; 14268 is the collector's ingest port
    url: http://jaeger:16686
    uid: jaeger-uid
    jsonData:
      tracesToLogs:
        datasourceUid: loki-uid
        tags: ['job', 'instance', 'pod', 'namespace']
        mappedTags: [{ key: 'service.name', value: 'service' }]
        mapTagNamesEnabled: true
        filterByTraceID: true
        filterBySpanID: false
  - name: InfluxDB
    type: influxdb
    access: proxy
    url: http://influxdb:8086
    user: admin
    jsonData:
      dbName: telegraf
      httpMode: GET
      keepCookies: []
    secureJsonData:
      # Placeholder; secrets go in secureJsonData rather than a top-level password
      password: admin
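Data sources in a provisioning file refer to one another through `datasourceUid`, and each reference resolves only if some entry declares that `uid`. The sketch below checks a parsed provisioning structure for references with no matching declaration. `SAMPLE` is a hand-written, trimmed stand-in for parsed YAML (no YAML library is assumed), and both helper functions are hypothetical.

```python
from typing import Any


def collect_uid_refs(node: Any) -> set:
    """Recursively collect every string stored under a 'datasourceUid' key."""
    refs = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "datasourceUid" and isinstance(value, str):
                refs.add(value)
            else:
                refs |= collect_uid_refs(value)
    elif isinstance(node, list):
        for item in node:
            refs |= collect_uid_refs(item)
    return refs


def dangling_uid_refs(datasources: list) -> set:
    """Return referenced uids that no data source entry declares."""
    declared = {ds["uid"] for ds in datasources if "uid" in ds}
    return collect_uid_refs(datasources) - declared


# Illustrative sample: only Jaeger declares a uid, yet Loki's uid is referenced
SAMPLE = [
    {"name": "Prometheus", "jsonData": {"exemplarTraceIdDestinations": [
        {"name": "trace_id", "datasourceUid": "jaeger-uid"}]}},
    {"name": "Loki", "jsonData": {"derivedFields": [
        {"name": "TraceID", "datasourceUid": "jaeger-uid"}]}},
    {"name": "Jaeger", "uid": "jaeger-uid", "jsonData": {
        "tracesToLogs": {"datasourceUid": "loki-uid"}}},
]

print(dangling_uid_refs(SAMPLE))  # prints {'loki-uid'}
```

Giving the Loki entry `uid: loki-uid` makes the result empty, which is the state a provisioning file should be in before Grafana loads it.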
Dashboard provisioning settings
# dashboards.yml
apiVersion: 1
providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      # Kept apart from the provider paths below so no dashboard file is loaded twice
      path: /etc/grafana/dashboards/general
  - name: 'system-monitoring'
    orgId: 1
    folder: 'System Monitoring'
    folderUid: 'system-monitoring'
    type: file
    disableDeletion: true
    updateIntervalSeconds: 30
    allowUiUpdates: false
    options:
      path: /etc/grafana/dashboards/system
      # foldersFromFilesStructure only takes effect when folder and folderUid are empty,
      # so it stays off while this provider names its folder explicitly
      foldersFromFilesStructure: false
  - name: 'application-monitoring'
    orgId: 1
    folder: 'Application Monitoring'
    folderUid: 'app-monitoring'
    type: file
    options:
      path: /etc/grafana/dashboards/applications
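File providers read dashboards from the directory tree under `options.path`, so provider paths that nest inside one another may pick up the same dashboard file more than once. A minimal sketch of detecting such nesting follows; `overlapping_paths` is a hypothetical helper and the sample paths are for illustration.

```python
from itertools import combinations
from pathlib import PurePosixPath


def overlapping_paths(paths: list) -> list:
    """Return (parent, child) pairs where one provider path contains another."""
    overlaps = []
    for a, b in combinations(paths, 2):
        pa, pb = PurePosixPath(a), PurePosixPath(b)
        if pa == pb or pa in pb.parents:
            overlaps.append((a, b))
        elif pb in pa.parents:
            overlaps.append((b, a))
    return overlaps


# Sample layout: one provider rooted above two others
sample = [
    "/etc/grafana/dashboards",
    "/etc/grafana/dashboards/system",
    "/etc/grafana/dashboards/applications",
]
for parent, child in overlapping_paths(sample):
    print(f"{child} is inside {parent}")
```

Sibling directories such as `system` and `applications` do not overlap; only the shared parent does.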
Sample dashboard definition
{
"annotations": {
"list": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus-uid"
},
"enable": true,
"expr": "increase(deployment_version_change[1m]) > 0",
"iconColor": "red",
"name": "Deployments",
"titleFormat": "Deployment",
"textFormat": "New version deployed"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"id": null,
"links": [
{
"asDropdown": false,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": ["kubernetes"],
"targetBlank": true,
"title": "Kubernetes Dashboards",
"type": "dashboards"
}
],
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus-uid"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus-uid"
},
"expr": "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
"interval": "",
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"title": "CPU Usage",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus-uid"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {
"align": "auto",
"displayMode": "auto",
"inspect": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "Status"
},
"properties": [
{
"id": "custom.displayMode",
"value": "color-background"
},
{
"id": "mappings",
"value": [
{
"options": {
"0": {
"color": "red",
"index": 0,
"text": "Down"
},
"1": {
"color": "green",
"index": 1,
"text": "Up"
}
},
"type": "value"
}
]
}
]
}
]
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 0
},
"id": 2,
"options": {
"showHeader": true
},
"pluginVersion": "8.3.3",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus-uid"
},
"expr": "up",
"format": "table",
"instant": true,
"refId": "A"
}
],
"title": "Service Status",
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {
"__name__": true,
"Time": true
},
"indexByName": {},
"renameByName": {
"Value": "Status",
"instance": "Instance",
"job": "Job"
}
}
}
],
"type": "table"
}
],
"refresh": "30s",
"schemaVersion": 34,
"style": "dark",
"tags": ["monitoring", "infrastructure"],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "All",
"value": "$__all"
},
"datasource": {
"type": "prometheus",
"uid": "prometheus-uid"
},
"definition": "label_values(up, instance)",
"hide": 0,
"includeAll": true,
"label": "Instance",
"multi": true,
"name": "instance",
"options": [],
"query": {
"query": "label_values(up, instance)",
"refId": "PrometheusVariableQueryEditor-VariableQuery"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "Infrastructure Overview",
"uid": "infrastructure-overview",
"version": 1,
"weekStart": ""
}
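Panels are placed with `gridPos` on a dashboard grid that is 24 columns wide, which is why the two panels in this dashboard use `w: 12` with `x: 0` and `x: 12`. The sketch below computes such positions for equally sized panels. `grid_positions` and its parameters are hypothetical helpers, not part of Grafana.

```python
GRID_COLUMNS = 24  # width of the dashboard grid


def grid_positions(count: int, per_row: int = 2, height: int = 8) -> list:
    """Lay out `count` equally sized panels left to right, top to bottom."""
    if per_row <= 0 or GRID_COLUMNS % per_row != 0:
        raise ValueError("per_row must evenly divide 24")
    width = GRID_COLUMNS // per_row
    return [
        {
            "h": height,
            "w": width,
            "x": (index % per_row) * width,
            "y": (index // per_row) * height,
        }
        for index in range(count)
    ]


for position in grid_positions(3):
    print(position)
```

The first two results match the `gridPos` values of the "CPU Usage" and "Service Status" panels; a third panel would wrap to `x: 0, y: 8`.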
Alert rule settings
# alert-rules.yml (Prometheus rule-file format, loaded through rule_files in prometheus.yml)
groups:
  - name: system.rules
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% for more than 5 minutes on {{ $labels.instance }}"
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
          service: system
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 90% on {{ $labels.instance }}"
      - alert: DiskSpaceLow
        expr: 100 - ((node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes) > 85
        for: 5m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "Low disk space"
          description: "Disk usage is above 85% on {{ $labels.instance }}"
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
          service: "{{ $labels.job }}"
        annotations:
          summary: "Service is down"
          description: "{{ $labels.job }} on {{ $labels.instance }} has been down for more than 1 minute"
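The `HighCPUUsage` expression turns an idle-time counter into a usage percentage: the per-second rate of `node_cpu_seconds_total{mode="idle"}` is the fraction of time spent idle, and subtracting it (times 100) from 100 gives the busy share. The arithmetic can be sketched as below. This is a simplification that assumes a single CPU core, two samples, and no counter reset, whereas Prometheus's `rate()` also handles resets and extrapolation; the function names are hypothetical.

```python
def cpu_usage_percent(idle_start: float, idle_end: float, window_seconds: float) -> float:
    """Approximate 100 - rate(idle_counter) * 100 from two counter samples."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    idle_rate = (idle_end - idle_start) / window_seconds  # idle seconds per second
    return 100.0 - idle_rate * 100.0


def breaches(usage_percent: float, threshold: float = 80.0) -> bool:
    """Mirror the '> 80' comparison at the end of the rule expression."""
    return usage_percent > threshold


# 45 idle seconds accumulated over a 300-second window: idle rate 0.15, usage about 85%
usage = cpu_usage_percent(idle_start=1000.0, idle_end=1045.0, window_seconds=300.0)
print(round(usage, 1), breaches(usage))
```

A breach alone does not fire the alert: with `for: 5m`, the condition has to keep holding across evaluations for five minutes first.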
Notification settings
{
  "name": "Slack Notifications",
  "type": "slack",
  "settings": {
    "url": "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
    "recipient": "#alerts",
    "username": "Grafana",
    "icon_emoji": ":exclamation:",
    "icon_url": "",
    "title": "{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}",
    "text": "{{ range .Alerts }}{{ .Annotations.description }}{{ end }}"
  }
}
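The `title` and `text` templates loop over `.Alerts` and emit one annotation per alert with nothing between them, so a notification that groups several alerts yields one run-together string. The Python below only mimics that concatenation to make the effect visible. It is not Grafana's template engine, and the sample alerts and the `render_range` helper are invented for illustration.

```python
def render_range(alerts: list, annotation: str, separator: str = "") -> str:
    """Join one annotation from each alert, as a bare range loop would."""
    return separator.join(alert["annotations"][annotation] for alert in alerts)


# Two alerts grouped into a single notification (illustrative data)
alerts = [
    {"annotations": {"summary": "High CPU usage detected"}},
    {"annotations": {"summary": "Service is down"}},
]

print(render_range(alerts, "summary"))
print(render_range(alerts, "summary", separator="; "))
```

If grouped notifications are expected, putting a separator or line break inside the `range` body of the template keeps the individual summaries readable.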
Docker Compose settings
version: '3.8'
services:
  grafana:
    image: grafana/grafana:11.3.0
    container_name: grafana
    restart: unless-stopped
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      # Placeholder password for local testing only
      - GF_SECURITY_ADMIN_PASSWORD=admin123
      - GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-simple-json-datasource,grafana-worldmap-panel
      # allow_loading_unsigned_plugins takes a comma-separated list of plugin IDs, not a boolean;
      # enable it only for an unsigned plugin that is actually installed, for example:
      # - GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS=my-unsigned-plugin-id
      - GF_FEATURE_TOGGLES_ENABLE=publicDashboards
    ports:
      - "3000:3000"
    volumes:
      - grafana-storage:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
      - ./grafana/dashboards:/etc/grafana/dashboards
    networks:
      - monitoring
  prometheus:
    image: prom/prometheus:v2.45.0
    container_name: prometheus
    restart: unless-stopped
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=15d'
      - '--web.enable-lifecycle'
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/etc/prometheus
      - prometheus-storage:/prometheus
    networks:
      - monitoring
volumes:
  grafana-storage:
  prometheus-storage:
networks:
  monitoring:
    driver: bridge
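The `GF_...` entries in the `environment` list override `grafana.ini` keys using the pattern `GF_<SECTION>_<KEY>`: the section and key names are upper-cased, and any `.` or `-` in them becomes `_`. A small sketch of that mapping follows; the helper name `grafana_env_var` is made up here.

```python
import re


def grafana_env_var(section: str, key: str) -> str:
    """Build the environment variable name that overrides [section] key."""

    def normalize(part: str) -> str:
        # Dots and dashes are not valid in the variable name, so they become underscores
        return re.sub(r"[.\-]", "_", part).upper()

    return f"GF_{normalize(section)}_{normalize(key)}"


print(grafana_env_var("security", "admin_password"))  # prints GF_SECURITY_ADMIN_PASSWORD
print(grafana_env_var("auth.anonymous", "enabled"))   # prints GF_AUTH_ANONYMOUS_ENABLED
```

This makes it straightforward to move a setting between `grafana.ini` and a Compose file without keeping both in sync by hand.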
Custom plugin development example
// module.ts: registers the panel and declares its editor options
// (SimpleOptions is defined in ./types, which is not shown here)
import { PanelPlugin } from '@grafana/data';
import { SimpleOptions } from './types';
import { SimplePanel } from './SimplePanel';

export const plugin = new PanelPlugin<SimpleOptions>(SimplePanel).setPanelOptions((builder) => {
  return builder
    .addTextInput({
      path: 'text',
      name: 'Simple text option',
      description: 'Description of panel option',
      defaultValue: 'Default value of text input option',
    })
    .addBooleanSwitch({
      path: 'showSeriesCount',
      name: 'Show series counter',
      defaultValue: false,
    })
    // A size choice ('sm' by default) needs a radio control, not a color picker
    .addRadio({
      path: 'seriesCountSize',
      name: 'Series counter size',
      defaultValue: 'sm',
      settings: {
        options: [
          { value: 'sm', label: 'Small' },
          { value: 'md', label: 'Medium' },
          { value: 'lg', label: 'Large' },
        ],
      },
    });
});

// SimplePanel.tsx: the React component rendered inside the panel
import React from 'react';
import { PanelProps } from '@grafana/data';
import { SimpleOptions } from './types';

interface Props extends PanelProps<SimpleOptions> {}

export const SimplePanel: React.FC<Props> = ({ options, data, width, height }) => {
  return (
    <div
      style={{
        width,
        height,
        display: 'flex',
        flexDirection: 'column',
        justifyContent: 'center',
        alignItems: 'center',
      }}
    >
      <div>Simple Panel</div>
      <div>{options.text}</div>
      {options.showSeriesCount && <div>Number of series: {data.series.length}</div>}
    </div>
  );
};
Terraform configuration example
# Terraform provider configuration
terraform {
  required_providers {
    grafana = {
      source  = "grafana/grafana"
      version = "~> 1.40.0"
    }
  }
}

# Slack webhook URL, supplied at plan/apply time
variable "slack_webhook_url" {
  type      = string
  sensitive = true
}

provider "grafana" {
  url = "http://localhost:3000"
  # Placeholder basic-auth credentials; a service account token is preferable
  auth = "admin:admin123"
}

# Data source
resource "grafana_data_source" "prometheus" {
  type       = "prometheus"
  name       = "Prometheus"
  url        = "http://prometheus:9090"
  is_default = true
  json_data_encoded = jsonencode({
    httpMethod   = "POST"
    queryTimeout = "60s"
    timeInterval = "15s"
  })
}

# Folder
resource "grafana_folder" "monitoring" {
  title = "Infrastructure Monitoring"
}

# Dashboard
resource "grafana_dashboard" "system_overview" {
  folder      = grafana_folder.monitoring.id
  config_json = file("${path.module}/dashboards/system-overview.json")
}

# Alert contact point (replaces the legacy grafana_notification_channel,
# which targets the legacy alerting API removed in Grafana 11)
resource "grafana_contact_point" "slack" {
  name = "slack-alerts"
  slack {
    url        = var.slack_webhook_url
    recipient  = "#alerts"
    username   = "Grafana"
    icon_emoji = ":exclamation:"
  }
}

# Alert rule
resource "grafana_rule_group" "system_alerts" {
  name             = "system.rules"
  folder_uid       = grafana_folder.monitoring.uid
  interval_seconds = 60
  rule {
    name      = "HighCPUUsage"
    condition = "A"
    for       = "5m"
    data {
      ref_id = "A"
      relative_time_range {
        from = 600
        to   = 0
      }
      datasource_uid = grafana_data_source.prometheus.uid
      model = jsonencode({
        # The threshold is part of the query, so condition "A" fires for any series it returns
        expr    = "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100) > 80"
        instant = true
        refId   = "A"
      })
    }
    annotations = {
      summary     = "High CPU usage detected"
      description = "CPU usage is above 80% for more than 5 minutes on {{ $labels.instance }}"
    }
    labels = {
      severity = "warning"
      service  = "system"
    }
  }
}
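In the rule's `relative_time_range`, `from` and `to` are offsets in seconds before the evaluation time, so `from = 600` and `to = 0` cover the ten minutes leading up to each evaluation. The sketch below turns those offsets into absolute timestamps; `query_window` is a hypothetical helper and the evaluation time is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone


def query_window(evaluated_at: datetime, from_seconds: int, to_seconds: int) -> tuple:
    """Return (start, end) of the query window for one rule evaluation."""
    if from_seconds < to_seconds:
        raise ValueError("'from' must be at least as far back as 'to'")
    start = evaluated_at - timedelta(seconds=from_seconds)
    end = evaluated_at - timedelta(seconds=to_seconds)
    return start, end


# Arbitrary evaluation time used only for the example
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
start, end = query_window(now, from_seconds=600, to_seconds=0)
print(start.isoformat(), end.isoformat())
```

With `interval_seconds = 60`, this window slides forward one minute per evaluation, and the `[5m]` range inside the PromQL expression always fits within it.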