Elasticsearch

Distributed RESTful search and analytics engine. Specialized in full-text search, log analysis, and metrics analysis. Features real-time search and scalability.

Tags: Database Server, Search Engine, Distributed System, NoSQL, Real-time Search, Analytics Engine, Log Analytics, AI/ML Support

Overview

Elasticsearch is a distributed, near-real-time search and analytics engine built on top of Apache Lucene that also serves as a document-oriented NoSQL store. First released in 2010 and developed by Elastic, it has become a widely adopted core of search and analytics platforms worldwide. It provides the functionality required by modern data-driven applications, including millisecond-level search over large datasets, real-time analytics, vector search, and integration points for generative AI. With an intuitive RESTful API and horizontal scaling across nodes, it serves as the foundation for mission-critical systems in organizations of all sizes, from startups to large enterprises.

Details

Recent Elasticsearch releases have evolved significantly as a data analytics foundation for the generative AI era. Current versions integrate support for RAG (Retrieval-Augmented Generation) workflows, high-precision vector search, real-time streaming analytics, and SIEM-style security analytics, extending the platform well beyond a traditional search engine. The managed Elastic Cloud offering provides enterprise-level availability and security while minimizing infrastructure management overhead. Flexible deployment across Docker, Kubernetes, on-premises, and multi-cloud environments supports data volumes from gigabytes to petabytes. Machine learning features for anomaly detection, trend forecasting, and operational optimization add value beyond plain data storage.

Key Features

  • Distributed Architecture: High availability and scalability through automatic sharding and replication
  • Real-time Search: Millisecond-level high-speed full-text search and faceted search
  • Vector Search: Semantic search through integration with AI and machine learning models
  • RESTful API: Intuitive HTTP-based API design
  • Diverse Data Types: Comprehensive support for text, numeric, geo-location, time-series, JSON, and more
  • Elastic Stack Integration: Complete integrated ecosystem with Kibana, Logstash, and Beats
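
Since the Query DSL behind the RESTful API is plain JSON, client code typically composes queries as nested dictionaries. A minimal sketch, using illustrative field names that match the `products` examples later on this page:

```python
# Minimal sketch: composing a Query DSL bool query as a plain dict.
# Field names ("description", "category", "price") are illustrative.

def build_product_query(text, category=None, min_price=None, max_price=None):
    """Build a bool query: full-text match is scored, exact criteria go
    into the non-scoring (and cacheable) filter context."""
    query = {"bool": {"must": [{"match": {"description": text}}], "filter": []}}
    if category:
        query["bool"]["filter"].append({"term": {"category": category}})
    price_range = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price
    if price_range:
        query["bool"]["filter"].append({"range": {"price": price_range}})
    return query

q = build_product_query("laptop", category="laptop", min_price=1000)
```

The resulting dict can be sent as the `query` element of a `_search` request body, exactly like the hand-written JSON in the search examples below.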

Pros and Cons

Pros

  • Extremely fast search and analytics performance for large-scale data
  • Flexible data modeling and rapid development through schema-less design
  • Virtually unlimited scalability through horizontal scaling
  • Advanced analytics capabilities through rich Query DSL and aggregation functions
  • Comprehensive data pipeline construction through Elastic Stack ecosystem
  • Active open-source community and enterprise support

Cons

  • Learning cost for operation and configuration due to distributed system complexity
  • No multi-document ACID transactions; writes are atomic only at the single-document level
  • High memory usage and storage costs for large amounts of data
  • Performance issues and data loss risk from cluster design mistakes
  • High commercial licensing costs for advanced features (Elastic License restrictions)
  • Eventual consistency: reads can briefly lag writes due to refresh intervals and replication delay

Code Examples

Installation and Basic Setup

# Environment setup using Docker Compose
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.15.0
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms2g -Xmx2g
      - xpack.security.enabled=false  # For development environment
      - cluster.name=docker-cluster
      - node.name=es01
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - es_data:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"
      - "9300:9300"
    restart: unless-stopped

  kibana:
    image: docker.elastic.co/kibana/kibana:8.15.0
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      - SERVER_NAME=kibana
      - SERVER_HOST=0.0.0.0
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
    restart: unless-stopped

volumes:
  es_data:
    driver: local
EOF

# Start services
docker-compose up -d

# Verify operation
curl http://localhost:9200
curl http://localhost:9200/_cluster/health

# Native installation on Linux environment
# Add Elastic official repository
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

# Install Elasticsearch
sudo apt update
sudo apt install elasticsearch

# Edit configuration file
sudo nano /etc/elasticsearch/elasticsearch.yml

# Basic configuration example (writing to /etc requires root privileges)
sudo tee /etc/elasticsearch/elasticsearch.yml > /dev/null << 'EOF'
cluster.name: my-cluster
node.name: node-1
network.host: 0.0.0.0
http.port: 9200
discovery.type: single-node
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
EOF

# Start service
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service
sudo systemctl status elasticsearch.service

Basic Index Operations and Data Management

# Create index with mapping definition
curl -X PUT "localhost:9200/products" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "standard"
      },
      "description": {
        "type": "text",
        "analyzer": "english"
      },
      "price": {
        "type": "double"
      },
      "category": {
        "type": "keyword"
      },
      "tags": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "location": {
        "type": "geo_point"
      },
      "stock_count": {
        "type": "integer"
      },
      "is_active": {
        "type": "boolean"
      }
    }
  },
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "index": {
      "refresh_interval": "1s"
    }
  }
}'

# Insert single document
curl -X POST "localhost:9200/products/_doc/1" -H 'Content-Type: application/json' -d'
{
  "name": "MacBook Pro 16-inch",
  "description": "High-performance laptop with Apple M3 Max",
  "price": 3980,
  "category": "laptop",
  "tags": ["apple", "m3", "professional"],
  "created_at": "2025-01-15 10:30:00",
  "location": {
    "lat": 37.7749,
    "lon": -122.4194
  },
  "stock_count": 15,
  "is_active": true
}'

# Bulk insert
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "products", "_id" : "2" } }
{ "name": "ThinkPad X1 Carbon", "description": "Lightweight business laptop", "price": 2890, "category": "laptop", "tags": ["lenovo", "business"], "created_at": "2025-01-15 11:00:00", "stock_count": 8, "is_active": true }
{ "index" : { "_index" : "products", "_id" : "3" } }
{ "name": "iPhone 15 Pro", "description": "Latest iPhone model", "price": 1598, "category": "smartphone", "tags": ["apple", "5g"], "created_at": "2025-01-15 11:15:00", "stock_count": 25, "is_active": true }
{ "index" : { "_index" : "products", "_id" : "4" } }
{ "name": "Surface Pro 9", "description": "2-in-1 tablet PC", "price": 1898, "category": "tablet", "tags": ["microsoft", "2in1"], "created_at": "2025-01-15 11:30:00", "stock_count": 12, "is_active": true }
'
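
The `_bulk` body above is newline-delimited JSON (NDJSON): each document source line is preceded by an action line, and the payload must end with a trailing newline. A small helper sketch for generating it programmatically:

```python
import json

def build_bulk_body(index_name, docs):
    """Serialize {doc_id: document} pairs into the _bulk NDJSON format:
    one action line per document, followed by the document source,
    terminated by a trailing newline (required by the bulk API)."""
    lines = []
    for doc_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

body = build_bulk_body("products", {"5": {"name": "Pixel 9", "price": 999}})
```

The returned string can be POSTed to `_bulk` with `Content-Type: application/x-ndjson`; in practice the official clients' bulk helpers do this for you.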

# Get document
curl -X GET "localhost:9200/products/_doc/1"

# Update document
curl -X POST "localhost:9200/products/_update/1" -H 'Content-Type: application/json' -d'
{
  "doc": {
    "price": 3780,
    "stock_count": 12
  }
}'

# Check index information
curl -X GET "localhost:9200/products"
curl -X GET "localhost:9200/products/_mapping"
curl -X GET "localhost:9200/products/_settings"

# Index statistics
curl -X GET "localhost:9200/products/_stats"

Advanced Search Queries and Aggregations

# Basic full-text search
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "description": "laptop"
    }
  }
}'

# Complex search query (bool query)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "range": { "price": { "gte": 1000, "lte": 3000 } } }
      ],
      "should": [
        { "match": { "name": "MacBook" } },
        { "match": { "name": "ThinkPad" } }
      ],
      "filter": [
        { "term": { "category": "laptop" } },
        { "term": { "is_active": true } }
      ],
      "must_not": [
        { "range": { "stock_count": { "lte": 5 } } }
      ]
    }
  },
  "sort": [
    { "price": { "order": "desc" } },
    { "created_at": { "order": "desc" } }
  ],
  "size": 10,
  "from": 0
}'

# Fuzzy search and wildcard search
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "should": [
        {
          "fuzzy": {
            "name": {
              "value": "MacBok",
              "fuzziness": "AUTO"
            }
          }
        },
        {
          "wildcard": {
            "name": "*Book*"
          }
        }
      ]
    }
  }
}'

# Aggregation analysis
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "max_price": {
          "max": {
            "field": "price"
          }
        },
        "total_stock": {
          "sum": {
            "field": "stock_count"
          }
        }
      }
    },
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 1000 },
          { "from": 1000, "to": 2000 },
          { "from": 2000, "to": 3000 },
          { "from": 3000 }
        ]
      }
    },
    "daily_sales": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd"
      }
    }
  }
}'

# Geo distance search
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "geo_distance": {
      "distance": "10km",
      "location": {
        "lat": 37.7749,
        "lon": -122.4194
      }
    }
  }
}'
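
Vector (kNN) search, mentioned under Key Features, requires a `dense_vector` field in the mapping and a top-level `knn` section in the search body (Elasticsearch 8.x style). A hedged sketch of the request bodies as Python dicts; the `embedding` field name and 384-dimension size are assumptions for illustration, not part of the `products` mapping above:

```python
# Sketch of vector-search request bodies. The "embedding" field name and
# dims=384 are illustrative assumptions (dims must match your model).

EMBEDDING_DIMS = 384

# Mapping fragment: a dense_vector field indexed for approximate kNN.
vector_mapping = {
    "properties": {
        "name": {"type": "text"},
        "embedding": {
            "type": "dense_vector",
            "dims": EMBEDDING_DIMS,
            "index": True,
            "similarity": "cosine",
        },
    }
}

def build_knn_search(query_vector, k=5, num_candidates=50):
    """Search body with a top-level knn clause: retrieve the k nearest
    documents, scanning num_candidates per shard for better recall."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        "_source": ["name"],
    }

search_body = build_knn_search([0.0] * EMBEDDING_DIMS)
```

In a real deployment the query vector comes from the same embedding model used at indexing time; the dicts above would be sent as JSON bodies to the index-creation and `_search` endpoints, just like the curl examples in this section.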

Index Templates and Lifecycle Management

# Create index template
curl -X PUT "localhost:9200/_index_template/logs_template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs_policy",
      "index.lifecycle.rollover_alias": "logs-alias"
    },
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "level": {
          "type": "keyword"
        },
        "message": {
          "type": "text",
          "analyzer": "standard"
        },
        "service": {
          "type": "keyword"
        },
        "host": {
          "type": "keyword"
        },
        "request_id": {
          "type": "keyword"
        }
      }
    }
  },
  "priority": 100
}'

# Create ILM (Index Lifecycle Management) policy
curl -X PUT "localhost:9200/_ilm/policy/logs_policy" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "10GB",
            "max_age": "7d"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "set_priority": {
            "priority": 50
          },
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'

# Create alias
curl -X PUT "localhost:9200/logs-000001" -H 'Content-Type: application/json' -d'
{
  "aliases": {
    "logs-alias": {
      "is_write_index": true
    }
  }
}'

# Snapshot configuration
curl -X PUT "localhost:9200/_snapshot/backup_repository" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/usr/share/elasticsearch/backup",
    "compress": true
  }
}'

# Create snapshot
curl -X PUT "localhost:9200/_snapshot/backup_repository/snapshot_1" -H 'Content-Type: application/json' -d'
{
  "indices": "products,logs-*",
  "ignore_unavailable": true,
  "include_global_state": false
}'

Performance Optimization and Monitoring

# Check cluster health status
curl -X GET "localhost:9200/_cluster/health?pretty"
curl -X GET "localhost:9200/_cluster/stats?pretty"
curl -X GET "localhost:9200/_nodes/stats?pretty"

# Index optimization
curl -X POST "localhost:9200/products/_forcemerge?max_num_segments=1"

# Clear cache
curl -X POST "localhost:9200/_cache/clear"
curl -X POST "localhost:9200/products/_cache/clear?field_data=true&query=true"

# Dynamic update of index settings
curl -X PUT "localhost:9200/products/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "refresh_interval": "30s",
    "number_of_replicas": 2,
    "max_result_window": 50000
  }
}'

# Slow query log configuration (these are index-level settings,
# so they are applied per index, not via _cluster/settings)
curl -X PUT "localhost:9200/products/_settings" -H 'Content-Type: application/json' -d'
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.query.debug": "2s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}'

# Query execution with profiling
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "profile": true,
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "laptop" } }
      ],
      "filter": [
        { "range": { "price": { "gte": 2000 } } }
      ]
    }
  }
}'

# Hot threads analysis
curl -X GET "localhost:9200/_nodes/hot_threads"

# Task management
curl -X GET "localhost:9200/_tasks?detailed=true&actions=*search*"

# Memory usage check
curl -X GET "localhost:9200/_nodes/stats/jvm?pretty"

Security and Access Control

# Enable security settings (elasticsearch.yml; appending requires root)
sudo tee -a /etc/elasticsearch/elasticsearch.yml << 'EOF'
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.enabled: true
EOF

# Set built-in user passwords
/usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive

# Create role
curl -X POST "localhost:9200/_security/role/products_read_only" -H 'Content-Type: application/json' -u elastic:password -d'
{
  "cluster": [],
  "indices": [
    {
      "names": [ "products" ],
      "privileges": [ "read" ],
      "field_security": {
        "grant": [ "name", "description", "price", "category" ]
      }
    }
  ]
}'

# Create user
curl -X POST "localhost:9200/_security/user/product_viewer" -H 'Content-Type: application/json' -u elastic:password -d'
{
  "password" : "product123",
  "roles" : [ "products_read_only" ],
  "full_name" : "Product Viewer",
  "email" : "[email protected]"
}'

# Create API key
curl -X POST "localhost:9200/_security/api_key" -H 'Content-Type: application/json' -u elastic:password -d'
{
  "name": "products_api_key",
  "expiration": "1d",
  "role_descriptors": {
    "products_access": {
      "cluster": ["monitor"],
      "index": [
        {
          "names": ["products"],
          "privileges": ["read", "write"]
        }
      ]
    }
  }
}'

# IP allowlist configuration
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -u elastic:password -d'
{
  "persistent": {
    "xpack.security.http.filter.allow": "192.168.1.0/24,10.0.0.0/8",
    "xpack.security.http.filter.deny": "_all"
  }
}'

Application Integration Examples

# Python Elasticsearch client
from elasticsearch import Elasticsearch
from datetime import datetime
import json

class ElasticsearchManager:
    def __init__(self, hosts=['http://localhost:9200'], **kwargs):
        """Initialize Elasticsearch client (8.x client: hosts must include a scheme)"""
        self.es = Elasticsearch(
            hosts=hosts,
            # Authentication settings
            basic_auth=('elastic', 'password'),  # Basic authentication
            # api_key=('id', 'api_key'),         # API key authentication
            
            # SSL settings
            verify_certs=False,
            ssl_show_warn=False,
            
            # Connection settings (8.x client renamed timeout to request_timeout)
            request_timeout=30,
            max_retries=3,
            retry_on_timeout=True,
            **kwargs
        )
    
    def check_connection(self):
        """Check connection"""
        try:
            info = self.es.info()
            print(f"Elasticsearch connection successful: {info['version']['number']}")
            return True
        except Exception as e:
            print(f"Connection error: {e}")
            return False
    
    def create_index(self, index_name, mapping=None, settings=None):
        """Create index"""
        body = {}
        if mapping:
            body['mappings'] = mapping
        if settings:
            body['settings'] = settings
        
        try:
            if not self.es.indices.exists(index=index_name):
                self.es.indices.create(index=index_name, body=body)
                print(f"Created index '{index_name}'")
            else:
                print(f"Index '{index_name}' already exists")
        except Exception as e:
            print(f"Index creation error: {e}")
    
    def index_document(self, index_name, doc_id, document):
        """Insert document"""
        try:
            result = self.es.index(
                index=index_name,
                id=doc_id,
                body=document
            )
            return result
        except Exception as e:
            print(f"Document insertion error: {e}")
            return None
    
    def bulk_index(self, index_name, documents):
        """Bulk insert"""
        from elasticsearch.helpers import bulk
        
        actions = []
        for doc_id, document in documents.items():
            action = {
                '_index': index_name,
                '_id': doc_id,
                '_source': document
            }
            actions.append(action)
        
        try:
            result = bulk(self.es, actions)
            print(f"Bulk insert completed: {result[0]} successful")
            return result
        except Exception as e:
            print(f"Bulk insert error: {e}")
            return None
    
    def search(self, index_name, query, size=10, from_=0, sort=None):
        """Execute search"""
        body = {
            'query': query,
            'size': size,
            'from': from_
        }
        
        if sort:
            body['sort'] = sort
        
        try:
            result = self.es.search(index=index_name, body=body)
            return result
        except Exception as e:
            print(f"Search error: {e}")
            return None
    
    def aggregate(self, index_name, aggregations, query=None):
        """Execute aggregation"""
        body = {
            'size': 0,
            'aggs': aggregations
        }
        
        if query:
            body['query'] = query
        
        try:
            result = self.es.search(index=index_name, body=body)
            return result['aggregations']
        except Exception as e:
            print(f"Aggregation error: {e}")
            return None

# Implementation example
if __name__ == "__main__":
    # Initialize Elasticsearch manager
    es_manager = ElasticsearchManager()
    
    # Check connection
    if not es_manager.check_connection():
        exit(1)
    
    # Create index
    product_mapping = {
        'properties': {
            'name': {'type': 'text', 'analyzer': 'standard'},
            'description': {'type': 'text', 'analyzer': 'english'},
            'price': {'type': 'double'},
            'category': {'type': 'keyword'},
            'tags': {'type': 'keyword'},
            'created_at': {'type': 'date'},
            'stock_count': {'type': 'integer'},
            'is_active': {'type': 'boolean'}
        }
    }
    
    es_manager.create_index('products', mapping=product_mapping)
    
    # Insert sample data
    sample_products = {
        '1': {
            'name': 'MacBook Pro 16-inch',
            'description': 'High-performance laptop with Apple M3 Max',
            'price': 3980,
            'category': 'laptop',
            'tags': ['apple', 'm3', 'professional'],
            'created_at': datetime.now().isoformat(),
            'stock_count': 15,
            'is_active': True
        },
        '2': {
            'name': 'ThinkPad X1 Carbon',
            'description': 'Lightweight business laptop',
            'price': 2890,
            'category': 'laptop',
            'tags': ['lenovo', 'business'],
            'created_at': datetime.now().isoformat(),
            'stock_count': 8,
            'is_active': True
        }
    }
    
    es_manager.bulk_index('products', sample_products)
    
    # Execute search
    search_query = {
        'bool': {
            'must': [
                {'match': {'description': 'laptop'}}
            ],
            'filter': [
                {'range': {'price': {'gte': 2000}}}
            ]
        }
    }
    
    results = es_manager.search('products', search_query)
    print(f"Search results: {results['hits']['total']['value']} items")
    
    # Execute aggregation
    aggs = {
        'categories': {
            'terms': {
                'field': 'category',
                'size': 10
            },
            'aggs': {
                'avg_price': {
                    'avg': {'field': 'price'}
                }
            }
        }
    }
    
    agg_results = es_manager.aggregate('products', aggs)
    print(f"Aggregation results: {json.dumps(agg_results, indent=2)}")