Elasticsearch

A distributed, RESTful search and analytics engine built on Apache Lucene. It provides near real-time search, log analysis, and business analytics. In 2024 it returned to open source by adding AGPLv3 as a license option.

Search Engine, Distributed, Full-text Search, Real-time, Analytics, Scalable, RESTful, NoSQL

Overview

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It provides full-text search, structured search, and analytics over large volumes of data in near real time. Its RESTful API and horizontally scalable architecture are designed for high availability and performance.

Details

Elasticsearch became available under the AGPLv3 license in 2024, which made it open source again after several years under source-available licenses. It offers near real-time search (by default, newly indexed documents become searchable within about a second), a query DSL for complex queries, and parallel query execution across shards for high throughput. It also includes aggregations, cluster management, and machine learning features for anomaly detection, and it is used as a foundation for search and analytics applications across many industries.

Key Features

  • Distributed & Scalable: Automatic sharding and replication with dynamic node management
  • High-Speed Search & Analytics: Lucene-based indexing with near real-time search (documents become searchable after the next index refresh)
  • Rich Functionality: Full-text search, fuzzy search, geospatial data search, and time-series data processing
  • RESTful API: JSON-based API with comprehensive query DSL
  • Machine Learning: Anomaly detection and data analysis capabilities (part of the paid subscription tiers rather than the free feature set)
  • Cluster Management: Automatic load balancing and failover mechanisms
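
To make the sharding feature above more concrete, here is a minimal sketch of how a document could be routed to a primary shard: a hash of the routing key (the document ID by default) is taken modulo the number of primary shards. The function name `pick_shard` is hypothetical, and CRC32 is only a stand-in, since Elasticsearch itself uses a Murmur3 hash and a slightly more involved formula.

```python
import zlib


def pick_shard(routing_key: str, number_of_primary_shards: int) -> int:
    """Map a routing key (the document _id by default) to a primary shard number."""
    # Stand-in hash for illustration: Elasticsearch uses Murmur3, not CRC32.
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


if __name__ == "__main__":
    # Each ID always lands on the same shard as long as the shard count is fixed.
    for doc_id in ["1", "2", "3", "order-42"]:
        print(doc_id, "-> shard", pick_shard(doc_id, 3))
```

Because the result depends on the shard count, this also suggests why the number of primary shards is fixed when an index is created and cannot simply be edited afterwards.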

Pros and Cons

Pros

  • Distributed architecture with horizontal scaling and high availability through replication
  • Near real-time search: newly indexed or updated documents become searchable after the next refresh
  • Comprehensive search features including complex queries and aggregations
  • Strong ecosystem with Logstash, Kibana, and Beats integration
  • Machine learning capabilities for advanced analytics and anomaly detection
  • Active community support with extensive documentation and resources
  • Return to open source licensing improving adoption flexibility

Cons

  • High resource consumption requiring careful memory and storage management
  • Complex configuration and tuning requirements for optimal performance
  • Cluster management complexity increases with scale and distributed deployments
  • Learning curve for query DSL and advanced configuration options
  • Version compatibility challenges and upgrade complexity in production environments
  • Potential vendor lock-in concerns despite open source licensing
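
The storage side of the resource cost listed above can be estimated with simple arithmetic: every replica is a full copy of its primary shard. The sketch below assumes replicas are the same size as primaries and ignores merge and translog overhead; the function names and the 100 GB figure are hypothetical.

```python
def shard_copies(primaries: int, replicas: int) -> int:
    """Total number of shard copies (primaries plus all their replicas)."""
    return primaries * (1 + replicas)


def total_storage_gb(primary_data_gb: float, replicas: int) -> float:
    """Approximate disk usage when each replica duplicates the primary data."""
    return primary_data_gb * (1 + replicas)


if __name__ == "__main__":
    # 3 primary shards with 2 replicas each
    print(shard_copies(3, 2))           # 9
    # 100 GB of primary data (hypothetical figure) with 2 replicas
    print(total_storage_gb(100.0, 2))   # 300.0
```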

Code Examples

Setup and Installation

# Ubuntu/Debian installation
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg

echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

sudo apt update
sudo apt install elasticsearch

# Enable and start service
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

# CentOS/RHEL installation
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

cat <<EOF | sudo tee /etc/yum.repos.d/elasticsearch.repo
[elasticsearch]
name=Elasticsearch repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md
EOF

sudo yum install --enablerepo=elasticsearch elasticsearch

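# Docker deployment (security is disabled here, so use this for local development only)
# The user-defined network must exist before --net elastic can be used
docker network create elastic
docker run --name es01 \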
  --net elastic \
  -p 9200:9200 \
  -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -it docker.elastic.co/elasticsearch/elasticsearch:8.12.0

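# Docker with persistent data (alternative to the command above; it reuses the name es01, so remove the first container before running it)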
docker run --name es01 \
  --net elastic \
  -p 9200:9200 \
  -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -v es_data:/usr/share/elasticsearch/data \
  -it docker.elastic.co/elasticsearch/elasticsearch:8.12.0
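
After any of the installation paths above, the node needs a few seconds before it accepts requests. The following is a minimal sketch of a readiness check that polls the cluster health endpoint. It assumes an unsecured node at `http://localhost:9200`, as in the Docker example with security disabled; a package install with security enabled would need HTTPS and credentials instead. The helper names `fetch_health` and `wait_until_ready` are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_health(base_url="http://localhost:9200"):
    """Return the parsed /_cluster/health response, or None if the node is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=5) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError):
        return None


def wait_until_ready(fetch=fetch_health, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until the cluster reports yellow or green; return that status, or None on timeout."""
    for _ in range(attempts):
        health = fetch()
        if health and health.get("status") in ("yellow", "green"):
            return health["status"]
        sleep(delay)
    return None
```

Calling `wait_until_ready()` before the first indexing request avoids connection errors while the node is still starting. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running cluster.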

Index Creation and Document Management

# Create index with mapping
curl -X PUT "localhost:9200/my-index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard"
      },
      "content": {
        "type": "text",
        "analyzer": "standard"
      },
      "timestamp": {
        "type": "date"
      },
      "location": {
        "type": "geo_point"
      },
      "tags": {
        "type": "keyword"
      },
      "price": {
        "type": "double"
      }
    }
  }
}'

# Add single document
curl -X POST "localhost:9200/my-index/_doc/1" -H 'Content-Type: application/json' -d'
{
  "title": "Introduction to Elasticsearch",
  "content": "Elasticsearch is a powerful search engine",
  "timestamp": "2024-01-15T10:00:00",
  "location": {
    "lat": 40.7589,
    "lon": -73.9851
  },
  "tags": ["search", "elasticsearch", "tutorial"],
  "price": 29.99
}'

# Bulk operations
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/json' -d'
{"index":{"_index":"my-index","_id":"1"}}
{"title":"Article 1","content":"This is the first article","timestamp":"2024-01-15T10:00:00","tags":["tech","search"]}
{"index":{"_index":"my-index","_id":"2"}}
{"title":"Article 2","content":"This is the second article","timestamp":"2024-01-15T11:00:00","tags":["tutorial","guide"]}
{"index":{"_index":"my-index","_id":"3"}}
{"title":"Article 3","content":"This is the third article","timestamp":"2024-01-15T12:00:00","tags":["advanced","tips"]}
'

# Update document
curl -X POST "localhost:9200/my-index/_update/1" -H 'Content-Type: application/json' -d'
{
  "doc": {
    "content": "Elasticsearch is a very powerful search engine"
  }
}'

# Get document
curl -X GET "localhost:9200/my-index/_doc/1"

# Delete document (run this last; a GET after the delete returns 404)
curl -X DELETE "localhost:9200/my-index/_doc/1"
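
The bulk request shown earlier in this section uses newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end of the body. As a sketch of how such a body could be assembled programmatically, the hypothetical helper `build_bulk_body` below produces the same layout using only the standard library.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an NDJSON body for the _bulk API from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: index this document under the given ID
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself
        lines.append(json.dumps(source))
    # Every line, including the last one, must be terminated by a newline
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    body = build_bulk_body("my-index", [
        ("1", {"title": "Article 1", "tags": ["tech", "search"]}),
        ("2", {"title": "Article 2", "tags": ["tutorial", "guide"]}),
    ])
    print(body, end="")
```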

Search Query Implementation

# Basic search queries
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  }
}'

# Text search
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "content": "Elasticsearch"
    }
  }
}'

# Phrase search
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_phrase": {
      "content": "powerful search engine"
    }
  }
}'

# Boolean search
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "Elasticsearch"}}
      ],
      "filter": [
        {"range": {"timestamp": {"gte": "2024-01-01"}}}
      ],
      "must_not": [
        {"match": {"content": "deprecated"}}
      ],
      "should": [
        {"match": {"tags": "tutorial"}}
      ]
    }
  }
}'

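# Fuzzy search (term-level query: the value is not analyzed, so use the lowercase form the standard analyzer stores)
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "elasticsarch",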
        "fuzziness": "AUTO"
      }
    }
  }
}'

# Wildcard search (a leading wildcard has to scan many terms and can be slow on large indices)
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "wildcard": {
      "title": "*search*"
    }
  }
}'

# Range queries
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "price": {
        "gte": 10,
        "lte": 50
      }
    }
  }
}'
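
The fuzzy query above uses `"fuzziness": "AUTO"`, which picks the allowed number of edits from the term length: none for terms of 1 or 2 characters, one for 3 to 5 characters, and two for longer terms. The sketch below illustrates that rule with a plain Levenshtein distance. It is a simplification: by default Elasticsearch also counts a transposition of two adjacent characters as a single edit, and options such as `prefix_length` are not modeled. All function names are hypothetical.

```python
def auto_fuzziness(term: str) -> int:
    """Edits allowed by fuzziness AUTO with its default length thresholds (3 and 6)."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_matches(query_term: str, indexed_term: str) -> bool:
    """True if the indexed term is within the AUTO edit budget of the query term."""
    return edit_distance(query_term, indexed_term) <= auto_fuzziness(query_term)


if __name__ == "__main__":
    print(fuzzy_matches("elasticsarch", "elasticsearch"))  # True: one missing letter
    print(fuzzy_matches("at", "cat"))                      # False: short terms must match exactly
```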

Performance Optimization

# Index settings optimization
curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "number_of_replicas": 1,
    "refresh_interval": "30s",
    "max_result_window": 50000
  }
}'

# Force merge optimization (intended for indices that no longer receive writes)
curl -X POST "localhost:9200/my-index/_forcemerge?max_num_segments=1"

# Clear cache
curl -X POST "localhost:9200/my-index/_cache/clear"

# Index statistics
curl -X GET "localhost:9200/my-index/_stats"

# Node statistics
curl -X GET "localhost:9200/_nodes/stats"

# Cluster health
curl -X GET "localhost:9200/_cluster/health?level=indices&pretty"
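
The replica setting and the cluster health call above are related: copies of the same shard are never placed on the same node, so a cluster with too few data nodes cannot assign all replicas and reports yellow instead of green. The sketch below captures that rule under simplifying assumptions (all nodes available, all primaries assigned, no allocation filtering or disk watermark limits); `predict_health` is a hypothetical helper, not an Elasticsearch API.

```python
def predict_health(data_nodes: int, replicas: int) -> str:
    """Predict the health status of an index for a given node and replica count."""
    if data_nodes < 1:
        # No node can hold the primaries
        return "red"
    # A primary and its replicas must all live on different nodes
    assignable_replicas = min(replicas, data_nodes - 1)
    return "green" if assignable_replicas == replicas else "yellow"


if __name__ == "__main__":
    print(predict_health(data_nodes=1, replicas=1))  # yellow
    print(predict_health(data_nodes=2, replicas=1))  # green
    print(predict_health(data_nodes=1, replicas=0))  # green
```

On a single-node development setup, setting `number_of_replicas` to 0 is therefore one way to get a green status.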

Integration and Framework Connectivity

# Python Elasticsearch client
from elasticsearch import Elasticsearch

# Connect to Elasticsearch (the 8.x client expects a full URL including the scheme)
es = Elasticsearch("http://localhost:9200")

# Index a document
doc = {
    'title': 'Python Integration Example',
    'content': 'Using Python client for Elasticsearch',
    'timestamp': '2024-01-15T10:00:00',
    'tags': ['python', 'elasticsearch', 'integration']
}

# refresh='wait_for' blocks until the document is visible to the search below
es.index(index='my-index', id=1, document=doc, refresh='wait_for')

# Search documents
query = {
    'query': {
        'match': {
            'content': 'python'
        }
    }
}

# The 8.x client takes top-level request keys as keyword arguments; body= is deprecated
response = es.search(index='my-index', **query)
for hit in response['hits']['hits']:
    print(f"ID: {hit['_id']}, Score: {hit['_score']}, Title: {hit['_source']['title']}")

# Aggregation example
agg_query = {
    'size': 0,
    'aggs': {
        'tags_count': {
            'terms': {
                'field': 'tags'
            }
        },
        'avg_price': {
            'avg': {
                'field': 'price'
            }
        }
    }
}

# Pass size and aggs as keyword arguments instead of the deprecated body=
response = es.search(index='my-index', **agg_query)
print("Tag counts:", response['aggregations']['tags_count']['buckets'])
print("Average price:", response['aggregations']['avg_price']['value'])

// Node.js Elasticsearch client
const { Client } = require('@elastic/elasticsearch');

const client = new Client({ node: 'http://localhost:9200' });

async function indexDocument() {
  const doc = {
    title: 'Node.js Integration Example',
    content: 'Using Node.js client for Elasticsearch',
    timestamp: '2024-01-15T10:00:00',
    tags: ['nodejs', 'elasticsearch', 'javascript']
  };

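  // Client 8.x takes the source under `document` and resolves to the response body directly
  const response = await client.index({
    index: 'my-index',
    id: '1',
    document: doc,
    refresh: 'wait_for' // make the document visible to the search that follows
  });

  console.log('Document indexed:', response.result);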
}

async function searchDocuments() {
  const response = await client.search({
    index: 'my-index',
    query: {
      match: {
        // The standard analyzer keeps "Node.js" as the single token "node.js"
        content: 'Node.js'
      }
    },
    highlight: {
      fields: {
        content: {}
      }
    }
  });

  console.log('Search results:');
  response.hits.hits.forEach(hit => {
    console.log(`ID: ${hit._id}, Title: ${hit._source.title}`);
    if (hit.highlight) {
      console.log('Highlighted:', hit.highlight.content);
    }
  });
}

// Execute functions
indexDocument().then(() => searchDocuments()).catch(console.error);

// Java Elasticsearch client
import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch.core.*;
import co.elastic.clients.elasticsearch.core.search.Hit;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

public class ElasticsearchExample {
    public static void main(String[] args) throws Exception {
        // Create REST client
        RestClient restClient = RestClient.builder(
            new HttpHost("localhost", 9200)).build();

        // Create transport with Jackson mapper
        ElasticsearchTransport transport = new RestClientTransport(
            restClient, new JacksonJsonpMapper());

        // Create API client
        ElasticsearchClient client = new ElasticsearchClient(transport);

        // Index a document
        Product product = new Product("java-example", "Java Integration", 
            "Using Java client for Elasticsearch", 39.99);

        IndexResponse response = client.index(IndexRequest.of(i -> i
            .index("my-index")
            .id(product.getId())
            .document(product)
        ));

        System.out.println("Indexed document: " + response.result());

        // Search documents
        SearchResponse<Product> search = client.search(SearchRequest.of(s -> s
            .index("my-index")
            .query(q -> q
                .match(t -> t
                    .field("content")
                    .query("java")
                )
            )
        ), Product.class);

        for (Hit<Product> hit : search.hits().hits()) {
            Product p = hit.source();
            System.out.println("Found: " + p.getTitle() + " (Score: " + hit.score() + ")");
        }

        // Close resources
        transport.close();
        restClient.close();
    }
}

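// Minimal version of the elided boilerplate: Jackson needs a no-arg constructor and getters.
// Fields in the index that this class does not declare (tags, timestamp, ...) are ignored.
@com.fasterxml.jackson.annotation.JsonIgnoreProperties(ignoreUnknown = true)
class Product {
    private String id;
    private String title;
    private String content;
    private double price;

    public Product() {}

    public Product(String id, String title, String content, double price) {
        this.id = id;
        this.title = title;
        this.content = content;
        this.price = price;
    }

    public String getId() { return id; }
    public String getTitle() { return title; }
    public String getContent() { return content; }
    public double getPrice() { return price; }
}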

Advanced Features and Cluster Management

# Create index template
curl -X PUT "localhost:9200/_index_template/logs_template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs_policy"
    },
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "level": {
          "type": "keyword"
        },
        "message": {
          "type": "text"
        },
        "service": {
          "type": "keyword"
        }
      }
    }
  }
}'

# Snapshot and restore
curl -X PUT "localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/my_backup"
  }
}'

curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "indices": "my-index",
  "ignore_unavailable": true,
  "include_global_state": false
}'

# Restore from snapshot
curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "my-index",
  "ignore_unavailable": true,
  "include_global_state": false
}'

# Machine Learning anomaly detection (requires a subscription or trial license that includes machine learning)
curl -X PUT "localhost:9200/_ml/anomaly_detectors/my_job" -H 'Content-Type: application/json' -d'
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      {
        "function": "mean",
        "field_name": "response_time"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}'

# Security configuration with API keys (requires security to be enabled; replace the credentials with your own)
curl -X POST "localhost:9200/_security/api_key" -u elastic:password -H 'Content-Type: application/json' -d'
{
  "name": "my-api-key",
  "role_descriptors": {
    "my_role": {
      "cluster": ["monitor"],
      "indices": [
        {
          "names": ["my-index"],
          "privileges": ["read", "write"]
        }
      ]
    }
  }
}'

# Watcher alerting
curl -X PUT "localhost:9200/_watcher/watch/error_count_watch" -H 'Content-Type: application/json' -d'
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": ["logs-*"],
        "body": {
          "query": {
            "bool": {
              "filter": [
                {"range": {"@timestamp": {"gte": "now-1m"}}},
                {"term": {"level": "ERROR"}}
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 10
      }
    }
  },
  "actions": {
    "send_email": {
      "email": {
        "to": ["[email protected]"],
        "subject": "High error rate detected",
        "body": "Found {{ctx.payload.hits.total}} errors in the last minute"
      }
    }
  }
}'
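
The index template at the start of this section applies to every new index whose name matches one of its `index_patterns`, where `*` stands for any sequence of characters. As a rough sketch of that matching rule, the hypothetical helper below checks index names against a pattern list. It models only the `*` wildcard and ignores template priority, which decides the winner when several templates match.

```python
import re


def template_applies(index_name, index_patterns):
    """Return True if index_name matches any pattern, treating * as 'any characters'."""
    for pattern in index_patterns:
        # Escape the literal parts and join them with ".*" in place of each wildcard
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.fullmatch(regex, index_name):
            return True
    return False


if __name__ == "__main__":
    patterns = ["logs-*"]
    print(template_applies("logs-2024.01.15", patterns))  # True
    print(template_applies("metrics-2024", patterns))     # False
```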

Elasticsearch is a powerful and versatile search and analytics platform that excels in handling large-scale data with real-time search capabilities. Its rich feature set, combined with strong ecosystem integration and machine learning capabilities, makes it an excellent choice for modern search, logging, and analytics applications.