Elasticsearch
Distributed RESTful search and analytics engine built on Apache Lucene. It provides near real-time search, log analysis, and business analytics capabilities, and returned to open source under the AGPLv3 license in 2024.
Overview
Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It provides full-text search, structured search, and analytics capabilities, enabling real-time search and analysis of large volumes of data. With its RESTful API and scalable distributed architecture, it delivers high availability and performance.
Details
Elasticsearch entered a new phase with its 2024 return to open source under the AGPLv3 license. As a mature distributed search platform, it offers near real-time search and indexing, complex query processing, and distributed parallel execution for high throughput. With machine learning integration for anomaly detection, rich aggregation capabilities, and robust cluster management, Elasticsearch serves as a foundation for modern search and analytics applications across many industries.
Key Features
- Distributed & Scalable: Automatic sharding and replication with dynamic node management
- High-Speed Search & Analytics: Lucene-based high-performance indexing with real-time search capabilities
- Rich Functionality: Full-text search, fuzzy search, geospatial search, and time-series data processing (a geospatial query sketch follows this list)
- RESTful API: JSON-based API with comprehensive query DSL
- Machine Learning: Built-in anomaly detection and data analysis capabilities
- Cluster Management: Automatic load balancing and failover mechanisms
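Geospatial search is listed above but not demonstrated in the later query examples; below is a minimal geo_distance query sketch, assuming the location geo_point field from the index mapping defined further down in the code examples.
# Geospatial search: documents within 10 km of a point
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "geo_distance": {
      "distance": "10km",
      "location": {
        "lat": 40.7589,
        "lon": -73.9851
      }
    }
  }
}'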
Pros and Cons
Pros
- Powerful distributed architecture with automatic scaling and high availability
- Real-time search capabilities with near real-time indexing and updates
- Comprehensive search features including complex queries and aggregations
- Strong ecosystem with Logstash, Kibana, and Beats integration
- Machine learning capabilities for advanced analytics and anomaly detection
- Active community support with extensive documentation and resources
- Return to open-source (AGPLv3) licensing, improving adoption flexibility
Cons
- High resource consumption requiring careful memory and storage management
- Complex configuration and tuning requirements for optimal performance
- Cluster management complexity increases with scale and distributed deployments
- Learning curve for query DSL and advanced configuration options
- Version compatibility challenges and upgrade complexity in production environments
- Potential vendor lock-in concerns despite open source licensing
Reference Pages
- Elasticsearch Official Website
- Elasticsearch Documentation
- Elasticsearch GitHub Repository
- Elastic Stack Documentation
Code Examples
Setup and Installation
# Ubuntu/Debian installation
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update
sudo apt install elasticsearch
# Enable and start service
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch
# CentOS/RHEL installation
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
cat <<EOF | sudo tee /etc/yum.repos.d/elasticsearch.repo
[elasticsearch]
name=Elasticsearch repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md
EOF
sudo yum install --enablerepo=elasticsearch elasticsearch
# Docker deployment (create the user-defined network first)
docker network create elastic
docker run --name es01 \
--net elastic \
-p 9200:9200 \
-p 9300:9300 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
-it docker.elastic.co/elasticsearch/elasticsearch:8.12.0
# Docker with persistent data (alternative to the run above; uses a named volume)
docker run --name es01 \
--net elastic \
-p 9200:9200 \
-p 9300:9300 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
-v es_data:/usr/share/elasticsearch/data \
-it docker.elastic.co/elasticsearch/elasticsearch:8.12.0
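After installation or container startup, a quick sanity check (assuming security is disabled, as in the Docker examples above, so plain HTTP works):
# Verify the node responds and check basic health
curl http://localhost:9200
curl "http://localhost:9200/_cat/health?v"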
Index Creation and Document Management
# Create index with mapping
curl -X PUT "localhost:9200/my-index" -H 'Content-Type: application/json' -d'
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "standard"
},
"content": {
"type": "text",
"analyzer": "standard"
},
"timestamp": {
"type": "date"
},
"location": {
"type": "geo_point"
},
"tags": {
"type": "keyword"
},
"price": {
"type": "double"
}
}
}
}'
# Add single document
curl -X POST "localhost:9200/my-index/_doc/1" -H 'Content-Type: application/json' -d'
{
"title": "Introduction to Elasticsearch",
"content": "Elasticsearch is a powerful search engine",
"timestamp": "2024-01-15T10:00:00",
"location": {
"lat": 40.7589,
"lon": -73.9851
},
"tags": ["search", "elasticsearch", "tutorial"],
"price": 29.99
}'
# Bulk operations
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/json' -d'
{"index":{"_index":"my-index","_id":"1"}}
{"title":"Article 1","content":"This is the first article","timestamp":"2024-01-15T10:00:00","tags":["tech","search"]}
{"index":{"_index":"my-index","_id":"2"}}
{"title":"Article 2","content":"This is the second article","timestamp":"2024-01-15T11:00:00","tags":["tutorial","guide"]}
{"index":{"_index":"my-index","_id":"3"}}
{"title":"Article 3","content":"This is the third article","timestamp":"2024-01-15T12:00:00","tags":["advanced","tips"]}
'
# Update document
curl -X POST "localhost:9200/my-index/_update/1" -H 'Content-Type: application/json' -d'
{
"doc": {
"content": "Elasticsearch is a very powerful search engine"
}
}'
# Delete document
curl -X DELETE "localhost:9200/my-index/_doc/1"
# Get document
curl -X GET "localhost:9200/my-index/_doc/1"
Search Query Implementation
# Basic search queries
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
}
}'
# Text search
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"content": "Elasticsearch"
}
}
}'
# Phrase search
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"match_phrase": {
"content": "powerful search engine"
}
}
}'
# Boolean search
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{"match": {"title": "Elasticsearch"}}
],
"filter": [
{"range": {"timestamp": {"gte": "2024-01-01"}}}
],
"must_not": [
{"match": {"content": "deprecated"}}
],
"should": [
{"match": {"tags": "tutorial"}}
]
}
}
}'
# Fuzzy search
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"fuzzy": {
"title": {
"value": "Elasticsarch",
"fuzziness": "AUTO"
}
}
}
}'
# Wildcard search
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"wildcard": {
"title": "*search*"
}
}
}'
# Range queries
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"range": {
"price": {
"gte": 10,
"lte": 50
}
}
}
}'
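Searches return the first ten hits by default; the sketch below adds explicit sorting and shallow pagination with from/size (for deep pagination, search_after with a point in time is the recommended route). Field names follow the mapping example above.
# Sorting and pagination
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
  "from": 0,
  "size": 20,
  "sort": [
    {"timestamp": {"order": "desc"}},
    {"price": {"order": "asc"}}
  ],
  "query": {
    "match": {"content": "search"}
  }
}'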
Performance Optimization
# Index settings optimization
curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
"index": {
"number_of_replicas": 1,
"refresh_interval": "30s",
"max_result_window": 50000
}
}'
# Force merge optimization
curl -X POST "localhost:9200/my-index/_forcemerge?max_num_segments=1"
# Clear cache
curl -X POST "localhost:9200/my-index/_cache/clear"
# Index statistics
curl -X GET "localhost:9200/my-index/_stats"
# Node statistics
curl -X GET "localhost:9200/_nodes/stats"
# Cluster health
curl -X GET "localhost:9200/_cluster/health?level=indices&pretty"
Integration and Framework Connectivity
# Python Elasticsearch client
from elasticsearch import Elasticsearch
# Connect to Elasticsearch
es = Elasticsearch("http://localhost:9200")
# Index a document
doc = {
'title': 'Python Integration Example',
'content': 'Using Python client for Elasticsearch',
'timestamp': '2024-01-15T10:00:00',
'tags': ['python', 'elasticsearch', 'integration']
}
es.index(index='my-index', id=1, document=doc)
# Search documents (8.x client style: query as a keyword argument)
response = es.search(
    index='my-index',
    query={
        'match': {
            'content': 'python'
        }
    }
)
for hit in response['hits']['hits']:
print(f"ID: {hit['_id']}, Score: {hit['_score']}, Title: {hit['_source']['title']}")
# Aggregation example (terms aggregation on the keyword field "tags")
response = es.search(
    index='my-index',
    size=0,
    aggs={
        'tags_count': {
            'terms': {
                'field': 'tags'
            }
        },
        'avg_price': {
            'avg': {
                'field': 'price'
            }
        }
    }
)
print("Tag counts:", response['aggregations']['tags_count']['buckets'])
print("Average price:", response['aggregations']['avg_price']['value'])
// Node.js Elasticsearch client
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });
async function indexDocument() {
const doc = {
title: 'Node.js Integration Example',
content: 'Using Node.js client for Elasticsearch',
timestamp: '2024-01-15T10:00:00',
tags: ['nodejs', 'elasticsearch', 'javascript']
};
  const response = await client.index({
    index: 'my-index',
    id: '1',
    document: doc
  });
  console.log('Document indexed:', response.result);
}
async function searchDocuments() {
  const response = await client.search({
    index: 'my-index',
    query: {
      match: {
        content: 'nodejs'
      }
    },
    highlight: {
      fields: {
        content: {}
      }
    }
  });
  console.log('Search results:');
  response.hits.hits.forEach(hit => {
console.log(`ID: ${hit._id}, Title: ${hit._source.title}`);
if (hit.highlight) {
console.log('Highlighted:', hit.highlight.content);
}
});
}
// Execute functions
indexDocument().then(() => searchDocuments());
// Java Elasticsearch client
import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch.core.*;
import co.elastic.clients.elasticsearch.core.search.Hit;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
public class ElasticsearchExample {
public static void main(String[] args) throws Exception {
// Create REST client
RestClient restClient = RestClient.builder(
new HttpHost("localhost", 9200)).build();
// Create transport with Jackson mapper
ElasticsearchTransport transport = new RestClientTransport(
restClient, new JacksonJsonpMapper());
// Create API client
ElasticsearchClient client = new ElasticsearchClient(transport);
// Index a document
Product product = new Product("java-example", "Java Integration",
"Using Java client for Elasticsearch", 39.99);
IndexResponse response = client.index(IndexRequest.of(i -> i
.index("my-index")
.id(product.getId())
.document(product)
));
System.out.println("Indexed document: " + response.result());
// Search documents
SearchResponse<Product> search = client.search(SearchRequest.of(s -> s
.index("my-index")
.query(q -> q
.match(t -> t
.field("content")
.query("java")
)
)
), Product.class);
for (Hit<Product> hit : search.hits().hits()) {
Product p = hit.source();
System.out.println("Found: " + p.getTitle() + " (Score: " + hit.score() + ")");
}
// Close resources
transport.close();
restClient.close();
}
}
class Product {
private String id;
private String title;
private String content;
private double price;
// Constructors, getters, setters...
}
Advanced Features and Cluster Management
# Create index template
curl -X PUT "localhost:9200/_index_template/logs_template" -H 'Content-Type: application/json' -d'
{
"index_patterns": ["logs-*"],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"index.lifecycle.name": "logs_policy"
},
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
"level": {
"type": "keyword"
},
"message": {
"type": "text"
},
"service": {
"type": "keyword"
}
}
}
}
}'
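The template above references an ILM policy named logs_policy that is not created anywhere on this page; below is a minimal sketch of such a policy, assuming the logs-* indices roll over via a data stream or rollover alias.
# Create the ILM policy referenced by the template above
curl -X PUT "localhost:9200/_ilm/policy/logs_policy" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'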
# Snapshot and restore (the repository location must be allowed via path.repo in elasticsearch.yml)
curl -X PUT "localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d'
{
"type": "fs",
"settings": {
"location": "/mount/backups/my_backup"
}
}'
curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
"indices": "my-index",
"ignore_unavailable": true,
"include_global_state": false
}'
# Restore from snapshot
curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore" -H 'Content-Type: application/json' -d'
{
"indices": "my-index",
"ignore_unavailable": true,
"include_global_state": false
}'
# Machine Learning anomaly detection
curl -X PUT "localhost:9200/_ml/anomaly_detectors/my_job" -H 'Content-Type: application/json' -d'
{
"analysis_config": {
"bucket_span": "10m",
"detectors": [
{
"function": "mean",
"field_name": "response_time"
}
]
},
"data_description": {
"time_field": "@timestamp"
}
}'
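Creating the anomaly detection job does not start any analysis by itself; a sketch of the usual follow-up steps, assuming the data lives in logs-* indices:
# Open the job, attach a datafeed, and start it
curl -X POST "localhost:9200/_ml/anomaly_detectors/my_job/_open"
curl -X PUT "localhost:9200/_ml/datafeeds/datafeed-my_job" -H 'Content-Type: application/json' -d'
{
  "job_id": "my_job",
  "indices": ["logs-*"],
  "query": {"match_all": {}}
}'
curl -X POST "localhost:9200/_ml/datafeeds/datafeed-my_job/_start"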
# Security configuration with API keys (requires security to be enabled, unlike the earlier examples)
curl -X POST "localhost:9200/_security/api_key" -u elastic:password -H 'Content-Type: application/json' -d'
{
"name": "my-api-key",
"role_descriptors": {
"my_role": {
"cluster": ["monitor"],
"indices": [
{
"names": ["my-index"],
"privileges": ["read", "write"]
}
]
}
}
}'
# Watcher alerting
curl -X PUT "localhost:9200/_watcher/watch/error_count_watch" -H 'Content-Type: application/json' -d'
{
"trigger": {
"schedule": {
"interval": "1m"
}
},
"input": {
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": ["logs-*"],
"body": {
"query": {
"bool": {
"filter": [
{"range": {"@timestamp": {"gte": "now-1m"}}},
{"term": {"level": "ERROR"}}
]
}
}
}
}
}
},
"condition": {
"compare": {
"ctx.payload.hits.total": {
"gt": 10
}
}
},
"actions": {
"send_email": {
"email": {
"to": ["[email protected]"],
"subject": "High error rate detected",
"body": "Found {{ctx.payload.hits.total}} errors in the last minute"
}
}
}
}'
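Routine cluster management relies heavily on the _cat and _cluster APIs; a few commonly used read-only checks:
# Common cluster inspection commands
curl -X GET "localhost:9200/_cat/nodes?v"
curl -X GET "localhost:9200/_cat/indices?v"
curl -X GET "localhost:9200/_cat/shards?v"
# Explains the first unassigned shard (returns an error if every shard is assigned)
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"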
Elasticsearch is a powerful and versatile search and analytics platform that excels in handling large-scale data with real-time search capabilities. Its rich feature set, combined with strong ecosystem integration and machine learning capabilities, makes it an excellent choice for modern search, logging, and analytics applications.