Elasticsearch
Overview
Elasticsearch is a distributed, RESTful search and analytics engine. Built on Apache Lucene, it provides real-time full-text search, structured data search, and analytics capabilities. It stores documents in JSON format and provides access through HTTP APIs, making it easy to implement search functionality in diverse applications.
Details
Elasticsearch was first released by Shay Banon in 2010 and is now developed and maintained by Elastic N.V. It scales horizontally: nodes join a cluster and share the data as shards, which lets a cluster process petabyte-scale data. As the core component of the ELK stack (Elasticsearch, Logstash, Kibana), it is widely used for log analysis and metrics monitoring.
Key features of Elasticsearch:
- Distributed architecture
- Real-time search and analytics
- RESTful HTTP API
- Schema-free (JSON)
- Powerful Query DSL
- Faceted search and aggregation capabilities
- Geospatial search
- Machine learning capabilities
- High availability and automatic failover
- Horizontal scaling
Advantages and Disadvantages
Advantages
- Fast search: High-speed full-text search through inverted indexes
- Scalability: Horizontal scaling for handling large amounts of data
- Real-time: Near real-time search and analytics
- Rich API: Easy access through RESTful APIs
- Flexibility: Schema-less with dynamic field addition
- Analytics: Powerful aggregation capabilities
- Ecosystem: Integration with Kibana and Logstash
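The "fast search" point rests on the inverted index, a map from each term to the documents that contain it. The following is only a toy sketch of that idea in shell and awk, not how Lucene stores its postings. It assumes a trivial analyzer (lowercase, split on non-alphanumerics), and the function names `build_index` and `lookup` are made up for illustration.

```shell
#!/bin/sh
# Toy inverted index: maps each lowercased term to the IDs of the documents
# that contain it. Input lines have the form "<id><TAB><text>".
build_index() {
  awk -F '\t' '
    {
      n = split(tolower($2), words, /[^a-z0-9]+/)
      for (i = 1; i <= n; i++) {
        w = words[i]
        if (w == "" || seen[w, $1]++) continue   # skip blanks and repeats
        if (w in postings) postings[w] = postings[w] " " $1
        else postings[w] = $1
      }
    }
    END { for (w in postings) print w "\t" postings[w] }
  ' | sort
}

# Look up one term in an index produced by build_index.
lookup() {
  awk -F '\t' -v term="$1" '$1 == term { print $2 }'
}

docs=$(printf '1\tWireless Earphones\n2\tWireless Mouse\n3\tSmartphone\n')
index=$(printf '%s\n' "$docs" | build_index)
printf '%s\n' "$index" | lookup wireless   # prints: 1 2
```

A query for a term becomes a single lookup in the map rather than a scan over every document, which is why full-text search stays fast as the data grows.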
Disadvantages
- Resource consumption: High memory and disk usage
- Complexity: Complex configuration and tuning
- Consistency: No multi-document ACID transactions; search is near real-time, so a newly indexed document becomes searchable only after the next refresh
- Backup: Difficult to back up large volumes of data
- Licensing: Some features are commercial (Elastic License)
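The near real-time behavior noted in both lists can be made concrete. A document is retrievable by ID as soon as it is indexed, but it shows up in search results only after a refresh, which by default happens roughly once per second. The sketch below assumes a local, unsecured development node; the helper names `index_visible` and `refresh_index` are hypothetical.

```shell
#!/bin/sh
# Assumed base URL of a local, unsecured development node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Index a JSON document and block until a refresh has made it searchable.
# Usage: index_visible <index> <id> <json-body>
index_visible() {
  curl -s -X PUT "$ES_URL/$1/_doc/$2?refresh=wait_for" \
    -H 'Content-Type: application/json' -d "$3"
}

# Force an immediate refresh of one index (costly if called per document).
# Usage: refresh_index <index>
refresh_index() {
  curl -s -X POST "$ES_URL/$1/_refresh"
}

# Example:
# index_visible products 5 '{ "name": "Keyboard", "price": 49.00 }'
```

`refresh=wait_for` is usually the gentler choice in tests and scripts, because it waits for the regular refresh instead of forcing an extra one.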
Code Examples
Installation & Setup
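# Run with Docker (recommended for local development)
# 8.x enables TLS and authentication by default; security is disabled here so the
# plain-HTTP curl examples below work. Never do this in production.
docker run -d --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -e "ES_JAVA_OPTS=-Xms1g -Xmx1g" \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.0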
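# Ubuntu/Debian (apt-key is deprecated, so the signing key goes into a keyring file)
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update && sudo apt install elasticsearch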
# Red Hat/CentOS
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
sudo tee /etc/yum.repos.d/elasticsearch.repo > /dev/null << EOF
[elasticsearch]
name=Elasticsearch repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md
EOF
sudo yum install --enablerepo=elasticsearch elasticsearch
# macOS (Homebrew; the elastic/tap formula provides the 7.x series, not 8.x)
brew tap elastic/tap
brew install elastic/tap/elasticsearch-full
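# Start service (Linux package installs managed by systemd)
sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch
# Verify installation (plain HTTP works only with security disabled; 8.x package installs enable TLS and authentication by default)
curl -X GET "localhost:9200/"
# With the default 8.x package security enabled:
# curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic https://localhost:9200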
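A node takes a little while to start accepting requests, so scripts that run right after installation tend to need a readiness check. The following is a minimal sketch that polls the cluster health endpoint. It assumes an unsecured node at `http://localhost:9200`, and the function name `wait_for_es` is hypothetical.

```shell
#!/bin/sh
# Assumed base URL of a local, unsecured development node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Poll the cluster health API until the node answers, or give up.
# Usage: wait_for_es [max_attempts] [delay_seconds]
wait_for_es() {
  attempts="${1:-30}"
  delay="${2:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP error responses,
    # -s hides the progress meter.
    if curl -sf "$ES_URL/_cluster/health" >/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example:
# wait_for_es 30 2 && echo "Elasticsearch is up"
```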
Basic Operations (CRUD)
# Create index
curl -X PUT "localhost:9200/products" -H 'Content-Type: application/json' -d'
{
"mappings": {
"properties": {
"name": { "type": "text", "analyzer": "standard" },
"description": { "type": "text" },
"price": { "type": "double" },
"category": { "type": "keyword" },
"created_at": { "type": "date" },
"tags": { "type": "keyword" },
"location": { "type": "geo_point" }
}
}
}'
# Create document (Create)
curl -X POST "localhost:9200/products/_doc/1" -H 'Content-Type: application/json' -d'
{
"name": "Wireless Earphones",
"description": "High-quality Bluetooth earphones",
"price": 89.00,
"category": "Electronics",
"created_at": "2024-01-15T10:30:00Z",
"tags": ["audio", "wireless", "bluetooth"],
"location": { "lat": 40.7128, "lon": -74.0060 }
}'
# Auto-generate ID for document
curl -X POST "localhost:9200/products/_doc" -H 'Content-Type: application/json' -d'
{
"name": "Smartphone",
"description": "Latest 5G smartphone",
"price": 890.00,
"category": "Electronics",
"created_at": "2024-01-16T14:20:00Z",
"tags": ["smartphone", "5g", "mobile"]
}'
# Read document (Read)
curl -X GET "localhost:9200/products/_doc/1"
# Update document (Update)
curl -X POST "localhost:9200/products/_update/1" -H 'Content-Type: application/json' -d'
{
"doc": {
"price": 79.00,
"tags": ["audio", "wireless", "bluetooth", "sale"]
}
}'
# Script-based update
curl -X POST "localhost:9200/products/_update/1" -H 'Content-Type: application/json' -d'
{
"script": {
"source": "ctx._source.price = ctx._source.price * 0.9"
}
}'
# Delete document (Delete)
curl -X DELETE "localhost:9200/products/_doc/1"
Search Queries
# Simple search
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"name": "earphones"
}
}
}'
# Complex search (Bool Query)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{ "match": { "name": "smartphone" } }
],
"filter": [
{ "range": { "price": { "gte": 500, "lte": 1000 } } },
{ "term": { "category": "Electronics" } }
],
"must_not": [
{ "term": { "tags": "discontinued" } }
]
}
},
"sort": [
{ "price": { "order": "asc" } },
{ "created_at": { "order": "desc" } }
],
"size": 10,
"from": 0
}'
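The `from` and `size` fields in the bool query implement offset pagination: `from` is the number of hits to skip and `size` is the page length. The sketch below turns a 1-based page number into that pair. The helper name `page_params` is hypothetical, and the 10000 limit is the default value of `index.max_result_window`, which may have been changed on a given index.

```shell
#!/bin/sh
# Translate a 1-based page number into a "from"/"size" pair.
# Usage: page_params <page> <size>   -> prints a JSON fragment
page_params() {
  page="$1"
  size="$2"
  from=$(( (page - 1) * size ))
  # from + size must stay within index.max_result_window (10000 by default).
  if [ $((from + size)) -gt 10000 ]; then
    echo "page too deep for from/size; use search_after instead" >&2
    return 1
  fi
  printf '"from": %d, "size": %d\n' "$from" "$size"
}

page_params 3 10   # prints: "from": 20, "size": 10
```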
# Wildcard search (patterns with a leading wildcard are slow on large indices)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"wildcard": {
"name": "*phone*"
}
}
}'
# Fuzzy search
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"fuzzy": {
"name": {
"value": "smatphone",
"fuzziness": "AUTO"
}
}
}
}'
# Geospatial search
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"geo_distance": {
"distance": "10km",
"location": {
"lat": 40.7128,
"lon": -74.0060
}
}
}
}'
Index Management
# List indices
curl -X GET "localhost:9200/_cat/indices?v"
# Get index information
curl -X GET "localhost:9200/products"
# Delete index
curl -X DELETE "localhost:9200/products"
# Create index template
curl -X PUT "localhost:9200/_index_template/products_template" -H 'Content-Type: application/json' -d'
{
"index_patterns": ["products-*"],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"name": { "type": "text" },
"price": { "type": "double" },
"created_at": { "type": "date" }
}
}
}
}'
# Bulk indexing (newline-delimited JSON; the body must end with a newline)
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "products", "_id" : "2" } }
{ "name" : "Laptop", "price" : 890.00, "category" : "Computer" }
{ "index" : { "_index" : "products", "_id" : "3" } }
{ "name" : "Mouse", "price" : 29.00, "category" : "Accessories" }
{ "update" : { "_index" : "products", "_id" : "1" } }
{ "doc" : { "price" : 85.00 } }
{ "delete" : { "_index" : "products", "_id" : "4" } }
'
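The bulk body is newline-delimited JSON: one action line, then (for index, create, and update actions) one source line, with a final newline at the end. Writing that by hand gets tedious, so here is a small sketch that generates index actions from tab-separated input. The helper name `to_bulk` and the `<id><TAB><json>` input format are assumptions for illustration, and the JSON documents must not contain tabs.

```shell
#!/bin/sh
# Turn "<id><TAB><json-document>" lines into a _bulk request body that
# indexes every document into the given index.
# Usage: to_bulk <index> < input.tsv
to_bulk() {
  awk -F '\t' -v idx="$1" '
    {
      printf "{ \"index\": { \"_index\": \"%s\", \"_id\": \"%s\" } }\n", idx, $1
      print $2
    }
  '
}

printf '5\t{ "name": "Keyboard", "price": 49.00 }\n' | to_bulk products
# prints:
# { "index": { "_index": "products", "_id": "5" } }
# { "name": "Keyboard", "price": 49.00 }

# To send the result, pipe it to curl with --data-binary so newlines survive:
# to_bulk products < input.tsv | curl -s -X POST "localhost:9200/_bulk" \
#   -H 'Content-Type: application/x-ndjson' --data-binary @-
```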
Aggregations
# Category aggregation
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"categories": {
"terms": {
"field": "category",
"size": 10
}
}
}
}'
# Price statistics
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"price_stats": {
"stats": {
"field": "price"
}
},
"price_histogram": {
"histogram": {
"field": "price",
"interval": 100
}
}
}
}'
# Date histogram
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"sales_over_time": {
"date_histogram": {
"field": "created_at",
"calendar_interval": "1M",
"format": "yyyy-MM"
}
}
}
}'
# Nested aggregations
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"categories": {
"terms": {
"field": "category"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
},
"max_price": {
"max": {
"field": "price"
}
}
}
}
}
}'
Practical Examples
# Full-text search with highlighting
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"multi_match": {
"query": "high quality wireless",
"fields": ["name^2", "description"],
"type": "best_fields"
}
},
"highlight": {
"fields": {
"name": {},
"description": {}
}
}
}'
# Search suggestions (the term suggester proposes corrections for misspelled terms)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"suggest": {
"my_suggestion": {
"text": "earphnes",
"term": {
"field": "name"
}
}
}
}'
# Multi-index search (assumes a users index also exists)
curl -X GET "localhost:9200/products,users/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
}
}'
# Scroll search (for large datasets)
curl -X GET "localhost:9200/products/_search?scroll=1m" -H 'Content-Type: application/json' -d'
{
"size": 1000,
"query": {
"match_all": {}
}
}'
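The scroll request above returns only the first batch plus a `_scroll_id`. The remaining batches come from the scroll endpoint, and the search context should be released once the results are consumed. A sketch of those two follow-up calls, assuming an unsecured local node; the function names `scroll_next` and `scroll_clear` are hypothetical.

```shell
#!/bin/sh
# Assumed base URL of a local, unsecured development node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Fetch the next batch of a scroll; pass the _scroll_id from the previous
# response. Keep calling until a response contains no hits.
# Usage: scroll_next <scroll_id>
scroll_next() {
  curl -s -X POST "$ES_URL/_search/scroll" \
    -H 'Content-Type: application/json' \
    -d "{ \"scroll\": \"1m\", \"scroll_id\": \"$1\" }"
}

# Release the search context instead of waiting for it to expire.
# Usage: scroll_clear <scroll_id>
scroll_clear() {
  curl -s -X DELETE "$ES_URL/_search/scroll" \
    -H 'Content-Type: application/json' \
    -d "{ \"scroll_id\": \"$1\" }"
}
```

Each call to `scroll_next` renews the `1m` keep-alive, so the timeout only needs to cover the processing of one batch, not the whole export.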
# Source filtering (return only the listed fields)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
},
"_source": ["name", "price", "category"]
}'
Analysis and Mapping
# Create custom analyzer
curl -X PUT "localhost:9200/english_products" -H 'Content-Type: application/json' -d'
{
"settings": {
"analysis": {
"analyzer": {
"english_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"stop",
"snowball"
]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "english_analyzer"
},
"description": {
"type": "text",
"analyzer": "english_analyzer"
}
}
}
}'
# Dynamic mapping templates
curl -X PUT "localhost:9200/dynamic_products" -H 'Content-Type: application/json' -d'
{
"mappings": {
"dynamic_templates": [
{
"strings_as_keywords": {
"match_mapping_type": "string",
"match": "*_id",
"mapping": {
"type": "keyword"
}
}
},
{
"strings_as_text": {
"match_mapping_type": "string",
"mapping": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
]
}
}'
Performance Optimization
# Update index settings (e.g. relax refresh and replicas during a bulk load, then restore them afterwards)
curl -X PUT "localhost:9200/products/_settings" -H 'Content-Type: application/json' -d'
{
"index": {
"refresh_interval": "30s",
"number_of_replicas": 0
}
}'
# Force merge (best reserved for indices that no longer receive writes)
curl -X POST "localhost:9200/products/_forcemerge?max_num_segments=1"
# Clear cache
curl -X POST "localhost:9200/products/_cache/clear"
# Index statistics
curl -X GET "localhost:9200/products/_stats"
# Node information
curl -X GET "localhost:9200/_nodes/stats"
# Cluster health
curl -X GET "localhost:9200/_cluster/health"
Ingest Pipelines
# Create ingest pipeline
curl -X PUT "localhost:9200/_ingest/pipeline/product_pipeline" -H 'Content-Type: application/json' -d'
{
"description": "product data processing pipeline",
"processors": [
{
"set": {
"field": "processed_at",
"value": "{{_ingest.timestamp}}"
}
},
{
"uppercase": {
"field": "category"
}
},
{
"script": {
"source": "ctx.price_category = ctx.price > 500 ? \"high\" : \"low\""
}
}
]
}'
# Ingest with pipeline
curl -X POST "localhost:9200/products/_doc?pipeline=product_pipeline" -H 'Content-Type: application/json' -d'
{
"name": "Tablet",
"price": 650.00,
"category": "electronics"
}'
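Before sending real data through a pipeline, it can help to check what the processors do to a sample document. The simulate endpoint runs a stored pipeline without indexing anything. A minimal sketch, assuming an unsecured local node; the function name `simulate_pipeline` is hypothetical.

```shell
#!/bin/sh
# Assumed base URL of a local, unsecured development node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Run one sample document through a stored pipeline without indexing it.
# Usage: simulate_pipeline <pipeline-id> <json-source>
simulate_pipeline() {
  curl -s -X POST "$ES_URL/_ingest/pipeline/$1/_simulate" \
    -H 'Content-Type: application/json' \
    -d "{ \"docs\": [ { \"_source\": $2 } ] }"
}

# Example:
# simulate_pipeline product_pipeline '{ "name": "Tablet", "price": 650.00, "category": "electronics" }'
```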
Cluster Management
# Check shard allocation
curl -X GET "localhost:9200/_cat/shards/products?v"
# List nodes
curl -X GET "localhost:9200/_cat/nodes?v"
# Manually move a shard between nodes (node1 and node2 are placeholder node names)
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
"commands": [
{
"move": {
"index": "products",
"shard": 0,
"from_node": "node1",
"to_node": "node2"
}
}
]
}'
# Create snapshot (the my_backup repository must already be registered)
curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"
# Restore snapshot
curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore"
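The snapshot commands above only work once a repository named `my_backup` exists. One way to register it is as a shared-filesystem repository, sketched below. The function name `register_fs_repo` and the example path are assumptions, and the location has to be listed under `path.repo` in `elasticsearch.yml` on every node.

```shell
#!/bin/sh
# Assumed base URL of a local, unsecured development node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Register a shared-filesystem snapshot repository.
# Usage: register_fs_repo <repo-name> <location>
register_fs_repo() {
  curl -s -X PUT "$ES_URL/_snapshot/$1" \
    -H 'Content-Type: application/json' \
    -d "{ \"type\": \"fs\", \"settings\": { \"location\": \"$2\" } }"
}

# Example (the path is a placeholder):
# register_fs_repo my_backup /mnt/es_backups
```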