GitHub Overview
elastic/elasticsearch
Free and Open Source, Distributed, RESTful Search Engine
Elasticsearch + Vector Search
Overview
Elasticsearch is a distributed search and analytics engine best known for full-text search. It has supported exact (brute-force) k-NN (k-nearest neighbor) search through script_score vector functions since version 7.3, and approximate k-NN search over HNSW-indexed vectors since version 8.0. Because vector search runs inside an existing Elasticsearch cluster, text and vector queries can be combined into hybrid search.
Details
First released in 2010 by Shay Banon and now developed by Elastic, Elasticsearch is one of the most widely used search engines worldwide. With vector search, it supports machine learning-based applications such as image search, similar-document search, and recommendation systems, in addition to traditional text search.
Key features of Elasticsearch vector search:
- Vector search capability within existing Elasticsearch clusters
- Exact k-NN search via script_score and approximate k-NN search via the knn search option
- HNSW-based approximate nearest neighbor indexing (v8.0+)
- Hybrid search combining text and vector search
- Scalable distributed architecture
- Easy integration via RESTful API
- Visualization and monitoring with Kibana
- Rich plugin ecosystem
- Multi-language client libraries
Vector Search Implementation
- Store vector data using dense_vector field type
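- Similarity search using script_score queries (exact) or the knn search option (approximate)
- Support for cosine similarity, dot product, and L2 distance metrics
- Dimension count fixed at index time; the maximum depends on the Elasticsearch version (2048 in many 8.x releases, raised to 4096 in 8.11)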
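The three metrics listed above, and what an exact (brute-force) k-NN search does with them, can be sketched in plain Python. This is only an illustration of the math, not how Elasticsearch implements it internally; the function names and the toy vectors are made up for the example.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_distance(a, b):
    """Euclidean (L2) distance between a and b (0.0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_knn(query, vectors, k, metric=cosine_similarity):
    """Brute-force k-NN: score every vector, return indices of the k most similar."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: metric(query, vectors[i]),
                    reverse=True)
    return ranked[:k]

# Toy data (illustrative only)
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.1]
nearest = exact_knn(query, vectors, k=2)
print(nearest)
```

Exact search scores every stored vector, which is why approximate methods such as HNSW are preferred for large indices.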
Pros and Cons
Pros
- Leverage existing infrastructure: Low additional cost if already using Elasticsearch
- Hybrid search: Advanced search combining text and vector search
- Mature ecosystem: Rich tools, plugins, and documentation available
- High scalability: Horizontal scaling through distributed architecture
- Enterprise features: Comprehensive security, monitoring, and backup features
- Easy integration: Simple integration with various applications via RESTful API
Cons
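- Not vector-specialized: May perform worse than dedicated vector databases on vector-only workloads
- Dimension limitations: Maximum dimension count is capped (2048 in many 8.x releases, 4096 from 8.11)
- Memory consumption: Approximate k-NN search works best when vector data fits in memory, so RAM requirements grow with the number and dimensionality of vectors
- License costs: Some advanced features require a paid subscription, and the Elastic License restricts certain commercial uses such as offering the software as a managed service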
- Learning curve: Need to understand Elasticsearch concepts
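To make the memory consumption point concrete, here is a rough back-of-the-envelope estimate. It assumes vectors are stored as 4-byte floats and counts only the raw vector data; the HNSW graph and other index structures add overhead on top, so treat the result as a lower bound. The function name is hypothetical.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Bytes needed for the raw vector values alone (assumes 4-byte floats)."""
    return num_vectors * dims * bytes_per_value

# Example: one million 768-dimensional embeddings
size = raw_vector_bytes(1_000_000, 768)
size_gib = size / 1024**3
print(f"{size} bytes = about {size_gib:.2f} GiB")
```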
Example Usage
Installation & Setup
# Run with Docker (single node with security disabled; suitable for local development only)
docker run -d --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.3
# Install Elasticsearch client (Python)
pip install elasticsearch
# Install Elasticsearch client (Node.js)
npm install @elastic/elasticsearch
Basic Vector Search Operations
from elasticsearch import Elasticsearch
import numpy as np
# Connection
es = Elasticsearch(['http://localhost:9200'])
# Create index with vector field
index_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "content": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 768,         # must match the embedding model's output size
                "index": True,       # build an index for approximate k-NN search
                "similarity": "cosine"
            }
        }
    }
}
# The 8.x Python client takes mappings as a keyword argument
es.indices.create(index="documents", mappings=index_mapping["mappings"])
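# Add document (a random vector stands in for a real embedding)
doc = {
    "title": "Elasticsearch Vector Search",
    "content": "How to implement vector search with Elasticsearch",
    "embedding": np.random.rand(768).tolist()
}
# refresh=True makes the document searchable immediately (convenient for demos, costly at scale)
es.index(index="documents", document=doc, refresh=True)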
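# Approximate k-NN search (top-level knn option, Elasticsearch 8.4+)
query_vector = np.random.rand(768).tolist()
search_query = {
    "knn": {
        "field": "embedding",
        "query_vector": query_vector,
        "k": 10,               # number of nearest neighbors to return
        "num_candidates": 100  # candidates examined per shard; higher is more accurate but slower
    }
}
results = es.search(index="documents", knn=search_query["knn"])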
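A small helper can catch common mistakes before a k-NN request reaches the cluster. The sketch below is not part of the Elasticsearch client; build_knn_query and its expected_dims parameter are hypothetical names. It assumes that the query vector must have the same number of dimensions as the mapped field and that num_candidates should be at least k.

```python
def build_knn_query(field, query_vector, k=10, num_candidates=100, expected_dims=None):
    """Build the body of a knn search option, validating the inputs first."""
    if expected_dims is not None and len(query_vector) != expected_dims:
        raise ValueError(
            f"query vector has {len(query_vector)} dims, expected {expected_dims}"
        )
    if k < 1:
        raise ValueError("k must be at least 1")
    if num_candidates < k:
        raise ValueError("num_candidates must be greater than or equal to k")
    return {
        "field": field,
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": num_candidates,
    }

knn = build_knn_query("embedding", [0.1, 0.2, 0.3], k=2, num_candidates=10, expected_dims=3)
# With a live cluster, this could be passed as: es.search(index="documents", knn=knn)
```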
Hybrid Search
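# Combining text and vector search: the match query selects candidate documents,
# then the script re-scores them by cosine similarity (exact, brute-force scoring).
# Adding 1.0 keeps the score non-negative, since cosine similarity ranges from -1 to 1.
hybrid_query = {
    "query": {
        "script_score": {
            "query": {
                "match": {
                    "content": "vector search"
                }
            },
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                "params": {
                    "query_vector": query_vector
                }
            }
        }
    }
}
results = es.search(index="documents", query=hybrid_query["query"])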
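Another way to express hybrid search, assuming Elasticsearch 8.4 or later, is to send a text query and a knn option in the same search request and weight them with boost values; the final score of a document is then the boosted text score plus the boosted vector score. The helper name and the default weights below are illustrative choices, not recommendations.

```python
def build_hybrid_request(text, query_vector, text_boost=0.9, knn_boost=0.1, size=10):
    """Build search arguments that combine a match query with approximate k-NN."""
    return {
        "query": {
            "match": {
                "content": {"query": text, "boost": text_boost}
            }
        },
        "knn": {
            "field": "embedding",
            "query_vector": list(query_vector),
            "k": size,
            "num_candidates": max(100, size),
            "boost": knn_boost,
        },
        "size": size,
    }

request = build_hybrid_request("vector search", [0.1, 0.2, 0.3], size=5)
# With a live cluster, this could be sent as: es.search(index="documents", **request)
```

Compared with script_score, this form uses the HNSW index for the vector part instead of scoring every matching document.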
Using HNSW Index (v8.0+)
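# Tune the HNSW index explicitly. Indexed dense_vector fields already use HNSW
# by default; index_options only overrides its build parameters.
hnsw_mapping = {
    "mappings": {
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 768,
                "index": True,
                "similarity": "l2_norm",
                "index_options": {
                    "type": "hnsw",
                    "m": 16,                # max connections per graph node
                    "ef_construction": 200  # candidates tracked while building the graph
                }
            }
        }
    }
}
es.indices.create(index="hnsw_index", mappings=hnsw_mapping["mappings"])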
Batch Processing
from elasticsearch.helpers import bulk
# Bulk index multiple documents
documents = []
for i in range(1000):
    doc = {
        "_index": "documents",
        "_source": {
            "title": f"Document {i}",
            "content": f"Content of document {i}",
            "embedding": np.random.rand(768).tolist()
        }
    }
    documents.append(doc)
# bulk() returns a tuple: (number of successful actions, list of errors)
success_count, errors = bulk(es, documents)
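For larger datasets, building the whole list in memory can be avoided by passing a generator to the bulk helper, which accepts any iterable of actions. The sketch below uses random vectors as stand-ins for real embeddings; generate_actions is a hypothetical name, and the explicit "_id" is an optional addition that makes re-runs overwrite documents instead of duplicating them.

```python
import random

def generate_actions(index_name, count, dims):
    """Lazily yield bulk actions so that only one document is held in memory at a time."""
    for i in range(count):
        yield {
            "_index": index_name,
            "_id": str(i),  # stable ID: re-indexing overwrites rather than duplicates
            "_source": {
                "title": f"Document {i}",
                "content": f"Content of document {i}",
                # Placeholder embedding; a real pipeline would call an embedding model here
                "embedding": [random.random() for _ in range(dims)],
            },
        }

preview = list(generate_actions("documents", 3, 4))
# With a live cluster:
#   from elasticsearch.helpers import bulk
#   bulk(es, generate_actions("documents", 1000, 768), chunk_size=500)
```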
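Filtering
# Vector search with filtering: the filter restricts which documents can be returned as neighbors.
# Assumes documents carry a "category" keyword field, which the mapping above does not define.
filtered_search = {
    "knn": {
        "field": "embedding",
        "query_vector": query_vector,
        "k": 10,
        "num_candidates": 100,
        "filter": {
            "term": {
                "category": "technology"
            }
        }
    }
}
results = es.search(index="documents", knn=filtered_search["knn"])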
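Search responses place matching documents under hits.hits, each with an _id, a _score, and the stored _source. A minimal sketch of reading them follows; the sample response is hand-written for illustration and is not real cluster output, and summarize_hits is a hypothetical helper.

```python
def summarize_hits(response):
    """Return (id, score, title) tuples from a search response, in ranked order."""
    return [
        (hit["_id"], hit["_score"], hit["_source"].get("title"))
        for hit in response["hits"]["hits"]
    ]

# Hand-written stand-in for a search response (illustrative values only)
sample_response = {
    "hits": {
        "hits": [
            {"_id": "a1", "_score": 0.93, "_source": {"title": "Document 1"}},
            {"_id": "b2", "_score": 0.88, "_source": {"title": "Document 2"}},
        ]
    }
}

for doc_id, score, title in summarize_hits(sample_response):
    print(f"{doc_id}\t{score:.2f}\t{title}")
```

With a live cluster, the object returned by es.search can be indexed the same way, so summarize_hits(results) should work unchanged.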