GitHub Overview

elastic/elasticsearch

Free and Open Source, Distributed, RESTful Search Engine

Stars: 73,370
Watchers: 2,665
Forks: 25,351
Created: February 8, 2010
Language: Java
License: Other

Topics

elasticsearch, java, search-engine

Data as of: 7/30/2025, 02:37 AM

Database

Elasticsearch + Vector Search

Overview

Elasticsearch began as a distributed search and analytics engine specialized in full-text search. It has supported vector similarity scoring through script_score queries since version 7.3, and approximate k-NN (k-nearest neighbor) search based on HNSW since version 8.0. Because vector search runs inside an existing Elasticsearch cluster, it can be combined with full-text queries for hybrid search.

Details

First released by Shay Banon in 2010 and now developed by Elastic, Elasticsearch is one of the most widely used search engines. With vector search, it can also serve machine-learning applications such as image search, similar-document search, and recommendation systems alongside traditional text search.

Key features of Elasticsearch vector search:

  • Vector search capability within existing Elasticsearch clusters
  • Fast approximate k-NN similarity search
  • HNSW algorithm support (v8.0+)
  • Hybrid search combining text and vector search
  • Scalable distributed architecture
  • Easy integration via RESTful API
  • Visualization and monitoring with Kibana
  • Rich plugin ecosystem
  • Multi-language client libraries

Vector Search Implementation

  • Store vector data using dense_vector field type
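  • Exact (brute-force) similarity search with script_score queries; approximate search with the knn option
  • Supported similarity metrics: cosine, dot_product, and l2_norm (Euclidean distance)
  • Fixed dimension count per field, set in the mapping via dims (up to 4,096 in recent releases; older releases allowed 2,048 or fewer)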
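The similarity setting also determines how Elasticsearch turns a raw similarity or distance into a non-negative _score. The sketch below reproduces those conversions in plain Python, following the formulas as we understand them from the dense_vector documentation. The helper names are hypothetical and the code is meant for intuition, not as Elasticsearch's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def dot_product(a, b):
    """Plain dot product; Elasticsearch expects unit-length vectors for dot_product."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance (the l2_norm similarity)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_score(similarity, query, vector):
    """Map a raw similarity/distance to a non-negative score (hypothetical helper)."""
    if similarity == "cosine":
        return (1 + cosine(query, vector)) / 2
    if similarity == "dot_product":
        return (1 + dot_product(query, vector)) / 2
    if similarity == "l2_norm":
        # Closer vectors (smaller distance) get higher scores
        return 1 / (1 + l2_distance(query, vector) ** 2)
    raise ValueError(f"unsupported similarity: {similarity}")


q = [1.0, 0.0]
print(knn_score("cosine", q, [0.0, 1.0]))   # 0.5 (orthogonal vectors)
print(knn_score("l2_norm", q, [1.0, 1.0]))  # 0.5 (distance 1)
```

All three mappings place scores in a bounded, non-negative range, which is why identical vectors score 1.0 under cosine and l2_norm.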

Pros and Cons

Pros

  • Leverage existing infrastructure: Low additional cost if already using Elasticsearch
  • Hybrid search: Advanced search combining text and vector search
  • Mature ecosystem: Rich tools, plugins, and documentation available
  • High scalability: Horizontal scaling through distributed architecture
  • Enterprise features: Security, monitoring, and snapshot/backup capabilities
  • Easy integration: Any application that can send HTTP requests can use the RESTful API

Cons

  • Not vector-specialized: May have inferior performance compared to dedicated vector databases
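  • Dimension limitations: Capped at 4,096 dimensions per field in recent releases (2,048 or fewer in older versions)
  • Memory consumption: HNSW search is fastest when the vector data fits in RAM, so memory requirements grow with the number and dimensionality of vectors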
  • License costs: Advanced features under Elastic License with commercial restrictions
  • Learning curve: Requires familiarity with Elasticsearch concepts such as mappings, shards, and the Query DSL
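To put the memory concern in numbers: Elastic's kNN tuning guidance gives a loose rule of thumb of roughly num_vectors × 4 × (num_dimensions + 12) bytes for float vectors with the default HNSW options. The sketch below applies that formula; the function name is hypothetical, and both the attribution and the result should be treated as approximations rather than a sizing guarantee.

```python
def estimate_hnsw_memory_bytes(num_vectors: int, num_dimensions: int) -> int:
    """Rough RAM estimate for float vectors plus HNSW graph (rule of thumb, default HNSW options)."""
    # 4 bytes per float component; the +12 approximates per-vector graph overhead
    return num_vectors * 4 * (num_dimensions + 12)


needed = estimate_hnsw_memory_bytes(1_000_000, 768)
print(f"{needed / 2**30:.2f} GiB")  # 2.91 GiB for one million 768-dim vectors
```

Because the estimate is linear in both terms, halving the embedding size roughly halves the RAM needed to keep the index cached.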


Example Usage

Installation & Setup

# Run a single-node cluster with Docker (security disabled: local development only)
docker run -d --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.3

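# Install the Elasticsearch client (Python), pinned to the server's major version (8.x)
pip install "elasticsearch>=8,<9"

# Install the Elasticsearch client (Node.js), pinned the same way
npm install @elastic/elasticsearch@8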
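Elasticsearch clients are intended to run against a server of the same major version, so a quick check after startup can confirm both the connection and the version. A minimal sketch, assuming the Python client; the helper name check_server_major_version is hypothetical, while es.info() is the client's wrapper for GET / whose response carries version.number.

```python
def check_server_major_version(es, expected_major: int = 8) -> str:
    """Return the server version string, raising if its major version is unexpected."""
    # es.info() calls GET / ; the response includes version.number, e.g. "8.11.3"
    version = es.info()["version"]["number"]
    major = int(version.split(".")[0])
    if major != expected_major:
        raise RuntimeError(f"server is {version}, expected {expected_major}.x")
    return version


# Usage against the container started above (requires a running server):
#   from elasticsearch import Elasticsearch
#   print(check_server_major_version(Elasticsearch("http://localhost:9200")))
```

Passing the client in as a parameter keeps the helper free of a hard dependency on a live cluster, which also makes it easy to test with a stand-in object.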

Basic Vector Search Operations

from elasticsearch import Elasticsearch
import numpy as np

# Connection
es = Elasticsearch(['http://localhost:9200'])

# Create index with vector field
index_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "content": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 768,
                "index": True,
                "similarity": "cosine"
            }
        }
    }
}

es.indices.create(index="documents", mappings=index_mapping["mappings"])

# Add document
doc = {
    "title": "Elasticsearch Vector Search",
    "content": "How to implement vector search with Elasticsearch",
    "embedding": np.random.rand(768).tolist()
}

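es.index(index="documents", document=doc)

# Refresh so the new document is searchable immediately (the periodic refresh runs every second by default)
es.indices.refresh(index="documents")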

# Approximate k-NN search: consider 100 candidates per shard and return the 10 nearest
query_vector = np.random.rand(768).tolist()

search_query = {
    "field": "embedding",
    "query_vector": query_vector,
    "k": 10,
    "num_candidates": 100
}

results = es.search(index="documents", knn=search_query)
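Each hit in a search response sits under hits.hits and carries _id, _score, and _source. The sketch below, with a hypothetical helper name, flattens a response into (id, score, field) tuples so the ranking is easy to inspect.

```python
def top_hits(results, field="title"):
    """Extract (document id, score, field value) tuples, best match first."""
    return [
        (hit["_id"], hit["_score"], hit["_source"].get(field))
        for hit in results["hits"]["hits"]
    ]


# Usage with the response from the k-NN search above:
#   for doc_id, score, title in top_hits(results):
#       print(f"{score:.4f}  {doc_id}  {title}")
```

Using .get() on _source means documents that lack the requested field yield None instead of raising a KeyError.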

Hybrid Search

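# Text-filtered vector scoring: the match query selects documents, and each match is then
# scored by exact cosine similarity to query_vector (+1.0 keeps scores non-negative, as script_score requires)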
hybrid_query = {
    "query": {
        "script_score": {
            "query": {
                "match": {
                    "content": "vector search"
                }
            },
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                "params": {
                    "query_vector": query_vector
                }
            }
        }
    }
}

results = es.search(index="documents", query=hybrid_query["query"])
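The script_score query above replaces the text relevance score with vector similarity. To blend both signals instead, Elasticsearch 8.x (8.4 and later) accepts a query and a knn clause in the same search request and sums their boosted scores. A minimal sketch of building such a request; the function name and the 0.9/0.1 weights are illustrative assumptions, not recommended values.

```python
def build_hybrid_request(text, query_vector, k=10, num_candidates=100,
                         text_boost=0.9, vector_boost=0.1):
    """Keyword arguments for es.search() combining a match query with approximate k-NN.

    A hit matched by both clauses scores text_boost * match_score + vector_boost * knn_score;
    a hit matched by only one clause receives only that clause's contribution.
    """
    return {
        "query": {
            "match": {
                "content": {"query": text, "boost": text_boost}
            }
        },
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
            "boost": vector_boost,
        },
        "size": k,
    }


# Usage (requires the "documents" index from above):
#   results = es.search(index="documents", **build_hybrid_request("vector search", query_vector))
```

The boosts control the balance between lexical and semantic relevance; they usually need tuning per dataset because BM25 and vector scores live on different scales.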

Using HNSW Index (v8.0+)

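# Indexed dense_vector fields use HNSW; index_options tunes the graph:
# m = neighbors linked per node (default 16), ef_construction = candidates tracked while building (default 100)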
hnsw_mapping = {
    "mappings": {
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 768,
                "index": True,
                "similarity": "l2_norm",
                "index_options": {
                    "type": "hnsw",
                    "m": 16,
                    "ef_construction": 200
                }
            }
        }
    }
}

es.indices.create(index="hnsw_index", mappings=hnsw_mapping["mappings"])
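HNSW is approximate: larger m, ef_construction, and num_candidates generally raise accuracy at the cost of speed and memory. One way to check the trade-off is to compare approximate results with an exact brute-force search over the same vectors and compute recall. A minimal NumPy sketch; the helper names are hypothetical, and mapping hits back to row numbers assumes each row was indexed with _id equal to its row index.

```python
import numpy as np


def exact_top_k_l2(vectors: np.ndarray, query: np.ndarray, k: int) -> list:
    """Brute-force k nearest neighbors by Euclidean (l2_norm) distance; row indices, nearest first."""
    distances = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(distances)[:k].tolist()


def recall_at_k(approx_ids, exact_ids) -> float:
    """Fraction of the true top-k neighbors that the approximate search also returned."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)


# Usage sketch (assumes row i of `vectors` was indexed into "hnsw_index" with _id=str(i)):
#   hits = es.search(index="hnsw_index", knn={"field": "embedding", "query_vector": q.tolist(),
#                                             "k": 10, "num_candidates": 100})["hits"]["hits"]
#   approx = [int(h["_id"]) for h in hits]
#   print(recall_at_k(approx, exact_top_k_l2(vectors, q, k=10)))
```

Running this over a sample of queries while varying num_candidates gives a simple recall-versus-latency curve for choosing parameters.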

Batch Processing

from elasticsearch.helpers import bulk

# Bulk index multiple documents
documents = []
for i in range(1000):
    doc = {
        "_index": "documents",
        "_source": {
            "title": f"Document {i}",
            "content": f"Content of document {i}",
            "embedding": np.random.rand(768).tolist()
        }
    }
    documents.append(doc)

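# bulk() raises BulkIndexError on failure; otherwise it returns (success_count, errors)
success, _ = bulk(es, documents)
es.indices.refresh(index="documents")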
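The random vectors above work with cosine similarity, but the dot_product similarity requires unit-length vectors. The sketch below normalizes an embedding matrix and yields bulk actions lazily, so large batches never have to be materialized as one list. The helper names and the "dot_index" index are hypothetical.

```python
import numpy as np


def unit_normalize(matrix: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 length, as the dot_product similarity requires."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("zero-length vectors cannot be normalized")
    return matrix / norms


def bulk_actions(index_name, titles, embeddings):
    """Yield bulk-helper actions one at a time; _id is the row number so hits can be traced back."""
    for i, (title, vector) in enumerate(zip(titles, unit_normalize(embeddings))):
        yield {
            "_index": index_name,
            "_id": str(i),
            "_source": {"title": title, "embedding": vector.tolist()},
        }


# Usage (assumes an index "dot_index" whose "embedding" field uses "similarity": "dot_product"):
#   from elasticsearch.helpers import bulk
#   bulk(es, bulk_actions("dot_index", titles, embeddings))
```

Query vectors sent to a dot_product field need the same normalization.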

Aggregation and Filtering

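# Vector search restricted by a filter, plus an aggregation over the results.
# Assumes a keyword field "category", which the "documents" mapping above does not define.
filtered_search = {
    "field": "embedding",
    "query_vector": query_vector,
    "k": 10,
    "num_candidates": 100,
    # Applied during the k-NN search, so up to k matching documents are returned
    "filter": {
        "term": {
            "category": "technology"
        }
    }
}

results = es.search(
    index="documents",
    knn=filtered_search,
    # With knn alone, aggregations are computed over the top k nearest documents
    aggs={"by_category": {"terms": {"field": "category"}}}
)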
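The knn filter accepts any Query DSL query, so several conditions can be combined with a bool filter. A minimal sketch that ANDs the category term with an optional date range; the function name and the "published" date field are hypothetical additions not present in the mapping above.

```python
def build_filtered_knn(query_vector, category, min_date=None, k=10, num_candidates=100):
    """Build a knn clause whose filter requires a category and, optionally, a minimum date."""
    filters = [{"term": {"category": category}}]
    if min_date is not None:
        # Hypothetical "published" date field
        filters.append({"range": {"published": {"gte": min_date}}})
    return {
        "field": "embedding",
        "query_vector": query_vector,
        "k": k,
        "num_candidates": num_candidates,
        "filter": {"bool": {"filter": filters}},
    }


# Usage:
#   results = es.search(index="documents",
#                       knn=build_filtered_knn(query_vector, "technology", "2024-01-01"))
```

Using bool filter clauses rather than must keeps the conditions in non-scoring filter context, so they restrict candidates without affecting the vector similarity scores.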