GitHub Overview
weaviate/weaviate
Weaviate is an open-source vector database that stores both objects and vectors, combining vector search with structured filtering and offering the fault tolerance and scalability of a cloud-native database.
Overview
Weaviate is an open-source vector database that enables efficient data storage, search, and retrieval using machine learning models and vector representations. It supports semantic search, recommendation, and classification capabilities, providing intuitive data operations through its GraphQL API.
Details
GraphQL Support
One of Weaviate's key features is its comprehensive GraphQL API support:
- Three Main Functions:
  - Get: Data search when the class name is known
  - Explore: Fuzzy search when schema and class names are unknown
  - Aggregate: Metadata search and data aggregation
- Module Extensions: Ability to add GraphQL filters and custom properties (_additional)
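To make the difference between Get and Aggregate concrete, here is a minimal sketch that assembles both query shapes as plain strings. The `Article` class, its properties, and the helper names are assumptions made for illustration; they are not part of any Weaviate client.

```python
import json


def build_get_query(class_name, properties, limit):
    """Build a Get query: fetch named properties of a known class."""
    fields = " ".join(properties)
    return f"{{ Get {{ {class_name}(limit: {limit}) {{ {fields} }} }} }}"


def build_aggregate_query(class_name):
    """Build an Aggregate query: count the objects stored in a class."""
    return f"{{ Aggregate {{ {class_name} {{ meta {{ count }} }} }} }}"


def to_request_body(query):
    """Wrap a query string in the JSON body a GraphQL endpoint expects."""
    return json.dumps({"query": query})


# "Article", "title" and "content" are hypothetical schema names.
get_query = build_get_query("Article", ["title", "content"], limit=5)
aggregate_query = build_aggregate_query("Article")

print(get_query)
print(aggregate_query)
```

The resulting body would typically be sent as a POST request to the instance's `/v1/graphql` endpoint.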
Knowledge Graph Capabilities
While being a vector database, Weaviate also provides knowledge graph functionality:
- Class-Property Structure: Each data object has an attached vector, enabling complex filtering through GraphQL
- Relationship Management: Express relationships between objects with GraphQL reference resolution support
- Semantic Interpretation: Semantically interprets schemas (ontologies), allowing searches by concepts rather than formal entities
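The relationship handling described above can be sketched with two class definitions and a query that resolves the reference. The class names `Article` and `Author`, the property `hasAuthor`, and the helper function are all hypothetical and chosen only for illustration.

```python
# Hypothetical class definitions; the names are illustrative only.
author_class = {
    "class": "Author",
    "properties": [{"name": "name", "dataType": ["text"]}],
}
article_class = {
    "class": "Article",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        # A cross-reference: the data type is the name of the target class.
        {"name": "hasAuthor", "dataType": ["Author"]},
    ],
}


def reference_properties(class_def, known_classes):
    """Return the names of properties that point at another class."""
    return [
        prop["name"]
        for prop in class_def["properties"]
        if any(data_type in known_classes for data_type in prop["dataType"])
    ]


# A Get query that follows the reference using a GraphQL inline fragment.
reference_query = """
{
  Get {
    Article {
      title
      hasAuthor {
        ... on Author {
          name
        }
      }
    }
  }
}
"""

known = {author_class["class"], article_class["class"]}
print(reference_properties(article_class, known))
```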
Hybrid Search
Powerful search functionality combining vector search with traditional inverted indexes:
- Filter by scalar values (text, numbers, etc.) simultaneously with vector search
- Utilize both search methods in a single query
- Combine BM25 keyword search with vector search
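One common way to combine keyword and vector results is to normalize both score sets and blend them with a weight. The sketch below is a simplified illustration of that idea, not Weaviate's internal implementation; the function names, the `alpha` parameter, and the sample scores are assumptions.

```python
def min_max_normalize(scores):
    """Scale scores into [0, 1]; a constant score set maps to all 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def fuse(keyword_scores, vector_scores, alpha=0.5):
    """Blend two score sets; alpha=1 is pure vector, alpha=0 is pure keyword."""
    keyword = min_max_normalize(keyword_scores)
    vector = min_max_normalize(vector_scores)
    fused = {
        doc_id: alpha * vector.get(doc_id, 0.0)
        + (1 - alpha) * keyword.get(doc_id, 0.0)
        for doc_id in set(keyword) | set(vector)
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for three documents (higher is better in both sets).
bm25 = {"a": 2.0, "b": 6.0, "c": 4.0}
similarity = {"a": 0.9, "b": 0.5, "c": 0.7}

ranking = fuse(bm25, similarity, alpha=0.75)
print([doc_id for doc_id, _ in ranking])
```

With `alpha=0.75` the vector scores dominate, so the document with the best similarity ranks first even though its keyword score is the lowest.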
Docker Support
Weaviate supports Docker for everything from local development to production environments:
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:latest
    ports:
      - 8080:8080
      - 50051:50051
    environment:
      ENABLE_MODULES: 'text2vec-transformers,generative-openai'
Advantages
- Speed: Millisecond-scale 10-NN search on millions of objects
- Flexibility: Automatic vectorization at import or upload of pre-vectorized data
- Production-Ready: Designed for scaling, replication, and security
- Multi-Modal: Supports various data types including text, images, and audio
- Distributed Architecture: High availability through sharding, replication, and RAFT consensus
Disadvantages
- Learning Curve: Requires understanding of GraphQL and vector search concepts
- Resource Consumption: Large datasets require significant memory and storage
- Module Dependencies: Advanced features may require integration with external AI services
Code Examples
Basic GraphQL Query
{
  Get {
    Article(
      nearText: {
        concepts: ["AI technology"]
        distance: 0.6
      }
      limit: 5
    ) {
      title
      content
      _additional {
        distance
        certainty
      }
    }
  }
}
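The query above requests both `distance` and `certainty`. As a rough sketch, assuming the cosine metric (where distance ranges from 0 to 2) and the conversion `certainty = 1 - distance / 2` as I understand it from Weaviate's documentation, the two values relate as follows. The function names are illustrative.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 0 for the same direction, 2 for opposite directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def distance_to_certainty(distance):
    """Map a cosine distance in [0, 2] to a certainty in [0, 1]."""
    return 1.0 - distance / 2.0


def certainty_to_distance(certainty):
    """Inverse mapping: certainty in [0, 1] back to cosine distance."""
    return 2.0 * (1.0 - certainty)


# The distance threshold from the query above, expressed as a certainty.
print(distance_to_certainty(0.6))
```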
Python Client Usage
import weaviate

# Initialize Weaviate client
# (weaviate.Client and the query builder below belong to the v3 Python client)
client = weaviate.Client("http://localhost:8080")

# Search with nearText
result = (
    client.query.get("Article", ["title", "content"])
    .with_near_text({
        "concepts": ["machine learning", "deep learning"],
        "distance": 0.7,
    })
    .with_limit(10)
    .do()
)

print(result)
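The dictionary printed above follows the usual GraphQL response layout, with results nested under `data`, then `Get`, then the class name. A minimal sketch of unpacking it follows; the helper name and the hand-written sample response are assumptions, not real query output.

```python
def extract_objects(response, class_name):
    """Return the result list for a class, raising if the query failed."""
    if "errors" in response:
        messages = [err.get("message", "unknown error") for err in response["errors"]]
        raise RuntimeError("; ".join(messages))
    return response.get("data", {}).get("Get", {}).get(class_name) or []


# Hand-written sample in the expected shape; not real query output.
sample_response = {
    "data": {
        "Get": {
            "Article": [
                {"title": "Intro to embeddings", "content": "..."},
                {"title": "Neural networks", "content": "..."},
            ]
        }
    }
}

for article in extract_objects(sample_response, "Article"):
    print(article["title"])
```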
Hybrid Search Example
# Combine vector search with filtering
where_filter = {
    "path": ["category"],
    "operator": "Equal",
    "valueText": "technology",
}

result = (
    client.query.get("Article", ["title", "content", "category"])
    .with_near_text({"concepts": ["AI innovation"]})
    .with_where(where_filter)
    .with_limit(5)
    .do()
)
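Filters of this shape can also be nested with the `And` and `Or` operators. The following sketch builds a compound filter from small helpers; the helper names and the `wordCount` property are hypothetical and used only for illustration.

```python
def equal_text(path, value):
    """Filter: a text property equals the given value."""
    return {"path": [path], "operator": "Equal", "valueText": value}


def greater_than_int(path, value):
    """Filter: an integer property is greater than the given value."""
    return {"path": [path], "operator": "GreaterThan", "valueInt": value}


def all_of(*filters):
    """Combine filters so that every one of them must match."""
    return {"operator": "And", "operands": list(filters)}


# "wordCount" is a hypothetical property used only for illustration.
compound_filter = all_of(
    equal_text("category", "technology"),
    greater_than_int("wordCount", 500),
)
print(compound_filter)
```

The resulting dictionary can be passed to `with_where` in place of the single-condition filter shown above.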
Docker Compose Setup
version: '3.8'
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:latest
    restart: on-failure:0
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-transformers'
      # Note: text2vec-transformers also needs a separate inference container
      # and TRANSFORMERS_INFERENCE_API pointing at it (not shown here).
      ENABLE_MODULES: 'text2vec-transformers,generative-openai'
      CLUSTER_HOSTNAME: 'node1'
    volumes:
      - weaviate_data:/var/lib/weaviate
volumes:
  weaviate_data:
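After `docker compose up -d`, the container may take a few seconds before it accepts requests. A minimal sketch of a readiness poll follows, assuming the instance is exposed on port 8080 as configured above and answers on the `/v1/.well-known/ready` endpoint. The function names, retry count, and delay are illustrative choices.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url, timeout=2.0):
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        url = f"{base_url}/v1/.well-known/ready"
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(base_url, attempts=30, delay=1.0, probe=is_ready, sleep=time.sleep):
    """Poll the probe until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if probe(base_url):
            return True
        sleep(delay)
    return False


# Example call (requires a running instance):
# wait_until_ready("http://localhost:8080")
```

The `probe` and `sleep` parameters exist so the loop can be exercised without a running server.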