GitHub Overview

valkey-io/valkey

A flexible distributed key-value database that is optimized for caching and other realtime workloads.

Stars: 22,372
Watchers: 119
Forks: 952
Created: March 22, 2024
Language: C
License: Other

Topics

cache, database, key-value, key-value-store, nosql, redis, valkey, valkey-client



Valkey + Vector Search

Overview

Valkey is an open-source fork of Redis developed under the Linux Foundation. Like Redis, it is a fast in-memory data store, and vector search capabilities are available through extension modules. It maintains Redis compatibility while adding new features through community-driven development.

Details

Valkey was created in 2024 as a Linux Foundation project following Redis's license change. Forked from Redis 7.2.4, it maintains Redis compatibility while evolving independently. Vector search functionality is provided through a module system similar to Redis's, enabling fast in-memory vector search.

Key features of Valkey vector search:

  • Redis-compatible in-memory vector search
  • Flat index and HNSW algorithms
  • Real-time index updates
  • Hybrid search (vector + text + numeric)
  • Multiple distance metrics
  • Horizontal scaling support
  • High availability and replication
  • Real-time notifications via Pub/Sub
  • Transaction support (see the sketch after this list)
  • Open source (BSD-3-Clause)
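
As a quick illustration of the transaction support listed above, here is a minimal sketch using redis-py's transactional pipeline against a local instance (the key names are illustrative, not part of any Valkey API):

import redis

client = redis.Redis(host='localhost', port=6379, decode_responses=True)

# Atomically update a document and bump its version counter (MULTI/EXEC)
with client.pipeline(transaction=True) as pipe:
    pipe.hset('doc:1', mapping={'title': 'Updated title'})
    pipe.incr('doc:1:version')
    pipe.execute()  # All queued commands run as one transaction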

Architecture Features

  • Single-threaded event loop
  • Asynchronous I/O
  • Replication and failover
  • Distributed with cluster mode (see the inspection snippet below)
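
These properties can be inspected at runtime through the standard INFO command; a small sketch, assuming a local instance:

import redis

client = redis.Redis(host='localhost', port=6379, decode_responses=True)

# Replication role (master or replica) from the replication section
print(client.info('replication')['role'])

# Whether the server is running in cluster mode (0 or 1)
print(client.info('cluster')['cluster_enabled'])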

Pros and Cons

Pros

  • Fully open source: BSD-3-Clause license allows free commercial use
  • Redis compatible: Existing Redis applications work as-is
  • High performance: Low latency through in-memory processing
  • Community-driven: Transparent development under Linux Foundation
  • Future-proof: Support from major companies (AWS, Google, Oracle, etc.)
  • Stable licensing: No concerns about future license changes

Cons

  • New project: Ecosystem still developing
  • Memory cost: Keeps all data in memory
  • Module compatibility: Some Redis modules awaiting porting
  • Documentation: Less documentation compared to Redis
  • Enterprise support: Limited commercial support options


Usage Examples

Setup and Installation

# Run with Docker
docker run -d --name valkey \
  -p 6379:6379 \
  valkey/valkey:latest

# Load a vector search module (the .so path below is a placeholder, not a shipped binary)
docker exec valkey valkey-cli MODULE LOAD /path/to/vector_module.so
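
Once the container is running, a quick check from Python (assuming the redis-py client, which works against Valkey's Redis-compatible protocol) confirms connectivity and lists the loaded modules:

import redis

client = redis.Redis(host='localhost', port=6379, decode_responses=True)

print(client.ping())         # True if the server is reachable
print(client.module_list())  # Modules loaded via MODULE LOAD (empty if none)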

Basic Operations with Python

import redis
import numpy as np
import json

# Valkey connection (Redis-compatible protocol, so redis-py works as-is)
# decode_responses=True is safe here because the binary embedding field
# is never returned by the queries below
client = redis.Redis(host='localhost', port=6379, decode_responses=True)

# Create vector index (RediSearch-compatible FT.CREATE syntax)
def create_vector_index():
    try:
        client.execute_command(
            'FT.CREATE', 'vector_idx',
            'ON', 'HASH',
            'PREFIX', '1', 'doc:',
            'SCHEMA',
            'title', 'TEXT',
            'content', 'TEXT',
            'category', 'TAG',
            # '10' is the count of attribute tokens that follow the HNSW keyword
            'embedding', 'VECTOR', 'HNSW', '10',
            'TYPE', 'FLOAT32',
            'DIM', '768',
            'DISTANCE_METRIC', 'COSINE',
            'M', '16',
            'EF_CONSTRUCTION', '200'
        )
        print("Vector index created successfully")
    except redis.ResponseError as e:
        print(f"Index already exists or error: {e}")

# Insert document
def insert_document(doc_id, title, content, embedding):
    # Convert vector to binary format
    embedding_bytes = np.array(embedding, dtype=np.float32).tobytes()
    
    client.hset(
        f'doc:{doc_id}',
        mapping={
            'title': title,
            'content': content,
            'embedding': embedding_bytes
        }
    )

# Vector search
def vector_search(query_vector, limit=10):
    query_bytes = np.array(query_vector, dtype=np.float32).tobytes()
    
    results = client.execute_command(
        'FT.SEARCH', 'vector_idx',
        f'*=>[KNN {limit} @embedding $vec AS score]',
        'PARAMS', '2', 'vec', query_bytes,
        'SORTBY', 'score',
        'RETURN', '3', 'title', 'content', 'score',
        'DIALECT', '2'
    )
    
    # Parse results
    documents = []
    if results[0] > 0:
        for i in range(1, len(results), 2):
            doc_id = results[i]
            fields = results[i + 1]
            doc = {
                'id': doc_id,
                'title': fields[fields.index('title') + 1],
                'content': fields[fields.index('content') + 1],
                'score': float(fields[fields.index('score') + 1])
            }
            documents.append(doc)
    
    return documents

# Usage example
create_vector_index()

# Insert sample data
embedding = np.random.rand(768).astype(np.float32)
insert_document(
    '1',
    'Valkey Vector Search',
    'Open source in-memory vector search',
    embedding
)

# Execute search
query_embedding = np.random.rand(768).astype(np.float32)
results = vector_search(query_embedding)

for doc in results:
    print(f"Title: {doc['title']}, Score: {doc['score']:.4f}")

Hybrid Search and Filtering

# Combined text and vector search
def hybrid_search(text_query, query_vector, category=None, limit=10):
    query_bytes = np.array(query_vector, dtype=np.float32).tobytes()
    
    # Build filter conditions
    filter_clause = f"@content:{text_query}"
    if category:
        filter_clause += f" @category:{{{category}}}"
    
    results = client.execute_command(
        'FT.SEARCH', 'vector_idx',
        f'({filter_clause})=>[KNN {limit} @embedding $vec AS score]',
        'PARAMS', '2', 'vec', query_bytes,
        'SORTBY', 'score',
        'RETURN', '4', 'title', 'content', 'category', 'score',
        'DIALECT', '2'
    )
    
    return parse_search_results(results)

# Generic FT.SEARCH reply parser (mirrors the parsing in vector_search above)
def parse_search_results(results):
    documents = []
    if results and results[0] > 0:
        for i in range(1, len(results), 2):
            fields = results[i + 1]
            # Field arrays alternate name/value pairs
            doc = dict(zip(fields[::2], fields[1::2]))
            doc['id'] = results[i]
            documents.append(doc)
    return documents

# Real-time updates and Pub/Sub
def setup_realtime_notifications():
    pubsub = client.pubsub()
    pubsub.subscribe('vector_updates')
    
    def handle_updates():
        for message in pubsub.listen():
            if message['type'] == 'message':
                data = json.loads(message['data'])
                print(f"Vector updated: {data['doc_id']}")
                # Update index as needed
    
    return handle_updates
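
# The matching publisher side is a single call after each write, for example:
def notify_vector_update(doc_id):
    client.publish('vector_updates', json.dumps({'doc_id': doc_id}))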

# Batch processing
def batch_insert_vectors(documents):
    pipeline = client.pipeline()
    
    for doc in documents:
        doc_id = doc['id']
        embedding_bytes = np.array(doc['embedding'], dtype=np.float32).tobytes()
        
        pipeline.hset(
            f'doc:{doc_id}',
            mapping={
                'title': doc['title'],
                'content': doc['content'],
                'embedding': embedding_bytes,
                'category': doc.get('category', 'general')
            }
        )
    
    # Execute batch
    pipeline.execute()
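
For example, the batch helper above can load a set of synthetic documents in a single pipelined round trip:

docs = [
    {
        'id': str(i),
        'title': f'Document {i}',
        'content': 'Sample content for batch insertion',
        'category': 'demo',
        'embedding': np.random.rand(768).astype(np.float32),
    }
    for i in range(100)
]
batch_insert_vectors(docs)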

Performance Optimization

# Memory usage optimization
def optimize_memory_usage():
    # Check memory usage
    info = client.info('memory')
    print(f"Used memory: {info['used_memory_human']}")
    
    # Get a memory health report (diagnostic only; MEMORY DOCTOR does not delete keys)
    print(client.execute_command('MEMORY', 'DOCTOR'))
    
    # Set eviction policy
    client.config_set('maxmemory-policy', 'allkeys-lru')
    client.config_set('maxmemory', '4gb')

# Rebuild index
def rebuild_index():
    # Drop the existing index; omitting 'DD' keeps the underlying hashes
    try:
        client.execute_command('FT.DROPINDEX', 'vector_idx')
    except redis.ResponseError:
        pass

    # Create a new index with optimized parameters; hashes matching the
    # 'doc:' prefix are picked up again by the background scan
    create_vector_index()

    # Optionally walk the existing documents
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor, match='doc:*', count=100)
        # Process each document here as needed
        if cursor == 0:
            break

# Replication and persistence configuration
def setup_cluster_config():
    # Point this node at a primary ('master-host' is a placeholder)
    client.execute_command('REPLICAOF', 'master-host', '6379')

    # Persistence settings: RDB snapshots plus an append-only file
    client.config_set('save', '900 1 300 10 60 10000')
    client.config_set('appendonly', 'yes')
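
A quick sanity check after applying these settings (a sketch; the values simply echo what was configured above):

print(client.config_get('maxmemory-policy'))
print(client.config_get('appendonly'))
print(client.info('persistence')['aof_enabled'])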