GitHub Overview

mongodb/mongo

The MongoDB Database

Stars: 27,365
Watchers: 1,255
Forks: 5,657
Created: January 15, 2009
Language: C++
License: Other

Topics

c-plus-plus, database, mongodb, nosql

Star History (mongodb/mongo); data as of 7/30/2025, 02:37 AM

Database

MongoDB Atlas Vector Search

Overview

MongoDB Atlas Vector Search is a vector search capability built into MongoDB, the popular document-oriented NoSQL database. Offered as part of MongoDB Atlas, MongoDB's fully managed cloud service, it lets you run vector search on your existing MongoDB data and infrastructure. Combining the flexible document model with vector search makes it well suited to building retrieval-augmented generation (RAG) applications and recommendation systems.

Details

MongoDB was developed by 10gen (now MongoDB Inc.), which began work on it in 2007 and released it as open source in 2009; it has since become one of the most popular NoSQL databases. MongoDB Atlas Vector Search was introduced in 2023, adding storage and search of vector embeddings. It runs on the same Apache Lucene-based engine as Atlas Search, which provides its approximate nearest neighbor index.

Key features of MongoDB Atlas Vector Search:

  • Integration of document database and vector search
  • Approximate nearest neighbor (ANN) search using the Hierarchical Navigable Small Worlds (HNSW) algorithm
  • Support for high-dimensional vectors (up to 4096 dimensions)
  • Similarity calculations using Euclidean distance, cosine similarity, and dot product
  • Pre-filtering and post-filtering
  • Automatic index management
  • Advanced search features through Atlas Search integration
  • Vector search across globally distributed clusters
  • Multi-tenant support
  • Enterprise-grade security
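The three similarity functions listed above are standard vector metrics. The plain-Python sketch below computes each one to show how they differ; it illustrates the math only and is not how Atlas computes or normalizes its `vectorSearchScore`. The function names are hypothetical. A `normalize` helper is included because MongoDB's documentation calls for unit-length vectors when an index uses `dotProduct`; on unit vectors, dot product and cosine similarity give the same value.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between the vectors, in [-1, 1]; ignores length."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_product(a, b) / (norm_a * norm_b)


def normalize(v: Sequence[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot_product(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


a, b = [3.0, 4.0], [4.0, 3.0]
print(dot_product(a, b))         # 24.0
print(euclidean_distance(a, b))  # 1.4142135623730951
print(cosine_similarity(a, b))   # 0.96
# On unit vectors the dot product equals the cosine similarity (up to rounding)
print(dot_product(normalize(a), normalize(b)))
```

Cosine similarity is the usual choice when embedding magnitudes carry no meaning; Euclidean distance is sensitive to magnitude as well as direction.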

Architecture Features

  • Apache Lucene-based search engine
  • Distributed replica set architecture
  • Automatic sharding and load balancing
  • Real-time data synchronization

Pros and Cons

Pros

  • Integrated solution: Manage structured, unstructured, and vector data on one platform
  • Ease of development: Leverage existing MongoDB APIs and toolchain
  • Flexible data model: Combination of document flexibility and vector search
  • Managed service: No infrastructure management required
  • Global scale: Easy deployment across regions worldwide
  • Enterprise features: Comprehensive security, auditing, and compliance capabilities
  • Rich ecosystem: Integration with many frameworks and tools

Cons

  • Cost: Managed-service pricing can become expensive, especially at scale
  • Vendor lock-in: Atlas-only features make migration difficult
  • Performance: May lag behind dedicated vector databases
  • Feature limitations: Fewer vector-specific features than purpose-built vector databases
  • Learning curve: Requires knowledge of both MongoDB and vector search

Usage Examples

Setup and Index Creation

// MongoDB connection (Node.js)
const { MongoClient } = require('mongodb');

// Replace the placeholders with your Atlas connection string
const client = new MongoClient('mongodb+srv://<username>:<password>@<cluster-host>/');

// Vector search index definition
const vectorSearchIndex = {
  name: "vector_index",
  type: "vectorSearch",
  definition: {
    fields: [
      {
        type: "vector",
        path: "embedding",
        numDimensions: 768,
        similarity: "cosine"
      },
      {
        type: "filter",
        path: "category"
      },
      {
        type: "filter", 
        path: "metadata.year"
      }
    ]
  }
};

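// Create the index with the driver's createSearchIndex() helper (requires a
// driver version that accepts the "vectorSearch" type), or via the Atlas UI
// or Atlas Admin API. The index is built asynchronously.
async function createVectorIndex() {
  const collection = client.db('vectordb').collection('documents');
  await collection.createSearchIndex(vectorSearchIndex);
}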
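Atlas builds a vector search index asynchronously, so a `$vectorSearch` query issued right after creating the index may not return results yet. The Python sketch below polls until the index reports itself queryable. It assumes PyMongo's `Collection.list_search_indexes()`, whose entries include `name` and `queryable` fields; the helper name, timeout, and poll interval are hypothetical choices. (In recent PyMongo versions the index itself can be created with `collection.create_search_index()` and a `SearchIndexModel` whose `type` is `"vectorSearch"`.)

```python
import time
from typing import Iterable, Optional, Protocol


class SearchIndexLister(Protocol):
    """Anything with list_search_indexes(), e.g. a PyMongo Collection (assumed)."""

    def list_search_indexes(self, name: Optional[str] = None) -> Iterable[dict]: ...


def wait_until_queryable(
    collection: SearchIndexLister,
    index_name: str,
    timeout_s: float = 300.0,
    poll_interval_s: float = 5.0,
) -> dict:
    """Poll until the named search index reports queryable=True, then return it."""
    deadline = time.monotonic() + timeout_s
    while True:
        for index in collection.list_search_indexes(index_name):
            if index.get("name") == index_name and index.get("queryable"):
                return index
        if time.monotonic() >= deadline:
            raise TimeoutError(f"search index {index_name!r} is still not queryable")
        time.sleep(poll_interval_s)


# Usage with PyMongo (database and collection names as used in this guide):
#   wait_until_queryable(client.vectordb.documents, "vector_index")
```

Typing the parameter as a small protocol keeps the helper independent of the driver, so it can be exercised against a stub without a live cluster.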

Document Insertion

async function insertDocuments() {
  const db = client.db('vectordb');
  const collection = db.collection('documents');
  
  // Insert documents
  const documents = [
    {
      title: "MongoDB Vector Search",
      content: "How to implement vector search with MongoDB",
      embedding: Array(768).fill(0).map(() => Math.random()),
      category: "database",
      metadata: {
        year: 2024,
        author: "MongoDB Team"
      }
    }
  ];
  
  await collection.insertMany(documents);
}
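Each `embedding` array has to match the index definition: exactly `numDimensions` numeric elements (768 here). A cheap client-side check before inserting catches vectors of the wrong length or with non-numeric or non-finite values, which the index cannot use for similarity search. A minimal Python sketch; the function name and the top-level-field assumption are hypothetical (nested paths such as `metadata.vec` are not handled).

```python
import math
from typing import Any, Iterable

EXPECTED_DIMENSIONS = 768  # must match numDimensions in the index definition


def _is_finite_number(x: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(x, (int, float)) and not isinstance(x, bool) and math.isfinite(x)


def invalid_embeddings(
    documents: Iterable[dict[str, Any]],
    path: str = "embedding",
    num_dimensions: int = EXPECTED_DIMENSIONS,
) -> list[int]:
    """Return positions of documents whose top-level vector field would not fit the index."""
    bad = []
    for i, doc in enumerate(documents):
        vector = doc.get(path)
        if (
            not isinstance(vector, list)
            or len(vector) != num_dimensions
            or not all(_is_finite_number(x) for x in vector)
        ):
            bad.append(i)
    return bad


docs = [{"embedding": [0.1] * 768}, {"embedding": [0.1] * 767}]
print(invalid_embeddings(docs))  # [1]
```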

Vector Search Execution

async function vectorSearch(queryVector) {
  const db = client.db('vectordb');
  const collection = db.collection('documents');
  
  // Vector search pipeline
  const pipeline = [
    {
      $vectorSearch: {
        index: "vector_index",
        path: "embedding",
        queryVector: queryVector,
        numCandidates: 100,
        limit: 10,
        filter: {
          category: "database"
        }
      }
    },
    {
      $project: {
        title: 1,
        content: 1,
        score: { $meta: "vectorSearchScore" }
      }
    }
  ];
  
  const results = await collection.aggregate(pipeline).toArray();
  return results;
}
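The `numCandidates` and `limit` pair controls the ANN trade-off: the search considers `numCandidates` nearest neighbors and returns the best `limit`. One way to tune it is to compare ANN results with an exact nearest neighbor (ENN) search over a sample of queries. The sketch below is written in Python; it assumes the `$vectorSearch` option `exact: true` (which replaces `numCandidates` on clusters that support ENN) and defaults to 20 times the limit as a starting point, in line with MongoDB's published guidance. The helper names are hypothetical.

```python
from typing import Any, Optional, Sequence


def build_vector_search_stage(
    query_vector: Sequence[float],
    limit: int = 10,
    num_candidates: Optional[int] = None,
    exact: bool = False,
    index: str = "vector_index",
    path: str = "embedding",
) -> dict[str, Any]:
    """Build a $vectorSearch stage for ANN (default) or exact (ENN) search."""
    stage: dict[str, Any] = {
        "index": index,
        "path": path,
        "queryVector": list(query_vector),
        "limit": limit,
    }
    if exact:
        # ENN compares against every indexed vector; numCandidates is not used
        stage["exact"] = True
    else:
        candidates = num_candidates if num_candidates is not None else 20 * limit
        if candidates < limit:
            raise ValueError("numCandidates must be at least limit")
        stage["numCandidates"] = candidates
    return {"$vectorSearch": stage}


def recall_at_k(ann_ids: Sequence[Any], exact_ids: Sequence[Any], k: int) -> float:
    """Fraction of the true top-k (from ENN) that the ANN query also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 1.0
    return len(truth & set(ann_ids[:k])) / len(truth)
```

In practice you would run both stages through `collection.aggregate(...)`, collect the `_id` values in order, and pass them to `recall_at_k`; raising `numCandidates` generally buys recall at the cost of latency.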

Python Implementation

from pymongo import MongoClient
import numpy as np
from datetime import datetime

# Connection
# Replace the placeholders with your Atlas connection string
client = MongoClient("mongodb+srv://<username>:<password>@<cluster-host>/")
db = client.vectordb
collection = db.documents

# Insert documents
documents = [
    {
        "title": "MongoDB Atlas Vector Search",
        "content": "Enabling large-scale vector search",
        "embedding": np.random.rand(768).tolist(),
        "category": "database",
        "metadata": {
            "year": 2024,
            "tags": ["nosql", "vector", "search"]
        },
        "created_at": datetime.now()
    }
]

collection.insert_many(documents)

# Vector search
def vector_search(query_vector, filters=None):
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": 200,
                "limit": 10
            }
        }
    ]
    
    # Add filters
    if filters:
        pipeline[0]["$vectorSearch"]["filter"] = filters
    
    # Projection
    pipeline.append({
        "$project": {
            "title": 1,
            "content": 1,
            "category": 1,
            "score": {"$meta": "vectorSearchScore"}
        }
    })
    
    results = list(collection.aggregate(pipeline))
    return results

# Execute search
query_embedding = np.random.rand(768).tolist()
results = vector_search(
    query_embedding,
    filters={"category": "database", "metadata.year": {"$gte": 2023}}
)
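The `filter` passed to `$vectorSearch` is a pre-filter: it is applied during the search and may only reference fields declared with `type: "filter"` in the index definition (`category` and `metadata.year` in the setup example). Any other condition, such as one on `metadata.tags`, has to be a post-filter: a regular `$match` after the search, which can leave fewer than `limit` results. A minimal pipeline builder supporting both, assuming the `vector_index` definition from the setup section; the function name is hypothetical.

```python
from typing import Any, Optional, Sequence


def build_filtered_pipeline(
    query_vector: Sequence[float],
    pre_filter: Optional[dict[str, Any]] = None,
    post_filter: Optional[dict[str, Any]] = None,
    limit: int = 10,
    num_candidates: int = 200,
) -> list[dict[str, Any]]:
    """Pre-filter inside $vectorSearch; post-filter with $match afterwards."""
    vector_stage: dict[str, Any] = {
        "index": "vector_index",
        "path": "embedding",
        "queryVector": list(query_vector),
        "numCandidates": num_candidates,
        "limit": limit,
    }
    if pre_filter:
        # Pre-filter: every field it references must be a "filter" field in the index
        vector_stage["filter"] = pre_filter
    pipeline: list[dict[str, Any]] = [{"$vectorSearch": vector_stage}]
    if post_filter:
        # Post-filter: ordinary $match on the search results; any field or operator
        pipeline.append({"$match": post_filter})
    pipeline.append(
        {"$project": {"title": 1, "content": 1, "score": {"$meta": "vectorSearchScore"}}}
    )
    return pipeline


pipeline = build_filtered_pipeline(
    [0.0] * 768,
    pre_filter={"category": "database", "metadata.year": {"$gte": 2023}},
    post_filter={"metadata.tags": "vector"},
)
# results = list(collection.aggregate(pipeline))
```

Pre-filtering is usually preferable when the field can be indexed, because the search then spends its candidates only on matching documents.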

Hybrid Search

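# Combined text and vector search
# $vectorSearch must be the first stage of its pipeline, and $text may only
# appear in the first $match stage, so the two cannot be chained in one
# pipeline (and textScore is unavailable after $vectorSearch). Instead, run
# both queries separately and merge the scores in Python.
# Requires a text index on "content", e.g.:
#   collection.create_index([("content", "text")])
def hybrid_search(text_query, query_vector, vector_weight=0.7, text_weight=0.3, limit=10):
    vector_results = collection.aggregate([
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": 100,
                "limit": 20
            }
        },
        {
            "$project": {
                "title": 1,
                "content": 1,
                "vectorScore": {"$meta": "vectorSearchScore"}
            }
        }
    ])
    text_results = collection.find(
        {"$text": {"$search": text_query}},
        {"title": 1, "content": 1, "textScore": {"$meta": "textScore"}}
    ).sort([("textScore", {"$meta": "textScore"})]).limit(20)

    # Merge by _id; a document missing from one result set scores 0 there.
    # The two scores use different scales, so tune or normalize the weights.
    merged = {}
    for doc in vector_results:
        merged[doc["_id"]] = {**doc, "textScore": 0.0}
    for doc in text_results:
        entry = merged.setdefault(doc["_id"], {**doc, "vectorScore": 0.0})
        entry["textScore"] = doc["textScore"]
    for doc in merged.values():
        doc["combinedScore"] = (
            vector_weight * doc["vectorScore"] + text_weight * doc["textScore"]
        )

    ranked = sorted(merged.values(), key=lambda d: d["combinedScore"], reverse=True)
    return ranked[:limit]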
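Because `vectorSearchScore` and `textScore` are on unrelated scales, a fixed 0.7/0.3 weighted sum can be dominated by whichever score happens to be numerically larger. Reciprocal rank fusion (RRF) sidesteps this by combining ranks instead of raw scores: each document receives the sum of w / (k + rank) over every list it appears in. This is a swapped-in technique, not the weighting used above; MongoDB's own hybrid-search guidance also uses rank fusion. The pure-Python sketch below takes lists of `_id`s already ordered best-first (for example from a vector query and a text query); k = 60 is the constant commonly used in the RRF literature, and the function name is hypothetical.

```python
from typing import Hashable, Optional, Sequence


def reciprocal_rank_fusion(
    ranked_lists: Sequence[Sequence[Hashable]],
    weights: Optional[Sequence[float]] = None,
    k: int = 60,
) -> list[tuple[Hashable, float]]:
    """Merge best-first ID lists into one ranking using weighted RRF scores."""
    lists = list(ranked_lists)
    if weights is None:
        weights = [1.0] * len(lists)
    if len(weights) != len(lists):
        raise ValueError("need one weight per ranked list")
    scores: dict[Hashable, float] = {}
    for weight, ids in zip(weights, lists):
        # Ranks start at 1, so the top hit in a list contributes weight / (k + 1)
        for rank, doc_id in enumerate(ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_ids = ["a", "b", "c"]
text_ids = ["c", "a", "d"]
print([doc_id for doc_id, _ in reciprocal_rank_fusion([vector_ids, text_ids])])
# ['a', 'c', 'b', 'd']
```

Documents that rank well in both lists rise to the top even if neither raw score is extreme, and the optional weights play the same role as 0.7 and 0.3 above.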

Batch Processing and Optimization

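# Batch embedding processing
# Expects one NumPy array per text (hence .tolist())
def batch_embed_and_insert(texts, embeddings):
    if len(texts) != len(embeddings):
        raise ValueError("texts and embeddings must have the same length")
    # One ID shared by every document inserted in this call
    batch_id = datetime.now().isoformat()
    documents = []
    for i, (text, embedding) in enumerate(zip(texts, embeddings)):
        doc = {
            "content": text,
            "embedding": embedding.tolist(),
            "metadata": {
                "batch_id": batch_id,
                "index": i
            }
        }
        documents.append(doc)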
    
    # Bulk insert
    if documents:
        collection.insert_many(documents, ordered=False)

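# Get index statistics
def get_index_stats():
    # collStats reports the collection's regular indexes only (and is
    # deprecated since MongoDB 6.2 in favor of the $collStats stage)
    stats = db.command("collStats", "documents", indexDetails=True)
    # Atlas Search / Vector Search indexes are listed separately,
    # including their build status and whether they are queryable
    search_indexes = list(collection.list_search_indexes())
    return {"collStats": stats, "searchIndexes": search_indexes}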
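For large corpora it can help to stream documents into the collection in fixed-size batches instead of building one huge list in memory. PyMongo already splits a single `insert_many()` call into server-sized batches, so the main benefits here are bounded memory when documents come from a generator and a natural place for per-batch logging or retries. A minimal sketch with hypothetical helper names; `collection` is anything with an `insert_many(documents, ordered=...)` method, such as a PyMongo `Collection`.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, Protocol


class SupportsInsertMany(Protocol):
    """Anything with insert_many(), e.g. a PyMongo Collection (assumed)."""

    def insert_many(self, documents: list[dict[str, Any]], ordered: bool = True) -> Any: ...


def chunked(items: Iterable[Any], size: int) -> Iterator[list[Any]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def insert_in_batches(
    collection: SupportsInsertMany,
    documents: Iterable[dict[str, Any]],
    batch_size: int = 1000,
) -> int:
    """Insert documents batch by batch; returns the number of documents submitted."""
    total = 0
    for batch in chunked(documents, batch_size):
        # ordered=False lets the server continue past individual failures
        collection.insert_many(batch, ordered=False)
        total += len(batch)
    return total


# Usage (documents generated lazily, e.g. as embeddings are computed):
#   insert_in_batches(collection, ({"content": t} for t in texts), batch_size=500)
```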