MongoDB

Document-oriented NoSQL database that stores data in JSON-like documents, designed for scalability and developer productivity.

Database Server · NoSQL Database · Document-Oriented · Distributed System · Scalable · Big Data · Real-time Analytics · Cloud Native


Overview

MongoDB is a document-oriented NoSQL database first released by 10gen (now MongoDB Inc.) in 2009. It stores data in BSON, a JSON-like binary format, and provides flexible schema design, automatic sharding, and high availability through replica sets, making it well suited to modern application development. MongoDB 8.0 (released in 2024) brings substantial improvements: according to MongoDB, roughly 25% better overall performance, enhanced Queryable Encryption, background compression, and up to 60% faster time series processing. Combined with the MongoDB Atlas cloud service, it offers automatic scaling, hardened security, and multi-cloud deployment at global scale, and is adopted by organizations from startups to large enterprises. Developer-friendly APIs and a rich ecosystem support rapid application development and operations.

Details

MongoDB 8.0 positions itself as a comprehensive data platform that moves beyond the limits of traditional NoSQL databases. The current release strengthens real-time analytics, machine learning integration, Queryable Encryption, time series optimization, and AI-oriented vector search. MongoDB Atlas advertises 50% faster automatic scaling, 5x faster resource responsiveness, and unified full-text and vector search through Atlas Search, covering most needs of modern data-driven applications. In addition, Relational Migrator assists migration from existing RDBMSs, Atlas Data Federation enables data lake integration, and operational tools including Charts, Compass, and Ops Manager address enterprise-level requirements. ACID-compliant multi-document transactions, horizontal scalability through a distributed architecture, and real-time change notification via Change Streams make it a viable foundation for mission-critical systems.

Key Features

  • Document-Oriented: Intuitive and flexible data modeling through JSON-like BSON
  • Automatic Sharding: Automatic data distribution and query load balancing
  • High Availability: Automatic failover and data redundancy through replica sets
  • ACID Compliance: Strong consistency through multi-document transactions (see the transaction sketch after this list)
  • Atlas Cloud: Zero operational overhead and global deployment through managed services
  • Real-time Analytics: High-speed data processing through aggregation pipelines and Time Series
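
The ACID-compliant multi-document transactions listed above are exposed through driver sessions. A minimal sketch with PyMongo, assuming a replica set (transactions are unavailable on a standalone mongod) and hypothetical orders and inventory collections:

from pymongo import MongoClient

# Transactions require a replica set or sharded cluster.
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client.ecommerce

def place_order(session):
    # Both writes commit together or abort together.
    db.orders.insert_one({"item": "abc", "qty": 1}, session=session)
    db.inventory.update_one(
        {"item": "abc", "qty": {"$gte": 1}},
        {"$inc": {"qty": -1}},
        session=session,
    )

with client.start_session() as session:
    # with_transaction retries on transient errors and commits on success.
    session.with_transaction(place_order)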

Pros and Cons

Pros

  • Read/write performance often cited as 30-50% faster than relational databases for document-centric workloads, with horizontal scalability
  • Flexible schema design supporting agile development and microservices
  • Automatic scaling, backup, and security operations through Atlas Cloud
  • Rich programming language drivers and framework integrations
  • Real-time data change detection and distribution through Change Streams (see the watch sketch after this list)
  • Built-in support for geospatial data, full-text search, and time series data
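
Change Streams, cited in the list above, let an application subscribe to data changes without polling. A minimal PyMongo sketch, assuming a replica set (Change Streams require an oplog) and the ecommerce.products collection used later on this page:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
products = client.ecommerce.products

# Watch only inserts and updates; full_document="updateLookup" returns the
# post-change document for update events.
pipeline = [{"$match": {"operationType": {"$in": ["insert", "update"]}}}]
with products.watch(pipeline, full_document="updateLookup") as stream:
    for change in stream:
        print(change["operationType"], change["documentKey"])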

Cons

  • Weaker than RDBMSs at complex JOIN-heavy queries and relational transaction processing
  • Higher memory usage and storage costs than comparable RDBMS deployments
  • Steeper learning curve for operations and debugging due to distributed-system complexity
  • Potential data consistency risks from schema-less design
  • MongoDB Atlas licensing costs and cloud provider dependency
  • Performance overhead and design trade-offs for ACID guarantees

Code Examples

Installation and Basic Setup

# MongoDB Community Edition installation on Ubuntu/Debian
# Import the MongoDB official GPG key (apt-key is deprecated on current Ubuntu/Debian)
curl -fsSL https://www.mongodb.org/static/pgp/server-8.0.asc | \
  sudo gpg -o /usr/share/keyrings/mongodb-server-8.0.gpg --dearmor

# Add MongoDB repository
echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-8.0.gpg ] https://repo.mongodb.org/apt/ubuntu $(lsb_release -cs)/mongodb-org/8.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-8.0.list

# Update package database and install MongoDB
sudo apt-get update
sudo apt-get install -y mongodb-org

# Start and enable MongoDB service
sudo systemctl enable mongod
sudo systemctl start mongod
sudo systemctl status mongod

# MongoDB environment setup using Docker Compose
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  mongodb:
    image: mongo:8.0
    container_name: mongodb
    restart: unless-stopped
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: password123
      MONGO_INITDB_DATABASE: myapp
    ports:
      - "27017:27017"
    volumes:
      - mongodb_data:/data/db
      - mongodb_config:/data/configdb
      - ./mongod.conf:/etc/mongod.conf
    command: --config /etc/mongod.conf

  mongo-express:
    image: mongo-express:latest
    container_name: mongo-express
    restart: unless-stopped
    ports:
      - "8081:8081"
    environment:
      ME_CONFIG_MONGODB_ADMINUSERNAME: admin
      ME_CONFIG_MONGODB_ADMINPASSWORD: password123
      ME_CONFIG_MONGODB_URL: mongodb://admin:password123@mongodb:27017/
      ME_CONFIG_BASICAUTH: false
    depends_on:
      - mongodb

volumes:
  mongodb_data:
  mongodb_config:
EOF

# Start services
docker-compose up -d

# Verify operation
curl http://localhost:8081

# MongoDB configuration optimization
# (create this file before docker-compose up when it is mounted as above)
cat > mongod.conf << 'EOF'
# Network settings (inside Docker, bind to all interfaces so other
# containers can connect; keep 127.0.0.1 on bare-metal single-host setups)
net:
  port: 27017
  bindIp: 0.0.0.0

# Storage settings (storage.journal.enabled was removed in MongoDB 6.1+;
# journaling is always on, and the old option prevents startup)
storage:
  dbPath: /data/db
  wiredTiger:
    engineConfig:
      cacheSizeGB: 2
    collectionConfig:
      blockCompressor: snappy
    indexConfig:
      prefixCompression: true

# System logging
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

# Process management (leave disabled under Docker or systemd; both expect
# mongod to stay in the foreground)
# processManagement:
#   fork: true
#   pidFilePath: /var/run/mongodb/mongod.pid

# Security settings
security:
  authorization: enabled

# Replication settings (enabling a replica set together with authorization
# also requires a keyFile for internal member authentication)
# replication:
#   replSetName: rs0

# Sharding settings
# sharding:
#   clusterRole: shardsvr
EOF

Basic Database Operations and CRUD

// MongoDB Shell connection
mongosh "mongodb://localhost:27017"

// Database and collection creation
use ecommerce
db.createCollection("products")

// Document insertion (single)
db.products.insertOne({
  name: "MacBook Pro 16-inch",
  description: "High-performance laptop with Apple M3 Max chip",
  price: 2499.99,
  category: "laptops",
  brand: "Apple",
  specifications: {
    processor: "Apple M3 Max",
    memory: "32GB",
    storage: "1TB SSD",
    display: "16-inch Retina"
  },
  tags: ["professional", "creative", "developer"],
  inStock: true,
  quantity: 25,
  ratings: {
    average: 4.8,
    count: 1247
  },
  createdAt: new Date(),
  updatedAt: new Date()
})

// Bulk document insertion
db.products.insertMany([
  {
    name: "Dell XPS 13",
    description: "Ultra-portable laptop for business professionals",
    price: 1299.99,
    category: "laptops",
    brand: "Dell",
    specifications: {
      processor: "Intel Core i7",
      memory: "16GB",
      storage: "512GB SSD",
      display: "13.3-inch FHD+"
    },
    tags: ["business", "portable", "productivity"],
    inStock: true,
    quantity: 18,
    ratings: { average: 4.5, count: 832 },
    createdAt: new Date(),
    updatedAt: new Date()
  },
  {
    name: "iPhone 15 Pro",
    description: "Latest iPhone with A17 Pro chip",
    price: 999.99,
    category: "smartphones",
    brand: "Apple",
    specifications: {
      processor: "A17 Pro",
      memory: "8GB",
      storage: "256GB",
      display: "6.1-inch Super Retina XDR"
    },
    tags: ["flagship", "5g", "camera"],
    inStock: true,
    quantity: 42,
    ratings: { average: 4.9, count: 2156 },
    createdAt: new Date(),
    updatedAt: new Date()
  }
])

// Basic queries
// Find all products
db.products.find()

// Find by category
db.products.find({ category: "laptops" })

// Find with price range
db.products.find({
  price: { $gte: 1000, $lte: 2000 }
})

// Complex query with multiple conditions
db.products.find({
  $and: [
    { category: "laptops" },
    { "specifications.memory": "16GB" },
    { inStock: true },
    { quantity: { $gt: 10 } }
  ]
})

// Text search
db.products.createIndex({ name: "text", description: "text" })
db.products.find({ $text: { $search: "laptop professional" } })

// Document update operations
// Update single document
db.products.updateOne(
  { name: "MacBook Pro 16-inch" },
  {
    $set: {
      price: 2399.99,
      updatedAt: new Date()
    },
    $inc: { quantity: -1 }
  }
)

// Update multiple documents
db.products.updateMany(
  { category: "laptops" },
  {
    $set: { "specifications.warranty": "2 years" },
    $currentDate: { updatedAt: true }
  }
)

// Upsert operation ($setOnInsert writes createdAt only when the document is inserted)
db.products.updateOne(
  { name: "Surface Pro 9" },
  {
    $set: {
      description: "2-in-1 tablet PC",
      price: 1299.99,
      category: "tablets",
      brand: "Microsoft",
      inStock: true,
      quantity: 15,
      updatedAt: new Date()
    },
    $setOnInsert: {
      createdAt: new Date()
    }
  },
  { upsert: true }
)

// Document deletion
// Delete single document
db.products.deleteOne({ name: "Old Product" })

// Delete multiple documents
db.products.deleteMany({ 
  $and: [
    { quantity: 0 },
    { inStock: false }
  ]
})

// Index creation and optimization
db.products.createIndex({ category: 1 })
db.products.createIndex({ price: 1 })
db.products.createIndex({ "ratings.average": -1 })
db.products.createIndex({ category: 1, price: 1 }) // Compound index

// Show indexes
db.products.getIndexes()
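
The feature list above also mentions built-in geospatial support. A minimal PyMongo sketch, assuming a hypothetical stores collection with GeoJSON point fields:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
stores = client.ecommerce.stores  # hypothetical collection

# A 2dsphere index enables $near and $geoWithin queries on GeoJSON fields.
stores.create_index([("location", "2dsphere")])
stores.insert_one({
    "name": "Downtown Store",
    "location": {"type": "Point", "coordinates": [-73.9857, 40.7484]},  # [lng, lat]
})

# Find stores within 5 km of a point.
for store in stores.find({
    "location": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [-73.98, 40.75]},
            "$maxDistance": 5000,  # meters
        }
    }
}):
    print(store["name"])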

Advanced Aggregation and Analytics

// Aggregation pipeline examples
// Basic aggregation: Group by category with statistics
db.products.aggregate([
  {
    $group: {
      _id: "$category",
      count: { $sum: 1 },
      averagePrice: { $avg: "$price" },
      maxPrice: { $max: "$price" },
      minPrice: { $min: "$price" },
      totalQuantity: { $sum: "$quantity" }
    }
  },
  { $sort: { averagePrice: -1 } }
])

// Complex aggregation with multiple stages
db.products.aggregate([
  // Stage 1: Filter active products
  {
    $match: {
      inStock: true,
      quantity: { $gt: 0 }
    }
  },
  // Stage 2: Add calculated fields
  {
    $addFields: {
      priceCategory: {
        $switch: {
          branches: [
            { case: { $lt: ["$price", 500] }, then: "Budget" },
            { case: { $lt: ["$price", 1500] }, then: "Mid-range" },
            { case: { $gte: ["$price", 1500] }, then: "Premium" }
          ],
          default: "Unknown"
        }
      },
      stockStatus: {
        $cond: {
          if: { $gte: ["$quantity", 20] },
          then: "High Stock",
          else: "Low Stock"
        }
      }
    }
  },
  // Stage 3: Group by price category and brand
  {
    $group: {
      _id: {
        priceCategory: "$priceCategory",
        brand: "$brand"
      },
      productCount: { $sum: 1 },
      averageRating: { $avg: "$ratings.average" },
      totalValue: { $sum: { $multiply: ["$price", "$quantity"] } },
      products: { $push: "$name" }
    }
  },
  // Stage 4: Sort and format output
  {
    $sort: { "_id.priceCategory": 1, totalValue: -1 }
  },
  // Stage 5: Project final output
  {
    $project: {
      _id: 0,
      priceCategory: "$_id.priceCategory",
      brand: "$_id.brand",
      productCount: 1,
      averageRating: { $round: ["$averageRating", 2] },
      totalValue: { $round: ["$totalValue", 2] },
      topProducts: { $slice: ["$products", 3] }
    }
  }
])

// Faceted search aggregation
db.products.aggregate([
  {
    $facet: {
      "categoryStats": [
        { $group: { _id: "$category", count: { $sum: 1 } } },
        { $sort: { count: -1 } }
      ],
      "brandStats": [
        { $group: { _id: "$brand", avgPrice: { $avg: "$price" } } },
        { $sort: { avgPrice: -1 } }
      ],
      "priceRanges": [
        {
          $bucket: {
            groupBy: "$price",
            boundaries: [0, 500, 1000, 1500, 2000, 5000],
            default: "Other",
            output: {
              count: { $sum: 1 },
              avgRating: { $avg: "$ratings.average" }
            }
          }
        }
      ],
      "topRated": [
        { $match: { "ratings.count": { $gte: 100 } } },
        { $sort: { "ratings.average": -1 } },
        { $limit: 5 },
        { $project: { name: 1, "ratings.average": 1, price: 1 } }
      ]
    }
  }
])

// Time-based analytics with date aggregation
db.sales.aggregate([
  {
    $match: {
      orderDate: {
        $gte: new Date("2024-01-01"),
        $lt: new Date("2025-01-01") // exclusive bound, so all of 2024 is included
      }
    }
  },
  {
    $group: {
      _id: {
        year: { $year: "$orderDate" },
        month: { $month: "$orderDate" },
        day: { $dayOfMonth: "$orderDate" }
      },
      dailySales: { $sum: "$totalAmount" },
      orderCount: { $sum: 1 },
      averageOrderValue: { $avg: "$totalAmount" }
    }
  },
  {
    $sort: { "_id.year": 1, "_id.month": 1, "_id.day": 1 }
  },
  {
    $group: {
      _id: {
        year: "$_id.year",
        month: "$_id.month"
      },
      monthlySales: { $sum: "$dailySales" },
      monthlyOrders: { $sum: "$orderCount" },
      dailyAverages: { $avg: "$dailySales" },
      dailyData: {
        $push: {
          day: "$_id.day",
          sales: "$dailySales",
          orders: "$orderCount"
        }
      }
    }
  }
])
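
For workloads like the daily rollup above, MongoDB 5.0+ also provides native time series collections (referenced in the feature list). A minimal PyMongo sketch, assuming a hypothetical metrics collection:

from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client.ecommerce

# Time series collections bucket measurements by time for better
# compression and faster range queries.
db.create_collection(
    "metrics",
    timeseries={
        "timeField": "timestamp",  # required: when the measurement was taken
        "metaField": "sensor",     # optional: identifies the series
        "granularity": "minutes",
    },
)
db.metrics.insert_one({
    "timestamp": datetime.now(timezone.utc),
    "sensor": {"store": "downtown", "type": "sales"},
    "amount": 129.99,
})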

// Lookup (JOIN) operations with related collections
db.orders.aggregate([
  // Join with customers
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
    }
  },
  // Join with products
  {
    $lookup: {
      from: "products",
      localField: "items.productId",
      foreignField: "_id",
      as: "productDetails"
    }
  },
  // Unwind and calculate
  { $unwind: "$customer" },
  {
    $addFields: {
      customerName: "$customer.name",
      customerEmail: "$customer.email",
      totalValue: { $sum: "$items.subtotal" }
    }
  },
  // Group by customer
  {
    $group: {
      _id: "$customerId",
      customerName: { $first: "$customerName" },
      customerEmail: { $first: "$customerEmail" },
      totalOrders: { $sum: 1 },
      totalSpent: { $sum: "$totalValue" },
      averageOrderValue: { $avg: "$totalValue" },
      lastOrderDate: { $max: "$orderDate" }
    }
  },
  { $sort: { totalSpent: -1 } }
])

Python Integration and Application Development

import pymongo
from pymongo import MongoClient
from datetime import datetime
import json
from bson import ObjectId
from typing import List, Dict, Optional
import logging

class MongoDBManager:
    def __init__(self, connection_string: str = "mongodb://localhost:27017/", 
                 database_name: str = "ecommerce"):
        """MongoDB connection and management class"""
        try:
            self.client = MongoClient(
                connection_string,
                serverSelectionTimeoutMS=5000,
                connectTimeoutMS=10000,
                socketTimeoutMS=10000,
                retryWrites=True,
                w="majority"
            )
            
            # Test connection ('ping' replaces the deprecated 'ismaster' command)
            self.client.admin.command('ping')
            print("MongoDB connection successful")
            
            self.db = self.client[database_name]
            
        except Exception as e:
            logging.error(f"MongoDB connection failed: {e}")
            raise
    
    def create_collections_with_schema(self):
        """Create collections with schema validation"""
        
        # Products collection schema
        products_schema = {
            "$jsonSchema": {
                "bsonType": "object",
                "required": ["name", "price", "category"],
                "properties": {
                    "name": {
                        "bsonType": "string",
                        "description": "Product name is required and must be a string"
                    },
                    "price": {
                        "bsonType": ["double", "int"],
                        "minimum": 0,
                        "description": "Price must be a positive number"
                    },
                    "category": {
                        "bsonType": "string",
                        "enum": ["laptops", "smartphones", "tablets", "accessories"],
                        "description": "Category must be one of the allowed values"
                    },
                    "inStock": {
                        "bsonType": "bool",
                        "description": "Stock status must be boolean"
                    },
                    "quantity": {
                        "bsonType": "int",
                        "minimum": 0,
                        "description": "Quantity must be a non-negative integer"
                    }
                }
            }
        }
        
        try:
            self.db.create_collection(
                "products",
                validator=products_schema,
                validationLevel="strict",
                validationAction="error"
            )
            print("Products collection created with schema validation")
        except Exception as e:
            print(f"Products collection may already exist: {e}")
    
    def insert_product(self, product_data: Dict) -> str:
        """Insert a single product"""
        try:
            product_data['createdAt'] = datetime.now()
            product_data['updatedAt'] = datetime.now()
            
            result = self.db.products.insert_one(product_data)
            print(f"Product inserted with ID: {result.inserted_id}")
            return str(result.inserted_id)
            
        except Exception as e:
            logging.error(f"Product insertion failed: {e}")
            return None
    
    def bulk_insert_products(self, products: List[Dict]) -> List[str]:
        """Bulk insert multiple products"""
        try:
            for product in products:
                product['createdAt'] = datetime.now()
                product['updatedAt'] = datetime.now()
            
            result = self.db.products.insert_many(products)
            print(f"Bulk insert completed: {len(result.inserted_ids)} products")
            return [str(id) for id in result.inserted_ids]
            
        except Exception as e:
            logging.error(f"Bulk insert failed: {e}")
            return []
    
    def find_products_by_criteria(self, criteria: Dict, limit: int = 10) -> List[Dict]:
        """Find products by specified criteria"""
        try:
            cursor = self.db.products.find(criteria).limit(limit)
            products = list(cursor)
            
            # Convert ObjectId to string for JSON serialization
            for product in products:
                product['_id'] = str(product['_id'])
            
            return products
            
        except Exception as e:
            logging.error(f"Product search failed: {e}")
            return []
    
    def advanced_product_search(self, text_query: str = None, 
                              price_range: Dict = None,
                              category: str = None,
                              sort_by: str = "name",
                              sort_order: int = 1) -> List[Dict]:
        """Advanced product search with multiple filters"""
        
        pipeline = []
        
        # Match stage
        match_conditions = {}
        
        if text_query:
            match_conditions["$text"] = {"$search": text_query}
        
        if price_range:
            match_conditions["price"] = {}
            if "min" in price_range:
                match_conditions["price"]["$gte"] = price_range["min"]
            if "max" in price_range:
                match_conditions["price"]["$lte"] = price_range["max"]
        
        if category:
            match_conditions["category"] = category
        
        if match_conditions:
            pipeline.append({"$match": match_conditions})
        
        # Add fields stage
        pipeline.append({
            "$addFields": {
                "priceCategory": {
                    "$switch": {
                        "branches": [
                            {"case": {"$lt": ["$price", 500]}, "then": "Budget"},
                            {"case": {"$lt": ["$price", 1500]}, "then": "Mid-range"},
                            {"case": {"$gte": ["$price", 1500]}, "then": "Premium"}
                        ],
                        "default": "Unknown"
                    }
                }
            }
        })
        
        # Sort stage
        pipeline.append({"$sort": {sort_by: sort_order}})
        
        # Limit stage
        pipeline.append({"$limit": 20})
        
        try:
            cursor = self.db.products.aggregate(pipeline)
            results = list(cursor)
            
            # Convert ObjectId to string
            for result in results:
                result['_id'] = str(result['_id'])
            
            return results
            
        except Exception as e:
            logging.error(f"Advanced search failed: {e}")
            return []
    
    def get_product_analytics(self) -> Dict:
        """Get comprehensive product analytics"""
        try:
            pipeline = [
                {
                    "$facet": {
                        "totalStats": [
                            {
                                "$group": {
                                    "_id": None,
                                    "totalProducts": {"$sum": 1},
                                    "totalValue": {"$sum": {"$multiply": ["$price", "$quantity"]}},
                                    "averagePrice": {"$avg": "$price"},
                                    "inStockCount": {
                                        "$sum": {"$cond": [{"$eq": ["$inStock", True]}, 1, 0]}
                                    }
                                }
                            }
                        ],
                        "categoryBreakdown": [
                            {
                                "$group": {
                                    "_id": "$category",
                                    "count": {"$sum": 1},
                                    "averagePrice": {"$avg": "$price"},
                                    "totalQuantity": {"$sum": "$quantity"}
                                }
                            },
                            {"$sort": {"count": -1}}
                        ],
                        "brandAnalysis": [
                            {
                                "$group": {
                                    "_id": "$brand",
                                    "productCount": {"$sum": 1},
                                    "averagePrice": {"$avg": "$price"},
                                    "averageRating": {"$avg": "$ratings.average"}
                                }
                            },
                            {"$sort": {"productCount": -1}}
                        ],
                        "priceDistribution": [
                            {
                                "$bucket": {
                                    "groupBy": "$price",
                                    "boundaries": [0, 500, 1000, 1500, 2000, 5000],
                                    "default": "5000+",
                                    "output": {
                                        "count": {"$sum": 1},
                                        "averageRating": {"$avg": "$ratings.average"}
                                    }
                                }
                            }
                        ]
                    }
                }
            ]
            
            result = list(self.db.products.aggregate(pipeline))[0]
            return result
            
        except Exception as e:
            logging.error(f"Analytics query failed: {e}")
            return {}
    
    def update_product_inventory(self, product_id: str, quantity_change: int) -> bool:
        """Update product inventory with atomic operation"""
        try:
            result = self.db.products.update_one(
                {"_id": ObjectId(product_id)},
                {
                    "$inc": {"quantity": quantity_change},
                    "$set": {"updatedAt": datetime.now()}
                }
            )
            
            if result.modified_count > 0:
                # Check if product is out of stock
                product = self.db.products.find_one({"_id": ObjectId(product_id)})
                if product and product["quantity"] <= 0:
                    self.db.products.update_one(
                        {"_id": ObjectId(product_id)},
                        {"$set": {"inStock": False}}
                    )
                    print(f"Product {product_id} marked as out of stock")
                
                return True
            return False
            
        except Exception as e:
            logging.error(f"Inventory update failed: {e}")
            return False
    
    def create_indexes(self):
        """Create performance indexes"""
        try:
            # Text index for search
            self.db.products.create_index([
                ("name", "text"),
                ("description", "text")
            ])
            
            # Compound indexes for common queries
            self.db.products.create_index([("category", 1), ("price", 1)])
            self.db.products.create_index([("brand", 1), ("inStock", 1)])
            self.db.products.create_index([("ratings.average", -1)])
            self.db.products.create_index([("createdAt", -1)])
            
            print("Indexes created successfully")
            
        except Exception as e:
            logging.error(f"Index creation failed: {e}")
    
    def close_connection(self):
        """Close database connection"""
        if self.client:
            self.client.close()
            print("MongoDB connection closed")

# Usage example and testing
def demo_mongodb_operations():
    """Demonstrate MongoDB operations"""
    
    # Initialize MongoDB manager
    db_manager = MongoDBManager()
    
    try:
        # Create collections with schema
        db_manager.create_collections_with_schema()
        
        # Create indexes
        db_manager.create_indexes()
        
        # Sample product data
        sample_products = [
            {
                "name": "MacBook Pro 16-inch M3 Max",
                "description": "High-performance laptop for professionals",
                "price": 2499.99,
                "category": "laptops",
                "brand": "Apple",
                "specifications": {
                    "processor": "Apple M3 Max",
                    "memory": "32GB",
                    "storage": "1TB SSD"
                },
                "inStock": True,
                "quantity": 15,
                "ratings": {"average": 4.8, "count": 1247}
            },
            {
                "name": "iPhone 15 Pro",
                "description": "Latest iPhone with A17 Pro chip",
                "price": 999.99,
                "category": "smartphones",
                "brand": "Apple",
                "specifications": {
                    "processor": "A17 Pro",
                    "memory": "8GB",
                    "storage": "256GB"
                },
                "inStock": True,
                "quantity": 32,
                "ratings": {"average": 4.9, "count": 2156}
            }
        ]
        
        # Insert sample products
        inserted_ids = db_manager.bulk_insert_products(sample_products)
        print(f"Inserted products with IDs: {inserted_ids}")
        
        # Search products
        laptops = db_manager.find_products_by_criteria({"category": "laptops"})
        print(f"Found {len(laptops)} laptops")
        
        # Advanced search
        expensive_products = db_manager.advanced_product_search(
            price_range={"min": 1000},
            sort_by="price",
            sort_order=-1
        )
        print(f"Found {len(expensive_products)} expensive products")
        
        # Get analytics
        analytics = db_manager.get_product_analytics()
        print("Analytics:")
        print(json.dumps(analytics, indent=2, default=str))
        
        # Update inventory
        if inserted_ids:
            success = db_manager.update_product_inventory(inserted_ids[0], -2)
            print(f"Inventory update success: {success}")
        
    except Exception as e:
        print(f"Demo error: {e}")
    
    finally:
        db_manager.close_connection()

if __name__ == "__main__":
    demo_mongodb_operations()

Replica Set Configuration and High Availability

# Replica Set setup for high availability
# Initialize replica set on primary node
mongosh

# Switch to admin database
use admin

# Initialize replica set
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb-primary:27017", priority: 2 },
    { _id: 1, host: "mongodb-secondary1:27017", priority: 1 },
    { _id: 2, host: "mongodb-secondary2:27017", priority: 1 }
  ]
})

# Check replica set status
rs.status()

# Add a new member to replica set
rs.add("mongodb-secondary3:27017")

# Remove a member from replica set
rs.remove("mongodb-secondary3:27017")

# Step down primary (force election)
rs.stepDown()

# Docker Compose configuration for replica set
cat > docker-compose-replica.yml << 'EOF'
version: '3.8'
services:
  mongo-primary:
    image: mongo:8.0
    container_name: mongo-primary
    restart: unless-stopped
    ports:
      - "27017:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: password123
    volumes:
      - mongo-primary-data:/data/db
      - ./replica-set-init.js:/docker-entrypoint-initdb.d/replica-set-init.js
    command: mongod --replSet rs0 --bind_ip_all
    networks:
      - mongo-cluster

  mongo-secondary1:
    image: mongo:8.0
    container_name: mongo-secondary1
    restart: unless-stopped
    ports:
      - "27018:27017"
    volumes:
      - mongo-secondary1-data:/data/db
    command: mongod --replSet rs0 --bind_ip_all
    networks:
      - mongo-cluster
    depends_on:
      - mongo-primary

  mongo-secondary2:
    image: mongo:8.0
    container_name: mongo-secondary2
    restart: unless-stopped
    ports:
      - "27019:27017"
    volumes:
      - mongo-secondary2-data:/data/db
    command: mongod --replSet rs0 --bind_ip_all
    networks:
      - mongo-cluster
    depends_on:
      - mongo-primary

  mongo-arbiter:
    image: mongo:8.0
    container_name: mongo-arbiter
    restart: unless-stopped
    ports:
      - "27020:27017"
    command: mongod --replSet rs0 --bind_ip_all
    networks:
      - mongo-cluster
    depends_on:
      - mongo-primary

volumes:
  mongo-primary-data:
  mongo-secondary1-data:
  mongo-secondary2-data:

networks:
  mongo-cluster:
    driver: bridge
EOF

# Replica set initialization script
cat > replica-set-init.js << 'EOF'
// Wait for MongoDB to be ready
sleep(5000);

// Connect to primary
db = db.getSiblingDB('admin');

// Initialize replica set
config = {
    "_id": "rs0",
    "members": [
        { "_id": 0, "host": "mongo-primary:27017", "priority": 2 },
        { "_id": 1, "host": "mongo-secondary1:27017", "priority": 1 },
        { "_id": 2, "host": "mongo-secondary2:27017", "priority": 1 },
        { "_id": 3, "host": "mongo-arbiter:27017", "arbiterOnly": true }
    ]
};

rs.initiate(config);

// Create admin user
db.createUser({
    user: "admin",
    pwd: "password123",
    roles: [
        { role: "root", db: "admin" }
    ]
});

print("Replica set initialization completed");
EOF

# Start replica set cluster
docker-compose -f docker-compose-replica.yml up -d

# Connection string for applications running inside the Docker network
# (each member listens on its internal port 27017 there; the 27018/27019
# mappings only apply from the Docker host)
MONGO_URI="mongodb://admin:password123@mongo-primary:27017,mongo-secondary1:27017,mongo-secondary2:27017/myapp?replicaSet=rs0&authSource=admin"

# Read preference configuration
# Primary: All reads from primary
# Secondary: All reads from secondary
# PrimaryPreferred: Primary preferred, fallback to secondary
# SecondaryPreferred: Secondary preferred, fallback to primary
# Nearest: Lowest network latency
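
A minimal PyMongo sketch of applying these read preferences, assuming the replica set above:

from pymongo import MongoClient, ReadPreference

client = MongoClient(
    "mongodb://admin:password123@mongo-primary:27017/myapp"
    "?replicaSet=rs0&authSource=admin",
    readPreference="secondaryPreferred",  # prefer secondaries, fall back to primary
)

# Read preference can also be set per collection without reconnecting.
orders = client.myapp.get_collection(
    "orders", read_preference=ReadPreference.SECONDARY_PREFERRED
)
print(orders.estimated_document_count())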

Sharding Configuration and Horizontal Scaling

# Sharded cluster setup
# Config servers (replica set)
cat > config-server-docker-compose.yml << 'EOF'
version: '3.8'
services:
  config1:
    image: mongo:8.0
    container_name: config1
    ports:
      - "27019:27017"
    volumes:
      - config1-data:/data/db
    command: mongod --configsvr --replSet configReplSet --bind_ip_all
    networks:
      - mongo-shard

  config2:
    image: mongo:8.0
    container_name: config2
    ports:
      - "27020:27017"
    volumes:
      - config2-data:/data/db
    command: mongod --configsvr --replSet configReplSet --bind_ip_all
    networks:
      - mongo-shard

  config3:
    image: mongo:8.0
    container_name: config3
    ports:
      - "27021:27017"
    volumes:
      - config3-data:/data/db
    command: mongod --configsvr --replSet configReplSet --bind_ip_all
    networks:
      - mongo-shard

volumes:
  config1-data:
  config2-data:
  config3-data:

networks:
  mongo-shard:
    external: true
EOF

# Shard servers
cat > shard-servers-docker-compose.yml << 'EOF'
version: '3.8'
services:
  # Shard 1
  shard1-primary:
    image: mongo:8.0
    container_name: shard1-primary
    ports:
      - "27022:27017"
    volumes:
      - shard1-primary-data:/data/db
    command: mongod --shardsvr --replSet shard1ReplSet --bind_ip_all
    networks:
      - mongo-shard

  shard1-secondary:
    image: mongo:8.0
    container_name: shard1-secondary
    ports:
      - "27023:27017"
    volumes:
      - shard1-secondary-data:/data/db
    command: mongod --shardsvr --replSet shard1ReplSet --bind_ip_all
    networks:
      - mongo-shard

  # Shard 2
  shard2-primary:
    image: mongo:8.0
    container_name: shard2-primary
    ports:
      - "27024:27017"
    volumes:
      - shard2-primary-data:/data/db
    command: mongod --shardsvr --replSet shard2ReplSet --bind_ip_all
    networks:
      - mongo-shard

  shard2-secondary:
    image: mongo:8.0
    container_name: shard2-secondary
    ports:
      - "27025:27017"
    volumes:
      - shard2-secondary-data:/data/db
    command: mongod --shardsvr --replSet shard2ReplSet --bind_ip_all
    networks:
      - mongo-shard

volumes:
  shard1-primary-data:
  shard1-secondary-data:
  shard2-primary-data:
  shard2-secondary-data:

networks:
  mongo-shard:
    external: true
EOF

# Mongos router
cat > mongos-docker-compose.yml << 'EOF'
version: '3.8'
services:
  mongos1:
    image: mongo:8.0
    container_name: mongos1
    ports:
      - "27017:27017"
    command: mongos --configdb configReplSet/config1:27017,config2:27017,config3:27017 --bind_ip_all
    networks:
      - mongo-shard
    depends_on:
      - config1
      - config2
      - config3

  mongos2:
    image: mongo:8.0
    container_name: mongos2
    ports:
      - "27018:27017"
    command: mongos --configdb configReplSet/config1:27017,config2:27017,config3:27017 --bind_ip_all
    networks:
      - mongo-shard
    depends_on:
      - config1
      - config2
      - config3

networks:
  mongo-shard:
    external: true
EOF

# Create network and start cluster
docker network create mongo-shard

# Start components in order
docker-compose -f config-server-docker-compose.yml up -d
sleep 10
docker-compose -f shard-servers-docker-compose.yml up -d
sleep 10
docker-compose -f mongos-docker-compose.yml up -d

# Initialize config server replica set (exec into the container; the service
# hostnames only resolve inside the Docker network)
docker exec -it config1 mongosh

rs.initiate({
  _id: "configReplSet",
  configsvr: true,
  members: [
    { _id: 0, host: "config1:27017" },
    { _id: 1, host: "config2:27017" },
    { _id: 2, host: "config3:27017" }
  ]
})

# Initialize shard replica sets
# Shard 1
docker exec -it shard1-primary mongosh

rs.initiate({
  _id: "shard1ReplSet",
  members: [
    { _id: 0, host: "shard1-primary:27017" },
    { _id: 1, host: "shard1-secondary:27017" }
  ]
})

# Shard 2
docker exec -it shard2-primary mongosh

rs.initiate({
  _id: "shard2ReplSet",
  members: [
    { _id: 0, host: "shard2-primary:27017" },
    { _id: 1, host: "shard2-secondary:27017" }
  ]
})

# Connect to mongos and add shards
docker exec -it mongos1 mongosh

sh.addShard("shard1ReplSet/shard1-primary:27017,shard1-secondary:27017")
sh.addShard("shard2ReplSet/shard2-primary:27017,shard2-secondary:27017")

# Enable sharding for database
sh.enableSharding("ecommerce")

# Shard collection by a shard key
sh.shardCollection("ecommerce.products", { "_id": "hashed" })

# Check sharding status
sh.status()

# Query sharding statistics
db.products.getShardDistribution()

Performance Monitoring and Optimization

# Performance monitoring and profiling
mongosh

# Enable profiling for slow queries only (> 100ms); level 2 would log every operation
db.setProfilingLevel(1, { slowms: 100 })

# Check profiling status
db.getProfilingStatus()

# Query profiler collection
db.system.profile.find().limit(5).sort({ ts: -1 }).pretty()

# Analyze slow queries
db.system.profile.find({
  "ts": {
    $gte: new Date(Date.now() - 1000 * 60 * 60) // Last hour
  }
}).sort({ "ts": -1 })

# Index analysis
db.products.getIndexes()

# Explain query execution
db.products.find({ category: "laptops", price: { $gte: 1000 } }).explain("executionStats")

# Monitor current operations
db.currentOp()

# Kill a long-running operation (use the opid reported by db.currentOp())
db.killOp(123456)

# Database statistics
db.stats()
db.products.stats()

# Server status
db.serverStatus()

# Connection monitoring
db.serverStatus().connections

# Memory usage monitoring
db.serverStatus().mem

# WiredTiger cache statistics
db.serverStatus().wiredTiger.cache

# Performance monitoring script
cat > mongodb_monitor.py << 'EOF'
#!/usr/bin/env python3
import pymongo
import time
import json
from datetime import datetime

class MongoDBMonitor:
    def __init__(self, connection_string="mongodb://localhost:27017/"):
        self.client = pymongo.MongoClient(connection_string)
        self.db = self.client.admin
    
    def get_server_status(self):
        """Get comprehensive server status"""
        status = self.db.command("serverStatus")
        return {
            "uptime": status["uptime"],
            "connections": status["connections"],
            "memory": status["mem"],
            "opcounters": status["opcounters"],
            "network": status["network"],
            "wiredTiger": {
                "cache": status["wiredTiger"]["cache"] if "wiredTiger" in status else None,
                "concurrentTransactions": status["wiredTiger"]["concurrentTransactions"] if "wiredTiger" in status else None
            }
        }
    
    def get_slow_queries(self, db_name="ecommerce", limit=10):
        """Get recent slow queries from profiler"""
        db = self.client[db_name]
        
        slow_queries = list(db.system.profile.find({
            "ts": {"$gte": datetime.now().replace(hour=0, minute=0, second=0)}
        }).sort("ts", -1).limit(limit))
        
        return slow_queries
    
    def get_index_usage(self, db_name="ecommerce", collection_name="products"):
        """Get index usage statistics"""
        db = self.client[db_name]
        
        # Get index statistics
        stats = db.command("collStats", collection_name, indexDetails=True)
        return stats.get("indexSizes", {})
    
    def monitor_replication(self):
        """Monitor replica set status"""
        try:
            rs_status = self.db.command("replSetGetStatus")
            return {
                "set": rs_status["set"],
                "members": [
                    {
                        "name": member["name"],
                        "state": member["stateStr"],
                        "health": member["health"],
                        "optime": member.get("optime", {})
                    }
                    for member in rs_status["members"]
                ]
            }
        except Exception as e:
            return {"error": str(e)}
    
    def generate_report(self):
        """Generate comprehensive monitoring report"""
        report = {
            "timestamp": datetime.now().isoformat(),
            "server_status": self.get_server_status(),
            "slow_queries": self.get_slow_queries(),
            "index_usage": self.get_index_usage(),
            "replication_status": self.monitor_replication()
        }
        return report

def main():
    monitor = MongoDBMonitor()
    
    while True:
        try:
            report = monitor.generate_report()
            
            print(f"\n=== MongoDB Monitor Report - {report['timestamp']} ===")
            print(f"Uptime: {report['server_status']['uptime']} seconds")
            print(f"Current connections: {report['server_status']['connections']['current']}")
            print(f"Available connections: {report['server_status']['connections']['available']}")
            
            if report['server_status']['wiredTiger']['cache']:
                cache = report['server_status']['wiredTiger']['cache']
                print(f"Cache usage: {cache.get('bytes currently in the cache', 0) / 1024 / 1024:.2f} MB")
            
            print(f"Recent slow queries: {len(report['slow_queries'])}")
            
            if report['replication_status'].get('members'):
                print("Replica set members:")
                for member in report['replication_status']['members']:
                    print(f"  - {member['name']}: {member['state']} (health: {member['health']})")
            
            # Save detailed report to file
            with open(f"mongodb_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json", 'w') as f:
                json.dump(report, f, indent=2, default=str)
            
            time.sleep(60)  # Monitor every minute
            
        except KeyboardInterrupt:
            print("\nStopping MongoDB monitor")
            break
        except Exception as e:
            print(f"Monitoring error: {e}")
            time.sleep(60)

if __name__ == "__main__":
    main()
EOF

chmod +x mongodb_monitor.py

# MongoDB optimization commands
# Compact collection to reclaim space (compact is a database command, not a collection method)
db.runCommand({ compact: "products" })

# Re-index collection (reIndex is deprecated; prefer dropping and recreating indexes)
db.products.reIndex()

# Analyze collection
db.products.validate()

# Set read concern and write concern
db.products.find().readConcern("majority")

# Connection pooling configuration (for applications)
# In connection string:
# mongodb://localhost:27017/mydb?maxPoolSize=100&minPoolSize=10&maxIdleTimeMS=30000
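
A minimal PyMongo sketch of the same pool options, assuming a local deployment:

from pymongo import MongoClient

# Pool options can be passed as keyword arguments instead of URI parameters.
client = MongoClient(
    "mongodb://localhost:27017/mydb",
    maxPoolSize=100,      # upper bound on concurrent connections per host
    minPoolSize=10,       # connections kept open when idle
    maxIdleTimeMS=30000,  # close connections idle longer than 30 s
)
print(client.mydb.command("ping"))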

# Memory optimization: merge the following into the existing sections of
# /etc/mongod.conf (appending duplicate top-level keys breaks YAML parsing)
cat > wiredtiger-tuning-snippet.conf << 'EOF'
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 8  # default is 50% of (RAM - 1 GB); size for a dedicated host
    collectionConfig:
      blockCompressor: snappy
    indexConfig:
      prefixCompression: true

setParameter:
  wiredTigerConcurrentReadTransactions: 128
  wiredTigerConcurrentWriteTransactions: 128
EOF

Security and Atlas Cloud Integration

# Security configuration and best practices
# Enable authentication
mongosh

use admin

# Create admin user
db.createUser({
  user: "admin",
  pwd: "SecurePassword123!",
  roles: [
    { role: "userAdminAnyDatabase", db: "admin" },
    { role: "readWriteAnyDatabase", db: "admin" },
    { role: "dbAdminAnyDatabase", db: "admin" }
  ]
})

# Create application-specific user
use ecommerce

db.createUser({
  user: "appuser",
  pwd: "AppPassword123!",
  roles: [
    { role: "readWrite", db: "ecommerce" },
    { role: "dbAdmin", db: "ecommerce" }
  ]
})

# Enable TLS/SSL: merge the following into /etc/mongod.conf
cat > mongod-tls-snippet.conf << 'EOF'
net:
  tls:
    mode: requireTLS
    certificateKeyFile: /etc/ssl/mongodb/mongodb.pem
    CAFile: /etc/ssl/mongodb/ca.pem
    allowConnectionsWithoutCertificates: false

security:
  authorization: enabled
  clusterAuthMode: x509
EOF

# Generate self-signed certificates for testing (a self-signed certificate
# can double as the CA file referenced above for local experiments)
openssl req -newkey rsa:4096 -nodes -out mongodb.csr -keyout mongodb.key -subj "/CN=localhost"
openssl x509 -signkey mongodb.key -in mongodb.csr -req -days 365 -out mongodb.crt
cat mongodb.key mongodb.crt > mongodb.pem
cp mongodb.crt ca.pem

# Field-level encryption setup
mongosh "mongodb://admin:SecurePassword123!@localhost:27017/ecommerce?authSource=admin"

# Create key vault collection
use encryption
db.createCollection("__keyVault")

# MongoDB Atlas connection examples
# Atlas connection string format:
ATLAS_URI="mongodb+srv://username:password@cluster0.example.mongodb.net/myapp?retryWrites=true&w=majority"

# Atlas Data API example (REST API)
cat > atlas_api_example.py << 'EOF'
import requests
import json

class AtlasDataAPI:
    def __init__(self, api_key, app_id, data_source, database):
        self.base_url = f"https://data.mongodb-api.com/app/{app_id}/endpoint/data/v1"
        self.headers = {
            "Content-Type": "application/json",
            "api-key": api_key
        }
        self.data_source = data_source
        self.database = database
    
    def find_documents(self, collection, filter_doc=None, limit=10):
        """Find documents using Atlas Data API"""
        url = f"{self.base_url}/action/find"
        
        payload = {
            "dataSource": self.data_source,
            "database": self.database,
            "collection": collection,
            "filter": filter_doc or {},
            "limit": limit
        }
        
        response = requests.post(url, headers=self.headers, data=json.dumps(payload))
        return response.json()
    
    def insert_document(self, collection, document):
        """Insert document using Atlas Data API"""
        url = f"{self.base_url}/action/insertOne"
        
        payload = {
            "dataSource": self.data_source,
            "database": self.database,
            "collection": collection,
            "document": document
        }
        
        response = requests.post(url, headers=self.headers, data=json.dumps(payload))
        return response.json()

# Usage example
# atlas_api = AtlasDataAPI(
#     api_key="your-api-key",
#     app_id="your-app-id",
#     data_source="Cluster0",
#     database="ecommerce"
# )
# 
# result = atlas_api.find_documents("products", {"category": "laptops"})
# print(result)
EOF

# Atlas Search configuration example (Atlas clusters only; not available on self-managed deployments)
db.products.createSearchIndex(
  "default",
  {
    "mappings": {
      "dynamic": false,
      "fields": {
        "name": {
          "type": "string",
          "analyzer": "lucene.standard"
        },
        "description": {
          "type": "string",
          "analyzer": "lucene.standard"
        },
        "category": {
          "type": "string",
          "analyzer": "lucene.keyword"
        },
        "price": {
          "type": "number"
        }
      }
    }
  }
)

# Atlas Search query example
db.products.aggregate([
  {
    $search: {
      index: "default",
      compound: {
        must: [
          {
            text: {
              query: "laptop professional",
              path: ["name", "description"]
            }
          }
        ],
        filter: [
          {
            range: {
              path: "price",
              gte: 1000,
              lte: 3000
            }
          }
        ]
      }
    }
  },
  {
    $project: {
      name: 1,
      description: 1,
      price: 1,
      score: { $meta: "searchScore" }
    }
  }
])

# MongoDB Compass connection string examples
# Local: mongodb://localhost:27017
# Replica Set: mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0
# Sharded: mongodb://mongos1:27017,mongos2:27017/
# Atlas: mongodb+srv://username:password@cluster0.example.mongodb.net/database

echo "MongoDB English version setup and examples completed"