MongoDB
Document-oriented NoSQL database. Stores data in JSON-like document format. Features scalability and developer-friendly design.
Overview
MongoDB is a document-oriented NoSQL database first released in 2009 by 10gen (now MongoDB Inc.). It stores data in BSON, a JSON-like binary format, and combines flexible schema design, automatic sharding, and high availability through replica sets, making it well suited to modern application development. MongoDB 8.0 (released in 2024) is a significant step forward: vendor-reported gains include roughly 25% better overall performance and 60% faster time series processing, alongside enhanced Queryable Encryption and storage compression improvements. Combined with the MongoDB Atlas cloud service, it offers automatic scaling, hardened security, and multi-cloud deployment at global scale, and it is widely adopted by organizations from startups to large enterprises. Developer-friendly APIs and a rich driver ecosystem support rapid application development and operations.
Details
MongoDB 8.0 positions itself for 2025 as a comprehensive data platform that goes well beyond a traditional NoSQL database. The release strengthens real-time analytics, machine learning integration, Queryable Encryption, time series optimization, and vector search for AI workloads. MongoDB Atlas scales more quickly and reacts faster to resource pressure (MongoDB cites up to 50% faster automatic scaling and 5x faster responsiveness), and unifies full-text and vector search through Atlas Search. Around the core server, Relational Migrator assists migration from existing RDBMSs, Atlas Data Federation enables data lake integration, and operational tooling such as Charts, Compass, and Ops Manager addresses enterprise requirements. ACID-compliant multi-document transactions, horizontal scalability through the distributed architecture, and real-time change notification via Change Streams let it serve as a foundation for mission-critical systems.
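The Change Streams mentioned above can be tried in a few lines of mongosh; a minimal sketch (the pipeline filter is illustrative, and change streams require the server to run as a replica set):
// Watch update events on a collection and print each change
const watchCursor = db.products.watch(
  [{ $match: { operationType: "update" } }], // only updates
  { fullDocument: "updateLookup" }           // include the full updated document
)
while (!watchCursor.isClosed()) {
  const change = watchCursor.tryNext()
  if (change !== null) printjson(change)
}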
Key Features
- Document-Oriented: Intuitive and flexible data modeling through JSON-like BSON
- Automatic Sharding: Automatic data distribution and query load balancing
- High Availability: Automatic failover and data redundancy through replica sets
- ACID Compliance: Strong consistency through multi-document transactions (see the sketch after this list)
- Atlas Cloud: Zero operational overhead and global deployment through managed services
- Real-time Analytics: High-speed data processing through aggregation pipelines and Time Series
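To make the transaction guarantee above concrete, here is a minimal multi-document transaction sketch in mongosh; it assumes an ecommerce database with products and orders collections (orders is hypothetical here) and a server running as a replica set:
// Record an order and decrement stock atomically: both writes commit or neither does
const session = db.getMongo().startSession()
session.startTransaction({ readConcern: { level: "snapshot" }, writeConcern: { w: "majority" } })
try {
  const edb = session.getDatabase("ecommerce")
  edb.orders.insertOne({ product: "MacBook Pro 16-inch", qty: 1, createdAt: new Date() })
  edb.products.updateOne({ name: "MacBook Pro 16-inch" }, { $inc: { quantity: -1 } })
  session.commitTransaction()
} catch (e) {
  session.abortTransaction()
  throw e
} finally {
  session.endSession()
}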
Pros and Cons
Pros
- Horizontal scalability, with read/write performance that can exceed relational databases for document-shaped workloads (commonly cited gains of 30-50%)
- Flexible schema design supporting agile development and microservices
- Automatic scaling, backup, and security operations through Atlas Cloud
- Rich programming language drivers and framework integrations
- Real-time data change detection and distribution through Change Streams
- Built-in support for geospatial data, full-text search, and time series data (see the time series sketch after this list)
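As a sketch of the built-in time series support mentioned in the last point (collection name, fields, and TTL are illustrative):
// Time series collection: documents are bucketed by time for fast range queries
db.createCollection("sensorReadings", {
  timeseries: { timeField: "ts", metaField: "sensorId", granularity: "seconds" },
  expireAfterSeconds: 86400 // automatically expire data older than one day
})
db.sensorReadings.insertOne({ ts: new Date(), sensorId: "s-001", temperature: 21.7 })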
Cons
- Complex JOINs and multi-table transactional workloads remain better served by RDBMSs
- Typically higher memory usage than RDBMSs and a larger storage footprint
- Learning costs for operations and debugging due to distributed system complexity
- Potential data consistency risks from schema-less design
- MongoDB Atlas licensing costs and cloud provider dependency
- Performance overhead and design trade-offs for ACID guarantees
Code Examples
Installation and Basic Setup
# MongoDB Community Edition installation on Ubuntu
# Import the MongoDB official GPG key into a keyring file
# (apt-key is deprecated; use signed-by instead)
wget -qO - https://www.mongodb.org/static/pgp/server-8.0.asc | \
  sudo gpg --dearmor -o /usr/share/keyrings/mongodb-server-8.0.gpg
# Add MongoDB repository
echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-8.0.gpg ] https://repo.mongodb.org/apt/ubuntu $(lsb_release -cs)/mongodb-org/8.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-8.0.list
# Update package database and install MongoDB
sudo apt-get update
sudo apt-get install -y mongodb-org
# Start and enable MongoDB service
sudo systemctl enable mongod
sudo systemctl start mongod
sudo systemctl status mongod
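# Optional smoke test: a minimal check that the server answers commands
# (should print ok: 1)
mongosh --quiet --eval "db.runCommand({ ping: 1 })"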
# MongoDB environment setup using Docker Compose
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  mongodb:
    image: mongo:8.0
    container_name: mongodb
    restart: unless-stopped
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: password123
      MONGO_INITDB_DATABASE: myapp
    ports:
      - "27017:27017"
    volumes:
      - mongodb_data:/data/db
      - mongodb_config:/data/configdb
      - ./mongod.conf:/etc/mongod.conf
    command: --config /etc/mongod.conf
  mongo-express:
    image: mongo-express:latest
    container_name: mongo-express
    restart: unless-stopped
    ports:
      - "8081:8081"
    environment:
      ME_CONFIG_MONGODB_ADMINUSERNAME: admin
      ME_CONFIG_MONGODB_ADMINPASSWORD: password123
      ME_CONFIG_MONGODB_URL: mongodb://admin:password123@mongodb:27017/
      ME_CONFIG_BASICAUTH: false
    depends_on:
      - mongodb
volumes:
  mongodb_data:
  mongodb_config:
EOF
# Start services
docker-compose up -d
# Verify operation
curl http://localhost:8081
# MongoDB configuration optimization
# NOTE: when this file is mounted into the Docker container above, keep fork
# disabled and bind to 0.0.0.0 so the mapped port is reachable from the host.
cat > mongod.conf << 'EOF'
# Network settings
net:
  port: 27017
  bindIp: 127.0.0.1   # use 0.0.0.0 inside containers

# Storage settings (journaling is always enabled in current MongoDB releases;
# the old storage.journal.enabled option has been removed)
storage:
  dbPath: /data/db
  wiredTiger:
    engineConfig:
      cacheSizeGB: 2
    collectionConfig:
      blockCompressor: snappy
    indexConfig:
      prefixCompression: true

# System logging
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

# Process management (leave fork off under Docker or systemd)
processManagement:
  fork: false
  pidFilePath: /var/run/mongodb/mongod.pid

# Security settings
security:
  authorization: enabled

# Replication settings (takes effect once rs.initiate() has been run)
replication:
  replSetName: rs0

# Sharding settings
# sharding:
#   clusterRole: shardsvr
EOF
Basic Database Operations and CRUD
// MongoDB Shell connection
mongosh "mongodb://localhost:27017"

// Database and collection creation
use ecommerce
db.createCollection("products")

// Document insertion (single)
db.products.insertOne({
  name: "MacBook Pro 16-inch",
  description: "High-performance laptop with Apple M3 Max chip",
  price: 2499.99,
  category: "laptops",
  brand: "Apple",
  specifications: {
    processor: "Apple M3 Max",
    memory: "32GB",
    storage: "1TB SSD",
    display: "16-inch Retina"
  },
  tags: ["professional", "creative", "developer"],
  inStock: true,
  quantity: 25,
  ratings: {
    average: 4.8,
    count: 1247
  },
  createdAt: new Date(),
  updatedAt: new Date()
})

// Bulk document insertion
db.products.insertMany([
  {
    name: "Dell XPS 13",
    description: "Ultra-portable laptop for business professionals",
    price: 1299.99,
    category: "laptops",
    brand: "Dell",
    specifications: {
      processor: "Intel Core i7",
      memory: "16GB",
      storage: "512GB SSD",
      display: "13.3-inch FHD+"
    },
    tags: ["business", "portable", "productivity"],
    inStock: true,
    quantity: 18,
    ratings: { average: 4.5, count: 832 },
    createdAt: new Date(),
    updatedAt: new Date()
  },
  {
    name: "iPhone 15 Pro",
    description: "Latest iPhone with A17 Pro chip",
    price: 999.99,
    category: "smartphones",
    brand: "Apple",
    specifications: {
      processor: "A17 Pro",
      memory: "8GB",
      storage: "256GB",
      display: "6.1-inch Super Retina XDR"
    },
    tags: ["flagship", "5g", "camera"],
    inStock: true,
    quantity: 42,
    ratings: { average: 4.9, count: 2156 },
    createdAt: new Date(),
    updatedAt: new Date()
  }
])

// Basic queries
// Find all products
db.products.find()

// Find by category
db.products.find({ category: "laptops" })

// Find with price range
db.products.find({
  price: { $gte: 1000, $lte: 2000 }
})

// Complex query with multiple conditions
db.products.find({
  $and: [
    { category: "laptops" },
    { "specifications.memory": "16GB" },
    { inStock: true },
    { quantity: { $gt: 10 } }
  ]
})

// Text search
db.products.createIndex({ name: "text", description: "text" })
db.products.find({ $text: { $search: "laptop professional" } })

// Document update operations
// Update single document
db.products.updateOne(
  { name: "MacBook Pro 16-inch" },
  {
    $set: {
      price: 2399.99,
      updatedAt: new Date()
    },
    $inc: { quantity: -1 }
  }
)

// Update multiple documents
db.products.updateMany(
  { category: "laptops" },
  {
    $set: { "specifications.warranty": "2 years" },
    $currentDate: { updatedAt: true }
  }
)

// Upsert operation
db.products.updateOne(
  { name: "Surface Pro 9" },
  {
    $set: {
      name: "Surface Pro 9",
      description: "2-in-1 tablet PC",
      price: 1299.99,
      category: "tablets",
      brand: "Microsoft",
      inStock: true,
      quantity: 15,
      createdAt: new Date(),
      updatedAt: new Date()
    }
  },
  { upsert: true }
)

// Document deletion
// Delete single document
db.products.deleteOne({ name: "Old Product" })

// Delete multiple documents
db.products.deleteMany({
  $and: [
    { quantity: 0 },
    { inStock: false }
  ]
})

// Index creation and optimization
db.products.createIndex({ category: 1 })
db.products.createIndex({ price: 1 })
db.products.createIndex({ "ratings.average": -1 })
db.products.createIndex({ category: 1, price: 1 }) // Compound index

// Show indexes
db.products.getIndexes()
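// Mixed write batches can be sent in one round trip with bulkWrite.
// Sketch only: the document values below are illustrative.
db.products.bulkWrite([
  { insertOne: { document: { name: "USB-C Hub", price: 49.99, category: "accessories",
                             brand: "Generic", inStock: true, quantity: 100 } } },
  { updateOne: { filter: { name: "Dell XPS 13" }, update: { $inc: { quantity: -1 } } } },
  { deleteMany: { filter: { quantity: 0, inStock: false } } }
], { ordered: true }) // ordered stops at the first error; false continues past failures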
Advanced Aggregation and Analytics
// Aggregation pipeline examples
// Basic aggregation: Group by category with statistics
db.products.aggregate([
  {
    $group: {
      _id: "$category",
      count: { $sum: 1 },
      averagePrice: { $avg: "$price" },
      maxPrice: { $max: "$price" },
      minPrice: { $min: "$price" },
      totalQuantity: { $sum: "$quantity" }
    }
  },
  { $sort: { averagePrice: -1 } }
])

// Complex aggregation with multiple stages
db.products.aggregate([
  // Stage 1: Filter active products
  {
    $match: {
      inStock: true,
      quantity: { $gt: 0 }
    }
  },
  // Stage 2: Add calculated fields
  {
    $addFields: {
      priceCategory: {
        $switch: {
          branches: [
            { case: { $lt: ["$price", 500] }, then: "Budget" },
            { case: { $lt: ["$price", 1500] }, then: "Mid-range" },
            { case: { $gte: ["$price", 1500] }, then: "Premium" }
          ],
          default: "Unknown"
        }
      },
      stockStatus: {
        $cond: {
          if: { $gte: ["$quantity", 20] },
          then: "High Stock",
          else: "Low Stock"
        }
      }
    }
  },
  // Stage 3: Group by price category and brand
  {
    $group: {
      _id: {
        priceCategory: "$priceCategory",
        brand: "$brand"
      },
      productCount: { $sum: 1 },
      averageRating: { $avg: "$ratings.average" },
      totalValue: { $sum: { $multiply: ["$price", "$quantity"] } },
      products: { $push: "$name" }
    }
  },
  // Stage 4: Sort and format output
  {
    $sort: { "_id.priceCategory": 1, totalValue: -1 }
  },
  // Stage 5: Project final output
  {
    $project: {
      _id: 0,
      priceCategory: "$_id.priceCategory",
      brand: "$_id.brand",
      productCount: 1,
      averageRating: { $round: ["$averageRating", 2] },
      totalValue: { $round: ["$totalValue", 2] },
      topProducts: { $slice: ["$products", 3] }
    }
  }
])

// Faceted search aggregation
db.products.aggregate([
  {
    $facet: {
      "categoryStats": [
        { $group: { _id: "$category", count: { $sum: 1 } } },
        { $sort: { count: -1 } }
      ],
      "brandStats": [
        { $group: { _id: "$brand", avgPrice: { $avg: "$price" } } },
        { $sort: { avgPrice: -1 } }
      ],
      "priceRanges": [
        {
          $bucket: {
            groupBy: "$price",
            boundaries: [0, 500, 1000, 1500, 2000, 5000],
            default: "Other",
            output: {
              count: { $sum: 1 },
              avgRating: { $avg: "$ratings.average" }
            }
          }
        }
      ],
      "topRated": [
        { $match: { "ratings.count": { $gte: 100 } } },
        { $sort: { "ratings.average": -1 } },
        { $limit: 5 },
        { $project: { name: 1, "ratings.average": 1, price: 1 } }
      ]
    }
  }
])

// Time-based analytics with date aggregation
db.sales.aggregate([
  {
    $match: {
      orderDate: {
        $gte: new Date("2024-01-01"),
        $lt: new Date("2025-01-01") // exclusive upper bound covers all of 2024
      }
    }
  },
  {
    $group: {
      _id: {
        year: { $year: "$orderDate" },
        month: { $month: "$orderDate" },
        day: { $dayOfMonth: "$orderDate" }
      },
      dailySales: { $sum: "$totalAmount" },
      orderCount: { $sum: 1 },
      averageOrderValue: { $avg: "$totalAmount" }
    }
  },
  {
    $sort: { "_id.year": 1, "_id.month": 1, "_id.day": 1 }
  },
  {
    $group: {
      _id: {
        year: "$_id.year",
        month: "$_id.month"
      },
      monthlySales: { $sum: "$dailySales" },
      monthlyOrders: { $sum: "$orderCount" },
      dailyAverages: { $avg: "$dailySales" },
      dailyData: {
        $push: {
          day: "$_id.day",
          sales: "$dailySales",
          orders: "$orderCount"
        }
      }
    }
  }
])

// Lookup (JOIN) operations with related collections
db.orders.aggregate([
  // Join with customers
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
    }
  },
  // Join with products
  {
    $lookup: {
      from: "products",
      localField: "items.productId",
      foreignField: "_id",
      as: "productDetails"
    }
  },
  // Unwind and calculate
  { $unwind: "$customer" },
  {
    $addFields: {
      customerName: "$customer.name",
      customerEmail: "$customer.email",
      totalValue: { $sum: "$items.subtotal" }
    }
  },
  // Group by customer
  {
    $group: {
      _id: "$customerId",
      customerName: { $first: "$customerName" },
      customerEmail: { $first: "$customerEmail" },
      totalOrders: { $sum: 1 },
      totalSpent: { $sum: "$totalValue" },
      averageOrderValue: { $avg: "$totalValue" },
      lastOrderDate: { $max: "$orderDate" }
    }
  },
  { $sort: { totalSpent: -1 } }
])
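// Results can also be materialized into a collection with $merge instead of
// being returned to the client - useful for pre-computed dashboards.
// Sketch only: assumes sales documents carry a customerId field, and the
// customerStats target collection is hypothetical.
db.sales.aggregate([
  { $group: { _id: "$customerId", totalSpent: { $sum: "$totalAmount" }, orders: { $sum: 1 } } },
  { $merge: { into: "customerStats", on: "_id", whenMatched: "replace", whenNotMatched: "insert" } }
])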
Python Integration and Application Development
import json
import logging
from datetime import datetime
from typing import Dict, List, Optional

from bson import ObjectId
from pymongo import MongoClient


class MongoDBManager:
    def __init__(self, connection_string: str = "mongodb://localhost:27017/",
                 database_name: str = "ecommerce"):
        """MongoDB connection and management class"""
        try:
            self.client = MongoClient(
                connection_string,
                serverSelectionTimeoutMS=5000,
                connectTimeoutMS=10000,
                socketTimeoutMS=10000,
                retryWrites=True,
                w="majority"
            )
            # Test connection ('ping' replaces the deprecated 'ismaster')
            self.client.admin.command('ping')
            print("MongoDB connection successful")
            self.db = self.client[database_name]
        except Exception as e:
            logging.error(f"MongoDB connection failed: {e}")
            raise

    def create_collections_with_schema(self):
        """Create collections with schema validation"""
        # Products collection schema
        products_schema = {
            "$jsonSchema": {
                "bsonType": "object",
                "required": ["name", "price", "category"],
                "properties": {
                    "name": {
                        "bsonType": "string",
                        "description": "Product name is required and must be a string"
                    },
                    "price": {
                        "bsonType": ["double", "int"],
                        "minimum": 0,
                        "description": "Price must be a positive number"
                    },
                    "category": {
                        "bsonType": "string",
                        "enum": ["laptops", "smartphones", "tablets", "accessories"],
                        "description": "Category must be one of the allowed values"
                    },
                    "inStock": {
                        "bsonType": "bool",
                        "description": "Stock status must be boolean"
                    },
                    "quantity": {
                        "bsonType": "int",
                        "minimum": 0,
                        "description": "Quantity must be a non-negative integer"
                    }
                }
            }
        }
        try:
            self.db.create_collection(
                "products",
                validator=products_schema,
                validationLevel="strict",
                validationAction="error"
            )
            print("Products collection created with schema validation")
        except Exception as e:
            print(f"Products collection may already exist: {e}")

    def insert_product(self, product_data: Dict) -> Optional[str]:
        """Insert a single product"""
        try:
            product_data['createdAt'] = datetime.now()
            product_data['updatedAt'] = datetime.now()
            result = self.db.products.insert_one(product_data)
            print(f"Product inserted with ID: {result.inserted_id}")
            return str(result.inserted_id)
        except Exception as e:
            logging.error(f"Product insertion failed: {e}")
            return None

    def bulk_insert_products(self, products: List[Dict]) -> List[str]:
        """Bulk insert multiple products"""
        try:
            for product in products:
                product['createdAt'] = datetime.now()
                product['updatedAt'] = datetime.now()
            result = self.db.products.insert_many(products)
            print(f"Bulk insert completed: {len(result.inserted_ids)} products")
            return [str(id) for id in result.inserted_ids]
        except Exception as e:
            logging.error(f"Bulk insert failed: {e}")
            return []

    def find_products_by_criteria(self, criteria: Dict, limit: int = 10) -> List[Dict]:
        """Find products by specified criteria"""
        try:
            cursor = self.db.products.find(criteria).limit(limit)
            products = list(cursor)
            # Convert ObjectId to string for JSON serialization
            for product in products:
                product['_id'] = str(product['_id'])
            return products
        except Exception as e:
            logging.error(f"Product search failed: {e}")
            return []

    def advanced_product_search(self, text_query: Optional[str] = None,
                                price_range: Optional[Dict] = None,
                                category: Optional[str] = None,
                                sort_by: str = "name",
                                sort_order: int = 1) -> List[Dict]:
        """Advanced product search with multiple filters"""
        pipeline = []
        # Match stage
        match_conditions = {}
        if text_query:
            match_conditions["$text"] = {"$search": text_query}
        if price_range:
            match_conditions["price"] = {}
            if "min" in price_range:
                match_conditions["price"]["$gte"] = price_range["min"]
            if "max" in price_range:
                match_conditions["price"]["$lte"] = price_range["max"]
        if category:
            match_conditions["category"] = category
        if match_conditions:
            pipeline.append({"$match": match_conditions})
        # Add fields stage
        pipeline.append({
            "$addFields": {
                "priceCategory": {
                    "$switch": {
                        "branches": [
                            {"case": {"$lt": ["$price", 500]}, "then": "Budget"},
                            {"case": {"$lt": ["$price", 1500]}, "then": "Mid-range"},
                            {"case": {"$gte": ["$price", 1500]}, "then": "Premium"}
                        ],
                        "default": "Unknown"
                    }
                }
            }
        })
        # Sort stage
        pipeline.append({"$sort": {sort_by: sort_order}})
        # Limit stage
        pipeline.append({"$limit": 20})
        try:
            cursor = self.db.products.aggregate(pipeline)
            results = list(cursor)
            # Convert ObjectId to string
            for result in results:
                result['_id'] = str(result['_id'])
            return results
        except Exception as e:
            logging.error(f"Advanced search failed: {e}")
            return []

    def get_product_analytics(self) -> Dict:
        """Get comprehensive product analytics"""
        try:
            pipeline = [
                {
                    "$facet": {
                        "totalStats": [
                            {
                                "$group": {
                                    "_id": None,
                                    "totalProducts": {"$sum": 1},
                                    "totalValue": {"$sum": {"$multiply": ["$price", "$quantity"]}},
                                    "averagePrice": {"$avg": "$price"},
                                    "inStockCount": {
                                        "$sum": {"$cond": [{"$eq": ["$inStock", True]}, 1, 0]}
                                    }
                                }
                            }
                        ],
                        "categoryBreakdown": [
                            {
                                "$group": {
                                    "_id": "$category",
                                    "count": {"$sum": 1},
                                    "averagePrice": {"$avg": "$price"},
                                    "totalQuantity": {"$sum": "$quantity"}
                                }
                            },
                            {"$sort": {"count": -1}}
                        ],
                        "brandAnalysis": [
                            {
                                "$group": {
                                    "_id": "$brand",
                                    "productCount": {"$sum": 1},
                                    "averagePrice": {"$avg": "$price"},
                                    "averageRating": {"$avg": "$ratings.average"}
                                }
                            },
                            {"$sort": {"productCount": -1}}
                        ],
                        "priceDistribution": [
                            {
                                "$bucket": {
                                    "groupBy": "$price",
                                    "boundaries": [0, 500, 1000, 1500, 2000, 5000],
                                    "default": "5000+",
                                    "output": {
                                        "count": {"$sum": 1},
                                        "averageRating": {"$avg": "$ratings.average"}
                                    }
                                }
                            }
                        ]
                    }
                }
            ]
            result = list(self.db.products.aggregate(pipeline))[0]
            return result
        except Exception as e:
            logging.error(f"Analytics query failed: {e}")
            return {}

    def update_product_inventory(self, product_id: str, quantity_change: int) -> bool:
        """Update product inventory with atomic operation"""
        try:
            result = self.db.products.update_one(
                {"_id": ObjectId(product_id)},
                {
                    "$inc": {"quantity": quantity_change},
                    "$set": {"updatedAt": datetime.now()}
                }
            )
            if result.modified_count > 0:
                # Check if product is out of stock
                product = self.db.products.find_one({"_id": ObjectId(product_id)})
                if product and product["quantity"] <= 0:
                    self.db.products.update_one(
                        {"_id": ObjectId(product_id)},
                        {"$set": {"inStock": False}}
                    )
                    print(f"Product {product_id} marked as out of stock")
                return True
            return False
        except Exception as e:
            logging.error(f"Inventory update failed: {e}")
            return False

    def create_indexes(self):
        """Create performance indexes"""
        try:
            # Text index for search
            self.db.products.create_index([
                ("name", "text"),
                ("description", "text")
            ])
            # Compound indexes for common queries
            self.db.products.create_index([("category", 1), ("price", 1)])
            self.db.products.create_index([("brand", 1), ("inStock", 1)])
            self.db.products.create_index([("ratings.average", -1)])
            self.db.products.create_index([("createdAt", -1)])
            print("Indexes created successfully")
        except Exception as e:
            logging.error(f"Index creation failed: {e}")

    def close_connection(self):
        """Close database connection"""
        if self.client:
            self.client.close()
            print("MongoDB connection closed")


# Usage example and testing
def demo_mongodb_operations():
    """Demonstrate MongoDB operations"""
    # Initialize MongoDB manager
    db_manager = MongoDBManager()
    try:
        # Create collections with schema
        db_manager.create_collections_with_schema()
        # Create indexes
        db_manager.create_indexes()
        # Sample product data
        sample_products = [
            {
                "name": "MacBook Pro 16-inch M3 Max",
                "description": "High-performance laptop for professionals",
                "price": 2499.99,
                "category": "laptops",
                "brand": "Apple",
                "specifications": {
                    "processor": "Apple M3 Max",
                    "memory": "32GB",
                    "storage": "1TB SSD"
                },
                "inStock": True,
                "quantity": 15,
                "ratings": {"average": 4.8, "count": 1247}
            },
            {
                "name": "iPhone 15 Pro",
                "description": "Latest iPhone with A17 Pro chip",
                "price": 999.99,
                "category": "smartphones",
                "brand": "Apple",
                "specifications": {
                    "processor": "A17 Pro",
                    "memory": "8GB",
                    "storage": "256GB"
                },
                "inStock": True,
                "quantity": 32,
                "ratings": {"average": 4.9, "count": 2156}
            }
        ]
        # Insert sample products
        inserted_ids = db_manager.bulk_insert_products(sample_products)
        print(f"Inserted products with IDs: {inserted_ids}")
        # Search products
        laptops = db_manager.find_products_by_criteria({"category": "laptops"})
        print(f"Found {len(laptops)} laptops")
        # Advanced search
        expensive_products = db_manager.advanced_product_search(
            price_range={"min": 1000},
            sort_by="price",
            sort_order=-1
        )
        print(f"Found {len(expensive_products)} expensive products")
        # Get analytics
        analytics = db_manager.get_product_analytics()
        print("Analytics:")
        print(json.dumps(analytics, indent=2, default=str))
        # Update inventory
        if inserted_ids:
            success = db_manager.update_product_inventory(inserted_ids[0], -2)
            print(f"Inventory update success: {success}")
    except Exception as e:
        print(f"Demo error: {e}")
    finally:
        db_manager.close_connection()


if __name__ == "__main__":
    demo_mongodb_operations()
Replica Set Configuration and High Availability
# Replica Set setup for high availability
# Initialize replica set on primary node
mongosh
# Switch to admin database
use admin
# Initialize replica set
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb-primary:27017", priority: 2 },
    { _id: 1, host: "mongodb-secondary1:27017", priority: 1 },
    { _id: 2, host: "mongodb-secondary2:27017", priority: 1 }
  ]
})
# Check replica set status
rs.status()
# Add a new member to replica set
rs.add("mongodb-secondary3:27017")
# Remove a member from replica set
rs.remove("mongodb-secondary3:27017")
# Step down primary (force election)
rs.stepDown()
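# Per-operation durability and read routing once the replica set is up
# (sketch only; values are illustrative)
# Acknowledge the write only after a majority of members have it
db.products.insertOne(
  { name: "Replication demo item", createdAt: new Date() },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)
# Route reads from this shell to secondaries when available
db.getMongo().setReadPref("secondaryPreferred")
db.products.countDocuments({ category: "laptops" })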
# Docker Compose configuration for replica set
# With authorization enabled (root user env vars), replica set members must
# share a keyFile for internal authentication, or mongod will refuse to start.
openssl rand -base64 756 > mongo-keyfile
chmod 600 mongo-keyfile
sudo chown 999:999 mongo-keyfile   # uid/gid of the mongodb user in the official image
cat > docker-compose-replica.yml << 'EOF'
version: '3.8'
services:
  mongo-primary:
    image: mongo:8.0
    container_name: mongo-primary
    restart: unless-stopped
    ports:
      - "27017:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: password123
    volumes:
      - mongo-primary-data:/data/db
      - ./mongo-keyfile:/etc/mongo-keyfile:ro
      - ./replica-set-init.js:/docker-entrypoint-initdb.d/replica-set-init.js
    command: mongod --replSet rs0 --bind_ip_all --keyFile /etc/mongo-keyfile
    networks:
      - mongo-cluster
  mongo-secondary1:
    image: mongo:8.0
    container_name: mongo-secondary1
    restart: unless-stopped
    ports:
      - "27018:27017"
    volumes:
      - mongo-secondary1-data:/data/db
      - ./mongo-keyfile:/etc/mongo-keyfile:ro
    command: mongod --replSet rs0 --bind_ip_all --keyFile /etc/mongo-keyfile
    networks:
      - mongo-cluster
    depends_on:
      - mongo-primary
  mongo-secondary2:
    image: mongo:8.0
    container_name: mongo-secondary2
    restart: unless-stopped
    ports:
      - "27019:27017"
    volumes:
      - mongo-secondary2-data:/data/db
      - ./mongo-keyfile:/etc/mongo-keyfile:ro
    command: mongod --replSet rs0 --bind_ip_all --keyFile /etc/mongo-keyfile
    networks:
      - mongo-cluster
    depends_on:
      - mongo-primary
  mongo-arbiter:
    image: mongo:8.0
    container_name: mongo-arbiter
    restart: unless-stopped
    ports:
      - "27020:27017"
    volumes:
      - ./mongo-keyfile:/etc/mongo-keyfile:ro
    command: mongod --replSet rs0 --bind_ip_all --keyFile /etc/mongo-keyfile
    networks:
      - mongo-cluster
    depends_on:
      - mongo-primary
volumes:
  mongo-primary-data:
  mongo-secondary1-data:
  mongo-secondary2-data:
networks:
  mongo-cluster:
    driver: bridge
EOF
# Replica set initialization script
# (the root user is already created from the MONGO_INITDB_ROOT_* variables, so
# the script only initiates the replica set; it can also be run manually once
# all members are up, e.g. via: docker exec -it mongo-primary mongosh)
cat > replica-set-init.js << 'EOF'
// Wait for MongoDB to be ready
sleep(5000);

// Initialize replica set
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo-primary:27017", priority: 2 },
    { _id: 1, host: "mongo-secondary1:27017", priority: 1 },
    { _id: 2, host: "mongo-secondary2:27017", priority: 1 },
    { _id: 3, host: "mongo-arbiter:27017", arbiterOnly: true }
  ]
});

print("Replica set initialization completed");
EOF
# Start replica set cluster
docker-compose -f docker-compose-replica.yml up -d
# Connection string for applications on the same Docker network
# (all members listen on 27017 inside the network; the 27018/27019 host
# mappings are only for access from the host machine)
MONGO_URI="mongodb://admin:password123@mongo-primary:27017,mongo-secondary1:27017,mongo-secondary2:27017/myapp?replicaSet=rs0&authSource=admin"
# Read preference configuration
# Primary: All reads from primary
# Secondary: All reads from secondary
# PrimaryPreferred: Primary preferred, fallback to secondary
# SecondaryPreferred: Secondary preferred, fallback to primary
# Nearest: Lowest network latency
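# In a driver, the read preference goes in the URI or on the client/collection.
# PyMongo sketch (hostnames and credentials assumed from the compose setup above):
cat > read_preference_example.py << 'EOF'
from pymongo import MongoClient, ReadPreference

# Read preference via the connection string...
client = MongoClient(
    "mongodb://admin:password123@mongo-primary:27017,mongo-secondary1:27017/"
    "?replicaSet=rs0&authSource=admin&readPreference=secondaryPreferred"
)

# ...or per collection, overriding the client default
products = client.myapp.get_collection(
    "products", read_preference=ReadPreference.SECONDARY_PREFERRED
)
print(products.count_documents({}))
EOF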
Sharding Configuration and Horizontal Scaling
# Sharded cluster setup
# Config servers (replica set)
# NOTE: mongod defaults to port 27019 with --configsvr (and 27018 with
# --shardsvr), so --port 27017 is set explicitly to match the mappings below.
cat > config-server-docker-compose.yml << 'EOF'
version: '3.8'
services:
  config1:
    image: mongo:8.0
    container_name: config1
    ports:
      - "27019:27017"
    volumes:
      - config1-data:/data/db
    command: mongod --configsvr --replSet configReplSet --bind_ip_all --port 27017
    networks:
      - mongo-shard
  config2:
    image: mongo:8.0
    container_name: config2
    ports:
      - "27020:27017"
    volumes:
      - config2-data:/data/db
    command: mongod --configsvr --replSet configReplSet --bind_ip_all --port 27017
    networks:
      - mongo-shard
  config3:
    image: mongo:8.0
    container_name: config3
    ports:
      - "27021:27017"
    volumes:
      - config3-data:/data/db
    command: mongod --configsvr --replSet configReplSet --bind_ip_all --port 27017
    networks:
      - mongo-shard
volumes:
  config1-data:
  config2-data:
  config3-data:
networks:
  mongo-shard:
    external: true
EOF
# Shard servers
cat > shard-servers-docker-compose.yml << 'EOF'
version: '3.8'
services:
  # Shard 1
  shard1-primary:
    image: mongo:8.0
    container_name: shard1-primary
    ports:
      - "27022:27017"
    volumes:
      - shard1-primary-data:/data/db
    command: mongod --shardsvr --replSet shard1ReplSet --bind_ip_all --port 27017
    networks:
      - mongo-shard
  shard1-secondary:
    image: mongo:8.0
    container_name: shard1-secondary
    ports:
      - "27023:27017"
    volumes:
      - shard1-secondary-data:/data/db
    command: mongod --shardsvr --replSet shard1ReplSet --bind_ip_all --port 27017
    networks:
      - mongo-shard
  # Shard 2
  shard2-primary:
    image: mongo:8.0
    container_name: shard2-primary
    ports:
      - "27024:27017"
    volumes:
      - shard2-primary-data:/data/db
    command: mongod --shardsvr --replSet shard2ReplSet --bind_ip_all --port 27017
    networks:
      - mongo-shard
  shard2-secondary:
    image: mongo:8.0
    container_name: shard2-secondary
    ports:
      - "27025:27017"
    volumes:
      - shard2-secondary-data:/data/db
    command: mongod --shardsvr --replSet shard2ReplSet --bind_ip_all --port 27017
    networks:
      - mongo-shard
volumes:
  shard1-primary-data:
  shard1-secondary-data:
  shard2-primary-data:
  shard2-secondary-data:
networks:
  mongo-shard:
    external: true
EOF
# Mongos router
cat > mongos-docker-compose.yml << 'EOF'
version: '3.8'
services:
  mongos1:
    image: mongo:8.0
    container_name: mongos1
    ports:
      - "27017:27017"
    command: mongos --configdb configReplSet/config1:27017,config2:27017,config3:27017 --bind_ip_all
    networks:
      - mongo-shard
    # depends_on cannot reference services from another compose file;
    # start the config-server stack first (see the ordering below)
  mongos2:
    image: mongo:8.0
    container_name: mongos2
    ports:
      - "27018:27017"
    command: mongos --configdb configReplSet/config1:27017,config2:27017,config3:27017 --bind_ip_all
    networks:
      - mongo-shard
networks:
  mongo-shard:
    external: true
EOF
# Create network and start cluster
docker network create mongo-shard
# Start components in order
docker-compose -f config-server-docker-compose.yml up -d
sleep 10
docker-compose -f shard-servers-docker-compose.yml up -d
sleep 10
docker-compose -f mongos-docker-compose.yml up -d
# Initialize config server replica set (run mongosh inside the container,
# where the service hostnames resolve)
docker exec -it config1 mongosh --eval '
rs.initiate({
  _id: "configReplSet",
  configsvr: true,
  members: [
    { _id: 0, host: "config1:27017" },
    { _id: 1, host: "config2:27017" },
    { _id: 2, host: "config3:27017" }
  ]
})'
# Initialize shard replica sets
# Shard 1
docker exec -it shard1-primary mongosh --eval '
rs.initiate({
  _id: "shard1ReplSet",
  members: [
    { _id: 0, host: "shard1-primary:27017" },
    { _id: 1, host: "shard1-secondary:27017" }
  ]
})'
# Shard 2
docker exec -it shard2-primary mongosh --eval '
rs.initiate({
  _id: "shard2ReplSet",
  members: [
    { _id: 0, host: "shard2-primary:27017" },
    { _id: 1, host: "shard2-secondary:27017" }
  ]
})'
# Connect to mongos and add shards
docker exec -it mongos1 mongosh
sh.addShard("shard1ReplSet/shard1-primary:27017,shard1-secondary:27017")
sh.addShard("shard2ReplSet/shard2-primary:27017,shard2-secondary:27017")
# Enable sharding for database
sh.enableSharding("ecommerce")
# Shard collection by a shard key
sh.shardCollection("ecommerce.products", { "_id": "hashed" })
# Check sharding status
sh.status()
# Query sharding statistics
db.products.getShardDistribution()
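// A hashed key spreads writes evenly but gives up range locality. When queries
// target ranges, a ranged (often compound) shard key usually fits better.
// Sketch only: the orders collection and its fields are hypothetical.
sh.shardCollection("ecommerce.orders", { customerId: 1, orderDate: 1 })
db.orders.getShardDistribution()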
Performance Monitoring and Optimization
# Performance monitoring and profiling
mongosh
# Profile only slow operations (> 100ms); level 2 would log every operation
db.setProfilingLevel(1, { slowms: 100 })
# Check profiling status
db.getProfilingStatus()
# Query profiler collection
db.system.profile.find().limit(5).sort({ ts: -1 }).pretty()
# Analyze slow queries
db.system.profile.find({
  "ts": {
    $gte: new Date(Date.now() - 1000 * 60 * 60) // Last hour
  }
}).sort({ "ts": -1 })
# Index analysis
db.products.getIndexes()
# Explain query execution
db.products.find({ category: "laptops", price: { $gte: 1000 } }).explain("executionStats")
# Monitor current operations
db.currentOp()
# Kill long-running operation
db.killOp(123456)
# Database statistics
db.stats()
db.products.stats()
# Server status
db.serverStatus()
# Connection monitoring
db.serverStatus().connections
# Memory usage monitoring
db.serverStatus().mem
# WiredTiger cache statistics
db.serverStatus().wiredTiger.cache
# Performance monitoring script
cat > mongodb_monitor.py << 'EOF'
#!/usr/bin/env python3
import json
import time
from datetime import datetime

import pymongo


class MongoDBMonitor:
    def __init__(self, connection_string="mongodb://localhost:27017/"):
        self.client = pymongo.MongoClient(connection_string)
        self.db = self.client.admin

    def get_server_status(self):
        """Get comprehensive server status"""
        status = self.db.command("serverStatus")
        wired_tiger = status.get("wiredTiger", {})
        return {
            "uptime": status["uptime"],
            "connections": status["connections"],
            "memory": status["mem"],
            "opcounters": status["opcounters"],
            "network": status["network"],
            "wiredTiger": {
                # sections may be absent depending on version/engine
                "cache": wired_tiger.get("cache"),
                "concurrentTransactions": wired_tiger.get("concurrentTransactions")
            }
        }

    def get_slow_queries(self, db_name="ecommerce", limit=10):
        """Get recent slow queries from profiler"""
        db = self.client[db_name]
        slow_queries = list(db.system.profile.find({
            "ts": {"$gte": datetime.now().replace(hour=0, minute=0, second=0)}
        }).sort("ts", -1).limit(limit))
        return slow_queries

    def get_index_usage(self, db_name="ecommerce", collection_name="products"):
        """Get index usage statistics"""
        db = self.client[db_name]
        # Get index statistics
        stats = db.command("collStats", collection_name, indexDetails=True)
        return stats.get("indexSizes", {})

    def monitor_replication(self):
        """Monitor replica set status"""
        try:
            rs_status = self.db.command("replSetGetStatus")
            return {
                "set": rs_status["set"],
                "members": [
                    {
                        "name": member["name"],
                        "state": member["stateStr"],
                        "health": member["health"],
                        "optime": member.get("optime", {})
                    }
                    for member in rs_status["members"]
                ]
            }
        except Exception as e:
            return {"error": str(e)}

    def generate_report(self):
        """Generate comprehensive monitoring report"""
        report = {
            "timestamp": datetime.now().isoformat(),
            "server_status": self.get_server_status(),
            "slow_queries": self.get_slow_queries(),
            "index_usage": self.get_index_usage(),
            "replication_status": self.monitor_replication()
        }
        return report


def main():
    monitor = MongoDBMonitor()
    while True:
        try:
            report = monitor.generate_report()
            print(f"\n=== MongoDB Monitor Report - {report['timestamp']} ===")
            print(f"Uptime: {report['server_status']['uptime']} seconds")
            print(f"Current connections: {report['server_status']['connections']['current']}")
            print(f"Available connections: {report['server_status']['connections']['available']}")
            if report['server_status']['wiredTiger']['cache']:
                cache = report['server_status']['wiredTiger']['cache']
                print(f"Cache usage: {cache.get('bytes currently in the cache', 0) / 1024 / 1024:.2f} MB")
            print(f"Recent slow queries: {len(report['slow_queries'])}")
            if report['replication_status'].get('members'):
                print("Replica set members:")
                for member in report['replication_status']['members']:
                    print(f" - {member['name']}: {member['state']} (health: {member['health']})")
            # Save detailed report to file
            with open(f"mongodb_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json", 'w') as f:
                json.dump(report, f, indent=2, default=str)
            time.sleep(60)  # Monitor every minute
        except KeyboardInterrupt:
            print("\nStopping MongoDB monitor")
            break
        except Exception as e:
            print(f"Monitoring error: {e}")
            time.sleep(60)


if __name__ == "__main__":
    main()
EOF
chmod +x mongodb_monitor.py
# MongoDB optimization commands
# Compact collection to reclaim space (compact is a database command)
db.runCommand({ compact: "products" })
# Rebuild indexes (reIndex is deprecated and standalone-only; prefer dropping
# and re-creating individual indexes instead)
db.products.reIndex()
# Validate collection integrity
db.products.validate()
# Set read concern and write concern
db.products.find().readConcern("majority")
# Connection pooling configuration (for applications)
# In connection string:
# mongodb://localhost:27017/mydb?maxPoolSize=100&minPoolSize=10&maxIdleTimeMS=30000
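# The same pool limits can be passed as MongoClient keyword arguments in
# PyMongo (sketch only):
cat > pool_example.py << 'EOF'
from pymongo import MongoClient

# At most 100 sockets, 10 kept warm; idle sockets close after 30s and
# callers waiting for a free socket time out after 2s.
client = MongoClient(
    "mongodb://localhost:27017/mydb",
    maxPoolSize=100,
    minPoolSize=10,
    maxIdleTimeMS=30000,
    waitQueueTimeoutMS=2000,
)
print(client.admin.command("ping"))
EOF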
# Memory optimization: merge these settings into the storage section of
# mongod.conf (appending a second storage: key would make the YAML invalid)
cat > wiredtiger-tuning.yml << 'EOF'
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 8   # roughly 50-60% of available RAM is the usual guidance
    collectionConfig:
      blockCompressor: snappy
    indexConfig:
      prefixCompression: true
# Since MongoDB 7.0, storage engine ticket concurrency is managed dynamically;
# the old wiredTigerConcurrent*Transactions parameters are no longer recommended.
EOF
Security and Atlas Cloud Integration
# Security configuration and best practices
# Enable authentication
mongosh
use admin
# Create admin user
db.createUser({
  user: "admin",
  pwd: "SecurePassword123!",
  roles: [
    { role: "userAdminAnyDatabase", db: "admin" },
    { role: "readWriteAnyDatabase", db: "admin" },
    { role: "dbAdminAnyDatabase", db: "admin" }
  ]
})
# Create application-specific user
use ecommerce
db.createUser({
  user: "appuser",
  pwd: "AppPassword123!",
  roles: [
    { role: "readWrite", db: "ecommerce" },
    { role: "dbAdmin", db: "ecommerce" }
  ]
})
# Enable TLS/SSL (merge these settings into mongod.conf)
cat > mongod-tls.conf << 'EOF'
net:
  tls:
    mode: requireTLS
    certificateKeyFile: /etc/ssl/mongodb/mongodb.pem
    CAFile: /etc/ssl/mongodb/ca.pem
    allowConnectionsWithoutCertificates: false
security:
  authorization: enabled
  clusterAuthMode: x509
EOF
# Generate self-signed certificates for testing
# (with a self-signed setup the certificate itself can serve as the CAFile)
openssl req -newkey rsa:4096 -nodes -out mongodb.csr -keyout mongodb.key -subj "/CN=localhost"
openssl x509 -signkey mongodb.key -in mongodb.csr -req -days 365 -out mongodb.crt
cat mongodb.key mongodb.crt > mongodb.pem
cp mongodb.crt ca.pem
# Field-level encryption setup
mongosh "mongodb://admin:SecurePassword123!@localhost:27017/ecommerce?authSource=admin"
# Create key vault collection
use encryption
db.createCollection("__keyVault")
# MongoDB Atlas connection examples
# Atlas connection string format:
ATLAS_URI="mongodb+srv://username:[email protected]/myapp?retryWrites=true&w=majority"
# Atlas Data API example (REST API)
cat > atlas_api_example.py << 'EOF'
import json

import requests


class AtlasDataAPI:
    """Thin wrapper around the Atlas Data API (HTTPS/JSON).

    NOTE: MongoDB has announced deprecation of the Atlas Data API;
    check the current Atlas documentation before relying on it.
    """

    def __init__(self, api_key, app_id, data_source, database):
        self.base_url = f"https://data.mongodb-api.com/app/{app_id}/endpoint/data/v1"
        self.headers = {
            "Content-Type": "application/json",
            "api-key": api_key
        }
        self.data_source = data_source
        self.database = database

    def find_documents(self, collection, filter_doc=None, limit=10):
        """Find documents using Atlas Data API"""
        url = f"{self.base_url}/action/find"
        payload = {
            "dataSource": self.data_source,
            "database": self.database,
            "collection": collection,
            "filter": filter_doc or {},
            "limit": limit
        }
        response = requests.post(url, headers=self.headers, data=json.dumps(payload))
        return response.json()

    def insert_document(self, collection, document):
        """Insert document using Atlas Data API"""
        url = f"{self.base_url}/action/insertOne"
        payload = {
            "dataSource": self.data_source,
            "database": self.database,
            "collection": collection,
            "document": document
        }
        response = requests.post(url, headers=self.headers, data=json.dumps(payload))
        return response.json()


# Usage example
# atlas_api = AtlasDataAPI(
#     api_key="your-api-key",
#     app_id="your-app-id",
#     data_source="Cluster0",
#     database="ecommerce"
# )
#
# result = atlas_api.find_documents("products", {"category": "laptops"})
# print(result)
EOF
# Atlas Search configuration example
db.products.createSearchIndex(
  "default",
  {
    mappings: {
      dynamic: false,
      fields: {
        name: {
          type: "string",
          analyzer: "lucene.standard"
        },
        description: {
          type: "string",
          analyzer: "lucene.standard"
        },
        category: {
          type: "string",
          analyzer: "lucene.keyword"
        },
        price: {
          type: "number"
        }
      }
    }
  }
)
# Atlas Search query example
db.products.aggregate([
  {
    $search: {
      index: "default",
      compound: {
        must: [
          {
            text: {
              query: "laptop professional",
              path: ["name", "description"]
            }
          }
        ],
        filter: [
          {
            range: {
              path: "price",
              gte: 1000,
              lte: 3000
            }
          }
        ]
      }
    }
  },
  {
    $project: {
      name: 1,
      description: 1,
      price: 1,
      score: { $meta: "searchScore" }
    }
  }
])
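# Atlas also exposes approximate nearest-neighbor search over embedding fields
# via the $vectorSearch stage (Atlas Vector Search). Sketch only: assumes a
# vector index named "vector_index" over a hypothetical embedding field.
db.products.aggregate([
  {
    $vectorSearch: {
      index: "vector_index",
      path: "embedding",
      queryVector: [0.12, -0.45, 0.91, 0.33], // illustrative low-dimensional vector
      numCandidates: 100,
      limit: 5
    }
  },
  { $project: { name: 1, score: { $meta: "vectorSearchScore" } } }
])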
# MongoDB Compass connection string examples
# Local: mongodb://localhost:27017
# Replica Set: mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0
# Sharded: mongodb://mongos1:27017,mongos2:27017/
# Atlas: mongodb+srv://username:[email protected]/database
echo "MongoDB English version setup and examples completed"