Azure Cosmos DB
Overview
Azure Cosmos DB is a globally distributed, multi-model database service provided by Microsoft. It features high availability, automatic scaling, and single-digit millisecond response times, combining NoSQL and vector database capabilities in a modern cloud-native database. It supports multiple APIs and serves applications of any scale worldwide.
Details
Azure Cosmos DB was announced by Microsoft in 2017, evolving from the previous DocumentDB service into a next-generation database platform. It features the following characteristics:
Key Features
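- Global Distribution: Turnkey data replication to any of the 54+ Azure regions added to the account
- Multi-Model: Supports NoSQL (formerly SQL), MongoDB, Cassandra, Gremlin, and Table APIs
- Automatic Scaling: Autoscale provisioned throughput (previewed under the name "autopilot") that adjusts RU/s to load, up to a configured maximum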
- Low Latency: Sub-10ms read and write operations
- Multiple Consistency Levels: Choose from Strong, Bounded staleness, Session, Consistent prefix, and Eventual
- AI Integration: Vector search, full-text search, and hybrid search capabilities
- Unlimited Throughput: Performance guarantees at any scale
- Comprehensive SLA: Guarantees covering availability (up to 99.999% for multi-region accounts), latency, throughput, and consistency
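The automatic scaling feature in the list above can be made concrete with a little arithmetic. Autoscale keeps a container's provisioned throughput between 10% of the configured maximum and the maximum itself. The following is a minimal sketch of that range; `autoscaleRange` and `provisionedFor` are hypothetical helper names, not SDK functions, and billing granularity is ignored.

```javascript
// Hypothetical helpers (not part of any SDK) that model the autoscale range.
function autoscaleRange(maxRuPerSecond) {
  if (!Number.isFinite(maxRuPerSecond) || maxRuPerSecond <= 0) {
    throw new RangeError('maxRuPerSecond must be a positive number');
  }
  // Autoscale moves between 10% of the configured maximum and the maximum.
  return { min: maxRuPerSecond / 10, max: maxRuPerSecond };
}

// Clamp an observed load (in RU/s) to what autoscale can actually provision.
function provisionedFor(loadRuPerSecond, maxRuPerSecond) {
  const { min, max } = autoscaleRange(maxRuPerSecond);
  return Math.min(max, Math.max(min, loadRuPerSecond));
}

console.log(autoscaleRange(4000));
console.log(provisionedFor(1500, 4000));
```

A load below the floor is still provisioned at the floor, and a load above the maximum is capped and would be rate-limited.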
2024-2025 New Features
- Vector Search: High-performance vector search powered by DiskANN algorithms
- Full-Text Search: Keyword and phrase search using BM25 algorithm
- Hybrid Search: Combination of vector search and full-text search
- Multi-Language Support: Full-text search over English, French, Spanish, and German text
- AI Foundry Integration: Integration with Azure AI for agent application development
- Fuzzy Search: Resilience to typos and text variations
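Vector search ranks items by how close their embeddings are to a query embedding. The sketch below shows the scoring idea with cosine similarity and a brute-force scan. It is an illustration only: it does not reproduce the DiskANN index, and `cosineSimilarity` and `topK` are hypothetical names.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new RangeError('vectors must be non-empty and of equal length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k: score every document, then sort by descending score.
function topK(queryVector, docs, k) {
  return docs
    .map(doc => ({ id: doc.id, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const docs = [
  { id: 'a', vector: [0, 1] },
  { id: 'b', vector: [1, 0] },
  { id: 'c', vector: [1, 1] }
];
console.log(topK([1, 0], docs, 2).map(r => r.id));
```

An approximate index exists to avoid this full scan, which grows linearly with the number of items.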
Architecture
- Data Model: Hierarchical structure (Account > Database > Container > Item)
- Automatic Indexing: Automatic index creation for all fields
- Schema-Agnostic: Flexible JSON document structure
- Partitioning: Horizontal scaling through partitioning
- Multi-Master: Write operations across multiple regions
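Partitioning works on the partition key value: items that share a value belong to the same logical partition. The sketch below groups sample items that way and reports how much of the data the largest partition holds, which is a quick check for a skewed key. The helper names are hypothetical, and the service's internal hashing to physical partitions is not modeled.

```javascript
// Group items into logical partitions by the value of one field.
function groupByPartitionKey(items, keyField) {
  const partitions = new Map();
  for (const item of items) {
    const key = item[keyField];
    if (key === undefined) {
      throw new Error(`item ${item.id} has no "${keyField}" property`);
    }
    if (!partitions.has(key)) partitions.set(key, []);
    partitions.get(key).push(item);
  }
  return partitions;
}

// Fraction (0..1) of all items held by the largest logical partition.
function largestPartitionShare(partitions) {
  let total = 0;
  let largest = 0;
  for (const items of partitions.values()) {
    total += items.length;
    largest = Math.max(largest, items.length);
  }
  return total === 0 ? 0 : largest / total;
}

const sampleItems = [
  { id: '1', partitionKey: 'electronics' },
  { id: '2', partitionKey: 'electronics' },
  { id: '3', partitionKey: 'electronics' },
  { id: '4', partitionKey: 'books' }
];
const partitions = groupByPartitionKey(sampleItems, 'partitionKey');
console.log(partitions.size, largestPartitionShare(partitions));
```

A share close to 1 means most reads and writes land on one logical partition, which limits how well the container can scale out.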
Consistency Levels
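- Strong: Linearizable reads; a read always returns the most recent committed write
- Bounded staleness: Reads may lag behind writes by at most a configured number of versions or time interval
- Session: Read-your-writes and monotonic reads within a single client session
- Consistent prefix: Reads never see writes out of order, although they may lag behind
- Eventual: No ordering guarantee; replicas converge over time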
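The five levels form an ordering from strongest to weakest. An individual request can relax the account's default level but cannot strengthen it. The sketch below encodes that rule; `canRelaxTo` is a hypothetical helper, and the level names follow the spelling used by the SDK's consistency enum.

```javascript
// Consistency levels ordered from strongest to weakest.
const CONSISTENCY_LEVELS = [
  'Strong',
  'BoundedStaleness',
  'Session',
  'ConsistentPrefix',
  'Eventual'
];

// True when `requested` is the same as, or weaker than, the account default.
function canRelaxTo(accountDefault, requested) {
  const defaultIndex = CONSISTENCY_LEVELS.indexOf(accountDefault);
  const requestedIndex = CONSISTENCY_LEVELS.indexOf(requested);
  if (defaultIndex === -1 || requestedIndex === -1) {
    throw new Error('unknown consistency level');
  }
  return requestedIndex >= defaultIndex;
}

console.log(canRelaxTo('Session', 'Eventual'));
console.log(canRelaxTo('Session', 'Strong'));
```

Weaker levels generally trade freshness guarantees for lower latency and lower RU cost on reads.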
Advantages and Disadvantages
Advantages
- Global Scale: Automatic data distribution and replication worldwide
- High Performance: Single-digit millisecond response times with auto-scaling
- Comprehensive SLA: Availability (up to 99.999% for multi-region accounts), latency, and throughput guarantees
- Multi-Model: Multiple APIs and data models in a single service
- Zero Operations: Fully managed service with no infrastructure management
- AI Capabilities: Vector search, full-text search, and AI integration
- Flexible Consistency: Consistency level selection based on application needs
- Azure Integration: Complete integration with Azure ecosystem
Disadvantages
- High Cost: Pay-per-use pricing can be expensive for large-scale usage
- Azure Dependency: Available only on Azure cloud, vendor lock-in
- Learning Curve: Understanding multiple APIs and configuration options required
- Complex Pricing: RU (Request Unit) based billing structure
- Data Size Limits: 2 MB maximum size per item
- Query Limitations: Constraints on complex queries and aggregation operations
- Regional Limitations: Not available in all regions
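The per-item size limit is easy to check on the client before a write is attempted. The sketch below measures the UTF-8 length of an item's JSON form and compares it with 2 MB. It assumes the serialized length is a close enough proxy for what the service counts (system properties added by the service are not included), and the helper names are hypothetical.

```javascript
// 2 MB item limit from the list above, expressed in bytes.
const MAX_ITEM_BYTES = 2 * 1024 * 1024;

// UTF-8 byte length of the item's JSON serialization.
function itemSizeInBytes(item) {
  return Buffer.byteLength(JSON.stringify(item), 'utf8');
}

// True when the serialized item is within the limit.
function fitsItemLimit(item, limit = MAX_ITEM_BYTES) {
  return itemSizeInBytes(item) <= limit;
}

console.log(itemSizeInBytes({ id: 'a' }));
console.log(fitsItemLimit({ id: 'big', body: 'x'.repeat(MAX_ITEM_BYTES) }));
```

Items that approach the limit are usually better split into several items or moved to blob storage with a reference kept in the item.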
Code Examples
Installation & Setup
# Create Cosmos DB account with Azure CLI
az cosmosdb create \
--name mycosmosaccount \
--resource-group myresourcegroup \
--default-consistency-level Session \
--locations regionName="East US" failoverPriority=0 isZoneRedundant=False
# Create database and container
az cosmosdb sql database create \
--account-name mycosmosaccount \
--name mydatabase \
--resource-group myresourcegroup
az cosmosdb sql container create \
--account-name mycosmosaccount \
--database-name mydatabase \
--name mycontainer \
--partition-key-path "/partitionKey" \
--throughput 400 \
--resource-group myresourcegroup
# Install Node.js SDK
npm install @azure/cosmos
# Install .NET SDK
dotnet add package Microsoft.Azure.Cosmos
# Install Python SDK
pip install azure-cosmos
Basic Operations (CRUD)
// Connection and basic operations with Node.js
const { CosmosClient } = require('@azure/cosmos');
// Connection configuration
const client = new CosmosClient({
endpoint: 'https://myaccount.documents.azure.com:443/',
key: 'your-primary-key'
});
const database = client.database('mydatabase');
const container = database.container('mycontainer');
async function basicOperations() {
try {
// Create document
const newItem = {
id: 'item-001',
partitionKey: 'electronics',
name: 'Laptop Computer',
brand: 'Sample Brand',
price: 1200,
category: 'electronics',
specifications: {
cpu: 'Intel Core i7',
memory: '16GB',
storage: '512GB SSD'
},
tags: ['computer', 'business', 'mobile'],
createdAt: new Date().toISOString()
};
const { resource: createdItem } = await container.items.create(newItem);
console.log('Created item:', createdItem);
// Read document
const { resource: readItem } = await container.item('item-001', 'electronics').read();
console.log('Read item:', readItem);
// Update document
readItem.price = 1150;
readItem.updatedAt = new Date().toISOString();
const { resource: updatedItem } = await container.item('item-001', 'electronics').replace(readItem);
console.log('Updated item:', updatedItem);
// Execute query
const querySpec = {
query: 'SELECT * FROM c WHERE c.category = @category AND c.price < @maxPrice',
parameters: [
{ name: '@category', value: 'electronics' },
{ name: '@maxPrice', value: 1500 }
]
};
const { resources: results } = await container.items.query(querySpec).fetchAll();
console.log('Query results:', results);
// Delete document
await container.item('item-001', 'electronics').delete();
console.log('Item deleted');
} catch (error) {
console.error('Error during operations:', error);
}
}
basicOperations();
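The query in the example above passes values as parameters instead of concatenating them into the query text. The sketch below builds the same `{ query, parameters }` shape from a plain object of equality filters. `buildEqualityQuery` is a hypothetical helper; field names are checked against a simple identifier pattern because they are interpolated into the query text.

```javascript
// Build a parameterized query spec from equality filters, e.g.
// { category: 'electronics' } -> WHERE c.category = @category
function buildEqualityQuery(filters) {
  const clauses = [];
  const parameters = [];
  for (const [field, value] of Object.entries(filters)) {
    // Field names end up in the query text, so only plain identifiers pass.
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(field)) {
      throw new Error(`unsupported field name: ${field}`);
    }
    clauses.push(`c.${field} = @${field}`);
    parameters.push({ name: `@${field}`, value });
  }
  const where = clauses.length > 0 ? ` WHERE ${clauses.join(' AND ')}` : '';
  return { query: `SELECT * FROM c${where}`, parameters };
}

const spec = buildEqualityQuery({ category: 'electronics', brand: 'Sample Brand' });
console.log(spec.query);
```

The returned object can be passed to `container.items.query(spec)` in the same way as the hand-written `querySpec`.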
Data Modeling
// Complex document structure example
const customerDocument = {
id: 'customer-12345',
partitionKey: 'customer-12345',
customerType: 'premium',
profile: {
firstName: 'John',
lastName: 'Doe',
email: 'john.doe@example.com', // placeholder address
phone: '+1-555-123-4567',
birthDate: '1985-04-15',
address: {
zipCode: '10001',
state: 'New York',
city: 'New York',
street: '123 Main St',
apartment: 'Apt 4B'
}
},
orders: [
{
orderId: 'order-001',
orderDate: '2024-01-15T10:30:00Z',
status: 'delivered',
items: [
{
productId: 'prod-001',
name: 'Wireless Headphones',
quantity: 1,
unitPrice: 250,
totalPrice: 250
}
],
totalAmount: 250,
deliveryAddress: {
zipCode: '10001',
state: 'New York',
city: 'New York',
street: '123 Main St'
}
}
],
preferences: {
language: 'en',
currency: 'USD',
notifications: {
email: true,
sms: false,
push: true
},
categories: ['electronics', 'books', 'clothing']
},
loyaltyProgram: {
tier: 'gold',
points: 15000,
joinDate: '2023-01-01T00:00:00Z'
},
metadata: {
createdAt: '2023-01-01T00:00:00Z',
updatedAt: '2024-01-15T15:45:00Z',
version: 2,
source: 'mobile-app'
}
};
// Indexing strategy
async function indexingStrategy() {
// Composite index configuration example
const indexingPolicy = {
indexingMode: 'consistent',
automatic: true,
includedPaths: [
{
path: '/*'
}
],
excludedPaths: [
{
path: '/metadata/*'
}
],
compositeIndexes: [
[
{ path: '/customerType', order: 'ascending' },
{ path: '/profile/email', order: 'ascending' }
],
[
{ path: '/orders/[]/status', order: 'ascending' },
{ path: '/orders/[]/orderDate', order: 'descending' }
]
]
};
console.log('Indexing strategy:', indexingPolicy);
}
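The function above only logs the policy. For the policy to take effect it has to be supplied when the container is created or updated. The sketch below does this through `database.containers.createIfNotExists`; `ensureContainer` is a hypothetical wrapper, and the `database` argument is assumed to be the object returned by `client.database(...)`.

```javascript
// Create the container with the given indexing policy if it does not exist.
// `database` is expected to be an @azure/cosmos Database instance.
async function ensureContainer(database, id, partitionKeyPath, indexingPolicy) {
  const { container } = await database.containers.createIfNotExists({
    id,
    partitionKey: { paths: [partitionKeyPath] },
    indexingPolicy
  });
  return container;
}

// Usage (inside an async function, with a real client):
//   const container = await ensureContainer(
//     client.database('mydatabase'), 'customers', '/partitionKey', indexingPolicy);
```

Because the call is a no-op when the container already exists, a changed policy on an existing container would need a separate replace of the container definition.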
Vector Search & AI Features
// Vector search implementation example
// Assumes vector search is enabled on the account and the container defines
// a vector policy for /contentVector (and a full-text policy for /content).
async function vectorSearchExample() {
  // Document with vector embedding
  const documentWithVector = {
    id: 'doc-vector-001',
    partitionKey: 'documents',
    title: 'Introduction to Azure Cosmos DB',
    content: 'Azure Cosmos DB is a globally distributed, multi-model database service.',
    category: 'technology',
    // Vector embedding generated by OpenAI or similar
    contentVector: [0.1, 0.2, -0.3, 0.4, 0.5, /* ... 1536-dimensional vector */],
    createdAt: new Date().toISOString()
  };
  await container.items.create(documentWithVector);

  // Vector search query. The query vector must have the same number of
  // dimensions as the stored embeddings; it is truncated here for brevity.
  const searchVector = [0.15, 0.25, -0.25, 0.35, 0.45 /* ... */];
  const vectorSearchQuery = {
    query: `
      SELECT TOP 10 c.id, c.title, c.content,
        VectorDistance(c.contentVector, @searchVector) AS similarity
      FROM c
      WHERE c.category = @category
      ORDER BY VectorDistance(c.contentVector, @searchVector)
    `,
    parameters: [
      { name: '@searchVector', value: searchVector },
      { name: '@category', value: 'technology' }
    ]
  };
  const { resources: vectorResults } = await container.items.query(vectorSearchQuery).fetchAll();
  console.log('Vector search results:', vectorResults);

  // Hybrid search (full-text + vector).
  // FullTextScore is only valid inside ORDER BY RANK, and the two rankings
  // are fused with RRF (Reciprocal Rank Fusion) rather than by arithmetic on
  // the raw scores. The keyword-argument form of FullTextScore has changed
  // between service versions, so check it against the version you target.
  const hybridSearchQuery = {
    query: `
      SELECT TOP 10 c.id, c.title, c.content
      FROM c
      ORDER BY RANK RRF(
        VectorDistance(c.contentVector, @searchVector),
        FullTextScore(c.content, 'database', 'service')
      )
    `,
    parameters: [
      { name: '@searchVector', value: searchVector }
    ]
  };
  const { resources: hybridResults } = await container.items.query(hybridSearchQuery).fetchAll();
  console.log('Hybrid search results:', hybridResults);
}
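Hybrid search has to merge two ranked lists, one from vector similarity and one from full-text scoring. Reciprocal Rank Fusion (RRF), the fusion method Cosmos DB exposes for hybrid queries, does this using only each item's position in each list. The sketch below is a plain implementation of the published formula; the constant `k = 60` is the value commonly used in the literature and is an assumption here, not a documented service setting.

```javascript
// Reciprocal Rank Fusion over several ranked lists of item ids.
// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) || 0) + contribution);
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

const vectorRanking = ['a', 'b', 'c']; // best vector match first
const textRanking = ['b', 'c', 'a'];   // best full-text match first
console.log(reciprocalRankFusion([vectorRanking, textRanking]).map(r => r.id));
```

Because only ranks are used, the two scoring systems do not need to be on comparable scales, which is why the raw distance and text scores are not added together directly.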
Performance Optimization
// High-performance operations implementation
class CosmosDBOptimizer {
  constructor(container) {
    this.container = container;
  }

  // Bulk operations
  async bulkInsert(documents) {
    const operations = documents.map(doc => ({
      operationType: 'Create',
      resourceBody: doc
    }));
    // bulk() resolves to an array with one response per operation
    const responses = await this.container.items.bulk(operations);
    const created = responses.filter(r => r.statusCode === 201).length;
    console.log(`Bulk insert completed: ${created}/${responses.length} items created`);
    return responses;
  }

  // Efficient pagination
  async paginateItems(querySpec, pageSize = 100) {
    const iterator = this.container.items.query(querySpec, {
      maxItemCount: pageSize
    });
    const results = [];
    while (iterator.hasMoreResults()) {
      const { resources, continuationToken } = await iterator.fetchNext();
      results.push(...resources);
      console.log(`Retrieved: ${resources.length} items, Continuation: ${continuationToken}`);
      if (results.length >= 1000) { // Stop once at least 1000 items are collected
        break;
      }
    }
    return results;
  }

  // Transactional operations
  async transactionalBatch(partitionKey, operations) {
    // batch() sends the request itself and resolves to the response;
    // all operations must target the same partition key
    const { result } = await this.container.items.batch(operations, partitionKey);
    console.log('Transaction execution result:', result);
    return result;
  }

  // Efficient aggregation queries
  async aggregateData() {
    const aggregateQuery = {
      query: `
        SELECT
          c.category,
          COUNT(1) as itemCount,
          AVG(c.price) as averagePrice,
          SUM(c.price) as totalValue,
          MIN(c.price) as minPrice,
          MAX(c.price) as maxPrice
        FROM c
        GROUP BY c.category
      `
    };
    const { resources } = await this.container.items.query(aggregateQuery).fetchAll();
    return resources;
  }
}
// Usage example
const optimizer = new CosmosDBOptimizer(container);
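A transactional batch is limited to a single partition key and to 100 operations per request, so a mixed list of items has to be split before it can be submitted. The sketch below plans that split for create operations; `planCreateBatches` is a hypothetical helper and performs no network calls.

```javascript
// Transactional batch limit on operations per request.
const MAX_BATCH_OPERATIONS = 100;

// Split items into batches that share one partition key and stay within
// the per-request operation limit.
function planCreateBatches(items, keyField, maxOps = MAX_BATCH_OPERATIONS) {
  const byKey = new Map();
  for (const item of items) {
    const key = item[keyField];
    if (!byKey.has(key)) byKey.set(key, []);
    byKey.get(key).push({ operationType: 'Create', resourceBody: item });
  }
  const batches = [];
  for (const [partitionKey, operations] of byKey) {
    for (let i = 0; i < operations.length; i += maxOps) {
      batches.push({ partitionKey, operations: operations.slice(i, i + maxOps) });
    }
  }
  return batches;
}

const manyItems = [];
for (let i = 0; i < 250; i++) manyItems.push({ id: `a-${i}`, partitionKey: 'a' });
manyItems.push({ id: 'b-0', partitionKey: 'b' });
console.log(planCreateBatches(manyItems, 'partitionKey').map(b => b.operations.length));
```

Each planned batch could then be passed to `container.items.batch(batch.operations, batch.partitionKey)`. Atomicity holds only within one batch, not across the batches produced for the same key.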
// Performance monitoring
async function monitorPerformance() {
const startTime = Date.now();
try {
const result = await container.items.query({
query: 'SELECT * FROM c WHERE c.category = @category',
parameters: [{ name: '@category', value: 'electronics' }]
}).fetchAll();
const endTime = Date.now();
const duration = endTime - startTime;
console.log(`Query execution time: ${duration}ms`);
console.log(`Result count: ${result.resources.length}`);
console.log(`RU consumption: ${result.requestCharge}`);
} catch (error) {
console.error('Query error:', error);
}
}
Security & Best Practices
// Security configuration and best practices
const { DefaultAzureCredential } = require('@azure/identity');
// Microsoft Entra ID (formerly Azure AD) authentication
const aadClient = new CosmosClient({
endpoint: 'https://myaccount.documents.azure.com:443/',
aadCredentials: new DefaultAzureCredential()
});
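// Data encryption configuration
// Illustrative description of a client-side encryption key; this object is
// not passed to any client in this example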
const encryptionConfig = {
encryptionKey: {
encryptionAlgorithm: 'AEAD_AES_256_CBC_HMAC_SHA256',
wrappingAlgorithm: 'RSA_OAEP',
keyWrapMetadata: {
name: 'my-key',
type: 'AzureKeyVault',
value: 'https://my-keyvault.vault.azure.net/keys/my-key'
}
}
};
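// Secure connection settings
const secureClient = new CosmosClient({
  endpoint: 'https://myaccount.documents.azure.com:443/',
  key: 'your-key', // load from configuration or a secret store in real code
  connectionPolicy: {
    enableEndpointDiscovery: false,
    preferredLocations: ['East US'],
    useMultipleWriteLocations: false
  },
  plugins: [
    {
      on: 'request',
      // A plugin must hand the request on via next() and return its response.
      // Assumes an SDK version whose plugins receive (context, diagNode, next).
      plugin: async (context, diagNode, next) => {
        // Tag outgoing requests with a custom header
        context.headers['x-ms-client-version'] = 'secure-app-v1.0';
        return next(context);
      }
    }
  ]
});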
// Resource-based access control
async function setupResourcePermissions() {
// Resource token generation example for read-only user
const permissions = {
permissionMode: 'Read',
resource: 'dbs/mydatabase/colls/mycontainer',
tokenExpiryTime: new Date(Date.now() + 60 * 60 * 1000) // Valid for 1 hour
};
console.log('Resource permission settings:', permissions);
}
// Data masking
function maskSensitiveData(document) {
  const masked = { ...document };
  if (masked.profile) {
    // Copy the nested profile too, so the caller's document is not mutated
    masked.profile = { ...masked.profile };
    // Mask personal information
    if (masked.profile.email) {
      masked.profile.email = masked.profile.email.replace(/(.{2}).*(@.*)/, '$1***$2');
    }
    if (masked.profile.phone) {
      masked.profile.phone = masked.profile.phone.replace(/(\d{3}).*(\d{4})/, '$1-****-$2');
    }
  }
  return masked;
}
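The client examples in this section place the endpoint and key directly in source code. A common alternative is to read them from the environment at startup and fail fast when they are missing. The sketch below assumes two hypothetical variable names, `COSMOS_ENDPOINT` and `COSMOS_KEY`; neither is defined by the SDK.

```javascript
// Read connection settings from environment variables (hypothetical names).
function loadCosmosConfig(env = process.env) {
  const endpoint = env.COSMOS_ENDPOINT;
  const key = env.COSMOS_KEY;
  if (!endpoint || !key) {
    throw new Error('COSMOS_ENDPOINT and COSMOS_KEY must be set');
  }
  // Reject non-TLS endpoints; new URL() also throws on malformed values.
  if (new URL(endpoint).protocol !== 'https:') {
    throw new Error('COSMOS_ENDPOINT must use https');
  }
  return { endpoint, key };
}

const exampleConfig = loadCosmosConfig({
  COSMOS_ENDPOINT: 'https://myaccount.documents.azure.com:443/',
  COSMOS_KEY: 'your-primary-key'
});
console.log(exampleConfig.endpoint);
```

The returned object has the `endpoint` and `key` properties that `new CosmosClient(...)` accepts. Where possible, the Entra ID credential shown earlier avoids handling a key at all.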