Database

Azure Cosmos DB

Overview

Azure Cosmos DB is Microsoft's fully managed, globally distributed, multi-model database service. It combines NoSQL and vector database capabilities, offers high availability, automatic scaling, and single-digit millisecond response times, and exposes several APIs so that applications of any size can use it worldwide.

Details

Microsoft announced Azure Cosmos DB in 2017 as the successor to its earlier Azure DocumentDB service. Its main characteristics are summarized below.

Key Features

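  • Global Distribution: Turnkey replication of data to any number of Azure regions, which can be added or removed at any time
  • Multi-Model: Supports the NoSQL (formerly SQL), MongoDB, Apache Cassandra, Apache Gremlin, Table, and PostgreSQL APIs
  • Automatic Scaling: Autoscale throughput (formerly called Autopilot) adjusts RU/s between 10% and 100% of a configured maximum; a serverless mode bills per request
  • Low Latency: Under 10 ms at the 99th percentile for point reads and writes within a region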
  • Multiple Consistency Levels: Choose from Strong, Bounded staleness, Session, Consistent prefix, and Eventual
  • AI Integration: Vector search, full-text search, and hybrid search capabilities
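  • Elastic Scale: Throughput and storage scale out horizontally through partitioning, with performance guarantees maintained as the data grows
  • Comprehensive SLA: Financially backed SLAs for availability (up to 99.999% for multi-region accounts), latency, throughput, and consistency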
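An availability percentage is easier to judge as allowed downtime, which is simply (1 - availability) × period. A minimal sketch of that arithmetic, assuming a 30-day month and a 365-day year (the function name is illustrative, and the SLA itself defines how availability is measured):

```javascript
// Allowed downtime in minutes for an availability percentage over a period of days.
// Assumes 30-day months and 365-day years; real SLAs define their own measurement window.
function allowedDowntimeMinutes(availabilityPercent, periodDays) {
  const periodMinutes = periodDays * 24 * 60;
  return periodMinutes * (1 - availabilityPercent / 100);
}

for (const sla of [99.99, 99.999]) {
  const perMonth = allowedDowntimeMinutes(sla, 30).toFixed(2);
  const perYear = allowedDowntimeMinutes(sla, 365).toFixed(2);
  console.log(`${sla}%: ~${perMonth} min/month, ~${perYear} min/year`);
}
```

For example, 99.999% allows roughly 5.26 minutes of downtime per year, while 99.99% allows about 52.56 minutes.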

2024-2025 New Features

  • Vector Search: Vector indexing and similarity search, including indexes based on Microsoft's DiskANN algorithms
  • Full-Text Search: Keyword and phrase search with BM25 relevance scoring
  • Hybrid Search: Vector and full-text results combined with Reciprocal Rank Fusion (RRF)
  • Multi-Language Support: Full-text search in English, French, Spanish, and German
  • Azure AI Foundry Integration: Integration with Azure AI Foundry for building AI agent applications
  • Fuzzy Search: Resilience to typos and text variations
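BM25 scores a document higher when it contains the query terms more often, discounts terms that are common across the corpus, and normalizes for document length. The following minimal sketch of the classic Okapi BM25 formula illustrates the idea; it is not Cosmos DB's internal implementation, whose tokenization, IDF variant, and parameters may differ. The defaults k1 = 1.2 and b = 0.75 and the naive tokenizer are assumptions.

```javascript
// Minimal Okapi BM25 scorer (illustrative only; not Cosmos DB's internal implementation).
// Naive tokenizer: lowercase, split on anything that is not a letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function bm25Scores(docs, query, k1 = 1.2, b = 0.75) {
  const tokenized = docs.map(tokenize);
  const N = tokenized.length;
  const avgLen = tokenized.reduce((sum, t) => sum + t.length, 0) / N;
  const terms = tokenize(query);

  // Document frequency: how many documents contain each query term
  const df = new Map();
  for (const term of terms) {
    df.set(term, tokenized.filter(t => t.includes(term)).length);
  }

  return tokenized.map(tokens => {
    let score = 0;
    for (const term of terms) {
      const tf = tokens.filter(t => t === term).length;
      if (tf === 0) continue;
      const n = df.get(term);
      // IDF with +1 inside the log so common terms never get a negative weight
      const idf = Math.log((N - n + 0.5) / (n + 0.5) + 1);
      // Term-frequency saturation (k1) and length normalization (b)
      const norm = tf + k1 * (1 - b + (b * tokens.length) / avgLen);
      score += (idf * tf * (k1 + 1)) / norm;
    }
    return score;
  });
}

const docs = [
  'Azure Cosmos DB is a globally distributed database service',
  'A cooking guide to sourdough bread',
  'Database indexing and database query tuning'
];
console.log(bm25Scores(docs, 'database service'));
```

With this corpus, the first document (which matches both terms) outranks the third (which repeats only "database"), and the cooking document scores 0.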

Architecture

  • Data Model: Hierarchical structure (Account > Database > Container > Item)
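  • Automatic Indexing: Every property of every item is indexed by default; an indexing policy can include or exclude paths
  • Schema-Agnostic: Flexible JSON document structure
  • Partitioning: Items are spread across physical partitions by a hash of their partition key, enabling horizontal scaling
  • Multi-Region Writes (formerly Multi-Master): Every configured region can accept writes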
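Conceptually, the partition key value of each item is hashed, and each physical partition owns a contiguous range of the hash space. All items sharing a partition key value form one logical partition (limited to 20 GB) and always live on the same physical partition. The sketch below illustrates this with a 32-bit FNV-1a hash and equal-sized ranges; both are assumptions for illustration, since Cosmos DB uses its own hash function and splits ranges as data and throughput grow.

```javascript
// Illustrative hash-range partitioning (NOT Cosmos DB's actual hash function).
// 32-bit FNV-1a hash of a string.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash >>> 0;
}

// Map a partition key value to one of `physicalPartitions` equal-sized hash ranges.
function physicalPartitionFor(partitionKeyValue, physicalPartitions) {
  const rangeSize = 2 ** 32 / physicalPartitions;
  return Math.floor(fnv1a(String(partitionKeyValue)) / rangeSize);
}

const items = [
  { id: '1', partitionKey: 'electronics' },
  { id: '2', partitionKey: 'books' },
  { id: '3', partitionKey: 'electronics' } // same logical partition as item 1
];
for (const item of items) {
  console.log(item.id, item.partitionKey, '-> physical partition', physicalPartitionFor(item.partitionKey, 4));
}
```

Because placement depends only on the key, a high-cardinality partition key that spreads reads and writes evenly is what lets throughput scale out.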

Consistency Levels

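  • Strong: Linearizable; reads always return the most recent committed write
  • Bounded staleness: Reads may lag writes by at most K versions or T seconds, whichever limit is reached first
  • Session (default): Read-your-writes, monotonic reads, and monotonic writes within a single client session
  • Consistent prefix: Reads never see writes out of order, though they may not see the latest ones
  • Eventual: No ordering guarantee; replicas converge over time, giving the lowest latency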
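Conceptually, Session consistency relies on session tokens: a write hands the client back the logical sequence number (LSN) it produced, and a later read carrying that token may only be served by a replica that has applied at least that LSN. The toy in-memory model below sketches the idea; the `Replica` and `ToyPartition` classes are hypothetical and ignore regions, multiple partitions, and the real token format.

```javascript
// Toy model of session consistency: a write returns an LSN (the "session token"),
// and a read carrying that token may only be served by a caught-up replica.
class Replica {
  constructor() {
    this.lsn = 0; // highest log sequence number applied
    this.data = new Map();
  }
  apply(entry) {
    this.data.set(entry.key, entry.value);
    this.lsn = entry.lsn;
  }
}

class ToyPartition {
  constructor(secondaryCount) {
    this.primary = new Replica();
    this.secondaries = Array.from({ length: secondaryCount }, () => new Replica());
    this.log = []; // committed writes, in order
  }
  write(key, value) {
    const entry = { lsn: this.log.length + 1, key, value };
    this.log.push(entry);
    this.primary.apply(entry); // secondaries catch up later
    return entry.lsn; // session token handed back to the client
  }
  replicate(index) {
    // Bring one secondary fully up to date with the log
    const replica = this.secondaries[index];
    for (const entry of this.log.slice(replica.lsn)) replica.apply(entry);
  }
  read(key, sessionLsn = 0) {
    // Prefer secondaries, but only those at or beyond the client's session LSN
    const replica = [...this.secondaries, this.primary].find(r => r.lsn >= sessionLsn);
    return replica.data.get(key);
  }
}

const partition = new ToyPartition(2);
const token = partition.write('cart', 'laptop');
console.log(partition.read('cart'));        // undefined: a lagging secondary answered
console.log(partition.read('cart', token)); // laptop: only caught-up replicas qualify
```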

Advantages and Disadvantages

Advantages

  • Global Scale: Automatic data distribution and replication worldwide
  • High Performance: Single-digit millisecond response times with auto-scaling
  • Comprehensive SLA: Financially backed availability (up to 99.999%), latency, throughput, and consistency guarantees
  • Multi-Model: Multiple APIs and data models in a single service
  • Fully Managed: No servers, patching, or replication to operate
  • AI Capabilities: Vector search, full-text search, and AI integration
  • Flexible Consistency: Consistency level selection based on application needs
  • Azure Integration: Works with Azure Functions, Microsoft Entra ID, Azure Monitor, and other Azure services

Disadvantages

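  • High Cost: Provisioned throughput, storage, and multi-region replication can become expensive at large scale
  • Azure Dependency: Available only on Azure, which creates vendor lock-in
  • Learning Curve: Partitioning, consistency levels, throughput models, and the various APIs all need to be understood
  • Complex Pricing: Billing in Request Units (RUs) makes costs harder to predict than instance-based pricing
  • Size Limits: Each item is limited to 2 MB, and each logical partition to 20 GB
  • Query Limitations: JOIN works only within a single item, and cross-partition queries and aggregations can consume many RUs
  • Regional Limitations: Some features and configurations, such as availability zones, are not offered in every region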
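Because billing is expressed in Request Units, a first capacity estimate is plain arithmetic: multiply each operation rate by its RU cost and add the results. A point read of a 1 KB item costs 1 RU; write and query costs depend on item size, indexing, and query shape, so the 5 RU and 10 RU figures below are assumptions, and real values should be measured from the `requestCharge` the SDK returns. The sketch also assumes autoscale maximums are set in steps of 1,000 RU/s; the helper names are hypothetical.

```javascript
// Rough RU/s estimate from a workload profile. Only the 1 RU point-read cost is
// a documented baseline; the other per-operation costs are assumptions.
function estimateRequiredRUs(workload) {
  return workload.reduce((total, op) => total + op.perSecond * op.ruPerOp, 0);
}

// Autoscale maximum rounded up to a multiple of 1,000 RU/s (at least 1,000);
// the container then scales between 10% of that maximum and the maximum.
function autoscaleRange(requiredRUs) {
  const max = Math.max(1000, Math.ceil(requiredRUs / 1000) * 1000);
  return { min: max / 10, max };
}

const workload = [
  { name: 'point reads (1 KB)', perSecond: 500, ruPerOp: 1 },
  { name: 'writes (assumed ~5 RU each)', perSecond: 100, ruPerOp: 5 },
  { name: 'category queries (assumed ~10 RU each)', perSecond: 20, ruPerOp: 10 }
];

const required = estimateRequiredRUs(workload);
console.log(`Estimated peak: ${required} RU/s`); // 500 + 500 + 200 = 1200
console.log(autoscaleRange(required)); // { min: 200, max: 2000 }
```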

Code Examples

Installation & Setup

# Create Cosmos DB account with Azure CLI
az cosmosdb create \
  --name mycosmosaccount \
  --resource-group myresourcegroup \
  --default-consistency-level Session \
  --locations regionName="East US" failoverPriority=0 isZoneRedundant=False

# Create database and container
az cosmosdb sql database create \
  --account-name mycosmosaccount \
  --name mydatabase \
  --resource-group myresourcegroup

az cosmosdb sql container create \
  --account-name mycosmosaccount \
  --database-name mydatabase \
  --name mycontainer \
  --partition-key-path "/partitionKey" \
  --throughput 400 \
  --resource-group myresourcegroup

# Install Node.js SDK
npm install @azure/cosmos

# Install .NET SDK
dotnet add package Microsoft.Azure.Cosmos

# Install Python SDK
pip install azure-cosmos

Basic Operations (CRUD)

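// Connection and basic operations with Node.js
const { CosmosClient } = require('@azure/cosmos');

// Connection configuration: read the endpoint and key from environment variables
// (the variable names are examples) instead of hard-coding secrets
const client = new CosmosClient({
  endpoint: process.env.COSMOS_ENDPOINT, // e.g. https://myaccount.documents.azure.com:443/
  key: process.env.COSMOS_KEY
});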

const database = client.database('mydatabase');
const container = database.container('mycontainer');

async function basicOperations() {
  try {
    // Create document
    const newItem = {
      id: 'item-001',
      partitionKey: 'electronics',
      name: 'Laptop Computer',
      brand: 'Sample Brand',
      price: 1200,
      category: 'electronics',
      specifications: {
        cpu: 'Intel Core i7',
        memory: '16GB',
        storage: '512GB SSD'
      },
      tags: ['computer', 'business', 'mobile'],
      createdAt: new Date().toISOString()
    };

    const { resource: createdItem } = await container.items.create(newItem);
    console.log('Created item:', createdItem);

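    // Read document (point read by id and partition key value)
    const { resource: readItem } = await container.item('item-001', 'electronics').read();
    if (!readItem) throw new Error('item-001 not found'); // read() resolves with an undefined resource on 404
    console.log('Read item:', readItem);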

    // Update document
    readItem.price = 1150;
    readItem.updatedAt = new Date().toISOString();
    const { resource: updatedItem } = await container.item('item-001', 'electronics').replace(readItem);
    console.log('Updated item:', updatedItem);

    // Execute query
    const querySpec = {
      query: 'SELECT * FROM c WHERE c.category = @category AND c.price < @maxPrice',
      parameters: [
        { name: '@category', value: 'electronics' },
        { name: '@maxPrice', value: 1500 }
      ]
    };

    const { resources: results } = await container.items.query(querySpec).fetchAll();
    console.log('Query results:', results);

    // Delete document
    await container.item('item-001', 'electronics').delete();
    console.log('Item deleted');

  } catch (error) {
    console.error('Error during operations:', error);
  }
}

basicOperations();
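The read-modify-replace sequence above can lose an update if another client changes the item between the read and the replace. Cosmos DB supports optimistic concurrency through each item's `_etag`: sending it as an If-Match condition (in the Node.js SDK, the `accessCondition: { type: 'IfMatch', condition: etag }` request option) makes the replace fail with HTTP 412 if the item has changed. The in-memory sketch below mimics those semantics so the retry loop can run without an account; `ToyStore` and `updateWithRetry` are hypothetical names.

```javascript
// Hypothetical in-memory container that enforces If-Match on replace, mimicking
// Cosmos DB's ETag check (HTTP 412 Precondition Failed on mismatch).
class ToyStore {
  constructor() {
    this.items = new Map();
    this.version = 0;
  }
  upsert(item) {
    const stored = { ...item, _etag: `"${++this.version}"` }; // new ETag on every write
    this.items.set(item.id, stored);
    return { ...stored };
  }
  read(id) {
    return { ...this.items.get(id) };
  }
  replace(item, ifMatchEtag) {
    const current = this.items.get(item.id);
    if (!current) throw Object.assign(new Error('Not found'), { code: 404 });
    if (current._etag !== ifMatchEtag) {
      throw Object.assign(new Error('Precondition failed'), { code: 412 });
    }
    return this.upsert(item);
  }
}

// Read-modify-write that retries when another writer changed the item in between.
// With the real SDK the replace would be roughly:
//   container.item(id, pk).replace(item, { accessCondition: { type: 'IfMatch', condition: item._etag } })
function updateWithRetry(store, id, mutate, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const item = store.read(id);
    mutate(item);
    try {
      return store.replace(item, item._etag);
    } catch (err) {
      if (err.code !== 412 || attempt === maxAttempts) throw err;
    }
  }
}

const store = new ToyStore();
store.upsert({ id: 'item-001', price: 1200 });
console.log(updateWithRetry(store, 'item-001', item => { item.price = 1150; }).price); // 1150
```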

Data Modeling

// Complex document structure example
const customerDocument = {
  id: 'customer-12345',
  partitionKey: 'customer-12345',
  customerType: 'premium',
  profile: {
    firstName: 'John',
    lastName: 'Doe',
    email: 'john.doe@example.com',
    phone: '+1-555-123-4567',
    birthDate: '1985-04-15',
    address: {
      zipCode: '10001',
      state: 'New York',
      city: 'New York',
      street: '123 Main St',
      apartment: 'Apt 4B'
    }
  },
  orders: [
    {
      orderId: 'order-001',
      orderDate: '2024-01-15T10:30:00Z',
      status: 'delivered',
      items: [
        {
          productId: 'prod-001',
          name: 'Wireless Headphones',
          quantity: 1,
          unitPrice: 250,
          totalPrice: 250
        }
      ],
      totalAmount: 250,
      deliveryAddress: {
        zipCode: '10001',
        state: 'New York',
        city: 'New York',
        street: '123 Main St'
      }
    }
  ],
  preferences: {
    language: 'en',
    currency: 'USD',
    notifications: {
      email: true,
      sms: false,
      push: true
    },
    categories: ['electronics', 'books', 'clothing']
  },
  loyaltyProgram: {
    tier: 'gold',
    points: 15000,
    joinDate: '2023-01-01T00:00:00Z'
  },
  metadata: {
    createdAt: '2023-01-01T00:00:00Z',
    updatedAt: '2024-01-15T15:45:00Z',
    version: 2,
    source: 'mobile-app'
  }
};

// Indexing strategy
async function indexingStrategy() {
  // Composite index configuration example
  const indexingPolicy = {
    indexingMode: 'consistent',
    automatic: true,
    includedPaths: [
      {
        path: '/*'
      }
    ],
    excludedPaths: [
      {
        path: '/metadata/*'
      }
    ],
    compositeIndexes: [
      [
        { path: '/customerType', order: 'ascending' },
        { path: '/profile/email', order: 'ascending' }
      ],
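      // Composite index paths should point to scalar, item-level properties
      // (no wildcards or [] array segments)
      [
        { path: '/loyaltyProgram/tier', order: 'ascending' },
        { path: '/loyaltyProgram/points', order: 'descending' }
      ]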
    ]
  };

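  // Apply the policy when creating the container (the id 'customers' is an example)
  const { container: customersContainer } = await database.containers.createIfNotExists({
    id: 'customers',
    partitionKey: { paths: ['/partitionKey'] },
    indexingPolicy
  });
  console.log('Indexing policy applied to:', customersContainer.id);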
}
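Embedding the full order history in the customer document keeps reads cheap, but the `orders` array grows without bound while every item is capped at 2 MB (measured on the UTF-8 JSON representation). A simple guard is to measure the serialized size before writing. The `checkItemSize` helper and its 80% warning threshold are hypothetical choices; once a document approaches the limit, a common alternative is to store orders as separate items that share the customer's partition key.

```javascript
// Check a document's serialized UTF-8 JSON size against the 2 MB per-item limit.
const MAX_ITEM_BYTES = 2 * 1024 * 1024;

function checkItemSize(doc, warnRatio = 0.8) {
  const bytes = Buffer.byteLength(JSON.stringify(doc), 'utf8');
  return {
    bytes,
    tooLarge: bytes > MAX_ITEM_BYTES,
    nearLimit: bytes > MAX_ITEM_BYTES * warnRatio
  };
}

// Hypothetical customer whose embedded order history keeps growing
const customer = { id: 'customer-12345', partitionKey: 'customer-12345', orders: [] };
for (let i = 0; i < 40000; i++) {
  customer.orders.push({ orderId: `order-${i}`, status: 'delivered', totalAmount: 250 });
}
console.log(checkItemSize(customer));
```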

Vector Search & AI Features
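Vector and full-text queries need the container to be created with matching policies: a vector embedding policy declaring the embedding path, data type, dimensions, and distance function; a vector index (for example DiskANN); and, for full-text scoring, a full-text policy and full-text index. The builder below sketches such a container definition for `@azure/cosmos` v4. The property names follow the service's JSON policy format but should be checked against the SDK version in use, and the `/content` and `/contentVector` paths, 1536 dimensions, and cosine distance are assumptions matching the example documents.

```javascript
// Builds a container definition with vector and full-text policies.
// Paths, dimensions, and distance function are assumptions for this example.
function buildSearchContainerDefinition(id, dimensions = 1536) {
  return {
    id,
    partitionKey: { paths: ['/partitionKey'] },
    vectorEmbeddingPolicy: {
      vectorEmbeddings: [
        { path: '/contentVector', dataType: 'float32', dimensions, distanceFunction: 'cosine' }
      ]
    },
    fullTextPolicy: {
      defaultLanguage: 'en-US',
      fullTextPaths: [{ path: '/content', language: 'en-US' }]
    },
    indexingPolicy: {
      includedPaths: [{ path: '/*' }],
      // Keep the large embedding array out of the regular range index
      excludedPaths: [{ path: '/contentVector/*' }],
      vectorIndexes: [{ path: '/contentVector', type: 'diskANN' }],
      fullTextIndexes: [{ path: '/content' }]
    }
  };
}

const definition = buildSearchContainerDefinition('documents');
console.log(JSON.stringify(definition, null, 2));
// With a connected client (not run here):
// const { container } = await client.database('mydatabase').containers.createIfNotExists(definition);
```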

// Vector search implementation example
async function vectorSearchExample() {
  // Document with vector embedding
  const documentWithVector = {
    id: 'doc-vector-001',
    partitionKey: 'documents',
    title: 'Introduction to Azure Cosmos DB',
    content: 'Azure Cosmos DB is a globally distributed, multi-model database service.',
    category: 'technology',
    // Vector embedding generated by OpenAI or similar
    contentVector: [0.1, 0.2, -0.3, 0.4, 0.5, /* ... 1536-dimensional vector */],
    createdAt: new Date().toISOString()
  };

  await container.items.create(documentWithVector);

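  // Vector search query. The query vector must come from the same embedding model and
  // have the same number of dimensions as contentVector (truncated here for brevity).
  // ORDER BY VectorDistance returns the most similar items first; always include TOP N.
  const searchVector = [0.15, 0.25, -0.25, 0.35, 0.45];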
  
  const vectorSearchQuery = {
    query: `
      SELECT TOP 10 c.id, c.title, c.content, 
             VectorDistance(c.contentVector, @searchVector) AS similarity
      FROM c 
      WHERE c.category = @category
      ORDER BY VectorDistance(c.contentVector, @searchVector)
    `,
    parameters: [
      { name: '@searchVector', value: searchVector },
      { name: '@category', value: 'technology' }
    ]
  };

  const { resources: vectorResults } = await container.items.query(vectorSearchQuery).fetchAll();
  console.log('Vector search results:', vectorResults);

  // Hybrid search (full-text + vector) fused with Reciprocal Rank Fusion (RRF).
  // FullTextScore is only valid inside ORDER BY RANK (or RRF); it cannot be projected
  // in SELECT or combined arithmetically with VectorDistance.
  // Search terms are given as string literals; early previews of FullTextScore took
  // an array of terms, so check the syntax supported by your account.
  const hybridSearchQuery = {
    query: `
      SELECT TOP 10 c.id, c.title, c.content
      FROM c
      WHERE FullTextContainsAny(c.content, 'database', 'service')
      ORDER BY RANK RRF(VectorDistance(c.contentVector, @searchVector),
                        FullTextScore(c.content, 'database', 'service'))
    `,
    parameters: [
      { name: '@searchVector', value: searchVector }
    ]
  };

  const { resources: hybridResults } = await container.items.query(hybridSearchQuery).fetchAll();
  console.log('Hybrid search results:', hybridResults);
}
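Reciprocal Rank Fusion, which the `RRF` function applies in hybrid queries, merges several ranked lists by giving each document a score of 1 / (k + rank) from every list it appears in and sorting by the sum. A minimal sketch of the standard formula, assuming 1-based ranks and k = 60 (the value used in the original RRF paper); the constants and tie-breaking Cosmos DB uses internally may differ.

```javascript
// Reciprocal Rank Fusion: combine ranked lists of document ids (best match first).
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId, score]) => ({ docId, score }));
}

const vectorRanking = ['doc-a', 'doc-b', 'doc-c'];   // e.g. ordered by VectorDistance
const fullTextRanking = ['doc-c', 'doc-a', 'doc-d']; // e.g. ordered by BM25 score
console.log(reciprocalRankFusion([vectorRanking, fullTextRanking]).map(r => r.docId));
// [ 'doc-a', 'doc-c', 'doc-b', 'doc-d' ]
```

Because RRF uses only ranks, it avoids adding raw scores that live on different scales (cosine similarity versus unbounded BM25 scores), which is why a hand-weighted sum of the two is not a reliable ranking.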

Performance Optimization

// High-performance operations implementation
class CosmosDBOptimizer {
  constructor(container) {
    this.container = container;
  }

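  // Bulk operations (not transactional: each operation succeeds or fails on its own)
  async bulkInsert(documents) {
    const operations = documents.map(doc => ({
      operationType: 'Create',
      resourceBody: doc // partition key value is taken from the document
    }));

    // bulk() resolves to an array with one response per operation
    const responses = await this.container.items.bulk(operations);
    const failed = responses.filter(r => r.statusCode >= 400);
    console.log(`Bulk insert completed: ${responses.length - failed.length} succeeded, ${failed.length} failed`);
    return responses;
  }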

  // Efficient pagination
  async paginateItems(querySpec, pageSize = 100) {
    const iterator = this.container.items.query(querySpec, {
      maxItemCount: pageSize
    });

    const results = [];
    while (iterator.hasMoreResults()) {
      const { resources, continuationToken } = await iterator.fetchNext();
      results.push(...resources);
      
      console.log(`Retrieved: ${resources.length} items, Continuation: ${continuationToken}`);
      
      if (results.length >= 1000) { // Stop at 1000 items max
        break;
      }
    }
    
    return results;
  }

  // Transactional batch: up to 100 operations on ONE logical partition,
  // executed atomically (all succeed or none are applied)
  async transactionalBatch(partitionKey, operations) {
    const { result } = await this.container.items.batch(operations, partitionKey);
    console.log('Transaction execution result:', result);
    return result; // one OperationResponse per operation
  }

  // Efficient aggregation queries
  async aggregateData() {
    const aggregateQuery = {
      query: `
        SELECT 
          c.category,
          COUNT(1) as itemCount,
          AVG(c.price) as averagePrice,
          SUM(c.price) as totalValue,
          MIN(c.price) as minPrice,
          MAX(c.price) as maxPrice
        FROM c 
        GROUP BY c.category
      `
    };

    const { resources } = await this.container.items.query(aggregateQuery).fetchAll();
    return resources;
  }
}

// Usage example
const optimizer = new CosmosDBOptimizer(container);

// Performance monitoring
async function monitorPerformance() {
  const startTime = Date.now();
  
  try {
    const result = await container.items.query({
      query: 'SELECT * FROM c WHERE c.category = @category',
      parameters: [{ name: '@category', value: 'electronics' }]
    }).fetchAll();
    
    const endTime = Date.now();
    const duration = endTime - startTime;
    
    console.log(`Query execution time: ${duration}ms`);
    console.log(`Result count: ${result.resources.length}`);
    console.log(`RU consumption: ${result.requestCharge}`);
    
  } catch (error) {
    console.error('Query error:', error);
  }
}
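When a request exceeds the provisioned RU/s, Cosmos DB throttles it with HTTP 429 and a suggested wait time. The Node.js SDK already retries throttled requests automatically (configurable through `connectionPolicy.retryOptions`), so an extra layer is only worth adding for cases such as long-running imports. A sketch of such a wrapper, assuming the thrown error exposes `code === 429` and an optional `retryAfterInMs` as the SDK's error responses do; `withThrottleRetry` is a hypothetical helper.

```javascript
// Retry an async operation when it is throttled (HTTP 429), honoring the suggested delay.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function withThrottleRetry(operation, { maxAttempts = 5, baseDelayMs = 100 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (err.code !== 429 || attempt >= maxAttempts) throw err;
      // Prefer the server-provided delay; otherwise back off exponentially
      const delay = err.retryAfterInMs ?? baseDelayMs * 2 ** (attempt - 1);
      await sleep(delay);
    }
  }
}

// Example with a fake operation that is throttled twice before succeeding
let calls = 0;
async function fakeQuery() {
  calls++;
  if (calls <= 2) {
    throw Object.assign(new Error('Request rate is large'), { code: 429, retryAfterInMs: 10 });
  }
  return { resources: ['ok'], requestCharge: 2.8 };
}

withThrottleRetry(fakeQuery).then(result => console.log(result, `after ${calls} calls`));
```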

Security & Best Practices

// Security configuration and best practices
const { DefaultAzureCredential } = require('@azure/identity');

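// Microsoft Entra ID (formerly Azure AD) authentication; the identity also needs a
// Cosmos DB data-plane role assignment (e.g. Cosmos DB Built-in Data Contributor)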
const aadClient = new CosmosClient({
  endpoint: 'https://myaccount.documents.azure.com:443/',
  aadCredentials: new DefaultAzureCredential()
});

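// Client-side (Always Encrypted style) key metadata. Data is already encrypted at rest
// by default; this illustrative object is not consumed by CosmosClient here, and
// client-side encryption support varies by SDK.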
const encryptionConfig = {
  encryptionKey: {
    encryptionAlgorithm: 'AEAD_AES_256_CBC_HMAC_SHA256',
    wrappingAlgorithm: 'RSA_OAEP',
    keyWrapMetadata: {
      name: 'my-key',
      type: 'AzureKeyVault',
      value: 'https://my-keyvault.vault.azure.net/keys/my-key'
    }
  }
};

// Connection settings. Endpoint discovery (the default) is required for
// preferredLocations to take effect and for automatic regional failover.
const secureClient = new CosmosClient({
  endpoint: 'https://myaccount.documents.azure.com:443/',
  key: process.env.COSMOS_KEY, // keep keys out of source code
  connectionPolicy: {
    enableEndpointDiscovery: true,
    preferredLocations: ['East US'],
    useMultipleWriteLocations: false
  },
  // Appended to the SDK's User-Agent header so this app's requests can be identified
  userAgentSuffix: 'secure-app-v1.0'
});

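// Resource-based access control with resource tokens
const { PermissionMode } = require('@azure/cosmos');

async function setupResourcePermissions() {
  // Create (or update) a read-only user and a permission scoped to one container
  const { user } = await database.users.upsert({ id: 'read-only-user' });
  const { resource: permission } = await user.permissions.upsert(
    { id: 'read-mycontainer', permissionMode: PermissionMode.Read, resource: container.url },
    { resourceTokenExpirySeconds: 3600 } // token valid for 1 hour
  );

  // The token lets a client connect without the account key:
  // new CosmosClient({ endpoint, resourceTokens: { [container.url]: permission._token } })
  return permission._token;
}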

// Data masking
function maskSensitiveData(document) {
  // Deep copy: a shallow spread would share (and mutate) the original nested profile
  const masked = structuredClone(document);
  
  // Mask personal information
  if (masked.profile?.email) {
    masked.profile.email = masked.profile.email.replace(/(.{2}).*(@.*)/, '$1***$2');
  }
  
  if (masked.profile?.phone) {
    masked.profile.phone = masked.profile.phone.replace(/(\d{3}).*(\d{4})/, '$1-****-$2');
  }
  
  return masked;
}