OpenSearch

Open-source fork of Elasticsearch. Distributed search and analytics engine supporting log analysis, real-time monitoring, and security analytics. Originally developed under AWS leadership.

Overview

OpenSearch is an open-source distributed search and analytics engine that began as an AWS-led fork of Elasticsearch. In 2024 it moved to Linux Foundation governance and has grown into a large community project with more than 1,400 contributors and over 100 GitHub repositories. It offers vector search, hybrid search, and AI-driven search, and it integrates closely with AWS services.

Details

As of 2024, OpenSearch has established its own identity beyond simple Elasticsearch compatibility. For high-performance semantic search it integrates Facebook's FAISS library and supports SIMD hardware acceleration and vector quantization. It also offers cross-cluster replication, trace analytics, data streams, transforms, and a new observability UI, along with significant improvements to k-NN, anomaly detection, PPL, SQL, and alerting. Native integration with AWS IAM, KMS, and CloudWatch makes it well suited to operation in AWS environments.

Key Features

  • Vector & Hybrid Search: Next-generation search combining semantic and keyword search capabilities
  • Distributed Architecture: Horizontal scaling and high availability through distributed design
  • AWS Integration: Native integration with IAM, KMS, and CloudWatch
  • Observability: Comprehensive trace analytics and monitoring capabilities
  • AI & Machine Learning: Built-in anomaly detection and neural search features
  • Real-time Analytics: Immediate analysis and alerting on streaming data

Pros and Cons

Pros

  • Open governance and long-term stability under Linux Foundation management
  • Excellent AWS environment integration with managed service (Amazon OpenSearch Service)
  • Support for next-generation workloads through vector search and AI features
  • Freedom from Elasticsearch licensing issues and true open-source nature
  • Active community (1,400+ contributors) providing continuous development
  • Rich library of AWS-authored plugins and features

Cons

  • May underperform Elasticsearch in enterprise-scale or complex query scenarios
  • Limited plugin ecosystem compared to Elasticsearch
  • More mature toolchain available for Elasticsearch outside AWS environments
  • Potential compatibility challenges when migrating from Elasticsearch
  • Some advanced Elastic Stack features are not available
  • Commercial support options are not as extensive as Elasticsearch

Code Examples

Setup and Installation

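# Docker execution
# Images from 2.12 onward require an initial admin password for the bundled security demo configuration
docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=<strong-password>" opensearchproject/opensearch:latest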

# Docker Compose cluster configuration
cat > docker-compose.yml << 'EOF'
version: '3'
services:
  opensearch-node1:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
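      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
      # Required by images from 2.12 onward; replace the placeholder with a strong password
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=<strong-password>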
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      - 9600:9600
    networks:
      - opensearch-net
      
  opensearch-node2:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node2
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
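      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
      # Required by images from 2.12 onward; replace the placeholder with a strong password
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=<strong-password>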
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
    networks:
      - opensearch-net

  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:latest
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      OPENSEARCH_HOSTS: '["https://opensearch-node1:9200","https://opensearch-node2:9200"]'
    networks:
      - opensearch-net

volumes:
  opensearch-data1:
  opensearch-data2:

networks:
  opensearch-net:
EOF

docker-compose up -d

# Binary installation on Linux
wget https://artifacts.opensearch.org/releases/bundle/opensearch/2.11.1/opensearch-2.11.1-linux-x64.tar.gz
tar -xzf opensearch-2.11.1-linux-x64.tar.gz
cd opensearch-2.11.1

# Edit configuration file
vi config/opensearch.yml

# Start OpenSearch
./bin/opensearch

Index Creation and Document Management

# Create index
# index.knn must be true for the knn_vector field to be searchable with the knn query;
# the faiss engine is used because nmslib is deprecated in recent releases
curl -X PUT "localhost:9200/movies" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard"
      },
      "overview": {
        "type": "text",
        "analyzer": "standard"
      },
      "genre": {
        "type": "keyword"
      },
      "release_date": {
        "type": "date"
      },
      "rating": {
        "type": "float"
      },
      "location": {
        "type": "geo_point"
      },
      "embedding": {
        "type": "knn_vector",
        "dimension": 512,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "faiss"
        }
      }
    }
  }
}'

# Add single document
curl -X POST "localhost:9200/movies/_doc/1" -H 'Content-Type: application/json' -d'
{
  "title": "Avengers: Endgame",
  "overview": "The epic conclusion to the Marvel Cinematic Universe",
  "genre": ["Action", "Adventure", "Sci-Fi"],
  "release_date": "2019-04-26",
  "rating": 8.4,
  "director": "Russo Brothers",
  "studio": "Marvel Studios",
  "location": {
    "lat": 40.7589,
    "lon": -73.9851
  }
}'

# Bulk document addition (NDJSON: an action line followed by a source line; the body must end with a newline)
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/x-ndjson' -d'
{"index":{"_index":"movies","_id":"2"}}
{"title":"Your Name","overview":"A time-transcending youth love story","genre":["Animation","Romance","Drama"],"release_date":"2016-08-26","rating":8.4,"director":"Makoto Shinkai"}
{"index":{"_index":"movies","_id":"3"}}
{"title":"Parasite","overview":"Korean film depicting class disparity","genre":["Thriller","Drama","Comedy"],"release_date":"2019-05-30","rating":8.6,"director":"Bong Joon-ho"}
{"index":{"_index":"movies","_id":"4"}}
{"title":"Top Gun: Maverick","overview":"Tom Cruise sequel","genre":["Action","Drama"],"release_date":"2022-05-27","rating":8.3,"director":"Joseph Kosinski"}
'

# Update document
curl -X POST "localhost:9200/movies/_update/1" -H 'Content-Type: application/json' -d'
{
  "doc": {
    "rating": 8.5,
    "updated_at": "2024-01-15"
  }
}'

# Delete document
curl -X DELETE "localhost:9200/movies/_doc/1"
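
The bulk request in this section alternates an action line with a source line. As a minimal sketch of how such a payload could be assembled in code, the helper below builds the NDJSON body from a list of documents. The function name `build_bulk_body` and its `(doc_id, source)` input shape are illustrative assumptions, not part of any OpenSearch client library.

```python
import json


def build_bulk_body(index, docs):
    """Serialize (doc_id, source) pairs into an NDJSON payload for the _bulk API."""
    lines = []
    for doc_id, source in docs:
        # Action line: names the target index and _id for the document that follows.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(source))
    # The bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body("movies", [
        ("2", {"title": "Your Name", "rating": 8.4}),
        ("3", {"title": "Parasite", "rating": 8.6}),
    ])
    print(body, end="")
```

The resulting string can be sent as the request body of `POST /_bulk`. Building it programmatically avoids the easy mistakes of a missing trailing newline or a document that spans several lines.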

Search Query Implementation

# Basic search
curl -X GET "localhost:9200/movies/_search?q=Avengers"

# Structured search
curl -X GET "localhost:9200/movies/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "title": "Avengers"
    }
  },
  "size": 10,
  "from": 0
}'

# Complex search (Bool Query)
curl -X GET "localhost:9200/movies/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        {"match": {"overview": "Marvel"}}
      ],
      "filter": [
        {"range": {"rating": {"gte": 8.0}}},
        {"term": {"genre": "Action"}}
      ],
      "must_not": [
        {"term": {"genre": "Horror"}}
      ],
      "should": [
        {"match": {"director": "Russo Brothers"}}
      ]
    }
  },
  "sort": [
    {"rating": {"order": "desc"}},
    {"release_date": {"order": "desc"}}
  ]
}'

# Faceted search (Aggregations)
curl -X GET "localhost:9200/movies/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "genres": {
      "terms": {
        "field": "genre",
        "size": 10
      }
    },
    "avg_rating": {
      "avg": {
        "field": "rating"
      }
    },
    "rating_histogram": {
      "histogram": {
        "field": "rating",
        "interval": 1
      }
    },
    "release_years": {
      "date_histogram": {
        "field": "release_date",
        "calendar_interval": "year"
      }
    }
  }
}'

# Geographic search
curl -X GET "localhost:9200/movies/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "10km",
          "location": {
            "lat": 40.7589,
            "lon": -73.9851
          }
        }
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 40.7589,
          "lon": -73.9851
        },
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}'
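
To make the geographic query above more concrete, here is a rough sketch of what a distance filter plus a distance sort computes. It uses the haversine great-circle formula with a mean Earth radius, which is an assumption of this sketch: OpenSearch's own distance calculation may differ slightly. The helper names `haversine_km` and `within_distance` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(origin, points, max_km):
    """Keep the points within max_km of origin, sorted nearest first."""
    scored = [(haversine_km(origin[0], origin[1], p[0], p[1]), p) for p in points]
    return [(d, p) for d, p in sorted(scored) if d <= max_km]


if __name__ == "__main__":
    origin = (40.7589, -73.9851)  # same coordinates as the query above
    candidates = [(40.7589, -73.9851), (40.6892, -74.0445), (34.05, -118.24)]
    for distance, point in within_distance(origin, candidates, 10):
        print(f"{point}: {distance:.2f} km")
```

The filter step corresponds to `geo_distance` and the ordering step to the `_geo_distance` sort with `"order": "asc"`.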

Vector Search and AI Features

# k-NN vector search configuration
curl -X PUT "localhost:9200/documents" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "content": {
        "type": "text"
      },
      "embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "faiss",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}'

# Add vector document ("..." is a placeholder: supply all 768 values to match the mapped dimension)
curl -X POST "localhost:9200/documents/_doc" -H 'Content-Type: application/json' -d'
{
  "title": "AI Technology Advancement",
  "content": "Artificial intelligence technology is rapidly developing, with machine learning and deep learning being utilized across various fields.",
  "embedding": [0.1, 0.2, 0.3, ...]
}'

# Execute k-NN search (the query vector must also contain 768 values; "..." is a placeholder)
curl -X GET "localhost:9200/documents/_search" -H 'Content-Type: application/json' -d'
{
  "size": 5,
  "query": {
    "knn": {
      "embedding": {
        "vector": [0.15, 0.25, 0.35, ...],
        "k": 10
      }
    }
  }
}'

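# Hybrid search (Keyword + Vector)
# Sub-query scores are normalized and combined only when the request runs through a search pipeline
# that contains a normalization-processor (set as the index default or passed via ?search_pipeline=<name>)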
curl -X GET "localhost:9200/documents/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "content": "artificial intelligence"
          }
        },
        {
          "knn": {
            "embedding": {
              "vector": [0.15, 0.25, 0.35, ...],
              "k": 10
            }
          }
        }
      ]
    }
  }
}'

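# Neural search (semantic search)
# model_id must be the ID of a text-embedding model deployed through ML Commons; "huggingface_embeddings" is a placeholder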
curl -X GET "localhost:9200/documents/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "neural": {
      "embedding": {
        "query_text": "machine learning algorithms",
        "model_id": "huggingface_embeddings",
        "k": 10
      }
    }
  }
}'
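
Hybrid search has to merge keyword scores and vector scores that live on different scales. The sketch below shows one common way to do that: min-max normalization followed by a weighted arithmetic mean. It is a simplified illustration, not OpenSearch's implementation. Treating a document missing from one result set as scoring 0, and mapping equal scores to 1.0, are assumptions of this sketch, and the function names are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} map to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every hit as a full match (assumption).
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(keyword_scores, vector_scores, weights=(0.5, 0.5)):
    """Merge two result sets with a weighted arithmetic mean of normalized scores."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    total_weight = sum(weights)
    combined = {}
    for doc in set(kw) | set(vec):
        # A document absent from one result set contributes 0 for that side (assumption).
        weighted = weights[0] * kw.get(doc, 0.0) + weights[1] * vec.get(doc, 0.0)
        combined[doc] = weighted / total_weight
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    keyword = {"a": 3.0, "b": 2.0, "c": 1.0}   # e.g. BM25 scores
    vector = {"b": 0.9, "c": 0.5, "d": 0.1}    # e.g. k-NN similarity scores
    for doc, score in combine(keyword, vector):
        print(doc, round(score, 3))
```

Documents that rank reasonably well in both result sets tend to rise to the top, which is the behavior hybrid search aims for.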

Advanced Configuration and Performance Optimization

# Cluster settings
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.max_shards_per_node": 3000,
    "search.max_buckets": 65536
  }
}'

# Create index template
# OpenSearch uses Index State Management (ISM) rather than Elasticsearch ILM, so the
# rollover alias is set with plugins.index_state_management.rollover_alias
curl -X PUT "localhost:9200/_index_template/logs_template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1,
      "plugins.index_state_management.rollover_alias": "logs"
    },
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "level": {
          "type": "keyword"
        },
        "message": {
          "type": "text",
          "analyzer": "standard"
        },
        "service": {
          "type": "keyword"
        },
        "host": {
          "type": "keyword"
        }
      }
    }
  }
}'

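# Index State Management (ISM) policy configuration
# ism_template attaches the policy automatically to newly created indices that match logs-*
curl -X PUT "localhost:9200/_plugins/_ism/policies/log_policy" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "description": "Log retention policy",
    "ism_template": {
      "index_patterns": ["logs-*"],
      "priority": 100
    },
    "default_state": "hot",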
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_size": "5gb",
              "min_doc_count": 1000000,
              "min_index_age": "1d"
            }
          }
        ],
        "transitions": [
          {
            "state_name": "warm",
            "conditions": {
              "min_index_age": "7d"
            }
          }
        ]
      },
      {
        "name": "warm",
        "actions": [
          {
            "replica_count": {
              "number_of_replicas": 0
            }
          }
        ],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": {
              "min_index_age": "30d"
            }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          {
            "delete": {}
          }
        ]
      }
    ]
  }
}'

# Performance monitoring
curl -X GET "localhost:9200/_cluster/health?pretty"
curl -X GET "localhost:9200/_nodes/stats?pretty"
curl -X GET "localhost:9200/_cat/indices?v&s=store.size:desc"
curl -X GET "localhost:9200/_cat/shards?v&s=store:desc"
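
The ISM policy earlier in this section moves an index from hot to warm after 7 days and from warm to delete after 30 days. The following sketch walks those transitions for a given index age. It is a simplification under stated assumptions: real ISM evaluates conditions on a periodic job, runs each state's actions (such as rollover), and advances step by step. The table `POLICY_TRANSITIONS` and the function `state_for_age` are hypothetical names.

```python
# Transition table mirroring the policy above: state -> (next state, minimum index age in days).
POLICY_TRANSITIONS = {
    "hot": ("warm", 7),
    "warm": ("delete", 30),
    "delete": None,  # terminal state
}


def state_for_age(age_days, start="hot"):
    """Return the state an index of the given age would eventually settle in."""
    state = start
    while POLICY_TRANSITIONS[state] is not None:
        next_state, min_age_days = POLICY_TRANSITIONS[state]
        if age_days < min_age_days:
            break  # transition condition not met yet, stay in the current state
        state = next_state
    return state


if __name__ == "__main__":
    for age in (0, 6, 7, 29, 30):
        print(age, state_for_age(age))
```

Because `min_index_age` is measured from index creation, the 30-day condition on the warm state means an index is deleted roughly 30 days after it was created, not 30 days after it became warm.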

Security and Access Control

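# Security plugin configuration (opensearch.yml)
# Demo-oriented values: allow_unsafe_democertificates and disabled hostname verification are not suitable for production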
cat >> config/opensearch.yml << 'EOF'
plugins.security.ssl.transport.pemcert_filepath: certs/opensearch.pem
plugins.security.ssl.transport.pemkey_filepath: certs/opensearch-key.pem
plugins.security.ssl.transport.pemtrustedcas_filepath: certs/root-ca.pem
plugins.security.ssl.transport.enforce_hostname_verification: false
plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: certs/opensearch.pem
plugins.security.ssl.http.pemkey_filepath: certs/opensearch-key.pem
plugins.security.ssl.http.pemtrustedcas_filepath: certs/root-ca.pem
plugins.security.allow_unsafe_democertificates: true
plugins.security.allow_default_init_securityindex: true
plugins.security.authcz.admin_dn:
  - CN=opensearch-admin,OU=IT,O=Example,L=Tokyo,ST=Tokyo,C=JP
plugins.security.nodes_dn:
  - CN=opensearch-node,OU=IT,O=Example,L=Tokyo,ST=Tokyo,C=JP
plugins.security.audit.type: internal_opensearch
plugins.security.enable_snapshot_restore_privilege: true
plugins.security.check_snapshot_restore_write_privileges: true
plugins.security.restapi.roles_enabled: ["all_access", "security_rest_api_access"]
plugins.security.system_indices.enabled: true
plugins.security.system_indices.indices:
  [
    ".opendistro-alerting-config",
    ".opendistro-alerting-alert*",
    ".opendistro-anomaly-results*",
    ".opendistro-anomaly-detector*",
    ".opendistro-anomaly-checkpoints",
    ".opendistro-anomaly-detection-state",
    ".opendistro-reports-*",
    ".opensearch-notifications-*",
    ".opensearch-notebooks",
    ".opensearch-observability",
    ".opendistro-asynchronous-search-response*",
    ".replication-metadata-store"
  ]
EOF

# Create user
curl -X PUT "https://localhost:9200/_plugins/_security/api/internalusers/analyst" \
  -u admin:admin -k -H 'Content-Type: application/json' -d'
{
  "password": "analyst@123",
  "opendistro_security_roles": ["readall"],
  "backend_roles": ["analytics_team"],
  "attributes": {
    "department": "analytics"
  }
}'

# Create role
curl -X PUT "https://localhost:9200/_plugins/_security/api/roles/movie_reader" \
  -u admin:admin -k -H 'Content-Type: application/json' -d'
{
  "cluster_permissions": ["cluster_monitor"],
  "index_permissions": [
    {
      "index_patterns": ["movies*"],
      "allowed_actions": ["read", "indices:data/read/*"]
    }
  ]
}'

# Change the current user's password (the account API updates credentials; it does not create API keys)
curl -X PUT "https://localhost:9200/_plugins/_security/api/account" \
  -u admin:admin -k -H 'Content-Type: application/json' -d'
{
  "current_password": "admin",
  "password": "new_password_123"
}'

AWS Integration and Managed Services

# AWS CloudFormation template (Amazon OpenSearch Service)
# VPC, PrivateSubnet1, PrivateSubnet2, and ApplicationSecurityGroup are assumed to be defined elsewhere in the template
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  OpenSearchDomain:
    Type: AWS::OpenSearchService::Domain
    Properties:
      DomainName: my-opensearch-domain
      EngineVersion: OpenSearch_2.11
      ClusterConfig:
        InstanceType: t3.medium.search
        InstanceCount: 3
        DedicatedMasterEnabled: true
        DedicatedMasterType: t3.small.search
        DedicatedMasterCount: 3
      EBSOptions:
        EBSEnabled: true
        VolumeType: gp3
        VolumeSize: 100
      VPCOptions:
        SecurityGroupIds:
          - !Ref OpenSearchSecurityGroup
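        # More than one subnet requires zone awareness (ZoneAwarenessEnabled: true in ClusterConfig);
        # for a single-AZ domain, list exactly one subnet
        SubnetIds:
          - !Ref PrivateSubnet1
          - !Ref PrivateSubnet2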
      EncryptionAtRestOptions:
        Enabled: true
      NodeToNodeEncryptionOptions:
        Enabled: true
      DomainEndpointOptions:
        EnforceHTTPS: true
      AccessPolicies:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root'
            Action: 'es:*'
            Resource: !Sub 'arn:aws:es:${AWS::Region}:${AWS::AccountId}:domain/my-opensearch-domain/*'

  OpenSearchSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for OpenSearch domain
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          SourceSecurityGroupId: !Ref ApplicationSecurityGroup

# Python: Amazon OpenSearch Service operations with boto3 and opensearch-py
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

# AWS credentials configuration
session = boto3.Session()
credentials = session.get_credentials()
region = 'us-east-1'
service = 'es'
host = 'search-my-domain-xxx.us-east-1.es.amazonaws.com'

# Sign every request with SigV4 using the resolved credentials
awsauth = AWSV4SignerAuth(credentials, region, service)

# Create OpenSearch client
client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)

# Create index
response = client.indices.create(
    index='logs',
    body={
        'settings': {
            'number_of_shards': 2,
            'number_of_replicas': 1
        },
        'mappings': {
            'properties': {
                'timestamp': {'type': 'date'},
                'message': {'type': 'text'},
                'level': {'type': 'keyword'},
                'service': {'type': 'keyword'}
            }
        }
    }
)

# Add document
response = client.index(
    index='logs',
    body={
        'timestamp': '2024-01-15T10:00:00',
        'message': 'Application started successfully',
        'level': 'INFO',
        'service': 'web-app'
    }
)

# Execute search
response = client.search(
    index='logs',
    body={
        'query': {
            'bool': {
                'must': [
                    {'match': {'message': 'error'}}
                ],
                'filter': [
                    {'range': {'timestamp': {'gte': 'now-1d'}}}
                ]
            }
        },
        'sort': [
            {'timestamp': {'order': 'desc'}}
        ]
    }
)

print(f"Found {response['hits']['total']['value']} documents")

Advanced Features and Observability

# Anomaly detection setup
curl -X POST "localhost:9200/_plugins/_anomaly_detection/detectors" -H 'Content-Type: application/json' -d'
{
  "name": "cpu-anomaly-detector",
  "description": "Detect CPU usage anomalies",
  "time_field": "@timestamp",
  "indices": ["system-metrics-*"],
  "feature_attributes": [
    {
      "feature_name": "cpu_usage",
      "feature_enabled": true,
      "aggregation_query": {
        "avg_cpu": {
          "avg": {
            "field": "cpu.percentage"
          }
        }
      }
    }
  ],
  "filter_query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "now-1h"
            }
          }
        }
      ]
    }
  },
  "detection_interval": {
    "period": {
      "interval": 10,
      "unit": "Minutes"
    }
  },
  "window_delay": {
    "period": {
      "interval": 1,
      "unit": "Minutes"
    }
  }
}'

# SQL queries on OpenSearch
curl -X POST "localhost:9200/_plugins/_sql" -H 'Content-Type: application/json' -d'
{
  "query": "SELECT genre, AVG(rating) as avg_rating FROM movies GROUP BY genre ORDER BY avg_rating DESC"
}'

# Piped Processing Language (PPL) queries
# The aggregate is aliased so the sort can reference it; "-" sorts in descending order
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'
{
  "query": "source=movies | where rating > 8.0 | stats avg(rating) as avg_rating by genre | sort - avg_rating"
}'

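# Trace analytics configuration
# Caution: this endpoint and these setting names are not part of the documented OpenSearch REST API, so verify them against your version before use.
# Trace data is normally ingested through Data Prepper and explored in the Observability plugin of OpenSearch Dashboards.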
curl -X PUT "localhost:9200/_plugins/_trace/settings" -H 'Content-Type: application/json' -d'
{
  "cluster.trace.enable": true,
  "cluster.trace.indices": ["jaeger-span-*"],
  "cluster.trace.service_map.enabled": true
}'

# Alerting configuration
curl -X POST "localhost:9200/_plugins/_alerting/monitors" -H 'Content-Type: application/json' -d'
{
  "type": "monitor",
  "name": "High Error Rate Monitor",
  "enabled": true,
  "schedule": {
    "period": {
      "interval": 1,
      "unit": "MINUTES"
    }
  },
  "inputs": [
    {
      "search": {
        "indices": ["application-logs-*"],
        "query": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-5m"
                    }
                  }
                },
                {
                  "term": {
                    "level": "ERROR"
                  }
                }
              ]
            }
          },
          "aggs": {
            "error_count": {
              "value_count": {
                "field": "_id"
              }
            }
          }
        }
      }
    }
  ],
  "triggers": [
    {
      "name": "High Error Count",
      "severity": "1",
      "condition": {
        "script": {
          "source": "ctx.results[0].aggregations.error_count.value > 100"
        }
      },
      "actions": [
        {
          "name": "Send Email Alert",
          "destination_id": "email-destination-id",
          "message_template": {
            "source": "High error rate detected: {{ctx.results.0.aggregations.error_count.value}} errors in the last 5 minutes"
          },
          "throttle_enabled": true,
          "throttle": {
            "value": 60,
            "unit": "MINUTES"
          }
        }
      ]
    }
  ]
}'
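
The monitor above fires when more than 100 errors are counted and throttles its action to one notification per 60 minutes. The sketch below illustrates that interaction on a list of monitor runs. It is a simplified model under stated assumptions, not the alerting plugin's actual logic, and the function name `notifications` and the `(minute, error_count)` input shape are hypothetical.

```python
def notifications(runs, threshold=100, throttle_minutes=60):
    """Return the minutes at which a notification would be sent.

    runs is a list of (minute, error_count) pairs in chronological order.
    """
    sent = []
    last_sent = None
    for minute, error_count in runs:
        if error_count <= threshold:
            continue  # trigger condition "count > threshold" is not met
        if last_sent is not None and minute - last_sent < throttle_minutes:
            continue  # still inside the throttle window, suppress the action
        sent.append(minute)
        last_sent = minute
    return sent


if __name__ == "__main__":
    runs = [(0, 150), (1, 200), (30, 50), (59, 120), (60, 130), (61, 10)]
    print(notifications(runs))
```

With a one-minute schedule, throttling keeps a sustained error spike from producing a notification on every run.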

OpenSearch began as an Elasticsearch fork and has since become a search and analytics platform with its own identity. It is a strong choice for applications that run on AWS or rely on AI and vector search, and it offers commercial-grade functionality while remaining open source.