Apache Solr

Enterprise search platform based on Apache Lucene. Provides rich search capabilities, faceted search, and distributed search. Supports XML and JSON APIs.

Search Engine, Apache Lucene, Full-text Search, Distributed, Enterprise, Open Source, Java, RESTful

Server


Overview

Apache Solr is an enterprise-grade search platform built on Apache Lucene. It provides advanced search capabilities, rich text analysis, distributed search, and a web-based management interface, enabling the construction of large-scale document management and high-performance search systems.

Details

Apache Solr is a mature enterprise search platform that has been in production use for more than 15 years. Built on Apache Lucene, it offers distributed search through SolrCloud, configurable text analysis, faceted search, and a plugin-based extension model. Its HTTP APIs accept and return JSON or XML, which makes it straightforward to integrate with other systems, and it ships with authentication and authorization plugins, a metrics API, and a web-based administration UI.

Key Features

  • Advanced Search Capabilities: Full-text search, complex queries, faceted search, and spell checking
  • Scalable Architecture: Distributed processing through SolrCloud with automatic sharding and replication
  • Enterprise Features: Rich web management interface, RESTful APIs, security authentication/authorization
  • Text Analysis: Configurable analyzer chains (tokenizers and filters) for stemming, stop words, synonyms, and language-specific processing
  • High Performance: Optimized for large-scale document processing and search operations
  • Extensible Platform: Plugin architecture and customizable components

Pros and Cons

Pros

  • Mature and stable platform with extensive enterprise adoption and support
  • Comprehensive feature set including advanced search, faceting, and clustering
  • SolrCloud provides robust distributed architecture with high availability
  • Rich web-based administration interface for easy management and monitoring
  • Strong community support and extensive documentation
  • Flexible schema management and configuration options
  • Built-in security features and enterprise-grade authentication

Cons

  • Higher resource consumption and complexity compared to lightweight alternatives
  • Steeper learning curve for configuration and optimization
  • Java-based platform requiring JVM tuning and management
  • Can be overpowered for simple search use cases
  • Requires careful configuration for optimal performance in large deployments
  • Updates and maintenance can be complex in distributed environments


Code Examples

Setup and Installation

# Prerequisites - Java 11 or later
java -version

# Install Java 11 (Ubuntu/Debian)
sudo apt update
sudo apt install openjdk-11-jdk

# Install Java 11 (CentOS/RHEL)
sudo yum install java-11-openjdk-devel

# Download and extract Solr
wget https://archive.apache.org/dist/solr/solr/9.4.1/solr-9.4.1.tgz
tar -xzf solr-9.4.1.tgz

# Install Solr as a service
sudo tar xzf solr-9.4.1.tgz solr-9.4.1/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-9.4.1.tgz

# Start Solr service
sudo systemctl start solr
sudo systemctl enable solr

# Verify installation
curl "http://localhost:8983/solr/admin/info/system"

Docker Deployment

# Single Solr instance
docker run --name my_solr -d -p 8983:8983 -t solr:9.4

# With persistent data volume
docker run --name my_solr -d -p 8983:8983 \
  -v solr_data:/var/solr \
  -t solr:9.4

# Custom configuration
docker run --name my_solr -d -p 8983:8983 \
  -v $PWD/myconfig:/opt/solr/myconfig \
  -t solr:9.4 \
  solr-create -c mycollection -d /opt/solr/myconfig

# SolrCloud with Docker Compose
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  zookeeper:
    image: zookeeper:3.8
    hostname: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zookeeper:2888:3888;2181
    volumes:
      - zoo_data:/data
      - zoo_logs:/datalog

  solr1:
    image: solr:9.4
    hostname: solr1
    ports:
      - "8981:8983"
    environment:
      - ZK_HOST=zookeeper:2181
    depends_on:
      - zookeeper
    volumes:
      - solr1_data:/var/solr

volumes:
  zoo_data:
  zoo_logs:
  solr1_data:
EOF

docker-compose up -d

Index Creation and Data Ingestion

# Start Solr
bin/solr start

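# Create a core (standalone mode, as started above)
bin/solr create -c mycore

# Create a collection (SolrCloud mode: start Solr with "bin/solr start -c" first).
# The _default configset guesses field types, so the example fields below are created automatically.
bin/solr create -c mycollection -d _default -shards 2 -replicationFactor 2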

# Add documents using JSON
curl -X POST "http://localhost:8983/solr/mycollection/update?commit=true" \
  -H "Content-Type: application/json" \
  -d '[
    {
      "id": "1",
      "title": "Introduction to Apache Solr",
      "content": "Solr is a powerful search engine",
      "category": "technology",
      "published_date": "2024-01-15T10:00:00Z",
      "price": 29.99,
      "tags": ["search", "lucene", "apache"]
    },
    {
      "id": "2",
      "title": "Search System Design",
      "content": "Building efficient search systems",
      "category": "engineering",
      "published_date": "2024-01-16T14:30:00Z",
      "price": 39.99,
      "tags": ["design", "architecture", "systems"]
    }
  ]'
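The JSON update above is an HTTP POST of a JSON array to the collection's /update handler. As a minimal standard-library Python sketch (assuming the mycollection URL used throughout; to_solr_date and build_update_request are hypothetical helper names), the same request can be built like this. It is constructed but not sent, so it runs without a server. Solr date fields expect UTC ISO-8601 strings with a trailing Z, as in published_date above.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumed endpoint, matching the curl examples
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update?commit=true"


def to_solr_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as the UTC ISO-8601 form Solr date fields accept."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_update_request(docs: list) -> urllib.request.Request:
    """Build (but do not send) a POST that adds the given documents."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


doc = {
    "id": "1",
    "title": "Introduction to Apache Solr",
    "published_date": to_solr_date(datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)),
    "price": 29.99,
}
req = build_update_request([doc])
# Against a running Solr, urllib.request.urlopen(req) would send it.
```

Converting dates in a single helper keeps local-time strings out of fields that Solr always interprets as UTC.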

# Add documents using XML
curl -X POST "http://localhost:8983/solr/mycollection/update?commit=true" \
  -H "Content-Type: application/xml" \
  -d '<add>
    <doc>
      <field name="id">3</field>
      <field name="title">Advanced Lucene Techniques</field>
      <field name="content">Advanced features of Lucene</field>
      <field name="category">tutorial</field>
      <field name="price">49.99</field>
    </doc>
  </add>'

# Bulk import from CSV
curl -X POST "http://localhost:8983/solr/mycollection/update?commit=true" \
  -H "Content-Type: application/csv" \
  --data-binary @data.csv

# Atomic updates
curl -X POST "http://localhost:8983/solr/mycollection/update?commit=true" \
  -H "Content-Type: application/json" \
  -d '[
    {
      "id": "1",
      "content": {"set": "Updated content description"},
      "views": {"inc": 1},
      "tags": {"add": ["updated"]}
    }
  ]'
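Each atomic update identifies the document by id and wraps every changed field in a modifier object such as set, inc, or add. The small Python sketch below (atomic_update is a hypothetical helper) builds the same payload as the curl call above. Atomic updates only work when Solr can rebuild the rest of the document, so the other fields generally need to be stored or have docValues.

```python
import json


def atomic_update(doc_id: str, **ops: dict) -> dict:
    """Build one atomic-update document.

    Each keyword is an atomic-update modifier (e.g. set, inc, add, remove)
    mapping field names to values; the result nests them as {field: {op: value}}.
    """
    doc: dict = {"id": doc_id}
    for op, fields in ops.items():
        for field, value in fields.items():
            # A field may receive several modifiers, so merge into its dict
            doc.setdefault(field, {})[op] = value
    return doc


update = atomic_update(
    "1",
    set={"content": "Updated content description"},
    inc={"views": 1},
    add={"tags": ["updated"]},
)
# The update handler expects a JSON array of documents
payload = json.dumps([update])
```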

Search Query Implementation

# Basic queries
curl "http://localhost:8983/solr/mycollection/select?q=*:*"
curl "http://localhost:8983/solr/mycollection/select?q=Solr"
curl "http://localhost:8983/solr/mycollection/select?q=title:Apache"
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode 'q=content:"search engine"'

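# Boolean queries (-G with --data-urlencode encodes the spaces, which curl rejects in a raw URL)
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode "q=title:Apache AND category:technology"
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode "q=title:Apache OR title:Lucene"
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode "q=title:search NOT category:old"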
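Query strings containing spaces, quotes, or brackets have to be URL-encoded before they are sent. The Python standard library does the same job as curl's --data-urlencode; this sketch assumes the mycollection endpoint and uses a hypothetical select_url helper that accepts repeated parameters such as several fq filters.

```python
from urllib.parse import urlencode

# Assumed collection URL, matching the curl examples
SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def select_url(q: str, *extra: tuple) -> str:
    """Return a fully URL-encoded /select URL.

    extra holds (name, value) pairs, so a parameter such as fq may repeat.
    """
    pairs = [("q", q)] + list(extra)
    return SELECT_URL + "?" + urlencode(pairs)


url = select_url("title:Apache AND category:technology")
filtered = select_url(
    "*:*",
    ("fq", "category:technology"),
    ("fq", "price:[10 TO 50]"),
)
```

urlencode uses form encoding, so spaces become + (which Solr decodes back to spaces) and characters like :, [ and ] are percent-encoded.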

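# Range queries and filters (encoded so curl neither rejects the spaces nor treats [ ] as a glob)
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode "q=*:*" \
  --data-urlencode "fq=published_date:[2024-01-01T00:00:00Z TO 2024-12-31T23:59:59Z]"
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode "q=*:*" --data-urlencode "fq=price:[10 TO 50]"
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode "q=*:*" \
  --data-urlencode "fq=category:technology" --data-urlencode "fq=published_date:[NOW-30DAYS TO NOW]"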
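When a term comes from user input, characters such as :, +, ( or whitespace change the meaning of the query. Below is a sketch of an escaping helper modeled on the character set that SolrJ's ClientUtils.escapeQueryChars escapes, plus a hypothetical range_fq builder for the [low TO high] syntax used above (* leaves a bound open; curly braces make the bounds exclusive).

```python
# Characters with special meaning in the standard Lucene/Solr query syntax
# (modeled on the set escaped by SolrJ's ClientUtils.escapeQueryChars)
_SPECIAL = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(term: str) -> str:
    """Backslash-escape special characters and whitespace in a user-supplied term."""
    out = []
    for ch in term:
        if ch in _SPECIAL or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)


def range_fq(field: str, low, high, inclusive: bool = True) -> str:
    """Build a range filter; pass '*' for an open bound, inclusive=False for {low TO high}."""
    open_br, close_br = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_br}{low} TO {high}{close_br}"


user_term = escape_query_chars("C++ (beta)")
price_filter = range_fq("price", 10, 50)
recent_filter = range_fq("published_date", "NOW-30DAYS", "NOW")
```

Escaping belongs on the individual user-supplied term only; applying it to a whole query would also neutralize intended operators such as AND or field prefixes.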

# Faceted search
curl "http://localhost:8983/solr/mycollection/select?q=*:*&facet=true&facet.field=category"
curl "http://localhost:8983/solr/mycollection/select?q=*:*&facet=true&facet.field=category&facet.field=tags"

# Date faceting (facet.date is no longer supported in current Solr; use facet.range on the date field)
curl "http://localhost:8983/solr/mycollection/select?q=*:*&facet=true&facet.range=published_date&facet.range.start=2024-01-01T00:00:00Z&facet.range.end=2024-12-31T23:59:59Z&facet.range.gap=%2B1MONTH"

# Range faceting
curl "http://localhost:8983/solr/mycollection/select?q=*:*&facet=true&facet.range=price&facet.range.start=0&facet.range.end=100&facet.range.gap=10"
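By default Solr's JSON response writer returns facet counts as a flat list that alternates values and counts (json.nl=flat). The sketch below turns field and range facets into (value, count) pairs. facet_summary is a hypothetical helper, and sample_response is a hand-written fragment in the shape we expect, not real Solr output.

```python
def pairs_from_flat(flat: list) -> list:
    """Turn Solr's flat [value, count, value, count, ...] list into (value, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


def facet_summary(response: dict) -> dict:
    """Collect field facets and range-facet counts from a parsed /select JSON response."""
    counts = response.get("facet_counts", {})
    summary = {}
    for field, flat in counts.get("facet_fields", {}).items():
        summary[field] = pairs_from_flat(flat)
    for field, info in counts.get("facet_ranges", {}).items():
        # Range facets nest their flat list under "counts"
        summary[field] = pairs_from_flat(info["counts"])
    return summary


# Hand-written, illustrative response fragment
sample_response = {
    "facet_counts": {
        "facet_fields": {"category": ["technology", 1, "engineering", 1]},
        "facet_ranges": {"price": {"counts": ["20.0", 1, "30.0", 1], "gap": 10.0}},
    }
}
summary = facet_summary(sample_response)
```

Adding json.nl=map to the request returns field facets as JSON objects instead, at the cost of losing a guaranteed ordering in some client libraries.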

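# Highlighting
curl "http://localhost:8983/solr/mycollection/select?q=Solr&hl=true&hl.fl=content"
# More Like This via the {!mlt} query parser (a /mlt path works only if a MoreLikeThisHandler is registered in solrconfig.xml)
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode 'q={!mlt qf=title,content mintf=1 mindf=1}1'
# Spell checking (requires a /spell request handler with a spellcheck component in solrconfig.xml)
curl "http://localhost:8983/solr/mycollection/spell?q=slor&spellcheck=true&spellcheck.build=true"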

# Sorting and pagination
curl "http://localhost:8983/solr/mycollection/select?q=*:*&sort=published_date%20desc"
curl "http://localhost:8983/solr/mycollection/select?q=*:*&start=20&rows=10"
curl "http://localhost:8983/solr/mycollection/select?q=*:*&fl=id,title,score"
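start is a zero-based offset, so start=20&rows=10 is page 3 when each page holds 10 results. Here is a sketch of that arithmetic with hypothetical helpers page_params and total_pages; num_found stands for the numFound value in the response.

```python
def page_params(page: int, rows: int = 10, sort: str = "published_date desc") -> dict:
    """Translate a 1-based page number into Solr start/rows parameters."""
    if page < 1 or rows < 1:
        raise ValueError("page and rows must be positive")
    return {"q": "*:*", "sort": sort, "start": (page - 1) * rows, "rows": rows}


def total_pages(num_found: int, rows: int) -> int:
    """Number of pages needed for num_found hits (ceiling division)."""
    return -(-num_found // rows)


third_page = page_params(3)
```

Large start values get progressively more expensive because each shard has to collect start + rows hits. For deep paging, Solr's cursorMark parameter is the usual alternative.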

Schema Design and Configuration

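<!-- managed-schema.xml -->
<schema name="myschema" version="1.6">
  <!-- Primitive field types (every type referenced below must be defined in the schema) -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
  <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
  <fieldType name="plong" class="solr.LongPointField" docValues="true"/>
  <fieldType name="pdouble" class="solr.DoublePointField" docValues="true"/>
  <fieldType name="pdate" class="solr.DatePointField" docValues="true"/>

  <!-- Internal version field used by SolrCloud and atomic updates -->
  <field name="_version_" type="plong" indexed="false" stored="false"/>

  <!-- Text field types -->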
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPossessiveFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- Field definitions -->
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="content" type="text_en" indexed="true" stored="true"/>
  <field name="category" type="string" indexed="true" stored="true"/>
  <field name="published_date" type="pdate" indexed="true" stored="true"/>
  <field name="price" type="pdouble" indexed="true" stored="true"/>
  <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="in_stock" type="boolean" indexed="true" stored="true"/>

  <!-- Dynamic fields -->
  <dynamicField name="*_i" type="pint" indexed="true" stored="true"/>
  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  <dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
  <dynamicField name="*_dt" type="pdate" indexed="true" stored="true"/>

  <!-- Copy fields for unified searching -->
  <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="title" dest="text"/>
  <copyField source="content" dest="text"/>
  <copyField source="category" dest="text"/>

  <uniqueKey>id</uniqueKey>
</schema>
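On a running node, the managed schema is meant to be changed through the Schema API rather than by editing the file directly. The sketch below builds (but does not send) a Schema API request that adds a hypothetical author field and copies it into the catch-all text field. add-field and add-copy-field are Schema API command names; schema_command, the URL, and the author field are illustrative assumptions.

```python
import json
import urllib.request

# Assumed collection URL
SCHEMA_URL = "http://localhost:8983/solr/mycollection/schema"


def schema_command(commands: dict) -> urllib.request.Request:
    """Build (but do not send) a Schema API POST carrying one or more commands."""
    return urllib.request.Request(
        SCHEMA_URL,
        data=json.dumps(commands).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = schema_command({
    # Hypothetical field; its type must already exist in the schema
    "add-field": {"name": "author", "type": "string", "indexed": True, "stored": True},
    "add-copy-field": {"source": "author", "dest": "text"},
})
```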

Performance Optimization

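# SolrCloud cluster setup: the first node starts embedded ZooKeeper on port 9983 (Solr port + 1000);
# each -s directory must already contain a solr.xml, e.g. as created by "bin/solr -e cloud"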
bin/solr start -c -p 8983 -s example/cloud/node1/solr
bin/solr start -c -p 7574 -s example/cloud/node2/solr -z localhost:9983
bin/solr start -c -p 8984 -s example/cloud/node3/solr -z localhost:9983

# Create distributed collection
bin/solr create -c products -shards 3 -replicationFactor 2
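In SolrCloud mode, bin/solr create ends up calling the Collections API CREATE action. The sketch below shows a roughly equivalent URL, assuming a configset named _default is already stored in ZooKeeper. It also includes the arithmetic for how many cores each node hosts if the 3 x 2 = 6 replicas are spread evenly across the three nodes started above (Solr's placement is not guaranteed to be perfectly even). Both helper names are hypothetical.

```python
import math
from urllib.parse import urlencode

# Assumed node address
COLLECTIONS_API = "http://localhost:8983/solr/admin/collections"


def create_collection_url(name: str, shards: int, replication_factor: int,
                          config_name: str = "_default") -> str:
    """Collections API CREATE call for a sharded, replicated collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
    }
    return COLLECTIONS_API + "?" + urlencode(params)


def replicas_per_node(shards: int, replication_factor: int, nodes: int) -> int:
    """Cores per node under an even spread of shards x replicationFactor replicas."""
    return math.ceil(shards * replication_factor / nodes)


url = create_collection_url("products", 3, 2)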

# Force merge ("optimize"): expensive, and usually unnecessary on an index that is still being updated
curl "http://localhost:8983/solr/mycollection/update?optimize=true"

# Soft commit: makes recent updates searchable without flushing segments to stable storage
curl "http://localhost:8983/solr/mycollection/update?commit=true&softCommit=true"

# Cache configuration: edit the existing <query> section inside <config> in solrconfig.xml
# (appending with "cat >>" would place it after </config> and break the file)
<query>
  <filterCache class="solr.CaffeineCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>
  <queryResultCache class="solr.CaffeineCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="0"/>
  <documentCache class="solr.CaffeineCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="0"/>
</query>

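# JVM tuning: append to the include file written by the service installer instead of
# overwriting it (it already holds SOLR_HOME, SOLR_PORT, and related settings).
# SOLR_HEAP sets both -Xms and -Xmx, so SOLR_JAVA_MEM is not needed as well.
cat << 'EOF' | sudo tee -a /etc/default/solr.in.sh
SOLR_HEAP="4g"
GC_TUNE="-XX:+UseG1GC -XX:+UseStringDeduplication"
SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=3000"
EOF
sudo systemctl restart solr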

Integration and Framework Connectivity

# Java client library
<!-- Maven dependency -->
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>9.4.1</version>
</dependency>
// Java SolrJ client example
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

public class SolrExample {
    public static void main(String[] args) throws Exception {
        String urlString = "http://localhost:8983/solr/mycollection";

        // try-with-resources closes the client and its HTTP connections
        try (SolrClient solr = new HttpSolrClient.Builder(urlString).build()) {
            // Add a document, then commit so it becomes searchable
            SolrInputDocument document = new SolrInputDocument();
            document.addField("id", "1");
            document.addField("title", "Example Document");
            document.addField("content", "This is an example document for Solr");

            solr.add(document);
            solr.commit();

            // Match all documents
            SolrQuery query = new SolrQuery("*:*");
            QueryResponse response = solr.query(query);

            SolrDocumentList documents = response.getResults();
            for (SolrDocument doc : documents) {
                System.out.println(doc.getFieldValue("id") + ": " + doc.getFieldValue("title"));
            }
        }
    }
}
# Python pysolr client example
import pysolr

# Connect to Solr
solr = pysolr.Solr('http://localhost:8983/solr/mycollection/', always_commit=True)

# Add documents
documents = [
    {
        "id": "1",
        "title": "Python Solr Integration",
        "content": "Using pysolr to interact with Solr",
        "category": "programming"
    },
    {
        "id": "2", 
        "title": "Advanced Search Features",
        "content": "Exploring faceted search and highlighting",
        "category": "search"
    }
]

solr.add(documents)

# Search
results = solr.search('programming')
for result in results:
    print(f"ID: {result['id']}, Title: {result['title']}")

# Faceted search
results = solr.search('*:*', **{
    'facet': 'true',
    'facet.field': 'category',
    'facet.limit': 10
})

print("Facets:", results.facets['facet_fields']['category'])

Advanced Features and Security

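# Enable basic authentication from the CLI (creates security.json for you; choose your own password)
bin/solr auth enable -type basicAuth -credentials admin:ChangeMe -blockUnknown true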

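# Alternatively, write security.json by hand. The credential below is the reference-guide sample
# hash for the password "SolrRocks"; replace it. In SolrCloud, upload the file with
#   bin/solr zk cp file:security.json zk:/security.json -z localhost:9983
# For a standalone node, place it in $SOLR_HOME.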
cat > security.json << 'EOF'
{
  "authentication":{
    "blockUnknown": true,
    "class":"solr.BasicAuthPlugin",
    "credentials":{"admin":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="},
    "realm":"Solr",
    "forwardCredentials": false
  },
  "authorization":{
    "class":"solr.RuleBasedAuthorizationPlugin",
    "permissions":[
      {"name":"security-edit", "role":"admin"},
      {"name":"collection-admin-edit", "role":"admin"},
      {"name":"core-admin-edit", "role":"admin"}
    ],
    "user-role":{"admin":"admin"}
  }
}
EOF
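The credentials value pairs a base64 hash with a base64 salt. Our understanding is that BasicAuthPlugin's default SHA-256 provider stores base64(sha256(sha256(salt + password))) followed by a space and base64(salt). Treat that as an assumption and confirm by logging in. Alternatively, use bin/solr auth enable or the Authentication API's set-user command, both of which hash the password for you. The helper names below are hypothetical.

```python
import base64
import hashlib
import os
from typing import Optional


def solr_basic_auth_credential(password: str, salt: Optional[bytes] = None) -> str:
    """Return '<base64 hash> <base64 salt>' (assumed double SHA-256 scheme)."""
    if salt is None:
        salt = os.urandom(32)  # 32 random bytes, like the salt in the sample credential
    first = hashlib.sha256(salt + password.encode("utf-8")).digest()
    second = hashlib.sha256(first).digest()
    return (base64.b64encode(second).decode("ascii") + " "
            + base64.b64encode(salt).decode("ascii"))


def verify(password: str, credential: str) -> bool:
    """Recompute the hash with the stored salt and compare."""
    _, salt_b64 = credential.split(" ")
    return solr_basic_auth_credential(password, base64.b64decode(salt_b64)) == credential


cred = solr_basic_auth_credential("ChangeMe")
```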

# SSL/TLS configuration
keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 \
  -keypass secret -storepass secret -validity 9999 \
  -keystore solr-ssl.keystore.jks \
  -ext SAN=DNS:localhost,IP:127.0.0.1 \
  -dname "CN=localhost, OU=IT, O=Organization, L=City, ST=State, C=US"

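# Start Solr with SSL: bin/solr reads the SOLR_SSL_* variables (normally set in solr.in.sh)
SOLR_SSL_ENABLED=true \
SOLR_SSL_KEY_STORE="$PWD/solr-ssl.keystore.jks" \
SOLR_SSL_KEY_STORE_PASSWORD=secret \
SOLR_SSL_TRUST_STORE="$PWD/solr-ssl.keystore.jks" \
SOLR_SSL_TRUST_STORE_PASSWORD=secret \
bin/solr start -p 8984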

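# Backup and restore (Collections API, SolrCloud only). The location must be a shared
# filesystem path reachable by every node and permitted by solr.allowPaths.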
curl "http://localhost:8983/solr/admin/collections?action=BACKUP&name=backup1&collection=mycollection&location=/var/backups"

curl "http://localhost:8983/solr/admin/collections?action=RESTORE&name=backup1&collection=restored_collection&location=/var/backups"
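Backups of large collections can take longer than an HTTP timeout, so the Collections API also accepts an async request id that can be polled with REQUESTSTATUS. Below is a sketch of those URLs and a check on the status.state field of the status response. The request id backup1-req and the helper names are hypothetical, and the set of terminal states is our assumption.

```python
from urllib.parse import urlencode

# Assumed node address
COLLECTIONS_API = "http://localhost:8983/solr/admin/collections"


def backup_url(collection: str, name: str, location: str, request_id: str) -> str:
    """BACKUP submitted asynchronously under request_id."""
    return COLLECTIONS_API + "?" + urlencode({
        "action": "BACKUP",
        "name": name,
        "collection": collection,
        "location": location,
        "async": request_id,
    })


def status_url(request_id: str) -> str:
    """REQUESTSTATUS call for a previously submitted async request."""
    return COLLECTIONS_API + "?" + urlencode({"action": "REQUESTSTATUS", "requestid": request_id})


def is_finished(status_response: dict) -> bool:
    """True once the async task has left the submitted/running states."""
    return status_response["status"]["state"] in ("completed", "failed", "notfound")


url = backup_url("mycollection", "backup1", "/var/backups", "backup1-req")
```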

# Monitoring and metrics
curl "http://localhost:8983/solr/admin/metrics?group=all"
curl "http://localhost:8983/solr/admin/metrics?group=jvm"
# Per-core /select request metrics (prefix filters metric names, not core names)
curl "http://localhost:8983/solr/admin/metrics?group=core&prefix=QUERY./select"
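The metrics API returns JSON keyed first by registry (such as solr.jvm) and then by metric name. The sketch below computes heap utilization from a group=jvm response. It assumes the metric names memory.heap.used and memory.heap.max; heap_usage is a hypothetical helper, and sample is a hand-written fragment, not real output.

```python
def heap_usage(metrics_response: dict) -> float:
    """Fraction of the maximum heap in use, from a /admin/metrics?group=jvm response.

    Assumes metric names 'memory.heap.used' and 'memory.heap.max' in the 'solr.jvm' registry.
    """
    jvm = metrics_response["metrics"]["solr.jvm"]
    return jvm["memory.heap.used"] / jvm["memory.heap.max"]


# Hand-written, illustrative response fragment (1 GiB used of a 4 GiB heap)
sample = {"metrics": {"solr.jvm": {"memory.heap.used": 1073741824,
                                   "memory.heap.max": 4294967296}}}
usage = heap_usage(sample)
```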

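# Learning to Rank (LTR): requires the ltr module and LTR components in solrconfig.xml,
# e.g. bin/solr start -e techproducts -Dsolr.modules=ltr -Dsolr.ltr.enabled=true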
curl -XPUT 'http://localhost:8983/solr/mycollection/schema/feature-store' \
  --data-binary '{
    "store": "myFeatureStore",
    "name": "titleMatch",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
      "q": "{!field f=title}${user_query}"
    }
  }' -H 'Content-type:application/json'

# Create LTR model
curl -XPUT 'http://localhost:8983/solr/mycollection/schema/model-store' \
  --data-binary '{
    "store": "myFeatureStore",
    "name": "myModel",
    "class": "org.apache.solr.ltr.model.LinearModel",
    "features": [
      {"name": "titleMatch"}
    ],
    "params": {
      "weights": {
        "titleMatch": 1.0
      }
    }
  }' -H 'Content-type:application/json'
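At query time, the model is applied through the rq (rerank query) parameter with the {!ltr} parser, and efi.user_query supplies the value substituted for ${user_query} in the feature definition. The sketch below builds that request URL. ltr_select_url is a hypothetical helper, and reranking the top 100 documents is an arbitrary example value.

```python
from urllib.parse import urlencode

# Assumed collection URL
SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def ltr_select_url(user_query: str, model: str = "myModel", rerank_docs: int = 100) -> str:
    """Run a normal query, then rerank the top documents with the stored LTR model."""
    # Single quotes delimit the efi value inside the local-params block;
    # this sketch assumes user_query contains no single quote.
    rq = f"{{!ltr model={model} reRankDocs={rerank_docs} efi.user_query='{user_query}'}}"
    params = {"q": user_query, "rq": rq, "fl": "id,title,score"}
    return SELECT_URL + "?" + urlencode(params)


url = ltr_select_url("search engine")
```

Reranking only the top reRankDocs hits keeps the model's cost bounded, while the first-pass query still decides which documents are candidates.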

Apache Solr is a mature enterprise search platform that combines a comprehensive search feature set with horizontal scalability through SolrCloud and extensive configuration options. It suits organizations that need sophisticated search functionality and can invest in JVM tuning and cluster operations. Simpler use cases may be better served by lighter-weight alternatives.