Apache Solr
Enterprise search platform based on Apache Lucene. Provides rich search capabilities, faceted search, and distributed search. Supports XML and JSON APIs.
Overview
Apache Solr is an enterprise-grade search platform built on Apache Lucene. It provides advanced search capabilities, rich text analysis, distributed search, and a web-based management interface, enabling the construction of large-scale document management and high-performance search systems.
Details
Apache Solr is a mature, feature-rich enterprise search platform that has been serving organizations for over 15 years. Built on Apache Lucene, it provides a stable foundation for search applications, with distributed search through SolrCloud, advanced text analysis, faceted search, and extensive customization. Its HTTP APIs accept and return JSON or XML, so applications in most languages can integrate with it directly, and it ships with security, monitoring, and management features.
Key Features
- Advanced Search Capabilities: Full-text search, complex queries, faceted search, and spell checking
- Scalable Architecture: Distributed processing through SolrCloud with automatic sharding and replication
- Enterprise Features: Rich web management interface, RESTful APIs, security authentication/authorization
- Text Analysis: Comprehensive language processing and analysis capabilities
- High Performance: Optimized for large-scale document processing and search operations
- Extensible Platform: Plugin architecture and customizable components
Pros and Cons
Pros
- Mature and stable platform with extensive enterprise adoption and support
- Comprehensive feature set including advanced search, faceting, and clustering
- SolrCloud provides robust distributed architecture with high availability
- Rich web-based administration interface for easy management and monitoring
- Strong community support and extensive documentation
- Flexible schema management and configuration options
- Built-in security features and enterprise-grade authentication
Cons
- Higher resource consumption and complexity compared to lightweight alternatives
- Steeper learning curve for configuration and optimization
- Java-based platform requiring JVM tuning and management
- Can be overpowered for simple search use cases
- Requires careful configuration for optimal performance in large deployments
- Updates and maintenance can be complex in distributed environments
Code Examples
Setup and Installation
# Prerequisites - Java 11 or later
java -version
# Install Java 11 (Ubuntu/Debian)
sudo apt update
sudo apt install openjdk-11-jdk
# Install Java 11 (CentOS/RHEL)
sudo yum install java-11-openjdk-devel
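# Download and extract Solr (Solr 9 releases are published under /solr/solr/;
# superseded versions are kept only on archive.apache.org)
wget https://archive.apache.org/dist/solr/solr/9.4.1/solr-9.4.1.tgz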
tar -xzf solr-9.4.1.tgz
# Install Solr as a service
sudo tar xzf solr-9.4.1.tgz solr-9.4.1/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-9.4.1.tgz
# Start Solr service
sudo systemctl start solr
sudo systemctl enable solr
# Verify installation
curl "http://localhost:8983/solr/admin/info/system"
Docker Deployment
# Single Solr instance
docker run --name my_solr -d -p 8983:8983 -t solr:9.4
# With persistent data volume
docker run --name my_solr -d -p 8983:8983 \
-v solr_data:/var/solr \
-t solr:9.4
# Custom configuration
docker run --name my_solr -d -p 8983:8983 \
-v $PWD/myconfig:/opt/solr/myconfig \
-t solr:9.4 \
solr-create -c mycollection -d /opt/solr/myconfig
# SolrCloud with Docker Compose
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  zookeeper:
    image: zookeeper:3.8
    hostname: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zookeeper:2888:3888;2181
    volumes:
      - zoo_data:/data
      - zoo_logs:/datalog
  solr1:
    image: solr:9.4
    hostname: solr1
    ports:
      - "8981:8983"
    environment:
      - ZK_HOST=zookeeper:2181
    depends_on:
      - zookeeper
    volumes:
      - solr1_data:/var/solr
volumes:
  zoo_data:
  zoo_logs:
  solr1_data:
EOF
docker-compose up -d
Index Creation and Data Ingestion
# Start Solr
bin/solr start
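# Create a core (standalone mode, as started above)
bin/solr create -c mycore
# Create a collection (SolrCloud mode: start Solr with "bin/solr start -c" first).
# The _default configset guesses field types, so the sample documents below index
# without a predefined schema (sample_techproducts_configs rejects unknown fields such as published_date).
bin/solr create -c mycollection -d _default -shards 2 -replicationFactor 2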
# Add documents using JSON
curl -X POST "http://localhost:8983/solr/mycollection/update?commit=true" \
-H "Content-Type: application/json" \
-d '[
{
"id": "1",
"title": "Introduction to Apache Solr",
"content": "Solr is a powerful search engine",
"category": "technology",
"published_date": "2024-01-15T10:00:00Z",
"price": 29.99,
"tags": ["search", "lucene", "apache"]
},
{
"id": "2",
"title": "Search System Design",
"content": "Building efficient search systems",
"category": "engineering",
"published_date": "2024-01-16T14:30:00Z",
"price": 39.99,
"tags": ["design", "architecture", "systems"]
}
]'
# Add documents using XML
curl -X POST "http://localhost:8983/solr/mycollection/update?commit=true" \
-H "Content-Type: application/xml" \
-d '<add>
<doc>
<field name="id">3</field>
<field name="title">Advanced Lucene Techniques</field>
<field name="content">Advanced features of Lucene</field>
<field name="category">tutorial</field>
<field name="price">49.99</field>
</doc>
</add>'
# Bulk import from CSV
curl -X POST "http://localhost:8983/solr/mycollection/update?commit=true" \
-H "Content-Type: application/csv" \
--data-binary @data.csv
# Atomic updates
curl -X POST "http://localhost:8983/solr/mycollection/update?commit=true" \
-H "Content-Type: application/json" \
-d '[
{
"id": "1",
"content": {"set": "Updated content description"},
"views": {"inc": 1},
"tags": {"add": ["updated"]}
}
]'
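Every atomic update above has the same shape: the document `id` plus, for each changed field, a single modifier (`set`, `inc`, `add`, ...). The Python sketch below builds such payloads and prepares, but does not send, the JSON POST using only the standard library. The helper names `atomic_update` and `build_update_request` are hypothetical; the URL and collection name are the ones used in this section. Keep in mind that Solr applies atomic updates by rebuilding the document, so fields must be stored or have docValues.

```python
import json
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # base URL used in the curl examples above

# Modifiers documented for Solr atomic updates
MODIFIERS = {"set", "add", "add-distinct", "remove", "removeregex", "inc"}


def atomic_update(doc_id, **changes):
    """Build one atomic-update document (hypothetical helper).

    Each keyword argument is field=(modifier, value), e.g. views=("inc", 1).
    """
    doc = {"id": doc_id}
    for field, (modifier, value) in changes.items():
        if modifier not in MODIFIERS:
            raise ValueError(f"unknown modifier {modifier!r} for field {field!r}")
        doc[field] = {modifier: value}
    return doc


def build_update_request(collection, docs, commit=True):
    """Prepare, but do not send, a JSON POST to the collection's /update handler."""
    url = f"{SOLR_URL}/{collection}/update"
    if commit:
        url += "?commit=true"
    return urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


docs = [atomic_update("1",
                      content=("set", "Updated content description"),
                      views=("inc", 1),
                      tags=("add", ["updated"]))]
req = build_update_request("mycollection", docs)
# With Solr running, urllib.request.urlopen(req) would send the update.
```

Validating modifiers on the client side turns a typo such as `"increment"` into an immediate error instead of a surprising server response.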
Search Query Implementation
# Basic queries
curl "http://localhost:8983/solr/mycollection/select?q=*:*"
curl "http://localhost:8983/solr/mycollection/select?q=Solr"
curl "http://localhost:8983/solr/mycollection/select?q=title:Apache"
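# Values with spaces, quotes or brackets must be URL-encoded; -G sends each --data-urlencode value
# as an encoded query parameter (unencoded [ ] would also be treated by curl as a URL glob)
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode 'q=content:"search engine"'
# Boolean queries
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode "q=title:Apache AND category:technology"
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode "q=title:Apache OR title:Lucene"
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode "q=title:search NOT category:old"
# Range queries and filters (fq narrows the result set without affecting scores)
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode "q=*:*" \
  --data-urlencode "fq=published_date:[2024-01-01T00:00:00Z TO 2024-12-31T23:59:59Z]"
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode "q=*:*" \
  --data-urlencode "fq=price:[10 TO 50]"
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode "q=*:*" \
  --data-urlencode "fq=category:technology" --data-urlencode "fq=published_date:[NOW-30DAYS TO NOW]"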
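Queries containing spaces, quotes or range brackets only reach Solr intact when they are URL-encoded, and several filters are expressed as repeated `fq` parameters. A small Python sketch of building such a URL with `urllib.parse.urlencode` follows; the helper `select_url` is hypothetical, while the host, collection and filters come from the examples above.

```python
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr"


def select_url(collection, q="*:*", fq=(), **params):
    """Build a /select URL with every parameter URL-encoded (hypothetical helper).

    Each entry in fq becomes its own fq=... pair; Solr intersects them with
    the main query and caches each filter separately.
    """
    pairs = [("q", q)] + [("fq", f) for f in fq] + list(params.items())
    return f"{SOLR_URL}/{collection}/select?{urlencode(pairs)}"


url = select_url("mycollection",
                 fq=["category:technology", "published_date:[NOW-30DAYS TO NOW]"])
print(url)
```

Spaces become `+` and brackets become `%5B`/`%5D`; Solr decodes both before parsing the query.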
# Faceted search
curl "http://localhost:8983/solr/mycollection/select?q=*:*&facet=true&facet.field=category"
curl "http://localhost:8983/solr/mycollection/select?q=*:*&facet=true&facet.field=category&facet.field=tags"
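# Date faceting (the old facet.date parameters are no longer supported; use facet.range on the date field; %2B is "+")
curl "http://localhost:8983/solr/mycollection/select?q=*:*&facet=true&facet.range=published_date&facet.range.start=2024-01-01T00:00:00Z&facet.range.end=2024-12-31T23:59:59Z&facet.range.gap=%2B1MONTH"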
# Range faceting
curl "http://localhost:8983/solr/mycollection/select?q=*:*&facet=true&facet.range=price&facet.range.start=0&facet.range.end=100&facet.range.gap=10"
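In Solr's default JSON output, each `facet_fields` entry is a flat list that alternates term and count, and `facet.range` splits the requested interval into `gap`-sized buckets. The Python sketch below converts the flat list into pairs and reproduces the bucket boundaries for the price facet above. The helper names and the response fragment are hypothetical; the bucket rules assume the defaults `facet.range.include=lower` and `facet.range.hardend=false`.

```python
def facet_pairs(flat):
    """Turn Solr's flat facet list [term, count, term, count, ...] into
    (term, count) tuples, keeping the order Solr returned."""
    if len(flat) % 2:
        raise ValueError("expected an even number of items")
    return list(zip(flat[0::2], flat[1::2]))


def range_buckets(start, end, gap):
    """Bucket bounds that facet.range computes for a numeric field.

    With facet.range.include=lower (default) each bucket covers [lo, hi).
    With facet.range.hardend=false (default) the last bucket may extend past end.
    """
    buckets, lo = [], start
    while lo < end:
        buckets.append((lo, lo + gap))
        lo += gap
    return buckets


# Hypothetical response fragment shaped like Solr's output for facet.field=category
response = {"facet_counts": {"facet_fields": {
    "category": ["engineering", 1, "technology", 1, "tutorial", 1]}}}
print(facet_pairs(response["facet_counts"]["facet_fields"]["category"]))
print(range_buckets(0, 100, 10))  # the price buckets requested above
```

Adding `json.nl=map` to the request makes Solr return these facet lists as JSON objects instead, at the cost of losing guaranteed ordering in some JSON parsers.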
# Advanced features
curl "http://localhost:8983/solr/mycollection/select?q=Solr&hl=true&hl.fl=content"
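# More Like This through the MLT search component on /select (a dedicated /mlt handler must first be registered in solrconfig.xml)
curl "http://localhost:8983/solr/mycollection/select?q=id:1&mlt=true&mlt.fl=title,content&mlt.mintf=1&mlt.mindf=1"
# Spell checking (requires a /spell request handler with the spellcheck component configured in solrconfig.xml)
curl "http://localhost:8983/solr/mycollection/spell?q=slor&spellcheck=true&spellcheck.build=true"
# Sorting and pagination
curl -G "http://localhost:8983/solr/mycollection/select" --data-urlencode "q=*:*" --data-urlencode "sort=published_date desc"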
curl "http://localhost:8983/solr/mycollection/select?q=*:*&start=20&rows=10"
curl "http://localhost:8983/solr/mycollection/select?q=*:*&fl=id,title,score"
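Paging with `start`/`rows` gets slower as `start` grows, because Solr must collect `start + rows` documents for every page. For deep paging Solr offers `cursorMark`: send `cursorMark=*` first, then the `nextCursorMark` from each response, and stop when it no longer changes; the sort must include the uniqueKey field (`id` here) as a tie-breaker. The sketch below implements that loop around an injected `fetch` function, a hypothetical stand-in for the HTTP call, so it can be exercised without a server.

```python
from typing import Callable, Dict, Iterator


def iterate_with_cursor(fetch: Callable[[Dict[str, str]], dict],
                        q: str = "*:*", rows: int = 100,
                        sort: str = "published_date desc, id asc") -> Iterator[dict]:
    """Yield every matching document using Solr's cursorMark deep paging.

    fetch(params) is a hypothetical stand-in that sends a /select request with
    these parameters and returns the decoded JSON response.
    """
    cursor = "*"  # "*" asks Solr for the first page
    while True:
        resp = fetch({"q": q, "rows": str(rows), "sort": sort, "cursorMark": cursor})
        yield from resp["response"]["docs"]
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means there are no more results
            return
        cursor = next_cursor


# Usage with an in-memory fake instead of a live Solr server
DOCS = [{"id": str(i)} for i in range(5)]


def fake_fetch(params):
    start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
    page = DOCS[start:start + int(params["rows"])]
    next_mark = str(start + len(page)) if page else params["cursorMark"]
    return {"response": {"docs": page}, "nextCursorMark": next_mark}


print([d["id"] for d in iterate_with_cursor(fake_fetch, rows=2)])
```

Real cursor marks are opaque strings; the fake uses offsets only to make the stopping rule easy to follow.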
Schema Design and Configuration
<!-- managed-schema.xml (the file name used by Solr 9) -->
<schema name="myschema" version="1.6">
  <!-- Basic field types referenced by the fields below -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
  <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
  <fieldType name="plong" class="solr.LongPointField" docValues="true"/>
  <fieldType name="pdouble" class="solr.DoublePointField" docValues="true"/>
  <fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
  <!-- Text field types -->
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPossessiveFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <!-- Field definitions -->
  <!-- _version_ is required for SolrCloud and atomic updates -->
  <field name="_version_" type="plong" indexed="false" stored="false"/>
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="content" type="text_en" indexed="true" stored="true"/>
  <field name="category" type="string" indexed="true" stored="true"/>
  <field name="published_date" type="pdate" indexed="true" stored="true"/>
  <field name="price" type="pdouble" indexed="true" stored="true"/>
  <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="in_stock" type="boolean" indexed="true" stored="true"/>
  <!-- Dynamic fields -->
  <dynamicField name="*_i" type="pint" indexed="true" stored="true"/>
  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  <dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
  <dynamicField name="*_dt" type="pdate" indexed="true" stored="true"/>
  <!-- Copy fields for unified searching -->
  <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="title" dest="text"/>
  <copyField source="content" dest="text"/>
  <copyField source="category" dest="text"/>
  <uniqueKey>id</uniqueKey>
</schema>
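The `text_general` index analyzer runs its tokenizer and then each filter in order, and a query matches only when its analyzed terms equal indexed terms. The toy Python pipeline below mimics that order (tokenize, drop stopwords case-insensitively, lowercase) to show why a query for `SOLR` matches a document containing `Solr`. It is a rough approximation: it assumes a regex word tokenizer instead of Lucene's StandardTokenizer, omits the Porter stemmer, and uses a hypothetical stopword set in place of stopwords.txt.

```python
import re

STOPWORDS = {"a", "an", "and", "is", "the", "to"}  # hypothetical stand-in for stopwords.txt


def tokenize(text):
    # Rough stand-in for StandardTokenizer: runs of word characters.
    return re.findall(r"\w+", text)


def stop_filter(tokens, words=STOPWORDS, ignore_case=True):
    # Like StopFilterFactory with ignoreCase="true": compare case-insensitively.
    if ignore_case:
        return [t for t in tokens if t.lower() not in words]
    return [t for t in tokens if t not in words]


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def analyze(text):
    """Apply the stages in the same order as the text_general index analyzer (minus stemming)."""
    return lowercase_filter(stop_filter(tokenize(text)))


index_terms = set(analyze("Solr is a powerful search engine"))
query_terms = analyze("SOLR")
print(all(t in index_terms for t in query_terms))
```

The Analysis screen of the Solr admin UI shows the real token stream for each stage and is the authoritative way to check a field type.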
Performance Optimization
# SolrCloud cluster setup
bin/solr start -c -p 8983 -s example/cloud/node1/solr
bin/solr start -c -p 7574 -s example/cloud/node2/solr -z localhost:9983
bin/solr start -c -p 8984 -s example/cloud/node3/solr -z localhost:9983
# Create distributed collection
bin/solr create -c products -shards 3 -replicationFactor 2
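With `-shards 3 -replicationFactor 2`, the `products` collection consists of 3 × 2 = 6 replica cores, and each shard stays queryable as long as at least one of its replicas is active. On the three nodes started above, perfectly even placement puts two cores on each node; Solr itself decides the actual layout. The arithmetic is sketched below with hypothetical helper names.

```python
import math


def total_replicas(shards, replication_factor):
    # Every shard gets replication_factor replicas; each replica is one core on some node.
    return shards * replication_factor


def cores_per_node(shards, replication_factor, nodes):
    # Lower bound on the busiest node's core count, reached with perfectly even placement.
    return math.ceil(total_replicas(shards, replication_factor) / nodes)


print(total_replicas(3, 2))      # products: 6 replica cores
print(cores_per_node(3, 2, 3))   # across the 3 nodes started above: 2 each
```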
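# Force merge ("optimize"): rewrites the index into fewer segments; I/O-intensive, so usually reserved for indexes that change rarely
curl "http://localhost:8983/solr/mycollection/update?optimize=true"
# Soft commit: makes recent updates visible to searches without the cost of a hard commit
curl "http://localhost:8983/solr/mycollection/update?commit=true&softCommit=true"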
# Cache configuration: edit the existing <query> section of solrconfig.xml
# (appending a second <query> block after </config> would make the file invalid XML)
<query>
  <filterCache class="solr.CaffeineCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>
  <queryResultCache class="solr.CaffeineCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="0"/>
  <documentCache class="solr.CaffeineCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="0"/>
</query>
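# JVM tuning: append to the solr.in.sh written by install_solr_service.sh (overwriting it would drop
# settings such as SOLR_HOME). SOLR_HEAP sets both -Xms and -Xmx, so SOLR_JAVA_MEM is not needed.
# solr.autoSoftCommit.maxTime only takes effect in solrconfig.xml files that reference it (as _default does).
sudo tee -a /etc/default/solr.in.sh > /dev/null << 'EOF'
SOLR_HEAP="4g"
GC_TUNE="-XX:+UseG1GC -XX:+UseStringDeduplication"
SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=3000"
EOF
sudo systemctl restart solr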
Integration and Framework Connectivity
# Java client library
<!-- Maven dependency -->
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-solrj</artifactId>
<version>9.4.1</version>
</dependency>
// Java SolrJ client example (Solr 9; Http2SolrClient is preferred over the older HttpSolrClient)
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.Http2SolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

public class SolrExample {
    public static void main(String[] args) throws Exception {
        String urlString = "http://localhost:8983/solr/mycollection";
        // try-with-resources closes the client even if a request fails
        try (SolrClient solr = new Http2SolrClient.Builder(urlString).build()) {
            // Add a document and commit so it becomes searchable
            SolrInputDocument document = new SolrInputDocument();
            document.addField("id", "1");
            document.addField("title", "Example Document");
            document.addField("content", "This is an example document for Solr");
            solr.add(document);
            solr.commit();

            // Search for all documents
            SolrQuery query = new SolrQuery("*:*");
            QueryResponse response = solr.query(query);
            SolrDocumentList documents = response.getResults();
            for (SolrDocument doc : documents) {
                System.out.println(doc.getFieldValue("id") + ": " + doc.getFieldValue("title"));
            }
        }
    }
}
# Python pysolr client example
import pysolr
# Connect to Solr
solr = pysolr.Solr('http://localhost:8983/solr/mycollection/', always_commit=True)
# Add documents
documents = [
{
"id": "1",
"title": "Python Solr Integration",
"content": "Using pysolr to interact with Solr",
"category": "programming"
},
{
"id": "2",
"title": "Advanced Search Features",
"content": "Exploring faceted search and highlighting",
"category": "search"
}
]
solr.add(documents)
# Search
results = solr.search('programming')
for result in results:
    print(f"ID: {result['id']}, Title: {result['title']}")
# Faceted search
results = solr.search('*:*', **{
    'facet': 'true',
    'facet.field': 'category',
    'facet.limit': 10
})
print("Facets:", results.facets['facet_fields']['category'])
Advanced Features and Security
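# Enable basic authentication (bin/solr generates and installs security.json; prompts for credentials)
bin/solr auth enable -type basicAuth -prompt true
# Alternatively, write security.json by hand. The credentials value below is the Solr Reference Guide's
# sample hash for the password "SolrRocks"; replace it. In SolrCloud, upload the file with
# "bin/solr zk cp file:security.json zk:/security.json -z localhost:9983"; standalone Solr reads it from $SOLR_HOME.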
cat > security.json << 'EOF'
{
"authentication":{
"blockUnknown": true,
"class":"solr.BasicAuthPlugin",
"credentials":{"admin":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="},
"realm":"Solr",
"forwardCredentials": false
},
"authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"permissions":[
{"name":"security-edit", "role":"admin"},
{"name":"collection-admin-edit", "role":"admin"},
{"name":"core-admin-edit", "role":"admin"}
],
"user-role":{"admin":"admin"}
}
}
EOF
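Each entry under `credentials` has the form `base64(hash) base64(salt)`. The Python sketch below generates a fresh value, assuming the scheme used by Solr's `Sha256AuthenticationProvider`: SHA-256 over the salt followed by the UTF-8 password, then SHA-256 again over that digest. The function names are hypothetical, so check the result against your Solr version; `bin/solr auth enable` or the Authentication API's `set-user` command set passwords without computing hashes by hand.

```python
import base64
import hashlib
import hmac
import os


def solr_credential(password, salt=None):
    """Return 'base64(hash) base64(salt)' for the "credentials" map in security.json.

    Assumes Solr's Sha256AuthenticationProvider scheme:
    hash = SHA-256(SHA-256(salt + password_utf8)).
    """
    if salt is None:
        salt = os.urandom(32)  # fresh random 32-byte salt
    first = hashlib.sha256(salt + password.encode("utf-8")).digest()
    hashed = hashlib.sha256(first).digest()
    return f"{base64.b64encode(hashed).decode()} {base64.b64encode(salt).decode()}"


def check_password(password, credential):
    """Recompute the hash with the stored salt and compare in constant time."""
    stored_hash, stored_salt = credential.split(" ")
    recomputed = solr_credential(password, base64.b64decode(stored_salt)).split(" ")[0]
    return hmac.compare_digest(recomputed, stored_hash)


cred = solr_credential("change-me")
print(cred)  # paste as the user's value under "credentials"
```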
# SSL/TLS configuration
keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 \
-keypass secret -storepass secret -validity 9999 \
-keystore solr-ssl.keystore.jks \
-ext SAN=DNS:localhost,IP:127.0.0.1 \
-dname "CN=localhost, OU=IT, O=Organization, L=City, ST=State, C=US"
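# Start Solr with SSL (bin/solr reads the SOLR_SSL_* variables, which are normally set in solr.in.sh)
SOLR_SSL_ENABLED=true \
SOLR_SSL_KEY_STORE="$PWD/solr-ssl.keystore.jks" \
SOLR_SSL_KEY_STORE_PASSWORD=secret \
SOLR_SSL_TRUST_STORE="$PWD/solr-ssl.keystore.jks" \
SOLR_SSL_TRUST_STORE_PASSWORD=secret \
bin/solr start -p 8984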
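# Backup and restore (SolrCloud; the location must be a shared path reachable from every node
# and listed in solr.allowPaths)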
curl "http://localhost:8983/solr/admin/collections?action=BACKUP&name=backup1&collection=mycollection&location=/var/backups"
curl "http://localhost:8983/solr/admin/collections?action=RESTORE&name=backup1&collection=restored_collection&location=/var/backups"
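Backups of large collections can outlast an HTTP timeout. The Collections API accepts an `async=<request-id>` parameter and reports progress through `action=REQUESTSTATUS&requestid=<request-id>`, whose `status.state` is `submitted`, `running`, `completed`, `failed` or `notfound`. The Python sketch below builds those URLs and polls until the task ends; the helper names and the injected `fetch` function (a stand-in for an HTTP GET that returns decoded JSON) are hypothetical.

```python
import time
from urllib.parse import urlencode

COLLECTIONS_API = "http://localhost:8983/solr/admin/collections"


def backup_url(name, collection, location, request_id):
    # The BACKUP call above plus async=<id>, so Solr returns at once and runs it in the background.
    return COLLECTIONS_API + "?" + urlencode({
        "action": "BACKUP", "name": name, "collection": collection,
        "location": location, "async": request_id,
    })


def status_url(request_id):
    return COLLECTIONS_API + "?" + urlencode({"action": "REQUESTSTATUS", "requestid": request_id})


def wait_for(request_id, fetch, poll_seconds=2.0, sleep=time.sleep):
    """Poll REQUESTSTATUS until the async task ends and return its final state.

    fetch(url) is a hypothetical stand-in that GETs the URL and returns decoded JSON.
    """
    while True:
        state = fetch(status_url(request_id))["status"]["state"]
        if state in ("completed", "failed", "notfound"):
            return state
        sleep(poll_seconds)  # "submitted" or "running": try again later
```

Injecting `fetch` and `sleep` keeps the polling logic independent of any HTTP library and easy to test.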
# Monitoring and metrics
curl "http://localhost:8983/solr/admin/metrics?group=all"
curl "http://localhost:8983/solr/admin/metrics?group=jvm"
# Per-core /select request metrics (prefix filters metric names, not core names)
curl "http://localhost:8983/solr/admin/metrics?group=core&prefix=QUERY./select"
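# Learning to Rank (LTR): requires the ltr module (e.g. SOLR_MODULES=ltr) and the LTR components
# (ltr query parser, [features] transformer, feature and model stores) configured in solrconfig.xml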
curl -XPUT 'http://localhost:8983/solr/mycollection/schema/feature-store' \
--data-binary '{
"store": "myFeatureStore",
"name": "titleMatch",
"class": "org.apache.solr.ltr.feature.SolrFeature",
"params": {
"q": "{!field f=title}${user_query}"
}
}' -H 'Content-type:application/json'
# Create LTR model
curl -XPUT 'http://localhost:8983/solr/mycollection/schema/model-store' \
--data-binary '{
"store": "myFeatureStore",
"name": "myModel",
"class": "org.apache.solr.ltr.model.LinearModel",
"features": [
{"name": "titleMatch"}
],
"params": {
"weights": {
"titleMatch": 1.0
}
}
}' -H 'Content-type:application/json'
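A `LinearModel` scores a document as the weighted sum of its feature values, and the model is applied at query time through the `rq` re-rank parameter, with `efi.*` supplying external values such as `${user_query}`. The Python sketch below computes that score for the single-feature model above and builds the request parameters. The helper names are hypothetical, and the naive quoting assumes the query text contains no single quotes.

```python
def linear_model_score(weights, feature_values):
    """Score a document the way a LinearModel does: the weighted sum of its features.

    Features absent from feature_values are treated as 0.0, matching a feature's
    usual default value.
    """
    return sum(w * feature_values.get(name, 0.0) for name, w in weights.items())


def ltr_rerank_params(model, user_query, rerank_docs=100):
    """Request parameters that re-rank the top results with an LTR model.

    efi.user_query fills ${user_query} in the titleMatch feature.
    """
    return {
        "q": user_query,
        "rq": f"{{!ltr model={model} reRankDocs={rerank_docs} efi.user_query='{user_query}'}}",
        "fl": "id,title,score,[features]",
    }


print(linear_model_score({"titleMatch": 1.0}, {"titleMatch": 2.5}))
print(ltr_rerank_params("myModel", "solr")["rq"])
```

Only the top `reRankDocs` results from the first-pass query are rescored, which keeps the model's cost bounded.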
Apache Solr is a powerful and mature enterprise search platform that provides comprehensive search capabilities with high scalability and extensive customization options. Its rich feature set, combined with strong community support and enterprise-grade reliability, makes it an excellent choice for organizations requiring sophisticated search functionality.