Overview
Pinecone is a high-performance, fully managed vector database service designed for AI applications. It provides storage, indexing, and fast similarity search for vector embeddings, enabling AI applications to retrieve relevant knowledge reliably in large-scale production environments.
Details
Architecture
Pinecone employs a next-generation serverless architecture with the following characteristics:
Control Plane and Data Plane
- Control Plane: Manages indexes and collections (create, delete, configure)
- Data Plane: Handles vector operations (store, query, fetch, delete, update), as illustrated in the sketch below
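To make the split concrete, the following minimal sketch (assuming the Python SDK; "demo-index" is a placeholder name) shows control-plane calls managing an index and data-plane calls operating on its vectors:
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="YOUR_API_KEY")
# Control plane: create, inspect, and delete indexes
pc.create_index(
    name="demo-index",
    dimension=8,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
print(pc.list_indexes().names())
print(pc.describe_index("demo-index"))
# Data plane: store and query vectors inside the index
index = pc.Index("demo-index")
index.upsert(vectors=[("v1", [0.1] * 8)])
print(index.query(vector=[0.1] * 8, top_k=1))
# Control plane again: tear the index down
pc.delete_index("demo-index")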
Serverless Architecture Features
- Separation of Storage and Compute: Compute resources are used only when needed
- File-based Architecture: Index data is organized into files held in object storage
- Disk-based Metadata Filtering: Efficient filtering even over high-cardinality metadata
- Real-time Indexing: Writes become visible to queries in near real time
Index Types
Serverless Indexes
- Fully managed with automatic scaling
- Usage-based pricing model
- High availability and reliability
Pod-based Indexes
- Customizable pod types and counts
- Optimized for specific workloads
- Fixed per-minute pricing (the sketch below shows how each index type is created)
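To make the difference concrete, here is a minimal sketch (index names, regions, and pod sizing are placeholder choices) of creating each index type with the Python SDK:
from pinecone import Pinecone, ServerlessSpec, PodSpec
pc = Pinecone(api_key="YOUR_API_KEY")
# Serverless index: no capacity planning, usage-based billing
pc.create_index(
    name="serverless-example",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
# Pod-based index: explicit pod type, pod count, and replicas
pc.create_index(
    name="pod-example",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-east-1-aws",  # pod environment
        pod_type="p1.x1",             # pod type and size
        pods=1,                       # number of pods
        replicas=1                    # replicas for throughput and availability
    )
)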
Key Features
Search Capabilities
- Semantic Search: Similarity search using dense vectors
- Lexical Search: Keyword matching using sparse vectors
- Hybrid Search: Combination of semantic and lexical search (combined with filtering and namespaces in the sketch after this list)
- Metadata Filtering: Advanced filtering with complex queries
- Namespaces: Logical data separation and multi-tenancy
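The sketch below combines these capabilities in one query: a dense vector for semantic similarity, a sparse vector for lexical matching (single-index hybrid search assumes the index uses the dotproduct metric), a compound metadata filter, and a namespace to scope the search. The index name and vector values are placeholders.
from pinecone import Pinecone
pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("hybrid-example")
results = index.query(
    vector=[0.12, 0.34, 0.56],  # dense query vector (truncated example)
    sparse_vector={             # sparse lexical signal as index/value pairs
        "indices": [10, 45, 160],
        "values": [0.5, 0.5, 0.2]
    },
    filter={                    # compound metadata filter
        "$and": [
            {"category": {"$in": ["technology", "science"]}},
            {"year": {"$gte": 2023}}
        ]
    },
    namespace="tech-docs",      # logical partition / tenant
    top_k=5,
    include_metadata=True
)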
AI Integration Features
- Pinecone Inference: Built-in embedding models and reranking (see the sketch after this list)
- Pinecone Assistant: Rapid development of chatbots and agent applications
- Multiple Embedding Models: OpenAI, Cohere, Sentence Transformers, and more
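As a brief sketch of Pinecone Inference (model names are examples from the hosted catalog; input texts are placeholders), both embedding and reranking are exposed on the same client:
from pinecone import Pinecone
pc = Pinecone(api_key="YOUR_API_KEY")
# Generate embeddings with a hosted model
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["Vector databases store embeddings.", "Pinecone is fully managed."],
    parameters={"input_type": "passage", "truncate": "END"}
)
print(len(embeddings[0].values))  # 1024 dimensions for multilingual-e5-large
# Rerank candidate documents against a query with a hosted reranker
reranked = pc.inference.rerank(
    model="bge-reranker-v2-m3",
    query="What is a vector database?",
    documents=[
        {"id": "d1", "text": "A vector database stores and searches embeddings."},
        {"id": "d2", "text": "A relational database stores rows and columns."}
    ],
    top_n=1,
    return_documents=True
)
for row in reranked.data:
    print(row.index, row.score)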
Enterprise Features
Security
- Encryption at rest and in transit
- Hierarchical encryption keys
- Private networking
- SOC 2 Type II and ISO 27001 certified; supports GDPR and HIPAA compliance
BYOC (Bring Your Own Cloud)
- Deploy private Pinecone regions on AWS
- Ensure data sovereignty and compliance
- Maintain benefits of fully managed SaaS
Pros and Cons
Pros
- Fully Managed: No infrastructure management required
- Rapid Setup: Launch vector databases in seconds
- Auto-scaling: Resources automatically adjust to demand
- High Reliability: Robust design for production use
- Rich Integrations: Works with LangChain, OpenAI, Hugging Face, and more
- Global Deployment: Multiple cloud providers and regions
Cons
- Vendor Lock-in: Dependency on proprietary service
- Cost: Can be expensive for large-scale usage
- Limited Customization: Less flexible compared to open-source solutions
- No Offline Usage: Requires constant cloud connectivity
Key Links
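- Official site: https://www.pinecone.io
- Documentation: https://docs.pinecone.io
- Python SDK: https://github.com/pinecone-io/pinecone-python-client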
Code Examples
Basic Usage
from pinecone import Pinecone, ServerlessSpec, CloudProvider, AwsRegion
# Initialize Pinecone client
pc = Pinecone(api_key="YOUR_API_KEY")
# Create a serverless index
index_config = pc.create_index(
    name="semantic-search",
    dimension=1536,  # OpenAI ada-002 dimension
    metric="cosine",
    spec=ServerlessSpec(
        cloud=CloudProvider.AWS,
        region=AwsRegion.US_EAST_1
    )
)
# Connect to the index
index = pc.Index(host=index_config.host)
# Upsert vectors (insert/update)
index.upsert(
    vectors=[
        (
            "doc1",                                                 # ID
            [0.1, 0.2, 0.3, ...],                                   # 1536-dimensional vector
            {"title": "AI Fundamentals", "category": "technology"}  # Metadata
        ),
        (
            "doc2",
            [0.2, 0.3, 0.4, ...],
            {"title": "Introduction to ML", "category": "technology"}
        )
    ],
    namespace="tech-docs"
)
# Query vectors
query_embedding = [0.15, 0.25, 0.35, ...] # Query vector
results = index.query(
    vector=query_embedding,
    top_k=5,
    namespace="tech-docs",
    filter={"category": {"$eq": "technology"}},
    include_metadata=True
)
# Display results
for match in results.matches:
print(f"ID: {match.id}, Score: {match.score}, Metadata: {match.metadata}")
Using Integrated Embedding Models
from pinecone import Pinecone
# Create an index with an integrated embedding model
# (uses the integrated-inference API available in recent versions of the Python SDK)
pc = Pinecone(api_key="YOUR_API_KEY")
index_config = pc.create_index_for_model(
    name="text-search",
    cloud="aws",
    region="us-east-1",
    # dimension (1024) and metric are determined by the chosen model
    embed={
        "model": "multilingual-e5-large",  # integrated embedding model
        "field_map": {"text": "text"}      # record field that gets embedded
    }
)
index = pc.Index(host=index_config.host)
# Upsert text records directly (vectorized automatically on the server)
index.upsert_records(
    "articles",  # namespace
    [
        {
            "_id": "article1",
            "text": "Artificial intelligence is transforming our lives.",
            "language": "en",
            "topic": "AI"
        },
        {
            "_id": "article2",
            "text": "Machine learning learns from large amounts of data.",
            "language": "en",
            "topic": "ML"
        }
    ]
)
# Query with text (vectorized automatically on the server)
results = index.search(
    namespace="articles",
    query={
        "inputs": {"text": "Tell me about recent AI trends"},
        "top_k": 3
    }
)
Asynchronous Operations
import asyncio
from pinecone import PineconeAsyncio
async def async_vector_operations():
    # Use the async client
    async with PineconeAsyncio(api_key="YOUR_API_KEY") as pc:
        idx = pc.IndexAsyncio(host="YOUR_INDEX_HOST")
        # Async upsert vectors
        await idx.upsert(vectors=[
            ("async1", [1.0, 2.0, 3.0, ...]),
            ("async2", [2.0, 3.0, 4.0, ...])
        ])
        # Async query
        results = await idx.query(
            vector=[1.5, 2.5, 3.5, ...],
            top_k=10
        )
        return results
# Run async function
results = asyncio.run(async_vector_operations())
RAG Application Example
from pinecone import Pinecone
from openai import OpenAI
# Initialize OpenAI and Pinecone clients
openai_client = OpenAI(api_key="YOUR_OPENAI_API_KEY")
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("knowledge-base")
def generate_embedding(text):
    """Generate a text embedding using the OpenAI API."""
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response.data[0].embedding

def search_knowledge_base(query, top_k=5):
    """Search the knowledge base for relevant context."""
    query_embedding = generate_embedding(query)
    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True
    )
    contexts = []
    for match in results.matches:
        contexts.append(match.metadata["text"])
    return contexts
def generate_answer(query, contexts):
    """Generate an answer grounded in the retrieved context."""
    context_str = "\n\n".join(contexts)
    prompt = f"""Answer the question using the following context.

Context:
{context_str}

Question: {query}

Answer:"""
    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ]
    )
    return response.choices[0].message.content
# Use RAG system
query = "Tell me about Pinecone's serverless architecture"
contexts = search_knowledge_base(query)
answer = generate_answer(query, contexts)
print(answer)