What is Marqo

Marqo is more than a vector database; it is an end-to-end vector search engine for both text and images. Vector generation, storage, and retrieval are handled out of the box through a single API, so there is no need to bring your own embeddings. This makes it developer-friendly by design.

Key Features

Unified Architecture

  • All-in-One Solution: Handles everything from vector generation to search in one integrated system
  • Multimodal Support: Works with both text and images
  • Automatic Embedding Generation: No pre-embedding preparation required
  • Document-Level Abstraction: Treats data as documents rather than pure vectors

Advanced Search Capabilities

  • Complex Semantic Queries: Build queries with weighted search terms (see the sketch after this list)
  • Filtering: Filter results using Marqo's query DSL
  • Searchable Attributes: Limit search to specific fields
  • Hybrid Search: Supports tensor, lexical, and hybrid search types
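
As a quick sketch of these capabilities, assuming a client mq as created in the Basic Usage section below; the index name, field names, and filter values here are illustrative only:

# Weighted query terms: positive weights boost a concept, negative weights suppress it
results = mq.index("my-index").search(
    q={"red sports car": 1.0, "toy car": -0.5},
    search_method="TENSOR",                    # TENSOR, LEXICAL, or HYBRID
    searchable_attributes=["description"],     # limit the search to specific fields
    filter_string="year:[2020 TO 2024]"        # Marqo query DSL range filter
)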

Latest Features (2024-2025)

  • Stella Embedding Model Support: Supports high-performance models like stella_en_400M_v5
  • FFmpeg-CUDA Integration: GPU acceleration for video processing (up to 5x faster)
  • Video/Audio File Size Limits: Configurable file size restrictions
  • Python 3.9 Support: Enhanced security and compatibility

Pros and Cons

Pros

  • No need to manage embeddings manually
  • Simple API design with low learning curve
  • High flexibility with multimodal support
  • Available on both cloud and on-premises
  • Active development and community support

Cons

  • Offers fewer specialized features than dedicated vector databases for certain use cases
  • Can be overkill for simple vector search needs
  • Flexibility in bringing custom embedding models can be limited in some cases

Key Links

  • GitHub: https://github.com/marqo-ai/marqo
  • Documentation: https://docs.marqo.ai
  • Website: https://www.marqo.ai

Installation

Using Docker

# Pull the Docker image
docker pull marqoai/marqo:latest

# Remove existing container (if needed)
docker rm -f marqo

# Run Marqo container
docker run --name marqo -it -p 8882:8882 marqoai/marqo:latest

Python Client Installation

pip install marqo
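
After installing, a quick way to confirm the client can reach the server started above is to list the existing indexes:

import marqo

# Connect to the local Marqo instance and list the indexes it currently holds
mq = marqo.Client(url="http://localhost:8882")
print(mq.get_indexes())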

Code Examples

Basic Usage

import marqo

# Create a Marqo client
mq = marqo.Client(url='http://localhost:8882')

# Create an index
mq.create_index("movies-index", model="hf/e5-base-v2")

# Add documents
mq.index("movies-index").add_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels"
    },
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection",
        "_id": "article_591"
    }
], tensor_fields=["Description"])

# Search the index
results = mq.index("movies-index").search(
    q="What is the best outfit to wear on the moon?",
    searchable_attributes=["Description"]
)

# Print results
for result in results['hits']:
    print(f"Title: {result['Title']}")
    print(f"Description: {result['Description']}")
    print(f"Score: {result['_score']}")

Image Search Example

# Create an image index (URLs must be treated as images so they are downloaded and embedded)
mq.create_index(
    "image-index",
    model="open_clip/ViT-B-32/openai",
    treat_urls_and_pointers_as_images=True
)

# Add images
mq.index("image-index").add_documents([
    {
        "image_url": "https://example.com/image1.jpg",
        "caption": "Beautiful sunset landscape"
    },
    {
        "image_url": "https://example.com/image2.jpg",
        "caption": "City nightscape"
    }
], tensor_fields=["image_url", "caption"])

# Search images with text
results = mq.index("image-index").search(
    q="sunset scenery"
)
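
Because the CLIP model embeds text and images into the same space, an image URL can itself be used as the query (the URL below is a placeholder):

# Image-to-image search: query with an image URL instead of text
results = mq.index("image-index").search(
    q="https://example.com/query-image.jpg"
)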

Integrations and Ecosystem

LangChain Integration

import marqo
from langchain_community.vectorstores import Marqo

# Use Marqo with LangChain: the vector store wraps a configured marqo.Client
client = marqo.Client(url="http://localhost:8882", api_key="")  # api_key is only needed for Marqo Cloud
vectorstore = Marqo(
    client=client,
    index_name="langchain-demo"
)
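
A short usage sketch through the LangChain interface, assuming the "langchain-demo" index already exists on the Marqo server:

# Add texts (Marqo generates the embeddings server-side) and run a similarity search
vectorstore.add_texts(["Marqo is an end-to-end vector search engine"])
docs = vectorstore.similarity_search("What is Marqo?")
print(docs[0].page_content)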

Haystack Integration

Marqo can be used as a Document Store for Haystack pipelines including retrieval-augmented generation (RAG), question answering, and document search.

Summary

Marqo is a powerful tool that simplifies end-to-end vector search implementation. By eliminating the need for manual embedding generation and supporting both text and images, it significantly improves developer productivity. Available as both a cloud service and self-hosted solution, it can accommodate projects of various scales.