Database

ArangoDB

Overview

ArangoDB is a multi-model database that integrates document, graph, and key-value data models within a single engine. Its own query language, AQL (ArangoDB Query Language), can express queries that span several data models in a single statement. It is a NoSQL database that combines native graph processing, ACID transactions, and horizontal scaling.

Details

Development of ArangoDB began in Germany in 2011, and the database was released as open source in 2012. Its approach lets a single database cover use cases that traditionally required multiple systems.

Key features of ArangoDB:

  • Multi-model architecture: Integrates document, graph, and key-value models
  • AQL query language: SQL-like unified query language
  • Native graph processing: High-speed traversal algorithms
  • ACID transactions: Full guarantees on a single server; cluster deployments have documented limitations for operations that span multiple shards
  • Distributed architecture: Automatic sharding and replication
  • Foxx microservices: In-database application execution
  • RESTful API: HTTP-based access interface
  • Geospatial indexing: GeoJSON support
  • Full-text search: Built-in ArangoSearch engine
  • Web interface: Intuitive management and visualization tools
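The RESTful API entry in the list above can be illustrated with a small sketch. It builds, without sending, an HTTP request for the `/_api/cursor` endpoint, which accepts an AQL query and its bind variables as JSON. The host, database name, credentials, and the helper name `build_cursor_request` are assumptions made for illustration.

```python
import base64
import json
import urllib.request


def build_cursor_request(base_url, database, query, bind_vars, username, password):
    """Build (but do not send) a POST request that submits an AQL query."""
    url = f"{base_url}/_db/{database}/_api/cursor"
    body = json.dumps({"query": query, "bindVars": bind_vars}).encode("utf-8")
    # HTTP Basic authentication: base64("username:password")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


request = build_cursor_request(
    "http://localhost:8529",  # assumed local server
    "_system",
    "FOR u IN users FILTER u.age > @minAge RETURN u.name",
    {"minAge": 25},
    "root",
    "password",
)
print(request.get_method(), request.full_url)

# To run it against a live server (not executed here):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["result"])
```

Keeping request construction separate from sending makes the payload easy to inspect before any server is involved.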

Pros and Cons

Pros

  • Unified platform: Manage multiple data models in a single system
  • Flexibility: Schema-less design adapts to evolving data
  • Performance: Optimized query execution engine
  • Scalability: Horizontal scaling and clustering
  • Development efficiency: Unified query language improves productivity
  • Rich features: Built-in graph analysis, full-text search, geospatial processing
  • Microservices: Server-side applications with Foxx
  • Operability: Intuitive management through Web UI

Cons

  • Learning curve: Need to master AQL and multi-model concepts
  • Memory usage: Caches and in-memory query processing can consume significant RAM
  • Ecosystem: More limited than Neo4j or MongoDB in specific domains
  • Complexity: Configuration and tuning complexity due to feature richness
  • Performance trade-offs: Potential performance degradation compared to specialized single-model databases

Code Examples

Installation & Setup

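# Running with Docker (detached; the container name is reused by the arangosh command below)
docker run -d --name arangodb_container \
  -e ARANGO_ROOT_PASSWORD=password \
  -p 8529:8529 \
  -v arango-data:/var/lib/arangodb3 \
  -v arango-apps:/var/lib/arangodb3-apps \
  arangodb:latest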

# Alternative to docker run: Docker Compose configuration (both publish port 8529, so start only one)
cat > docker-compose.yml << EOF
version: '3.8'
services:
  arangodb:
    image: arangodb:latest
    environment:
      ARANGO_ROOT_PASSWORD: password
    ports:
      - "8529:8529"
    volumes:
      - arango-data:/var/lib/arangodb3
      - arango-apps:/var/lib/arangodb3-apps
volumes:
  arango-data:
  arango-apps:
EOF

docker-compose up -d

# Web interface access
# http://localhost:8529

# Python driver installation
pip install python-arango

# Node.js driver installation
npm install arangojs

# ArangoDB CLI client (arangosh)
docker exec -it arangodb_container arangosh

Basic Operations (Multi-model CRUD)

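// Basic CRUD operations with AQL

// Collection creation (arangosh JavaScript, not AQL; the AQL statements
// below run through the web UI, db._query() in arangosh, or a driver)
db._create("users");          // Document collection
db._createEdgeCollection("friends"); // Edge collection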

// Document creation (Create)
// Email addresses are placeholders
FOR doc IN [
  {name: "John Doe", age: 30, email: "john@example.com", city: "Tokyo"},
  {name: "Jane Smith", age: 25, email: "jane@example.com", city: "Osaka"},
  {name: "Bob Johnson", age: 35, email: "bob@example.com", city: "Nagoya"}
]
INSERT doc INTO users

// Key-value access
INSERT {_key: "user001", name: "Alice Brown", status: "active"} INTO users

// Document reading (Read)
FOR user IN users
  RETURN user

// Conditional queries
FOR user IN users
  FILTER user.age > 25
  RETURN {name: user.name, age: user.age}

// Direct key access
RETURN DOCUMENT("users/user001")

// Document update (Update)
FOR user IN users
  FILTER user.name == "John Doe"
  UPDATE user WITH {age: 31, location: "Tokyo, Shibuya"} IN users

// Conditional update
UPDATE {_key: "user001"} WITH {lastLogin: DATE_NOW()} IN users

// Document deletion (Delete)
FOR user IN users
  FILTER user.email == "bob@example.com"  // placeholder address
  REMOVE user IN users

// Key-specified deletion
REMOVE "user001" IN users

// Edge (relationship) creation
INSERT {_from: "users/user1", _to: "users/user2", type: "friend", since: "2024-01-01"} 
INTO friends
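
The insert statements above embed their values directly in the query text. From application code, the same bulk insert is usually parameterized: `@docs` is a value bind parameter and `@@collection` is a collection bind parameter, whose key is written with a single leading `@` in the bind variables. The sketch below only builds the query and its bind variables; the helper name `build_bulk_insert` is hypothetical.

```python
def build_bulk_insert(collection, documents):
    """Return an AQL bulk-insert query and its bind variables."""
    query = "FOR doc IN @docs INSERT doc INTO @@collection RETURN NEW._key"
    bind_vars = {
        "@collection": collection,  # fills @@collection in the query
        "docs": list(documents),    # fills @docs in the query
    }
    return query, bind_vars


query, bind_vars = build_bulk_insert(
    "users",
    [
        {"name": "John Doe", "age": 30, "city": "Tokyo"},
        {"name": "Jane Smith", "age": 25, "city": "Osaka"},
    ],
)
print(query)
print(sorted(bind_vars))

# With python-arango and a connected database handle `db` (not executed here):
# new_keys = list(db.aql.execute(query, bind_vars=bind_vars))
```

Binding values instead of concatenating them into the query string avoids quoting mistakes and AQL injection.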

Multi-model Queries

// Integrated queries across document + graph + key-value

// Graph traversal
FOR v, e, p IN 1..3 OUTBOUND "users/user1" friends
  RETURN {
    friend: v.name,
    relationship: e.type,
    path_length: LENGTH(p.edges)
  }

// Document search + graph analysis
FOR user IN users
  FILTER user.city == "Tokyo"
  FOR friend IN 1..2 OUTBOUND user friends
    COLLECT friendCity = friend.city WITH COUNT INTO friendCount
    RETURN {
      city: friendCity,
      friends_count: friendCount
    }

// Key-value + document join
LET userKeys = ["user001", "user002", "user003"]
FOR key IN userKeys
  LET user = DOCUMENT(CONCAT("users/", key))
  FILTER user != null
  RETURN {
    key: key,
    user: user,
    is_active: user.status == "active"
  }

// Composite data model query
FOR order IN orders
  LET customer = DOCUMENT(order.customer_id)
  LET items = (
    FOR item_id IN order.items
      RETURN DOCUMENT(CONCAT("products/", item_id))
  )
  RETURN {
    order_id: order._key,
    customer_name: customer.name,
    total_items: LENGTH(items),
    total_value: SUM(items[*].price)
  }
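
To make the `1..3 OUTBOUND` traversal above more concrete, here is a minimal in-memory sketch of what such a traversal enumerates: every path of one to three edges that starts at the given vertex, visited depth-first, with no edge repeated within a single path. It is a simplified model for illustration, not ArangoDB's implementation, and the order of results on a real server may differ. The sample edges are hypothetical.

```python
from collections import defaultdict


def traverse_outbound(edges, start, min_depth, max_depth):
    """Yield (vertex, edge, path_edges) for each path of min..max edges from start."""
    adjacency = defaultdict(list)
    for edge in edges:
        adjacency[edge["_from"]].append(edge)

    def walk(vertex, path):
        if len(path) >= max_depth:
            return
        for edge in adjacency[vertex]:
            # Never follow the same edge twice within one path
            if any(edge is used for used in path):
                continue
            new_path = path + [edge]
            if len(new_path) >= min_depth:
                yield edge["_to"], edge, new_path
            yield from walk(edge["_to"], new_path)

    yield from walk(start, [])


# Hypothetical sample edges, including a cycle back to the start vertex
edges = [
    {"_from": "users/u1", "_to": "users/u2", "type": "friend"},
    {"_from": "users/u2", "_to": "users/u3", "type": "friend"},
    {"_from": "users/u3", "_to": "users/u1", "type": "friend"},
    {"_from": "users/u1", "_to": "users/u4", "type": "friend"},
]

results = [
    {"friend": v, "relationship": e["type"], "path_length": len(p)}
    for v, e, p in traverse_outbound(edges, "users/u1", 1, 3)
]
for row in results:
    print(row)
```

Note that the start vertex can reappear in the results through a cycle, which is why friend-suggestion queries usually filter it out explicitly.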

Indexing & Optimization

// Index creation ("persistent" replaces the older "hash" and "skiplist" type names)
db.users.ensureIndex({type: "persistent", fields: ["email"], unique: true});
db.users.ensureIndex({type: "persistent", fields: ["age"]});
// A fulltext index covers exactly one attribute
db.users.ensureIndex({type: "fulltext", fields: ["name"]});

// Geospatial index (geoJson: true reads coordinates as [longitude, latitude])
db.locations.ensureIndex({type: "geo", fields: ["coordinates"], geoJson: true});

// Composite index
db.users.ensureIndex({type: "persistent", fields: ["city", "age"]});

// Index inspection
db.users.getIndexes();

// Query optimization analysis
db._explain(`
  FOR user IN users
    FILTER user.age > 25 AND user.city == "Tokyo"
    RETURN user
`);

// Profiling (the arangosh method is db._profileQuery)
db._profileQuery(`
  FOR user IN users
    FOR friend IN 1..2 OUTBOUND user friends
      RETURN {user: user.name, friend: friend.name}
`);

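// Full-text search index (covers exactly one attribute; deprecated in
// recent releases in favor of ArangoSearch Views)
db.articles.ensureIndex({type: "fulltext", fields: ["content"]});

// Full-text search query: comma-separated words must all match
FOR doc IN FULLTEXT(articles, "content", "ArangoDB,multi,model")
  RETURN doc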
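
The explain output shown earlier in this section can also be checked programmatically. The sketch below summarizes an execution plan, assuming the JSON layout in which a plan has a `nodes` list, index lookups appear as nodes of type `IndexNode` with an `indexes` array, and full collection scans appear as `EnumerateCollectionNode`. The sample plan is hand-written and hypothetical, and the helper name `summarize_plan` is not part of any driver.

```python
def summarize_plan(plan):
    """List the indexes a plan uses and the collections it scans in full."""
    used_indexes = []
    full_scans = []
    for node in plan.get("nodes", []):
        node_type = node.get("type")
        if node_type == "IndexNode":
            for index in node.get("indexes", []):
                used_indexes.append((index.get("type"), tuple(index.get("fields", []))))
        elif node_type == "EnumerateCollectionNode":
            full_scans.append(node.get("collection"))
    return {"indexes": used_indexes, "full_scans": full_scans}


# Hypothetical, heavily trimmed plan for the city/age filter query
sample_plan = {
    "nodes": [
        {"type": "SingletonNode"},
        {
            "type": "IndexNode",
            "collection": "users",
            "indexes": [{"type": "persistent", "fields": ["city", "age"]}],
        },
        {"type": "ReturnNode"},
    ]
}

summary = summarize_plan(sample_plan)
print(summary)

# With python-arango, a plan could be obtained roughly like this (not executed here):
# plan = db.aql.explain('FOR u IN users FILTER u.city == "Tokyo" RETURN u')
```

A non-empty `full_scans` list for a large collection is a hint that a filter is not covered by any index.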

Advanced Features

// Geospatial queries
FOR location IN locations
  FILTER GEO_DISTANCE(location.coordinates, [139.6917, 35.6895]) < 1000
  RETURN {
    name: location.name,
    distance: GEO_DISTANCE(location.coordinates, [139.6917, 35.6895])
  }

// Graph algorithms (shortest path)
// SHORTEST_PATH emits one vertex (and the edge leading to it) per step
FOR v, e IN OUTBOUND SHORTEST_PATH "users/alice" TO "users/bob" friends
  RETURN {vertex: v.name, edge: e._key}

// Centrality analysis
FOR user IN users
  LET connections = LENGTH(
    FOR v IN 1..1 ANY user friends
      RETURN v
  )
  SORT connections DESC
  LIMIT 10
  RETURN {user: user.name, connections: connections}

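// Window functions (WINDOW operation, available since ArangoDB 3.8)
FOR sale IN sales
  SORT sale.date
  // Aggregate over all preceding rows plus the current row
  WINDOW { preceding: "unbounded", following: 0 }
  AGGREGATE running_total = SUM(sale.amount)
  RETURN {
    date: sale.date,
    amount: sale.amount,
    running_total: running_total
  }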

// Array operations
FOR user IN users
  FILTER LENGTH(user.skills) > 0
  RETURN {
    name: user.name,
    primary_skill: FIRST(user.skills),
    skill_count: LENGTH(user.skills),
    has_javascript: "JavaScript" IN user.skills
  }

// JSON processing
FOR document IN documents
  LET parsed = JSON_PARSE(document.json_data)
  FILTER parsed.type == "user_event"
  RETURN {
    id: document._key,
    event_type: parsed.event_type,
    timestamp: parsed.timestamp
  }
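
As a rough picture of what the geospatial distance filter above computes, the sketch below applies the haversine great-circle formula to `[longitude, latitude]` pairs and keeps the locations within 1000 meters of the center point. The Earth radius constant and the sample locations are assumptions, and ArangoDB's own distance calculation may differ slightly from this approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def great_circle_distance(a, b):
    """Haversine distance in meters between two [longitude, latitude] points."""
    lon1, lat1 = (math.radians(x) for x in a)
    lon2, lat2 = (math.radians(x) for x in b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def within_radius(locations, center, radius_m):
    """Names of the locations closer to center than radius_m meters."""
    return [
        loc["name"]
        for loc in locations
        if great_circle_distance(loc["coordinates"], center) < radius_m
    ]


center = [139.6917, 35.6895]  # same point as in the AQL query
sample_locations = [  # hypothetical documents
    {"name": "near", "coordinates": [139.6920, 35.6900]},
    {"name": "far", "coordinates": [135.5023, 34.6937]},
]
print(within_radius(sample_locations, center, 1000))
```

Note the coordinate order: longitude comes first, as in GeoJSON.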

Practical Examples

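// Recommendation system: products bought by friends of friends
// that the user has not purchased yet
FOR user IN users
  FILTER user._key == "current_user"
  // Keys of products the user already purchased
  LET owned = (
    FOR p IN 1..1 OUTBOUND user purchases
      RETURN p._key
  )
  FOR friend IN 2..3 OUTBOUND user friends
    FOR product IN 1..1 OUTBOUND friend purchases
      FILTER product._key NOT IN owned
      COLLECT productId = product._key WITH COUNT INTO score
      SORT score DESC
      LIMIT 5
      RETURN {
        product_id: productId,
        recommendation_score: score
      }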

// Fraud detection (anomaly patterns)
FOR transaction IN transactions
  FILTER transaction.amount > 100000
  AND transaction.timestamp > DATE_SUBTRACT(DATE_NOW(), 1, "day")
  LET user_transactions = (
    FOR t IN transactions
      FILTER t.user_id == transaction.user_id
      AND t.timestamp > DATE_SUBTRACT(DATE_NOW(), 1, "day")
      RETURN t
  )
  FILTER LENGTH(user_transactions) > 10
  RETURN {
    user_id: transaction.user_id,
    suspicious_transactions: LENGTH(user_transactions),
    total_amount: SUM(user_transactions[*].amount)
  }

// Social network analysis (friend suggestions via friends of friends)
FOR user IN users
  FILTER user._key == "target_user"
  // Keys of users who are already direct friends
  LET directFriends = (
    FOR f IN 1..1 OUTBOUND user friends
      RETURN f._key
  )
  FOR friend IN 2..2 OUTBOUND user friends
    FILTER friend._key != user._key
    AND friend._key NOT IN directFriends
    COLLECT suggestion = friend WITH COUNT INTO mutualFriends
    SORT mutualFriends DESC
    LIMIT 5
    RETURN {
      suggested_friend: suggestion.name,
      mutual_connections: mutualFriends
    }

// Real-time analytics dashboard
LET stats = {
  total_users: LENGTH(users),
  active_sessions: LENGTH(
    FOR session IN sessions
      FILTER session.last_activity > DATE_SUBTRACT(DATE_NOW(), 15, "minute")
      RETURN session
  ),
  recent_orders: LENGTH(
    FOR order IN orders
      FILTER order.created_at > DATE_SUBTRACT(DATE_NOW(), 1, "hour")
      RETURN order
  )
}
RETURN stats
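
The friend-suggestion logic from the social network analysis example can be checked on a small in-memory graph. The sketch below counts, for each friend of a friend, how many of the user's direct friends lead to them, while skipping the user and anyone who is already a direct friend. The edge list and the helper name `suggest_friends` are hypothetical.

```python
from collections import Counter


def suggest_friends(friend_edges, user, limit=5):
    """Rank friends of friends by the number of mutual connections."""
    outgoing = {}
    for source, target in friend_edges:
        outgoing.setdefault(source, set()).add(target)

    direct = outgoing.get(user, set())
    counts = Counter()
    for friend in direct:
        for candidate in outgoing.get(friend, set()):
            # Skip the user and people who are already direct friends
            if candidate != user and candidate not in direct:
                counts[candidate] += 1
    return counts.most_common(limit)


# Hypothetical directed "friends" edges as (from, to) pairs
friend_edges = [
    ("a", "b"), ("a", "c"),
    ("b", "d"), ("b", "a"), ("b", "c"),
    ("c", "d"), ("c", "e"),
]

suggestions = suggest_friends(friend_edges, "a")
print(suggestions)
```

Here `d` is reachable through both `b` and `c`, so it ranks above `e`, which is reachable only through `c`.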

Python Usage Example

from arango import ArangoClient

# Database connection
client = ArangoClient(hosts='http://localhost:8529')
sys_db = client.db('_system', username='root', password='password')

# Database creation
if not sys_db.has_database('example_db'):
    sys_db.create_database('example_db')

# Database connection
db = client.db('example_db', username='root', password='password')

# Collection creation
if not db.has_collection('users'):
    users = db.create_collection('users')
else:
    users = db.collection('users')

if not db.has_collection('friends'):
    friends = db.create_collection('friends', edge=True)
else:
    friends = db.collection('friends')

# Document operations
def create_user(name, age, email):
    user_data = {
        'name': name,
        'age': age,
        'email': email,
        'created_at': '2024-01-01'
    }
    return users.insert(user_data)

def find_users_by_city(city):
    aql = """
    FOR user IN users
        FILTER user.city == @city
        RETURN user
    """
    return list(db.aql.execute(aql, bind_vars={'city': city}))

def create_friendship(user1_id, user2_id):
    edge_data = {
        '_from': f'users/{user1_id}',
        '_to': f'users/{user2_id}',
        'type': 'friend',
        'created_at': '2024-01-01'
    }
    return friends.insert(edge_data)

def get_user_friends(user_id):
    aql = """
    FOR v, e, p IN 1..1 OUTBOUND @start_vertex friends
        RETURN {
            friend: v,
            relationship: e,
            path_length: LENGTH(p.edges)
        }
    """
    return list(db.aql.execute(aql, bind_vars={'start_vertex': f'users/{user_id}'}))

# Execution example
user1 = create_user("John Doe", 30, "john@example.com")    # placeholder address
user2 = create_user("Jane Smith", 25, "jane@example.com")  # placeholder address

friendship = create_friendship(user1['_key'], user2['_key'])
friends_list = get_user_friends(user1['_key'])

print(f"Created users: {user1['_key']}, {user2['_key']}")
print(f"Friends: {len(friends_list)}")

Configuration and Tuning

# arangod.conf key settings

[server]
endpoint = tcp://0.0.0.0:8529

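# [database] maximal-journal-size applied only to the MMFiles storage engine,
# which was removed in ArangoDB 3.7; RocksDB is tuned in the [rocksdb] section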

[cache]
size = 2147483648

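[javascript]
# Foxx applications are stored under app-path
# (startup-directory points at ArangoDB's own JavaScript files instead)
app-path = /var/lib/arangodb3-apps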

[cluster]
my-role = SINGLE

[rocksdb]
block-cache-size = 1073741824
total-write-buffer-size = 536870912

[log]
level = info
file = /var/log/arangodb3/arangod.log

[ssl]
keyfile = /etc/ssl/arangodb/server.pem

# Performance tuning
[query]
cache-mode = on
smart-joins = true

# Security settings
[server]
authentication = true
jwt-secret = your-secret-key

[ssl]
protocol = 5
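
The size settings above are raw byte counts, which are easy to mistype. A small helper, sketched here with hypothetical function names, converts between byte counts and binary units so the values can be verified before they go into arangod.conf.

```python
UNITS = {"KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def to_bytes(value):
    """Convert a string such as '512 MiB' into a byte count."""
    number, unit = value.split()
    return int(float(number) * UNITS[unit])


def describe(num_bytes):
    """Render a byte count in the largest binary unit that divides it evenly."""
    for unit in ("GiB", "MiB", "KiB"):
        if num_bytes % UNITS[unit] == 0:
            return f"{num_bytes // UNITS[unit]} {unit}"
    return f"{num_bytes} B"


# Values taken from the configuration above
settings = {
    "cache.size": 2147483648,
    "rocksdb.block-cache-size": 1073741824,
    "rocksdb.total-write-buffer-size": 536870912,
}
for name, value in settings.items():
    print(f"{name} = {value} ({describe(value)})")
```

With these values, the cache is 2 GiB, the RocksDB block cache is 1 GiB, and the write buffer budget is 512 MiB.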