DiskCache

Python · Library · Caching · Disk · Performance · Django

GitHub Overview

grantjenks/python-diskcache

Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.

Stars: 2,648
Watchers: 21
Forks: 155
Created: February 3, 2016
Language: Python
License: Other

Topics

cache, filesystem, key-value-store, persistence, python

Star History

Data as of: 10/22/2025, 08:07 AM

Library

DiskCache

Overview

DiskCache is a Python library that provides disk-based persistent caching. Unlike memory-based caches, all data is stored on disk and persists after process or system restarts.

Details

DiskCache (python-diskcache) is a high-performance disk-based caching library created by Grant Jenks. Designed as an alternative to memory-based caches like Redis and Memcached, it delivers strong performance, especially for read operations. It employs a hybrid storage architecture: cache metadata (keys, expiration times, tags) lives in a SQLite database, large values are stored as files on the filesystem, and small values are stored directly within the SQLite database. The FanoutCache class provides automatic sharding to improve write performance in high-concurrency environments. The library also offers a Django cache backend (DjangoCache) and persistent data structures such as Deque and Index, making it applicable across a wide range of use cases, from web applications to data science. It addresses performance issues found in earlier systems like Beaker, in particular the "double-fetching" problem, where values were frequently retrieved from the cache multiple times.

Pros and Cons

Pros

  • Persistence: Data survives process restarts and system failures
  • High Read Performance: Superior read speeds compared to Memcached and Redis in single-process scenarios
  • Scalability: FanoutCache enables improved concurrent write performance
  • Pure Python: Easy installation and deployment with no external dependencies
  • Django Integration: Can be used as a drop-in Django cache backend
  • Rich Features: Includes memoize decorators, distributed locks, function throttling, etc.
  • Flexible Configuration: Supports multiple eviction policies (LRU, LFU, etc.)

Cons

  • Write Performance: Slower writes than memory-based caches due to disk persistence
  • NFS Limitations: Poor performance on network filesystems like NFS due to SQLite constraints
  • Disk Usage: Requires disk space monitoring when caching large amounts of data
  • Async Support: No native async support due to SQLite module limitations
  • Write Contention: High latency possible during concurrent writes without sharding


Code Examples

Basic Usage

import diskcache as dc

# Create cache instance
cache = dc.Cache('tmp')

# Set and get values
cache['key'] = 'value'
value = cache['key']
print(value)  # 'value'

# Check key existence
if 'key' in cache:
    print('Key exists')

# Delete key
del cache['key']

Advanced Configuration

import diskcache as dc

# Create cache with configuration
cache = dc.Cache(
    directory='my_cache',
    size_limit=int(1e9),  # 1 GB size limit
    timeout=60.0,  # SQLite connection timeout in seconds
    eviction_policy='least-recently-used',  # LRU eviction policy
    tag_index=True,  # index tags so evicting by tag is efficient
)

# Set data with expiration
cache.set('temp_key', 'temporary_value', expire=30)  # Expires in 30 seconds

# Set data with tags
cache.set('tagged_key', 'data', tag='group1')

# Evict by tag
cache.evict('group1')

FanoutCache for Concurrency

import diskcache as dc

# FanoutCache with 4 shards
cache = dc.FanoutCache(
    shards=4,
    timeout=1.0,  # SQLite connection timeout in seconds
    directory='fanout_cache'
)

# Operations optimized for concurrent writes
cache['key1'] = 'value1'
cache['key2'] = 'value2'

# Using memoize decorator
@cache.memoize(expire=300, typed=True)
def expensive_function(x, y):
    # Heavy computation
    import time
    time.sleep(1)  # Simulate processing time
    return x * y + x ** y

# First call executes computation
result1 = expensive_function(2, 3)
# Subsequent calls use cached result
result2 = expensive_function(2, 3)

Django Integration

# settings.py
CACHES = {
    'default': {
        'BACKEND': 'diskcache.DjangoCache',
        'LOCATION': '/var/tmp/django_cache',
        'TIMEOUT': 300,
        'OPTIONS': {
            'size_limit': 2 ** 30,  # 1GB
            'cull_limit': 10,
        },
    }
}

# Usage in Django views
from django.core.cache import cache
from django.shortcuts import render
from django.views.decorators.cache import cache_page

@cache_page(60 * 15)  # Cache for 15 minutes
def my_view(request):
    # Expensive database operation
    expensive_data = get_expensive_data()
    
    # Manual cache operations
    cache.set('user_data', expensive_data, 300)
    cached_data = cache.get('user_data')
    
    return render(request, 'template.html', {'data': cached_data})

Persistent Data Structures

import diskcache as dc

# Persistent Queue (Deque)
urls = dc.Deque('web_crawler_urls')
urls.append('https://example.com')
url = urls.popleft() if urls else None

# Persistent Index (Dict)
results = dc.Index('crawl_results')
results['https://example.com'] = 'page content'

# Transaction processing (atomic group of operations on a Cache)
cache = dc.Cache('tmp')
with cache.transact():
    total = cache.incr('total', 123.45)
    count = cache.incr('count')
    
# Enable statistics
cache.stats(enable=True)
# Check statistics after processing
hits, misses = cache.stats(enable=False, reset=True)
print(f'Hit rate: {hits / (hits + misses) * 100:.2f}%')

Asynchronous Operations

import asyncio
import diskcache as dc

cache = dc.Cache()

async def set_async(key, value):
    loop = asyncio.get_running_loop()
    # Run the blocking cache call in the default thread-pool executor
    return await loop.run_in_executor(None, cache.set, key, value)

async def get_async(key):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, cache.get, key)

# Usage example
async def main():
    await set_async('async_key', 'async_value')
    value = await get_async('async_key')
    print(f'Retrieved value: {value}')

# Execute
asyncio.run(main())