DiskCache
grantjenks/python-diskcache
Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.
Overview
DiskCache is a Python library that provides disk-based persistent caching. Unlike memory-based caches, all data is stored on disk and persists after process or system restarts.
Details
DiskCache (python-diskcache) is a high-performance disk-based caching library developed by Grant Jenks. Designed as an alternative to memory-based caches like Redis and Memcached, it delivers strong performance, especially for read operations. It employs a hybrid storage architecture: cache metadata (keys, expiration times, tags) and small values are stored in a SQLite database, while large values are stored as files on the filesystem. The FanoutCache class provides automatic sharding to improve write performance in high-concurrency environments. The library also ships a Django cache backend (DjangoCache) and persistent data structures such as Deque and Index, making it applicable to a wide range of use cases, from web applications to data science.
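The hybrid storage split described above can be sketched in plain Python. This is an illustrative model, not DiskCache's actual implementation; the table schema and the 32 KB threshold (mirroring DiskCache's default `disk_min_file_size` setting) are assumptions for the sketch:

```python
import sqlite3
import tempfile
from pathlib import Path

# Illustrative threshold; DiskCache's default disk_min_file_size is 32 KB
MIN_FILE_SIZE = 32 * 1024

def store(db: sqlite3.Connection, directory: Path, key: str, value: bytes) -> None:
    """Route a value the way the hybrid architecture does:
    small values inline in the SQLite row, large values as separate files."""
    if len(value) < MIN_FILE_SIZE:
        # Small value: stored directly in the database
        db.execute(
            'REPLACE INTO cache (key, value, filename) VALUES (?, ?, NULL)',
            (key, value),
        )
    else:
        # Large value: written to its own file, only the filename is indexed
        filename = f'{abs(hash(key))}.val'
        (directory / filename).write_bytes(value)
        db.execute(
            'REPLACE INTO cache (key, value, filename) VALUES (?, NULL, ?)',
            (key, filename),
        )
    db.commit()

directory = Path(tempfile.mkdtemp())
db = sqlite3.connect(directory / 'cache.db')
db.execute('CREATE TABLE cache (key TEXT PRIMARY KEY, value BLOB, filename TEXT)')

store(db, directory, 'small', b'x' * 100)      # inline in SQLite
store(db, directory, 'large', b'x' * 100_000)  # spills to a file
```

Either way, a single SQLite lookup finds the entry; only large values incur a second filesystem read.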
Pros and Cons
Pros
- Persistence: Data survives process restarts and system failures
- High Read Performance: Superior read speeds compared to Memcached and Redis in single-process scenarios
- Scalability: FanoutCache enables improved concurrent write performance
- Pure Python: Easy installation and deployment with no external dependencies
- Django Integration: Can be used as a drop-in Django cache backend
- Rich Features: Includes memoize decorators, distributed locks, function throttling, etc.
- Flexible Configuration: Supports multiple eviction policies (LRU, LFU, etc.)
Cons
- Write Performance: Slower writes than memory-based caches due to disk persistence
- NFS Limitations: Poor performance on network filesystems like NFS due to SQLite constraints
- Disk Usage: Requires disk space monitoring when caching large amounts of data
- Async Support: No native async support due to SQLite module limitations
- Write Contention: High latency possible during concurrent writes without sharding
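The sharding that relieves this write contention can be sketched in plain Python. This models the general FanoutCache idea (hash the key, pick a shard) rather than DiskCache's exact internals; the hash scheme and in-memory dict shards are assumptions of the sketch:

```python
from hashlib import md5

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    """Map a key to one of NUM_SHARDS independent caches, so concurrent
    writers contend only when they hit the same shard."""
    digest = md5(key.encode()).digest()
    return int.from_bytes(digest[:8], 'big') % NUM_SHARDS

# In FanoutCache, each shard is its own Cache with its own SQLite file;
# plain dicts stand in for them here
shards = [dict() for _ in range(NUM_SHARDS)]

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key, default=None):
    return shards[shard_for(key)].get(key, default)

put('key1', 'value1')
put('key2', 'value2')
```

Since every key deterministically maps to one shard, reads and writes for a given key always hit the same database file, while unrelated writers spread across the others.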
Key Links
- DiskCache Official Documentation
- GitHub Repository
- PyPI Package
- Benchmark Results
- Django Integration Guide
Code Examples
Basic Usage
import diskcache as dc

# Create cache instance
cache = dc.Cache('tmp')

# Set and get values
cache['key'] = 'value'
value = cache['key']
print(value)  # 'value'

# Check key existence
if 'key' in cache:
    print('Key exists')

# Delete key
del cache['key']
Advanced Configuration
import diskcache as dc

# Create cache with configuration
cache = dc.Cache(
    directory='my_cache',
    size_limit=int(1e9),                    # 1GB size limit
    timeout=60.0,                           # SQLite connection timeout (seconds)
    eviction_policy='least-recently-used',  # LRU eviction policy
)

# Set data with expiration
cache.set('temp_key', 'temporary_value', expire=30)  # Expires in 30 seconds

# Set data with tags
cache.set('tagged_key', 'data', tag='group1')

# Evict all items with a matching tag
cache.evict('group1')
FanoutCache for Concurrency
import diskcache as dc

# FanoutCache with 4 shards
cache = dc.FanoutCache(
    shards=4,
    timeout=1.0,
    directory='fanout_cache',
)

# Writes are spread across shards to reduce contention
cache['key1'] = 'value1'
cache['key2'] = 'value2'

# Using the memoize decorator
@cache.memoize(expire=300, typed=True)
def expensive_function(x, y):
    # Simulate a heavy computation
    import time
    time.sleep(1)
    return x * y + x ** y

# First call executes the computation
result1 = expensive_function(2, 3)
# Subsequent calls return the cached result
result2 = expensive_function(2, 3)
Django Integration
# settings.py
CACHES = {
    'default': {
        'BACKEND': 'diskcache.DjangoCache',
        'LOCATION': '/var/tmp/django_cache',
        'TIMEOUT': 300,
        'OPTIONS': {
            'size_limit': 2 ** 30,  # 1GB
            'cull_limit': 10,
        },
    }
}

# Usage in Django views
from django.core.cache import cache
from django.shortcuts import render
from django.views.decorators.cache import cache_page

@cache_page(60 * 15)  # Cache the full response for 15 minutes
def my_view(request):
    # Expensive database operation
    expensive_data = get_expensive_data()
    # Manual cache operations
    cache.set('user_data', expensive_data, 300)
    cached_data = cache.get('user_data')
    return render(request, 'template.html', {'data': cached_data})
Persistent Data Structures
import diskcache as dc

# Persistent queue (Deque); the first positional argument is an iterable,
# so the directory must be passed by keyword
urls = dc.Deque(directory='web_crawler_urls')
urls.append('https://example.com')
url = urls.popleft() if urls else None

# Persistent mapping (Index)
results = dc.Index('crawl_results')
results['https://example.com'] = 'page content'

# Transaction processing: group operations atomically
cache = dc.Cache('tmp')
with cache.transact():
    total = cache.incr('total', 123.45)
    count = cache.incr('count')

# Enable statistics collection
cache.stats(enable=True)
cache.get('count')  # this lookup is counted as a hit

# Read and reset statistics
hits, misses = cache.stats(enable=False, reset=True)
print(f'Hit rate: {hits / (hits + misses) * 100:.2f}%')
Asynchronous Operations
import asyncio
import diskcache as dc

cache = dc.Cache()

async def set_async(key, value):
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread pool executor
    return await loop.run_in_executor(None, cache.set, key, value)

async def get_async(key):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, cache.get, key)

# Usage example
async def main():
    await set_async('async_key', 'async_value')
    value = await get_async('async_key')
    print(f'Retrieved value: {value}')

# Execute
asyncio.run(main())