dill
Serialization Library
Overview
dill is a serialization library that extends Python's standard pickle module. It can serialize many Python objects that pickle cannot handle, such as lambda functions, nested functions, closures, and classes defined in an interactive session. It can also save and restore an entire interpreter session, which makes it useful in scientific computing and machine learning workflows.
Details
dill is a library that significantly extends the functionality of Python's pickle module. While providing the same interface as pickle, it enables serialization of a much wider variety of Python objects.
Key Features:
- Broad Object Support: Serializes lambda functions, nested functions, closures, class definitions, and other objects that pickle cannot handle (generators, frames, and tracebacks remain unsupported)
- Definition Storage: Saves object definitions along with the objects, making restoration in different environments easier
- Interpreter Session Saving: Can save and restore the entire Python interpreter state
- Pickle Compatible: Provides the same API as pickle and can be used as a drop-in replacement
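The difference from pickle is easiest to see side by side. The following is a minimal sketch, assuming dill is installed; the names `square`, `pickle_ok`, and `restored` are illustrative only.

```python
import pickle

import dill

# A lambda has no importable name, so pickle cannot store it by reference.
square = lambda x: x ** 2

try:
    pickle.dumps(square)
    pickle_ok = True
except (pickle.PicklingError, AttributeError):
    pickle_ok = False

# dill exposes the same dumps/loads pair and serializes the function itself.
restored = dill.loads(dill.dumps(square))

print(pickle_ok)    # False
print(restored(6))  # 36
```

Because the function names match, existing code that calls `pickle.dumps` and `pickle.loads` can usually switch by importing dill under the same name (`import dill as pickle`).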
Technical Details:
- byref option: Save modules and other objects by reference (pickle-compatible behavior)
- recurse option: Recursively trace objects in the global dictionary
- fmode option: Control how file handles and file contents are saved
- protocol option: Specify the pickle protocol level
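These options can be passed per call or, as far as the `dill.settings` dictionary is concerned, set once as process-wide defaults. The sketch below assumes dill is installed; `make_scaler`, `scale`, and the variable names are hypothetical and exist only for illustration.

```python
import dill


def make_scaler(factor):
    """Return a closure that multiplies its argument by factor."""
    return lambda x: x * factor


scale = make_scaler(3)

# Per-call configuration: the keyword arguments apply to this call only.
payload = dill.dumps(scale, protocol=dill.HIGHEST_PROTOCOL, recurse=True)
print(dill.loads(payload)(4))  # 12

# Process-wide default: dill.settings is a plain dict consulted when an
# option is not passed explicitly. Restore the old value afterwards.
previous = dill.settings['recurse']
dill.settings['recurse'] = True
try:
    payload_default = dill.dumps(scale)
finally:
    dill.settings['recurse'] = previous

print(dill.loads(payload_default)(4))  # 12
```

Passing options per call keeps the behavior visible at the call site; changing `dill.settings` affects every later `dump`/`dumps` in the process, so restoring the previous value is a reasonable precaution.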
Pros and Cons
Pros
- Serialization of a wide range of objects beyond pickle's limitations
- Saves function and class definitions, reducing dependency issues during restoration
- Work continuity through saving entire interpreter sessions
- Low learning curve due to the same API as pickle
- Powerful tool in scientific computing and machine learning workflows
Cons
- Security risks (like pickle, data from untrusted sources is dangerous)
- Tends to produce larger files than pickle, because definitions are stored along with the data
- Python-specific with no interoperability with other languages
- Potential version compatibility issues (data saved under one Python or dill version may not load under another)
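One common way to reduce the security risk is to sign serialized data and verify the signature before loading it. The following is a minimal sketch using the standard library's hmac module, not a feature of dill itself; `SECRET_KEY`, `signed_dumps`, and `verified_loads` are hypothetical names, and the key shown is a placeholder.

```python
import hashlib
import hmac

import dill

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder key, illustration only
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def signed_dumps(obj):
    """Serialize obj with dill and prepend an HMAC-SHA256 tag."""
    payload = dill.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload


def verified_loads(blob):
    """Verify the tag first; only unpickle data that matches it."""
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return dill.loads(payload)


blob = signed_dumps({'key': 'value', 'number': 42})
print(verified_loads(blob))  # {'key': 'value', 'number': 42}

# Flip one bit in the payload to simulate tampering.
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])
try:
    verified_loads(tampered)
except ValueError as exc:
    print(exc)  # signature mismatch: refusing to unpickle
```

Signing only proves that the data came from someone holding the key; it does not make unpickling data from an untrusted author safe.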
References
- Official GitHub: https://github.com/uqfoundation/dill
- PyPI Page: https://pypi.org/project/dill/
- Documentation: https://dill.readthedocs.io/
Code Examples
Basic Usage
import dill
# Serialize objects
data = {'key': 'value', 'number': 42}
serialized = dill.dumps(data)
# Deserialize
restored = dill.loads(serialized)
print(restored) # {'key': 'value', 'number': 42}
# Save to file
with open('data.pkl', 'wb') as f:
    dill.dump(data, f)
# Load from file
with open('data.pkl', 'rb') as f:
    loaded_data = dill.load(f)
Serializing Lambda Functions
import dill
# Define lambda functions
square = lambda x: x ** 2
add = lambda x, y: x + y
# Serialize
serialized_square = dill.dumps(square)
serialized_add = dill.dumps(add)
# Deserialize and use
restored_square = dill.loads(serialized_square)
restored_add = dill.loads(serialized_add)
print(restored_square(5)) # 25
print(restored_add(3, 4)) # 7
Nested Functions and Closures
import dill
def outer_function(x):
    def inner_function(y):
        return x + y
    return inner_function
# Create closure
closure = outer_function(10)
# Serialize and deserialize
serialized = dill.dumps(closure)
restored = dill.loads(serialized)
print(restored(5)) # 15
Serializing Classes and Instances
import dill
class Calculator:
    def __init__(self, name):
        self.name = name
        self.history = []

    def add(self, a, b):
        result = a + b
        self.history.append(f"{a} + {b} = {result}")
        return result

    def get_history(self):
        return self.history
# Create and use instance
calc = Calculator("My Calculator")
calc.add(5, 3)
calc.add(10, 20)
# Serialize
serialized = dill.dumps(calc)
# Deserialize elsewhere (class definition is also restored)
restored_calc = dill.loads(serialized)
print(restored_calc.name) # My Calculator
print(restored_calc.get_history()) # ['5 + 3 = 8', '10 + 20 = 30']
Saving Interpreter Sessions
import dill
# Working data
x = 42
y = "Hello, World!"
data = [1, 2, 3, 4, 5]
def process_data(lst):
    return [item * 2 for item in lst]
result = process_data(data)
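# Save entire session
# (dill 0.3.6 and later also provide dump_module/load_module;
# dump_session/load_session are kept for backward compatibility)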
dill.dump_session('session.pkl')
# Later, restore in another session
# (Run in a new Python session)
import dill
dill.load_session('session.pkl')
# Saved variables and functions are available
print(x) # 42
print(y) # Hello, World!
print(result) # [2, 4, 6, 8, 10]
print(process_data([10, 20, 30])) # [20, 40, 60]
Advanced Configuration Options
import dill
# Complex object
class DataProcessor:
    def __init__(self):
        self.transform = lambda x: x ** 2
        self.data = []

    def process(self, items):
        return [self.transform(item) for item in items]
processor = DataProcessor()
# Serialize with various options
# protocol: pickle protocol version
# byref: True saves by reference (pickle-compatible behavior)
# recurse: True traces the globals a function refers to and saves only those
serialized = dill.dumps(
    processor,
    protocol=dill.HIGHEST_PROTOCOL,
    byref=False,
    recurse=True
)
# Saving file handles and contents
with open('example.txt', 'w') as f:
    f.write("Hello, dill!")
    f.seek(0)
    # fmode options
    # - dill.HANDLE_FMODE: handle only
    # - dill.CONTENTS_FMODE: file contents
    # - dill.FILE_FMODE: contents and handle
    file_data = dill.dumps(f, fmode=dill.FILE_FMODE)
# Restore
restored_file = dill.loads(file_data)
# Close the restored handle when finished with it
restored_file.close()