strace
Debugging Tool
Overview
strace is a system call tracer for Linux. It records the system calls a process makes and the signals it receives, showing how a program interacts with the kernel.
Details
strace is a diagnostic, instructional, and debugging tool that traces system calls and signals, originally developed for SunOS and later ported to Linux. First appearing in 1991, it has become an indispensable tool for system administrators and developers troubleshooting programs without access to source code. strace works by using the ptrace system call to attach to a target process, allowing it to intercept and log every interaction between the process and the Linux kernel.
The fundamental principle of strace is that every user-space program must communicate with the kernel through system calls to perform operations like file I/O, network communication, memory allocation, and process management. By intercepting these calls, strace provides a complete picture of what a program is actually doing at the system level. This makes it invaluable for diagnosing issues like "Why won't this program start?", "What files is it trying to access?", or "Why is it running slowly?"
strace operates non-intrusively, requiring no modification to the target program. It can attach to running processes or start new ones under its control. The tool displays system call names, arguments, return values, and error codes, making it possible to understand program behavior without examining source code. It supports filtering specific system calls, saving output to files, and generating statistical summaries of system call usage, helping identify performance bottlenecks and resource usage patterns.
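To make the output format concrete, here is a minimal Python sketch that splits one line of strace output into the call name, arguments, return value, and errno symbol. The `parse_line` helper and `Syscall` type are hypothetical names introduced for illustration. The regular expression assumes the common single-line form `name(args) = result`, and does not handle `<unfinished ...>` or `resumed` lines produced when several processes are traced.

```python
import re
from typing import NamedTuple, Optional

# Common single-line form:  name(args) = result [ERRNO (description)]
# Hex is tried before decimal so that "0x55de..." is not cut off at the leading "0".
LINE_RE = re.compile(
    r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>0x[0-9a-f]+|-?\d+|\?)"
    r"(?:\s+(?P<errno>E[A-Z0-9]+))?"
)


class Syscall(NamedTuple):
    name: str
    args: str
    ret: str
    errno: Optional[str]


def parse_line(line: str) -> Optional[Syscall]:
    """Return the parsed call, or None for lines that are not plain syscalls."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Syscall(m["name"], m["args"], m["ret"], m["errno"])
```

Lines such as `+++ exited with 0 +++` or signal notifications do not match and come back as `None`, which makes it easy to skip them when post-processing a trace.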
While strace is extremely powerful for debugging and learning how programs work, it introduces significant overhead: every traced system call stops the process twice (on entry and on exit) so that strace can inspect it. Syscall-heavy workloads can run 10-50 times slower, while CPU-bound code that rarely enters the kernel is affected far less. This makes it unsuitable for production performance analysis, where newer tools like perf trace or eBPF-based alternatives are preferred. However, for understanding program behavior, diagnosing configuration issues, and learning about Linux system programming, strace remains one of the most important tools in a developer's arsenal.
Pros and Cons
Pros
- No Source Required: Works with any binary without source code
- Non-intrusive: No need to modify or recompile programs
- Comprehensive Tracing: Captures all system calls and signals
- Real-time Monitoring: Shows system calls as they happen
- Process Attachment: Can attach to running processes
- Detailed Information: Shows arguments, return values, and errno
- Educational Value: Excellent for learning system programming
- Wide Availability: Pre-installed or easily available on most Linux distributions
Cons
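- Performance Impact: Slows down traced processes significantly (roughly 10-50x for syscall-heavy workloads)
- Linux Specific: Current releases support only Linux; other Unix-like systems ship their own tracers such as truss, ktrace, or dtruss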
- Output Volume: Can generate overwhelming amounts of output
- Interpretation Needed: Requires knowledge of system calls
- Not for Production: Too slow for production performance analysis
- Limited to System Calls: Doesn't show internal program logic
- No Real-time Systems: Unsuitable for debugging real-time applications
Key Links
- strace Manual Page
- strace GitHub Repository
- strace Tutorial
- System Call Reference
- Linux System Call Table
- strace Examples
Usage Examples
Basic System Call Tracing
# Simple program to trace
# hello.c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
int main(void) {
    printf("Hello, strace!\n");
    // File operations
    int fd = open("test.txt", O_CREAT | O_WRONLY, 0644);
    if (fd != -1) {
        write(fd, "Testing strace\n", 15);
        close(fd);
    }
    return 0;
}
# Compile and trace
gcc -o hello hello.c
# Basic strace usage
strace ./hello
# Output shows system calls:
# execve("./hello", ["./hello"], 0x7ffe3f6a8230 /* 48 vars */) = 0
# brk(NULL) = 0x55de27e97000
# arch_prctl(0x3001 /* ARCH_??? */, 0x7ffee377e6b0) = -1 EINVAL
# access("/etc/ld.so.preload", R_OK) = -1 ENOENT
# openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
# ...
# write(1, "Hello, strace!\n", 15) = 15
# openat(AT_FDCWD, "test.txt", O_WRONLY|O_CREAT, 0644) = 3
# write(3, "Testing strace\n", 15) = 15
# close(3) = 0
# exit_group(0) = ?
# Save output to file
strace -o strace_output.txt ./hello
# Follow child processes
strace -f ./hello
# Show timestamps
strace -t ./hello # Time of day
strace -r ./hello # Relative time between calls
strace -T ./hello # Time spent in each call
Filtering Specific System Calls
# Trace only file operations
strace -e trace=file ls /tmp
# Traces: open, stat, chmod, unlink, etc.
# Trace only network operations
strace -e trace=network curl https://example.com
# Traces: socket, connect, send, recv, etc.
# Trace only process operations
strace -e trace=process bash -c "echo hello | grep h"
# Traces: fork, execve, wait4, exit, etc.
# Trace only memory operations
strace -e trace=memory ./memory_intensive_app
# Traces: mmap, munmap, brk, etc.
# Trace specific system calls (modern glibc opens files with openat, not open)
strace -e trace=openat,read,write,close cat /etc/passwd
# Exclude specific calls
strace -e trace=\!futex,poll ./application
# Multiple filters combined
strace -e trace=open,openat,read,write -e signal=none ./hello
# Common filter expressions:
# -e trace=file # File operations
# -e trace=process # Process management
# -e trace=network # Network operations
# -e trace=signal # Signal handling
# -e trace=ipc # IPC operations
# -e trace=desc # File descriptor operations
# -e trace=memory # Memory mapping
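When strace is driven from a script, it can help to assemble the filter options programmatically. The following is a minimal sketch, not part of strace itself: `strace_argv` is a hypothetical helper that only builds an argument list from the `-e trace=`, `-f`, and `-o` options shown above and executes nothing.

```python
import shlex


def strace_argv(command, trace=None, exclude=None, follow_forks=False, output=None):
    """Build an strace command line as an argv list (nothing is executed).

    trace   -- syscall names or classes to include, e.g. ["file"] or ["openat", "read"]
    exclude -- syscall names to exclude, emitted as trace=!a,b
    """
    if trace and exclude:
        raise ValueError("pass either trace or exclude, not both")
    argv = ["strace"]
    if follow_forks:
        argv.append("-f")
    if output:
        argv += ["-o", output]
    if trace:
        argv += ["-e", "trace=" + ",".join(trace)]
    elif exclude:
        argv += ["-e", "trace=!" + ",".join(exclude)]
    return argv + list(command)


def as_shell(argv):
    """Render an argv list as a copy-pasteable, safely quoted shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Passing the list straight to `subprocess.run` avoids the shell entirely, so the `!` in an exclusion filter needs no backslash escape.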
Attaching to Running Processes
# Start a long-running process
# server.py
import os
import socket
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('localhost', 8080))
server.listen(5)
print(f"Server listening on port 8080, PID: {os.getpid()}")
while True:
    # Echo one message per connection, then close it
    client, addr = server.accept()
    data = client.recv(1024)
    client.send(b"Echo: " + data)
    client.close()
# Run the server
python3 server.py &
SERVER_PID=$!
# Attach strace to running process
sudo strace -p $SERVER_PID
# Attach and follow children
sudo strace -p $SERVER_PID -f
# Attach with specific filters
sudo strace -p $SERVER_PID -e trace=network
# Detach with Ctrl+C
# The process continues running after detachment
# Trace a process tree (quote the substitution so strace receives every matching PID as one list)
sudo strace -f -p "$(pgrep -f apache2)"
# Trace with output to file
sudo strace -p $SERVER_PID -o server_trace.log
# Attach to multiple PIDs
sudo strace -p $PID1 -p $PID2 -p $PID3
Debugging File Access Issues
# Program with file access issues
# file_access.c
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
int main(void) {
    // Try to open configuration file
    int fd = open("/etc/myapp/config.conf", O_RDONLY);
    if (fd == -1) {
        printf("Failed to open config: %s\n", strerror(errno));
    }
    // Try to access user file
    fd = open("~/.myapp/data.txt", O_RDONLY);
    if (fd == -1) {
        printf("Failed to open user data: %s\n", strerror(errno));
    }
    return 0;
}
# Trace to see actual file paths and errors
strace ./file_access 2>&1 | grep -E "open|access"
# Output reveals:
# openat(AT_FDCWD, "/etc/myapp/config.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
# openat(AT_FDCWD, "~/.myapp/data.txt", O_RDONLY) = -1 ENOENT (No such file or directory)
# Note: ~ is not expanded - literal path used!
# Check library loading issues
strace -e trace=open,openat,access ./application 2>&1 | grep -v ENOENT
# Find where program looks for files
strace -e trace=stat,lstat,access,open,openat ./application 2>&1 | grep -E "(conf|config|rc)"
# Debug permission issues
strace -e trace=open,openat,access -o perms.log ./application
grep "EACCES\|EPERM" perms.log
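The grep pipelines above can also be expressed as a small post-processing step. This Python sketch groups failed path-based calls by errno so that missing files (ENOENT) and permission problems (EACCES, EPERM) can be reviewed separately. `failed_paths` is a hypothetical helper. It assumes the first quoted argument is the path, which holds for calls such as open, openat, access, and stat, and it tolerates the optional PID prefix that `-f` adds.

```python
import re
from collections import defaultdict

FAILED_PATH_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'           # optional PID prefix from -f
    r'(?P<name>\w+)\('
    r'(?:AT_FDCWD, )?"(?P<path>[^"]*)"'        # first quoted argument = path
    r'.*\)\s+=\s+-1\s+(?P<errno>E[A-Z0-9]+)'   # failed call and errno symbol
)


def failed_paths(trace_text):
    """Return {errno: sorted list of paths} for failed path-based calls."""
    by_errno = defaultdict(set)
    for line in trace_text.splitlines():
        m = FAILED_PATH_RE.match(line)
        if m:
            by_errno[m["errno"]].add(m["path"])
    return {errno: sorted(paths) for errno, paths in by_errno.items()}
```

Run against a log written with `strace -o perms.log`, the result for `"EACCES"` lists every path the program was denied, without the noise of the routine ENOENT probes made during library lookup.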
Performance Analysis with strace
# Generate system call statistics
strace -c ls -la /usr/bin > /dev/null
# Output:
# % time seconds usecs/call calls errors syscall
# ------ ----------- ----------- --------- --------- ----------------
# 30.00 0.000120 2 48 mmap
# 20.00 0.000080 1 53 close
# 15.00 0.000060 1 45 openat
# 10.00 0.000040 0 97 lstat
# 8.00 0.000032 0 106 getdents64
# ...
# Trace with timing information
strace -T -o timing.log ./application
# Each line shows time spent: <syscall> = result <time>
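# Find slow system calls (the duration is the last <...> field on each line)
strace -T ./application 2>&1 | awk -F'[<>]' 'NF >= 3 && $(NF-1) ~ /^[0-9.]+$/ && $(NF-1) > 0.001'
# Analyze I/O patterns (the syscalls are named pread64/pwrite64, not pread/pwrite)
strace -e trace=read,write,pread64,pwrite64 -T ./application 2>&1 | \
awk -F'[<>]' 'NF >= 3 && $(NF-1) ~ /^[0-9.]+$/ {sum += $(NF-1); count++} END {if (count) print "Avg I/O time:", sum/count}'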
# Create syscall histogram
strace -c -S time ./application
# Sorts by time spent in each syscall
# Trace with syscall counts
strace -C ./application
# Shows both regular output and summary
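The `-T` output can be aggregated per system call when the awk one-liners become unwieldy. The following Python sketch is one possible approach, with hypothetical helper names `time_per_syscall` and `slow_calls`. It assumes each completed call ends with its duration in angle brackets, as `-T` prints it, and ignores every other line.

```python
import re
from collections import defaultdict

# A completed call traced with -T ends in "<seconds>", e.g. "... = 4 <0.000020>"
TIMED_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?P<name>\w+)\(.*<(?P<secs>\d+\.\d+)>\s*$"
)


def time_per_syscall(trace_text):
    """Return [(name, calls, total_seconds)] sorted by total time, slowest first."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in trace_text.splitlines():
        m = TIMED_RE.match(line)
        if m:
            entry = totals[m["name"]]
            entry[0] += 1
            entry[1] += float(m["secs"])
    rows = [(name, calls, secs) for name, (calls, secs) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def slow_calls(trace_text, threshold=0.001):
    """Return the raw trace lines whose duration exceeds threshold seconds."""
    slow = []
    for line in trace_text.splitlines():
        m = TIMED_RE.match(line)
        if m and float(m["secs"]) > threshold:
            slow.append(line)
    return slow
```

Unlike `strace -c`, which reports only totals, this keeps the original lines available, so a slow call can be traced back to its arguments.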
Network Debugging with strace
# Debug network connections
# network_client.c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <unistd.h>
int main(void) {
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in server;
    server.sin_addr.s_addr = inet_addr("8.8.8.8");
    server.sin_family = AF_INET;
    server.sin_port = htons(80);
    if (connect(sock, (struct sockaddr *)&server, sizeof(server)) < 0) {
        perror("connect failed");
        return 1;
    }
    char *message = "GET / HTTP/1.0\r\n\r\n";
    send(sock, message, strlen(message), 0);
    char response[2000];
    recv(sock, response, 2000, 0);
    close(sock);
    return 0;
}
# Trace network operations
strace -e trace=network ./network_client
# Detailed network debugging
strace -e trace=socket,connect,bind,listen,accept,send,recv,sendto,recvfrom \
-e read=all -e write=all ./network_client 2>&1
# Show full data transferred
strace -e trace=network -s 1024 -x ./network_client
# Debug DNS resolution
strace -e trace=network,openat,read dig google.com
# Trace HTTP client
strace -e trace=network -f curl https://example.com 2>&1 | \
grep -E "socket|connect|send|recv"
# Monitor connection attempts
strace -e connect -f ./application 2>&1 | \
awk -F'[{}]' '/connect/ {print $2}'
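For connection monitoring, the address and port can be pulled out of each connect line directly. This is a minimal Python sketch covering IPv4 only; `ipv4_connects` is a hypothetical helper. It assumes the `sa_family=AF_INET, sin_port=htons(...), sin_addr=inet_addr("...")` layout that strace prints for IPv4 socket addresses, and skips AF_UNIX and AF_INET6 lines.

```python
import re

INET_CONNECT_RE = re.compile(
    r'connect\(\d+, \{sa_family=AF_INET, sin_port=htons\((?P<port>\d+)\), '
    r'sin_addr=inet_addr\("(?P<addr>[^"]+)"\)\}, \d+\)'
    r'\s+=\s+(?P<ret>-?\d+)(?:\s+(?P<errno>E[A-Z0-9]+))?'
)


def ipv4_connects(trace_text):
    """Return [(address, port, status)] where status is "OK" or an errno symbol."""
    found = []
    for line in trace_text.splitlines():
        m = INET_CONNECT_RE.search(line)
        if m:
            found.append((m["addr"], int(m["port"]), m["errno"] or "OK"))
    return found
```

Non-blocking sockets commonly report EINPROGRESS here even when the connection later succeeds, so that status is not necessarily a failure.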
Advanced Debugging Techniques
#!/bin/bash
# advanced_strace.sh
# 1. Debug program startup issues
debug_startup() {
    local program="$1"
    echo "=== Debugging startup of $program ==="
    # Check library dependencies
    strace -e trace=open,openat "$program" 2>&1 | \
        grep -E "\.so|\.conf" | grep -v ENOENT
    # Check the environment: variables are not read through system calls,
    # so print the environment passed to execve instead (-v disables abbreviation)
    strace -v -s 256 -e trace=execve "$program" 2>&1 | \
        grep -oE '"(PATH|HOME|USER)=[^"]*"'
}
# 2. Monitor file modifications
monitor_file_changes() {
    local program="$1"
    strace -e trace=open,openat,creat,unlink,rename \
        -e signal=none "$program" 2>&1 | \
        grep -v "O_RDONLY"
}
# 3. Trace signal handling
trace_signals() {
    local pid="$1"
    strace -e trace=signal -p "$pid" 2>&1
}
# 4. Debug permission problems
debug_permissions() {
    local program="$1"
    strace -e trace=access,open,openat,stat,lstat \
        "$program" 2>&1 | \
        grep -E "EACCES|EPERM"
}
# 5. Find configuration files
find_config_files() {
    local program="$1"
    strace -e trace=open,openat,access,stat \
        "$program" 2>&1 | \
        grep -E "(config|conf|rc|ini)" | \
        grep -v ENOENT | \
        awk -F'"' '{print $2}' | sort -u
}
# 6. Analyze system call patterns
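analyze_patterns() {
    # Expects a log written by "strace -o" without -f (no PID prefix on each line)
    local logfile="$1"
    echo "=== System Call Analysis ==="
    # Most frequent syscalls (the name is everything before the first parenthesis)
    echo "Top 10 system calls:"
    awk -F'(' '/^[a-z_0-9]+\(/ {print $1}' "$logfile" | \
        sort | uniq -c | sort -rn | head -10
    # Error analysis: keep only the syscall name and the errno symbol
    echo -e "\nErrors by syscall:"
    sed -nE 's/^([a-z_0-9]+)\(.*= -[0-9]+ (E[A-Z0-9]+).*/\1 \2/p' "$logfile" | \
        sort | uniq -c | sort -rn
}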
# 7. Compare two program runs
compare_traces() {
    local prog1="$1"
    local prog2="$2"
    local t1 t2
    t1=$(mktemp) && t2=$(mktemp) || return 1
    # -o writes the -c summary to a file, keeping it apart from program output
    strace -c -o "$t1" "$prog1" > /dev/null
    strace -c -o "$t2" "$prog2" > /dev/null
    diff -u "$t1" "$t2"
    rm -f "$t1" "$t2"
}
# Usage examples
# debug_startup /usr/bin/myapp
# monitor_file_changes "./batch_processor"
# find_config_files /usr/sbin/nginx
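Diffing two `strace -c` summaries line by line is noisy because the timing columns change on every run. A comparison of call counts alone is often more useful. The sketch below parses the summary table and reports only the syscalls whose counts differ. `parse_summary` and `diff_counts` are hypothetical helpers, and the parser assumes the column order shown in the Performance Analysis section (% time, seconds, usecs/call, calls, optional errors, syscall).

```python
def parse_summary(text):
    """Parse `strace -c` output into {syscall: calls}, skipping header and total rows."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 5 columns, or 6 when the errors column is filled in
        if len(parts) not in (5, 6) or parts[-1] == "total":
            continue
        try:
            float(parts[0])          # % time column; fails on the dashed separator
            calls = int(parts[3])    # calls column
        except ValueError:
            continue
        counts[parts[-1]] = calls
    return counts


def diff_counts(before, after):
    """Return {syscall: after - before} for syscalls whose call count changed."""
    changed = {}
    for name in sorted(set(before) | set(after)):
        delta = after.get(name, 0) - before.get(name, 0)
        if delta != 0:
            changed[name] = delta
    return changed
```

A syscall that appears in only one of the two runs shows up with its full count as a positive or negative delta.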
Production-Safe Alternatives
# For production environments, consider these alternatives:
# 1. Using perf-trace (lower overhead)
perf trace -e 'open*' ./application
# 2. Using BPF tools (minimal overhead)
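# Install the BCC tools first (packaged as bpfcc-tools on Debian/Ubuntu, where the tools carry a -bpfcc suffix); they need root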
opensnoop-bpfcc # Trace open() syscalls
execsnoop-bpfcc # Trace new processes
tcpconnect-bpfcc # Trace TCP connections
biolatency-bpfcc # Block I/O latency
# 3. Using SystemTap (scriptable)
stap -e 'probe syscall.open { printf("%s opened %s\n", execname(), filename) }'
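# 4. Using ftrace (kernel tracer)
# Function names vary by kernel version; list candidates with:
#   grep open /sys/kernel/debug/tracing/available_filter_functions
echo 'do_sys_open*' > /sys/kernel/debug/tracing/set_ftrace_filter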
echo function > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/kernel/debug/tracing/tracing_on
cat /sys/kernel/debug/tracing/trace
# 5. Container-aware tracing with traceloop
# For Kubernetes/container environments
traceloop --pod my-pod-name
traceloop --container my-container-id
# 6. Selective strace with reduced overhead
# Use seccomp-bpf acceleration (strace 5.3+, Linux 4.8+); it only takes effect together with -f
strace -f --seccomp-bpf -e trace=openat,close ./application
# Compare overhead
time ./application # Baseline
time strace -c ./application 2>/dev/null # With strace
time perf trace -s ./application # With perf
Practical Debugging Scenarios
# Scenario 1: "Why won't this program start?"
strace ./failing_program 2>&1 | head -50
# Look for failed opens, missing libraries, permission errors
# Scenario 2: "What files is this program accessing?"
strace -e trace=open,openat,access -f ./program 2>&1 | \
grep -v ENOENT | cut -d'"' -f2 | sort -u
# Scenario 3: "Why is this program slow?"
strace -c -S time ./slow_program
# Identify syscalls taking the most time
# Scenario 4: "What's this program doing right now?"
sudo strace -s 80 -p "$(pgrep program_name)"
# Shows current activity with 80-char string limit; the quotes let strace accept several matching PIDs
# Scenario 5: "Debug daemon startup"
strace -o /tmp/daemon.trace -f service mydaemon start
# Review /tmp/daemon.trace for issues
# Scenario 6: "Trace database connections"
strace -e trace=network -f -s 200 ./database_app 2>&1 | \
grep -E "connect.*3306|connect.*5432"
# Scenario 7: "Look for signs of a memory leak"
strace -e trace=brk,mmap,munmap ./application 2>&1 | \
awk '/^brk\(/ {brk++} /^mmap\(/ {mmap++} /^munmap\(/ {munmap++}
END {print "brk:", brk+0, "mmap:", mmap+0, "munmap:", munmap+0}'
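# Scenario 8: "Debug container issues"
# The stock alpine image does not ship strace, so install it first
docker run --rm --cap-add SYS_PTRACE \
--security-opt seccomp=unconfined \
alpine sh -c 'apk add --no-cache strace > /dev/null && strace -e trace=all echo "Hello from container"'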
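Counting mmap and munmap calls, as in Scenario 7, gives only a rough hint. Pairing each mapping with its unmapping is slightly more informative. The Python sketch below tracks which mapped addresses were never passed to munmap. `outstanding_mappings` is a hypothetical helper, and the approach is deliberately simplified: it assumes whole regions are unmapped at the address mmap returned, and ignores partial unmaps, mremap, and brk growth.

```python
import re

MMAP_RE = re.compile(
    r"\bmmap\((?:NULL|0x[0-9a-f]+), (?P<len>\d+), .*\)\s+=\s+(?P<addr>0x[0-9a-f]+)"
)
MUNMAP_RE = re.compile(
    r"\bmunmap\((?P<addr>0x[0-9a-f]+), (?P<len>\d+)\)\s+=\s+0"
)


def outstanding_mappings(trace_text):
    """Return {address: length} for mappings that were never unmapped in the trace."""
    live = {}
    for line in trace_text.splitlines():
        mapped = MMAP_RE.search(line)
        if mapped:
            live[mapped["addr"]] = int(mapped["len"])
            continue
        unmapped = MUNMAP_RE.search(line)
        if unmapped:
            live.pop(unmapped["addr"], None)
    return live
```

Mappings for shared libraries and the heap normally stay live until exit, so a non-empty result is expected. What suggests a leak is a total that keeps growing while a long-running process is traced.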