Query.jl
Package for data manipulation using LINQ-style query syntax. Enables data filtering, projection, joining, and grouping in a functional style, and allows complex data transformations to be written with intuitive syntax.
Overview
Query.jl is a data querying package for Julia. Inspired by LINQ (Language Integrated Query) from C# and the .NET Framework, it provides an intuitive syntax for data transformation and aggregation.
Details
Query.jl is a package for performing data manipulation in Julia using a functional programming approach. By providing syntax similar to C#'s LINQ (Language Integrated Query), it lets data transformation pipelines be written declaratively. A key feature is lazy evaluation: a query only describes a computation, and no work is done until the result is materialized (for example with collect), which allows large amounts of data to be processed efficiently. Chaining operations with the pipeline operator (|>) expresses each transformation step clearly. It supports a rich set of operations such as @filter, @map, @groupby, @join, and @orderby, enabling concise descriptions of complex data processing. Because queries implement Julia's iterator interface, processing remains memory-efficient and fast even on large datasets. Query.jl interoperates with many data sources, including DataFrames.jl, CSV.jl, and JSON-based data, and integrates seamlessly with the wider Julia data ecosystem.
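As a small illustration of the lazy-evaluation point above (a minimal sketch; the sample data is invented for this example), building a pipeline does no work by itself — only collect drives the iteration:

```julia
using Query

# Invented sample data: a plain vector of named tuples works as a source.
data = [(name = "Alice", age = 25), (name = "Bob", age = 31)]

# Building the query does no work yet; `q` is a lazy enumerable.
q = data |> @filter(_.age > 30) |> @map(_.name)

# `collect` drives the iteration and materializes the result.
result = collect(q)   # ["Bob"]
```

The same query object can be collected into other sinks (for example a DataFrame), which is what makes the lazy intermediate representation useful.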
Pros and Cons
Pros
- Intuitive Syntax: LINQ-style readable code
- Lazy Evaluation: Efficient processing of large-scale data
- Chain Processing: Clear data flow with pipeline operators
- Type Safety: Safe processing through Julia's type system
- Diverse Operations: Rich set of query functions
- Interoperability: Integration with various data formats
Cons
- Learning Curve: Requires some familiarity with functional programming
- Performance: Can be slower than direct DataFrames.jl operations for simple tasks
- Debugging Complexity: Lazy evaluation makes intermediate results harder to inspect
- Error Messages: Type errors can produce long, complex messages
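One practical way to soften the debugging issue above (a sketch; the data is invented for the illustration): materialize an intermediate stage with collect while developing, inspect it, then continue the pipeline from the materialized result:

```julia
using Query, DataFrames

# Invented sample data for the illustration.
df = DataFrame(name = ["Alice", "Bob"], age = [25, 31])

# While debugging, collect an intermediate stage so it can be inspected...
stage1 = df |> @filter(_.age > 30) |> collect

# ...then continue the pipeline from the materialized rows.
result = stage1 |> @map(_.name) |> collect   # ["Bob"]
```

Once the pipeline behaves as expected, the extra collect can be removed again to restore full lazy evaluation.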
Main Use Cases
- Complex data transformation processing
- Efficient processing of large datasets
- ETL pipeline construction
- Statistical data preprocessing
- Integration of multiple data sources
- Report generation systems
- Data analysis workflows
Basic Usage
Installation
using Pkg
Pkg.add(["Query", "DataFrames"])

using Query, DataFrames
using Statistics  # standard library; provides mean and std used below
Basic Query Operations
# Sample data preparation
df = DataFrame(
    name = ["Alice", "Bob", "Charlie", "David", "Eve"],
    age = [25, 30, 35, 28, 32],
    salary = [50000, 60000, 70000, 55000, 65000],
    department = ["Sales", "IT", "IT", "Sales", "HR"]
)

# Basic filtering
result = df |>
    @filter(_.age > 28) |>
    @map({name=_.name, salary=_.salary}) |>
    collect

# Complex filtering
result = df |>
    @filter(_.age > 25 && _.department == "IT") |>
    @map({name=_.name, adjusted_salary=_.salary * 1.1}) |>
    collect

# Sorting (ascending)
result = df |>
    @orderby(_.salary) |>
    collect

# Sorting (descending)
result = df |>
    @orderby_descending(_.salary) |>
    collect
Grouping and Aggregation
# Average salary by department
result = df |>
    @groupby(_.department) |>
    @map({department=key(_), avg_salary=mean(_.salary), count=length(_)}) |>
    collect

# Complex grouping
result = df |>
    @groupby(_.department) |>
    @map({
        department=key(_),
        total_salary=sum(_.salary),
        avg_age=mean(_.age),
        employee_names=join(_.name, ", ")
    }) |>
    collect

# Conditional grouping
result = df |>
    @filter(_.age > 25) |>
    @groupby(_.department) |>
    @map({
        department=key(_),
        senior_count=length(_),
        senior_avg_salary=mean(_.salary)
    }) |>
    collect
Join Operations
# Department information preparation
departments = DataFrame(
    department = ["Sales", "IT", "HR"],
    location = ["Tokyo", "Osaka", "Nagoya"],
    budget = [1000000, 1500000, 800000]
)

# Inner join
result = df |>
    @join(departments, _.department, _.department,
          {name=_.name, age=_.age, salary=_.salary,
           location=__.location, budget=__.budget}) |>
    collect

# Left outer join-like processing with @groupjoin:
# every left row is kept, __ is the (possibly empty) group of matching
# right rows, and missing stands in when there is no match
result = df |>
    @groupjoin(departments, _.department, _.department,
               {name=_.name, age=_.age, salary=_.salary,
                location=isempty(__) ? missing : first(__).location}) |>
    collect
Data Transformation
# Adding new columns
result = df |>
    @map({name=_.name, age=_.age, salary=_.salary,
          bonus=_.salary * 0.1,
          seniority=_.age > 30 ? "Senior" : "Junior"}) |>
    collect

# Complex transformation
result = df |>
    @map({
        name=_.name,
        age_group=_.age < 30 ? "Young" : _.age < 35 ? "Middle" : "Senior",
        salary_tier=_.salary < 55000 ? "Low" : _.salary < 65000 ? "Medium" : "High",
        total_compensation=_.salary + _.salary * 0.1
    }) |>
    collect

# Conditional transformation
result = df |>
    @map({
        name=_.name,
        adjusted_salary=_.department == "IT" ? _.salary * 1.2 : _.salary,
        performance_bonus=_.age > 30 && _.salary > 60000 ? 5000 : 0
    }) |>
    collect
Advanced Operations
# Multiple filtering conditions
result = df |>
    @filter(_.age >= 25 && _.age <= 35) |>
    @filter(_.salary > 50000) |>
    @orderby(_.salary) |>
    collect

# Complex pipeline
result = df |>
    @filter(_.age > 25) |>
    @map({name=_.name, salary=_.salary, department=_.department}) |>
    @groupby(_.department) |>
    @map({
        department=key(_),
        employees=length(_),
        total_salary=sum(_.salary),
        avg_salary=mean(_.salary)
    }) |>
    @orderby_descending(_.avg_salary) |>
    collect

# Conditional branch processing
result = df |>
    @map({
        name=_.name,
        category=_.department == "IT" ? "Technical" : "Business",
        salary_grade=_.salary > 60000 ? "High" : "Standard",
        potential=_.age < 30 && _.salary > 50000 ? "High Potential" : "Standard"
    }) |>
    collect
Statistical Processing
# Overall statistics: group all rows under a single constant key,
# so _ inside @map is the whole group rather than a single row
stats = df |>
    @groupby(true) |>
    @map({
        total_employees=length(_),
        avg_age=mean(_.age),
        avg_salary=mean(_.salary),
        min_salary=minimum(_.salary),
        max_salary=maximum(_.salary)
    }) |>
    collect |> first

# Department-wise statistics
dept_stats = df |>
    @groupby(_.department) |>
    @map({
        department=key(_),
        employee_count=length(_),
        avg_age=mean(_.age),
        age_std=std(_.age),
        salary_range=maximum(_.salary) - minimum(_.salary)
    }) |>
    collect
Latest Trends (2025)
- Asynchronous Processing: Exploration of asynchronous query execution for large-scale data
- GPU Acceleration: Experiments with GPU-backed processing in the surrounding ecosystem
- Distributed Processing: Interest in query execution across multiple nodes
- Improved Type Inference: Stricter type checking
- Query Optimization: Work toward automatic query plan optimization
Summary
As of 2025, Query.jl remains a popular choice for functional-style data processing in Julia. Its LINQ-style syntax enables concise descriptions of complex data transformations, and lazy evaluation keeps processing efficient. Combined with DataFrames.jl, it supports expressive data analysis workflows.