Query.jl
Package for data manipulation using LINQ-style query syntax. Enables data filtering, projection, joining, and grouping in a functional style, and allows complex data transformations to be written with intuitive syntax.
Overview
Query.jl is a data querying package for Julia. Inspired by LINQ (Language Integrated Query) from C# and the .NET Framework, it provides an intuitive syntax for data transformation and aggregation.
Details
Query.jl is a package for performing data manipulation in Julia using a functional programming approach. By providing syntax similar to C#'s LINQ (Language Integrated Query), it lets data transformation pipelines be written declaratively. A key feature is lazy evaluation: a query only describes a computation, and no work is done until the result is materialized (for example with collect), which allows large amounts of data to be processed efficiently. Chaining operations with the pipeline operator (|>) expresses each transformation step clearly. It supports a rich set of operations such as @filter, @map, @groupby, @join, and @orderby, enabling concise descriptions of complex data processing. Because queries implement Julia's iterator interface, processing remains memory-efficient and fast even on large datasets. Query.jl interoperates with many data sources, including DataFrames.jl, CSV.jl, and JSON-based data, and integrates seamlessly with the wider Julia data ecosystem.
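As a small illustration of the lazy-evaluation point above (a minimal sketch; the sample data is invented for this example), building a pipeline does no work by itself — only collect drives the iteration:

```julia
using Query

# Invented sample data: a plain vector of named tuples works as a source.
data = [(name = "Alice", age = 25), (name = "Bob", age = 31)]

# Building the query does no work yet; `q` is a lazy enumerable.
q = data |> @filter(_.age > 30) |> @map(_.name)

# `collect` drives the iteration and materializes the result.
result = collect(q)   # ["Bob"]
```

The same query object can be collected into other sinks (for example a DataFrame), which is what makes the lazy intermediate representation useful.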
Pros and Cons
Pros
- Intuitive Syntax: LINQ-style readable code
- Lazy Evaluation: Efficient processing of large-scale data
- Chain Processing: Clear data flow with pipeline operators
- Type Safety: Safe processing through Julia's type system
- Diverse Operations: Rich set of query functions
- Interoperability: Integration with various data formats
Cons
- Learning Curve: Requires some familiarity with functional programming
- Performance: Can be slower than direct DataFrames.jl operations for simple tasks
- Debugging Complexity: Lazy evaluation makes intermediate results harder to inspect
- Error Messages: Type errors can produce long, complex messages
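One practical way to soften the debugging issue above (a sketch; the data is invented for the illustration): materialize an intermediate stage with collect while developing, inspect it, then continue the pipeline from the materialized result:

```julia
using Query, DataFrames

# Invented sample data for the illustration.
df = DataFrame(name = ["Alice", "Bob"], age = [25, 31])

# While debugging, collect an intermediate stage so it can be inspected...
stage1 = df |> @filter(_.age > 30) |> collect

# ...then continue the pipeline from the materialized rows.
result = stage1 |> @map(_.name) |> collect   # ["Bob"]
```

Once the pipeline behaves as expected, the extra collect can be removed again to restore full lazy evaluation.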
Main Use Cases
- Complex data transformation processing
- Efficient processing of large datasets
- ETL pipeline construction
- Statistical data preprocessing
- Integration of multiple data sources
- Report generation systems
- Data analysis workflows
Basic Usage
Installation
using Pkg
Pkg.add(["Query", "DataFrames"])

using Query, DataFrames
using Statistics  # standard library; provides mean and std used below
Basic Query Operations
# Sample data preparation
df = DataFrame(
    name = ["Alice", "Bob", "Charlie", "David", "Eve"],
    age = [25, 30, 35, 28, 32],
    salary = [50000, 60000, 70000, 55000, 65000],
    department = ["Sales", "IT", "IT", "Sales", "HR"]
)

# Basic filtering
result = df |>
    @filter(_.age > 28) |>
    @map({name=_.name, salary=_.salary}) |>
    collect

# Complex filtering
result = df |>
    @filter(_.age > 25 && _.department == "IT") |>
    @map({name=_.name, adjusted_salary=_.salary * 1.1}) |>
    collect

# Sorting (ascending)
result = df |>
    @orderby(_.salary) |>
    collect

# Sorting (descending)
result = df |>
    @orderby_descending(_.salary) |>
    collect
Grouping and Aggregation
# Average salary by department
result = df |>
    @groupby(_.department) |>
    @map({department=key(_), avg_salary=mean(_.salary), count=length(_)}) |>
    collect

# Complex grouping
result = df |>
    @groupby(_.department) |>
    @map({
        department=key(_),
        total_salary=sum(_.salary),
        avg_age=mean(_.age),
        employee_names=join(_.name, ", ")
    }) |>
    collect

# Conditional grouping
result = df |>
    @filter(_.age > 25) |>
    @groupby(_.department) |>
    @map({
        department=key(_),
        senior_count=length(_),
        senior_avg_salary=mean(_.salary)
    }) |>
    collect
Join Operations
# Department information preparation
departments = DataFrame(
    department = ["Sales", "IT", "HR"],
    location = ["Tokyo", "Osaka", "Nagoya"],
    budget = [1000000, 1500000, 800000]
)

# Inner join
result = df |>
    @join(departments, _.department, _.department,
          {name=_.name, age=_.age, salary=_.salary,
           location=__.location, budget=__.budget}) |>
    collect

# Left outer join-like processing with @groupjoin:
# every left row is kept, __ is the (possibly empty) group of matching
# right rows, and missing stands in when there is no match
result = df |>
    @groupjoin(departments, _.department, _.department,
               {name=_.name, age=_.age, salary=_.salary,
                location=isempty(__) ? missing : first(__).location}) |>
    collect
Data Transformation
# Adding new columns
result = df |>
    @map({name=_.name, age=_.age, salary=_.salary,
          bonus=_.salary * 0.1,
          seniority=_.age > 30 ? "Senior" : "Junior"}) |>
    collect

# Complex transformation
result = df |>
    @map({
        name=_.name,
        age_group=_.age < 30 ? "Young" : _.age < 35 ? "Middle" : "Senior",
        salary_tier=_.salary < 55000 ? "Low" : _.salary < 65000 ? "Medium" : "High",
        total_compensation=_.salary + _.salary * 0.1
    }) |>
    collect

# Conditional transformation
result = df |>
    @map({
        name=_.name,
        adjusted_salary=_.department == "IT" ? _.salary * 1.2 : _.salary,
        performance_bonus=_.age > 30 && _.salary > 60000 ? 5000 : 0
    }) |>
    collect
Advanced Operations
# Multiple filtering conditions
result = df |>
    @filter(_.age >= 25 && _.age <= 35) |>
    @filter(_.salary > 50000) |>
    @orderby(_.salary) |>
    collect

# Complex pipeline
result = df |>
    @filter(_.age > 25) |>
    @map({name=_.name, salary=_.salary, department=_.department}) |>
    @groupby(_.department) |>
    @map({
        department=key(_),
        employees=length(_),
        total_salary=sum(_.salary),
        avg_salary=mean(_.salary)
    }) |>
    @orderby_descending(_.avg_salary) |>
    collect

# Conditional branch processing
result = df |>
    @map({
        name=_.name,
        category=_.department == "IT" ? "Technical" : "Business",
        salary_grade=_.salary > 60000 ? "High" : "Standard",
        potential=_.age < 30 && _.salary > 50000 ? "High Potential" : "Standard"
    }) |>
    collect
Statistical Processing
# Overall statistics: group all rows under a single constant key,
# so _ inside @map is the whole group rather than a single row
stats = df |>
    @groupby(true) |>
    @map({
        total_employees=length(_),
        avg_age=mean(_.age),
        avg_salary=mean(_.salary),
        min_salary=minimum(_.salary),
        max_salary=maximum(_.salary)
    }) |>
    collect |> first

# Department-wise statistics
dept_stats = df |>
    @groupby(_.department) |>
    @map({
        department=key(_),
        employee_count=length(_),
        avg_age=mean(_.age),
        age_std=std(_.age),
        salary_range=maximum(_.salary) - minimum(_.salary)
    }) |>
    collect
Latest Trends (2025)
- Asynchronous Processing: Exploration of asynchronous query execution for large-scale data
- GPU Acceleration: Experiments with GPU-backed processing in the surrounding ecosystem
- Distributed Processing: Interest in query execution across multiple nodes
- Improved Type Inference: Stricter type checking
- Query Optimization: Work toward automatic query plan optimization
Summary
As of 2025, Query.jl remains a popular choice for functional-style data processing in Julia. Its LINQ-style syntax enables concise descriptions of complex data transformations, and lazy evaluation keeps processing efficient. Combined with DataFrames.jl, it supports expressive data analysis workflows.