Query.jl

A package for data manipulation using LINQ-style query syntax. It enables filtering, projection, joining, and grouping in a functional style, expressing complex data transformations with an intuitive, composable syntax.

Julia, data querying, LINQ, data transformation, pipeline, functional programming, lazy evaluation

Overview

Query.jl is a data-querying package for Julia. Inspired by C#'s LINQ (Language Integrated Query), it provides an intuitive, composable syntax for data transformation and aggregation.

Details

Query.jl is a package for performing data manipulation in Julia using a functional programming approach. Developed by David Anthoff as part of the Queryverse, it provides syntax modeled on C#'s LINQ (Language Integrated Query), making data transformation pipelines easy to read and write. Its key feature is lazy evaluation: a query describes a computation but does not execute it until the result is materialized (for example with collect), which allows large amounts of data to be processed efficiently. Chaining operations with the pipe operator (|>) expresses each transformation step clearly, and a rich set of operations such as @filter, @map, @groupby, @join, and @orderby allows complex processing to be described concisely. Because queries are ordinary Julia iterators, intermediate results need not be fully materialized, keeping memory use low even for large datasets. Through the TableTraits.jl interface it interoperates with many tabular data sources, including DataFrames.jl and CSV.jl, and integrates smoothly with the wider Julia data ecosystem.
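Since Query.jl queries ultimately produce ordinary Julia iterators, the lazy-evaluation idea can be sketched with nothing but Base, no Query.jl macros involved (the row data below is illustrative):

```julia
# Rows as a vector of named tuples -- the same row model Query.jl iterates over
rows = [
    (name = "Alice", age = 25, salary = 50000),
    (name = "Bob",   age = 30, salary = 60000),
    (name = "Carol", age = 35, salary = 70000),
]

# Lazy pipeline: Iterators.filter/map build iterators without computing anything;
# work happens only when collect forces evaluation
lazy = Iterators.map(r -> (name = r.name, salary = r.salary),
                     Iterators.filter(r -> r.age > 28, rows))

result = collect(lazy)
# result == [(name = "Bob", salary = 60000), (name = "Carol", salary = 70000)]
```

This is the same deferred-execution model Query.jl's @filter and @map build on, just without the macro syntax.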

Pros and Cons

Pros

  • Intuitive Syntax: LINQ-style readable code
  • Lazy Evaluation: Efficient processing of large-scale data
  • Chain Processing: Clear data flow with pipeline operators
  • Type Safety: Safe processing through Julia's type system
  • Diverse Operations: Rich set of query functions
  • Interoperability: Integration with various data formats

Cons

  • Learning Cost: Requires functional programming knowledge
  • Performance: Can be slower than direct DataFrames.jl operations for simple tasks
  • Debugging Complexity: Difficult to trace processing due to lazy evaluation
  • Error Messages: Complex type error messages

Main Use Cases

  • Complex data transformation processing
  • Efficient processing of large datasets
  • ETL pipeline construction
  • Statistical data preprocessing
  • Integration of multiple data sources
  • Report generation systems
  • Data analysis workflows

Basic Usage

Installation

using Pkg
Pkg.add("Query")
Pkg.add("DataFrames")

using Query, DataFrames
using Statistics  # for mean and std in the aggregation examples below

Basic Query Operations

# Sample data preparation
df = DataFrame(
    name = ["Alice", "Bob", "Charlie", "David", "Eve"],
    age = [25, 30, 35, 28, 32],
    salary = [50000, 60000, 70000, 55000, 65000],
    department = ["Sales", "IT", "IT", "Sales", "HR"]
)

# Basic filtering
result = df |> 
    @filter(_.age > 28) |> 
    @map({name=_.name, salary=_.salary}) |> 
    collect

# Complex filtering
result = df |> 
    @filter(_.age > 25 && _.department == "IT") |> 
    @map({name=_.name, adjusted_salary=_.salary * 1.1}) |> 
    collect

# Sorting
result = df |> 
    @orderby(_.salary) |> 
    collect

# Descending sort
result = df |> 
    @orderby_descending(_.salary) |> 
    collect
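Conceptually, @orderby sorts the row iterator by a key function. The same idea can be checked with Base's sort on named-tuple rows (the data below is illustrative, not the df defined above):

```julia
# Stdlib-only analogue of @orderby / @orderby_descending on named-tuple rows
rows = [
    (name = "Alice", salary = 50000),
    (name = "Bob",   salary = 70000),
    (name = "Carol", salary = 60000),
]

ascending  = sort(rows; by = r -> r.salary)             # like @orderby(_.salary)
descending = sort(rows; by = r -> r.salary, rev = true) # like @orderby_descending(_.salary)
```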

Grouping and Aggregation

# Average salary by department
result = df |> 
    @groupby(_.department) |> 
    @map({department=key(_), avg_salary=mean(_.salary), count=length(_)}) |> 
    collect

# Complex grouping
result = df |> 
    @groupby(_.department) |> 
    @map({
        department=key(_),
        total_salary=sum(_.salary),
        avg_age=mean(_.age),
        employee_names=join(_.name, ", ")
    }) |> 
    collect

# Conditional grouping
result = df |> 
    @filter(_.age > 25) |> 
    @groupby(_.department) |> 
    @map({
        department=key(_),
        senior_count=length(_),
        senior_avg_salary=mean(_.salary)
    }) |> 
    collect
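Conceptually, @groupby partitions rows by a key and the following @map aggregates each group. A Base-Julia sketch of that two-step process (illustrative data, not Query.jl's actual implementation):

```julia
using Statistics  # mean

rows = [
    (name = "Alice", department = "Sales", salary = 50000),
    (name = "Bob",   department = "IT",    salary = 60000),
    (name = "Carol", department = "IT",    salary = 70000),
]

# Step 1: partition rows by department, as @groupby does conceptually
groups = Dict{String, Vector{eltype(rows)}}()
for r in rows
    push!(get!(groups, r.department, eltype(rows)[]), r)
end

# Step 2: aggregate each group, mirroring the @map over groups above
summary = [(department = dept,
            avg_salary = mean(r.salary for r in rs),
            count = length(rs))
           for (dept, rs) in groups]
```

Note that Dict iteration order is unspecified, which is why Query.jl pipelines often finish with an @orderby.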

Join Operations

# Department information preparation
departments = DataFrame(
    department = ["Sales", "IT", "HR"],
    location = ["Tokyo", "Osaka", "Nagoya"],
    budget = [1000000, 1500000, 800000]
)

# Inner join
result = df |> 
    @join(departments, _.department, _.department,
          {name=_.name, age=_.age, salary=_.salary, 
           location=__.location, budget=__.budget}) |> 
    collect

# Left outer join via @groupjoin (__ holds the collection of matching rows,
# which is empty when a department has no match)
result = df |> 
    @groupjoin(departments, _.department, _.department,
               {name=_.name, age=_.age, salary=_.salary,
                location=isempty(__) ? missing : first(__).location}) |> 
    collect
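The inner join above can be pictured as a hash join: build a lookup table on the right side's key, then probe it for each left row. A stdlib-only sketch with made-up rows:

```julia
# Illustrative data, stdlib only
employees = [
    (name = "Alice", department = "Sales"),
    (name = "Bob",   department = "IT"),
]
departments = [
    (department = "Sales", location = "Tokyo"),
    (department = "IT",    location = "Osaka"),
]

# Build phase: hash the right table on its join key
lookup = Dict(d.department => d for d in departments)

# Probe phase: keep only left rows whose key has a match
joined = [(name = e.name, location = lookup[e.department].location)
          for e in employees if haskey(lookup, e.department)]
```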

Data Transformation

# Adding new columns
result = df |> 
    @map({name=_.name, age=_.age, salary=_.salary,
          bonus=_.salary * 0.1,
          seniority=_.age > 30 ? "Senior" : "Junior"}) |> 
    collect

# Complex transformation
result = df |> 
    @map({
        name=_.name,
        age_group=_.age < 30 ? "Young" : _.age < 35 ? "Middle" : "Senior",
        salary_tier=_.salary < 55000 ? "Low" : _.salary < 65000 ? "Medium" : "High",
        total_compensation=_.salary + _.salary * 0.1
    }) |> 
    collect

# Conditional transformation
result = df |> 
    @map({
        name=_.name,
        adjusted_salary=_.department == "IT" ? _.salary * 1.2 : _.salary,
        performance_bonus=_.age > 30 && _.salary > 60000 ? 5000 : 0
    }) |> 
    collect
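Nested ternaries like the age_group expression above are easier to read and test when factored into a small helper function (the helper name here is ours, not part of Query.jl):

```julia
# Chained ternaries are right-associative, so this reads top to bottom
age_group(age) = age < 30 ? "Young" : age < 35 ? "Middle" : "Senior"

labels = map(age_group, [25, 30, 35])
# labels == ["Young", "Middle", "Senior"]
```

The helper can then be called inside @map, e.g. age_group(_.age), keeping the query itself short.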

Advanced Operations

# Multiple filtering conditions
result = df |> 
    @filter(_.age >= 25 && _.age <= 35) |> 
    @filter(_.salary > 50000) |> 
    @orderby(_.salary) |> 
    collect

# Complex pipeline
result = df |> 
    @filter(_.age > 25) |> 
    @map({name=_.name, salary=_.salary, department=_.department}) |> 
    @groupby(_.department) |> 
    @map({
        department=key(_),
        employees=length(_),
        total_salary=sum(_.salary),
        avg_salary=mean(_.salary)
    }) |> 
    @orderby_descending(_.avg_salary) |> 
    collect

# Conditional branch processing
result = df |> 
    @map({
        name=_.name,
        category=_.department == "IT" ? "Technical" : "Business",
        salary_grade=_.salary > 60000 ? "High" : "Standard",
        potential=_.age < 30 && _.salary > 50000 ? "High Potential" : "Standard"
    }) |> 
    collect

Statistical Processing

# Basic statistics
# Column-wide aggregates are computed directly on the columns;
# @map works row by row, so it is not the right tool for this
stats = (
    total_employees = nrow(df),
    avg_age = mean(df.age),
    avg_salary = mean(df.salary),
    min_salary = minimum(df.salary),
    max_salary = maximum(df.salary)
)

# Department-wise statistics
dept_stats = df |> 
    @groupby(_.department) |> 
    @map({
        department=key(_),
        employee_count=length(_),
        avg_age=mean(_.age),
        age_std=std(_.age),
        salary_range=maximum(_.salary) - minimum(_.salary)
    }) |> 
    collect
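Note that std here is the sample standard deviation from the Statistics standard library (denominator n − 1). A quick stdlib-only check on a toy vector:

```julia
using Statistics

# Sample standard deviation and range, as used in the per-department summary
ages = [25, 30, 35]
age_stats = (age_std = std(ages), age_range = maximum(ages) - minimum(ages))
# variance = ((-5)^2 + 0^2 + 5^2) / (3 - 1) = 25.0, so std == 5.0
```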

Latest Trends (2025)

  • Mature Codebase: Query.jl is stable, with development focused on maintenance and ecosystem compatibility
  • Tables.jl Ecosystem: the Tables.jl interface has become the common layer connecting Query.jl with other tabular packages
  • Alternatives: DataFramesMeta.jl and DataFrames.jl's native API cover many of the same tasks, often with better performance on DataFrames
  • Type Inference: ongoing Julia compiler improvements benefit the type-stable iterator pipelines Query.jl generates

Summary

Query.jl remains a solid choice for functional-style data processing in Julia. Its LINQ-style syntax expresses complex data transformations concisely, and lazy evaluation keeps processing efficient. Combined with DataFrames.jl, it supports building expressive data analysis workflows.