kantan.csv
Overview
kantan.csv is a type-safe CSV library that leverages Scala's type system. It achieves safe and efficient CSV processing through compile-time CSV data type checking, automatic case class mapping, and custom encoder/decoder definitions.
Details
kantan.csv was developed in 2016 with the goal of type-safe CSV processing in Scala. By leveraging Scala's powerful type system, it ensures type safety of CSV data at compile time, significantly reducing runtime errors. Its key feature is automatic mapping between case classes and CSV, enabling CSV reading and writing with minimal boilerplate code. The type class-based design allows intuitive definition of encoders and decoders for custom data types. Using the Shapeless library for automatic derivation, it enables type-safe CSV operations even for complex data structures. Error handling is also type-safe, properly handling parse errors and type conversion errors. Following functional programming principles, it provides APIs with immutable data structures and pure functions.
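The type-class design described above can be illustrated with a self-contained sketch. This is a simplified model of the pattern, not kantan.csv's actual API; the names `Decode` and `decodeRow` are invented for illustration.

```scala
// Simplified model of the type-class pattern kantan.csv builds on:
// a per-cell decoder type class, plus row decoding derived from it.
trait Decode[A] { def decode(cell: String): Either[String, A] }

object Decode {
  def apply[A](implicit d: Decode[A]): Decode[A] = d
  def from[A](f: String => Either[String, A]): Decode[A] =
    (cell: String) => f(cell)

  implicit val str: Decode[String] = from(Right(_))
  implicit val int: Decode[Int] =
    from(s => s.toIntOption.toRight(s"not an Int: $s"))
}

case class Employee(name: String, age: Int)

// A row decoder built from the cell decoders of the fields
def decodeRow(cells: List[String]): Either[String, Employee] = cells match {
  case n :: a :: Nil =>
    for {
      name <- Decode[String].decode(n)
      age  <- Decode[Int].decode(a)
    } yield Employee(name, age)
  case other => Left(s"expected 2 cells, got ${other.size}")
}

// decodeRow(List("Alice", "30")) == Right(Employee("Alice", 30))
// decodeRow(List("Alice", "x")).isLeft == true
```

In the real library, Shapeless derives the equivalent of `decodeRow` automatically from a case class's field types, which is why adding a `CellDecoder` for a new field type is all that is needed to read rows containing it.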
Pros and Cons
Pros
- Type Safety: Reduces runtime errors through compile-time type checking
- Automatic Mapping: Automatic conversion between case classes and CSV
- Customizable: Flexible encoder/decoder definitions
- Functional API: Predictable behavior through pure functions
- Rich Type Support: Broad support for standard and custom types
- Error Handling: Type-safe error processing
Cons
- Learning Curve: Requires understanding of type classes
- Compilation Time: Increased compile time due to type-level computations
- Error Messages: Complex type error messages
- Performance: Overhead for type safety guarantees
Main Use Cases
- Data analysis preprocessing
- ETL processing
- Log file analysis
- Configuration file processing
- Report generation
- Data validation
- Financial data processing
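As a sketch of the data-validation use case, the `Either`-based results that kantan.csv produces can be aggregated with plain Scala collection operations. The `Row` type and `validate` function here are hypothetical stand-ins for a decoded record and its validation rule:

```scala
// Hypothetical validation of raw CSV fields: each row yields either a
// value (Right) or an error message (Left), mirroring the shape of
// kantan.csv's ReadResult (an Either[ReadError, A]).
case class Row(name: String, age: Int)

def validate(fields: List[String]): Either[String, Row] = fields match {
  case name :: ageStr :: Nil =>
    ageStr.toIntOption match {
      case Some(age) if age > 0 => Right(Row(name, age))
      case _                    => Left(s"invalid age: $ageStr")
    }
  case other => Left(s"wrong arity: $other")
}

val rows = List(List("Alice", "30"), List("Bob", "-1"), List("Carol"))
// partitionMap (Scala 2.13+) splits Lefts from Rights in one pass
val (errors, valid) = rows.map(validate).partitionMap(identity)
// errors == List("invalid age: -1", "wrong arity: List(Carol)")
// valid  == List(Row("Alice", 30))
```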
Basic Usage
Adding Dependencies
libraryDependencies ++= Seq(
  "com.nrinaudo" %% "kantan.csv"         % "0.7.0",
  // needed for the Shapeless-based case class derivation used below
  "com.nrinaudo" %% "kantan.csv-generic" % "0.7.0"
)
Basic CSV Reading
import kantan.csv._
import kantan.csv.ops._
// Reading a CSV file (rfc is the default RFC 4180 configuration;
// withHeader skips the header row)
val csvFile = new java.io.File("data.csv")
// Reading raw rows as lists of strings
val rawRows = csvFile.asCsvReader[List[String]](rfc.withHeader).toVector
// Type-safe reading as tuples
val tuples = csvFile.asCsvReader[(String, Int, Double)](rfc.withHeader).toVector
// Each entry is a ReadResult, i.e. an Either[ReadError, A]
tuples.foreach {
  case Right((name, age, salary)) =>
    println(s"Name: $name, Age: $age, Salary: $salary")
  case Left(error) =>
    println(s"Error: ${error.getMessage}")
}
Automatic Mapping with Case Classes
import kantan.csv._
import kantan.csv.ops._
import kantan.csv.generic._
// Data structure definition
case class Employee(name: String, age: Int, salary: Double, department: String)
// Reading with automatic mapping (derived via kantan.csv.generic)
val employees = new java.io.File("employees.csv")
  .asCsvReader[Employee](rfc.withHeader)
  .toVector
// Keep only the successfully decoded records
val validEmployees = employees.collect {
  case Right(employee) => employee
}
println(s"Valid employees: ${validEmployees.size}")
validEmployees.foreach(println)
Custom Data Type Support
import kantan.csv._
import kantan.csv.ops._
import kantan.csv.generic._ // derives the RowDecoder[Person] used below
import java.time.LocalDate
import java.time.format.DateTimeFormatter
// Custom cell decoder: DecodeResult(...) catches any exception thrown
// by LocalDate.parse and turns it into a decoding error
implicit val localDateDecoder: CellDecoder[LocalDate] =
  CellDecoder.from { str =>
    DecodeResult(LocalDate.parse(str, DateTimeFormatter.ofPattern("yyyy-MM-dd")))
  }
// Custom cell encoder
implicit val localDateEncoder: CellEncoder[LocalDate] =
  CellEncoder.from(_.format(DateTimeFormatter.ofPattern("yyyy-MM-dd")))
(The kantan.csv-java8 module also ships ready-made codecs for java.time types.)
// Case class with custom types
case class Person(name: String, birthDate: LocalDate, isActive: Boolean)
// Usage example
val people = new java.io.File("people.csv")
  .asCsvReader[Person](rfc.withHeader)
  .toVector
val validPeople = people.collect {
  case Right(person) => person
}
validPeople.foreach(println)
CSV Writing
import kantan.csv._
import kantan.csv.ops._
import kantan.csv.generic._ // derives the RowEncoder[Employee] used below
// Data preparation
val employees = List(
Employee("Alice", 30, 50000.0, "IT"),
Employee("Bob", 25, 45000.0, "Sales"),
Employee("Charlie", 35, 60000.0, "IT")
)
// Writing to a file, with an explicit header row
val outputFile = new java.io.File("output.csv")
outputFile.writeCsv(employees, rfc.withHeader("name", "age", "salary", "department"))
// Alternatively, write records one by one through a CsvWriter
val writer = outputFile.asCsvWriter[Employee](rfc.withHeader("name", "age", "salary", "department"))
employees.foreach(writer.write)
writer.close()
Advanced CSV Operations
import kantan.csv._
import kantan.csv.ops._
import kantan.csv.generic._
// Complex data structure
case class Address(street: String, city: String, zipCode: String)
case class EmployeeWithAddress(
name: String,
age: Int,
salary: Double,
address: Address
)
// Nested structure: decode the flat row into a tuple, then build the
// nested case classes (columns: name, age, salary, street, city, zipCode)
implicit val employeeWithAddressDecoder: RowDecoder[EmployeeWithAddress] =
  RowDecoder[(String, Int, Double, String, String, String)].map {
    case (name, age, salary, street, city, zipCode) =>
      EmployeeWithAddress(name, age, salary, Address(street, city, zipCode))
  }
// Conditional filtering
val highSalaryEmployees = new java.io.File("employees.csv")
  .asCsvReader[Employee](rfc.withHeader)
  .toVector
  .collect {
    case Right(employee) if employee.salary > 50000 => employee
  }
// Grouping
val employeesByDepartment = highSalaryEmployees.groupBy(_.department)
employeesByDepartment.foreach { case (dept, emps) =>
println(s"Department: $dept, Count: ${emps.size}")
}
Error Handling and Validation
import kantan.csv._
import kantan.csv.ops._
// Custom validation
case class ValidatedEmployee(name: String, age: Int, salary: Double) {
require(name.nonEmpty, "Name cannot be empty")
require(age > 0 && age < 120, "Age must be between 1 and 119")
require(salary > 0, "Salary must be positive")
}
// Decoder with validation: decode the raw fields first, then let
// DecodeResult(...) capture the IllegalArgumentException thrown by require
implicit val validatedEmployeeDecoder: RowDecoder[ValidatedEmployee] =
  RowDecoder[(String, Int, Double)].emap { case (name, age, salary) =>
    DecodeResult(ValidatedEmployee(name, age, salary))
  }
// Error detail processing
val results = new java.io.File("employees.csv")
  .asCsvReader[ValidatedEmployee](rfc.withHeader)
  .toVector
// partitionMap splits the Either results into errors and successes
val (errors, validEmployees) = results.partitionMap(identity)
println(s"Valid records: ${validEmployees.size}")
println(s"Invalid records: ${errors.size}")
errors.foreach(error => println(s"Error: ${error.getMessage}"))
Streaming Processing
import kantan.csv._
import kantan.csv.ops._
import kantan.csv.generic._
// Streaming processing of large CSV files. CsvReader is lazy: rows are
// only read as they are consumed, and the underlying resource is closed
// automatically once the iterator is exhausted.
def processLargeCSV(filename: String): Unit = {
  new java.io.File(filename)
    .asCsvReader[Employee](rfc.withHeader)
    .collect { case Right(employee) if employee.salary > 50000 => employee }
    .foreach { employee =>
      // Per-record processing; batching (e.g. accumulating groups of
      // 100 records) can be layered on top of this loop if needed
      println(s"Processing ${employee.name}")
    }
}
processLargeCSV("large_employees.csv")
Configuration and Customization
import kantan.csv._
import kantan.csv.ops._
import kantan.csv.generic._
// Custom CSV configuration
val customConfig = rfc.withCellSeparator(';').withQuote('"')
// Reading with custom configuration
val customReader = new java.io.File("semicolon_separated.csv")
.asCsvReader[Employee](customConfig)
// Header mapping
case class PersonWithHeader(firstName: String, lastName: String, age: Int)
implicit val personHeaderDecoder: HeaderDecoder[PersonWithHeader] =
HeaderDecoder.decoder("first_name", "last_name", "age")(PersonWithHeader.apply)
val headerBasedReader = new java.io.File("with_headers.csv")
.asCsvReader[PersonWithHeader](rfc.withHeader)
.toVector
// Empty cell handling: Option fields decode empty cells as None
case class OptionalData(name: String, age: Option[Int], salary: Option[Double])
val optionalReader = new java.io.File("optional_data.csv")
  .asCsvReader[OptionalData](rfc.withHeader)
  .toVector
optionalReader.foreach {
  case Right(data) => println(s"Name: ${data.name}, Age: ${data.age}, Salary: ${data.salary}")
  case Left(error) => println(s"Error: ${error.getMessage}")
}
Latest Trends (2025)
- Scala 3 Support: Leveraging latest type system features
- Performance Improvements: ZIO and Cats Effect integration
- Enhanced Streaming: Large-scale data processing support
- Validation Features: Integration with refined types
- JSON Integration: Simplified CSV-JSON conversion
Summary
As of 2025, kantan.csv remains a standard choice for type-safe CSV processing in Scala. Its combination of compile-time type safety, automatic mapping, and customizability makes it possible to build robust, maintainable CSV processing applications, and it is well suited to the strict data processing requirements of fields such as finance and insurance.