kantan.csv
Overview
kantan.csv is a type-safe CSV library that leverages Scala's type system. It achieves safe and efficient CSV processing through compile-time CSV data type checking, automatic case class mapping, and custom encoder/decoder definitions.
Details
kantan.csv was developed in 2016 with the goal of type-safe CSV processing in Scala. By leveraging Scala's powerful type system, it ensures type safety of CSV data at compile time, significantly reducing runtime errors. Its key feature is automatic mapping between case classes and CSV, enabling CSV reading and writing with minimal boilerplate code. The type class-based design allows intuitive definition of encoders and decoders for custom data types. Using the Shapeless library for automatic derivation, it enables type-safe CSV operations even for complex data structures. Error handling is also type-safe, properly handling parse errors and type conversion errors. Following functional programming principles, it provides APIs with immutable data structures and pure functions.
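The type-class design described above can be illustrated with a self-contained sketch. This is a simplified model of the pattern, not kantan.csv's actual API; the names `Decode` and `decodeRow` are invented for illustration.

```scala
// Simplified model of the type-class pattern kantan.csv builds on:
// a per-cell decoder type class, plus row decoding derived from it.
trait Decode[A] { def decode(cell: String): Either[String, A] }

object Decode {
  def apply[A](implicit d: Decode[A]): Decode[A] = d
  def from[A](f: String => Either[String, A]): Decode[A] =
    (cell: String) => f(cell)

  implicit val str: Decode[String] = from(Right(_))
  implicit val int: Decode[Int] =
    from(s => s.toIntOption.toRight(s"not an Int: $s"))
}

case class Employee(name: String, age: Int)

// A row decoder built from the cell decoders of the fields
def decodeRow(cells: List[String]): Either[String, Employee] = cells match {
  case n :: a :: Nil =>
    for {
      name <- Decode[String].decode(n)
      age  <- Decode[Int].decode(a)
    } yield Employee(name, age)
  case other => Left(s"expected 2 cells, got ${other.size}")
}

// decodeRow(List("Alice", "30")) == Right(Employee("Alice", 30))
// decodeRow(List("Alice", "x")).isLeft == true
```

In the real library, Shapeless derives the equivalent of `decodeRow` automatically from a case class's field types, which is why adding a `CellDecoder` for a new field type is all that is needed to read rows containing it.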
Pros and Cons
Pros
- Type Safety: Reduces runtime errors through compile-time type checking
- Automatic Mapping: Automatic conversion between case classes and CSV
- Customizable: Flexible encoder/decoder definitions
- Functional API: Predictable behavior through pure functions
- Rich Type Support: Broad support for standard and custom types
- Error Handling: Type-safe error processing
Cons
- Learning Curve: Requires understanding of type classes
- Compilation Time: Increased compile time due to type-level computations
- Error Messages: Complex type error messages
- Performance: Overhead for type safety guarantees
Main Use Cases
- Data analysis preprocessing
- ETL processing
- Log file analysis
- Configuration file processing
- Report generation
- Data validation
- Financial data processing
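As a sketch of the data-validation use case, the `Either`-based results that kantan.csv produces can be aggregated with plain Scala collection operations. The `Row` type and `validate` function here are hypothetical stand-ins for a decoded record and its validation rule:

```scala
// Hypothetical validation of raw CSV fields: each row yields either a
// value (Right) or an error message (Left), mirroring the shape of
// kantan.csv's ReadResult (an Either[ReadError, A]).
case class Row(name: String, age: Int)

def validate(fields: List[String]): Either[String, Row] = fields match {
  case name :: ageStr :: Nil =>
    ageStr.toIntOption match {
      case Some(age) if age > 0 => Right(Row(name, age))
      case _                    => Left(s"invalid age: $ageStr")
    }
  case other => Left(s"wrong arity: $other")
}

val rows = List(List("Alice", "30"), List("Bob", "-1"), List("Carol"))
// partitionMap (Scala 2.13+) splits Lefts from Rights in one pass
val (errors, valid) = rows.map(validate).partitionMap(identity)
// errors == List("invalid age: -1", "wrong arity: List(Carol)")
// valid  == List(Row("Alice", 30))
```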
Basic Usage
Adding Dependencies
libraryDependencies ++= Seq(
  "com.nrinaudo" %% "kantan.csv"         % "0.7.0",
  // needed for the Shapeless-based case class derivation used below
  "com.nrinaudo" %% "kantan.csv-generic" % "0.7.0"
)
Basic CSV Reading
import kantan.csv._
import kantan.csv.ops._
// Reading a CSV file (rfc is the default RFC 4180 configuration;
// withHeader skips the header row)
val csvFile = new java.io.File("data.csv")
// Reading raw rows as lists of strings
val rawRows = csvFile.asCsvReader[List[String]](rfc.withHeader).toVector
// Type-safe reading as tuples
val tuples = csvFile.asCsvReader[(String, Int, Double)](rfc.withHeader).toVector
// Each entry is a ReadResult, i.e. an Either[ReadError, A]
tuples.foreach {
  case Right((name, age, salary)) =>
    println(s"Name: $name, Age: $age, Salary: $salary")
  case Left(error) =>
    println(s"Error: ${error.getMessage}")
}
Automatic Mapping with Case Classes
import kantan.csv._
import kantan.csv.ops._
import kantan.csv.generic._
// Data structure definition
case class Employee(name: String, age: Int, salary: Double, department: String)
// Reading with automatic mapping (derived via kantan.csv.generic)
val employees = new java.io.File("employees.csv")
  .asCsvReader[Employee](rfc.withHeader)
  .toVector
// Keep only the successfully decoded records
val validEmployees = employees.collect {
  case Right(employee) => employee
}
println(s"Valid employees: ${validEmployees.size}")
validEmployees.foreach(println)
Custom Data Type Support
import kantan.csv._
import kantan.csv.ops._
import kantan.csv.generic._ // derives the RowDecoder[Person] used below
import java.time.LocalDate
import java.time.format.DateTimeFormatter
// Custom cell decoder: DecodeResult(...) catches any exception thrown
// by LocalDate.parse and turns it into a decoding error
implicit val localDateDecoder: CellDecoder[LocalDate] =
  CellDecoder.from { str =>
    DecodeResult(LocalDate.parse(str, DateTimeFormatter.ofPattern("yyyy-MM-dd")))
  }
// Custom cell encoder
implicit val localDateEncoder: CellEncoder[LocalDate] =
  CellEncoder.from(_.format(DateTimeFormatter.ofPattern("yyyy-MM-dd")))
(The kantan.csv-java8 module also ships ready-made codecs for java.time types.)
// Case class with custom types
case class Person(name: String, birthDate: LocalDate, isActive: Boolean)
// Usage example
val people = new java.io.File("people.csv")
  .asCsvReader[Person](rfc.withHeader)
  .toVector
val validPeople = people.collect {
  case Right(person) => person
}
validPeople.foreach(println)
CSV Writing
import kantan.csv._
import kantan.csv.ops._
import kantan.csv.generic._ // derives the RowEncoder[Employee] used below
// Data preparation
val employees = List(
Employee("Alice", 30, 50000.0, "IT"),
Employee("Bob", 25, 45000.0, "Sales"),
Employee("Charlie", 35, 60000.0, "IT")
)
// Writing to a file, with an explicit header row
val outputFile = new java.io.File("output.csv")
outputFile.writeCsv(employees, rfc.withHeader("name", "age", "salary", "department"))
// Alternatively, write records one by one through a CsvWriter
val writer = outputFile.asCsvWriter[Employee](rfc.withHeader("name", "age", "salary", "department"))
employees.foreach(writer.write)
writer.close()
Advanced CSV Operations
import kantan.csv._
import kantan.csv.ops._
import kantan.csv.generic._
// Complex data structure
case class Address(street: String, city: String, zipCode: String)
case class EmployeeWithAddress(
name: String,
age: Int,
salary: Double,
address: Address
)
// Nested structure: decode the flat row into a tuple, then build the
// nested case classes (columns: name, age, salary, street, city, zipCode)
implicit val employeeWithAddressDecoder: RowDecoder[EmployeeWithAddress] =
  RowDecoder[(String, Int, Double, String, String, String)].map {
    case (name, age, salary, street, city, zipCode) =>
      EmployeeWithAddress(name, age, salary, Address(street, city, zipCode))
  }
// Conditional filtering
val highSalaryEmployees = new java.io.File("employees.csv")
  .asCsvReader[Employee](rfc.withHeader)
  .toVector
  .collect {
    case Right(employee) if employee.salary > 50000 => employee
  }
// Grouping
val employeesByDepartment = highSalaryEmployees.groupBy(_.department)
employeesByDepartment.foreach { case (dept, emps) =>
println(s"Department: $dept, Count: ${emps.size}")
}
Error Handling and Validation
import kantan.csv._
import kantan.csv.ops._
// Custom validation
case class ValidatedEmployee(name: String, age: Int, salary: Double) {
require(name.nonEmpty, "Name cannot be empty")
require(age > 0 && age < 120, "Age must be between 1 and 119")
require(salary > 0, "Salary must be positive")
}
// Decoder with validation: decode the raw fields first, then let
// DecodeResult(...) capture the IllegalArgumentException thrown by require
implicit val validatedEmployeeDecoder: RowDecoder[ValidatedEmployee] =
  RowDecoder[(String, Int, Double)].emap { case (name, age, salary) =>
    DecodeResult(ValidatedEmployee(name, age, salary))
  }
// Error detail processing
val results = new java.io.File("employees.csv")
  .asCsvReader[ValidatedEmployee](rfc.withHeader)
  .toVector
// partitionMap splits the Either results into errors and successes
val (errors, validEmployees) = results.partitionMap(identity)
println(s"Valid records: ${validEmployees.size}")
println(s"Invalid records: ${errors.size}")
errors.foreach(error => println(s"Error: ${error.getMessage}"))
Streaming Processing
import kantan.csv._
import kantan.csv.ops._
import kantan.csv.generic._
// Streaming processing of large CSV files. CsvReader is lazy: rows are
// only read as they are consumed, and the underlying resource is closed
// automatically once the iterator is exhausted.
def processLargeCSV(filename: String): Unit = {
  new java.io.File(filename)
    .asCsvReader[Employee](rfc.withHeader)
    .collect { case Right(employee) if employee.salary > 50000 => employee }
    .foreach { employee =>
      // Per-record processing; batching (e.g. accumulating groups of
      // 100 records) can be layered on top of this loop if needed
      println(s"Processing ${employee.name}")
    }
}
processLargeCSV("large_employees.csv")
Configuration and Customization
import kantan.csv._
import kantan.csv.ops._
import kantan.csv.generic._
// Custom CSV configuration
val customConfig = rfc.withCellSeparator(';').withQuote('"')
// Reading with custom configuration
val customReader = new java.io.File("semicolon_separated.csv")
.asCsvReader[Employee](customConfig)
// Header mapping
case class PersonWithHeader(firstName: String, lastName: String, age: Int)
implicit val personHeaderDecoder: HeaderDecoder[PersonWithHeader] =
HeaderDecoder.decoder("first_name", "last_name", "age")(PersonWithHeader.apply)
val headerBasedReader = new java.io.File("with_headers.csv")
.asCsvReader[PersonWithHeader](rfc.withHeader)
.toVector
// Empty cell handling: Option fields decode empty cells as None
case class OptionalData(name: String, age: Option[Int], salary: Option[Double])
val optionalReader = new java.io.File("optional_data.csv")
  .asCsvReader[OptionalData](rfc.withHeader)
  .toVector
optionalReader.foreach {
  case Right(data) => println(s"Name: ${data.name}, Age: ${data.age}, Salary: ${data.salary}")
  case Left(error) => println(s"Error: ${error.getMessage}")
}
Latest Trends (2025)
- Scala 3 Support: Leveraging latest type system features
- Performance Improvements: ZIO and Cats Effect integration
- Enhanced Streaming: Large-scale data processing support
- Validation Features: Integration with refined types
- JSON Integration: Simplified CSV-JSON conversion
Summary
As of 2025, kantan.csv remains a standard choice for type-safe CSV processing in Scala. Its combination of compile-time type safety, automatic mapping, and customizability makes it possible to build robust, maintainable CSV processing applications, and it is well suited to the strict data processing requirements of fields such as finance and insurance.