Breeze
Overview
Breeze is a numerical computing library for Scala. It provides comprehensive numerical processing functionality for linear algebra, statistics, and machine learning, supporting matrix operations, signal processing, and optimization algorithms with a NumPy-like API.
Details
Breeze has been developed since 2011 as part of the ScalaNLP project, with the aim of giving Scala numerical computing capabilities comparable to Python's NumPy and MATLAB. Its key feature is an intuitive, expressive API that preserves Scala's type safety. The library combines linear algebra built around DenseVector and DenseMatrix with statistical functions, optimization algorithms, and signal processing. Matrix operations can be delegated to native BLAS (Basic Linear Algebra Subprograms) implementations for speed. Spark MLlib uses Breeze internally for linear algebra, so the library also appears in distributed machine learning environments. The API favors a functional style: operators such as + return new objects rather than modifying their operands. DenseVector and DenseMatrix are nevertheless mutable, and in-place operators such as += and := are available when performance matters.
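The difference between operators that return a new object and the in-place variants can be sketched as follows. This is a minimal illustration with made-up values, not a performance benchmark; the variable names are chosen for this example only.

```scala
import breeze.linalg._

val a = DenseVector(1.0, 2.0, 3.0)
val b = DenseVector(10.0, 20.0, 30.0)

// `+` allocates a new vector; `a` and `b` are left untouched.
val c = a + b

// `+=` updates `a` in place and avoids allocating a result vector.
a += b

// `:=` overwrites the contents of an existing vector element by element.
val buffer = DenseVector.zeros[Double](3)
buffer := c

println(s"c = $c, a = $a, buffer = $buffer")
```

In-place updates are mainly worth it inside tight loops, where repeated allocation of intermediate vectors would otherwise dominate the cost.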
Pros and Cons
Pros
- NumPy-like API: Intuitive and easy-to-learn interface
- Type Safety: Reduced runtime errors through Scala's type system
- High-Speed Operations: Optimized matrix operations through BLAS integration
- Rich Features: Comprehensive support for linear algebra, statistics, and optimization
- Spark Integration: Used internally by Spark MLlib, so it fits into distributed machine learning work
- Functional Style: Operators return new values by default, which keeps behavior predictable
Cons
- Memory Usage: Dense matrices are held entirely in memory, so large matrices are expensive
- Learning Curve: Requires knowledge of both Scala and numerical computing
- Performance: Some operations may be slower than their NumPy equivalents
- Ecosystem: Limited compared to Python's scientific computing environment
Main Use Cases
- Scientific computing and simulation
- Machine learning algorithm implementation
- Signal processing and image processing
- Statistical analysis and data analysis
- Optimization problem solving
- Numerical analysis
- Research and educational purposes
Basic Usage
Adding Dependencies
libraryDependencies ++= Seq(
  "org.scalanlp" %% "breeze" % "2.1.0",
  "org.scalanlp" %% "breeze-natives" % "2.1.0" // Native BLAS bindings (optional)
)
Basic Vector Operations
import breeze.linalg._
import breeze.numerics._
// Vector creation
val v1 = DenseVector(1.0, 2.0, 3.0, 4.0)
val v2 = DenseVector(2.0, 3.0, 4.0, 5.0)
// Basic operations
val sum = v1 + v2
val product = v1 *:* v2 // Element-wise product
val dotProduct = v1 dot v2 // Dot product
val v1Norm = norm(v1) // Euclidean norm; a val named norm would shadow the norm function
println(s"Sum: $sum")
println(s"Element-wise product: $product")
println(s"Dot product: $dotProduct")
println(s"Norm: $v1Norm")
// Vector transformations
val doubled = v1.map(_ * 2)
val filtered = DenseVector(v1.toArray.filter(_ > 2.0)) // Keep values above 2.0
val normalized = v1 / v1Norm
println(s"Doubled: $doubled")
println(s"Filtered: $filtered")
println(s"Normalized: $normalized")
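Besides whole-vector arithmetic, a DenseVector can be sliced by range. The sketch below assumes the usual Breeze behavior that a range slice is a view sharing storage with the original vector; the values are illustrative only.

```scala
import breeze.linalg._

val v = DenseVector(1.0, 2.0, 3.0, 4.0, 5.0)

// A range slice is a view onto the same storage, not a copy.
val middle = v(1 to 3)

// Assigning through the view writes into the original vector.
middle := 0.0

// Use .copy when an independent vector is needed.
val snapshot = v(0 to 1).copy
snapshot(0) = 99.0

println(s"v = $v, snapshot = $snapshot")
```

Because slices alias the original data, take a copy before mutating if the source vector must stay unchanged.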
Matrix Operations
import breeze.linalg._
import breeze.numerics._
// Matrix creation
val matrix1 = DenseMatrix(
  (1.0, 2.0, 3.0),
  (4.0, 5.0, 6.0),
  (7.0, 8.0, 9.0)
)
val matrix2 = DenseMatrix(
  (2.0, 0.0, 1.0),
  (1.0, 3.0, 2.0),
  (0.0, 1.0, 4.0)
)
// Matrix operations
val matrixSum = matrix1 + matrix2
val matrixProduct = matrix1 * matrix2
val transpose = matrix1.t
val inverse = inv(matrix2)
println(s"Matrix sum:\n$matrixSum")
println(s"Matrix product:\n$matrixProduct")
println(s"Transpose:\n$transpose")
println(s"Inverse:\n$inverse")
// Matrix decomposition
val svd.SVD(u, s, vt) = svd(matrix1)
println(s"SVD - U:\n$u")
println(s"SVD - S: $s")
println(s"SVD - Vt:\n$vt")
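Note that the inverse above is taken of matrix2 rather than matrix1. A quick check with det and rank, sketched here on a copy of matrix1, suggests why: its rows are linearly dependent, so it has no inverse. The name `singular` is used for this example only.

```scala
import breeze.linalg._

val singular = DenseMatrix(
  (1.0, 2.0, 3.0),
  (4.0, 5.0, 6.0),
  (7.0, 8.0, 9.0)
)

// The determinant is zero up to floating-point rounding.
val determinant = det(singular)

// The numerical rank is 2, lower than the matrix dimension of 3.
val matrixRank = rank(singular)

println(s"det = $determinant, rank = $matrixRank")
```

Checking the rank before calling inv is a cheap way to avoid a failure or a numerically meaningless result on near-singular input.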
Statistical Functions
import breeze.linalg._
import breeze.stats._
// Data preparation
val data = DenseVector(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0)
// Basic statistics
val mean = breeze.stats.mean(data)
val variance = breeze.stats.variance(data)
val stddev = breeze.stats.stddev(data)
val median = breeze.stats.median(data)
println(s"Mean: $mean")
println(s"Variance: $variance")
println(s"Standard deviation: $stddev")
println(s"Median: $median")
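// Distributions
import breeze.stats.distributions._
// Breeze 2.x expects an implicit RandBasis in scope when constructing a distribution
implicit val randBasis: RandBasis = RandBasis.systemSeed
val normal = Gaussian(0.0, 1.0)
val samples = normal.sample(1000)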
val density = normal.pdf(0.0)
println(s"Normal distribution density at 0: $density")
println(s"Sample size: ${samples.length}")
// Histogram
val hist = breeze.stats.hist(DenseVector(samples: _*), 20)
println(s"Histogram bins: ${hist.hist.length}")
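Sampling as shown above gives different values on every run. When reproducible draws are needed, for example in tests, one option is to pass a seeded RandBasis explicitly. The seed value 42 is arbitrary and chosen for this sketch.

```scala
import breeze.stats.distributions._

// Two bases built from the same seed produce the same stream of draws.
val first = Gaussian(0.0, 1.0)(RandBasis.withSeed(42)).sample(5)
val second = Gaussian(0.0, 1.0)(RandBasis.withSeed(42)).sample(5)

println(s"First run:  $first")
println(s"Second run: $second")
```

Passing the basis explicitly keeps the seeding local to one distribution instead of changing the randomness used elsewhere in the program.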
Optimization
import breeze.optimize._
import breeze.linalg._
// Objective function definition (Rosenbrock function)
def rosenbrock(x: DenseVector[Double]): Double = {
  val a = 1.0
  val b = 100.0
  (a - x(0)) * (a - x(0)) + b * (x(1) - x(0) * x(0)) * (x(1) - x(0) * x(0))
}
// Gradient definition
def rosenbrockGradient(x: DenseVector[Double]): DenseVector[Double] = {
  val a = 1.0
  val b = 100.0
  DenseVector(
    -2.0 * (a - x(0)) - 4.0 * b * x(0) * (x(1) - x(0) * x(0)),
    2.0 * b * (x(1) - x(0) * x(0))
  )
}
// Optimization problem setup
val f = new DiffFunction[DenseVector[Double]] {
  def calculate(x: DenseVector[Double]) = (rosenbrock(x), rosenbrockGradient(x))
}
// Initial guess
val initialGuess = DenseVector(-1.0, 1.0)
// LBFGS optimization
val lbfgs = new LBFGS[DenseVector[Double]]()
val result = lbfgs.minimize(f, initialGuess)
println(s"Optimization result: $result")
println(s"Minimum value: ${rosenbrock(result)}")
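A hand-written gradient is a common source of bugs in this kind of setup. One way to sanity-check it is to compare it against a finite-difference approximation using Breeze's ApproximateGradientFunction. The sketch below redefines the Rosenbrock function as a function value so that it is self-contained; the comparison point is arbitrary.

```scala
import breeze.linalg._
import breeze.optimize._

val rosenbrockFn: DenseVector[Double] => Double = { x =>
  val d = x(1) - x(0) * x(0)
  (1.0 - x(0)) * (1.0 - x(0)) + 100.0 * d * d
}

def analyticGradient(x: DenseVector[Double]): DenseVector[Double] = {
  val d = x(1) - x(0) * x(0)
  DenseVector(-2.0 * (1.0 - x(0)) - 400.0 * x(0) * d, 200.0 * d)
}

// Finite-difference approximation of the gradient of rosenbrockFn.
val numeric = new ApproximateGradientFunction(rosenbrockFn)

val point = DenseVector(-1.0, 1.0)
// The gap should be small relative to the gradient's magnitude.
val gap = norm(numeric.gradientAt(point) - analyticGradient(point))

println(s"Analytic: ${analyticGradient(point)}, gap to numeric: $gap")
```

The approximation carries truncation error, so expect a small nonzero gap rather than exact agreement.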
Machine Learning Example
import breeze.linalg._
import breeze.numerics._
import breeze.stats._
// Linear regression implementation
object LinearRegression {
  def fit(X: DenseMatrix[Double], y: DenseVector[Double]): DenseVector[Double] = {
    // Normal equation: β = (X^T X)^(-1) X^T y
    val Xt = X.t
    val XtX = Xt * X
    val XtXinv = inv(XtX)
    val Xty = Xt * y
    XtXinv * Xty
  }
  def predict(X: DenseMatrix[Double], weights: DenseVector[Double]): DenseVector[Double] = {
    X * weights
  }
}
// Sample data generation
val n = 100
val X = DenseMatrix.rand(n, 3)
val trueWeights = DenseVector(2.0, -1.0, 0.5)
val noise = DenseVector.rand(n) * 0.1
val y = X * trueWeights + noise
// Linear regression execution
val estimatedWeights = LinearRegression.fit(X, y)
val predictions = LinearRegression.predict(X, estimatedWeights)
// Evaluation
val mse = mean((y - predictions) *:* (y - predictions)) // *:* is the element-wise product
println(s"True weights: $trueWeights")
println(s"Estimated weights: $estimatedWeights")
println(s"Mean Squared Error: $mse")
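Explicitly inverting X^T X can be numerically fragile when columns are nearly collinear. A common alternative, sketched below, solves the linear system with the `\` operator and optionally adds a ridge penalty. The helper `fitRidge` and the parameter `lambda` are hypothetical names introduced for this example, and the data is a small noise-free set chosen so the result is easy to verify.

```scala
import breeze.linalg._

// Hypothetical helper: solves (X^T X + lambda * I) beta = X^T y.
def fitRidge(x: DenseMatrix[Double], y: DenseVector[Double], lambda: Double): DenseVector[Double] = {
  val xtx = x.t * x
  val regularized = xtx + DenseMatrix.eye[Double](x.cols) * lambda
  regularized \ (x.t * y)
}

val features = DenseMatrix(
  (1.0, 0.0),
  (0.0, 1.0),
  (1.0, 1.0),
  (2.0, 1.0)
)
val knownWeights = DenseVector(2.0, -1.0)
val targets = features * knownWeights // Noise-free targets

// With lambda = 0 this reduces to ordinary least squares.
val recovered = fitRidge(features, targets, lambda = 0.0)
// A positive lambda shrinks the weights toward zero.
val shrunk = fitRidge(features, targets, lambda = 1.0)

println(s"Recovered: $recovered, shrunk: $shrunk")
```

With noise-free data and no penalty, the recovered weights match the known weights up to rounding.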
Signal Processing
import breeze.linalg._
import breeze.numerics._
import breeze.signal._
import breeze.stats.mean // Needed for the moving-average filter below
// Signal generation
val t = linspace(0.0, 1.0, 1000)
val frequency1 = 50.0
val frequency2 = 120.0
// Keep the vector on the left-hand side of each scalar multiplication
val signal = sin(t * (2.0 * math.Pi * frequency1)) + sin(t * (2.0 * math.Pi * frequency2)) * 0.5
// Noise addition
val noise = DenseVector.rand(signal.length) * 0.2 - 0.1
val noisySignal = signal + noise
// Filtering
val windowSize = 10
val filtered = DenseVector.zeros[Double](signal.length)
for (i <- windowSize until signal.length - windowSize) {
  filtered(i) = mean(noisySignal(i - windowSize to i + windowSize))
}
// FFT
val fft = fourierTr(noisySignal)
val magnitude = fft.map(c => math.sqrt(c.real * c.real + c.imag * c.imag))
println(s"Original signal length: ${signal.length}")
println(s"Filtered signal length: ${filtered.length}")
println(s"FFT magnitude length: ${magnitude.length}")
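The FFT magnitudes become more useful once bin indices are mapped back to frequencies. The sketch below assumes a sample rate of 1000 Hz over exactly 1000 samples, so that each bin corresponds to 1 Hz, and uses a noise-free signal to keep the result deterministic. These values are assumptions for illustration and differ slightly from the linspace-based sampling above.

```scala
import breeze.linalg._
import breeze.numerics._
import breeze.signal._

val sampleRate = 1000.0 // Samples per second (assumed)
val n = 1000
val time = DenseVector.tabulate(n)(i => i / sampleRate)

// 50 Hz component at full amplitude, 120 Hz component at half amplitude.
val clean = sin(time * (2.0 * math.Pi * 50.0)) + sin(time * (2.0 * math.Pi * 120.0)) * 0.5

val spectrum = fourierTr(clean).map(c => math.hypot(c.real, c.imag))

// For a real-valued signal only the first half of the spectrum is informative.
val peakBin = argmax(spectrum(0 until n / 2))
val peakFrequency = peakBin * sampleRate / n

println(s"Dominant frequency: $peakFrequency Hz")
```

The dominant peak lands on the stronger 50 Hz component; the 120 Hz component shows up as a second, smaller peak.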
Image Processing Example
import breeze.linalg._
import breeze.numerics._
// Image data representation (grayscale)
val width = 100
val height = 100
val image = DenseMatrix.rand(height, width)
// Gaussian filter creation
def gaussianKernel(size: Int, sigma: Double): DenseMatrix[Double] = {
  val kernel = DenseMatrix.zeros[Double](size, size)
  val center = size / 2
  var sum = 0.0
  for (i <- 0 until size; j <- 0 until size) {
    val x = i - center
    val y = j - center
    val value = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
    kernel(i, j) = value
    sum += value
  }
  kernel / sum // Normalization
}
// Convolution operation (border pixels are left at zero)
def convolve(image: DenseMatrix[Double], kernel: DenseMatrix[Double]): DenseMatrix[Double] = {
  val result = DenseMatrix.zeros[Double](image.rows, image.cols)
  val kSize = kernel.rows
  val kCenter = kSize / 2
  for (i <- kCenter until image.rows - kCenter;
       j <- kCenter until image.cols - kCenter) {
    var sum = 0.0
    for (ki <- 0 until kSize; kj <- 0 until kSize) {
      sum += image(i - kCenter + ki, j - kCenter + kj) * kernel(ki, kj)
    }
    result(i, j) = sum
  }
  result
}
// Filter application
val gaussianFilter = gaussianKernel(5, 1.0)
val filteredImage = convolve(image, gaussianFilter)
println(s"Original image size: ${image.rows} x ${image.cols}")
println(s"Filtered image size: ${filteredImage.rows} x ${filteredImage.cols}")
Latest Trends (2025)
- Scala 3 Support: Breeze 2.x is published for Scala 3 as well as Scala 2
- GPU Acceleration: Interest in GPU (CUDA) backends for faster matrix operations
- Spark 3.5 Integration: Continued use alongside Spark for distributed machine learning
- Deep Learning Support: Interoperation with deep learning libraries such as TensorFlow Scala
- WebAssembly: Exploration of numerical computing in browsers
Summary
Breeze remains a foundation of the Scala scientific computing ecosystem in 2025. Its type safety, NumPy-like API, and BLAS-backed operations support numerical computing and machine learning development in Scala. Spark MLlib relies on it for linear algebra in distributed machine learning, and it is also used in teaching and research at universities and research institutions.