symbolic-regression: Symbolic Regression in Haskell

[ data, library, mit ] [ Propose Tags ] [ Report a vulnerability ]

Modules

  • Symbolic
    • Symbolic.Regression

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees

Candidates

  • No Candidates
Versions [RSS] 0.1.0.0, 0.1.0.1
Change log CHANGELOG.md
Dependencies attoparsec (>=0.14.4 && <0.15), attoparsec-expr (>=0.1.1.2 && <0.2), base (>=4 && <5), binary (>=0.8.9.1 && <0.9), bytestring (>=0.11 && <0.13), containers (>=0.6.7 && <0.8), dataframe (>=0.4 && <0.5), directory, dlist (>=1.0 && <1.1), exceptions (>=0.10.7 && <0.11), filepath (>=1.4.0.0 && <1.6), hashable (>=1.4.4.0 && <1.6), ieee754 (>=0.8.0 && <0.9), lens (>=5.2.3 && <5.4), list-shuffle (>=1.0.0.1 && <1.1), massiv (>=1.0.4.1 && <1.1), mtl (>=2.2 && <2.4), optparse-applicative (>=0.17 && <0.19), random (>=1.2 && <1.4), scheduler (>=2.0.0.1 && <3), split (>=0.2.5 && <0.3), srtree (>=2.0.1.5 && <3), statistics (>=0.16.2.1 && <0.17), text (>=2.0 && <3), time, transformers (>=0.6.1.0 && <0.7), unliftio (>=0.2.10 && <1), unliftio-core (>=0.2.1 && <1), unordered-containers (>=0.2 && <0.3), vector (>=0.12 && <0.14), zlib (>=0.6.3 && <0.8) [details]
License MIT
Author DataHaskell
Maintainer mschavinda@gmail.com
Uploaded by mchav at 2026-01-20T16:56:04Z
Category Data
Home page https://github.com/DataHaskell/symbolic-regression
Bug tracker https://github.com/DataHaskell/symbolic-regression/issues
Distributions
Downloads 0 total (0 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs not available [build log]
All reported builds failed as of 2026-01-20 [all 2 reports]

Readme for symbolic-regression-0.1.0.1

[back to package description]

symbolic-regression

A Haskell library (based on eggp which is in turn based on srtree) for symbolic regression on DataFrames. Automatically discover mathematical expressions that best fit your data using genetic programming with e-graph optimization.

Overview

symbolic-regression integrates symbolic regression capabilities into a DataFrame workflow. Given a target column and a dataset, it evolves mathematical expressions that predict the target variable, returning a Pareto front of expressions trading off complexity and accuracy.

Quick Start

ghci> import qualified DataFrame as D
ghci> import DataFrame.Functions ((.=))
ghci> import Symbolic.Regression

-- Load your data
ghci> df <- D.readParquet "./data/mtcars.parquet"

-- Run symbolic regression to predict 'mpg'
-- NOTE: ALL COLUMNS MUST BE CONVERTED TO DOUBLE FIRST
-- e.g df' = D.derive "some_column" (F.toDouble (F.col @Int "some_column")) df
-- Symbolic regression will by default only use the double columns
-- otherwise.
ghci> exprs <- fit defaultRegressionConfig mpg df

-- View discovered expressions (Pareto front from simplest to most complex)
ghci> map D.prettyPrint exprs
-- [ qsec,
-- , 57.33 / wt
-- , 10.75 + (1557.67 / disp)]

-- Create named expressions that we'll use in a dataframe.
ghci> levels = zipWith (.=) ["level_1", "level_2", "level_3"] exprs

-- Show the various predictions in our dataframe.
ghci> D.deriveMany levels df

-- Or pick the best one
ghci> D.derive "prediction" (last exprs) df

Configuration

Customize the search with RegressionConfig:

data RegressionConfig = RegressionConfig
    { generations              :: Int      -- Number of evolutionary generations (default: 100)
    , maxExpressionSize        :: Int      -- Maximum tree depth/complexity (default: 5)
    , numFolds                 :: Int      -- Cross-validation folds (default: 3)
    , showTrace                :: Bool     -- Print progress during evolution (default: True)
    , lossFunction             :: Distribution  -- MSE, Gaussian, Poisson, etc. (default: MSE)
    , numOptimisationIterations :: Int     -- Parameter optimization iterations (default: 30)
    , numParameterRetries      :: Int      -- Retries for parameter fitting (default: 2)
    , populationSize           :: Int      -- Population size (default: 100)
    , tournamentSize           :: Int      -- Tournament selection size (default: 3)
    , crossoverProbability     :: Double   -- Crossover rate (default: 0.95)
    , mutationProbability      :: Double   -- Mutation rate (default: 0.3)
    , unaryFunctions           :: [...]    -- Unary operations to include
    , binaryFunctions          :: [...]    -- Binary operations to include
    , numParams                :: Int      -- Number of parameters (-1 for auto)
    , generational             :: Bool     -- Use generational replacement (default: False)
    , simplifyExpressions      :: Bool     -- Simplify output expressions (default: True)
    , maxTime                  :: Int      -- Time limit in seconds (-1 for none)
    , dumpTo                   :: String   -- Save e-graph state to file
    , loadFrom                 :: String   -- Load e-graph state from file
    }

Example: Custom Configuration

myConfig :: RegressionConfig
myConfig = defaultRegressionConfig
    { generations = 200
    , maxExpressionSize = 7
    , populationSize = 200
    }

exprs <- fit myConfig targetColumn df

Output

fit returns a list of expressions representing the Pareto front, ordered by complexity (simplest first). Each expression:

  • Is a valid Expr Double that can be used with DataFrame operations
  • Represents a different trade-off between simplicity and accuracy
  • Has optimized numerical constants

How It Works

  1. Genetic Programming: Evolves a population of expression trees through selection, crossover, and mutation
  2. E-graph Optimization: Uses equality saturation to discover equivalent expressions and simplify
  3. Parameter Optimization: Fits numerical constants using nonlinear optimization
  4. Pareto Selection: Returns expressions across the complexity-accuracy frontier

Dependencies

System dependencies

To install symbolic-regression you'll need:

  • libz: sudo apt install libz-dev
  • libnlopt: sudo apt install libnlopt-dev
  • libgmp: sudo apt install libgmp-dev