dataframe-persistent
Persistent database integration for the Haskell DataFrame library.
Overview
This package provides seamless integration between the dataframe library and the persistent database library, allowing you to:
- Load database entities directly into DataFrames
- Perform DataFrame operations on database data
- Save DataFrame results back to the database
- Work with type-safe database entities
Installation
Add to your package.yaml:
dependencies:
- dataframe ^>= 0.3
- dataframe-persistent ^>= 0.1
- persistent >= 2.14
- persistent-sqlite >= 2.13 # or your preferred backend
Or to your .cabal file:
build-depends:
dataframe ^>= 0.3,
dataframe-persistent ^>= 0.1,
persistent >= 2.14,
persistent-sqlite >= 2.13
Quick Start
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE UndecidableInstances #-}
import Control.Monad.IO.Class (liftIO)
import Database.Persist
import Database.Persist.Sqlite
import Database.Persist.TH
import qualified DataFrame as DF
import qualified DataFrame.Functions as F
import DataFrame.IO.Persistent
import DataFrame.IO.Persistent.TH
import qualified Data.Vector as V
-- Define your entities
share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persistLowerCase|
TestUser
name Text
age Int
active Bool
deriving Show Eq
|]
-- Derive DataFrame instances
$(derivePersistentDataFrame ''TestUser)
-- Example usage
main :: IO ()
main = runSqlite "example.db" $ do
-- Run migrations
runMigration migrateAll
-- Insert some test data
_ <- insert $ TestUser "Alice" 25 True
_ <- insert $ TestUser "Bob" 30 False
_ <- insert $ TestUser "Charlie" 35 True
-- Load from database
allUsersDF <- fromPersistent @TestUser []
liftIO $ putStrLn $ "Loaded " ++ show (nRows allUsersDF) ++ " users"
-- Load with filters
activeUsersDF <- fromPersistent @TestUser [TestUserActive ==. True]
liftIO $ putStrLn $ "Active users: " ++ show (nRows activeUsersDF)
-- Process with DataFrame operations
let youngUsers = DF.filter @Int "age" (< 30) allUsersDF
ages = V.toList $ DF.columnAsVector @Int "age" youngUsers
liftIO $ putStrLn $ "Young user ages: " ++ show ages
-- Custom configuration
let config = defaultPersistentConfig
{ pcIdColumnName = "user_id"
, pcIncludeId = True
}
customDF <- fromPersistentWith @TestUser config []
liftIO $ putStrLn $ "Columns with custom config: " ++ show (DF.columnNames customDF)
Features
- Type-safe conversions between Persistent entities and DataFrames
- Template Haskell support for automatic instance generation
- Configurable loading with batch size and column selection
- Column name cleaning - removes table prefixes automatically (e.g.,
test_user_name → name)
- Type preservation - maintains proper types for Text, Int, Bool, Day, etc.
- Empty DataFrame support - preserves column structure even with no data
- Support for all Persistent backends (SQLite, PostgreSQL, MySQL, etc.)
Configuration Options
data PersistentConfig = PersistentConfig
{ pcBatchSize :: Int -- Number of records to fetch at once (default: 10000)
, pcIncludeId :: Bool -- Whether to include entity ID as column (default: True)
, pcIdColumnName :: Text -- Name for the ID column (default: "id")
}
Advanced Usage
You can also extract fields from individual entities:
let user = TestUser "Alice" 25 True
columns = persistFieldsToColumns user
-- Result: [("name", SomeColumn ["Alice"]), ("age", SomeColumn [25]), ("active", SomeColumn [True])]
Working with Vector Data
-- Extract specific column data
let names = V.toList $ DF.columnAsVector @Text "name" df
ages = V.toList $ DF.columnAsVector @Int "age" df
activeFlags = V.toList $ DF.columnAsVector @Bool "active" df
Examples
For comprehensive examples and test cases, see:
Status
This package is actively maintained and tested. Current test coverage includes:
- ✅ Entity loading with and without filters
- ✅ Custom configuration options
- ✅ DataFrame operations on Persistent data
- ✅ Empty result set handling
- ✅ Field extraction utilities
- ✅ Multi-table relationships
Documentation
For detailed documentation, see:
License
GPL-3.0-or-later (same as the main dataframe package)