# postgresql-replicant A PostgreSQL streaming logical replication client for Haskell. React to database changes in real-time. This library is currently _EXPERIMENTAL_ and is not ready for production yet. Developers are encouraged to test out the library, add issues, contribute and make it awesome. For more, read the find [documentation](https://agentultra.github.io/postgresql-replicant "postgresql-replicant documentation") ## Setup In order to use a logical replication client you need to set the `wal_level` in Postgres to `logical`. You can set that, for example, in your `postgres.conf` file: wal_level = logical Then restart your server, log in as `postgres` or another super user and check: SHOW wal_level It should show _logical_. You will also need a user who is [allowed](https://www.postgresql.org/docs/9.1/sql-alterrole.html "PostgreSQL user role documentation") to use replication features. Then add a database and your user with the `REPLICATION` trait to the database, grant all on it if you need to. We haven't added authorization support yet to the library so make sure in `pg_hba.conf` you `trust` local connections: host all all 127.0.0.1/32 trust Caveat emptor. You will also need to install the `wal2json` plugin for your PostgreSQL server. For example in Ubuntu-like distributions you can: $ sudo apt install postgresql-12-wal2json Assuming you have installed PostgreSQL from the package repositories. You can then setup a basic "hello world" program using this library: ``` haskell module Main where import Control.Exception import Database.PostgreSQL.Replicant main :: IO () main = do let settings = PgSettings "my-user" (Just "my-password") "my-database" "localhost" "5432" "testing" withLogicalStream settings $ \change -> do print change return $ changeNextLSN change `catch` \err -> print err ``` Which should connect, create a `testing` replication slot, and start sending your callback the changes to your database as they arrive. The type of the callback to `withLogicalStream` is: Change -> IO LSN Note the return type. The *LSN* or _Log Sequence Number_ is used by replicant after running your callback to update its internal stream state and tell PostgreSQL that you have consumed this change. This means that if the connection fails and replicant reconnects to the _same_ slot it will restart the stream at the last *LSN* replicant was able to successfully send to the server. All you have to do is return `changeNextLSN change` at the end of your callback. ### WAL Output Plugins Presently we only support the `wal2json` WAL output plugin and only the v1 format. Support for the v2 format is planned as are more output plugins. ## Example Program Included is a simple program, `replicant-example` which should help you test your connection settings and make sure that you can connect and receive WAL changes. You can change the connection settings through environment variables: PG_USER=myuser PG_DATABASE=mydb replicant-example The configuration settings are: - `PG_USER`: The replication user to connect as - `PG_DATABASE`: The database to connect to - `PG_HOST`: The host name of the database to connect to - `PG_PORT`: The port to connect on - `PG_SLOTNAME`: The replication slot to create or connect to - `PG_UPDATEDELAY`: The frequency (in _ms_) to update PostgreSQL on the stream status ## Profiling To enable profile builds, see `stack.yaml` Build `stack build` Then run `stack exec --profile -- replicant-example +RTS -p` (make sure to set the right database configs) Run `bin/bench.sh my-database create` if this is the first time running the bench. This will create a table called, `replicant_profile`. Run `bin/bench.sh my-database`, wait a bit, cancel it. It will insert a bunch of rows into `replicant_profile`. Feel free to run other SQL statements that modify the table while the bench is running. Cancel the script when you're done. Then you can generate reports, see the [GHC Manual on Profiling](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/profiling.html# "GHC Manual on Profiling") ## FAQ ### Why not use LISTEN/NOTIFY You don't have to set up triggers for each table with logical replication. You also don't have the message size limitation. That limitation often forces clients to perform a query for each message to fetch the data and can use more than one connection. We only use one connection. ### What are the trade offs? This library uses logical replication slots. If your program loses the connection to the database and cannot reconnect the server will hold onto WAL files until your program can reconnect and resume the replication stream. On a busy database, depending on your configuration, this can eat up disk space quickly. It's a good idea to monitor your replication slots. If `postgresql-replicant` cannot reconnect to the slot and resume the stream you may need to have a strategy for dropping the slot and recreating it once conditions return to normal. Have a strategy.