Readme
===============================================================================

This file will attempt to detail the assumptions and workflow of the project.
There is a [ticket system](http://ideastest.science.uu.nl/trac) to keep track
of what has been done and what still needs to be done.


Installation
-------------------------------------------------------------------------------

### GHC

We develop in a bare Haskell Platform environment. Stack is not used at the
moment, due to the complexity of importing local packages that are not yet in
Hackage.

At the moment, the code needs to work with our Ubuntu 16.04 LTS (Xenial)
server, which uses GHC 7.10.3. On that distribution, it should be enough to do:

    sudo apt install haskell-platform{,-doc,-prof}

On other OSes, the easiest way to get this specific version is perhaps to use
the [generic installer](https://www.haskell.org/platform/):

    wget -O /tmp/hp.tar.gz \
        https://www.haskell.org/platform/download/7.10.3/haskell-platform-7.10.3-unknown-posix-x86_64.tar.gz
    tar xf /tmp/hp.tar.gz
    sudo ./install-haskell-platform.sh

    # We also need to change some flags
    sed -i 's/\(.*"C compiler flags",\s*"\)\(.*\)/\1-fno-PIE \2/g
           ;s/\(.*"C compiler link flags",\s*"\)\(.*\)/\1-no-pie \2/g
           ;s/\(.*"ld flags",\s*"\)\(.*\)/\1-no-pie \2/g' \
        /usr/local/haskell/ghc-7.10.3-x86_64/lib/ghc-7.10.3/settings

### Database

Since the database is SQLite3, we need the SQLite binary and libraries. On
Debian-based distributions, this amounts to:

    sudo apt install sqlite3 libsqlite3-dev

On Windows, you can get the required executables and DLLs at
[sqlite.org](http://sqlite.org/download.html).

The initial live database can later be built with the `database-builder.exe`
binary, like so:

    ./database-builder.exe -o advise-me.db

### Web server

To run the binary locally, you can use any web server with CGI support.
We can do the following to use Apache to serve CGI scripts from the
`/usr/lib/cgi-bin` directory on Debian-based distributions:

    sudo apt install apache2
    sudo a2enmod cgid

For other OSes, check this
[guide](https://httpd.apache.org/docs/2.4/howto/cgi.html).

### Haskell environment

The source code of the project is contained in Git and Subversion
repositories. To obtain it:

    git clone \
        https://github.com/ideas-edu/ideas
    cd ideas; make src/Ideas/Main/Revision.hs; cd -
    svn checkout \
        https://ideastest.science.uu.nl/svn/ideas/Tutors/math-types
    svn checkout \
        https://ideastest.science.uu.nl/svn/ideas/Tutors/Advise-Me/trunk

Install the sandbox:

    cd trunk
    cabal sandbox init
    cabal sandbox add-source ../ideas
    cabal sandbox add-source ../math-types
    cabal install \
        --only-dependencies \
        --enable-tests \
        --enable-executable-profiling \
        --enable-library-profiling
    cabal configure \
        --enable-tests \
        --enable-executable-profiling \
        --enable-coverage

We use `make`, because there are many different files and interdependencies.
Reading the `Makefile` should give an idea of the workflow. It is also
recommended to make a `config.mk` file, overriding the variables in the
`Makefile` so that they point to the correct directories:

    tee config.mk << EOF
    IDEAS_DIR = ../ideas/src
    MATHTYPES_DIR = ../math-types/src
    CGI_BIN = /usr/lib/cgi-bin
    EOF

### Bayesian networks

To *create* the Bayesian networks, [Genie](https://www.bayesfusion.com/genie/)
is used. We used to interface with the SMILE library for *using* the networks,
but that is now done in Haskell itself by transforming the original `.xdsl`
files into a Haskell interface. See `network-builder.exe`.

### Compiling

Now, we can compile the binaries. `make processing` should take care of
everything for us, but of course the binaries can also be created by `cabal`
separately.
Note that there is an `xlsx` cabal flag, on by default, which exists because
building the `xlsx` library (used for reading human assessments) is not
straightforward on every machine. If you find that the `xlsx` library is
causing issues and you do not need its functionality, do
`cabal configure --flags="-xlsx"` before building.


Project structure
-------------------------------------------------------------------------------

The following directories are important to know.

- `app/`: Haskell executables and scripts.
- `src/`: Haskell sources of the Advise-Me library.
- `tests/`: Haskell sources of the testing suite.
- `test-data/`: Test input requests for the testing suite and shell scripts to
  send test input to the server.
- `hpc-*`: Haskell code coverage reports as generated by the recipe in the
  `Makefile`.
- `pilots/`:
    - `raw/`: Databases, mostly untouched as they were collected during pilot
      or evaluation studies.
    - `processed/`: Databases that are created from the raw data after the
      fact, by processing it in various ways using `database-builder.exe`. The
      `Makefile` contains recipes to create these files.
    - `assessments/`: Excel spreadsheets that mirror the names in the
      `processed/` directory. These spreadsheets contain evaluations by humans
      of the same data. They can be used to evaluate or debug the application,
      using `report.exe`, or to change or annotate the processed data. There
      are also documents in this directory that are not machine-readable,
      containing remarks on IDEAS' output by a human examiner.
    - `regressions/`: This directory contains `.exp` files that concatenate
      the expected output of the processed databases. This allows for a
      rudimentary regression test, using `diff`.
- `networks/`: Bayesian networks created in Genie, and a supporting XML file
  containing translations of the labels.
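The `diff`-based check mentioned for `regressions/` can be illustrated with a minimal sketch. The file names, the sample lines and the `check_regression` helper are all hypothetical stand-ins and do not reflect the actual format of the `.exp` files; the project's own recipe lives in the `Makefile`.

```shell
#!/bin/sh
# Sketch of a diff-based regression check. All file names and contents here
# are hypothetical; the real expectations are the .exp files in regressions/.

# Compare a stored expectation ($1) against fresh output ($2) and print a
# short verdict. diff exits with 0 only when both files are identical.
check_regression() {
    if diff -u "$1" "$2" > /dev/null; then
        echo "unchanged"
    else
        echo "changed"
    fi
}

workdir=$(mktemp -d)
printf 'request-1 evidence: A1=correct\n'   > "$workdir/pilot.exp"  # stored expectation
printf 'request-1 evidence: A1=correct\n'   > "$workdir/same.out"   # identical new output
printf 'request-1 evidence: A1=incorrect\n' > "$workdir/other.out"  # deviating new output

same_result=$(check_regression "$workdir/pilot.exp" "$workdir/same.out")
changed_result=$(check_regression "$workdir/pilot.exp" "$workdir/other.out")
echo "identical output: $same_result"
echo "deviating output: $changed_result"

rm -rf "$workdir"
```

Dropping the redirection to `/dev/null` prints the unified diff, which is usually the quickest way to see which lines of the output changed.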

Apart from the main `advise-me.cgi` binary, there are a couple of auxiliary
binaries to use:

- The `advise-me.cgi` binary provides the service: you provide `input` via a
  POST or GET request, and it will respond with the information you requested.
  There are also additional commands that can be given to make it do other
  things, like rerunning or reporting. Some of these are deprecated, and they
  aren't documented well.
- `network-builder.exe` builds, given an `.xdsl` file from `networks/`, the
  interface file necessary for running that network in our Haskell
  environment. Unfortunately, it cannot actually be built as a binary: it
  depends on the Advise-Me library, which itself depends on the files that it
  is supposed to generate! From `cabal-install` version 2, I believe that we
  could use its autogeneration facilities. For now, as a crutch, we run
  `app/NetworkBuilder.hs` as a script; see the `Makefile`.
- The `database-builder.exe` binary is a tool to create the initial database
  and process existing databases. It gives us the ability to reuse input data
  collected from a previous run and generate new output for it, as well as
  annotate the database with information tables. As there are many flags and
  switches, call it with `--help` for more info.

To inspect the resulting databases or to examine statistics, there are
multiple options.

- `advise-me-admin.cgi` provides a web interface to inspect the databases and
  report on statistics.
- `report.exe` can be used offline to compare assessments from IDEAS in the
  database against human assessments with the `humanvsmachine` subcommand. It
  can also count how often evidence occurs with the `priors` subcommand.
  Finally, it can generate a legacy HTML page with diagnostics info, similar
  to the overview in `advise-me-admin.cgi`.


Testing
-------------------------------------------------------------------------------

The tests currently implemented relate exclusively to finding the evidence.
Other tests are mostly non-existent, so functionality may break without
warning. (For more fine-grained information on how well the evidence matches
our expectations, see `report.exe`.)

Rudimentary regression tests can be performed with a `diff`, simply to check
whether the output has changed since the last update. `make regressions` does
this for you.

`cabal test` runs the `tasty` test suite with particular example requests, to
check if they still find the evidence we expect. Whenever you fix a specific
bug, please add a test along with the relevant request XML.


Coverage
-------------------------------------------------------------------------------

To inspect code coverage, do `cabal clean` and
`cabal configure --enable-coverage`, then rebuild the binaries that you want
to test. Running the binaries creates `.tix` files (which you can optionally
combine with `hpc sum *.tix`). From the `.tix` and `.mix` files, you can
generate an HTML coverage index or a statistics report. For example:

    hpc report \
        --hpcdir=dist/hpc/vanilla/mix/Advise-me-0.1 \
        --hpcdir=dist/hpc/vanilla/mix/database-builder.exe \
        database-builder.exe.tix


Profiling
-------------------------------------------------------------------------------

If you have installed the libraries with `--enable-library-profiling` and
configured cabal with
`--enable-library-profiling --enable-executable-profiling`, then you can build
a profiling version of the main CGI binary. The `Makefile` contains a recipe
for a PDF report.