MicroHs: A compiler for a subset of Haskell

This is a package candidate release! Here you can preview how this package release will appear once published to the main package index (which can be accomplished via the 'maintain' link below). Please note that once a package has been published to the main package index it cannot be undone! Please consult the package uploading documentation for more information.

[maintain] [Publish]

A compiler for a subset of Haskell. The compiler translates to combinators and can compile itself.


[Skip to Readme]

Properties

Versions 0.8, 0.8, 0.8.1.0, 0.8.2.0, 0.8.2.1, 0.8.2.2, 0.8.3.0, 0.8.4.0, 0.8.5.0, 0.9.1.0, 0.9.2.0, 0.9.3.0, 0.9.5.0, 0.9.8.0
Change log None available
Dependencies base (>=4.10 && <4.20), containers (>=0.5 && <0.8), deepseq (>=1.1 && <1.6), directory (>=1.2 && <1.5), ghc-prim (>=0.5 && <0.12), mtl (>=2.0 && <2.4), pretty (>=1.0 && <1.2), process (>=1.6 && <1.8), time (>=1.1 && <1.15) [details]
License Apache-2.0
Copyright 2023 Lennart Augustsson
Author lennart@augustsson.net
Maintainer lennart@augustsson.net
Category language
Source repo head: git clone https://github.com/augustss/MicroHs
Uploaded by LennartAugustsson at 2023-11-22T19:28:58Z

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees


Readme for MicroHs-0.8

[back to package description]

Micro Haskell

This directory contains an implementation of a small subset of Haskell. It uses combinators for the runtime execution.

The compiler can compile itself.

Compiling MicroHs

There are two different ways to compile MicroHs

These different ways of compiling need slightly different imports etc. To accomodate this each source file is preprocessed for the first target. When compiling with GHC the string --X is removed from the source file. This way anything special needed with GHC is just treated as comments by mhs.

Compiling MicroHs is really best done using make, but there is also a MicroHs.cabal file for use with cabal. This only builds what corresponds to the first target.

Also note that there is no need to have a Haskell compiler to run MicroHs. All you need is a C compiler, and MicroHs can bootstrap, given the included combinator file.

To install mhs use make install. You also need to set the environment variable MHSDIR.

Language

The language is an extended subset of Haskell-98.

Differences:

Example

The file Example.hs contains the following:

module Example(main) where
import Prelude

fac :: Int -> Int
fac 0 = 1
fac n = n * fac(n-1)

main :: IO ()
main = do
  let
    rs = map fac [1,2,3,10]
  putStrLn "Some factorials"
  print rs

First, make sure the compiler is built by doing make. Then compile the file by bin/mhs Example -oEx which produces Ex. Finally, run the binary file by ./Ex. This should produce

Some factorials
[1,2,6,3628800]

Libraries

There are a number of libraries that have some of the standard Haskell functions. But in general, the Prelude contains less.

Types

There are some primitive data types, e.g Int, IO, Ptr, and Double. These are known by the runtime system and various primitive operations work on them. The function type, ->, is (of course) also built in.

All other types are defined with the language. They are converted to lambda terms using the Scott encoding. The runtime system knows how lists and booleans are encoded.

Compiler

The compiler is written in Micro Haskell. It takes a name of a module and compiles to a target (see below). This module should contain the function main of type IO () and it will be the entry point to the program.

Compiler flags

With the -v flag the processing time for each module is reported. E.g.

importing done MicroHs.Exp, 284ms (91 + 193)

which means that processing the module MicroHs.Exp took 284ms, with parsing taking 91ms and typecheck&desugar taking 193ms.

Environment variables

Compiler modules

Interactive mode

If no module name is given the compiler enters interactive mode. You can enter expressions to be evaluated, or top level definitions (including import). Simple line editing is available.

All definitions are saved in the file Interactive.hs and all input lines as saved in .mhsi. The latter file is read on startup so the command history is persisted.

Available commands:

NOTE When you import a module it is cached. If the file changes and you import it again it will not reload. You can use :clear (or :reload) to get back to an empty cache. This is a bug.

Runtime

The runtime system is written in C and is in src/runtime/eval.c. It uses combinators for handling variables, and has primitive operations for built in types and for executing IO operations. There is a also a simple mark-scan garbage collector. The runtime system is written in a reasonably portable C code.

Runtime flags

Runtime flags are given between the flags +RTS and -RTS. Between those the runtime decodes the flags, everything else is available to the running program.

For example, bin/mhseval +RTS -H1M -v -RTS hello runs out.comb and the program gets the argument hello, whereas the runtime system sets the heap to 1M cells and is verbose.

FFI

MicroHs supports calling C functions, but all such functions must be in a table in the runtime system.

Features

The runtime system can serialize and deserialize any expression and keep its graph structure (sharing and cycles). The only exceptions to this are C pointers file handles, which cannot be serialized (except for stdin, stdout, and stderr).

Memory layout

Memory allocation is based on cells. Each cell has room for two pointers (i.e., two words, typically 16 bytes), so it can represent an application node. One bit is used to indicate if the cell is an application or something else. If it is something else one word is a tag indicating what it is, e.g., a combinator or an integer. The second word is then used to store any payload, e.g., the number itself for an integer node.

Memory allocation has a bitmap with one bit per cell. Allocating a cell consists of finding the next free cell using the bitmap, and then marking it as used. The garbage collector first clears the bitmap and then (recursively) marks every used cell in the bitmap. There is no explicit scan phase since that is baked into the allocation. Allocation is fast assuming the CPU has some kind of FindFirstSet instruction.

It is possible to use smaller cells by using 32 bit "pointers" instead of 64 bit pointers. This has a performance penalty, though.

Portability

The C code for the evaluator does not use any special features, and should be portable to many platforms. It has mostly been test with MacOS and Linux, and somewhat with Windows.

The code has only been tested on 64 bit platforms, so again, there are lurking problems with other word sizes, but they should be easy to fix.

The src/runtime/ directory contains configuration files for different platform. Edit src/runtime/eval.c to #include the right one.

Bootstrapping

The compiler can compile itself. To replace bin/mhs with a new version, do make bootstrap. This will recompile the compiler twice and compare the outputs to make sure the new compiler still works.

FAQ