hoppy-docs-0.2.1: C++ FFI generator - Documentation

Safe Haskell: None
Language: Haskell2010

Foreign.Hoppy.Documentation.UsersGuide

The Hoppy User's Guide

Overview

Hoppy is a foreign function interface (FFI) generator for interfacing Haskell with C++. It lets developers specify C++ interfaces in pure Haskell, and generates code to expose that functionality to Haskell. Hoppy is made up of a few different packages that provide interface definition data structures and code generators, some runtime support for Haskell bindings, and interface definitions for the C++ standard library.

Bindings using Hoppy have three parts:

  • A Haskell generator program (in /generator) that knows the interface definition and generates code for the next two parts.
  • A C++ library (in /cpp) that gets compiled into a shared object containing the C++ half of the bindings.
  • A Haskell library (in /hs) that links against the C++ library and exposes the bindings.

The path names are suggested subdirectories of a project and are used in this document, but they are not required. Only the latter two items need to be packaged and distributed to users of the binding (plus Hoppy itself, which is a dependency of the generated bindings).

Getting started

This section covers what is needed to get a binding up and running.

Project setup

To bind to a C++ library, first the binding author writes a generator program (/generator) in Haskell. This program should define the complete C++ interface that is to be exposed. The binding author also writes a Main.hs file for invoking the generator (usually deferring to Foreign.Hoppy.Generator.Main). If necessary, she should also write wrappers for C++ things that she doesn't want to expose directly (in /cpp).

Then, her build process should perform the following steps:

  1. Compile the generator (/generator).
  2. Run the generator to create the C++ and Haskell sides of the bindings in /cpp and /hs/src respectively. See the documentation for run for how to invoke a generator.
  3. Compile the C++ side of the bindings into a shared object. Make sure to compile with the version of the C++ standard that matches what the generator was run with (see activeCppVersion).
  4. Compile the Haskell side of the bindings, linking with the C++ library.

For this last step, the .cabal file in /hs should have

extra-libraries: foo

to link against a shared object libfoo.so. If this library is not on the system's library search path, then she will need to pass --extra-lib-dirs=.../cpp to cabal configure for /hs.

The unit tests provide some simple examples of this setup.

Concepts

A complete C++ API is specified using Haskell data structures in Foreign.Hoppy.Generator.Spec. At the top level is the Interface type. An interface contains Modules which correspond to a portion of functionality of the interface (collections of classes, functions, files, etc.). Functionality can be grouped arbitrarily into modules and doesn't have to follow the structure of existing C++ files. Modules contain Exports which refer to concrete things that provide bindings. Binding definitions take advantage of Haskell's laziness, and can be highly circular, a simple case being a class that includes a method that makes use of the class in its parameter or return types.

Each export has an external name that uniquely identifies it within an interface. This name can be different from the name of the C++ entity the export is referring to. An external name is munged by the code generators and must be a valid identifier in all languages a set of bindings will use, so it is restricted to characters in the range [a-zA-Z0-9_], and must start with an alphabetic character. Character case in external names will be preserved as much as possible in generated code, although case conversions are sometimes necessary (e.g. Haskell requiring identifiers to begin with upper or lower case characters).
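The character rules for external names can be stated as a simple predicate. The following is a minimal sketch: `isValidExtName` is a hypothetical helper written for this guide, not part of Hoppy's API, and it only checks characters (uniqueness within an interface is a separate requirement).

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)

-- Hypothetical helper (not Hoppy API): checks the external-name character rules.
-- Allowed characters are [a-zA-Z0-9_], and the first character must be a letter.
isValidExtName :: String -> Bool
isValidExtName [] = False
isValidExtName (c:cs) = isAsciiLetter c && all isNameChar cs
  where
    isAsciiLetter x = isAsciiUpper x || isAsciiLower x
    isNameChar x = isAsciiLetter x || isDigit x || x == '_'

main :: IO ()
main = mapM_ report ["Zipper", "zipper_new", "_private", "3d", "std::string"]
  where
    report name = putStrLn (name ++ ": " ++ show (isValidExtName name))
```

Note that a C++ qualified name such as std::string is not a valid external name, which is one reason external names are kept separate from C++ identifiers.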

C++ bindings for exportable things usually need #includes in order to access those things. This is done with Include and Reqs. All exportable things have an instance of HasReqs and addReqIncludes can be used to add includes.

C++ identifiers are represented by the Identifier data type and support basic template syntax (no metaprogramming).

All C++ types are represented with the Type data type, values of which are in the Foreign.Hoppy.Generator.Types module. This includes primitive numeric types, object types, function types, void, the const qualifier, etc. When passing values back and forth between C++ and Haskell, generally, primitive types are converted to equivalent types on both ends, and pointer types in C++ are represented by corresponding pointer types in Haskell.

Raw object types (not pointers or references, just the by-value object types, i.e. objT) are treated differently. When an object is taken or returned by value, this typically indicates a lightweight object that is easy to copy, so Hoppy will attempt to convert the C++ object to a native Haskell object, if a Haskell type is defined for the class. Other options are available, such as having objects be handed off to a foreign garbage collector. See ClassConversion for more on object conversions.

Generators

This section describes the behaviour of the code generators. The code generators live at Foreign.Hoppy.Generator.Language.<language>. The top-level module for a language is internal to Hoppy and contains the bulk of the generator. General submodules expose functionality that can control generator behaviour.

C++

The C++ code generator generates C++ bindings that other languages' bindings will link against. This generator lives in Foreign.Hoppy.Generator.Language.Cpp, with internal parts in Foreign.Hoppy.Generator.Language.Cpp.Internal.

Module structure

Generated modules consist of a source and a header file. The source file contains all of the bindings for foreign languages to make use of. The header file contains things that may be depended on from other generated modules. Currently this consists only of generated callback classes.

Cycles between generated C++ modules are not supported. This can currently only happen because of #include cycles involving callbacks, since callbacks are the only Exports that can be referenced by other generated C++ code.

Object passing

ptrT :: Type -> Type
refT :: Type -> Type
objT :: Class -> Type
constT :: Type -> Type

We consider all of the following cases as passing an object, both into and out of C++, and independently, as an argument and as a return value:

  1. objT _
  2. refT (constT (objT _))
  3. refT (objT _)
  4. ptrT (constT (objT _))
  5. ptrT (objT _)

The first is equivalent to constT (objT _). When passing an argument from a foreign language to C++, the first two are equivalent, and it's recommended to use the first, shorter form (T and const T& are functionally equivalent in C++, and are the same as far as what values foreign bindings will accept).

When passing any of the above types as an argument in either direction, an object is passed between C++ and a foreign language via a pointer. Cases 1, 2, and 4 are passed as const pointers. For a foreign language passing an objT _ to C++, this means converting a foreign value to a temporary C++ object. When an objT _ argument is passed into or out of C++, the caller always owns the object.

When returning an object, again, pointers are always what is passed across the language boundary in either direction. Returning an objT _ transfers ownership: a C++ function returning an objT _ will copy the object to the heap and return a pointer to it, which the caller owns; a callback returning an objT _ will internally create a C++ object from a foreign value and hand that object off to the C++ side (which will return it and free the temporary).
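To make the argument rules concrete, here is a small model of which kind of pointer crosses the boundary for each of the five forms. The `Type` and `BoundaryPtr` types below are hypothetical simplifications written for this sketch; they are not Hoppy's own Type.

```haskell
-- Hypothetical, simplified model of the object-related type constructors.
data Type = ObjT String | ConstT Type | RefT Type | PtrT Type
  deriving (Eq, Show)

-- What crosses the language boundary when an object argument is passed.
data BoundaryPtr = ConstPtr String | NonConstPtr String
  deriving (Eq, Show)

-- Maps each object-passing form to the pointer passed over the FFI.
-- Nothing means the type is not one of the five object-passing forms.
argumentPtr :: Type -> Maybe BoundaryPtr
argumentPtr t = case t of
  ObjT c                 -> Just (ConstPtr c)     -- case 1
  ConstT (ObjT c)        -> Just (ConstPtr c)     -- equivalent to case 1
  RefT (ConstT (ObjT c)) -> Just (ConstPtr c)     -- case 2
  RefT (ObjT c)          -> Just (NonConstPtr c)  -- case 3
  PtrT (ConstT (ObjT c)) -> Just (ConstPtr c)     -- case 4
  PtrT (ObjT c)          -> Just (NonConstPtr c)  -- case 5
  _                      -> Nothing

main :: IO ()
main = mapM_ (print . argumentPtr)
  [ ObjT "Foo", RefT (ConstT (ObjT "Foo")), RefT (ObjT "Foo")
  , PtrT (ConstT (ObjT "Foo")), PtrT (ObjT "Foo") ]
```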

Object lifetimes can be managed by a foreign language's garbage collector. toGcT is a special type that is only allowed in certain forms, and only when passing a value from C++ to a foreign language (i.e. returning from a C++ function, or C++ invoking a foreign callback), to put the object under the collector's management. Only object types are allowed:

  1. toGcT (objT cls)
  2. toGcT (refT (constT (objT cls)))
  3. toGcT (refT (objT cls))
  4. toGcT (ptrT (constT (objT cls)))
  5. toGcT (ptrT (objT cls))

Cases 2-5 are straightforward: the existing object is given to the collector. Case 1 without the toGcT would cause the object to be converted, but instead here the (temporary) object gets copied to the heap, and a managed pointer to the heap object is returned. Case 1 is useful when you want to pass a handle that has a non-trivial C++ representation (so you don't define a conversion for it), but it's still a temporary that you don't want users to have to delete manually.
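The restrictions on toGcT can likewise be expressed as a check over a simplified type model. This is a hypothetical sketch of the rules listed above (the `Direction` and `GcHandling` types are invented for illustration), not Hoppy's actual validation code.

```haskell
-- Hypothetical, simplified model of types, including toGcT.
data Type = ObjT String | ConstT Type | RefT Type | PtrT Type | ToGcT Type
  deriving (Eq, Show)

-- Direction a value travels across the language boundary.
data Direction = ForeignToCpp | CppToForeign
  deriving (Eq, Show)

-- How the collector takes over the object, when toGcT is allowed.
data GcHandling
  = CopyToHeapThenManage  -- case 1: the temporary is copied to the heap first
  | ManageExisting        -- cases 2-5: the existing object is handed over
  deriving (Eq, Show)

-- Nothing means toGcT does not apply: the type is not wrapped in toGcT,
-- the wrapped form is not one of the allowed ones, or the direction is wrong.
gcHandling :: Direction -> Type -> Maybe GcHandling
gcHandling CppToForeign (ToGcT t) = case t of
  ObjT _                 -> Just CopyToHeapThenManage
  RefT (ConstT (ObjT _)) -> Just ManageExisting
  RefT (ObjT _)          -> Just ManageExisting
  PtrT (ConstT (ObjT _)) -> Just ManageExisting
  PtrT (ObjT _)          -> Just ManageExisting
  _                      -> Nothing
gcHandling _ _ = Nothing

main :: IO ()
main = do
  print (gcHandling CppToForeign (ToGcT (ObjT "Foo")))
  print (gcHandling CppToForeign (ToGcT (PtrT (ObjT "Foo"))))
  print (gcHandling ForeignToCpp (ToGcT (ObjT "Foo")))
```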

Objects are always managed manually unless given to a garbage collector. In particular, constructors always return unmanaged pointers. When a managed pointer is passed into C++, the fact that it is managed is lost in the FFI conversion, and if this pointer is then passed back into the foreign language, it will arrive in an unmanaged state (although the object is still managed, and it should not be assigned to the collector a second time).

Callbacks

data Callback = Callback ExtName [Type] Type ...  -- Parameter and return types.

callbackT :: Callback -> Type

We want to call some foreign code from C++. What C++ type do we associate with such an entry point? (Both the C++ and foreign sides of the callback will need to perform en-/decoding of arguments/return values.)

Function pointer: Create a function pointer to a foreign wrapper which does en-/decoding on the foreign side. But then we need to wrap this in a C++ function (pointer) which does the C++-side conversions. Function pointers can't close over variables, so this doesn't work.

C++ functor: Create a class G that takes a foreign function pointer and implements operator(), performing the necessary conversions around invoking the pointer. In the event that the function pointer is dynamically allocated (as in Haskell), then this class also ties the lifetime of the function pointer to the lifetime of the class. But this would cause problems for passing this object around by value, so instead we make G non-copyable and non-assignable, allocate our G instance on the heap, and create a second class F that holds a shared_ptr<G> and whose operator() calls through to G.

This way, the existence of the F and G objects is invisible to the foreign language, and (for now) passing these callbacks back to the foreign language is not supported.

When a binding is declared to take a callback type, the generated foreign side of the binding will take a foreign function (the callback) with foreign-side types, and use a function (Haskell: callbackName) generated for the callback type to wrap the callback in a foreign function that does argument decoding and return value encoding: this wrapped function will have C-side types. The binding will then create a G object (above) for this wrapped function (Haskell: using callbackName'), and pass a G pointer into the C side of the binding. The binding will decode this C pointer by wrapping it in a temporary F object, and passing that to the C++ function. The C++ code is free to copy this F object as much as it likes. If it doesn't store a copy somewhere before returning, then when the temporary F object is destroyed, the G object will get deleted.
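The Haskell half of this scheme relies on GHC being able to turn a Haskell closure into a dynamically allocated C function pointer. The sketch below shows that step in isolation using GHC's standard "wrapper" and "dynamic" foreign imports. The names `toCSide`, `mkCIntCallback`, and `callCIntCallback` are hypothetical and are not the names Hoppy generates (those are callbackName and callbackName'). The "dynamic" import stands in for the C++ side invoking the callback, and the explicit freeHaskellFunPtr call corresponds to the lifetime management that G takes over in the scheme described above.

```haskell
import Foreign.C.Types (CInt (..))
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

-- C-side type of a hypothetical callback taking and returning an int.
type CIntCallback = CInt -> IO CInt

-- GHC's "wrapper" import allocates a C function pointer for a Haskell closure.
foreign import ccall "wrapper"
  mkCIntCallback :: CIntCallback -> IO (FunPtr CIntCallback)

-- GHC's "dynamic" import calls through a C function pointer; here it stands in
-- for the C++ side invoking the callback.
foreign import ccall "dynamic"
  callCIntCallback :: FunPtr CIntCallback -> CIntCallback

-- Wraps a callback with Haskell-side types into one with C-side types:
-- the argument is decoded and the return value is encoded.
toCSide :: (Int -> IO Int) -> CIntCallback
toCSide f = fmap fromIntegral . f . fromIntegral

main :: IO ()
main = do
  let offset = 100 :: Int
  -- The closure captures `offset`; the wrapper import allocates a fresh pointer for it.
  fp <- mkCIntCallback (toCSide (\x -> pure (x + offset)))
  r <- callCIntCallback fp 5
  print r  -- 105
  -- The pointer is dynamically allocated and must be released explicitly.
  freeHaskellFunPtr fp
```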

Haskell

The Haskell code generator lives in Foreign.Hoppy.Generator.Language.Haskell, with internal parts in Foreign.Hoppy.Generator.Language.Haskell.Internal.

Central to generated Haskell bindings is the idea of type sidedness and the HsTypeSide enum. When a value is passed to or from C++, it needs to be converted so that the receiving language knows what to do with it. The C++ side of bindings just exchanges C types across the language boundary and does not do conversions, so it is up to the Haskell side to do so. Internally, the Haskell generator refers to types exchanged with C++ as C-side types, and types the bindings exchange with user Haskell code as Haskell-side types. These are both Haskell types! The terminology overlaps a bit, but generally, type or C++ type refers to a Type, and in the context of the Haskell generator, C-side or Haskell-side apply to a HsType, calculated from a Type and a HsTypeSide using cppTypeToHsTypeAndUse. For many primitive C++ types, the C-side and Haskell-side types are the same.

Module structure

The result of generating a Hoppy module is a single Haskell module that contains bindings for everything exported from the Hoppy module. The Haskell module name is the concatenation of the interface's interfaceHaskellModuleBase and the module's moduleHaskellName.

The contents of the module depend on what Exports the module has.

Variable exports

A Variable is exposed in Haskell as a getter function and a setter function. For a variable with external name foo with Haskell-side type Bar, the following functions are created:

foo_get :: IO Bar
foo_set :: Bar -> IO ()

Enum exports

A CppEnum is exposed in Haskell as an enumerable data type. For an enum defined as follows:

alignment :: CppEnum
alignment =
  makeEnum (ident "Alignment") Nothing
  [ (0, ["left", "align"])
  , (1, ["center", "align"])
  , (2, ["right", "align"])
  ]

the following data type will be generated:

data Alignment =
    Alignment_LeftAlign
  | Alignment_CenterAlign
  | Alignment_RightAlign

with instances for Bounded, Enum, Eq, Ord, and Show.
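A hand-written approximation of the generated type behaves as below. This is a sketch, not Hoppy's output: it uses derived instances, and derived Enum numbering matches the C++ values here only because they happen to be 0, 1, and 2.

```haskell
-- Hand-written approximation of the type generated for `alignment` (not Hoppy output).
data Alignment =
    Alignment_LeftAlign
  | Alignment_CenterAlign
  | Alignment_RightAlign
  deriving (Bounded, Enum, Eq, Ord, Show)

main :: IO ()
main = do
  print [minBound .. maxBound :: Alignment]
  print (map fromEnum [Alignment_LeftAlign ..])  -- [0,1,2], same as the C++ values
  print (toEnum 2 :: Alignment)                  -- Alignment_RightAlign
  print (Alignment_LeftAlign < Alignment_RightAlign)
```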

Bitspace exports

Bitspaces, unlike enums, materialize in Haskell using a single data constructor and bindings for values, rather than multiple data constructors. A bitspace declaration such as

formatFlags :: Bitspace
formatFlags =
  makeBitspace (toExtName "Format") intT
  [ (1, ["format", "letter"])
  , (2, ["format", "jpeg"])
  , (4, ["format", "c"])
  ]

will generate the following:

newtype Format

instance Bits Format
instance Bounded Format
instance Eq Format
instance Ord Format
instance Show Format

fromFormat :: Format -> CInt

class IsFormat a where
  toFormat :: a -> Format

instance IsFormat CInt

format_FormatLetter :: Format
format_FormatJpeg :: Format
format_FormatC :: Format
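A hand-written approximation of these declarations shows how the pieces fit together. It is a sketch under assumptions: the newtype's instances come from GeneralizedNewtypeDeriving, and the derived Show output (e.g. `Format 5`) is not claimed to match what Hoppy generates.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Data.Bits (Bits, testBit, (.&.), (.|.))
import Foreign.C.Types (CInt)

-- Hand-written approximation of the generated bitspace (not Hoppy output).
newtype Format = Format CInt
  deriving (Bits, Bounded, Eq, Ord, Show)

fromFormat :: Format -> CInt
fromFormat (Format n) = n

class IsFormat a where
  toFormat :: a -> Format

instance IsFormat CInt where
  toFormat = Format

format_FormatLetter, format_FormatJpeg, format_FormatC :: Format
format_FormatLetter = Format 1
format_FormatJpeg = Format 2
format_FormatC = Format 4

main :: IO ()
main = do
  let flags = format_FormatLetter .|. format_FormatC
  print (fromFormat flags)                                    -- 5
  print (flags .&. format_FormatJpeg == toFormat (0 :: CInt)) -- True: JPEG bit not set
  print (testBit (fromFormat flags) 2)                        -- True: the bit for 4 is set
```

Because values combine with the Bits operations rather than being separate constructors, any combination of flags is representable, which is the reason bitspaces are not generated as enums.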

Function exports

For a Function export, a single Haskell function will be generated named after the external name of the export. The function will take the Haskell-side types of its arguments, and return the Haskell-side type of its return type. If the function is Nonpure then it will return a value in IO, otherwise it will return a pure value using unsafePerformIO.

For most Types, the corresponding Haskell parameter type will be a concrete type. This differs for objects (and references and pointers to them), where typeclass constraints are used to implement C++ parameter type contravariance. See the section on Haskell object passing for more details.
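As a rough illustration of the two styles, the sketch below hand-writes what a Nonpure and a pure binding might look like, using C's sin from libm as a stand-in for a generated C++ entry point. The names `c_sin`, `sinIO`, and `sinPure` are hypothetical; real generated bindings call Hoppy's generated C++ functions instead. The raw import uses C-side types (CDouble), while the exposed functions use Haskell-side types (Double).

```haskell
import Foreign.C.Types (CDouble (..))
import System.IO.Unsafe (unsafePerformIO)

-- Stand-in for the raw C-side import of a generated binding.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> IO CDouble

-- Nonpure style: Haskell-side types, result in IO.
sinIO :: Double -> IO Double
sinIO x = realToFrac <$> c_sin (realToFrac x)

-- Pure style: the same call escaped from IO with unsafePerformIO.
-- This is only sound when the C++ function has no observable side effects.
sinPure :: Double -> Double
sinPure = unsafePerformIO . sinIO

main :: IO ()
main = do
  sinIO 0 >>= print  -- 0.0
  print (sinPure (pi / 2))
```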

Callback exports

Although Callbacks must be exported like the other Export choices, they do not expose anything to the user. Instead, they provide the machinery that lets functions use callbackT.

Class exports

Classes expose quite a few things to the user. Take a simple class definition such as:

compressor :: Class

zipper :: Class
zipper =
  makeClass (ident "Zipper") Nothing [compressor]
  [ mkCtor "new" [] ]
  [ mkStaticMethod "canZip" [] boolT
  , mkConstMethod "hasZipped" [] boolT
  , mkMethod "zip" [] voidT
  ]

Let's focus on zipper. Two data types will be generated that represent const and non-const pointers to Zipper objects:

data Zipper
data ZipperConst

Internally, these types hold Ptrs, and they can be converted to Ptrs with toPtr (though this conversion is lossy for pointers managed by the garbage collector, see the section on object passing).

Several typeclass instances are generated for both types:

  • Eq, Ord, and Show compare and render based on the underlying pointer address.
  • CppPtr and Deletable instances provide object management.
  • A single Decodable (Ptr Zipper) Zipper instance is generated for converting raw Ptrs into object handles. This is the opposite operation of toPtr.
  • If the class (Zipper in this case) has an operator= method that takes either an objT zipper or a refT (constT (objT zipper)), then an instance ZipperValue a => Assignable Zipper a is generated to allow assigning of general zipper-like values to Zipper objects; see below for an explanation of ZipperValue. This instance is for the non-const Zipper only.

There will also be some typeclasses generated, for types that represent Zipper objects:

class ZipperValue a where
  withZipperPtr :: a -> (ZipperConst -> IO b) -> IO b

instance CompressorPtrConst a => ZipperValue a

class CompressorPtrConst a => ZipperPtrConst a where
  toZipperConst :: a -> ZipperConst

class (ZipperPtrConst a, CompressorPtr a) => ZipperPtr a where
  toZipper :: a -> Zipper

instance ZipperPtrConst ZipperConst
instance ZipperPtr Zipper
... instances required by superclasses ...

Ignoring the first typeclass and instance for a moment, the two Ptr typeclasses represent const and non-const pointers respectively, and allow upcasting pointer types. The const typeclass has as superclasses the const typeclasses for all of the C++ class's superclasses (or just CppPtr if this list is empty). The non-const typeclass has as superclasses the non-const typeclasses for all of the C++ class's superclasses, plus the current const typeclass. Instances will be generated for all of the appropriate typeclasses for Zipper and ZipperConst, all the way up to CppPtr.
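The upcasting hierarchy can be reproduced by hand to see how the superclass constraints fit together. This is a hypothetical model, not Hoppy output, with two simplifications: CppPtr is omitted, and every upcast simply reuses the same address. The function `describeCompressor` stands in for a binding that takes a const Compressor pointer.

```haskell
import Foreign.Ptr (Ptr, nullPtr, plusPtr)

-- Hypothetical hand-written model of generated handles (not Hoppy output).
newtype Compressor = Compressor (Ptr ()) deriving (Eq, Show)
newtype CompressorConst = CompressorConst (Ptr ()) deriving (Eq, Show)
newtype Zipper = Zipper (Ptr ()) deriving (Eq, Show)
newtype ZipperConst = ZipperConst (Ptr ()) deriving (Eq, Show)

class CompressorPtrConst a where
  toCompressorConst :: a -> CompressorConst

class CompressorPtrConst a => CompressorPtr a where
  toCompressor :: a -> Compressor

class CompressorPtrConst a => ZipperPtrConst a where
  toZipperConst :: a -> ZipperConst

class (ZipperPtrConst a, CompressorPtr a) => ZipperPtr a where
  toZipper :: a -> Zipper

-- Compressor's own instances.
instance CompressorPtrConst CompressorConst where toCompressorConst = id
instance CompressorPtrConst Compressor where toCompressorConst (Compressor p) = CompressorConst p
instance CompressorPtr Compressor where toCompressor = id

-- A const Zipper is also a const Compressor.
instance CompressorPtrConst ZipperConst where toCompressorConst (ZipperConst p) = CompressorConst p
instance ZipperPtrConst ZipperConst where toZipperConst = id

-- A non-const Zipper gets every instance, all the way up the hierarchy.
instance CompressorPtrConst Zipper where toCompressorConst (Zipper p) = CompressorConst p
instance CompressorPtr Zipper where toCompressor (Zipper p) = Compressor p
instance ZipperPtrConst Zipper where toZipperConst (Zipper p) = ZipperConst p
instance ZipperPtr Zipper where toZipper = id

-- Stand-in for a binding whose parameter is `const Compressor*`.
describeCompressor :: CompressorPtrConst a => a -> String
describeCompressor x = case toCompressorConst x of
  CompressorConst p -> "compressor at " ++ show p

main :: IO ()
main = do
  let z = Zipper (nullPtr `plusPtr` 0x1000)
  putStrLn (describeCompressor z)                  -- a non-const Zipper is accepted
  putStrLn (describeCompressor (toZipperConst z))  -- so is a const Zipper
  print (toCompressor z)
```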

The ZipperValue class represents general Zipper values, of which pointers are one type (hence the first instance above). Values of these types can be converted to a temporary const pointer. If Zipper were to have a native Haskell type (see classHaskellConversion), then an additional instance would be generated for that type. This second instance in this case is overlapping, and the above instance is overlappable. These typeclasses allow for mixing pointer, reference, and object types when calling C++ functions.

For downcasting, separate const and non-const typeclasses are generated with instances for all direct and indirect superclasses of Zipper:

-- Enables downcasting from any non-const superclass of Zipper.
class ZipperSuper a where
  downToZipper :: a -> Zipper

-- Enables downcasting from any const superclass of Zipper.
class ZipperSuperConst a where
  downToZipperConst :: a -> ZipperConst

instance ZipperSuper Compressor
... instances for other non-const superclasses ...
instance ZipperSuperConst CompressorConst
... instances for other const superclasses ...

The downcast functions are wrappers around dynamic_cast, and will return a null pointer if the argument does not actually point to an instance of the target type (or of one of its subclasses).

Finally, Haskell functions are generated for all of the class's constructors and methods. These work much the same as function exports, but non-static methods take a this object as the first argument. Const methods take a ZipperValue on the assumption that it's safe to create a temporary C++ object from a Haskell value if necessary to call a const method. Non-const methods take a ZipperPtr, since it's potentially a mistake to perform side-effects on a temporary object that is thrown away immediately.

zipper_new :: IO Zipper
zipper_canZip :: IO Bool
zipper_hasZipped :: ZipperValue this => this -> IO Bool
zipper_zip :: ZipperPtr this => this -> IO ()

Module dependencies

While generated C++ modules get their objects from #includes of underlying headers and only depend on each other in the case of callbacks, Haskell modules depend on each other any time something in one references something in another (somewhat mirroring the dependency graph of the binding definitions), so cycles are much more common (for example, when a C++ interface uses a forward class declaration to break an #include cycle). Fortunately, GHC supports dependency cycles, so Hoppy automatically detects and breaks cycles with the use of .hs-boot files. The boot files contain everything that could be used from another generated module, for example class casting functions needed to coerce pointers to the right type for a foreign call, or enum data declarations. The result of this cycle breaking is deterministic: for each non-trivial strongly connected component (SCC) in the module dependency graph, .hs-boot files are generated for all of its modules, and each .hs file in the SCC imports the .hs-boot files of its dependencies within that SCC.
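The cycle-breaking rule can be sketched with Data.Graph from the containers package. This models the rule as described above and is not Hoppy's internal implementation; the module names in `deps` are hypothetical. In real generated code, the imports returned by `sourceImports` would be written as `import {-# SOURCE #-} ...`.

```haskell
import Data.Graph (SCC (AcyclicSCC), flattenSCC, stronglyConnComp)
import Data.List (sort)

-- Hypothetical module dependency graph: each module with the modules it imports.
deps :: [(String, [String])]
deps =
  [ ("Foo.Widget", ["Foo.Layout"])
  , ("Foo.Layout", ["Foo.Widget", "Foo.Enums"])  -- forms a cycle with Foo.Widget
  , ("Foo.Enums", [])
  , ("Foo.App", ["Foo.Widget", "Foo.Enums"])
  ]

-- Strongly connected components that contain a cycle.
cyclicSccs :: [(String, [String])] -> [[String]]
cyclicSccs g =
  [ sort (flattenSCC scc)
  | scc <- stronglyConnComp [ (m, m, ds) | (m, ds) <- g ]
  , not (isAcyclic scc)
  ]
  where
    isAcyclic (AcyclicSCC _) = True
    isAcyclic _ = False

-- Every module in a cyclic SCC gets an .hs-boot file.
bootModules :: [(String, [String])] -> [String]
bootModules = sort . concat . cyclicSccs

-- Imports between two modules of the same cyclic SCC go through .hs-boot files.
sourceImports :: [(String, [String])] -> [(String, String)]
sourceImports g =
  [ (m, d) | (m, ds) <- g, d <- ds, any (\s -> m `elem` s && d `elem` s) sccs ]
  where
    sccs = cyclicSccs g

main :: IO ()
main = do
  print (bootModules deps)    -- ["Foo.Layout","Foo.Widget"]
  print (sourceImports deps)  -- [("Foo.Widget","Foo.Layout"),("Foo.Layout","Foo.Widget")]
```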

Object passing

All of the comments about argument passing for the C++ generator apply here. The following types are used for passing arguments from Haskell to C++:

 C++ type   | Pass over FFI | HsCSide  | HsHsSide
------------+---------------+----------+-----------------
 Foo        | Foo const*    | FooConst | FooValue a => a
 Foo const& | Foo const*    | FooConst | FooValue a => a
 Foo&       | Foo*          | Foo      | FooPtr a => a
 Foo const* | Foo const*    | FooConst | FooValue a => a
 Foo*       | Foo*          | Foo      | FooPtr a => a

FooPtr contains pointers to non-const Foo (and all subclasses). FooValue contains pointers to const and non-const Foo (and all subclasses), as well as the convertible Haskell type, if there is one. The rationale is that FooValue is used where the callee will not modify the argument, so both a const pointer to an existing object and a fresh const pointer to a temporary (in the case of passing a Foo) are fine. Because functions taking Foo& and Foo* may modify their argument, we disallow implicitly passing a temporary converted from a Haskell value; withCppObj can be used for this.
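The difference between FooValue and FooPtr can be seen in a small self-contained model. Everything here is hypothetical: a C++ Foo is represented by a single CInt in memory, Int plays the role of Foo's convertible Haskell type, and `readFoo` and `bumpFoo` stand in for generated bindings of C++ functions taking `const Foo&` and `Foo&`. The Int instance uses Foreign.Marshal.Utils.with to create a temporary that exists only for the duration of the call.

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Utils (with)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

-- Hypothetical model: a C++ Foo object is a single CInt in memory.
newtype Foo = Foo (Ptr CInt)            -- pointer to non-const Foo
newtype FooConst = FooConst (Ptr CInt)  -- pointer to const Foo

-- Accepted where C++ takes Foo, const Foo&, or const Foo*.
class FooValue a where
  withFooPtr :: a -> (FooConst -> IO b) -> IO b

instance FooValue FooConst where
  withFooPtr p act = act p
instance FooValue Foo where
  withFooPtr (Foo p) act = act (FooConst p)
-- A Haskell value is converted to a temporary that only lives for the call.
instance FooValue Int where
  withFooPtr n act = with (fromIntegral n) (act . FooConst)

-- Accepted where C++ takes Foo& or Foo*: only existing non-const objects.
class FooPtr a where
  toFoo :: a -> Foo
instance FooPtr Foo where
  toFoo = id

-- Stand-in for a binding of `int readFoo(const Foo&)`.
readFoo :: FooValue a => a -> IO Int
readFoo x = withFooPtr x (\(FooConst p) -> fromIntegral <$> peek p)

-- Stand-in for a binding of `void bumpFoo(Foo&)`.
bumpFoo :: FooPtr a => a -> IO ()
bumpFoo x = case toFoo x of
  Foo p -> peek p >>= poke p . (+ 1)

main :: IO ()
main = do
  readFoo (42 :: Int) >>= print  -- a temporary is created from the Int
  with 7 $ \p -> do
    bumpFoo (Foo p)
    readFoo (Foo p) >>= print
```

Passing `(7 :: Int)` to `bumpFoo` is rejected at compile time because there is no FooPtr Int instance, which matches the restriction described for Foo& and Foo* parameters.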

For values returned from C++, and for arguments and return values in callbacks, the HsCSide column above is the exposed type; polymorphism as in the HsHsSide column is not provided.

Object pointer types in Haskell hide whether they are managed (garbage collected) or unmanaged pointers in their runtime representation. The APIs that bindings expose to Haskell users should generally not require them to be concerned about object lifetimes, and also having separate data types for managed pointers would balloon the size of bindings. Unmanaged objects can be converted to managed objects with toGc; after calling this function, the value it returns should always be used in place of any existing pointers.