% graphviz - FAQ
% Ivan Lazar Miljenovic

Fortuitously Anticipated Queries (FAQ)
======================================

Note that to distinguish it from [Graphviz], the library shall be henceforth referred to as _graphviz_.

Graphviz vs _graphviz_
----------------------

### What is the difference between Graphviz and _graphviz_? ###

[Graphviz] is an open source library and collection of utility programs using that library to visualise [graphs] (which are specified using the [Dot] language).

_graphviz_ is a library for the purely functional programming language [Haskell] that provides "bindings" to Graphviz's programs. It does so by letting programmers specify the layout of the graph, converting that to Dot code and then calling the appropriate program to perform the visualisation.

[Graphviz]: http://www.graphviz.org/
[graphs]: http://en.wikipedia.org/wiki/Graph_theory
[Dot]: http://www.graphviz.org/doc/info/lang.html
[Haskell]: http://haskell.org/

### Why should I use _graphviz_ over one of the other Haskell Graphviz libraries? ###

Various Haskell libraries have support for Graphviz to one extent or another; however _graphviz_ has the most comprehensive support available out of all of them:

* Two different methods of specifying Dot graphs:

    1. Strict, which matches the layout of `dot -Tcanon`.

    2. Liberal, which allows statements to be in any order.

    There are also conversion functions between the two of them.

* The ability to parse and generate most aspects of Dot [syntax] and [attributes]. This includes taking into account escaping and quoting rules where applicable.

    [syntax]: http://graphviz.org/doc/info/lang.html
    [attributes]: http://graphviz.org/doc/info/attrs.html

* The ability to use a custom node type for Dot graphs.

* Support for all five layout algorithm programs and all specified [output formats].

    [output formats]: http://www.graphviz.org/doc/info/output.html

* Functions to convert [FGL] graphs to and from the internal Dot representations.

    [FGL]: http://web.engr.oregonstate.edu/~erwig/fgl/haskell/

* The ability to augment Dot and FGL graphs with positioning information by round-trip passing through Graphviz.

### Is the API of _graphviz_ stable? ###

For the most part, yes: the only items that are likely to change in the future are those with bugs/errors, or those for which a radically better way of doing things is found. For most uses, however, the API should not change for the foreseeable future.

Note that _graphviz_'s version numbers follow the [package versioning policy]; this means that you can immediately tell when the API has had a backwards-incompatible change by comparing the first two elements of the version. However, these changes won't affect most users.

[package versioning policy]: http://www.haskell.org/haskellwiki/Package_versioning_policy

### What aspects of Dot syntax and attributes are covered? ###

It's easier to state which aspects of Dot [syntax] and [attributes] _aren't_ covered:

#### Overall syntax items not covered ####

* Cannot specify a sub-graph as an end point in an edge.

* Comments, pre-processor lines and split lines are (currently) not supported within HTML-like labels.

* _graphviz_ is currently locale-specific: Dot graphs are meant to be encoded in UTF-8 by default unless specified to be Latin-1, but this isn't verified or checked. Dot code that is parsed in is assumed to be in UTF-8; in future this will be enforced (both for printing and parsing purposes).

* Graphviz is more liberal in accepting "invalid" values (e.g. accepting a floating-point value when only integer values are meant to be accepted); _graphviz_ is stricter in this respect (and will indeed throw an exception if it cannot parse something properly).

* No extensions (e.g. postscript-specific attributes) are available.

#### Attribute and value items not covered ####

* The global `orientation` attribute is not defined; however its behaviour is duplicated by the `rotate` attribute.
* The deprecated `overlap` algorithms have not been defined.

* `pointf` and `point` values have been combined into one datatype; however the optional `!` and third value for `point` values are not accepted.

* Only polygon-based `shape`s are available.

* The default `layersep` is used when printing and parsing `layerRange` and `layerList` values; this will be fixed in a future release (when state-based printing and parsing is implemented).

* The `/ssss/yyyy` and `//yyyy` forms of printing and parsing `color`s are not yet available.

#### Available items of note ####

A few available items are worth a special mention, as they may not be immediately obvious from the generated documentation:

* _graphviz_ is able to parse (but not print) the following special aspects of specifying edges in Dot code:

    - The `node:port` method of specifying head/tail `portPos` values.

    - Stating multiple edges with common interior nodes (e.g. `a -> b -> c`).

    - Stating edges with a grouping of nodes (e.g. `a -> {b c}`).

* Sub-graphs are specified as being clusters when the subgraph name starts with either `"cluster"` or `"cluster_"`; note that this prefix is removed when determining the subgraph's name for the internal datatypes.

* Anonymous subgraphs (where not even the `subgraph` keyword is specified) are also parseable.

* HTML-like and record labels are available, and feature proper escaping/unescaping when printing/parsing.

Getting _graphviz_ and more documentation
-----------------------------------------

### Where can I obtain _graphviz_? ###

The best place to get _graphviz_ is from its [HackageDB] page.

[HackageDB]: http://hackage.haskell.org/package/graphviz

### Where can I find the API documentation for _graphviz_? ###

Also on its [HackageDB] page.

### Is it safe to install and use _graphviz_ from its darcs repository? ###

No; unlike other projects I make no guarantees as to the stability of the live version of _graphviz_.
Whilst the [darcs] [repository] is _usually_ stable, it's often in a state of flux and at times patches that break the repository are recorded (when it's simpler/cleaner to break one patch into several smaller patches).

[darcs]: http://darcs.net/
[repository]: http://code.haskell.org/graphviz/

### How is _graphviz_ licensed? ###

_graphviz_ is licensed under a [3-Clause BSD License] (note that the ColorBrewer Color Schemes found in `Data.GraphViz.Attributes.Colors` are covered under [their own license](http://graphviz.org/doc/info/colors.html#brewer_license)).

[3-Clause BSD License]: http://www.opensource.org/licenses/bsd-license.php

Put simply, this means that you can do whatever you want with _graphviz_ as long as you cite both myself and [Matthew Sackman] (the original author) as being the authors of _graphviz_. However, I would appreciate at least an [email] letting me know how _graphviz_ is being used.

[Matthew Sackman]: http://www.wellquite.org/
[email]: mailto:Ivan.Miljenovic+graphviz@gmail.com

### Where can I find more information on _graphviz_? ###

From its [home page].

[home page]: http://projects.haskell.org/graphviz/

### Are there any tutorials on how to use _graphviz_? ###

There will be soon.

### What other packages use _graphviz_? ###

This is a list of all known packages that use _graphviz_: if you know of any others please let me know and I'll add them to the list.

* [Graphalyze](http://hackage.haskell.org/package/Graphalyze)

* [SourceGraph](http://hackage.haskell.org/package/SourceGraph)

### What is the history of _graphviz_? ###

_graphviz_ was originally written by [Matthew Sackman] (if you want his reasons for doing so, you'll have to ask him yourself) with the first known release being on 10 July, 2008.
In 2008 I (Ivan Miljenovic) needed a library that provided bindings to Graphviz with clustering support; at the time _graphviz_ was the most fully featured and closest to what I wanted, so I submitted a patch that provided support for both clustering and undirected graphs. In April 2009, Matthew wanted to step down from maintaining _graphviz_ and asked if I wanted to take over.

Since then the library has been almost completely re-written with greatly improved coverage of the Dot language and extra features. However, the original outline of the library still remains.

Using _graphviz_
----------------

### Can I start using _graphviz_ without knowing anything about Graphviz? ###

Unfortunately, no: the layout and design of _graphviz_ is heavily based upon the Dot language and the various [attributes] that Graphviz supports. As such, the documentation available in _graphviz_ will not suffice on its own (unless you're doing something _very_ simplistic).

### Can I just use _graphviz_ without reading its documentation? ###

You should _at least_ read the various messages about possible ambiguities, etc. at the top of each module and for the attributes you use before you use _graphviz_.

### Do I need to have Graphviz installed to use _graphviz_? ###

Technically no, if you're only dealing with the Dot language aspects. However, usage of the functions in the Commands module, or the augmentation of pretty-printing functions in the GraphViz module _do_ require Graphviz to be installed.

### Why didn't you use FFI to bind to the Graphviz library? ###

Because I just kept working where [Matthew Sackman] left off and it was already using Graphviz's tools rather than the actual library. However, most other language bindings (for Python, Perl, etc.) seem to do the same: generate Dot code and pass that to the relevant tool.
This, however, does provide a fortunate side effect where the ability to print and parse Dot code means that _graphviz_ can be used for more than just visualising graphs created solely in Haskell: it can also import pre-defined graphs, or else generate Dot code for use with other tools.

### What's the difference between DotGraph and GDotGraph? ###

The layout of `DotGraph` matches the output of `dot -Tcanon`. It has a fixed layout which makes it easier to reason about and get sub-components.

`GDotGraph` on the other hand is more liberal in its layout, allowing you to put statements in any order you please. This is useful in cases where you want to use the common Graphviz "hack" of specifying global attributes that don't apply to sub-graphs _after_ the sub-graphs in question.

### What's the best way to parse Dot code? ###

In both cases below, you should use the `parseDotGraph` function to parse the Dot code: this is because it will strip out comments and pre-processor lines and join together split lines (if any of these remain the parser will fail).

Also, if you are not sure what the type of the nodes is, use either `String` or else the `GraphID` type as it explicitly caters for both Strings and numbers (whereas just assuming it to be a `String` will result in numbers being stored internally as a `String`).

If you can, first run `dot -Tcanon` on the Dot code and parse it as a `DotGraph` value. This is because `DotGraph` types are easier to deal with.

If, however, this isn't possible (e.g. it uses an image that isn't in the current working directory) then use the `GDotGraph` type.

### There are too many attributes!!! Which ones should I use? ###

The following attributes are easy to use and recommended:

* `ArrowHead` and `ArrowTail` (for directed graphs) to set the styles of the ends of edges: note that in Graphviz parlance, "Head" refers to the end node and "Tail" refers to the start node of the edge (see below).
* When wanting to use different colours, use the following criteria to pick the correct attribute. Note: for the first two, you should also have `SItem Filled []` set as one of the `Style` values for that item.

    - `BgColor` to set the background colour of a graph/cluster.

    - `FillColor` to set the background colour for a node.

    - `Color` to set the colour of an edge; if you supply more than one value then the edge is drawn using parallel splines/lines (one per colour in the list).

    - `PenColor` to set the colour of the bounding box for a cluster.

    When choosing a `Color` value for one of the above, it is better to use one of the `X11Color` values (note: these are **not** the same as the standard X11 colours) or, if you have to, one of the manual colours, rather than a `BrewerColor`: the Brewer colours come under a different license and have no real standard on what the values are.

* `Label`: `StrLabel` can be used for both nodes and edges; the other two only for nodes. `RecordLabel` and `HtmlLabel` provide ways of having more fine-grained control over a node's layout with different sub-components, etc.; in most cases these won't be needed.

* `Rank`: this lets you control relative placement of sub-graphs and clusters.

* `Shape`: Use `Record` and `MRecord` for `RecordLabel` labels; feel free to use any other ones at any time (though you probably want to use `PlainText` for `HtmlLabel` labels).

* `Style`: use this to set line types, etc. for nodes and edges. You should **not** use a `DD` (device-dependent) value.

The following attributes are **not** recommended for use:

* `Charset`: the only accepted options are `"UTF-8"` and `"Latin-1"`, but in future _graphviz_ will not contain this attribute and will only allow UTF-8 usage.

* `Color` for anything except edge colours.

* `ColorScheme`: just stick with X11 colours.

* `Comment`: pretty useless, and it will interfere with the augmentation functions (since they use the `Comment` attribute to distinguish between multiple edges).
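
To make the colour advice above concrete, here is a minimal sketch of the Dot text those attributes correspond to. It is written in plain Haskell and deliberately does **not** use _graphviz_'s own datatypes: `Attr`, `quote`, `renderAttrs`, `filledNode`, `styledEdge` and `exampleGraph` are hypothetical helpers made up for this illustration, and only the Dot attribute names (`style`, `fillcolor`, `color`, `arrowhead`) come from Graphviz itself.

~~~~~~~~~~~~~~~~~~~~ {.haskell}
import Data.List (intercalate)

-- | A Dot attribute as a plain (name, value) pair. This is a stand-in for
-- illustration only; it is NOT graphviz's Attribute datatype.
type Attr = (String, String)

-- | Wrap a value in double quotes, escaping any embedded double quotes.
quote :: String -> String
quote s = "\"" ++ concatMap esc s ++ "\""
  where
    esc '"' = "\\\""
    esc c   = [c]

-- | Render an attribute list in square brackets; empty lists render as "".
renderAttrs :: [Attr] -> String
renderAttrs []    = ""
renderAttrs attrs =
  " [" ++ intercalate ", " [k ++ "=" ++ quote v | (k, v) <- attrs] ++ "]"

-- | A node with a coloured background: the fill colour is paired with
-- style=filled, mirroring the advice to set the Filled style as well.
filledNode :: String -> String -> String
filledNode name colour =
  quote name ++ renderAttrs [("style", "filled"), ("fillcolor", colour)] ++ ";"

-- | A directed edge from the tail node to the head node; "color" sets the
-- edge colour and "arrowhead" styles the end at the head node.
styledEdge :: String -> String -> String -> String
styledEdge from to colour =
  quote from ++ " -> " ++ quote to
    ++ renderAttrs [("color", colour), ("arrowhead", "vee")] ++ ";"

-- | A complete two-node digraph built from the helpers above.
exampleGraph :: String
exampleGraph = unlines
  [ "digraph G {"
  , "  " ++ filledNode "a" "yellow"
  , "  " ++ filledNode "b" "lightblue"
  , "  " ++ styledEdge "a" "b" "blue"
  , "}"
  ]
~~~~~~~~~~~~~~~~~~~~

For instance, `filledNode "a" "yellow"` yields `"a" [style="filled", fillcolor="yellow"];`, and `putStr exampleGraph` prints a small graph that can be saved to a file and passed to `dot`. In real code you would build the equivalent attribute values with _graphviz_ and let it handle quoting and escaping for you.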

### Can I use any attribute wherever I want? ###

No: attributes are all defined in one big datatype for the sake of simplicity, but not all attributes are valid in all places. Read the documentation (either in Graphviz or _graphviz_) to determine which is suitable where.

### How can I use _graphviz_ to visualise non-FGL graphs? ###

At the moment, you unfortunately have to write your own manual conversion functions (see `graphToDot`, etc. in the GraphViz module for ideas on how to do this). In future, it should be possible to convert any graph-like datatype into a `DotGraph` (this requires me to go write another library first...).

### How can I use/process multiple graphs like Graphviz does? ###

At one stage, _graphviz_ supported dealing with lists of `DotGraph`s; however, it was found to be faster to deal with each graph individually rather than try to get Graphviz to deal with them all in one go. In future, once the problem causing this has been tracked down and fixed, this feature will return.

### How can I use custom datatypes for node IDs? ###

The important thing here is to ensure that your custom datatype has defined instances of `PrintDot` and `ParseDot`. Probably the easiest way of doing this is to have functions that convert between your type and `String` and let _graphviz_ determine how to print and parse those.

Here is an example of a more difficult type that should be printed like "1: Foo":

~~~~~~~~~~~~~~~~~~~~ {.haskell}
data MyType = MyType String Int

instance PrintDot MyType where
  unqtDot (MyType s i) = unqtDot i <> colon <+> unqtDot s

  -- We have a space in there, so we need quotes.
  toDot = doubleQuotes . unqtDot

instance ParseDot MyType where
  parseUnqt = do i <- parseUnqt
                 character ':'
                 whitespace
                 s <- parseUnqt
                 return $ MyType s i

  -- Has at least one space, so it will be quoted.
  parse = quotedParse parseUnqt
~~~~~~~~~~~~~~~~~~~~

Things to note from this example:

* Whilst `PrintDot` and `ParseDot` have default definitions for `toDot` and `parse`, they assume the datatype doesn't need quotes; as such if the value will [need quoting](http://www.graphviz.org/doc/info/lang.html), then you should do so explicitly.

* It is better to use the `PrintDot` instances for common types such as `Int` and `String` rather than using the pretty-printer's inbuilt conversion functions (`int`, `text`, etc.) to ensure that quotations, etc. are dealt with correctly.

* Be as liberal as you can when parsing, especially with whitespace: when printing only one space is used, yet when parsing we use the `whitespace` parsing combinator that will parse all whitespace characters (but it must consume _at least_ one; there is a variant that does not need to parse any). However, we're not being so liberal as to allow parsing of newline characters (for which there is a separate parsing combinator).

### When parsing Dot code, do I have to worry about the case? ###

Not at all: _graphviz_'s parser is case-insensitive; however, the correct case is checked first so there is a slight degradation in performance when the wrong case is used.

### How do I set portPos values for nodes in edges? ###

Graphviz allows you to specify edges such as `from:a -> to:b` where the nodes "from" and "to" are defined with either `RecordLabel` or `HtmlLabel` labels and have different sections; the edge is then drawn from the "a" section of the "from" node to the "b" section of the "to" node.

Whilst _graphviz_ can parse this, you can't define this yourself: instead, do it the manual way:

~~~~~~~~~~~~~~~~~~~~ {.haskell}
DotEdge "from" "to" True [ TailPort (LabelledPort (PN "a") Nothing)
                         , HeadPort (LabelledPort (PN "b") Nothing)
                         ]
~~~~~~~~~~~~~~~~~~~~

Note where `TailPort` and `HeadPort` are used; the next question explains this.

### Is there anything else I should know? ###

A few other things of note that you should know about:

* For an edge `a -> b`, Graphviz calls "a" the _tail_ node and "b" the _head_ node.

* When creating `GraphID` values for the graphs and sub-graphs, you should ensure that they won't clash with any of the `nodeID` values when printed to avoid possible problems.

* It is a good idea to have unique IDs for sub-graphs to ensure that global attributes are applied only to items in that sub-graph and so that clusters aren't combined (it took me a _long_ time to find out that this was the case).

* You should specify an ID for the overall graph when outputting to a format such as SVG as it becomes the title of that image.

* It is possible to specify a graph as being directed/undirected but having individual edges being the opposite; care should be taken to avoid this (this possible issue may be resolved in future).

* Graphviz allows a node to be "defined" twice with different attributes; in practice they are combined into one node. Running Dot code through `dot -Tcanon` before parsing removes this problem.

* Several attributes are defined as taking a list of items; all of these assume that the provided lists are non-empty (sub-values are a different story).

* If a particular Dot graph is not parseable, the parser throws an error rather than failing gracefully.

Design Decisions
----------------

### Why does _graphviz_ use Polyparse rather than Parsec? ###

Short answer: because _graphviz_ was already using [Polyparse] when I started working on it (and I hadn't done any parsing before so I had no preference either way).

[Polyparse]: http://www.cs.york.ac.uk/fp/polyparse/

Longer answer: I feel Polyparse has several advantages over [Parsec]:

* Simpler types.

* Avoids the whole "but Parsec-3 is slower than Parsec-2" debate (with its associated baggage/problems).
* Few inbuilt combinators: since there is no inbuilt `character` parsing combinator, there are no problems with _graphviz_ using its own case-less one.

* [Easier backtracking](http://www.cs.york.ac.uk/fp/polyparse/#how)

[Parsec]: http://hackage.haskell.org/package/parsec

### Why do you have two different representations of Dot graphs? ###

_graphviz_ has [two different representations](#whats-the-difference-between-dotgraph-and-gdotgraph) of Dot graphs. Apart from the reasons given before, `DotGraph` was the original representation, whereas `GDotGraph` was only introduced in the 2999.8.0.0 release.

Note, however, that I was thinking of adding something like `GDotGraph` back around the time of the [2999.0.0.0 release](http://www.haskell.org/pipermail/haskell-cafe/2009-July/064436.html), yet [people didn't like the idea](http://www.haskell.org/pipermail/haskell-cafe/2009-July/064442.html). As such, `GDotGraph` is there if anyone needs/wants to use it, but usage of `DotGraph` is recommended/preferred.

### Why are only FGL graphs supported? ###

Love them or hate them, [FGL] currently provides the best graph datatype and library available for Haskell. As such, if any one graph type had to be chosen to have conversion functions written for it then FGL is the best option.

Furthermore, I needed FGL graph support (which is the much more important reason!).

### Why are the version numbers so high? ###

To make sure the latest release has the highest version number: Matthew Sackman originally made releases with date-based versioning, but when I switched to using the [package versioning policy] I had to change this. I could have started with 2010.x.y.z or so, but at the time I had initial hopes of introducing compatibility with other graphs (not just [FGL] ones) soon and wanted to make that the 3000.0.0.0 release; however that has not yet come to pass.

### Why do you use the American spelling of colour in _graphviz_? ###

Because that's how Graphviz spells it, and I was following upstream to avoid confusion.

Bugs, Feature Requests and Development
--------------------------------------

### Do you have any future plans for _graphviz_? ###

Yes, I do! See the TODO file for more information.

### Does _graphviz_ have a test suite? ###

Yes: to get it, you have to build _graphviz_ with the `test` flag enabled; for example:

~~~~~~~~~~~~~~~~~~~~ {.bash}
cabal install graphviz --flags=test
~~~~~~~~~~~~~~~~~~~~

Then run the `graphviz-testsuite` executable. This test suite uses [QuickCheck] to ensure that _graphviz_ can parse the Dot code it generates (as well as a few other things). Note that it isn't perfect: there are no guarantees that the Dot graphs that are generated are indeed valid, and those more extensive tests are not yet available.

[QuickCheck]: http://hackage.haskell.org/package/QuickCheck

Furthermore, you can do more controlled testing to try and track down the source of a bug, as the above flag will also expose several testing modules which give you access to the various tests used as well as the `Arbitrary` instances for use with [QuickCheck].

For proper testing of real-life Dot code, there is also the `TestParsing.hs` script that comes in the _graphviz_ tarball (but is not installed). Once you have _graphviz_ installed you can just run this script, passing it any files containing Dot graphs you wish to test. It will attempt to parse each Dot graph as a `GDotGraph`, and then test to see if the canonicalised form is parseable as a `DotGraph`.

### I've found a bug! ###

Oh-oh... please [email] me the specifics of what you were doing (including the Dot file in question if it's a parsing problem) and I'll get right on it.

### I have a feature request. ###

Is it in the TODO? If not, [email] me and I'll consider implementing it (depending on time and how well I think it will fit in the overall library).

### I want to help out with developing _graphviz_. ###

Great!
Whether you have a specific feature in mind or want to help clear the TODO list, please [email] me first to check what you're planning to do (who knows, I could already be implementing that very feature).

Once we've discussed what you're going to do, first get yourself a copy of the darcs repository:

~~~~~~~~~~~~~~~~~~~~ {.bash}
darcs get --lazy http://code.haskell.org/
~~~~~~~~~~~~~~~~~~~~

Once you've made your changes, make sure you build and run the testsuite (and ensure it passes!). Then record the patch(es) and `darcs send` them. I'll then review them and if I'm happy with them, I'll apply them.

### What is the purpose of the AttributeGenerator.hs file? ###

Graphviz has a large number of attributes. Rather than try to edit everything manually each time I want to change how I use the large `Attribute` datatype, the AttributeGenerator script generates the datatype, instances, etc. for me.
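
The script itself is not reproduced here. As a rough sketch of the general idea only (generating a datatype declaration from a table of attribute descriptions), something along the following lines would work. The `AttrSpec` record, the `genDatatype` function and the two sample entries are all hypothetical and are not taken from AttributeGenerator.hs or from the real `Attribute` definition.

~~~~~~~~~~~~~~~~~~~~ {.haskell}
import Data.List (intercalate)

-- | Hypothetical description of one attribute: the constructor name and
-- the Haskell type of the value it carries.
data AttrSpec = AttrSpec
  { conName :: String
  , valType :: String
  }

-- | Emit the source text of a data declaration with one constructor per
-- attribute. Assumes the list of specs is non-empty.
genDatatype :: String -> [AttrSpec] -> String
genDatatype tyName specs =
  "data " ++ tyName ++ "\n  = "
    ++ intercalate "\n  | " [conName s ++ " " ++ valType s | s <- specs]
    ++ "\n  deriving (Eq, Show)\n"

-- | Illustrative input only; these entries are made up for the example.
sampleSpecs :: [AttrSpec]
sampleSpecs =
  [ AttrSpec "Comment" "String"
  , AttrSpec "FontSize" "Double"
  ]
~~~~~~~~~~~~~~~~~~~~

Running `putStr (genDatatype "Attribute" sampleSpecs)` prints a declaration with a `Comment String` and a `FontSize Double` constructor. The benefit of this approach is that adding an attribute means adding one table entry, and every generated instance can be derived from the same table.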