Emping User Guide, Version 0.2

Author: Hans van Thiel, May 2007
email: hthiel.char@zonnet.nl

1. Overview

1.1. What

Emping is a utility that derives heuristic rules from nominal data. Nominal data are qualitative and unordered, as in:

Class is actually an ordinal attribute, but when the order is disregarded, it is nominal.

Heuristic rules consist of attribute values (predicates) that together imply another attribute value. For example:

Color:green and Proposition 1:True is Class:B

Heuristic rules are purely empirical, with no foundation in a theory or model. The input of Emping is just a table of nominal facts. The user has to select which attribute is to be the consequent. Emping then derives all shortest rules that, in the table, imply the values of the selected consequent. Each reduced rule is a generalization of one or more original rules, and therefore reduced rules may imply or be equivalent to others. If this is the case, these logical dependencies are also derived.
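The idea of "all shortest rules" can be illustrated with a naive Python sketch. It is not Emping's algorithm (the white paper describes the actual approach). For each row of a small hypothetical table, it tries every subset of that row's attribute values, smallest first, and keeps the subsets that imply the row's consequent value everywhere in the table. The table contents and the names `holds` and `shortest_rules` are made up for illustration.

```python
from itertools import combinations

# Hypothetical toy table; each row is one original rule (fact).
TABLE = [
    {"Color": "green", "Proposition 1": "True",  "Class": "B"},
    {"Color": "green", "Proposition 1": "False", "Class": "A"},
    {"Color": "red",   "Proposition 1": "True",  "Class": "A"},
]

def holds(antecedent, value, table, consequent):
    """True if at least one row matches the antecedent and every matching row has `value`."""
    matching = [r for r in table if all(r[a] == v for a, v in antecedent)]
    return bool(matching) and all(r[consequent] == value for r in matching)

def shortest_rules(table, consequent):
    """Collect, for every row, the smallest subsets of its attribute values that imply its consequent value.

    Rows whose full antecedent is ambiguous yield no rule in this sketch.
    """
    rules = set()
    for row in table:
        preds = sorted((a, v) for a, v in row.items() if a != consequent)
        for size in range(1, len(preds) + 1):
            found = [frozenset(c) for c in combinations(preds, size)
                     if holds(c, row[consequent], table, consequent)]
            if found:
                rules.update((ante, row[consequent]) for ante in found)
                break  # stop at the smallest size that works for this row
    return rules

# Print the derived rules in the "A:x and B:y is C:z" style used above.
for antecedent, value in shortest_rules(TABLE, "Class"):
    print(" and ".join(f"{a}:{v}" for a, v in sorted(antecedent)), "is", f"Class:{value}")
```

For this toy table the sketch finds three rules, one of which is Color:green and Proposition 1:True is Class:B. Because every subset is tried, the running time grows exponentially with the number of attributes, so the sketch only shows what is computed, not how to compute it efficiently.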

1.2. How

Emping reads a file in comma-separated values format (.csv), as produced by the Open Office Calc spreadsheet, and returns its results as .csv files that OO Calc can read.

You start the utility from a terminal and provide the file name as a command line parameter. For example:

$ ./emping QuinLanFacts.csv

Emping then asks you to supply the name of the attribute that is to be the consequent of the rules. The reduced normal form is saved in a file whose name starts with the prefix "RNF_" followed by the name of that attribute. If there are logical dependencies, they are stored in a file with the prefix "DPT_". Finally, original rules may be ambiguous, that is, the same antecedent may imply two or more different values. The reduction algorithm also works if ambiguous rules are present, but Emping reports which rules are ambiguous in a third file with the prefix "AMB_".
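The ambiguity check and the file naming described above can be sketched in Python. The sketch assumes that an original rule's antecedent consists of all its non-consequent attribute values, and that each output file is named as the prefix followed by the attribute name and ".csv", as in RNF_Fishing.csv. The function names and the sample table are hypothetical.

```python
from collections import defaultdict

PREFIXES = ("RNF", "DPT", "AMB")

def output_names(consequent):
    """Result file names: prefix, underscore, attribute name, '.csv'."""
    return {p: f"{p}_{consequent}.csv" for p in PREFIXES}

def ambiguous_antecedents(table, consequent):
    """Map each antecedent that occurs with more than one consequent value to those values."""
    seen = defaultdict(set)
    for row in table:
        antecedent = tuple(sorted((a, v) for a, v in row.items() if a != consequent))
        seen[antecedent].add(row[consequent])
    return {ante: vals for ante, vals in seen.items() if len(vals) > 1}

# Hypothetical table: the first two rows share an antecedent but disagree on Class.
TABLE = [
    {"Color": "green", "Size": "big", "Class": "A"},
    {"Color": "green", "Size": "big", "Class": "B"},
    {"Color": "red",   "Size": "big", "Class": "A"},
]
print(output_names("Class")["AMB"])  # AMB_Class.csv
print(ambiguous_antecedents(TABLE, "Class"))
```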

The reduced normal form file and, if present, the others, can then be loaded into OO Calc.
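Because the results are ordinary .csv files, any CSV writer can produce something OO Calc will open. The Python sketch below writes one row per rule, with a blank cell where an attribute is not part of the antecedent, and quotes text with double quotes. The column layout and the name `write_rules` are assumptions for illustration; this is not necessarily the layout of Emping's own output files.

```python
import csv
import io

def write_rules(rules, attributes, consequent, out):
    """Write a header row, then one row per (antecedent, value) rule.

    Blank cells mean the attribute is not part of that rule's antecedent.
    """
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC)  # double-quote all text
    writer.writerow(attributes + [consequent])
    for antecedent, value in rules:
        cells = dict(antecedent)
        writer.writerow([cells.get(a, "") for a in attributes] + [value])

# Example: write two hypothetical rules to an in-memory buffer and show the CSV text.
buf = io.StringIO()
write_rules(
    [(frozenset({("Color", "green"), ("Proposition 1", "True")}), "B"),
     (frozenset({("Color", "red")}), "A")],
    ["Color", "Proposition 1"], "Class", buf)
print(buf.getvalue())
```

To write a real file, open it with `open("RNF_Class.csv", "w", newline="")` and pass the file object instead of the buffer; `newline=""` lets the csv module control line endings.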

1.3. More

More about the principles on which Emping is based can be found in the white paper Deriving Heuristic Rules from Facts, which is included in the distribution as a PDF file.

2. Example

2.1. Step 1

Enter the data in Open Office Calc as shown:

As you can see, the table can have empty lines and does not have to start in the first column. But:

2.2. Step 2

Save the table in Text CSV format. Choose double quotes as the text delimiter (the default). Whole numbers are stored without delimiters, and Emping uses them after checking that they consist of digits only (no negatives, no fractions).
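A Python sketch of reading such a file, assuming the layout described in Step 1: empty lines are skipped, empty leading columns are dropped, and the first remaining row is taken as the header. Python's csv module does not report whether a field was quoted, so the digit-only check is shown as a separate helper that a reader would apply to unquoted fields. Both function names are hypothetical.

```python
import csv
import io

def is_whole_number(field):
    """Digit-only check: no sign, no decimal point, not empty."""
    return field != "" and all(c in "0123456789" for c in field)

def read_table(text):
    """Parse CSV text into a list of {attribute: value} dicts.

    Empty lines and empty leading columns are dropped; every value stays a
    string, since the attributes are nominal.
    """
    rows = [r for r in csv.reader(io.StringIO(text)) if any(c.strip() for c in r)]
    if not rows:
        return []
    # Leftmost column that holds a value in at least one row.
    start = min(next(i for i, c in enumerate(r) if c.strip()) for r in rows)
    rows = [r[start:] for r in rows]
    header = rows[0]
    while header and not header[-1].strip():  # drop trailing empty header cells
        header = header[:-1]
    return [dict(zip(header, r)) for r in rows[1:]]

# Hypothetical export: an empty first line, a blank line, and an empty first column.
SAMPLE = """\
,,
,"Color","Size","Class"

,"green",3,"A"
,"red",10,"B"
"""
print(read_table(SAMPLE))
```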

2.3. Step 3

Open a terminal and type emping, followed by the file name of the table (including the path). You may have to precede the command with the directory that contains the emping executable. For example, if it is in your working directory:

$ ./emping (followed by the file name)

2.4. Step 4

The program will now ask for the attribute which is to be predicted. This can be any one of the names in the table header.
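Choosing the consequent amounts to picking one header name and treating the remaining attributes as candidates for the antecedents. Here is a minimal Python sketch with a hypothetical `choose_consequent` helper. The prompt text in the commented interactive line is made up and is not Emping's actual prompt.

```python
def choose_consequent(header, answer):
    """Validate the user's answer against the header.

    Returns the consequent name and the remaining (antecedent) attributes.
    """
    name = answer.strip()
    if name not in header:
        raise ValueError(f"{name!r} is not an attribute; choose one of: {', '.join(header)}")
    return name, [h for h in header if h != name]

# Interactive use (prompt text is hypothetical):
# consequent, antecedents = choose_consequent(header, input("Consequent attribute: "))
print(choose_consequent(["Color", "Windy", "Fishing"], "Fishing"))
```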

There are no ambiguous rules for Fishing but the reduced normal form has dependency trees as well as unconnected rules.

2.5. Step 5

View the reduced normal form in the file RNF_Fishing.csv in OO Calc, and all branches of the dependency trees, together with the singletons, in DPT_Fishing.csv.
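Listing all branches of the dependency trees together with the singletons is a tree walk. The Python sketch below assumes the implication relation between reduced rules is already known. It is given as a dictionary from each rule to the rules it implies, and it is assumed to be acyclic (rules that imply each other would first be merged into one node). How Emping itself computes this relation is described in the white paper and is not reproduced here. The rule labels and the function name are placeholders.

```python
def branches_and_singletons(rules, implies):
    """Enumerate root-to-leaf paths in an acyclic 'implies' graph, plus unconnected rules.

    rules   -- all reduced rules (any hashable labels)
    implies -- dict mapping a rule to the list of rules it implies (assumed acyclic)
    """
    targets = {b for bs in implies.values() for b in bs}
    connected = targets | {a for a, bs in implies.items() if bs}
    singletons = [r for r in rules if r not in connected]
    roots = [r for r in rules if r in connected and r not in targets]

    branches = []

    def walk(path):
        children = implies.get(path[-1], [])
        if not children:          # reached a leaf: record the whole branch
            branches.append(path)
        for child in children:
            walk(path + [child])  # would not terminate on a cycle

    for root in roots:
        walk([root])
    return branches, singletons

# Hypothetical relation: r1 implies r2 and r3, r2 implies r4; r5 is unconnected.
print(branches_and_singletons(["r1", "r2", "r3", "r4", "r5"],
                              {"r1": ["r2", "r3"], "r2": ["r4"]}))
```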

Note: for an example with ambiguities and equals (implications both ways), choose Windy as the consequent attribute.
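Equal rules are rules that imply each other. Given a hypothetical implication dictionary (each rule mapped to the rules it implies), a Python sketch can find them by checking each implication for its reverse. How Emping derives the implications is not shown here.

```python
def equal_pairs(implies):
    """Return the unordered pairs of distinct rules that imply each other."""
    pairs = set()
    for a, bs in implies.items():
        for b in bs:
            if a != b and a in implies.get(b, ()):
                pairs.add(frozenset((a, b)))
    return pairs

# Hypothetical relation: r1 and r2 imply each other; r2 also implies r3.
print(equal_pairs({"r1": ["r2"], "r2": ["r1", "r3"], "r3": []}))
```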

3. Miscellaneous

The Emping utility is written in Haskell and has been developed and tested on the Fedora Core 6 Linux platform, using the Haskell tools that are available as FC6 packages. To use it, you must compile the package with the Haskell compiler on your platform or build it with Cabal. See the README file for details.

(Potential) users will probably be somewhat wary, in particular if their data is critical. Keep in mind that Emping derives the rules, which is the hard part; checking the results for correctness is easy.
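The check really is straightforward: every derived rule must hold in the table, and every original row should be covered by at least one derived rule. Here is a Python sketch. It assumes rules are represented as (antecedent, value) pairs, where the antecedent is a collection of (attribute, value) pairs. This representation, the sample data, and the name `check_rules` are illustrative, not Emping's file format.

```python
def check_rules(table, consequent, rules):
    """Return (rules contradicted by some row, rows covered by no rule)."""
    def matches(row, antecedent):
        return all(row.get(a) == v for a, v in antecedent)

    contradicted = [(ante, val) for ante, val in rules
                    if any(matches(r, ante) and r[consequent] != val for r in table)]
    uncovered = [r for r in table
                 if not any(matches(r, ante) and r[consequent] == val for ante, val in rules)]
    return contradicted, uncovered

# Hypothetical table and derived rules.
TABLE = [
    {"Color": "green", "Proposition 1": "True",  "Class": "B"},
    {"Color": "green", "Proposition 1": "False", "Class": "A"},
    {"Color": "red",   "Proposition 1": "True",  "Class": "A"},
]
RULES = [
    (frozenset({("Color", "green"), ("Proposition 1", "True")}), "B"),
    (frozenset({("Proposition 1", "False")}), "A"),
    (frozenset({("Color", "red")}), "A"),
]
print(check_rules(TABLE, "Class", RULES))  # ([], [])
```

Rows that belong to an ambiguous original rule will be reported as uncovered, since no rule that holds in the table can imply both of their values.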

Emping stands for empirical reasoning or the Indonesian snack with that name.