CPSA 2.0 User Guide

The Cryptographic Protocol Shapes Analyzer (CPSA) attempts to enumerate all essentially different executions possible for a cryptographic protocol. We call them the shapes of the protocol. Naturally occurring protocols have only finitely many, indeed very few shapes. Authentication and secrecy properties are easy to determine from them, as are attacks and anomalies.

For each input problem, the CPSA program is given some initial behavior, and it discovers what shapes are compatible with it. Normally, the initial behavior is from the point of view of one participant. The analysis reveals what the other participants must have done, given the participant's view. The search is based on a high-level algorithm shown to be complete, i.e. every shape can in fact be found in a finite number of steps. The implemented algorithm is not known to be complete, but appears to come close. The shapes analysis is performed within a pure Dolev-Yao model, and the analyzer handles message terms compatible with the signature of an Order Sorted Algebra. In this release, the analyzer, cpsa handles two algebras, the Basic Cryptographic Algebra and the Diffie-Hellman Algebra. The source release contains an interface designed to ease the task of adding support for other message term algebras.

The analyzer is designed to work well with other tools. S-expressions are used for both input and output. The analyzer reads all the problems in its input, writes out the solution to each problem, and then exits. This release contains five tools. The cpsagraph program provides a visualization of answers using Scalable Vector Graphics (SVG). The cpsashapes program removes intermediate results from analyzer runs making the shapes easy to identify. The cpsaannotations program uses protocol annotations to annotate shapes and generate protocol soundness obligations. The cpsaparameters program detects some specification errors by performing a base term usage analysis. The cpsapp program pretty prints its input using the CPSA specific algorithm.

The input syntax is essentially the same as the output syntax. A Lisp aware editor will pretty print input, and the output is pretty printed. We use Emacs' Scheme mode to prepare input. This document describes the syntax, but provides little assistance for its interpretation. See the CPSA Tutorial document for this information.

The typical pattern of usage is to enter the set of problems to be analyzed into a file, in this example, prob.scm, analyze the problems, and if something interesting is produced, visualize the answers.

$ cpsa +RTS -M512m -RTS -o prob.txt prob.scm
$ cpsagraph -o prob.xml prob.txt
$ firefox -remote "openFile(`pwd`/prob.xml)"

Often a summary of the analysis is more enlightening.

$ cpsashapes -o prob_shapes.txt prob.txt
$ cpsagraph -o prob_shapes.xml prob_shapes.txt
$ firefox -remote "openFile(`pwd`/prob_shapes.xml)"

The distribution comes with the file cpsa.mk for inclusion into your makefile. A sample makefile follows. If you cut-and-paste from a browser window, but sure to convert the leading spaces in the last line to a tab character.

CPSAFLAGS = +RTS -M512m -RTS

include cpsa.mk

all:    $(patsubst %.scm,%_shapes.xml,$(wildcard *.scm))

clean:
        -rm *.txt *.xml

For platforms without GNU make, the Haskell program Make.hs can be loaded into a Haskell interpreter and perform a similar function. Users are expected to modify the program to suit their needs.

Syntax

The syntax is extensible. Association lists in which the key is always a symbol are allowed at various points in the grammar. Key-value pairs with unrecognized keys are ignored, and are available for use by other tools. On output, unrecognized key-value pairs are preserved when printing protocols, but elided when printing skeletons, with the exception of the comment key.

The input is a sequence of protocol definitions and problem statements. A problem statement describes the initial behavior as a skeleton. The grammar for a protocol follows.

PROTOCOL   ::= (defprotocol ID ALG ROLE+ PROT-ALIST)
ID         ::= SYMBOL
ALG        ::= SYMBOL
ROLE       ::= (defrole ID VARS TRACE ROLE-ALIST)
VARS       ::= (vars DECL*)
DECL       ::= (ID+ SORT)
TRACE      ::= (trace EVENT+)
EVENT      ::= (send TERM) | (recv TERM)
ROLE-ALIST ::= (non-orig HT-TERM*) ROLE-ALIST
            |  (uniq-orig TERM*) ROLE-ALIST | ...
HT-TERM    ::= TERM | (INT TERM)
PROT-ALIST ::= ...

The protocol ID is a symbol that names the protocol, and the role ID is a symbol that names the role. The ALG symbol identifies the algebra used by the protocol. For the Basic Cryptographic Algebra, the symbol is basic. For the Diffie-Hellman Algebra, the symbol is diffie-hellman. The var form contains symbols that declare the sort of the variables used in this role. The set of sort symbols is algebra specific. The protocol association list has no predefined keys, while the role association list has two. The value associated with both of these keys must be base terms in the algebra. Each non-orig term must not be carried by any event in the role's trace, but must occur in some term. Each uniq-orig term must originate in the role's trace. Every non-base variable must be acquired by the role's trace.

The structure of sorts and terms in the Basic Cryptographic Algebra follows.

SORT :: = text | data | name | skey | akey | mesg
TERM ::= ID | STRING | (cat TERM+) | (enc TERM+ TERM)
      |  (pubk ID) | (privk ID) | (invk ID) | (ltk ID ID)
      |  (pubk STRING ID) | (privk STRING ID)

The form (cat a b c d e) is sugar for (cat a (cat b (cat c (cat d e)))).

SKELETON   ::= (defskeleton ID VARS STRAND+ SKEL-ALIST)
STRAND     ::= (defstrand ID INT MAPLET*) | (deflistener TERM)
MAPLET     ::= (TERM TERM)
SKEL-ALIST ::= (non-orig TERM*) SKEL-ALIST
            |  (uniq-orig TERM*) SKEL-ALIST
            |  (precedes NODE-PAIR*) SKEL-ALIST | ...
NODE-PAIR  ::= (NODE NODE)
NODE       ::= (INT INT)

The ID in the skeleton form names a protocol. It refers to the last protocol definition of that name which precedes the skeleton form. The ID in the strand form names a role. The integer in the strand form gives the height of the strand. The sequence of pairs of terms in the strand form specify an environment used to construct the messages in a strand from its role's trace. The first term is interpreted using the role's variables and the second term uses the skeleton's variables. The environment used to produce the strand's trace is derived by matching the second term using the first term as a pattern.

The first integer in a node identifies a strand, and the second one specifies the position of the node in the strand. Zero-based indexing is used. Zero identifies the first strand, and the first node in a strand has position zero.

A variable may occur in more then one role within a protocol. The reader performs a renaming so as to ensure these occurrences do not overlap. Furthermore, the maplets used to specify a strand need not specify how to map every role variable. The reader inserts missing mappings, and renames every skeleton variable that also occurs in a role of its protocol. The sort of every skeleton variable that occurs in the non-orig or uniq-orig list or in a maplet must be declared.

On output, key-value pairs are added to all skeleton's association list. Every skeleton in the output is labeled with a unique identifier with (label INT) A skeleton has (parent INT) if it is a member of the cohort of the identified parent. A skeleton has (seen INT+) when members of its cohort are isomorphic to previously seen skeletons. A skeleton lists its unrealized nodes with (unrealized NODE*). The traces associated with each strand is given by the (traces ...) form. Finally, the operation used to derive this skeleton from its parent is recorded with (operation TEST KIND TERM NODE TERM*), where TEST is the authentication test encryption-test or nonce, KIND is (added-strand ID INT), (contracted MAPLET*), or (added-listener TERM), TERM is the critical term, NODE in the test node, and the remaining terms specify the escape set. When the operation kind is augmenting, the instance's role name and height are provided. For kind listening, a term is provided. For kind contracting, the substitution is provided. When a substitution refers to a variable not in the skeleton, its name is unpredictable. For generalization, the operation is recorded as (operation generalization METHOD), where METHOD is one of deleted NODE, weakened NODE-PAIR, separated TERM and forgot TERM. Generalization is used to find shapes from realized skeletons. For shape collapsing, the operation is recorded as (operation collapsed INT INT), where the two INTs identify the strands merged. Shape collapsing is used to find related shapes.

The additional sorts and terms in the Diffie-Hellman Algebra follow.

SORT :: = ... | base | expn
TERM ::= ... | (gen) | (exp TERM TERM)
      |  (one) | (mul TERM*) | (rec TERM)

Macros

After reading the input, cpsa expands macros before in analyzing the results. A macro definition is a top-level form.

MACRO ::= (defmacro (NAME ARG*) BODY)

The cpsa program expands all calls to macros in non-macro defining S-expressions using the macros that have been defined previously. A macro definition can be used to expand a call if the first element of a list matches the name of the macro, and the length of the remaining elements in the list matches the length of the macro's argument list. When two macros definitions are applicable, the last definition takes precedence. The cpsa program omits macro definitions from its output.

Usage of CPSA

$ cpsa -h
Usage: cpsa [OPTIONS] [FILE]
  -o FILE    --output=FILE     output FILE
  -l INT     --limit=INT       step count limit (default 2000)
  -b INT     --bound=INT       strand count bound (default 8)
  -m INT     --margin=INT      set output margin (default 72)
  -e         --expand          expand macros only; don't analyze
  -n         --noisochk        disable isomorphism checks
  -a STRING  --algebra=STRING  algebra (default basic)
  -s         --show-algebras   show algebras
  -h         --help            show help message
  -v         --version         show version number

This program will abort if too many steps are taken. A skeleton is printed for each step taken by the program. The step count limit option is used to override the default step count limit. It will also abort when it detects a skeleton with too many strands. The strand count bound option is used to override the default strand count bound. Another way to limit resources used by the program is to limit the amount of memory it may use. The command-line option +RTS -M512m -RTS limits memory usage to 512m.

For long running problems, SMP parallelism is available. For example, on a quad-core machine, we would probably use +RTS -N4 -RTS.

When run with isomorphism checks disabled (--noisochk), CPSA searches for realized skeletons, not shapes. It attempts to speed up analysis by not identifying duplicate skeletons or generalizing realized skeletons, however in many cases, run times increase. It is used when normal analysis takes too much time on the chance that avoiding the checks makes more progress.

In no isomorphism checking mode output, every realized skeleton is labeled a shape even when it is not. This allows the extraction of every realized skeletons from the output using the cpsashapes program.

An error message that begins with "No test for unrealized node" identifies a severe error that should be reported as a bug.

Visualization

The cpsagraph program produces a graphical rendering of the output or input of an analyzer using SVG. It is viewable only with a standards-compliant web browser such as FireFox.

When applied to the output of CPSA, the program groups together all the skeletons related to each problem statement in the input. The program assembles the related skeletons into a directed acyclic graph using the parent identifier and renders it as a tree using SVG. Each vertex in the tree is the label added to the skeleton by the CPSA program. Click the label in the tree to view the skeleton. Hover over a node in a skeleton drawing to see the term associated with it. Hover over a role to see the mapping from role variables to skeleton variables.

A node ordering edge in a skeleton drawing indicates that the message transmission at the source of the edge happened before message reception at the destination edge. The edge is solid if the transmitting term is syntactically identical to the receiving term, otherwise the edge is dashed. Thus in an algebra with a commutative operation *, the graphing program sometimes draws a dashed line between (send (* a b)) and (recv (* b a)).

In the tree drawing, for dead skeletons, the label is red, unless it has been seen before, in which case it is orange. Otherwise the label is green for skeletons seen before. Seen skeleton labels are rendered in an italic font.

By the default, cpsagraph generates a view of CPSA S-expressions as a compound document that contains SVG within XHTML. This view integrates graphics with the input text.

In compact mode, cpsagraph generates an SVG document. The tree is displayed immediately to the left of skeleton drawings. When there is more than one tree, the left-hand-side of the drawing contains a vertical listing of the trees. Compact mode output should never be used as a replacement for studying the text version of the output. The text version contains strictly more information, and should be displayed next to its graphical rendition.

In LaTeX mode, cpsagraph generates LaTeX source. XY-pic is used for drawings of skeletons. The margin specified in cpsa.mk produces good results.

Usage

$ cpsagraph -h
Usage: cpsagraph [OPTIONS] [FILE]
  -o FILE  --output=FILE  output FILE
  -x       --expanded     use expanded format (default)
  -c       --compact      use compact format
  -l       --latex        use LaTeX format
  -m INT   --margin=INT   set output margin (default 72)
  -h       --help         show help message
  -v       --version      show version number

Shape Extraction

The cpsashapes program extracts the original problems and the shapes from the output of a CPSA run. The shapes are linked to their problem so the output can be graphed.

Usage

$ cpsashapes -h
Usage: cpsashapes [OPTIONS] [FILE]
  -o FILE  --output=FILE  output FILE
  -m INT   --margin=INT   set output margin (default 72)
  -h       --help         show help message
  -v       --version      show version number

Annotations

The cpsaannotations program uses protocol annotations to annotate shapes and generate protocol soundness obligations.

Usage

$ cpsaannotations -h
Usage: cpsashapes [OPTIONS] [FILE]
  -o FILE    --output=FILE     output FILE
  -m INT     --margin=INT      set output margin (default 72)
  -a STRING  --algebra=STRING  algebra (default basic)
  -s         --show-algebras   show algebras
  -h         --help            show help message
  -v         --version         show version number

Syntax

To be analyzed, each role in a protocol must include an annotations form. The TERM is a role base term that when instantiated, is the name of a principal in the shape. What follows is sequences of pairs. The integer gives the position of the event in the trace that is annotated by the formula. Zero-based indexing is used.

The language of formulas is first-order logic extended with a modal says operator. Formula terms may include function symbols that are not part of a protocol's message signature.

ANNOTATION ::= (annotations TERM (INT FORMULA)*)
FORMULA    ::= (ID FTERM*) | (not FORMULA)
            |  (and FORMULA*) | (or FORMULA*)
            |  (implies FORMULA* FORMULA)
            |  (iff FORMULA FORMULA)
            |  (says TERM FORMULA)
            |  (forall (DECL*) FORMULA)
	    |  (exists (DECL*) FORMULA)
FTERM      ::= TERM | (ID FTERM*)

Use (and) for truth and (or) for falsehood.

Output

On output, each shape contains an annotations form and a obligations form. The annotations form presents every non-trivial formula derived from the protocol. The obligations form presents every non-trivial soundness obligation.

Parameter Analysis

The parameters of a role are the base terms that are not acquired by the role's trace, but must be available before a complete execution of the trace is possible. This program computes sets of sets of parameters consistent with the role. If the expected parameter set is not a member, a specification error is indicated.

Usage

$ cpsaparameters -h
Usage: cpsaparameters [OPTIONS] [FILE]
  -o FILE    --output=FILE     output FILE
  -m INT     --margin=INT      set output margin (default 72)
  -a STRING  --algebra=STRING  algebra (default basic)
  -s         --show-algebras   show algebras
  -h         --help            show help message
  -v         --version         show version number

Output

On output, each role contains a parameters form with a list of sets of base terms.

(defrole resp (vars (b a name) (n2 n1 text))
  (trace
    (recv (enc n1 a (pubk b)))
    (send (enc n1 n2 (pubk a)))
    (recv (enc n2 (pubk b))))
  (parameters
    (n2 n1 a (pubk b) (pubk a))
    (n2 (privk b) (pubk a))))

In this example, the second parameter set is what is expected for a responder role in this version of the Needham-Schroeder protocol.

Macro expansion is not performed by this program. Use the -e option with cpsa to preprocess input that contains macro definitions.

Pretty Printing

The cpsapp program program pretty prints its input using the CPSA specific algorithm.

Usage

$ cpsapp -h
Usage: cpsapp [OPTIONS] [FILE]
  -o FILE  --output=FILE  output FILE
  -m INT   --margin=INT   set output margin (default 72)
  -h       --help         show help message
  -v       --version      show version number

S-expressions

The S-expressions used are restricted so that most dialects of Lisp can read them, and characters within symbols and strings never need quoting. ASCII is used to encode characters. Every list is proper. An atom is either a symbol, an integer, or a string. The characters that make up a symbol are the letters, the digits, and the special characters in "-*/<=>!?:$%_&~^". A symbol may not begin with a digit or a sign followed by a digit. The characters that make up a string are the ASCII printing characters omitting double quote and backslash. Double quotes delimit a string. A comment begins with a semicolon, or is an S-expression list at top-level that starts with the comment symbol.


Copyright (c) 2009 The MITRE Corporation. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, this copyright notice and the title of the publication and its date appear, and notice in given that copying is by permission of The MITRE Corporation.