Portability	portable
Stability	provisional
Maintainer	{magr,klee}@cs.tu-berlin.de

Harpy.X86Disassembler

Contents

Types
Functions

Description

Disassembler for x86 machine code.

This is a disassembler for object code for the x86 architecture. It provides functions for disassembling byte arrays, byte lists and memory blocks containing raw binary code.

Features:

Disassembles memory blocks, lists or arrays of bytes into lists of instructions.
Abstract instructions provide as much information as possible about opcodes, addressing modes or operand sizes, allowing for detailed output.
Provides functions for displaying instructions in Intel or AT&T style (like the GNU tools)

Differences to GNU tools, like gdb or objdump:

Displacements are shown in decimal, with sign if negative.

Missing:

LOCK and repeat prefixes are recognized, but not contained in the opcodes of instructions.
Support for 16-bit addressing modes. Could be added when needed.
Complete disassembly of all 64-bit instructions. I have tried to disassemble them properly but have been limited to the information in the docs, because I have no 64-bit machine to test on. This will probably change when I get GNU as to produce 64-bit object files.
Not all MMX and SSESSE2SSE3 instructions are decoded yet. This is just a matter of missing time.
segment override prefixes are decoded, but not appended to memory references

On the implementation:

This disassembler uses the Parsec parser combinators, working on byte lists. This proved to be very convenient, as the combinators keep track of the current position, etc.

Synopsis

Types

data Opcode Source

All opcodes are represented by this enumeration type.

Instances

Eq Opcode
Show Opcode

data Operand Source

All operands are in one of the following locations:

Constants in the instruction stream
Memory locations
Registers

Memory locations are referred to by on of several addressing modes:

Absolute (address in instruction stream)
Register-indirect (address in register)
Register-indirect with displacement
Base-Index with scale
Base-Index with scale and displacement

Displacements can be encoded as 8 or 32-bit immediates in the instruction stream, but are encoded as Int in instructions for simplicity.

Constructors

OpImm Word32	Immediate value
OpAddr Word32 InstrOperandSize	Absolute address
OpReg String Int	Register
OpFPReg Int	Floating-point register
OpInd String InstrOperandSize	Register-indirect
OpIndDisp String Int InstrOperandSize	Register-indirect with displacement
OpBaseIndex String String Int InstrOperandSize	Base plus scaled index
OpIndexDisp String Int Int InstrOperandSize	Scaled index with displacement
OpBaseIndexDisp String String Int Int InstrOperandSize	Base plus scaled index with displacement

Instances

Eq Operand

data InstrOperandSize Source

Some opcodes can operate on data of several widths. This information is encoded in instructions using the following enumeration type..

Constructors

OPNONE	No operand size specified
OP8	8-bit integer operand
OP16	16-bit integer operand
OP32	32-bit integer operand
OP64	64-bit integer operand
OP128	128-bit integer operand
OPF32	32-bit floating point operand
OPF64	64-bit floating point operand
OPF80	80-bit floating point operand

Instances

Eq InstrOperandSize
Show InstrOperandSize

data Instruction Source

The disassembly routines return lists of the following datatype. It encodes both invalid byte sequences (with a useful error message, if possible), or a valid instruction. Both variants contain the list of opcode bytes from which the instruction was decoded and the address of the instruction.

Constructors

BadInstruction Word8 String Int [Word8]	Invalid instruction
PseudoInstruction Int String	Pseudo instruction, e.g. label
Instruction	Valid instruction
Fields opcode :: Opcode Opcode of the instruction opsize :: InstrOperandSize Operand size, if any operands :: [Operand] Instruction operands address :: Int Start address of instruction bytes :: [Word8] Instruction bytes

Instances

Eq Instruction
Show Instruction

data ShowStyle Source

Instructions can be displayed either in Intel or AT&T style (like in GNU tools).

Intel style:

Destination operand comes first, source second.
No register or immediate prefixes.
Memory operands are annotated with operand size.
Hexadecimal numbers are suffixed with H and prefixed with 0 if necessary.

AT&T style:

Source operand comes first, destination second.
Register names are prefixes with %.
Immediates are prefixed with $.
Hexadecimal numbers are prefixes with 0x
Opcodes are suffixed with operand size, when ambiguous otherwise.

Constructors

IntelStyle	Show in Intel style
AttStyle	Show in AT&T style

Functions

disassembleBlock :: Ptr Word8 -> Int -> IO (Either ParseError [Instruction])Source

Disassemble a block of memory. Starting at the location pointed to by the given pointer, the given number of bytes are disassembled.

disassembleList :: Monad m => [Word8] -> m (Either ParseError [Instruction])Source

Disassemble the contents of the given list.

disassembleArray :: (Monad m, IArray a Word8, Ix i) => a i Word8 -> m (Either ParseError [Instruction])Source

Disassemble the contents of the given array.

showIntel :: Instruction -> [Char]Source

Show an instruction in Intel style.

showAtt :: Instruction -> [Char]Source

Show an instruction in AT&T style.