harpy-0.4.1: Runtime code generation for x86 machine code

Portability: portable
Stability: provisional
Maintainer: {magr,klee}@cs.tu-berlin.de

Harpy.X86Disassembler

Description

Disassembler for x86 machine code.

This is a disassembler for object code for the x86 architecture. It provides functions for disassembling byte arrays, byte lists and memory blocks containing raw binary code.

Features:

  • Disassembles memory blocks, lists or arrays of bytes into lists of instructions.
  • Abstract instructions provide as much information as possible about opcodes, addressing modes or operand sizes, allowing for detailed output.
  • Provides functions for displaying instructions in Intel or AT&T style (like the GNU tools).

Differences from GNU tools, like gdb or objdump:

  • Displacements are shown in decimal, with sign if negative.

Missing:

  • LOCK and repeat prefixes are recognized, but not contained in the opcodes of instructions.
  • Support for 16-bit addressing modes. Could be added when needed.
  • Complete disassembly of all 64-bit instructions. I have tried to disassemble them properly but have been limited to the information in the docs, because I have no 64-bit machine to test on. This will probably change when I get GNU as to produce 64-bit object files.
  • Not all MMX and SSE/SSE2/SSE3 instructions are decoded yet. This is just a matter of missing time.
  • Segment override prefixes are decoded, but not appended to memory references.

On the implementation:

This disassembler uses the Parsec parser combinators, working on byte lists. This proved to be very convenient, as the combinators keep track of the current position, etc.
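The idea of running parser combinators over a byte list can be sketched as follows. This is not Harpy's actual decoder: it is a minimal illustration that assumes the modern `Text.Parsec` interface, and the names `ByteParser`, `byteSatisfy`, `byte`, `ToyInstr`, `toyInstr` and `decodeToy` are hypothetical. It recognizes only three one-byte opcodes and uses the parser's column counter as the byte offset.

```haskell
import Data.Word (Word8)
import Text.Parsec
  ( Parsec, ParseError, parse, many, eof, tokenPrim
  , getPosition, sourceColumn, incSourceColumn, (<|>) )

-- A parser whose input stream is a plain list of bytes (hypothetical name).
type ByteParser = Parsec [Word8] ()

-- Accept one byte that satisfies the predicate. Every accepted byte advances
-- the column by one, so (column - 1) is the byte offset into the input.
byteSatisfy :: (Word8 -> Bool) -> ByteParser Word8
byteSatisfy p =
  tokenPrim show
            (\pos _ _ -> incSourceColumn pos 1)
            (\b -> if p b then Just b else Nothing)

-- Accept exactly the given byte.
byte :: Word8 -> ByteParser Word8
byte b = byteSatisfy (== b)

-- A toy instruction: offset of its first byte plus a mnemonic.
data ToyInstr = ToyInstr { toyOffset :: Int, toyName :: String }
  deriving (Eq, Show)

-- Decode one of three single-byte x86 opcodes.
toyInstr :: ByteParser ToyInstr
toyInstr = do
  pos  <- getPosition
  name <- ("nop"  <$ byte 0x90)
      <|> ("int3" <$ byte 0xCC)
      <|> ("ret"  <$ byte 0xC3)
  return (ToyInstr (sourceColumn pos - 1) name)

-- Decode a whole byte list; any unknown byte yields a ParseError.
decodeToy :: [Word8] -> Either ParseError [ToyInstr]
decodeToy = parse (many toyInstr <* eof) "<bytes>"
```

Because the combinators track the position themselves, the decoder never has to thread an offset through its rules by hand, which is the convenience described above.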

Types

data Opcode

All opcodes are represented by this enumeration type.

data Operand

All operands are in one of the following locations:

  • Constants in the instruction stream
  • Memory locations
  • Registers

Memory locations are referred to by one of several addressing modes:

  • Absolute (address in instruction stream)
  • Register-indirect (address in register)
  • Register-indirect with displacement
  • Base-Index with scale
  • Base-Index with scale and displacement

Displacements can be encoded as 8-bit or 32-bit immediates in the instruction stream, but are represented as Int in instructions for simplicity.

Constructors

OpImm Word32

Immediate value

OpAddr Word32 InstrOperandSize

Absolute address

OpReg String Int

Register

OpFPReg Int

Floating-point register

OpInd String InstrOperandSize

Register-indirect

OpIndDisp String Int InstrOperandSize

Register-indirect with displacement

OpBaseIndex String String Int InstrOperandSize

Base plus scaled index

OpIndexDisp String Int Int InstrOperandSize

Scaled index with displacement

OpBaseIndexDisp String String Int Int InstrOperandSize

Base plus scaled index with displacement
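As an illustration of how these constructors might be consumed, the following sketch classifies an operand as a memory reference or not. The helper name `isMemoryOperand` is hypothetical and not part of the library. The sample value at the end assumes that the `String` field holds a register name such as `"ebp"` and that the `Int` field of `OpIndDisp` is the displacement; the exact register name format is not specified here.

```haskell
import Harpy.X86Disassembler (Operand(..), InstrOperandSize(..))

-- | True for operands that refer to a memory location, False for
-- immediates and registers. (Hypothetical helper, not part of Harpy.)
isMemoryOperand :: Operand -> Bool
isMemoryOperand op = case op of
  OpImm _                   -> False  -- constant in the instruction stream
  OpReg _ _                 -> False  -- register
  OpFPReg _                 -> False  -- floating-point register
  OpAddr _ _                -> True   -- absolute address
  OpInd _ _                 -> True   -- register-indirect
  OpIndDisp _ _ _           -> True   -- register-indirect with displacement
  OpBaseIndex _ _ _ _       -> True   -- base plus scaled index
  OpIndexDisp _ _ _ _       -> True   -- scaled index with displacement
  OpBaseIndexDisp _ _ _ _ _ -> True   -- base plus scaled index with displacement

-- | Sample operand; the register name "ebp" and the meaning of -8 as a
-- displacement are assumptions made for illustration.
sampleOperand :: Operand
sampleOperand = OpIndDisp "ebp" (-8) OP32
```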

data InstrOperandSize

Some opcodes can operate on data of several widths. This information is encoded in instructions using the following enumeration type.

Constructors

OPNONE

No operand size specified

OP8

8-bit integer operand

OP16

16-bit integer operand

OP32

32-bit integer operand

OP64

64-bit integer operand

OP128

128-bit integer operand

OPF32

32-bit floating point operand

OPF64

64-bit floating point operand

OPF80

80-bit floating point operand
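A small sketch of how the enumeration could be mapped to a width in bits, based only on the constructor descriptions above. The function name `operandBits` is hypothetical and not provided by the library.

```haskell
import Harpy.X86Disassembler (InstrOperandSize(..))

-- | Width of an operand in bits, or Nothing when no size is specified.
-- (Hypothetical helper, not part of Harpy.)
operandBits :: InstrOperandSize -> Maybe Int
operandBits size = case size of
  OPNONE -> Nothing
  OP8    -> Just 8
  OP16   -> Just 16
  OP32   -> Just 32
  OP64   -> Just 64
  OP128  -> Just 128
  OPF32  -> Just 32   -- floating point
  OPF64  -> Just 64   -- floating point
  OPF80  -> Just 80   -- floating point
```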

data Instruction

The disassembly routines return lists of the following datatype. Each element is either an invalid byte sequence (with a useful error message, if possible) or a valid instruction. Both variants contain the list of opcode bytes from which the instruction was decoded and the address of the instruction.

Constructors

BadInstruction Word8 String Int [Word8]

Invalid instruction

PseudoInstruction Int String

Pseudo instruction, e.g. label

Instruction

Valid instruction

Fields

opcode :: Opcode

Opcode of the instruction

opsize :: InstrOperandSize

Operand size, if any

operands :: [Operand]

Instruction operands

address :: Int

Start address of instruction

bytes :: [Word8]

Instruction bytes
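Since the record fields belong to the `Instruction` constructor, code that walks a disassembly result may want accessors covering all three variants. The sketch below shows one way to do that. The helper names `instrAddress` and `instrBytes` are hypothetical, and it assumes that the `Int` argument of `PseudoInstruction` is an address and that a pseudo instruction occupies no bytes.

```haskell
import Data.Word (Word8)
import Harpy.X86Disassembler (Instruction(..))

-- | Start address for any variant. (Hypothetical helper.)
-- Assumption: the Int in PseudoInstruction is its address.
instrAddress :: Instruction -> Int
instrAddress (BadInstruction _ _ addr _) = addr
instrAddress (PseudoInstruction addr _)  = addr
instrAddress (Instruction { address = a }) = a

-- | Bytes the element was decoded from. (Hypothetical helper.)
-- Assumption: pseudo instructions such as labels occupy no bytes.
instrBytes :: Instruction -> [Word8]
instrBytes (BadInstruction _ _ _ bs)    = bs
instrBytes (PseudoInstruction _ _)      = []
instrBytes (Instruction { bytes = bs }) = bs
```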

data ShowStyle

Instructions can be displayed either in Intel or AT&T style (like in GNU tools).

Intel style:

  • Destination operand comes first, source second.
  • No register or immediate prefixes.
  • Memory operands are annotated with operand size.
  • Hexadecimal numbers are suffixed with H and prefixed with 0 if necessary.

AT&T style:

  • Source operand comes first, destination second.
  • Register names are prefixed with %.
  • Immediates are prefixed with $.
  • Hexadecimal numbers are prefixed with 0x.
  • Opcodes are suffixed with the operand size when they would otherwise be ambiguous.

Constructors

IntelStyle

Show in Intel style

AttStyle

Show in AT&T style
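One possible use of this type is to select a rendering function at run time. The sketch below dispatches to `showIntel` and `showAtt`, which are documented under Functions. The name `renderWith` is hypothetical and not part of the library.

```haskell
import Harpy.X86Disassembler (Instruction, ShowStyle(..), showIntel, showAtt)

-- | Render an instruction in the requested style.
-- (Hypothetical helper, not part of Harpy.)
renderWith :: ShowStyle -> Instruction -> String
renderWith IntelStyle = showIntel
renderWith AttStyle   = showAtt
```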

Functions

disassembleBlock :: Ptr Word8 -> Int -> IO (Either ParseError [Instruction])

Disassemble a block of memory. Starting at the location pointed to by the given pointer, the given number of bytes are disassembled.

disassembleList :: Monad m => [Word8] -> m (Either ParseError [Instruction])

Disassemble the contents of the given list.

disassembleArray :: (Monad m, IArray a Word8, Ix i) => a i Word8 -> m (Either ParseError [Instruction])

Disassemble the contents of the given array.

showIntel :: Instruction -> [Char]

Show an instruction in Intel style.

showAtt :: Instruction -> [Char]

Show an instruction in AT&T style.
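Putting the functions together, a minimal usage sketch might look like the following. It relies only on the signatures listed above and requires the harpy package to be installed. The names `sampleCode` and `listing` are hypothetical; the sample bytes are the 32-bit encodings of push ebp, mov ebp, esp, pop ebp and ret. It also assumes that `ParseError` has a `Show` instance, as it does in Parsec.

```haskell
import Data.Word (Word8)
import Harpy.X86Disassembler (disassembleList, showIntel)

-- | A short 32-bit function prologue and epilogue (hypothetical sample).
sampleCode :: [Word8]
sampleCode = [0x55, 0x89, 0xE5, 0x5D, 0xC3]

-- | Disassemble a byte list and render each instruction in Intel style.
-- A parse failure is reported as a single descriptive line.
listing :: [Word8] -> IO [String]
listing bs = do
  result <- disassembleList bs
  return $ case result of
    Left err     -> ["disassembly failed: " ++ show err]
    Right instrs -> map showIntel instrs
```

Calling `listing sampleCode >>= mapM_ putStrLn` from `main` or GHCi prints the result; replacing `showIntel` with `showAtt` gives AT&T style output instead.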