| Portability | portable |
|---|---|
| Stability | provisional |
| Maintainer | {magr,klee}@cs.tu-berlin.de |
Harpy.X86Disassembler
Description
Disassembler for x86 machine code.
This is a disassembler for object code for the x86 architecture. It provides functions for disassembling byte arrays, byte lists and memory blocks containing raw binary code.
Features:
- Disassembles memory blocks, lists or arrays of bytes into lists of instructions.
- Abstract instructions provide as much information as possible about opcodes, addressing modes or operand sizes, allowing for detailed output.
- Provides functions for displaying instructions in Intel or AT&T style (like the GNU tools)
Differences to GNU tools, like gdb or objdump:
- Displacements are shown in decimal, with sign if negative.
Missing:
- LOCK and repeat prefixes are recognized, but not contained in the opcodes of instructions.
- Support for 16-bit addressing modes. Could be added when needed.
- Complete disassembly of all 64-bit instructions. I have tried to disassemble them properly but have been limited to the information in the docs, because I have no 64-bit machine to test on. This will probably change when I get GNU as to produce 64-bit object files.
- Not all MMX and SSESSE2SSE3 instructions are decoded yet. This is just a matter of missing time.
- segment override prefixes are decoded, but not appended to memory references
On the implementation:
This disassembler uses the Parsec parser combinators, working on byte lists. This proved to be very convenient, as the combinators keep track of the current position, etc.
- data Opcode
- data Operand
- = OpImm Word32
- | OpAddr Word32 InstrOperandSize
- | OpReg String Int
- | OpFPReg Int
- | OpInd String InstrOperandSize
- | OpIndDisp String Int InstrOperandSize
- | OpBaseIndex String String Int InstrOperandSize
- | OpIndexDisp String Int Int InstrOperandSize
- | OpBaseIndexDisp String String Int Int InstrOperandSize
- data InstrOperandSize
- data Instruction
- = BadInstruction Word8 String Int [Word8]
- | PseudoInstruction Int String
- | Instruction { }
- data ShowStyle
- = IntelStyle
- | AttStyle
- disassembleBlock :: Ptr Word8 -> Int -> IO (Either ParseError [Instruction])
- disassembleList :: Monad m => [Word8] -> m (Either ParseError [Instruction])
- disassembleArray :: (Monad m, IArray a Word8, Ix i) => a i Word8 -> m (Either ParseError [Instruction])
- showIntel :: Instruction -> [Char]
- showAtt :: Instruction -> [Char]
Types
All opcodes are represented by this enumeration type.
All operands are in one of the following locations:
- Constants in the instruction stream
- Memory locations
- Registers
Memory locations are referred to by on of several addressing modes:
- Absolute (address in instruction stream)
- Register-indirect (address in register)
- Register-indirect with displacement
- Base-Index with scale
- Base-Index with scale and displacement
Displacements can be encoded as 8 or 32-bit immediates in the instruction stream, but are encoded as Int in instructions for simplicity.
Constructors
| OpImm Word32 | Immediate value |
| OpAddr Word32 InstrOperandSize | Absolute address |
| OpReg String Int | Register |
| OpFPReg Int | Floating-point register |
| OpInd String InstrOperandSize | Register-indirect |
| OpIndDisp String Int InstrOperandSize | Register-indirect with displacement |
| OpBaseIndex String String Int InstrOperandSize | Base plus scaled index |
| OpIndexDisp String Int Int InstrOperandSize | Scaled index with displacement |
| OpBaseIndexDisp String String Int Int InstrOperandSize | Base plus scaled index with displacement |
data InstrOperandSize Source
Some opcodes can operate on data of several widths. This information is encoded in instructions using the following enumeration type..
Constructors
| OPNONE | No operand size specified |
| OP8 | 8-bit integer operand |
| OP16 | 16-bit integer operand |
| OP32 | 32-bit integer operand |
| OP64 | 64-bit integer operand |
| OP128 | 128-bit integer operand |
| OPF32 | 32-bit floating point operand |
| OPF64 | 64-bit floating point operand |
| OPF80 | 80-bit floating point operand |
Instances
data Instruction Source
The disassembly routines return lists of the following datatype. It encodes both invalid byte sequences (with a useful error message, if possible), or a valid instruction. Both variants contain the list of opcode bytes from which the instruction was decoded and the address of the instruction.
Constructors
| BadInstruction Word8 String Int [Word8] | Invalid instruction |
| PseudoInstruction Int String | Pseudo instruction, e.g. label |
| Instruction | Valid instruction |
Instances
Instructions can be displayed either in Intel or AT&T style (like in GNU tools).
Intel style:
- Destination operand comes first, source second.
- No register or immediate prefixes.
- Memory operands are annotated with operand size.
- Hexadecimal numbers are suffixed with
Hand prefixed with0if necessary.
AT&T style:
- Source operand comes first, destination second.
- Register names are prefixes with
%. - Immediates are prefixed with
$. - Hexadecimal numbers are prefixes with
0x - Opcodes are suffixed with operand size, when ambiguous otherwise.
Constructors
| IntelStyle | Show in Intel style |
| AttStyle | Show in AT&T style |
Functions
disassembleBlock :: Ptr Word8 -> Int -> IO (Either ParseError [Instruction])Source
Disassemble a block of memory. Starting at the location pointed to by the given pointer, the given number of bytes are disassembled.
disassembleList :: Monad m => [Word8] -> m (Either ParseError [Instruction])Source
Disassemble the contents of the given list.
disassembleArray :: (Monad m, IArray a Word8, Ix i) => a i Word8 -> m (Either ParseError [Instruction])Source
Disassemble the contents of the given array.
showIntel :: Instruction -> [Char]Source
Show an instruction in Intel style.
showAtt :: Instruction -> [Char]Source
Show an instruction in AT&T style.