Disassembler for x86 machine code.
This is a disassembler for object code for the x86 architecture. It provides functions for disassembling byte arrays, byte lists and memory blocks containing raw binary code.
- Disassembles memory blocks, lists or arrays of bytes into lists of instructions.
- Abstract instructions provide as much information as possible about opcodes, addressing modes or operand sizes, allowing for detailed output.
- Provides functions for displaying instructions in Intel or AT&T style (like the GNU tools).
Differences from GNU tools, like gdb or objdump:
- Displacements are shown in decimal, with sign if negative.
- LOCK and repeat prefixes are recognized, but not contained in the opcodes of instructions.
Missing:
- Support for 16-bit addressing modes. This could be added when needed.
- Complete disassembly of all 64-bit instructions. I have tried to disassemble them properly but have been limited to the information in the docs, because I have no 64-bit machine to test on. This will probably change when I get GNU as to produce 64-bit object files.
- Not all MMX and SSE/SSE2/SSE3 instructions are decoded yet. This is just a matter of lack of time.
- Segment override prefixes are decoded, but not appended to memory references.
On the implementation:
This disassembler uses the Parsec parser combinators, working on byte lists. This proved to be very convenient, as the combinators keep track of the current position, etc.
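As an illustration of running Parsec over byte lists, here is a minimal sketch. It is not the parser used by this module: the names ByteParser, anyByte, byte, word32le and movEaxImm are hypothetical and chosen only for this example. It advances the column by one per byte, so the parser position doubles as a byte offset.

```haskell
import Data.Word (Word8, Word32)
import Text.Parsec
import Text.Parsec.Pos (incSourceColumn)

-- Hypothetical parser type: Parsec over a list of bytes, no user state.
type ByteParser = Parsec [Word8] ()

-- Accept any single byte; the column is advanced by one per byte.
anyByte :: ByteParser Word8
anyByte = tokenPrim show nextPos Just
  where nextPos pos _ _ = incSourceColumn pos 1

-- Accept exactly the given byte, fail on anything else.
byte :: Word8 -> ByteParser Word8
byte b = tokenPrim show nextPos (\x -> if x == b then Just x else Nothing)
  where nextPos pos _ _ = incSourceColumn pos 1

-- Read a little-endian 32-bit word (least significant byte first).
word32le :: ByteParser Word32
word32le = do
  bs <- count 4 anyByte
  return (foldr (\b acc -> acc * 256 + fromIntegral b) 0 bs)

-- Opcode 0xB8 followed by a 32-bit immediate is "mov eax, imm32";
-- this parser returns the immediate.
movEaxImm :: ByteParser Word32
movEaxImm = byte 0xB8 >> word32le
```

For example, `parse movEaxImm "" [0xB8, 0x78, 0x56, 0x34, 0x12]` succeeds with the immediate 0x12345678, and a failing parse reports the byte position at which it gave up.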
- data Opcode
- data Operand
- = OpImm Word32
- | OpAddr Word32 InstrOperandSize
- | OpReg String Int
- | OpFPReg Int
- | OpInd String InstrOperandSize
- | OpIndDisp String Int InstrOperandSize
- | OpBaseIndex String String Int InstrOperandSize
- | OpIndexDisp String Int Int InstrOperandSize
- | OpBaseIndexDisp String String Int Int InstrOperandSize
- data InstrOperandSize
- data Instruction
- data ShowStyle
- disassembleBlock :: Ptr Word8 -> Int -> IO (Either ParseError [Instruction])
- disassembleList :: Monad m => [Word8] -> m (Either ParseError [Instruction])
- disassembleArray :: (Monad m, IArray a Word8, Ix i) => a i Word8 -> m (Either ParseError [Instruction])
- showIntel :: Instruction -> [Char]
- showAtt :: Instruction -> [Char]
All operands are in one of the following locations:
- Constants in the instruction stream
- Memory locations
- Registers
Memory locations are referred to by one of several addressing modes:
- Absolute (address in instruction stream)
- Register-indirect (address in register)
- Register-indirect with displacement
- Base-Index with scale
- Base-Index with scale and displacement
Displacements can be encoded as 8- or 32-bit immediates in the instruction stream, but are represented as Int in instructions for simplicity.
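A sketch of how a raw displacement can be widened to Int with its sign preserved. The helper names disp8 and disp32 are hypothetical and not part of this module; the conversion goes through the signed type of the same width, which reinterprets the bit pattern as two's complement.

```haskell
import Data.Int (Int8, Int32)
import Data.Word (Word8, Word32)

-- Interpret a raw 8-bit displacement as a signed value, widened to Int.
disp8 :: Word8 -> Int
disp8 w = fromIntegral (fromIntegral w :: Int8)

-- Interpret a raw 32-bit displacement as a signed value, widened to Int.
disp32 :: Word32 -> Int
disp32 w = fromIntegral (fromIntegral w :: Int32)
```

With these definitions, `disp8 0xFC` and `disp32 0xFFFFFFFC` both evaluate to -4, while `disp8 0x7F` evaluates to 127.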
|OpAddr Word32 InstrOperandSize|
Absolute address
|OpReg String Int|
Register
|OpInd String InstrOperandSize|
Register-indirect
|OpIndDisp String Int InstrOperandSize|
Register-indirect with displacement
|OpBaseIndex String String Int InstrOperandSize|
Base plus scaled index
|OpIndexDisp String Int Int InstrOperandSize|
Scaled index with displacement
|OpBaseIndexDisp String String Int Int InstrOperandSize|
Base plus scaled index with displacement
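To make the addressing modes concrete, the following sketch mirrors the memory-operand constructors in a local type and renders each as a bracketed expression with a signed decimal displacement. Several things here are assumptions: the Size type and its constructor names are stand-ins for InstrOperandSize, the field order (base, index, scale, displacement) is inferred from the constructor names, and the output format is illustrative rather than what showIntel or showAtt produce.

```haskell
-- Stand-in for InstrOperandSize; constructor names are hypothetical.
data Size = SizeNone | Size8 | Size16 | Size32
  deriving (Eq, Show)

-- Local copy of the memory-operand constructors, for illustration only.
-- Assumed field order: base, index, scale, displacement.
data MemOperand
  = OpInd String Size
  | OpIndDisp String Int Size
  | OpBaseIndex String String Int Size
  | OpIndexDisp String Int Int Size
  | OpBaseIndexDisp String String Int Int Size

-- Show a displacement in decimal, with an explicit sign.
showDisp :: Int -> String
showDisp d
  | d < 0     = " - " ++ show (negate d)
  | otherwise = " + " ++ show d

-- Show an index register multiplied by its scale factor.
scaled :: String -> Int -> String
scaled index scale = index ++ "*" ++ show scale

-- Render a memory operand as a bracketed address expression.
render :: MemOperand -> String
render (OpInd r _)               = "[" ++ r ++ "]"
render (OpIndDisp r d _)         = "[" ++ r ++ showDisp d ++ "]"
render (OpBaseIndex b i s _)     = "[" ++ b ++ " + " ++ scaled i s ++ "]"
render (OpIndexDisp i s d _)     = "[" ++ scaled i s ++ showDisp d ++ "]"
render (OpBaseIndexDisp b i s d _) =
  "[" ++ b ++ " + " ++ scaled i s ++ showDisp d ++ "]"
```

For instance, `render (OpBaseIndexDisp "ebx" "esi" 4 (-8) Size32)` yields `[ebx + esi*4 - 8]`.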
Some opcodes can operate on data of several widths. This information is encoded in instructions using the following enumeration type.
No operand size specified
8-bit integer operand
16-bit integer operand
32-bit integer operand
64-bit integer operand
128-bit integer operand
32-bit floating point operand
64-bit floating point operand
80-bit floating point operand
The disassembly routines return lists of the following datatype. It encodes either an invalid byte sequence (with a useful error message, if possible) or a valid instruction. Both variants contain the list of opcode bytes from which the instruction was decoded and the address of the instruction.
|BadInstruction Word8 String Int [Word8]|
|PseudoInstruction Int String|
Pseudo instruction, e.g. label
Instructions can be displayed either in Intel or AT&T style (like in GNU tools).
Intel style:
- Destination operand comes first, source second.
- No register or immediate prefixes.
- Memory operands are annotated with operand size.
- Hexadecimal numbers are suffixed with H and prefixed with 0.
AT&T style:
- Source operand comes first, destination second.
- Register names are prefixed with %.
- Immediates are prefixed with $.
- Hexadecimal numbers are prefixed with 0x.
- Opcodes are suffixed with operand size when they would otherwise be ambiguous.
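The two hexadecimal conventions can be sketched as follows. This assumes the usual GNU-style 0x prefix for AT&T output and the assembler convention of an H suffix with a leading 0 when the number would otherwise start with a letter for Intel output. The function names attHex and intelHex are hypothetical, and the exact digit case and padding used by this module may differ.

```haskell
import Data.Char (isDigit, toUpper)
import Data.Word (Word32)
import Numeric (showHex)

-- AT&T-style hexadecimal: "0x" prefix.
attHex :: Word32 -> String
attHex w = "0x" ++ showHex w ""

-- Intel-style hexadecimal: "H" suffix, plus a leading "0" when the
-- first digit is a letter, so the number cannot be read as a name.
intelHex :: Word32 -> String
intelHex w = pad (map toUpper (showHex w "")) ++ "H"
  where
    pad ds@(d:_) | not (isDigit d) = '0' : ds
    pad ds = ds
```

Under these assumptions, 255 is written `0xff` in AT&T style and `0FFH` in Intel style, while 16 is written `0x10` and `10H`.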
Disassemble a block of memory. Starting at the location pointed to by the given pointer, the given number of bytes are disassembled.
Disassemble the contents of the given list.
Disassemble the contents of the given array.
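A minimal usage sketch, based only on the signatures listed above. The module name in the import is an assumption and may need adjusting to wherever this module lives in your installation. The byte list is a small hand-assembled function prologue and epilogue.

```haskell
import Data.Word (Word8)
import Harpy.X86Disassembler  -- assumed module name

-- push ebp; mov ebp, esp; pop ebp; ret
code :: [Word8]
code = [0x55, 0x89, 0xE5, 0x5D, 0xC3]

main :: IO ()
main = do
  -- disassembleList runs in any monad; IO is used here.
  result <- disassembleList code
  case result of
    Left err -> putStrLn ("disassembly failed: " ++ show err)
    Right instrs -> do
      putStrLn "Intel style:"
      mapM_ (putStrLn . showIntel) instrs
      putStrLn "AT&T style:"
      mapM_ (putStrLn . showAtt) instrs
```

disassembleBlock and disassembleArray return the same Either result, so the same case analysis applies to them.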