risc386: Reduced instruction set i386 simulator

[ bsd3, compilers-interpreters, education, program ] [ Propose Tags ]

risc386 is a symbolic Intel(R) 386 assembler interpreter which allows infinitely many registers (temporaries). Its purpose is to debug the output of a MiniJava compiler (from Andrew Appel's book, Modern Compiler Implementation in JAVA) before register allocation has been performed.

risc386 supports only a small fragment of i386 instructions. It expects its input to be a list of procedures in .intel_syntax each of which is started by a label and terminated by a return statement.

Control flow is restricted, so, only jumps to procedure-local labels are allowed. Reading from an uninitialized memory location will lead to an exception.

[Skip to Readme]
Versions [faq] 0.0.20130624, 0.0.20130701, 0.0.20130718, 0.0.20130719, 0.0.20210125
Change log CHANGELOG.md
Dependencies array (>=0.3 && <0.6), base (>=4.6 && <5), containers (>=0.3 && <0.7), mtl (==2.0.* || >=2.1.1 && <2.3), pretty (>=1.0 && <1.2) [details]
License BSD-3-Clause
Author Andreas Abel with contributions by Robert Grabowski and Ulrich Schoepp
Maintainer Andreas Abel <andreas.abel@ifi.lmu.de>
Category Compilers/Interpreters, Education
Home page http://www2.tcs.ifi.lmu.de/~abel/
Source repo head: git clone git://github.com/andreasabel/risc386.git
Uploaded by AndreasAbel at 2021-01-25T02:24:03Z
Distributions NixOS:0.0.20130719
Executables risc386
Downloads 2769 total (69 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Hackage Matrix CI
Docs not available [build log]
Last success reported on 2021-01-25 [all 2 reports]


Maintainer's Corner

For package maintainers and hackage trustees

Readme for risc386-0.0.20210125

[back to package description]
risc386 -- Restricted Instruction Set i386 simulator
(C) 2013, Andreas Abel, Ludwig-Maximilians-University Munich

The main purpose of this simulator is to test i386 code generated by a
compiler before register allocation.  Therefore, it supports
temporaries, an potentially infinite amount of extra registers
t<number>.  (Of course, it can also be used to execute symbolic
assembler after register allocation.)

The supported instruction set is very restricted but sufficient to
write a compiler for MiniJava [Andrew Appel, Modern Compiler
Implementation in Java].

I. System requirements:

  You need a recent version of the Haskell Platform.

II. Installation:

  1. Change to a temporary directory.
  2. Unpack the tar ball

     tar xzf risc386-x.y.z.tar.gz

  3. Change to the unpacked directory

     cd risc386-x.y.z

  4. Install using Haskell's packet manager cabal

     cabal install

III. Running the simulator:

  risc386 input-file.s

IV. Format of the input file:

  The input file must be symbolic assembler in Intel format.

  Here is a small example:

        .global Lmain
        .type Lmain, @function
        enter   0, 0
L0:     push    8
        call    L_halloc
        add     %esp, 4
        mov     t1001, %eax
        push    t1001
        call    LC$value
        add     %esp, 4
        mov     t1002, %eax
        push    t1002
        call    L_println_int
        add     %esp, 4
L1:     leave

        .global LC$value
        .type LC$value, @function
        #args LOC 0
        enter   0, 0
L2:     mov     t1004, DWORD PTR [%ebp+8]
        mov     DWORD PTR [t1004+4], 555
        mov     t1003, DWORD PTR [%ebp+8]
        mov     %eax, DWORD PTR [t1003+4]
L3:     leave

  Lexing rules:
  (If you want to be sure, read the .x file, the lexer specification.)

    * White space is ignored (except as separator for alphanumeric tokens).

    * Lines beginning with a dot '.' are skipped.
      These lines are pragmas for the symbolic assembler,
      which risc386 ignores.

    * Lines beginning with a hash-symbol followed by a space '# '
      are comments, which are ignored as well.

    * Lines beginning with a hash followed by a non-space character
      are risc386 pragmas and not ignored.

      Currently, risc386 only recognizes the pragma '#args'.

    * Valid tokens are:

        #args LOC REG

        [ ] : , . + - *
        dword ptr                    DWORD PTR

        mov lea                      MOV LEA
        add sub imul                 ADD SUB IMUL
        idiv inc dec neg             IDIV INC DEC NEG
        shl shr sal sar              SHL SHR SAL SAR
        and or xor                   AND OR XOR
        not                          NOT
        cmp                          CMP
        je jne jl jle jge            JE JNE JL JLE JGE
        jmp call ret                 JMP CALL RET
        push pop enter leave         PUSH POP ENTER LEAVE
        nop                          NOP

        eax ebx ecx edx esi edi ebp esp
        %eax %ebx %ecx %edx %esi %edi %ebp %esp

        <number>    (given by reg.ex [0-9]+)
        t<number>   (denoting a temporary register)

        <ident>     (given by reg.ex. [a-zA-Z][a-zA-Z0-9_'$]*)

      Identifiers are used for labels.

  Parsing rules:
  (If you want to know all of them, read the .y file)

    1. The input file must be a sequence of procedures.

       There must be one procedure whose name ends in 'main'.
       This one is taken as the entry point.

    2. Each procedure starts with a label and ends with a return
       instruction. Optionally, it can be preceded by a declaration
       of its arguments

                  #args REG %eax, LOC 0, LOC 4

       Lmyproc expects its first argument in register %eax,
       its second at [%esp+0] and its third at [%esp+4].
       The stack addresses are to be taken *before* the CALL
       is executed (which will put the return address on the stack
       and shift the relative location of the arguments by +4).

    3. The body of each procedure is a list of i386 assembler
       instructions in Intel syntax.  The supported instructions
       are listed above.

       Each instruction my be preceded by a label.

       Conditional and unconditional jumps are only allowed to
       a label, and only to one defined in the same procedure.
       Cross-procedure jumps or jumps to a calculated address
       are not supported.

       CALLs are only defined to a procedure label.
       risc386 assumes the cdecl calling convention.

    4. Restrictions for individual instructions:

       RET    does accept arguments
       ENTER  is only supported in the form ENTER <number>, 0


    risc386 knows a number of predefined procedures.  They expect
    their arguments on the stack (cdecl calling convention) and
    return the result in %eax.

      1 Argument: number of bytes to allocate on the heap
      Result    : pointer to first allocated byte.

      1 Argument: signed 32bit integer value to print
      Result    : nothing

      1 Argument: unicode char (32bit) to print
      Result    : nothing

      1 Argument: error code
      Result    : nothing, does not return, stops execution

  Execution specialties:

    risc386 supports 4 different types, all of size 32 bits:

      1. Signed integers.

      2. Heap addresses.

         Heap addresses consist of a base address which was obtained
         by L_halloc plus an offset.  The offset must be a multiple of 4.

      3. Stack addresses.

         %esp and %ebp may only be loaded with stack addresses.

      4. Return addresses.

         Get pushed onto the stack by a CALL.

         RET checks that a return address lies on top of the stack
         before returning.  The content of the return address is
         ignored, RET jumps back to the procedure where the matching
         CALL was issued.

    CMP is the only command that sets flags.

    CALL saves all temporary registers, RET restores them.