| Copyright | (C) 2013-2016 University of Twente 2016-2017 Myrtle Software Ltd 2017 Google Inc. | 
|---|---|
| License | BSD2 (see the file LICENSE) | 
| Maintainer | Christiaan Baaij <christiaan.baaij@gmail.com> | 
| Safe Haskell | Trustworthy | 
| Language | Haskell2010 | 
| Extensions | 
 | 
Clash.Explicit.BlockRam
Description
BlockRAM primitives
Using RAMs
We will show a rather elaborate example on how you can, and why you might want
to use blockRams. We will build a "small" CPU+Memory+Program ROM where we
will slowly evolve to using blockRams. Note that the code is not meant as a
de-facto standard on how to do CPU design in Clash.
We start with the definition of the Instructions, Register names and machine codes:
{-# LANGUAGE RecordWildCards, TupleSections, DeriveAnyClass #-}
module CPU where
import Clash.Explicit.Prelude
type InstrAddr = Unsigned 8
type MemAddr   = Unsigned 5
type Value     = Signed 8
data Instruction
  = Compute Operator Reg Reg Reg
  | Branch Reg Value
  | Jump Value
  | Load MemAddr Reg
  | Store Reg MemAddr
  | Nop
  deriving (Eq,Show)
data Reg
  = Zero
  | PC
  | RegA
  | RegB
  | RegC
  | RegD
  | RegE
  deriving (Eq,Show,Enum)
data Operator = Add | Sub | Incr | Imm | CmpGt
  deriving (Eq,Show)
data MachCode
  = MachCode
  { inputX  :: Reg
  , inputY  :: Reg
  , result  :: Reg
  , aluCode :: Operator
  , ldReg   :: Reg
  , rdAddr  :: MemAddr
  , wrAddrM :: Maybe MemAddr
  , jmpM    :: Maybe Value
  }
nullCode = MachCode { inputX = Zero, inputY = Zero, result = Zero, aluCode = Imm
                    , ldReg = Zero, rdAddr = 0, wrAddrM = Nothing
                    , jmpM = Nothing
                    }
Next we define the CPU and its ALU:
cpu
  :: Vec 7 Value
  -- ^ Register bank
  -> (Value,Instruction)
  -- ^ (Memory output, Current instruction)
  -> ( Vec 7 Value
     , (MemAddr, Maybe (MemAddr,Value), InstrAddr)
     )
cpu regbank (memOut,instr) = (regbank',(rdAddr,(,aluOut) <$> wrAddrM,fromIntegral ipntr))
  where
    -- Current instruction pointer
    ipntr = regbank !! PC
    -- Decoder
    (MachCode {..}) = case instr of
      Compute op rx ry res -> nullCode {inputX=rx,inputY=ry,result=res,aluCode=op}
      Branch cr a          -> nullCode {inputX=cr,jmpM=Just a}
      Jump a               -> nullCode {aluCode=Incr,jmpM=Just a}
      Load a r             -> nullCode {ldReg=r,rdAddr=a}
      Store r a            -> nullCode {inputX=r,wrAddrM=Just a}
      Nop                  -> nullCode
    -- ALU
    regX   = regbank !! inputX
    regY   = regbank !! inputY
    aluOut = alu aluCode regX regY
    -- next instruction
    nextPC = case jmpM of
               Just a | aluOut /= 0 -> ipntr + a
               _                    -> ipntr + 1
    -- update registers
    regbank' = replace Zero   0
             $ replace PC     nextPC
             $ replace result aluOut
             $ replace ldReg  memOut
             $ regbank
alu Add   x y = x + y
alu Sub   x y = x - y
alu Incr  x _ = x + 1
alu Imm   x _ = x
alu CmpGt x y = if x > y then 1 else 0
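One detail of the decoder is easy to miss: `Jump` is encoded as an `Incr` on the `Zero` register, so the ALU output is always 1 and the `aluOut /= 0` test in `nextPC` always succeeds, whereas `Branch` passes the condition register through `Imm`. The following is a minimal plain-Haskell sketch of that behaviour. It assumes `Int8` as a stand-in for `Signed 8` so that it runs without Clash, and the helper `taken` is hypothetical, not part of the module.

```haskell
import Data.Int (Int8)

-- Assumption: Int8 stands in for the document's 'Value' (Signed 8).
type Value = Int8

data Operator = Add | Sub | Incr | Imm | CmpGt
  deriving (Eq, Show)

-- Same equations as the 'alu' above.
alu :: Operator -> Value -> Value -> Value
alu Add   x y = x + y
alu Sub   x y = x - y
alu Incr  x _ = x + 1
alu Imm   x _ = x
alu CmpGt x y = if x > y then 1 else 0

-- Hypothetical helper: the condition 'nextPC' checks before jumping.
-- The second ALU operand is irrelevant for Incr and Imm, so 0 is used.
taken :: Operator -> Value -> Bool
taken op x = alu op x 0 /= 0

main :: IO ()
main = do
  print (taken Incr 0) -- Jump: Incr applied to the Zero register
  print (taken Imm 0)  -- Branch on a register holding 0
  print (taken Imm 1)  -- Branch on a register holding 1
```

Running `main` prints `True`, `False`, `True`: the jump is unconditional, and the branch is taken only when the condition register is non-zero.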
We initially create a memory out of simple registers:
dataMem
  :: KnownDomain dom
  => Clock dom
  -> Reset dom
  -> Enable dom
  -> Signal dom MemAddr
  -- ^ Read address
  -> Signal dom (Maybe (MemAddr,Value))
  -- ^ (write address, data in)
  -> Signal dom Value
  -- ^ data out
dataMem clk rst en rd wrM = mealy clk rst en dataMemT (replicate d32 0) (bundle (rd,wrM))
  where
    dataMemT mem (rd,wrM) = (mem',dout)
      where
        dout = mem !! rd
        mem' = case wrM of
                 Just (wr,din) -> replace wr din mem
                 _             -> mem
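To make the timing of this register-based memory concrete, here is a small plain-Haskell model of the transition function. It is only a sketch: lists stand in for `Vec`, `Int` for the address and value types, and `replaceAt` and `simulate` are hypothetical helpers that play the roles of Clash's `replace` and `mealy`.

```haskell
import Data.List (mapAccumL)

-- Assumption: a list of Int stands in for 'Vec 32 Value'.
type Mem = [Int]

-- Hypothetical helper playing the role of Clash's 'replace'.
replaceAt :: Int -> a -> [a] -> [a]
replaceAt i x xs = take i xs ++ [x] ++ drop (i + 1) xs

-- One clock cycle: the output is read from the current memory,
-- while a write only becomes visible in the next state.
dataMemT :: Mem -> (Int, Maybe (Int, Int)) -> (Mem, Int)
dataMemT mem (rd, wrM) = (mem', dout)
  where
    dout = mem !! rd
    mem' = case wrM of
      Just (wr, din) -> replaceAt wr din mem
      Nothing        -> mem

-- Hypothetical helper: run the transition function over per-cycle inputs.
simulate :: Mem -> [(Int, Maybe (Int, Int))] -> [Int]
simulate mem0 = snd . mapAccumL dataMemT mem0

main :: IO ()
main = print (simulate (replicate 4 0)
                [ (1, Just (1, 7)) -- read and write address 1 in the same cycle
                , (1, Nothing)     -- read address 1 again
                , (0, Nothing) ])
```

This prints `[0,7,0]`: a read in the same cycle as a write to the same address still returns the old value, and the written value is visible one cycle later.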
And then connect everything:
system
  :: ( KnownDomain dom
     , KnownNat n )
  => Vec n Instruction
  -> Clock dom
  -> Reset dom
  -> Enable dom
  -> Signal dom Value
system instrs clk rst en = memOut
  where
    memOut = dataMem clk rst en rdAddr dout
    (rdAddr,dout,ipntr) = mealyB clk rst en cpu (replicate d7 0) (memOut,instr)
    instr  = asyncRom instrs <$> ipntr
Create a simple program that calculates the GCD of 4 and 6:
-- Compute GCD of 4 and 6
prog = -- 0 := 4
       Compute Incr Zero RegA RegA :>
       replicate d3 (Compute Incr RegA Zero RegA) ++
       Store RegA 0 :>
       -- 1 := 6
       Compute Incr Zero RegA RegA :>
       replicate d5 (Compute Incr RegA Zero RegA) ++
       Store RegA 1 :>
       -- A := 4
       Load 0 RegA :>
       -- B := 6
       Load 1 RegB :>
       -- start
       Compute CmpGt RegA RegB RegC :>
       Branch RegC 4 :>
       Compute CmpGt RegB RegA RegC :>
       Branch RegC 4 :>
       Jump 5 :>
       -- (a > b)
       Compute Sub RegA RegB RegA :>
       Jump (-6) :>
       -- (b > a)
       Compute Sub RegB RegA RegB :>
       Jump (-8) :>
       -- end
       Store RegA 2 :>
       Load 2 RegC :>
       Nil
And test our system:
>>> sampleN 32 $ system prog systemClockGen resetGen enableGen
[0,0,0,0,0,0,4,4,4,4,4,4,4,4,6,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,2]
to see that our system indeed calculates that the GCD of 6 and 4 is 2.
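The loop between the `start` and `end` labels of the program is GCD by repeated subtraction. As a cross-check on the expected result, the same algorithm can be written as an ordinary Haskell function. This is a sketch for positive inputs only, and the name `gcdSub` is hypothetical.

```haskell
-- GCD by repeated subtraction, mirroring the loop in 'prog':
-- subtract the smaller value from the larger until both are equal.
-- Assumption: both arguments are positive.
gcdSub :: Int -> Int -> Int
gcdSub a b
  | a > b     = gcdSub (a - b) b -- the "(a > b)" branch of the program
  | b > a     = gcdSub a (b - a) -- the "(b > a)" branch of the program
  | otherwise = a                -- equal values: fall through to "end"

main :: IO ()
main = print (gcdSub 4 6)
```

Running `main` prints `2`, matching the final sample of the simulation.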
Improvement 1: using asyncRam
As you can see, it's fairly straightforward to build a memory using registers
and read (!!) and write (replace) logic. This might however not result in
the most efficient hardware structure, especially when building an ASIC.
Instead it is preferable to use the asyncRam function which
has the potential to be translated to a more efficient structure:
system2
  :: ( KnownDomain dom
     , KnownNat n )
  => Vec n Instruction
  -> Clock dom
  -> Reset dom
  -> Enable dom
  -> Signal dom Value
system2 instrs clk rst en = memOut
  where
    memOut = asyncRam clk clk en d32 rdAddr dout
    (rdAddr,dout,ipntr) = mealyB clk rst en cpu (replicate d7 0) (memOut,instr)
    instr  = asyncRom instrs <$> ipntr
Again, we can simulate our system and see that it works. This time, however,
we need to disregard the first few output samples: the initial content of an
asyncRam is undefined, so those samples are undefined as well.
We use the utility function printX to conveniently filter out the
undefinedness and replace it with the string X in the few leading outputs.
>>> printX $ sampleN 32 $ system2 prog systemClockGen resetGen enableGen
[X,X,X,X,X,X,4,4,4,4,4,4,4,4,6,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,2]
Improvement 2: using blockRam
Finally we get to using blockRam. On FPGAs, asyncRam will
be implemented in terms of LUTs, and therefore take up logic resources. FPGAs
also have large(r) memory structures called Block RAMs, which are preferred,
especially as the memories we need for our application get bigger. The
blockRam function will be translated to such a Block RAM.
One important aspect of Block RAMs is that they have a synchronous read port,
meaning that, unlike with asyncRam, given a read address r
at time t, the value v in the RAM at address r is only available at time
t+1.
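The difference in read timing can be sketched with a plain-Haskell list model, where element t of each list is the value at cycle t. The memory content is held fixed (no writes) to isolate the timing, and `asyncRead`, `syncRead` and the use of `Nothing` for the undefined first output are assumptions of this sketch, not part of the Clash API.

```haskell
-- Asynchronous read: the value at address r is visible in the same cycle.
asyncRead :: [a] -> [Int] -> [Maybe a]
asyncRead mem rds = map (\r -> Just (mem !! r)) rds

-- Synchronous read: the value for the address given at cycle t appears at
-- cycle t+1. 'Nothing' stands for the undefined first output.
syncRead :: [a] -> [Int] -> [Maybe a]
syncRead mem rds = take (length rds) (Nothing : asyncRead mem rds)

main :: IO ()
main = do
  let mem = [10, 20, 30, 40] :: [Int]
      rds = [0, 1, 2, 3]
  print (asyncRead mem rds)
  print (syncRead mem rds)
```

The first line printed is `[Just 10,Just 20,Just 30,Just 40]` and the second is `[Nothing,Just 10,Just 20,Just 30]`: the same values, shifted by one cycle.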
For us that means we need to change the design of our CPU. Right now, upon a load instruction we generate a read address for the memory, and the value at that read address is immediately available to be put in the register bank. Because we will be using a Block RAM, the value is delayed until the next cycle. Hence we also need to delay the address of the register into which the memory value is loaded:
cpu2
  :: (Vec 7 Value,Reg)
  -- ^ (Register bank, Load reg addr)
  -> (Value,Instruction)
  -- ^ (Memory output, Current instruction)
  -> ( (Vec 7 Value,Reg)
     , (MemAddr, Maybe (MemAddr,Value), InstrAddr)
     )
cpu2 (regbank,ldRegD) (memOut,instr) = ((regbank',ldRegD'),(rdAddr,(,aluOut) <$> wrAddrM,fromIntegral ipntr))
  where
    -- Current instruction pointer
    ipntr = regbank !! PC
    -- Decoder
    (MachCode {..}) = case instr of
      Compute op rx ry res -> nullCode {inputX=rx,inputY=ry,result=res,aluCode=op}
      Branch cr a          -> nullCode {inputX=cr,jmpM=Just a}
      Jump a               -> nullCode {aluCode=Incr,jmpM=Just a}
      Load a r             -> nullCode {ldReg=r,rdAddr=a}
      Store r a            -> nullCode {inputX=r,wrAddrM=Just a}
      Nop                  -> nullCode
    -- ALU
    regX   = regbank !! inputX
    regY   = regbank !! inputY
    aluOut = alu aluCode regX regY
    -- next instruction
    nextPC = case jmpM of
               Just a | aluOut /= 0 -> ipntr + a
               _                    -> ipntr + 1
    -- update registers
    ldRegD'  = ldReg -- Delay the ldReg by 1 cycle
    regbank' = replace Zero   0
             $ replace PC     nextPC
             $ replace result aluOut
             $ replace ldRegD memOut
             $ regbank
We can now finally instantiate our system with a blockRam:
system3
  :: ( KnownDomain dom
     , KnownNat n )
  => Vec n Instruction
  -> Clock dom
  -> Reset dom
  -> Enable dom
  -> Signal dom Value
system3 instrs clk rst en = memOut
  where
    memOut = blockRam clk en (replicate d32 0) rdAddr dout
    (rdAddr,dout,ipntr) = mealyB clk rst en cpu2 ((replicate d7 0),Zero) (memOut,instr)
    instr  = asyncRom instrs <$> ipntr
We are, however, not done. We also need to update our program, because a
value that we try to load into a register won't arrive in that register until
the next cycle. This is a problem when the next instruction immediately
depends on the loaded value. In our case, this only happens when we load
the value 6, which was stored at address 1, into RegB.
Our updated program is thus:
prog2 = -- 0 := 4
       Compute Incr Zero RegA RegA :>
       replicate d3 (Compute Incr RegA Zero RegA) ++
       Store RegA 0 :>
       -- 1 := 6
       Compute Incr Zero RegA RegA :>
       replicate d5 (Compute Incr RegA Zero RegA) ++
       Store RegA 1 :>
       -- A := 4
       Load 0 RegA :>
       -- B := 6
       Load 1 RegB :>
       Nop :> -- Extra NOP
       -- start
       Compute CmpGt RegA RegB RegC :>
       Branch RegC 4 :>
       Compute CmpGt RegB RegA RegC :>
       Branch RegC 4 :>
       Jump 5 :>
       -- (a > b)
       Compute Sub RegA RegB RegA :>
       Jump (-6) :>
       -- (b > a)
       Compute Sub RegB RegA RegB :>
       Jump (-8) :>
       -- end
       Store RegA 2 :>
       Load 2 RegC :>
       Nil
When we simulate our system we see that it works. This time again,
we need to disregard the first sample, because the initial output of a
blockRam is undefined. We use the utility function printX to conveniently
filter out the undefinedness and replace it with the string X.
>>> printX $ sampleN 34 $ system3 prog2 systemClockGen resetGen enableGen
[X,0,0,0,0,0,0,4,4,4,4,4,4,4,4,6,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,2]
This concludes the short introduction to using blockRam.
Synopsis
- blockRam :: (KnownDomain dom, HasCallStack, NFDataX a, Enum addr) => Clock dom -> Enable dom -> Vec n a -> Signal dom addr -> Signal dom (Maybe (addr, a)) -> Signal dom a
- blockRamPow2 :: (KnownDomain dom, HasCallStack, NFDataX a, KnownNat n) => Clock dom -> Enable dom -> Vec (2 ^ n) a -> Signal dom (Unsigned n) -> Signal dom (Maybe (Unsigned n, a)) -> Signal dom a
- blockRamU :: forall n dom a r addr. (KnownDomain dom, HasCallStack, NFDataX a, Enum addr, 1 <= n) => Clock dom -> Reset dom -> Enable dom -> ResetStrategy r -> SNat n -> (Index n -> a) -> Signal dom addr -> Signal dom (Maybe (addr, a)) -> Signal dom a
- blockRam1 :: forall n dom a r addr. (KnownDomain dom, HasCallStack, NFDataX a, Enum addr, 1 <= n) => Clock dom -> Reset dom -> Enable dom -> ResetStrategy r -> SNat n -> a -> Signal dom addr -> Signal dom (Maybe (addr, a)) -> Signal dom a
- data ResetStrategy (r :: Bool) where
- readNew :: (KnownDomain dom, NFDataX a, Eq addr) => Clock dom -> Reset dom -> Enable dom -> (Signal dom addr -> Signal dom (Maybe (addr, a)) -> Signal dom a) -> Signal dom addr -> Signal dom (Maybe (addr, a)) -> Signal dom a
- blockRam# :: (KnownDomain dom, HasCallStack, NFDataX a) => Clock dom -> Enable dom -> Vec n a -> Signal dom Int -> Signal dom Bool -> Signal dom Int -> Signal dom a -> Signal dom a
BlockRAM synchronized to the system clock
blockRam
Arguments
| :: (KnownDomain dom, HasCallStack, NFDataX a, Enum addr) | |
| => Clock dom | |
| -> Enable dom | Global enable | 
| -> Vec n a | Initial content of the BRAM, also determines the size,  NB: MUST be a constant. | 
| -> Signal dom addr | Read address  | 
| -> Signal dom (Maybe (addr, a)) | (write address  | 
| -> Signal dom a | Value of the  | 
Create a blockRAM with space for n elements
- NB: Read value is delayed by 1 cycle
- NB: Initial output value is undefined
bram40
  :: KnownDomain dom
  => Clock dom
  -> Enable dom
  -> Signal dom (Unsigned 6)
  -> Signal dom (Maybe (Unsigned 6, Bit))
  -> Signal dom Bit
bram40 clk en = blockRam clk en (replicate d40 1)
Additional helpful information:
- See Clash.Explicit.BlockRam for more information on how to use a Block RAM.
- Use the adapter readNew for obtaining write-before-read semantics like this: readNew clk rst en (blockRam clk en inits) rd wrM
blockRamPow2
Arguments
| :: (KnownDomain dom, HasCallStack, NFDataX a, KnownNat n) | |
| => Clock dom | |
| -> Enable dom | Global enable | 
| -> Vec (2 ^ n) a | Initial content of the BRAM, also determines the size,  NB: MUST be a constant. | 
| -> Signal dom (Unsigned n) | Read address  | 
| -> Signal dom (Maybe (Unsigned n, a)) | (Write address  | 
| -> Signal dom a | Value of the  | 
Create a blockRAM with space for 2^n elements
- NB: Read value is delayed by 1 cycle
- NB: Initial output value is undefined
bram32
  :: KnownDomain dom
  => Clock dom
  -> Enable dom
  -> Signal dom (Unsigned 5)
  -> Signal dom (Maybe (Unsigned 5, Bit))
  -> Signal dom Bit
bram32 clk en = blockRamPow2 clk en (replicate d32 1)
Additional helpful information:
- See Clash.Prelude.BlockRam for more information on how to use a Block RAM.
- Use the adapter readNew for obtaining write-before-read semantics like this: readNew clk rst en (blockRamPow2 clk en inits) rd wrM
blockRamU
Arguments
| :: forall n dom a r addr. (KnownDomain dom, HasCallStack, NFDataX a, Enum addr, 1 <= n) | |
| => Clock dom | |
| -> Reset dom | |
| -> Enable dom | Global enable | 
| -> ResetStrategy r | Whether to clear BRAM on asserted reset ( | 
| -> SNat n | Number of elements in BRAM | 
| -> (Index n -> a) | If applicable (see the ResetStrategy argument), reset BRAM using this function. | 
| -> Signal dom addr | Read address  | 
| -> Signal dom (Maybe (addr, a)) | (write address  | 
| -> Signal dom a | Value of the  | 
Version of blockRam that has no default values set. May be cleared to an arbitrary state using a reset function.
blockRam1
Arguments
| :: forall n dom a r addr. (KnownDomain dom, HasCallStack, NFDataX a, Enum addr, 1 <= n) | |
| => Clock dom | |
| -> Reset dom | |
| -> Enable dom | Global enable | 
| -> ResetStrategy r | Whether to clear BRAM on asserted reset ( | 
| -> SNat n | Number of elements in BRAM | 
| -> a | Initial content of the BRAM (replicated n times) | 
| -> Signal dom addr | Read address  | 
| -> Signal dom (Maybe (addr, a)) | (write address  | 
| -> Signal dom a | Value of the  | 
Version of blockram that is initialized with the same value on all memory positions.
data ResetStrategy (r :: Bool) where
Constructors
| ClearOnReset :: ResetStrategy 'True | |
| NoClearOnReset :: ResetStrategy 'False | 
Read/Write conflict resolution
readNew
Arguments
| :: (KnownDomain dom, NFDataX a, Eq addr) | |
| => Clock dom | |
| -> Reset dom | |
| -> Enable dom | |
| -> (Signal dom addr -> Signal dom (Maybe (addr, a)) -> Signal dom a) | The  | 
| -> Signal dom addr | Read address  | 
| -> Signal dom (Maybe (addr, a)) | (Write address  | 
| -> Signal dom a | Value of the  | 
Create read-after-write blockRAM from a read-before-write one
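The difference between the two semantics shows up when a read and a write hit the same address in the same cycle. The following plain-Haskell model is a sketch of the idea only, not the Clash implementation: lists stand in for signals and `Vec`, `Nothing` stands for the undefined first output, and `ramStep`, `readOld`, `readNewModel` and `replaceAt` are hypothetical names.

```haskell
import Data.List (mapAccumL)

type Addr = Int
type Mem  = [Int]

-- Hypothetical helper playing the role of Clash's 'replace'.
replaceAt :: Int -> a -> [a] -> [a]
replaceAt i x xs = take i xs ++ [x] ++ drop (i + 1) xs

-- Read-before-write RAM with a synchronous read port. The state is the
-- memory plus the registered output from the previous cycle.
ramStep :: (Mem, Maybe Int) -> (Addr, Maybe (Addr, Int))
        -> ((Mem, Maybe Int), Maybe Int)
ramStep (mem, out) (rd, wrM) = ((mem', Just (mem !! rd)), out)
  where
    mem' = maybe mem (\(wr, d) -> replaceAt wr d mem) wrM

readOld :: Mem -> [(Addr, Maybe (Addr, Int))] -> [Maybe Int]
readOld mem0 = snd . mapAccumL ramStep (mem0, Nothing)

-- Read-new adapter: if the previous cycle wrote to the address it also
-- read, forward the written value instead of the RAM output.
readNewModel :: Mem -> [(Addr, Maybe (Addr, Int))] -> [Maybe Int]
readNewModel mem0 ins = zipWith pick fwd (readOld mem0 ins)
  where
    fwd = Nothing : map hit ins -- decision delayed by one cycle
    hit (rd, Just (wr, d)) | wr == rd = Just d
    hit _                             = Nothing
    pick (Just d) _ = Just d
    pick Nothing  o = o

main :: IO ()
main = do
  let ins = [(1, Just (1, 7)), (1, Nothing), (0, Nothing)]
  print (readOld      [0, 0, 0, 0] ins)
  print (readNewModel [0, 0, 0, 0] ins)
```

The first line printed is `[Nothing,Just 0,Just 7]` and the second is `[Nothing,Just 7,Just 7]`: with the adapter, the read issued in the same cycle as the write already returns the newly written value.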
Internal
blockRam#
Arguments
| :: (KnownDomain dom, HasCallStack, NFDataX a) | |
| => Clock dom | |
| -> Enable dom | Global enable | 
| -> Vec n a | Initial content of the BRAM, also determines the size,  NB: MUST be a constant. | 
| -> Signal dom Int | Read address  | 
| -> Signal dom Bool | Write enable | 
| -> Signal dom Int | Write address  | 
| -> Signal dom a | Value to write (at address  | 
| -> Signal dom a | Value of the  | 
blockRAM primitive