PoC.cache.mem
This unit provides a cache (PoC.cache.par2) together with a cache controller that reads cache lines from memory and writes them back to memory. It has two interfaces, each following the PoC.Mem Interface:
- one for the “CPU” side (ports with prefix cpu_), and
- one for the memory side (ports with prefix mem_).
Thus, this unit can be placed into an already available memory path between the CPU and the memory (controller). If you want to plug a cache into a CPU pipeline, see PoC.cache.cpu.
Configuration
Parameter | Description |
---|---|
REPLACEMENT_POLICY | Replacement policy of embedded cache. For supported values see PoC.cache_replacement_policy. |
CACHE_LINES | Number of cache lines. |
ASSOCIATIVITY | Associativity of embedded cache. |
CPU_ADDR_BITS | Number of address bits on the CPU side. Each address identifies one memory word as seen from the CPU. Calculated from other parameters as described below. |
CPU_DATA_BITS | Width of the data bus (in bits) on the CPU side. CPU_DATA_BITS must be divisible by 8. |
MEM_ADDR_BITS | Number of address bits on the memory side. Each address identifies one word in the memory. |
MEM_DATA_BITS | Width of a memory word and of a cache line in bits. MEM_DATA_BITS must be divisible by CPU_DATA_BITS. |
OUTSTANDING_REQ | Number of outstanding requests, see notes below. |
If the CPU data-bus width is smaller than the memory data-bus width, then the CPU needs additional address bits to identify one CPU data word inside a memory word. Thus, the CPU address-bus width is calculated from:
CPU_ADDR_BITS=log2ceil(MEM_DATA_BITS/CPU_DATA_BITS)+MEM_ADDR_BITS
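As a worked instance of the formula above, the following sketch evaluates it for one example configuration. The entity name, the helper log2ceil_sketch (a hypothetical local stand-in for log2ceil), and the chosen parameter values are assumptions made for illustration only.

```vhdl
-- Illustration only: evaluates the CPU address width for example values.
entity addr_width_example is
end entity;

architecture sketch of addr_width_example is
  -- Hypothetical stand-in for log2ceil: smallest n such that 2**n >= arg.
  function log2ceil_sketch(arg : positive) return natural is
    variable tmp : positive := 1;
    variable res : natural  := 0;
  begin
    while tmp < arg loop
      tmp := tmp * 2;
      res := res + 1;
    end loop;
    return res;
  end function;

  -- Example values (assumptions, not defaults of the unit).
  constant MEM_ADDR_BITS : positive := 20;
  constant MEM_DATA_BITS : positive := 128;
  constant CPU_DATA_BITS : positive := 32;

  -- 128/32 = 4 CPU words per memory word, so 2 extra address bits: 2 + 20 = 22.
  constant CPU_ADDR_BITS : positive :=
    log2ceil_sketch(MEM_DATA_BITS / CPU_DATA_BITS) + MEM_ADDR_BITS;
begin
  assert CPU_ADDR_BITS = 22
    report "unexpected CPU_ADDR_BITS" severity failure;
end architecture;
```

With equal data-bus widths (MEM_DATA_BITS = CPU_DATA_BITS), the logarithm term is zero and both sides use the same number of address bits.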
The write policy is: write-through, no-write-allocate.
The maximum throughput is one request per clock cycle, except for OUTSTANDING_REQ = 1.
If OUTSTANDING_REQ is:
- 1: then 1 request is buffered by a single register. To give a short critical path (clock-to-output delay) for cpu_rdy, the throughput is degraded to one request per 2 clock cycles at maximum.
- 2: then 2 requests are buffered by PoC.fifo.glue. This setting has the lowest area requirements without degrading the performance.
- >2: then the requests are buffered by PoC.fifo.cc_got. The number of outstanding requests is rounded up to the next suitable value. This setting is useful in applications with out-of-order execution (of other operations). The CPU requests to the cache are always processed in-order.
Operation
Memory accesses are always aligned to a word boundary. Each memory word (and each cache line) consists of MEM_DATA_BITS bits. For example if MEM_DATA_BITS=128:
- memory address 0 selects the bits 0..127 in memory,
- memory address 1 selects the bits 128..255 in memory, and so on.
Cache accesses are always aligned to a CPU word boundary. Each CPU word consists of CPU_DATA_BITS bits. For example if CPU_DATA_BITS=32:
- CPU address 0 selects the bits 0.. 31 in memory word 0,
- CPU address 1 selects the bits 32.. 63 in memory word 0,
- CPU address 2 selects the bits 64.. 95 in memory word 0,
- CPU address 3 selects the bits 96..127 in memory word 0,
- CPU address 4 selects the bits 0.. 31 in memory word 1,
- CPU address 5 selects the bits 32.. 63 in memory word 1, and so on.
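The mapping listed above can be read as a split of the CPU address into a memory word address (upper bits) and a word offset inside that memory word (lower bits). The sketch below checks this reading for CPU address 5. It only restates the address arithmetic of the list; it is not taken from the unit's implementation, and the entity name, constant names, and the small MEM_ADDR_BITS value are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustration only: splits one example CPU address into its two parts.
entity addr_split_example is
end entity;

architecture sketch of addr_split_example is
  -- Example values (assumptions for illustration).
  constant MEM_ADDR_BITS : positive := 4;
  constant MEM_DATA_BITS : positive := 128;
  constant CPU_DATA_BITS : positive := 32;
  constant OFFSET_BITS   : positive := 2;  -- log2ceil(128/32)

  -- CPU address 5, as in the list above.
  constant CPU_ADDR : unsigned(OFFSET_BITS + MEM_ADDR_BITS - 1 downto 0) :=
    to_unsigned(5, OFFSET_BITS + MEM_ADDR_BITS);

  -- Upper bits: address of the memory word (and cache line).
  constant MEM_ADDR : unsigned(MEM_ADDR_BITS - 1 downto 0) :=
    CPU_ADDR(CPU_ADDR'high downto OFFSET_BITS);

  -- Lower bits: index of the CPU word inside the memory word.
  constant WORD_OFFSET : natural := to_integer(CPU_ADDR(OFFSET_BITS - 1 downto 0));

  -- Bit range of the selected CPU word inside the memory word.
  constant LOW_BIT  : natural := WORD_OFFSET * CPU_DATA_BITS;
  constant HIGH_BIT : natural := LOW_BIT + CPU_DATA_BITS - 1;
begin
  assert MEM_DATA_BITS / CPU_DATA_BITS = 2 ** OFFSET_BITS
    report "OFFSET_BITS does not match the bus width ratio" severity failure;

  -- CPU address 5 selects bits 32..63 in memory word 1.
  assert (to_integer(MEM_ADDR) = 1) and (LOW_BIT = 32) and (HIGH_BIT = 63)
    report "unexpected address split" severity failure;
end architecture;
```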
A synchronous reset must be applied even on an FPGA.
The interface is documented in detail in PoC.Mem Interface.
Warning
If the design is synthesized with Xilinx ISE / XST, then the synthesis option “Keep Hierarchy” must be set to SOFT or TRUE.
Entity Declaration:
    entity cache_mem is
      generic (
        REPLACEMENT_POLICY : string   := "LRU";
        CACHE_LINES        : positive;
        ASSOCIATIVITY      : positive;
        CPU_DATA_BITS      : positive;
        MEM_ADDR_BITS      : positive;
        MEM_DATA_BITS      : positive;
        OUTSTANDING_REQ    : positive := 2
      );
      port (
        clk : in std_logic; -- clock
        rst : in std_logic; -- reset

        -- "CPU" side
        cpu_req   : in  std_logic;
        cpu_write : in  std_logic;
        cpu_addr  : in  unsigned(log2ceil(MEM_DATA_BITS/CPU_DATA_BITS)+MEM_ADDR_BITS-1 downto 0);
        cpu_wdata : in  std_logic_vector(CPU_DATA_BITS-1 downto 0);
        cpu_wmask : in  std_logic_vector(CPU_DATA_BITS/8-1 downto 0) := (others => '0');
        cpu_rdy   : out std_logic;
        cpu_rstb  : out std_logic;
        cpu_rdata : out std_logic_vector(CPU_DATA_BITS-1 downto 0);

        -- Memory side
        mem_req   : out std_logic;
        mem_write : out std_logic;
        mem_addr  : out unsigned(MEM_ADDR_BITS-1 downto 0);
        mem_wdata : out std_logic_vector(MEM_DATA_BITS-1 downto 0);
        mem_wmask : out std_logic_vector(MEM_DATA_BITS/8-1 downto 0);
        mem_rdy   : in  std_logic;
        mem_rstb  : in  std_logic;
        mem_rdata : in  std_logic_vector(MEM_DATA_BITS-1 downto 0)
      );
    end entity;
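A minimal instantiation sketch based on the declaration above, for a 32-bit CPU bus in front of a 128-bit memory. It assumes that cache_mem has been compiled into a library named PoC. The wrapper entity name, the instance label, and the values chosen for CACHE_LINES, ASSOCIATIVITY and MEM_ADDR_BITS are hypothetical, and any constraints the unit places on them (for example power-of-two values) are not checked here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library PoC;  -- assumption: cache_mem is compiled into this library

entity cache_mem_wrapper is  -- hypothetical wrapper, for illustration only
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;  -- synchronous reset
    -- "CPU" side: 22 = log2ceil(128/32) + 20 address bits
    cpu_req   : in  std_logic;
    cpu_write : in  std_logic;
    cpu_addr  : in  unsigned(21 downto 0);
    cpu_wdata : in  std_logic_vector(31 downto 0);
    cpu_wmask : in  std_logic_vector(3 downto 0);
    cpu_rdy   : out std_logic;
    cpu_rstb  : out std_logic;
    cpu_rdata : out std_logic_vector(31 downto 0);
    -- Memory side
    mem_req   : out std_logic;
    mem_write : out std_logic;
    mem_addr  : out unsigned(19 downto 0);
    mem_wdata : out std_logic_vector(127 downto 0);
    mem_wmask : out std_logic_vector(15 downto 0);
    mem_rdy   : in  std_logic;
    mem_rstb  : in  std_logic;
    mem_rdata : in  std_logic_vector(127 downto 0)
  );
end entity;

architecture sketch of cache_mem_wrapper is
begin
  cache : entity PoC.cache_mem
    generic map (
      REPLACEMENT_POLICY => "LRU",  -- default from the declaration
      CACHE_LINES        => 64,     -- example value (assumption)
      ASSOCIATIVITY      => 4,      -- example value (assumption)
      CPU_DATA_BITS      => 32,
      MEM_ADDR_BITS      => 20,
      MEM_DATA_BITS      => 128,
      OUTSTANDING_REQ    => 2       -- default from the declaration
    )
    port map (
      clk       => clk,
      rst       => rst,
      cpu_req   => cpu_req,
      cpu_write => cpu_write,
      cpu_addr  => cpu_addr,
      cpu_wdata => cpu_wdata,
      cpu_wmask => cpu_wmask,
      cpu_rdy   => cpu_rdy,
      cpu_rstb  => cpu_rstb,
      cpu_rdata => cpu_rdata,
      mem_req   => mem_req,
      mem_write => mem_write,
      mem_addr  => mem_addr,
      mem_wdata => mem_wdata,
      mem_wmask => mem_wmask,
      mem_rdy   => mem_rdy,
      mem_rstb  => mem_rstb,
      mem_rdata => mem_rdata
    );
end architecture;
```

The port widths in the wrapper are written as literals so that the sketch does not depend on a log2ceil function being visible; in a real design they would normally be derived from the same generics that are passed to the cache.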
See also