I have to buffer some data in a fairly large buffer. It is not an ordinary shift register or FIFO, because I also need to be able to read data from the middle of the buffer. I have an implementation that works the way I need it to. The problem is that it is built from LUTs, which takes up a lot of space in my design. I would like to change the design so that the buffer is inferred as block RAM. Setting ram_style to "block" did not help. Any ideas or suggestions on how I could achieve that?

Update: buf_size is declared in a package: constant buf_size : natural := 5;
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity deriv_buffer is
    generic(
        NSAMPLES : natural := 16
    );
    port(
        clk       : in  std_logic;
        rst       : in  std_logic;
        deriv_s   : in  t_deriv_array( NSAMPLES - 1 downto 0 );
        deriv_buf : out t_deriv_array( buf_size * NSAMPLES - 1 downto 0 )
    );
end deriv_buffer;

architecture Behavioral of deriv_buffer is
    signal deriv_buf_s : t_deriv_array( (buf_size-1) * NSAMPLES - 1 downto 0 );
    attribute ram_style : string;
    attribute ram_style of deriv_buf_s : signal is "block";
begin
    deriv_buf( buf_size * NSAMPLES - 1 downto (buf_size - 1) * NSAMPLES ) <= deriv_s;

    buffer_p : process( rst, clk )
    begin
        if rst = '1' then
            deriv_buf_s <= ( others => ( others => '0' ) );
        elsif rising_edge( clk ) then
            deriv_buf_s( (buf_size - 1) * NSAMPLES - 1 downto (buf_size - 2) * NSAMPLES ) <= deriv_s;
            deriv_buf_s( (buf_size - 2) * NSAMPLES - 1 downto (buf_size - 3) * NSAMPLES ) <= deriv_buf_s( (buf_size - 1) * NSAMPLES - 1 downto (buf_size - 2) * NSAMPLES );
            deriv_buf_s( (buf_size - 3) * NSAMPLES - 1 downto (buf_size - 4) * NSAMPLES ) <= deriv_buf_s( (buf_size - 2) * NSAMPLES - 1 downto (buf_size - 3) * NSAMPLES );
            deriv_buf_s( (buf_size - 4) * NSAMPLES - 1 downto (buf_size - 5) * NSAMPLES ) <= deriv_buf_s( (buf_size - 3) * NSAMPLES - 1 downto (buf_size - 4) * NSAMPLES );
        end if;
    end process buffer_p;

    deriv_buf( (buf_size-1)*NSAMPLES - 1 downto 0 ) <= deriv_buf_s;
end Behavioral;
If you want to use a block RAM, you need to consider that a block RAM only has 2 ports. You cannot look freely into the data in the RAM: every access has to go through one of those ports, one address at a time.
Furthermore, reading and/or writing takes a clock cycle to process.
So if we look at your code, it already starts out problematically:
You have your whole RAM connected to an output port! I don't know what you are doing with the contents in the entity using this component, but as I said: you don't have free access to the contents of a block RAM. You need to follow proper block RAM design guidelines.
For example, refer to the Xilinx Synthesis User Guide for proper block RAM coding (Chapter 4, HDL Coding Techniques, section RAM HDL Coding Techniques).
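To illustrate what such a coding style typically looks like, here is a minimal sketch of a simple dual-port RAM (one write port, one read port) with a synchronous read. The entity name, generics and port names are made up for this example, and it stores plain std_logic_vector words rather than your t_deriv_array type. Whether a given tool maps it to block RAM still depends on the tool, the device and the memory size, so check the synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity; names and sizes are illustrative only.
entity simple_dual_port_ram is
    generic(
        DATA_WIDTH : natural := 16;
        ADDR_WIDTH : natural := 10
    );
    port(
        clk     : in  std_logic;
        we      : in  std_logic;
        wr_addr : in  unsigned( ADDR_WIDTH - 1 downto 0 );
        wr_data : in  std_logic_vector( DATA_WIDTH - 1 downto 0 );
        rd_addr : in  unsigned( ADDR_WIDTH - 1 downto 0 );
        rd_data : out std_logic_vector( DATA_WIDTH - 1 downto 0 )
    );
end entity simple_dual_port_ram;

architecture rtl of simple_dual_port_ram is
    type t_ram is array ( 0 to 2**ADDR_WIDTH - 1 )
        of std_logic_vector( DATA_WIDTH - 1 downto 0 );
    -- Note: the memory array itself has no reset.
    signal ram : t_ram;
    attribute ram_style : string;
    attribute ram_style of ram : signal is "block";
begin
    ram_p : process( clk )
    begin
        if rising_edge( clk ) then
            -- Write port: one address per clock cycle.
            if we = '1' then
                ram( to_integer( wr_addr ) ) <= wr_data;
            end if;
            -- Read port: registered, so data appears one cycle after rd_addr.
            rd_data <= ram( to_integer( rd_addr ) );
        end if;
    end process ram_p;
end architecture rtl;
```

The points that matter are that the array is only ever accessed through an address, that the read is clocked, and that there is no reset branch touching the array.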
Next problem: reset
Resetting the contents of a RAM is not possible. If you really want to clear the RAM, you need to write (others=>'0') to each address location separately, so you need control logic to do that. As it stands, this reset code will prevent a block RAM from being inferred.
Then in your code you have the part
This code has two big issues:
You could implement the code to use 4 block RAM instances. But then still all the ports of these block RAMs would be occupied. So no port would be left to provide random access to all the data in the RAM, like you wish.
In conclusion, I think you should reconsider your requirement. What you want is not possible with block RAM. If you want to use block RAM, you will have to change your algorithm.
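As one possible direction for such a change, and only as a sketch: instead of shifting the data every clock cycle, you could leave the data in place and move a write pointer (a circular buffer), then read older entries one at a time through an address. Everything below is hypothetical (entity name, ports, the rd_offset input, and the assumption that one RAM word holds one entry), and it only helps if your algorithm can live with one read per clock cycle and one cycle of read latency.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical circular buffer; not a drop-in replacement for deriv_buffer.
entity circ_buffer is
    generic(
        DATA_WIDTH : natural := 16;
        ADDR_WIDTH : natural := 7     -- depth is 2**ADDR_WIDTH entries
    );
    port(
        clk       : in  std_logic;
        rst       : in  std_logic;    -- resets the pointer only, not the RAM
        wr_en     : in  std_logic;
        wr_data   : in  std_logic_vector( DATA_WIDTH - 1 downto 0 );
        -- 0 selects the most recently stored entry, 1 the one before, ...
        rd_offset : in  unsigned( ADDR_WIDTH - 1 downto 0 );
        rd_data   : out std_logic_vector( DATA_WIDTH - 1 downto 0 )
    );
end entity circ_buffer;

architecture rtl of circ_buffer is
    type t_ram is array ( 0 to 2**ADDR_WIDTH - 1 )
        of std_logic_vector( DATA_WIDTH - 1 downto 0 );
    signal ram    : t_ram;
    signal wr_ptr : unsigned( ADDR_WIDTH - 1 downto 0 ) := ( others => '0' );
    attribute ram_style : string;
    attribute ram_style of ram : signal is "block";
begin
    -- Pointer logic is kept apart from the RAM process, so the reset
    -- never touches the memory array.
    ptr_p : process( clk )
    begin
        if rising_edge( clk ) then
            if rst = '1' then
                wr_ptr <= ( others => '0' );
            elsif wr_en = '1' then
                wr_ptr <= wr_ptr + 1;   -- wraps around automatically
            end if;
        end if;
    end process ptr_p;

    ram_p : process( clk )
    begin
        if rising_edge( clk ) then
            if wr_en = '1' then
                ram( to_integer( wr_ptr ) ) <= wr_data;
            end if;
            -- wr_ptr - 1 is the last stored entry; unsigned arithmetic wraps.
            rd_data <= ram( to_integer( wr_ptr - 1 - rd_offset ) );
        end if;
    end process ram_p;
end architecture rtl;
```

Since the RAM is not cleared, entries read after a reset are stale until they have been overwritten, so the surrounding logic would have to keep track of how many valid entries there are.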