RNN (4) - RTL Design

- February 24, 2022

I designed the RTL with the same structure as the System Simulator from the previous post. The major considerations for the RTL design are listed below. (A detailed walkthrough of the code will follow in a separate post. There are also many good tutorials on using Intel Quartus and ModelSim, so I recommend searching for one. I will explain how to simulate this code in a separate post as well.)

- I/O method: The most common approach is to store the input in memory, run the RNN, write the predicted values to an output memory, and then assert a completion signal. The memory type can be chosen in various ways: FIFO, single-port, dual-port, and so on. Here, single-port memory is chosen because it is the simplest.

- Timing: RTL stands for Register Transfer Level. I don't know who coined the term, but it describes the concept quite well. A register is a component whose output changes only at a clock edge. Whatever combinational logic sits between two registers, and however its signals change during one clock period, the result is captured only at the next clock edge. Operations that take a long time to produce a final result, such as the matrix-vector multiplications and additions in an RNN, often cannot finish within one clock period. In that case the design fails to achieve timing closure during synthesis. One option is to insert registers in the middle of the logic after a first design pass, but it is better to slice the logic in advance and sequence it with a state machine or similar control logic; this makes timing problems much easier to handle later.

- Testbench: When standardized IP is used, or when functional coverage needs to be increased, verification methodologies such as UVM and OVM are used. A relatively intuitive function such as matrix arithmetic, however, can be verified adequately with a conventional Verilog testbench. So, rather than adopting a separate verification methodology, this design uses test vectors generated directly by the system simulator.
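To make the I/O bullet above concrete, here is a minimal Python sketch of the load, compute, store, and done sequence around single-port memories. It is a behavioral model under stated assumptions, not the repository's Verilog: `SinglePortSRAM`, `run_job`, and the `predict` callable (a stand-in for the RNN core) are hypothetical names, and the rule that read data is valid only after the clock edge is an assumption typical of synchronous SRAM.

```python
class SinglePortSRAM:
    """Hypothetical behavioral model of a single-port synchronous SRAM.

    One port means at most one access (read OR write) per clock cycle.
    A read result appears in `q` only after the clock edge that follows
    the access request.
    """

    def __init__(self, depth: int):
        self.mem = [0] * depth
        self.q = 0              # registered read-data output
        self._pending = None    # (we, addr, wdata) requested this cycle

    def access(self, we: bool, addr: int, wdata: int = 0) -> None:
        if self._pending is not None:
            raise RuntimeError("single port: only one access per cycle")
        self._pending = (we, addr, wdata)

    def clock(self) -> None:
        """Rising clock edge: perform the one pending access, if any."""
        if self._pending is not None:
            we, addr, wdata = self._pending
            if we:
                self.mem[addr] = wdata
            else:
                self.q = self.mem[addr]
            self._pending = None


def run_job(inputs, predict):
    """Hypothetical top-level flow: load -> compute -> store -> done."""
    in_sram = SinglePortSRAM(len(inputs))
    out_sram = SinglePortSRAM(len(inputs))
    # 1. The host writes the input sequence, one word per cycle.
    for addr, x in enumerate(inputs):
        in_sram.access(True, addr, x)
        in_sram.clock()
    # 2. The core reads each input, computes, and writes the prediction.
    for addr in range(len(inputs)):
        in_sram.access(False, addr)
        in_sram.clock()                 # read data valid after this edge
        yhat = predict(in_sram.q)
        out_sram.access(True, addr, yhat)
        out_sram.clock()
    # 3. Completion signal back to the host.
    done = True
    return out_sram.mem, done
```

The single port is the price of simplicity: loading, reading, and writing back cannot overlap within the same memory, which a dual-port memory or a FIFO would allow.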
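The state-machine slicing described in the Timing bullet above can be illustrated with a cycle-level Python model of one matrix-vector product, y = W·x + b. This is a hypothetical sketch, not the author's RTL: `MatVecFSM` and its states are invented names. It performs a single multiply-accumulate per clock edge, so the logic between registers is one multiplier and one adder rather than a whole dot product. The cost is one start cycle plus rows × cols MAC cycles.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    MAC = auto()
    DONE = auto()


class MatVecFSM:
    """Hypothetical cycle model of y = W @ x + b with ONE multiplier.

    Each tick() is one clock edge. Only one multiply-accumulate happens
    per edge; i, j, acc and y play the role of registers.
    """

    def __init__(self, W, x, b):
        self.W, self.x, self.b = W, x, b
        self.rows, self.cols = len(W), len(x)
        self.state = State.IDLE
        self.i = self.j = 0          # row / column counters
        self.acc = 0                 # accumulator register
        self.y = [0] * self.rows     # output registers
        self.cycles = 0

    def tick(self, start: bool = False) -> bool:
        """Advance one clock edge; return True once the result is ready."""
        self.cycles += 1
        if self.state is State.IDLE:
            if start:
                self.i = self.j = 0
                self.acc = self.b[0]         # preload bias of row 0
                self.state = State.MAC
        elif self.state is State.MAC:
            acc = self.acc + self.W[self.i][self.j] * self.x[self.j]
            if self.j == self.cols - 1:      # last column of this row
                self.y[self.i] = acc         # latch the finished row
                if self.i == self.rows - 1:
                    self.state = State.DONE
                else:
                    self.i += 1
                    self.j = 0
                    self.acc = self.b[self.i]  # preload next row's bias
            else:
                self.j += 1
                self.acc = acc
        return self.state is State.DONE
```

For a 2×2 `W`, the model needs 1 + 4 = 5 clock edges. In Verilog, the same structure would typically be a case statement on a state register, with the counters and accumulator updated in a clocked always block.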
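The test vectors mentioned in the Testbench bullet above must reach the Verilog testbench in a form it can load. A common approach, assumed here rather than taken from the repository, is a text file with one hex word per line that the testbench reads with Verilog's `$readmemh` system task. The sketch below quantizes floating-point values from the system simulator into signed fixed point. The format (8 fractional bits in a 16-bit word), the saturation behavior, and the function names are all assumptions.

```python
def to_fixed_hex(value: float, frac_bits: int = 8, width: int = 16) -> str:
    """Quantize a float to signed fixed point; return two's-complement hex.

    frac_bits and width are ASSUMED defaults, not taken from the repo.
    """
    scaled = round(value * (1 << frac_bits))   # nearest (ties to even)
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    scaled = max(lo, min(hi, scaled))          # saturate instead of wrapping
    digits = (width + 3) // 4                  # hex digits per word
    return format(scaled & ((1 << width) - 1), f"0{digits}x")


def write_vectors(path: str, values, frac_bits: int = 8, width: int = 16) -> None:
    """Write one hex word per line, the layout $readmemh reads."""
    with open(path, "w") as f:
        for v in values:
            f.write(to_fixed_hex(v, frac_bits, width) + "\n")
```

In the testbench, a file written this way could be loaded with something like `$readmemh("x_vectors.hex", input_mem);`, where the file and memory names are hypothetical.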

In Verilog, the basic building block is the module, and the structure of the whole design is shown in the block diagram below. Referring to this diagram while reading the RTL code should help in understanding the overall operation.

In the testbench, the inputs generated by the System Simulator are fed in sequentially, and the final prediction yhat is stored in output_sram. The testbench then reads the stored values back while incrementing the address, and the results are checked in the ModelSim simulation.
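Besides inspecting the results in ModelSim, the read-back step can also be checked by a script. This Python sketch assumes the testbench prints each `output_sram` word as a hex string (for example with `$display("%h", ...)` while stepping the address) and compares the decoded values with the system simulator's yhat. The fixed-point format, the 2-LSB tolerance, and the helpers `from_fixed_hex` and `check_outputs` are hypothetical.

```python
def from_fixed_hex(word: str, frac_bits: int = 8, width: int = 16) -> float:
    """Decode a two's-complement hex word into a float (ASSUMED format)."""
    raw = int(word, 16)
    if raw & (1 << (width - 1)):        # sign bit set -> negative value
        raw -= 1 << width
    return raw / (1 << frac_bits)


def check_outputs(output_sram, expected, tol=2 / 256):
    """Walk output memory by address and compare with the golden yhat.

    tol defaults to 2 LSBs of the assumed 8-fractional-bit format.
    Returns a list of (addr, got, want) mismatches; empty means pass.
    """
    mismatches = []
    for addr, want in enumerate(expected):
        got = from_fixed_hex(output_sram[addr])
        if abs(got - want) > tol:
            mismatches.append((addr, got, want))
    return mismatches
```

A small tolerance is used instead of exact equality because the hardware works in fixed point while the system simulator's golden values may be floating point.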

source: https://github.com/bxk218/RNN_wave_predictor_verilog/tree/main/rtl
