RNN (2) - Reference model

RNNs have been used increasingly in recent years, in applications such as language translation, automatic sentence completion, card fraud detection, and prediction of time-series data such as stock prices and weather. Here, we will use a model that predicts time-series data. Before RNNs, such data were predicted with a statistical method called ARIMA, which uses a kind of filtering to predict the relatively near future; with the advent of RNNs, more accurate predictions became possible by learning from big data. In particular, periodic data can be predicted with ARIMA because it has a temporal autocorrelation structure.


Anyway, we will implement this model for a relatively simple example that is easy to verify. Of course, a similar model could be built with a CNN, but it would not show optimal performance because a CNN can only accept a fixed-size input. If there is an opportunity in the future, I will try to implement an example that predicts results further into the future using an LSTM or GRU. The RNN is illustrated in the figure below. It is similar to a normal DNN; the only difference is that the output of the hidden layer is fed back to the input when the next input is processed. It is easier to understand when unrolled as in the picture on the right, but that picture can be misread as adding new nodes to the network sideways. What should not be confused here is that, in hardware, only the single block on the left is needed, and it is driven repeatedly for the length of the input data. All that is required is to save the hidden-layer output (which reflects the hidden layer's weights and biases) and feed it back as an input at the next step. Optimization, compilation, and loss-function calculation to find the optimal parameters of this network can then be performed in the usual way. As mentioned earlier, since only inference is performed in hardware, please refer to the code linked below for the process of finding the optimal parameters.


The operation performed during inference in the left block can be expressed simply as the following expression. (The bias terms are omitted from the figure.)

            Ht = tanh(Wih·Xt + bih + Whh·Ht-1 + bhh),   Yout = Who·Ht + bho
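As a minimal sketch of this recurrence in PyTorch (the tensors below are random placeholders named after the symbols in the expression, not trained values, and the sizes anticipate the spec given later):

import torch

input_size, hidden_size = 1, 9
W_ih = torch.randn(hidden_size, input_size)
b_ih = torch.randn(hidden_size)
W_hh = torch.randn(hidden_size, hidden_size)
b_hh = torch.randn(hidden_size)
W_ho = torch.randn(1, hidden_size)
b_ho = torch.randn(1)

h = torch.zeros(hidden_size)              # H(t-1): saved hidden state, reused at the next step
for x in torch.randn(30, input_size):     # one input sample per time step
    h = torch.tanh(W_ih @ x + b_ih + W_hh @ h + b_hh)   # Ht
    y = W_ho @ h + b_ho                                  # Yout

The only state carried between time steps is h; the same weights are reused at every step, which is why a single hardware block driven repeatedly is sufficient.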

(1) PyTorch model: This model was modified from a Udemy course while keeping its copyright notice. There are many very good courses: some explain TensorFlow well, and many explain the basics of ML well, so I think it will be helpful to take them. The course below builds the model using PyTorch. Personally, I think PyTorch's language structure makes the HW structure easier to understand. In particular, what I liked about this course was that it emphasizes that ML is not a universal cheat code that can handle anything, and that it produces sufficiently predictable results by thoroughly combining mathematical operations. (All source code is modified from the original published in the tutorial lecture. * Teacher: Mike X Cohen, sincxpress.com, Course url: udemy.com/course/dudl/?couponCode=202112. The code modification follows the author's copyright policy.)

1) Input 

A synthetic input is used. What makes it easier than real data is that it can be verified in various ways later. In particular, a periodic wave can be synthesized so that it has the temporal autocorrelation characteristic that suits the RNN model. With PyTorch it can be generated in the following way, and you get a sample as shown in the figure below.

import torch
import numpy as np

N = 500                                    # number of samples
time = torch.linspace(0, 30*np.pi, N)      # time axis from 0 to 30*pi
data = torch.sin(time + torch.cos(time))   # sine wave with a slowly varying phase
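To check the generated waveform referred to above, the series can be plotted directly; a minimal matplotlib sketch, continuing from the snippet above:

import matplotlib.pyplot as plt
plt.plot(time, data, 'k.-')   # the synthetic periodic signal
plt.xlabel('time')
plt.ylabel('amplitude')
plt.show()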


 

2) RNN Model

# Network Parameter Specification

- input_size = 1     # "channels" of data
- num_hidden = 9     # breadth of model (number of units in the hidden layer)
- num_layers = 1     # depth of model (number of "stacks" of hidden layers)
- seqlength  = 30    # number of datapoints used for learning in each segment
- batchsize  = 1
If the network is drawn according to the above spec, the parameter sizes are as follows (the sketch below prints the corresponding PyTorch shapes):
- Wih = 1x9, bih = 1x9, Whh = 9x9, bhh = 1x9, Who = 1x9, bho = 1x1
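A minimal sketch of such a model in PyTorch (the class and variable names are my own, not necessarily those of the original course code); printing the parameter shapes reproduces the sizes listed above:

import torch
import torch.nn as nn

input_size, num_hidden, num_layers = 1, 9, 1

class RNNnet(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(input_size, num_hidden, num_layers)   # vanilla RNN, tanh activation by default
        self.out = nn.Linear(num_hidden, 1)                     # read-out layer: 9 hidden units -> 1 output
    def forward(self, x, h0=None):
        y, h = self.rnn(x, h0)          # y: (seqlength, batchsize, num_hidden)
        return self.out(y), h           # one output value per time step

net = RNNnet()
for name, p in net.named_parameters():
    print(name, tuple(p.shape))
# rnn.weight_ih_l0 (9, 1), rnn.bias_ih_l0 (9,), rnn.weight_hh_l0 (9, 9),
# rnn.bias_hh_l0 (9,), out.weight (1, 9), out.bias (1,)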

As with other DNNs, each connection between nodes carries one weight and one bias. The 31st data point is predicted only after all 30 inputs of a segment have passed through the network: the 30th point is predicted from inputs 0~29, the 31st from inputs 1~30, and so on. The y values produced for the first 29 positions can therefore be regarded as discarded, with proper predictions starting from the 30th. For the optimization, MSE was used as the loss function, and zero_grad() was called to reset the optimizer's gradients at every step. The accuracy of the model is over 99%. The figure below shows the convergence of the loss on the left and the predicted result on the right; the first 29 output values are discarded because they are not yet accurate.
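A minimal sketch of this training scheme, continuing from the snippets above (the SGD optimizer and learning rate are assumptions; the original course code may differ):

import torch
import torch.nn as nn

seqlength = 30
lossfun   = nn.MSELoss()                                  # MSE loss, as stated above
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)    # optimizer choice is an assumption

for i in range(N - seqlength):
    x = data[i:i+seqlength].view(seqlength, 1, 1)   # 30-sample window as (seq, batch, feature)
    t = data[i+seqlength].view(1)                   # the sample right after the window
    yhat, _ = net(x)                                # run the window through the RNN
    loss = lossfun(yhat[-1].flatten(), t)           # only the final output is the prediction
    optimizer.zero_grad()                           # clear accumulated gradients
    loss.backward()
    optimizer.step()

Only the output after the full 30-sample window is compared against the target, which mirrors the statement that the outputs for the earlier positions of each segment are discarded.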



