
Deep Learning

Very simple RNN example


 

1. Notation

 

I_{t} : input vector to RNN at time t

S_{t} : vector that represents RNN state at time t (also called hidden state)

O_{t} : output vector of RNN at time t

W_{1}, b_{1} : weight matrix and bias vector to be trained from data, which determine the next state vector of RNN given the current input and previous state vectors.

W_{2}, b_{2} : weight matrix and bias vector to be trained from data, which determine the current output of RNN given the current state

 

2. Explanation

 

Eqn. 1 is for the state update and Eqn. 2 is for the output. In the notation above:

Eqn. 1: S_{t} = sigmoid(W_{1} [I_{t}; S_{t-1}] + b_{1})

Eqn. 2: O_{t} = g(W_{2} S_{t} + b_{2})

where [I_{t}; S_{t-1}] denotes the concatenation of the current input and the previous state. The size of the hidden state S_{t} is a design choice.

For Eqn. 1, the sigmoid activation is used; the output activation g in Eqn. 2 depends on your application (e.g., softmax for classification, identity for regression).
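The two equations can be sketched in a few lines of NumPy. This is a minimal illustration of the state update and output computation, with hypothetical dimensions (input size 3, hidden state size 4, output size 2) and random untrained weights:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical dimensions: input size 3, hidden state size 4, output size 2
input_size, state_size, output_size = 3, 4, 2

rng = np.random.default_rng(0)
W1 = rng.standard_normal((state_size, input_size + state_size))  # state-update weights
b1 = np.zeros(state_size)
W2 = rng.standard_normal((output_size, state_size))              # output weights
b2 = np.zeros(output_size)

def rnn_step(I_t, S_prev):
    """One RNN step: Eqn. 1 (state update) then Eqn. 2 (output)."""
    concat = np.concatenate([I_t, S_prev])   # [I_t; S_{t-1}]
    S_t = sigmoid(W1 @ concat + b1)          # Eqn. 1
    O_t = W2 @ S_t + b2                      # Eqn. 2 (identity output activation here)
    return S_t, O_t

# Run the cell over a short input sequence, threading the state through time
S = np.zeros(state_size)
for t in range(5):
    I = rng.standard_normal(input_size)
    S, O = rnn_step(I, S)

print(S.shape, O.shape)  # (4,) (2,)
```

Note that the same W_{1}, b_{1}, W_{2}, b_{2} are reused at every time step; only the state S carries information across steps.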

 

3. Example

 

Toy examples can be found at https://medium.com/@erikhallstrm/hello-world-rnn-83cd7105b767



* A good question and answer from Stack Overflow: https://stackoverflow.com/questions/40384791/for-the-tf-nn-rnn-cell-basicrnn-whats-the-difference-between-the-state-and-outp


Q) For tf.nn.rnn_cell.BasicRNNCell, what's the difference between the state and the output?

As I understand it, state = tanh(w * input + u * pre_state + b) and output = state * w_out. But for tf.nn.rnn_cell.BasicRNNCell, I only set num_units (I think it's the dimension of the state), and the API page says: "Most basic RNN: output = new_state = activation(W * input + U * state + B)". So can I assume that in this cell state = output, and that the cell only has w, u, b but no w_out?


A) The "vanilla" RNN that you describe computes the new hidden state, and then uses some output projection to compute the output. In TensorFlow, the "compute new hidden state" and "compute output projection" parts are separated. BasicRNNCell just outputs the hidden state as its output; another class, OutputProjectionWrapper, can then apply a projection to it (multiplying by w_out is just applying a projection). To get the behavior you want, you need to do:


tf.nn.rnn_cell.OutputProjectionWrapper(tf.nn.rnn_cell.BasicRNNCell(...), num_output_units)

This also allows you to have a different number of neurons in your hidden state and in your output projection.
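The separation the answer describes can be sketched in plain NumPy (a hypothetical re-implementation for illustration, not the actual TensorFlow code): the basic cell returns its new state as its output, and the projection by w_out is a separate step applied afterwards.

```python
import numpy as np

def basic_rnn_cell(x, state, W, U, b):
    """Vanilla cell, as in the TF docs: output = new_state = tanh(W*x + U*state + b)."""
    new_state = np.tanh(W @ x + U @ state + b)
    return new_state, new_state   # output and state are the same array

def output_projection(output, w_out):
    """OutputProjectionWrapper equivalent: apply w_out to the cell output."""
    return w_out @ output

# Hypothetical sizes: 3 inputs, 5 hidden units, 2 projected outputs
rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 3, 5, 2
W = rng.standard_normal((n_hidden, n_in))
U = rng.standard_normal((n_hidden, n_hidden))
b = np.zeros(n_hidden)
w_out = rng.standard_normal((n_out, n_hidden))

x = rng.standard_normal(n_in)
state = np.zeros(n_hidden)

out, new_state = basic_rnn_cell(x, state, W, U, b)
assert out is new_state                 # BasicRNNCell: state == output
proj = output_projection(out, w_out)    # separate projection step

print(out.shape, proj.shape)  # (5,) (2,)
```

This makes the answer concrete: the cell alone never sees w_out, so its output dimension equals the hidden-state dimension; only the wrapper changes the output size.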