1. Notation
I_{t} : input vector to the RNN at time t
S_{t} : vector representing the RNN state at time t (also called the hidden state)
O_{t} : output vector of the RNN at time t
W_{1}, b_{1} : weight matrix and bias vector, trained from data, that determine the next state vector given the current input and the previous state vector.
W_{2}, b_{2} : weight matrix and bias vector, trained from data, that determine the current output given the current state.
2. Explanation
Eqn. 1 is the state update and Eqn. 2 is the output. The size of the hidden state is a design choice (a hyperparameter you pick).
The activation functions were omitted from Eqn. 1 and Eqn. 2 above: for Eqn. 1 the sigmoid function is used, and for Eqn. 2 the choice depends on your application.
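Written out in full with the notation from Section 1 (a reconstruction of the two equations; [I_{t}; S_{t-1}] denotes concatenating the input with the previous state, and f stands for the application-dependent output activation):

Eqn. 1 (state update): S_{t} = sigmoid(W_{1}[I_{t}; S_{t-1}] + b_{1})
Eqn. 2 (output): O_{t} = f(W_{2}S_{t} + b_{2})

The concatenation form is equivalent to keeping separate matrices W and U for the input and the previous state, which is how the Stack Overflow answer quoted below writes it.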
3. Example
Toy examples can be found at https://medium.com/@erikhallstrm/hello-world-rnn-83cd7105b767
* A good question and answer from Stack Overflow: https://stackoverflow.com/questions/40384791/for-the-tf-nn-rnn-cell-basicrnn-whats-the-difference-between-the-state-and-outp
Q) For tf.nn.rnn_cell.BasicRNNCell, what is the difference between the state and the output?
As I understand it, state = tanh(w * input + u * pre_state + b) and output = state * w_out. But tf.nn.rnn_cell.BasicRNNCell only takes num_units (which I think is the dimension of the state), and the API page says "Most basic RNN: output = new_state = activation(W * input + U * state + B)". So can I assume that in this cell state = output, and that it has only w, u, and b, with no w_out?
A) What "vanilla" RNN that you describe does is it computes the new hidden state, and then uses some output projection to compute the output. In tensorflow they separated that "compute new hidden state" and "compute output projection" parts. The BasicRNN
just outputs the hidden state as its output, another class called OutputProjectionWrapper
can then apply a projection to it (and multiplying by w_out
is just applying a projection). To get the behavior you want, you need to do:
tf.nn.rnn_cell.OutputProjectionWrapper(tf.nn.rnn_cell.BasicRNNCell(...), num_output_units)
It also allows you to have different number of neurons in your hidden state and in your output projection.
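To make that separation concrete, here is a minimal NumPy sketch of the same split (an illustration, not TensorFlow's actual implementation; the sizes and variable names are hypothetical, and tanh is used as in BasicRNNCell's default activation):

```python
import numpy as np

def rnn_step(I_t, S_prev, W1, b1):
    # Eqn. 1: the cell's output IS the new hidden state,
    # exactly as in BasicRNNCell (output = new_state = activation(...))
    return np.tanh(np.concatenate([I_t, S_prev]) @ W1 + b1)

def output_projection(S_t, W2, b2):
    # Eqn. 2: a separate linear projection of the state,
    # the role OutputProjectionWrapper plays
    return S_t @ W2 + b2

# Hypothetical sizes for illustration
input_dim, state_dim, output_dim = 3, 5, 2
rng = np.random.default_rng(0)
W1 = rng.normal(size=(input_dim + state_dim, state_dim))
b1 = np.zeros(state_dim)
W2 = rng.normal(size=(state_dim, output_dim))
b2 = np.zeros(output_dim)

S = np.zeros(state_dim)              # initial hidden state
for t in range(4):                   # unroll over a toy sequence
    I_t = rng.normal(size=input_dim)
    S = rnn_step(I_t, S, W1, b1)     # state and cell output coincide
    O_t = output_projection(S, W2, b2)
    print(t, O_t)
```

Note that the cell's output and its state are the same vector, which is why the questioner's "state = output" reading of BasicRNNCell is correct; w_out only appears once you add the projection step.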