It bears repeating: recurrent neural networks are designed to interpret temporal or sequential information. These networks use other data points in a sequence to make better predictions: they take in input and reuse the activations of earlier steps (or, in bidirectional variants, later steps) in the sequence to influence the output.

- h(t) updates the cell state based on the input x_t and the old state h_{t-1}.
- States and input are combined through weight matrices, and the output vector is obtained by applying a non-linearity to the updated hidden state: h_t = tanh(W_hh h_{t-1} + W_xh x_t).
- The model should handle variable-length sequences and track long-term dependencies.
- Data is “sequential”.
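The update rule above can be sketched in a few lines of plain Python. This is a minimal toy, not a real framework: the weight shapes, random initialization, and the tanh non-linearity are illustrative assumptions, but the structure matches the recurrence described (new state from old state plus current input).

```python
import math
import random

random.seed(0)

def rnn_cell(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrence step: h_t = tanh(W_hh @ h_prev + W_xh @ x_t + b_h)."""
    hidden = len(h_prev)
    h_t = []
    for i in range(hidden):
        s = b_h[i]
        s += sum(W_hh[i][j] * h_prev[j] for j in range(hidden))
        s += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        h_t.append(math.tanh(s))
    return h_t

# Tiny example: 3-dim input, 2-dim hidden state.
W_xh = [[random.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(2)]
W_hh = [[random.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(2)]
b_h = [0.0, 0.0]

h = [0.0, 0.0]                      # initial state
for x in [[1, 0, 0], [0, 1, 0]]:    # a length-2 "sequence"
    h = rnn_cell(x, h, W_xh, W_hh, b_h)
print(len(h))
```

Note that the hidden state keeps the same dimension no matter how long the input sequence is, which is exactly what lets the model handle variable-length sequences.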

In this example, the prediction at one time step is carried forward as the hidden state into the next time step, and the next word in the sentence is what the model predicts and generates.
- Every prediction is scored with a loss by comparing the predicted output to the labeled data.
- Parameters are shared across the sequence: the same set of weights is applied at every time step.
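The two bullets above (a per-step loss, and one shared set of weights reused at every step) can be sketched as follows. Scalar weights and a squared-error loss are simplifying assumptions made purely for illustration; the point is that the same `W` appears in every iteration of the loop and the losses are summed over the sequence.

```python
import math

def step(x_t, h_prev, W):
    """Shared parameters W are reused at every time step (weight sharing)."""
    return math.tanh(W["hh"] * h_prev + W["xh"] * x_t)

W = {"hh": 0.5, "xh": 1.0, "hy": 2.0}   # one scalar per weight, for illustration
xs = [0.2, -0.1, 0.4]                   # input sequence
ys = [0.3, 0.1, 0.5]                    # labels at each step

h, total_loss = 0.0, 0.0
for x_t, y_t in zip(xs, ys):
    h = step(x_t, h, W)                 # same W at every step (shared parameters)
    y_hat = W["hy"] * h                 # per-step prediction
    total_loss += (y_hat - y_t) ** 2    # per-step loss, summed over the sequence
print(total_loss)
```

Training would then backpropagate `total_loss` through every time step into the single shared `W` (backpropagation through time).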

Sequence modeling example
Predict the last word of a sentence: “This morning I took my cat for a walk”
Representing the information:
- A neural network is a set of mathematical operations; it has no understanding of what a word or a sentence is.
- We need to translate the data into an array of numerical inputs that can be fed in, and the network generates an array of numbers as output.
Embedding
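A minimal sketch of what an embedding step looks like, using the example sentence above. The vocabulary indices, the 4-dimensional embedding size, and the random placeholder vectors are all illustrative assumptions; in practice the embedding table's entries are learned during training.

```python
import random

random.seed(0)

# Map each word to an index, then to a vector the network can consume.
vocab = {"this": 0, "morning": 1, "i": 2, "took": 3,
         "my": 4, "cat": 5, "for": 6, "a": 7, "walk": 8}

def one_hot(index, size):
    """Sparse encoding: a vector of zeros with a 1 at the word's index."""
    v = [0.0] * size
    v[index] = 1.0
    return v

# A learned embedding is just a lookup table of dense vectors
# (random placeholders here, stand-ins for trained values).
embedding_table = [[random.uniform(-1, 1) for _ in range(4)] for _ in vocab]

sentence = "this morning i took my cat for a".split()
indices = [vocab[w] for w in sentence]
dense = [embedding_table[i] for i in indices]   # what actually feeds the RNN
print(len(dense), len(dense[0]))  # → 8 4
```

The one-hot version is sparse and grows with the vocabulary; the dense lookup keeps the input dimension fixed and lets similar words end up with similar vectors once trained.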