# LSTM Architecture

*Ashit Debdas · Sept 20, 2020 · 2 min read*

A standard LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals and the three gates regulate the flow of information into and out of the cell.

## Architecture of LSTM

1. Forget Gate
2. Input Gate
3. Output Gate

## 1. Forget Gate

The first step in our LSTM is to decide what information we're going to throw away from the cell state. The forget gate removes information that is no longer required, or that is of little importance, by multiplying the cell state by a filter. This is needed to keep the cell state useful and to optimize the performance of the LSTM network.

The gate takes two inputs: h_t-1, the output of the previous cell, and x_t, the input at the current time step. These inputs are multiplied by weight matrices, a bias is added, and the sigmoid function is applied to the result. The sigmoid outputs a vector of values between 0 and 1, one per element of the cell state.
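The forget gate described above can be sketched in NumPy. The weight names (`W_f`, `b_f`) and the layer sizes here are assumptions chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: hidden state of 4 units, input of 3 features.
hidden, inputs = 4, 3
rng = np.random.default_rng(0)

# W_f and b_f are the forget gate's weight matrix and bias (names assumed).
W_f = rng.standard_normal((hidden, hidden + inputs))
b_f = np.zeros(hidden)

h_prev = rng.standard_normal(hidden)   # h_t-1: previous cell's output
x_t = rng.standard_normal(inputs)      # x_t: input at this time step

# Concatenate the two inputs, apply the affine map, then the sigmoid.
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
```

Each entry of `f_t` lies strictly between 0 and 1: values near 0 mean "forget this part of the cell state", values near 1 mean "keep it".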

## 2. Input Gate

The next step is to decide what new information we're going to store in the cell state. This happens in three steps:

1. Regulating which values should be added to the cell state using a sigmoid function. This is very similar to the forget gate and acts as a filter on the information from h_t-1 and x_t.

2. Creating a vector of candidate values that could be added to the cell state (as perceived from h_t-1 and x_t). This is done with the tanh function, which outputs values from -1 to +1.

3. Multiplying the regulatory filter (the sigmoid output) by the candidate vector (the tanh output), and then adding this useful information to the cell state.
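The three steps above can be sketched in NumPy. The weight names (`W_i`, `b_i` for the sigmoid filter; `W_c`, `b_c` for the candidates) and sizes are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden, inputs = 4, 3
rng = np.random.default_rng(1)

# Assumed parameter names: W_i/b_i for the input gate, W_c/b_c for candidates.
W_i = rng.standard_normal((hidden, hidden + inputs))
b_i = np.zeros(hidden)
W_c = rng.standard_normal((hidden, hidden + inputs))
b_c = np.zeros(hidden)

h_prev = rng.standard_normal(hidden)
x_t = rng.standard_normal(inputs)
z = np.concatenate([h_prev, x_t])

i_t = sigmoid(W_i @ z + b_i)       # step 1: regulatory filter, values in (0, 1)
c_tilde = np.tanh(W_c @ z + b_c)   # step 2: candidate values, in (-1, 1)
update = i_t * c_tilde             # step 3: filtered contribution to the cell state
```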

## Updating the Old Cell State

It's now time to update the old cell state, C_t-1, into the new cell state C_t. The previous steps already decided what to do; now we just need to actually do it.

We multiply the old state by f_t, forgetting the things we decided to forget earlier. Then we add i_t * C~_t, the new candidate values scaled by how much we decided to update each state value.

In the case of the language model, this is where we’d actually drop the information about the old subject’s gender and add the new information, as we decided in the previous steps.
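The update itself is a single elementwise expression. A minimal sketch, assuming the gate outputs have already been computed as in the earlier steps (the values here are stand-ins):

```python
import numpy as np

hidden = 4
rng = np.random.default_rng(2)

c_prev = rng.standard_normal(hidden)            # C_t-1: old cell state
f_t = rng.uniform(0, 1, hidden)                 # forget gate output, in (0, 1)
i_t = rng.uniform(0, 1, hidden)                 # input gate output, in (0, 1)
c_tilde = np.tanh(rng.standard_normal(hidden))  # candidate values, in (-1, 1)

# Forget part of the old state, then add the scaled candidates.
c_t = f_t * c_prev + i_t * c_tilde
```

Because `f_t` and `i_t` are between 0 and 1, each element of the new state is a blend of what was kept from the old state and what was admitted from the candidates.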

## 3. Output Gate