**Pre-requisite to understand Neural Machine Translation**

1.) Artificial Neural Network Basics

2.) RNN - LSTM or RNN - GRU

3.) Encoder Decoder

**Problems with Encoder Decoder**

1.) We need to compress the source input sentence into a fixed-length vector. This makes long sentences difficult to handle and leads to loss of information (in other words, loss of context), even when we use Bi-directional RNN-LSTM cells.

2.) If the test sentences are longer than those in the training corpus, performance degrades as the length of the input sentence increases.

3.) The context of all Encoder RNN cells is consolidated, and only the output of the Encoder's last cell is fed to the Decoder's first RNN cell. The model is therefore not clearly able to understand which words are the most significant for predicting the target word.

**To overcome this issue,** Attention comes into the picture: we try to find out which input words of a given sentence are responsible for generating each target output word.

**Idea to solve this issue:** In 2014, researchers came up with the idea of adding a **neural network between the Encoder and Decoder** which is responsible for finding out the most significant (focused) words of the input sentence for predicting each target output word. This is the concept of **Attention** (or Self-Attention).

**Generalized Architecture**

**Generalized representation of how Attention has played a role in Hindi-to-English translation:**

**Deep Dive into the Encoder-Decoder with Attention Neural Network**

In this figure, we can see that there are 4 inputs {X1, X2, X3, X4} fed to Bi-Directional LSTM cells, which are responsible for generating 4 outputs {O1, O2, O3, O4}.
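As a rough illustration (not the authors' code), the bi-directional pass can be sketched in plain Python. A single `tanh` step with made-up scalar weights stands in for a full LSTM cell; each output O_j pairs the forward and backward hidden states at position j, so it carries context from both directions:

```python
import math

def step(h, x, w_h=0.5, w_x=0.5):
    # one simplified recurrent step: new hidden state from old state and input
    # (w_h, w_x are illustrative placeholders, not trained values)
    return math.tanh(w_h * h + w_x * x)

def bidirectional(xs):
    fwd, h = [], 0.0
    for x in xs:                      # left-to-right pass
        h = step(h, x)
        fwd.append(h)
    bwd, h = [], 0.0
    for x in reversed(xs):            # right-to-left pass
        h = step(h, x)
        bwd.append(h)
    bwd.reverse()
    # O_j concatenates the forward and backward states at position j
    return [(f, b) for f, b in zip(fwd, bwd)]

outputs = bidirectional([0.1, 0.4, -0.2, 0.3])  # toy scalar inputs X1..X4
print(len(outputs))  # 4 outputs O1..O4
```

Each position now "sees" words both before and after it, which is exactly why the paper uses a bi-directional encoder.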

The state **s** of the *i*-th Decoder RNN cell is computed as:

$$s_i = f(s_{i-1}, y_{i-1}, c_i)$$

where,

**s_{i-1}** is the context (hidden state) of the previous Decoder RNN cell

**y_{i-1}** is the output of the previous Decoder RNN cell

**c_i** is the context vector, which is produced by the feedforward neural network
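A minimal sketch of this decoder update, with a single `tanh` unit standing in for the full RNN cell and illustrative (untrained) weights:

```python
import math

def decoder_step(s_prev, y_prev, c_i, w_s=0.4, w_y=0.3, w_c=0.3):
    # new decoder state s_i combines the previous state s_{i-1},
    # the previous output y_{i-1}, and the context vector c_i
    # (w_s, w_y, w_c are illustrative placeholders, not trained values)
    return math.tanh(w_s * s_prev + w_y * y_prev + w_c * c_i)

s1 = decoder_step(s_prev=0.0, y_prev=0.2, c_i=0.5)
print(s1)
```

Note that because c_i changes at every decoding step, each target word is produced from its own tailored view of the source sentence.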

The outputs generated by the Encoder cells are the input to the intermediate neural network, whose trainable parameters are the alpha values, denoted { α }. The context vector, denoted { C }, is computed as a weighted sum of the outputs of the Encoder cells with the trainable α values. This equation looks like a basic neural network equation:

$$C_i = \sum_{j} \alpha_{ij} \, O_j$$

where,

**C** is the context vector

**α** are the trainable weights

**O** are the outputs of the Encoder's Bi-Directional RNN-LSTM cells
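The weighted sum is simple to write down directly. In this sketch the α values and encoder outputs are made-up scalar numbers; in the real model the α values come from the trained alignment network:

```python
def context_vector(alphas, outputs):
    # each encoder output O_j is scaled by its attention weight alpha_j, then summed
    return sum(a * o for a, o in zip(alphas, outputs))

alphas = [0.1, 0.6, 0.2, 0.1]        # toy attention weights for O1..O4 (sum to 1)
outputs = [0.5, -0.3, 0.8, 0.1]      # toy scalar encoder outputs O1..O4
C = context_vector(alphas, outputs)
print(C)
```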

**Concept to be Cleared :**

In the figure 3. We can see that for getting context vector C2.The output of X2 denoted as O2 and its alpha weights gets nullified because Output O2 is not playing any significant role only the ouput {O1,O2,O3} has attention.Like this, same as happened with context vector C3 where only {O1 and O2} is responsible or have a focus for producing Context Vector C3

To compute the value of **α**, take the exponent of each e value and divide it by the sum of the exponents of all the e values (a softmax):

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k} \exp(e_{ik})}$$
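This softmax over the e scores can be sketched in a few lines; the scores below are toy values:

```python
import math

def softmax(scores):
    # exp of each score divided by the sum of all exps -> weights that sum to 1
    exps = [math.exp(e) for e in scores]
    total = sum(exps)
    return [v / total for v in exps]

e_scores = [1.0, 2.0, 0.5, -1.0]   # toy alignment scores for encoder positions 1..4
alphas = softmax(e_scores)
print(alphas)
```

The softmax guarantees that the α values are positive and sum to 1, so they act as a probability distribution over the source words: the position with the highest e score gets the most attention.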

The value of **e** is computed by the Attention function (the alignment model), which is responsible for finding out the most significant words for predicting the next target word:

$$e_{ij} = a(s_{i-1}, h_j)$$

where,

**s_{i-1}** is the output of the previous Decoder RNN cell

**h_j** is the output of the *j*-th Encoder RNN cell
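A hedged sketch of this alignment function as a tiny feedforward network, in the additive style of the paper (a `tanh` layer followed by a scalar projection); the weights `w_s`, `w_h`, `v` are illustrative placeholders, whereas in the paper they are learned jointly with the rest of the model:

```python
import math

def alignment_score(s_prev, h_j, w_s=0.7, w_h=0.9, v=1.2):
    # e_ij = v * tanh(w_s * s_{i-1} + w_h * h_j): how well encoder position j
    # matches the decoder's current state (weights are illustrative, not trained)
    return v * math.tanh(w_s * s_prev + w_h * h_j)

s_prev = 0.3                          # previous decoder state s_{i-1}
encoder_states = [0.5, -0.2, 0.8]     # toy encoder outputs h_1..h_3
e_scores = [alignment_score(s_prev, h) for h in encoder_states]
print(e_scores)
```

Feeding these e scores through the softmax above-described gives one α per source position, closing the loop from encoder outputs to the context vector.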

**Concept to be Cleared:** The researchers parameterized the alignment model as a feedforward neural network which is jointly trained with all the other components of the proposed system. This allows the gradient of the cost function to be backpropagated through it, so the alignment model (the network that computes the value of α) and the whole translation model are trained jointly; the approach is therefore termed **Jointly Learning to Align and Translate**.

### Thank You for reading !

Feel Free to add your Feedback and Connect with me on LinkedIn

https://www.linkedin.com/in/sonihimanshu1/

**Happy Deep Learning !**

**References :**

Neural Machine Translation by Jointly Learning to Align and Translate

https://arxiv.org/abs/1409.0473

**Special Thanks to**

Sudhanshu Kumar Sir.

Chief AI Engineer and CEO of iNeuron.ai