Training a Neural Network with Backpropagation — Mathematics (Part II)

#ann #backpropagation

Harshita Pandey Jan 07 2022 · 3 min read

The backpropagation algorithm has two main phases: a forward phase and a backward phase. Figure 1 shows the structure of a simple three-layer neural network. Every neuron in one layer is connected to every neuron in the next layer, but neurons within the same layer are not connected to each other. Information flows from the first layer (input layer) through the second layer (hidden layer) to the third layer (output layer).

Figure 1. Artificial Neural Network.

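Consider the following inputs, target outputs, initial weights and biases (w1 to w4 and b1, b2 belong to the hidden layer; w5 to w8 and b3, b4 belong to the output layer):

Inputs: x1 = 0.05, x2 = 0.10
Target outputs: y1 = 0.01, y2 = 0.99
Initial weights: w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30, w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55
Initial biases: b1 = 0.40, b2 = 0.35, b3 = 0.25, b4 = 0.60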

Forward Pass

The input layer receives the signals and passes them to the hidden layer without performing any computation. The net input to each hidden neuron is the weighted sum of the input-layer outputs plus a bias (weights are typically initialized as small random numbers). A sigmoid activation function is then applied; it introduces the non-linearity the network needs to learn complex patterns and squashes each neuron's output into the range between 0 and 1. In each successive layer, every neuron sums its weighted inputs and applies an activation function to compute its output. The output layer then produces the final response, i.e., the predicted value.
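The forward pass described above can be sketched in NumPy. This is a minimal sketch rather than the article's own code: it assumes the inputs, initial weights and biases that the article later passes to Keras, and the variable names (`W_hidden`, `b_hidden`, `W_output`, `b_output`) are illustrative. Each weight matrix follows the Keras convention of one row per incoming neuron and one column per neuron in the layer.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Values taken from the Keras section of this article
x = np.array([0.05, 0.10])            # inputs x1, x2
W_hidden = np.array([[0.15, 0.20],    # w1, w2 (leaving x1)
                     [0.25, 0.30]])   # w3, w4 (leaving x2)
b_hidden = np.array([0.40, 0.35])     # b1, b2
W_output = np.array([[0.40, 0.45],    # w5, w6 (leaving h1)
                     [0.50, 0.55]])   # w7, w8 (leaving h2)
b_output = np.array([0.25, 0.60])     # b3, b4

# Hidden layer: weighted sum plus bias, then sigmoid
net_h = x @ W_hidden + b_hidden
out_h = sigmoid(net_h)

# Output layer: same two steps applied to the hidden outputs
net_o = out_h @ W_output + b_output
out_o = sigmoid(net_o)

print("hidden net input:", net_h)
print("hidden output:   ", out_h)
print("predicted output:", out_o)
```

With these values the hidden net inputs are 0.4325 and 0.39, and the predicted outputs come out at roughly 0.688 and 0.769, far from the targets 0.01 and 0.99.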

Total Error Calculation

Next, we calculate the total error using the mean squared error loss function. The loss function measures how far the model's predictions are from the expected outputs.

Loss function: mean squared error (MSE), the average of the squared differences between the target and predicted outputs.
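For reference, the mean squared error over the n = 2 output neurons can be written as follows. The symbols are notation chosen here, not taken from the original figures: y_i is the target and \hat{y}_i the predicted output of neuron i.

```latex
E = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
  = \frac{1}{2}\left[(y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2\right]
```

With two outputs this coincides with the commonly used half sum of squared errors. For the inputs and initial weights used in this article it evaluates to approximately 0.2543.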

In the backward phase, the derivative of this loss is computed with respect to the weights and biases of every layer.

First Backward Pass

The main goal of the backward phase is to compute the gradient of the loss function with respect to each weight and bias, using the chain rule of differential calculus. These gradients are then used to update the weights and biases. Because the gradients are computed in the backward direction, starting from the output nodes, this process is referred to as backward propagation.
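As an illustration of the chain rule, the gradient for an output-layer weight such as w5 (from hidden neuron h1 to output neuron o1) can be factored as below. The names net and out for a neuron's input sum and activation, and the learning rate \eta, are notation assumed here; the loss is taken as the mean squared error over the two outputs.

```latex
\frac{\partial E}{\partial w_5}
  = \frac{\partial E}{\partial \mathrm{out}_{o1}}
    \cdot \frac{\partial \mathrm{out}_{o1}}{\partial \mathrm{net}_{o1}}
    \cdot \frac{\partial \mathrm{net}_{o1}}{\partial w_5}
  = (\mathrm{out}_{o1} - y_1)\;\mathrm{out}_{o1}\,(1 - \mathrm{out}_{o1})\;\mathrm{out}_{h1}

w_5^{\text{new}} = w_5 - \eta\,\frac{\partial E}{\partial w_5}
```

The middle factor is the derivative of the sigmoid, and the last factor is simply the activation that w5 multiplies in the forward pass.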

The bias input is a constant (usually 1) that has its own weight for each node. In each layer, the bias weight is updated in the same way as all other weights.
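A minimal NumPy sketch of the output-layer update, biases included, is given here. It assumes the initial values from the Keras section and a learning rate of 0.01 (the default of Keras's SGD optimizer, and consistent with the updated output-layer biases b3 and b4 reported in this article). The variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10])
y = np.array([0.01, 0.99])
W_hidden = np.array([[0.15, 0.20], [0.25, 0.30]])   # w1..w4
b_hidden = np.array([0.40, 0.35])                   # b1, b2
W_output = np.array([[0.40, 0.45], [0.50, 0.55]])   # w5..w8
b_output = np.array([0.25, 0.60])                   # b3, b4
lr = 0.01  # assumed learning rate (Keras SGD default)

# Forward pass
out_h = sigmoid(x @ W_hidden + b_hidden)
out_o = sigmoid(out_h @ W_output + b_output)

# Error signal of each output neuron: dE/dnet = (out - y) * sigmoid'(net)
delta_o = (out_o - y) * out_o * (1.0 - out_o)

# A weight's gradient is its incoming activation times delta;
# the bias input is the constant 1, so the bias gradient is delta itself
grad_W_output = np.outer(out_h, delta_o)
grad_b_output = delta_o

W_output_new = W_output - lr * grad_W_output
b_output_new = b_output - lr * grad_b_output

print("updated w5..w8:\n", W_output_new)
print("updated b3, b4:", b_output_new)
```

Treating the bias as a weight on a constant input of 1 is what makes its update rule identical to that of the other weights.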

Next, we continue the backward pass to update w1, w2, w3, w4 and b1, b2. Because each hidden neuron feeds both output neurons, the gradient with respect to these weights and biases depends on the output-layer weights w5 to w8. We use their old values here, not the updated ones.
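The dependence on the old output-layer weights can be made concrete with a short NumPy sketch. Under the same assumptions as before (initial values from the Keras section, a learning rate of 0.01, illustrative variable names), each hidden neuron collects the error signals of both output neurons through the weights that leave it, before any of those weights are changed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10])
y = np.array([0.01, 0.99])
W_hidden = np.array([[0.15, 0.20], [0.25, 0.30]])   # w1..w4
b_hidden = np.array([0.40, 0.35])                   # b1, b2
W_output = np.array([[0.40, 0.45], [0.50, 0.55]])   # w5..w8 (old values)
b_output = np.array([0.25, 0.60])                   # b3, b4
lr = 0.01  # assumed learning rate

# Forward pass
out_h = sigmoid(x @ W_hidden + b_hidden)
out_o = sigmoid(out_h @ W_output + b_output)

# Output-layer error signals
delta_o = (out_o - y) * out_o * (1.0 - out_o)

# Hidden-layer error signals: back-propagate delta_o through the OLD
# output weights, then multiply by the sigmoid derivative of each hidden neuron
delta_h = (W_output @ delta_o) * out_h * (1.0 - out_h)

W_hidden_new = W_hidden - lr * np.outer(x, delta_h)
b_hidden_new = b_hidden - lr * delta_h

print("updated w1..w4:\n", W_hidden_new)
print("updated b1, b2:", b_hidden_new)
```

Row j of `W_output` holds the weights leaving hidden neuron j, so `W_output @ delta_o` sums the contributions of both output neurons for each hidden neuron.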

Updated Weights and Biases

Weights                   Biases
w1(new) = 0.14999517      b1(new) = 0.39990333
w2(new) = 0.19999385      b2(new) = 0.34987693
w3(new) = 0.24999033      b3(new) = 0.248544627
w4(new) = 0.29998769      b4(new) = 0.60039353434
w5(new) = 0.399117359
w6(new) = 0.450238667
w7(new) = 0.49913219
w8(new) = 0.550234657

Forward Pass with Updated Weights and Bias

Error Calculation

After the first round of backpropagation, the total error decreases from approximately 0.2543 to approximately 0.2539.
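To check the drop in total error, one gradient-descent step can be simulated end to end. The sketch below again assumes a learning rate of 0.01 and the initial values from the Keras section; the helper names `forward` and `mse` are chosen here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_h, b_h, W_o, b_o):
    """Return hidden and output activations for one input vector."""
    out_h = sigmoid(x @ W_h + b_h)
    out_o = sigmoid(out_h @ W_o + b_o)
    return out_h, out_o

def mse(y, out_o):
    """Mean squared error over the output neurons."""
    return np.mean((y - out_o) ** 2)

x = np.array([0.05, 0.10])
y = np.array([0.01, 0.99])
W_h = np.array([[0.15, 0.20], [0.25, 0.30]])   # w1..w4
b_h = np.array([0.40, 0.35])                   # b1, b2
W_o = np.array([[0.40, 0.45], [0.50, 0.55]])   # w5..w8
b_o = np.array([0.25, 0.60])                   # b3, b4
lr = 0.01  # assumed learning rate

# Forward pass and error with the initial parameters
out_h, out_o = forward(x, W_h, b_h, W_o, b_o)
error_before = mse(y, out_o)

# Backward pass: both error signals are computed before any weight changes
delta_o = (out_o - y) * out_o * (1.0 - out_o)
delta_h = (W_o @ delta_o) * out_h * (1.0 - out_h)

# One gradient-descent update of every weight and bias
W_o_new = W_o - lr * np.outer(out_h, delta_o)
b_o_new = b_o - lr * delta_o
W_h_new = W_h - lr * np.outer(x, delta_h)
b_h_new = b_h - lr * delta_h

# Forward pass and error with the updated parameters
_, out_o_new = forward(x, W_h_new, b_h_new, W_o_new, b_o_new)
error_after = mse(y, out_o_new)

print(f"error before update: {error_before:.4f}")
print(f"error after update:  {error_after:.4f}")
```

The decrease is small because a single step with a learning rate of 0.01 moves each weight only slightly; repeated passes are needed to bring the outputs close to the targets.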

The calculated error value was then validated by building and training an artificial neural network in Keras.

Building an Artificial Neural Network

Import Libraries

import pandas as pd
import numpy as np 

Create the Dataframe

Input values: x1 = 0.05, x2 = 0.10 and Output values: y1 = 0.01, y2 = 0.99

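df = pd.DataFrame([[0.05, 0.1, 0.01, 0.99]], columns=['x1', 'x2', 'y1', 'y2'])

  x1    x2    y1    y2
0.05  0.10  0.01  0.99

# Columns y1 and y2 are the targets; .values returns them as a NumPy array
target = df.iloc[:, 2:].values
print('Actual output:', target)

# Columns x1 and x2 are the inputs
inputs = df.iloc[:, :2].values
print('Input values:', inputs)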

Actual output: [[0.01 0.99]]

Input values: [[0.05 0.1 ]]

The network

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

Weights and bias initialization

Initial weights: w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30, w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55
Initial biases: b1 = 0.40, b2 = 0.35, b3 = 0.25, b4 = 0.60

model = Sequential()
# Hidden layer: kernel rows are the inputs (x1, x2), columns are the hidden neurons
model.add(Dense(units=2, activation='sigmoid', use_bias=True, bias_initializer="ones",
                weights=[np.array([[0.15, 0.20], [0.25, 0.30]]), np.array([0.40, 0.35])]))
# Output layer: kernel rows are the hidden neurons, columns are the output neurons
model.add(Dense(units=2, activation='sigmoid', use_bias=True, bias_initializer="ones",
                weights=[np.array([[0.40, 0.45], [0.50, 0.55]]), np.array([0.25, 0.60])]))

model.compile(optimizer='SGD', loss='mean_squared_error')

Fit the model

classifier = model.fit(inputs, target, epochs=10)

Summary of the network

model.summary()

Save the model

model.save("model.h5")

Updated Weights (w1, w2, w3 & w4) and Biases (b1 & b2)

Updated Weights (w5, w6, w7 & w8) and Biases (b3 & b4)

Hence, the correctness of the performed manual calculation is validated.


