Classification Evaluation Metrics Part-1

##datascience ##machinelearning ##classification ##metrics ##python

Priyabrata Panda Oct 06 2020 · 3 min read

Introduction

Most machine learning problems fall into two broad types: supervised and unsupervised learning. Within supervised learning we mostly encounter two kinds of problems, regression and classification. In regression the dependent variable is continuous, whereas in classification the dependent variable is categorical (a class). Classification itself comes in several flavours, such as binary classification, multiclass classification and multilabel classification. When we train a classifier on a particular dataset, we need an evaluation metric to judge how well the model performs on that dataset. Since there are many evaluation metrics, it can be difficult to choose the right one. In this series of blogs I am going to cover each evaluation metric in detail.

What are we going to learn?

  • Confusion Matrix
  • Accuracy Score
  • Precision Score
  • Recall Score
Confusion Matrix

The confusion matrix is the most popular, simple yet effective evaluation tool for classification problems. To understand it, let's consider a simple binary classification problem.

    y_true=[1,0,1,0,0,0,1,1,0,0,1,1,0,1,0,1,1,0,1,1] #Our actual value
    y_pred=[1,0,0,1,0,0,1,1,0,0,1,1,0,0,1,1,1,1,0,0] #Predicted value

To get the confusion matrix we will use the confusion_matrix function from the sklearn.metrics module.

    from sklearn.metrics import confusion_matrix
    print(confusion_matrix(y_true,y_pred))
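Counting the matching and mismatching pairs in the two lists above by hand, the snippet should print a 2x2 matrix like this (rows are the actual 0/1 labels, columns are the predicted 0/1 labels):

    [[6 3]
     [4 7]]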
    Not able to understand the result !! :(

    Let's understand the matrix bit by bit

    Here we go

The columns of the confusion matrix represent the predicted values and the rows represent the actual values. The top-left cell holds the True Negatives (T.N.), i.e. the instances our model correctly classified as the negative class. The bottom-right cell holds the True Positives (T.P.), i.e. the instances correctly classified as positive.

The top-right cell holds the False Positives (F.P.): instances that belong to the negative class but that our model predicted as positive. This is also known as a Type-I error.

The bottom-left cell holds the False Negatives (F.N.): instances that are actually positive but that our model predicted as negative. This is also known as a Type-II error.
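For a binary problem, sklearn lets you unpack these four counts directly with ravel(). A minimal sketch, reusing the y_true and y_pred lists from above:

    from sklearn.metrics import confusion_matrix
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(tn, fp, fn, tp)   # 6 3 4 7 for the lists above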

    Let's see the confusion matrix for multiclass classification

    y_true=["cat","dog","cow","cat","cow","dog","cow","cat","dog","cat","cow","dog","cat","dog","cat","cow"]
    y_pred=["cat","cow","cow","dog","cow","cat","cow","dog","dog","cow","dog","dog","cow","dog","cow","cow"]
    confusion_matrix(y_true,y_pred,labels=["cat","cow","dog"])

In a multiclass confusion matrix the same idea applies: the cell in row i and column j counts the instances whose actual class is labels[i] but which were predicted as labels[j], so the correct predictions sit on the diagonal.
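To read it in much more detail, it helps to attach the class names to the rows and columns. A small sketch, assuming pandas is available (the DataFrame wrapper is only for display, not part of the metric itself), and reusing the y_true/y_pred lists from the snippet above:

    import pandas as pd
    from sklearn.metrics import confusion_matrix

    labels = ["cat", "cow", "dog"]
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    print(pd.DataFrame(cm,
                       index=[f"true_{l}" for l in labels],
                       columns=[f"pred_{l}" for l in labels]))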

    Accuracy Score

It is the ratio of the number of correct predictions to the total number of predictions: Accuracy = (T.P. + T.N.) / (T.P. + T.N. + F.P. + F.N.).

    from sklearn.metrics import accuracy_score

    y_true=[1,0,1,0,0,0,1,1,0,0,1,1,0,1,0,1,1,0,1,1] #Our actual value
    y_pred=[1,0,0,1,0,0,1,1,0,0,1,1,0,0,1,1,1,1,0,0] #Predicted value
    accuracy_score(y_true,y_pred)
    Ans. 0.65
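This agrees with what we can read off the binary confusion matrix above: (T.N. + T.P.) / total = (6 + 7) / 20 = 0.65.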

Alert!! The accuracy score can sometimes be deceptive (especially on an imbalanced dataset). To know more about it, check the blog How to handle Imbalanced Dataset.
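A quick hypothetical sketch of why: with 95 negatives and 5 positives, a "model" that always predicts the majority class still scores 95% accuracy while never detecting a single positive (the numbers below are made up purely for illustration):

    from sklearn.metrics import accuracy_score

    y_true = [0]*95 + [1]*5
    y_pred = [0]*100                         # always predict the majority class
    print(accuracy_score(y_true, y_pred))    # 0.95, yet every positive is missed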

    Precision Score

It is the accuracy of the positive predictions, i.e. the ability of the classifier not to classify a negative sample as positive: Precision = T.P. / (T.P. + F.P.).

    from sklearn.metrics import precision_score

    y_true=[1,0,1,0,0,0,1,1,0,0,1,1,0,1,0,1,1,0,1,1] #Our actual value
    y_pred=[1,0,0,1,0,0,1,1,0,0,1,1,0,0,1,1,1,1,0,0] #Predicted value
    precision_score(y_true,y_pred)
    Ans. 0.7
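Again this matches the confusion matrix above: T.P. / (T.P. + F.P.) = 7 / (7 + 3) = 0.7.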

Use cases: precision matters when classifying movies as suitable for children or not, because you don't want to recommend an adult movie to a child. In other words, you want to minimise false positives as much as possible.

So far we have only discussed the precision score for binary classification. For multiclass classification things become a little tricky: there are three ways of averaging the precision score, 1. macro, 2. micro and 3. weighted.

Macro precision score: it first calculates the precision of each class and then takes the unweighted (arithmetic) mean of those per-class precision scores.

    y_true=["cat","dog","cow","cat","cow","dog","cow","cat","dog","cat","cow","dog","cat","dog"]
    y_pred=["cat","cow","dog","dog","cow","cat","cow","dog","dog","cow","dog","dog","cow","dog"]
    precision_score(y_true,y_pred,average="macro")
    Ans.0.4428571428571428
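A minimal sketch of what macro averaging does under the hood, using average=None to get the per-class scores first (label order assumed here to be cat, cow, dog):

    import numpy as np
    from sklearn.metrics import precision_score

    per_class = precision_score(y_true, y_pred, average=None, labels=["cat", "cow", "dog"])
    print(per_class)            # [0.5, 0.4, 0.4285...] -- precision of cat, cow, dog
    print(np.mean(per_class))   # 0.4428..., the macro precision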

Weighted precision score: it first calculates the precision of each class and then takes the weighted mean of those per-class scores, where each class is weighted by its support (sum(score * no. of instances of the class) / total no. of instances).

    y_true=["cat","dog","cow","cat","cow","dog","cow","cat","dog","cat","cow","dog","cat","dog"]
    y_pred=["cat","cow","dog","dog","cow","cat","cow","dog","dog","cow","dog","dog","cow","dog"]
    precision_score(y_true,y_pred,average="weighted")
    Ans. 0.44591836734693874
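The same number can be reproduced by weighting the per-class precisions by the number of true instances of each class; a small sketch (the supports are counted from y_true, here 5 cats, 4 cows and 5 dogs):

    import numpy as np
    from sklearn.metrics import precision_score

    labels = ["cat", "cow", "dog"]
    per_class = precision_score(y_true, y_pred, average=None, labels=labels)
    supports = [y_true.count(l) for l in labels]        # [5, 4, 5]
    print(np.average(per_class, weights=supports))      # ~0.44591..., the weighted precision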

    Recall Score

It is the ability of a classifier to identify the positive instances as positive, mathematically represented as Recall = T.P. / (T.P. + F.N.).
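For the binary example used earlier, this works out to T.P. / (T.P. + F.N.) = 7 / (7 + 4) ≈ 0.636, which is what recall_score(y_true, y_pred) should return for those two lists.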

Use cases: recall is used in disease detection, because you don't want to predict that a person does not have the disease when they actually do. In other words, you want to reduce the number of false negatives.

As with precision, there are three ways of averaging the recall score in multiclass classification.

    y_true=["cat","dog","cow","cat","cow","dog","cow","cat","dog","cat","cow","dog","cat","dog"]
    y_pred=["cat","cow","dog","dog","cow","cat","cow","dog","dog","cow","dog","dog","cow","dog"]
    print("The weighted recall score is ",recall_score(y_true,y_pred,average="weighted"))
    print("The macro recall score is ",recall_score(y_true,y_pred,average="macro"))
    print("The micro recall score is ",recall_score(y_true,y_pred,average="micro"))