Concept of Various Probability Distributions in Statistics

#statistics #machinelearning

Manpreet Kaur Nov 27 2020 · 5 min read

What is a probability distribution in statistics?

Before discussing the concept of a probability distribution, it is important to understand the meaning of statistics, why statistics is used, and the types of statistics.

What is the meaning of statistics and its role in machine learning?

Statistics is the science that deals with methodologies to collect, organize, review, analyze, and draw conclusions from data. It is used in many disciplines like marketing, business, healthcare, telecom, etc.

Types of Data and Scale of Measurement

Population and sample mean and standard deviation

The shape of the distribution of the data is described by measures of central tendency (mean, median, and mode) and measures of variability/dispersion (range, variance, and standard deviation).

Standard deviation (σ) measures the dispersion of a set of data from its mean and is the square root of the variance (σ²).
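As a small illustration of these measures, the sketch below uses Python's standard `statistics` module on a made-up list of values (the data is an assumption for demonstration only). It shows the population form (divide by n) next to the sample form (divide by n − 1).

```python
import statistics

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)

# Population formulas divide by n.
pop_variance = statistics.pvariance(data)
pop_std = statistics.pstdev(data)      # square root of the population variance

# Sample formulas divide by n - 1, so the result is slightly larger.
sample_std = statistics.stdev(data)

print(mean, pop_variance, pop_std, round(sample_std, 3))
```

The population standard deviation is the square root of the population variance, which matches the definition above.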

  • The further the data points are from the mean, the higher the deviation.

    Let us discuss different types of probability distribution:

    The probability density function plays an important role in various probability distributions, so we need to understand this concept first.

    Probability Density Function

    A Probability Distribution is a mathematical function through which the probability of occurrence of different possible outcomes in an experiment can be calculated.

    In other words, the equation describing a continuous probability distribution is called a probability density function.

    It has some properties such as:

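    1.    The graph of the probability density function (PDF) is defined over a continuous range of values

    2.    The total area bounded by the curve over that range, between (a) and (b), is always equal to 1; the area over any sub-interval gives the probability that the variable falls in that sub-interval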
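The area property can be checked numerically. The sketch below is only an illustration: it assumes a normal distribution with mean 0 and standard deviation 1 as the example PDF, and approximates the area with a simple midpoint rule (the helper `area_under_pdf` is a hypothetical name, not a library function).

```python
from statistics import NormalDist

# Assumed example distribution: normal with mean 0 and standard deviation 1.
dist = NormalDist(mu=0.0, sigma=1.0)

def area_under_pdf(dist, a, b, steps=20_000):
    """Approximate the area under dist.pdf between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# Over (almost) the whole range, the area is very close to 1.
total_area = area_under_pdf(dist, -10.0, 10.0)

# Over a sub-interval, the area is the probability of landing in that interval.
prob_between = area_under_pdf(dist, -1.0, 1.0)

print(round(total_area, 4), round(prob_between, 4))
```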

    Normal Distribution (Gaussian distribution)

    It depends upon two factors:

  • Mean: determines the location of the center of the graph
  • Standard deviation: determines the spread of the graph, and therefore its height and width

    The normal distribution is a probability distribution that associates each value of a normal random variable (X) with a cumulative probability.

    The normal distribution is characterized by the following features:

  • The graph appears as a symmetric bell-shaped curve
  • Mean and median are equal; both are located at the center of the distribution
  • In a perfect normal distribution, the tails on either side of the curve are exact mirror images of each other.
  • Approximately 68.3% of the data falls within 1 standard deviation of the mean
  • Approximately 95.4% of the data falls within 2 standard deviations of the mean
  • Approximately 99.7% of the data falls within 3 standard deviations of the mean
  • Mean, median, and mode are all the same, so the distribution is symmetric.
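The percentages in this list can be reproduced from the cumulative distribution function of the standard normal distribution. A minimal sketch, using `NormalDist` from Python's standard library (the helper name `share_within` is just for illustration):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

Because every normal distribution can be rescaled to the standard one, the same shares hold for any mean and standard deviation.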

    When a distribution is skewed to either the left or the right, it is called asymmetric.

    There are several methods to check the skewness of a dataset, e.g., a boxplot or a KDE plot. Strong skewness often signals outliers in one tail of the dataset. To bring down the scale of the dataset, we can use normalization, which maps the data into a specific range without changing its nature; in this manner the dispersion of the dataset comes down, although the shape of the distribution (and so the skewness itself) stays the same.
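Besides plots, skewness can also be measured with a number. The sketch below is one possible way to do it, assuming the moment coefficient of skewness (third central moment divided by the 1.5 power of the second) and a made-up right-skewed dataset. It also rescales the data into [0, 1] to show what that does to the dispersion and to the skewness.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical right-skewed data: one large value stretches the right tail.
data = [1, 2, 2, 3, 3, 3, 4, 4, 20]
scaled = min_max_scale(data)

# Skewness before and after rescaling.
print(round(skewness(data), 3), round(skewness(scaled), 3))
# Standard deviation before and after rescaling.
print(round(statistics.pstdev(data), 3), round(statistics.pstdev(scaled), 3))
```

A positive value indicates a longer right tail. In this sketch the rescaling shrinks the standard deviation but leaves the skewness value unchanged, because a linear rescaling does not alter the shape of the distribution.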

    Standard Normal Distribution

    Converting a normal distribution to one with µ = 0 and σ = 1 using the Z statistic, which shifts and rescales the entire graph (data), gives the standard normal distribution.

    How does standardization differ from normalization?

    Normalization means scaling the values of a feature into the range between 0 and 1. Example: min-max scaling (the Min-Max Scaler).

    Standardization means rescaling the values of a feature so that they have mean (µ) = 0 and standard deviation (σ) = 1.

    Using the formula: z = (x − µ) / σ
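The two rescalings can be sketched side by side. This is a minimal illustration, assuming the usual z-score formula z = (x − µ) / σ for standardization and (x − min) / (max − min) for normalization; the function names and the height data are hypothetical.

```python
import statistics

def standardize(values):
    """Z-scores: subtract the mean, then divide by the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def normalize(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical feature values (heights in cm).
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

z_scores = standardize(heights_cm)
scaled = normalize(heights_cm)

print([round(z, 3) for z in z_scores])
print(scaled)
```

After standardization the values have mean 0 and standard deviation 1; after normalization they lie between 0 and 1.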

    But one question arises here: why do we need to convert a normal distribution to the standard normal distribution?

    The answer is that while performing hypothesis tests such as the Z-test on a sampling distribution, we need to look up values in statistical tables like the Z-table, in which all the values have been generated by using a standard normal distribution.

    Standardizing the dataset is therefore an important step in many statistical analyses: it lets us compare our result against these tables and build intuition about the dataset. The condition for using the Z-table is that we should know the population standard deviation.
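To make the table lookup concrete, here is a hedged sketch of a one-sample, two-sided Z-test in which the standard normal CDF from Python's `statistics.NormalDist` stands in for the printed Z-table. The sample values, the hypothesized mean, and the population standard deviation are all made-up assumptions.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_test(sample, population_mean, population_sd):
    """One-sample, two-sided Z-test with a known population standard deviation."""
    n = len(sample)
    # Standardize the sample mean using the standard error sigma / sqrt(n).
    z = (fmean(sample) - population_mean) / (population_sd / sqrt(n))
    # The standard normal CDF plays the role of the Z-table lookup.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical numbers for illustration only.
sample = [52.0, 49.0, 55.0, 51.0, 53.0, 50.0, 54.0, 52.0, 51.0]
z, p_value = z_test(sample, population_mean=50.0, population_sd=3.0)

print(round(z, 3), round(p_value, 4))
```

A small p-value (commonly below 0.05) would suggest that the sample mean differs from the hypothesized population mean.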

    Student’s T Distribution

    It is symmetrical about zero and bell-shaped, but more spread out (with heavier tails) than the standard normal distribution.

    Using a t-test, we can compare the means of two samples.
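As an illustration of comparing two samples, the sketch below computes a two-sample t statistic. It assumes Welch's version of the test (which does not require equal variances) and uses made-up measurements; other variants, such as the pooled-variance t-test, differ in the denominator and the degrees of freedom.

```python
from math import sqrt
from statistics import fmean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # variance() is the sample variance (divides by n - 1).
    var_mean_a = variance(sample_a) / n_a
    var_mean_b = variance(sample_b) / n_b
    t = (fmean(sample_a) - fmean(sample_b)) / sqrt(var_mean_a + var_mean_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_mean_a + var_mean_b) ** 2 / (
        var_mean_a ** 2 / (n_a - 1) + var_mean_b ** 2 / (n_b - 1)
    )
    return t, df

# Hypothetical measurements from two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
group_b = [4.2, 4.8, 4.5, 5.0, 4.4, 4.7]

t, df = welch_t(group_a, group_b)
print(round(t, 3), round(df, 2))
```

The resulting t value is then compared with the critical value from a t-table for the computed degrees of freedom.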

    Conditions for Student T-Test
  • The sample must be randomly selected and the data continuous
  • Use when the sample size is small
  • Use when the population variance or standard deviation is not known
  • The observations should be independent of one another
  • The data should not contain outliers

    Sample size less than 15:

    Use t-test if the data are close to normal. If the data are non-normal or outliers are present, do not use t-procedures.

    Sample size at least 15:

    T-test can be used except in the presence of outliers or strong skewness

    Large samples: 

    T-test can be used even for skewed distributions when the sample is large (greater than or equal to 30).

    As the sample size grows, the distribution of the sample means tends to normality and the sample standard deviation (s) tends towards the population standard deviation (σ).

    As the degrees of freedom increase, the t-distribution tends towards a standard normal distribution.
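The effect of sample size can be seen in a small simulation. The sketch below is an assumption-laden illustration: it draws samples from an exponential population (mean 1, standard deviation 1), chosen here only because it is clearly skewed, and uses a fixed random seed so the run is repeatable.

```python
import random
import statistics
from math import sqrt

random.seed(0)  # fixed seed so the sketch is repeatable

def draw_sample(size):
    """Draw `size` values from an exponential population (mean 1, sd 1)."""
    return [random.expovariate(1.0) for _ in range(size)]

def sample_means(sample_size, repeats=2000):
    """Means of `repeats` independent samples of the given size."""
    return [statistics.fmean(draw_sample(sample_size)) for _ in range(repeats)]

for n in (5, 50):
    means = sample_means(n)
    # Theory: the sd of the sample mean is sigma / sqrt(n), with sigma = 1 here.
    print(n, round(statistics.stdev(means), 3), round(1 / sqrt(n), 3))

# The standard deviation s of one large sample approaches sigma = 1.
large_sample_sd = statistics.stdev(draw_sample(5000))
print(round(large_sample_sd, 3))
```

With larger samples the sample means cluster more tightly around the population mean, and s from a large sample lands close to σ.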

    Chi Squared Test:

    1.    It tells how closely the distribution of a categorical variable matches an expected distribution (goodness of fit).

    2.    It also checks whether two categorical variables are independent of each other or not (test of independence).

    3.    It is based on frequencies and is independent of parameters like mean and standard deviation.

    Goodness of Fit
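A goodness-of-fit check can be sketched with the usual chi-squared statistic, the sum over categories of (observed − expected)² / expected. The die-roll counts below are made up for illustration, and the critical value 11.070 is the standard table value for 5 degrees of freedom at the 0.05 significance level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 60 rolls of a die, one count per face.
observed = [8, 9, 12, 11, 6, 14]
# A fair die would give each face 60 / 6 = 10 times on average.
expected = [10] * 6

statistic = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1

# Table value for 5 degrees of freedom at the 0.05 level.
critical_value = 11.070
reject_fair_die = statistic > critical_value

print(round(statistic, 2), degrees_of_freedom, reject_fair_die)
```

Here the statistic stays below the critical value, so these counts give no evidence against a fair die.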


    Binomial Distribution

    The binomial distribution is a discrete probability distribution. It is used when each trial of an experiment has exactly two possible outcomes and the trial is repeated a fixed number of times; for example, tossing a coin gives two outcomes, labeled “head” and “tail.”

  • Each trial is independent.
  • The probability of each outcome stays the same for all the trials.
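Under these conditions, the probability of exactly k successes in n trials follows the standard binomial formula C(n, k) · p^k · (1 − p)^(n − k). A minimal sketch, assuming a fair coin tossed 10 times (the function name `binomial_pmf` is illustrative):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Assumed example: 10 tosses of a fair coin, success = head.
n, p = 10, 0.5
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(binomial_pmf(5, n, p))   # probability of exactly 5 heads
print(sum(probabilities))      # probabilities over all k add up to 1
```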

    Bernoulli distribution

    It is a type of discrete probability distribution. It models a random experiment that has only two outcomes, 1 ("success") and 0 ("failure"), with complementary probabilities p and 1−p respectively.

    For example, tossing a coin in a single trial and treating head as the outcome of interest gives either “1” (success) or “0” (failure).

    P(Success) = p

    P(Failure)= 1-p

    Let, X=1 when Success and X=0 when failure,

    Then the probability mass function is given as:

    P(X = x) = p^x (1 − p)^(1 − x), for x = 0 or 1
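With X = 1 for success and X = 0 for failure, the standard Bernoulli form p^x · (1 − p)^(1 − x) covers both cases in one expression. A small sketch, assuming p = 0.3 purely as an example value:

```python
def bernoulli_pmf(x, p):
    """P(X = x) for a Bernoulli variable: p if x == 1, 1 - p if x == 0."""
    if x not in (0, 1):
        raise ValueError("x must be 0 (failure) or 1 (success)")
    return p ** x * (1 - p) ** (1 - x)

p = 0.3  # assumed probability of success
prob_success = bernoulli_pmf(1, p)
prob_failure = bernoulli_pmf(0, p)

# The expected value is the sum of x * P(X = x), which equals p.
expected_value = sum(x * bernoulli_pmf(x, p) for x in (0, 1))

print(prob_success, prob_failure, expected_value)
```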

    Poisson Distribution

    It is used to find the probability of a given number of events occurring in a fixed interval, such as a certain period of time.

  • Events occur independently of each other and are counted within an interval (a certain range)
  • An event can occur any number of times
  • The rate of occurrence is constant (the rate does not change based on time)
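For a constant average rate λ per interval, the standard Poisson formula gives P(X = k) = e^(−λ) · λ^k / k!. The sketch below assumes a made-up rate of 3 events per hour; the function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """Probability of exactly k events in an interval with the given average rate."""
    return exp(-rate) * rate ** k / factorial(k)

rate = 3.0  # assumed average of 3 events per hour

prob_none = poisson_pmf(0, rate)
prob_at_most_two = sum(poisson_pmf(k, rate) for k in range(3))

print(round(prob_none, 4), round(prob_at_most_two, 4))
```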

    Uniform Distribution

  • It is a continuous probability distribution.
  • In a continuous probability distribution, there are an infinite number of real values that could exist in an interval.

    Conditions
  • The area under the curve must be equal to 1
  • The length of the interval from a to b determines the height of the curve, 1/(b−a).
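Both conditions can be checked with a few lines. This sketch assumes an interval from a = 2 to b = 10 as an example; the helper names are hypothetical.

```python
def uniform_pdf(x, a, b):
    """Height of the uniform density on [a, b]: 1 / (b - a) inside, 0 outside."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X uniform on [a, b]: overlap length times height."""
    lo, hi = max(a, c), min(b, d)
    return max(0.0, hi - lo) / (b - a)

a, b = 2.0, 10.0  # assumed interval

height = uniform_pdf(5.0, a, b)
total_area = height * (b - a)            # rectangle: height times width
prob_4_to_6 = uniform_prob(4.0, 6.0, a, b)

print(height, total_area, prob_4_to_6)
```

The density is a rectangle, so widening the interval lowers the height and the total area stays equal to 1.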
    * * *

    Looking forward to valuable suggestions from all of you.

    Thank you for reading.

    Happy learning !!!

