*What is a probability distribution in statistics?*

Before discussing the concept of a probability distribution, it is important to understand what statistics means, why we use statistics, and the types of statistics.

*What is statistics, and what is its role in machine learning?*

Statistics is the science that deals with methodologies to collect, organize, review, analyze, and draw conclusions from data. It is used in many disciplines, such as marketing, business, healthcare, and telecom.

**Types of Data and Scale of Measurement**

The shape of a data distribution is summarized by measures of central tendency (mean, median, and mode) and measures of variability or dispersion (range, variance, and standard deviation).

Standard deviation (σ) measures the dispersion of a set of data around its mean and is calculated as the square root of the variance.
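These measures can be computed with Python's built-in `statistics` module. The sketch below uses a small made-up dataset and the population formulas (`pvariance` and `pstdev`, which divide by N); `statistics.variance` and `statistics.stdev` would give the sample versions, which divide by n − 1.

```python
import statistics

# Hypothetical sample data, used only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of variability / dispersion
data_range = max(data) - min(data)
variance = statistics.pvariance(data)  # population variance (divides by N)
std_dev = statistics.pstdev(data)      # square root of the population variance

print(mean, median, mode)
print(data_range, variance, std_dev)
```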

**Let us discuss different types of probability distributions:**

The probability density function plays an important role in many probability distributions, so it is worth understanding this concept first.

#### Probability Density Function

A probability distribution is a mathematical function that gives the probability of occurrence of the different possible outcomes of an experiment.

The function describing a continuous probability distribution is called a probability density function (PDF).

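**A PDF has the following properties:**

1. The PDF is never negative, and its graph is continuous over a range.

2. The total area bounded by the curve over its entire range is always equal to 1; the area under the curve between two points a and b gives the probability that the variable lies between a and b.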
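The two properties above can be checked numerically. This is a minimal sketch, assuming a standard normal distribution (Python's `statistics.NormalDist`) and a simple trapezoidal-rule integrator; `area_under_pdf` is a hypothetical helper written for this example, not a library function.

```python
from statistics import NormalDist

# Assumption for illustration: a normal distribution with mean 0 and standard deviation 1
dist = NormalDist(mu=0.0, sigma=1.0)

def area_under_pdf(pdf, a, b, steps=10_000):
    """Approximate the integral of pdf from a to b with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h

# Total area: integrate over a wide range that holds almost all of the probability mass
total_area = area_under_pdf(dist.pdf, -10.0, 10.0)

# Area between a and b equals P(a <= X <= b) = CDF(b) - CDF(a)
a, b = -1.0, 2.0
area_ab = area_under_pdf(dist.pdf, a, b)
prob_ab = dist.cdf(b) - dist.cdf(a)

print(round(total_area, 6))
print(round(area_ab, 6), round(prob_ab, 6))
```

A wide finite range such as [−10, 10] stands in for the whole real line here, because the normal density is vanishingly small outside it.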

### Normal Distribution (Gaussian distribution)

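The normal distribution is a continuous probability distribution of a normal random variable X; its cumulative distribution function gives the probability that X takes a value less than or equal to a given x.

It depends on two parameters: the mean (µ), which sets the center of the curve, and the standard deviation (σ), which sets its spread.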
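The normal density with mean µ and standard deviation σ is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$. As a minimal sketch (the parameters µ = 50 and σ = 10 are assumed purely for illustration), the formula can be written by hand and compared with Python's `statistics.NormalDist.pdf`; the last line checks that the curve is symmetric about the mean.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal (Gaussian) density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Illustrative parameters (assumed): mean 50, standard deviation 10
mu, sigma = 50.0, 10.0
reference = NormalDist(mu, sigma)

for x in (30.0, 45.0, 50.0, 55.0, 70.0):
    # The hand-written formula should agree with the standard library's pdf
    print(x, normal_pdf(x, mu, sigma), reference.pdf(x))

# Symmetry about the mean: f(mu - d) equals f(mu + d)
print(math.isclose(normal_pdf(mu - 7, mu, sigma), normal_pdf(mu + 7, mu, sigma)))
```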

The normal distribution is bell-shaped and symmetric about its mean. Distributions can be classified by their symmetry as follows:

In a **symmetric** distribution, such as the normal distribution, the mean, median, and mode are all the same.

When a distribution is skewed to the left or right, it is called **asymmetric**.

There are several methods to check the skewness of a dataset, e.g., a boxplot or a KDE plot. Strong skewness often indicates the presence of outliers in the dataset. Normalization brings the scale of the dataset down into a specific range without changing the nature of the data, so the dispersion measured on the new scale comes down. Note, however, that normalization is a linear rescaling, so it does not change the skewness itself.
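The effect of normalization on skewness can be checked directly. This minimal sketch uses a made-up right-skewed dataset, the population (Fisher-Pearson) skewness coefficient, and a hand-written min-max scaler; `skewness` and `min_max_normalize` are hypothetical helpers for this example.

```python
import statistics

def skewness(values):
    """Population (Fisher-Pearson) skewness: the mean of the cubed z-scores."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)

def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical right-skewed data: most values are small, one is large
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]

scaled = min_max_normalize(data)

print(min(scaled), max(scaled))                               # range is now 0 to 1
print(round(skewness(data), 4), round(skewness(scaled), 4))   # skewness is the same
```

The range shrinks to [0, 1], but the skewness is identical before and after, because a linear rescaling does not change the shape of the distribution.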

#### Standard Normal Distribution

The standard normal distribution is a normal distribution with mean µ = 0 and standard deviation σ = 1. Any normal distribution can be converted into it using the Z statistic, which shifts the entire graph (data) by the mean and rescales it by the standard deviation.

**How does standardization differ from normalization?**

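Normalization means scaling a feature's values into the range 0 to 1, for example with min-max scaling (`MinMaxScaler` in scikit-learn):

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

Standardization means transforming a feature's values so that they have mean (µ) = 0 and standard deviation (σ) = 1, the same mean and standard deviation as the standard normal distribution, using the formula:

$$z = \frac{x - \mu}{\sigma}$$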
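Both transformations can be sketched in a few lines of plain Python. The feature values below are made up for illustration; scikit-learn's `MinMaxScaler` and `StandardScaler` apply the same formulas column by column to arrays.

```python
import statistics

# Hypothetical feature values, for illustration only
feature = [10.0, 20.0, 30.0, 40.0, 50.0]

# Normalization (min-max scaling): x' = (x - min) / (max - min)
lo, hi = min(feature), max(feature)
normalized = [(x - lo) / (hi - lo) for x in feature]

# Standardization (z-score): z = (x - mu) / sigma, using the population standard deviation
mu = statistics.fmean(feature)
sigma = statistics.pstdev(feature)
standardized = [(x - mu) / sigma for x in feature]

print(normalized)
print([round(z, 4) for z in standardized])
print(round(statistics.fmean(standardized), 10), round(statistics.pstdev(standardized), 10))
```

Normalized values always fall between 0 and 1, while standardized values are centered on 0 with a standard deviation of 1 and can be negative or larger than 1.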

**But one question arises here: why do we need to convert a normal distribution to the standard normal distribution?**

The answer is that when performing a Z-test on a sampling distribution, we need to look up values in statistical tables such as the Z-table, and all the values in the Z-table have been generated from the standard normal distribution. (Other tests use their own tables: the t-test uses the t-table and the F-test uses the F-table, which are built from the t- and F-distributions.)
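The reason a single Z-table is enough can be shown with Python's `statistics.NormalDist`. In this minimal sketch, the exam-score distribution N(µ = 70, σ = 8) and the score 82 are hypothetical: the probability computed on the original scale equals the standard normal probability of the corresponding z value, which is exactly what a Z-table lookup returns.

```python
from statistics import NormalDist

# Hypothetical example: exam scores assumed to follow N(mu=70, sigma=8)
scores = NormalDist(mu=70, sigma=8)
x = 82

# Standardize: z = (x - mu) / sigma
z = (x - scores.mean) / scores.stdev

standard_normal = NormalDist()  # mu = 0, sigma = 1: the distribution the Z-table is built from

p_direct = scores.cdf(x)          # P(X <= 82) computed on the original scale
p_table = standard_normal.cdf(z)  # the same probability read from the standard normal

print(z)
print(round(p_direct, 4), round(p_table, 4))
```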

It is important to **standardize** the dataset before performing such statistical analyses: once the data are on the standard normal scale, a single table gives us the results and intuition we need about the dataset. This is why we convert a normal distribution into the standard normal distribution. The condition for using the Z-table in a Z-test is that we should know the population standard deviation (σ).

### Student’s T Distribution

It is symmetric about zero and bell-shaped, but it has heavier tails (it is more spread out) than the standard normal distribution.

Using a t-test, we can compare the means of two samples.
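As a minimal sketch of the two-sample comparison, the function below computes Student's t statistic under the assumption of equal population variances (pooled variance) together with its degrees of freedom. The two groups of measurements are made up, and `pooled_t_statistic` is a hypothetical helper; the resulting t would then be compared with a value from a t-table. SciPy's `scipy.stats.ttest_ind` performs the same test and also returns a p-value.

```python
import math
import statistics

def pooled_t_statistic(sample1, sample2):
    """Two-sample Student's t statistic assuming equal population variances.

    Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.fmean(sample1), statistics.fmean(sample2)
    # Sample variances (divide by n - 1)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2

# Hypothetical measurements from two groups, for illustration only
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
group_b = [4.2, 4.8, 4.5, 5.0, 4.4, 4.6]

t, df = pooled_t_statistic(group_a, group_b)
print(round(t, 3), df)
```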

##### Conditions for Student T-Test

**Sample size less than 15:**

Use the t-test if the data are close to normal. If the data are non-normal or outliers are present, do not use t-procedures.

**Sample size at least 15:**

The t-test can be used except in the presence of outliers or strong skewness.

**Large samples:**

The t-test can be used even for skewed distributions when the sample is large (n ≥ 30).

As the sample size grows, the distribution of the sample mean tends toward normality (the central limit theorem), and the sample standard deviation (s) tends toward the population standard deviation (σ).
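A small simulation makes this concrete. This minimal sketch assumes a right-skewed exponential population with rate 1 (standard deviation 1) and a fixed random seed; it draws many samples of size n and shows that the skewness of the sample means shrinks toward 0 and their spread shrinks roughly like 1/√n. The helpers `skewness` and `sample_means` are hypothetical, written for this example.

```python
import random
import statistics

def skewness(values):
    """Population (Fisher-Pearson) skewness: the mean of the cubed z-scores."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)

random.seed(42)  # fixed seed so the simulation is repeatable

def sample_means(sample_size, repetitions=5_000):
    """Means of many samples drawn from an exponential distribution with rate 1."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(sample_size))
            for _ in range(repetitions)]

for n in (1, 5, 30):
    means = sample_means(n)
    # Skewness of the sample means and their standard deviation for this sample size
    print(n, round(skewness(means), 2), round(statistics.stdev(means), 3))
```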

As the degrees of freedom increase, the t-distribution tends toward the standard normal distribution.
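This convergence can be seen by evaluating the t density directly. The sketch below implements the standard t density formula with `math.lgamma` (to avoid overflow for large degrees of freedom) and compares its height at 0 (the peak) and at 3 (the tail) with the standard normal density; `t_pdf` is a hypothetical helper, and the chosen degrees of freedom are only examples.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

standard_normal = NormalDist()

for df in (1, 5, 30, 100):
    # Peak height at 0 and tail height at 3 for each number of degrees of freedom
    print(df, round(t_pdf(0, df), 4), round(t_pdf(3, df), 4))

print("normal", round(standard_normal.pdf(0), 4), round(standard_normal.pdf(3), 4))
```

For small degrees of freedom the peak is lower and the tails are higher than the normal curve; as the degrees of freedom grow, both values approach those of the standard normal distribution.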

### Chi-Squared Test

1. It tells how closely the observed distribution of a categorical variable matches an expected distribution (goodness of fit).

2. It also checks whether two categorical variables are independent of each other or not (test of independence).

3. It is based on observed and expected frequencies, not on parameters such as the mean and standard deviation.

__Goodness of Fit__
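The goodness-of-fit statistic is χ² = Σ (O − E)² / E, summed over all categories, with (number of categories − 1) degrees of freedom. As a minimal sketch, assume a die is rolled 60 times (the counts below are made up); a fair die would give 10 of each face. The result is then compared with a chi-squared table value, which for 5 degrees of freedom at the 5% significance level is about 11.07. SciPy's `scipy.stats.chisquare` computes the same statistic along with a p-value.

```python
def chi_squared_statistic(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical example: a die rolled 60 times; a fair die expects 10 of each face
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

chi2 = chi_squared_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1

print(chi2, degrees_of_freedom)
```

Here χ² is far below 11.07, so these counts give no evidence that the die is unfair.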

### Binomial Distribution

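The binomial distribution is a discrete probability distribution, so it is described by a probability mass function rather than a probability density function. It is used when an experiment with exactly two possible outcomes is repeated a fixed number of times; for example, tossing a coin gives two outcomes, labeled “head” and “tail,” and the binomial distribution gives the probability of getting a certain number of heads in n tosses.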

##### Characteristics
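The usual characteristics of a binomial experiment are a fixed number of trials n, independent trials, two outcomes per trial, and the same probability of success p on every trial; the mean is np and the variance is np(1 − p). The minimal sketch below computes the probability mass function P(X = k) = C(n, k) p^k (1 − p)^(n − k) for an assumed example of 10 fair coin tosses and checks the mean and variance numerically; `binomial_pmf` is a hypothetical helper.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Example: number of heads in 10 tosses of a fair coin
n, p = 10, 0.5
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(probabilities))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probabilities))

print(round(binomial_pmf(5, n, p), 4))        # probability of exactly 5 heads
print(round(sum(probabilities), 10))          # the probabilities sum to 1
print(round(mean, 10), round(variance, 10))   # compare with n*p and n*p*(1 - p)
```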

### Bernoulli distribution

It is a discrete probability distribution for a random experiment that has only two outcomes, 1 ("success") and 0 ("failure"), with complementary probabilities p and 1 − p respectively.

For example, in a single toss of a coin, getting a head can be recorded as “1” (success) and getting a tail as “0” (failure).

P(Success) = p

P(Failure) = 1 − p

Let X = 1 for success and X = 0 for failure.

**Then the probability mass function is given as:**

$$P(X = x) = p^{x}(1 - p)^{1 - x}, \quad x \in \{0, 1\}$$
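The formula above can be written directly in Python. In this minimal sketch, the success probability p = 0.3 (a biased coin) is assumed for illustration; a simulation of many single trials shows that the average of X estimates p, which is the mean of a Bernoulli variable (its variance is p(1 − p)). `bernoulli_pmf` is a hypothetical helper.

```python
import random

def bernoulli_pmf(x, p):
    """P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli variable only takes the values 0 and 1")
    return p ** x * (1 - p) ** (1 - x)

p = 0.3  # assumed probability of success (head) for a biased coin
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))

# Simulate many single tosses (1 = success, 0 = failure); the average of X estimates p
random.seed(0)
tosses = [1 if random.random() < p else 0 for _ in range(10_000)]
print(sum(tosses) / len(tosses))
```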

### Poisson Distribution

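It is used to find the probability of a given number of events occurring in a fixed period of time (or region of space), when the events occur independently at a constant average rate.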

##### Characteristics
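A Poisson variable with average rate λ has probability mass function P(X = k) = λ^k e^(−λ) / k!, and its mean and variance are both equal to λ. The sketch below is a minimal illustration with an assumed rate of λ = 3 calls per hour (made-up numbers); it truncates the distribution at k = 49, where the remaining probability is negligible, and checks that the mean and variance both come out close to λ. `poisson_pmf` is a hypothetical helper.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = lam^k * e^(-lam) / k! for k = 0, 1, 2, ..."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical example: a help desk receives on average 3 calls per hour
lam = 3.0
# Truncated at k = 49; the probability beyond that is negligible for lam = 3
probabilities = [poisson_pmf(k, lam) for k in range(50)]

mean = sum(k * pk for k, pk in enumerate(probabilities))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probabilities))

print(round(poisson_pmf(0, lam), 4))       # probability of no calls in an hour
print(round(mean, 6), round(variance, 6))  # both should be close to lam
```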

### Uniform Distribution

##### Conditions

* * *

Looking forward to valuable suggestions from all of you.

Thank you for reading.

Happy learning !!!
