What is a Maximum Likelihood Estimate?

If you are a data science student, you have probably come across this term many times. It shows up often in linear regression and logistic regression.

In fact, even a basic statistic like the sample mean is the MLE of the population mean under common models such as the normal and Bernoulli distributions.

We often use the term "MLE solution" without knowing what it actually is. Intuitively, the name suggests that it is the most likely solution. But what is the concept behind it?

Let’s dive right in!

What is Likelihood?

Let’s start with a simple question: how can we determine the probability of heads when you toss a coin?

I would toss the coin a number of times, let’s say 5.

Number of heads = count of heads observed (say 3)

Total number of tosses = 5

P(H) = Number of heads / Total number of tosses

So, I would say P(H) = 3/5.
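As a minimal sketch of the counting approach above, the estimate is just the fraction of heads in the observed tosses. The function name `estimate_p_heads` and the order of the tosses are made up for illustration; only the counts (3 heads out of 5) come from the example.

```python
def estimate_p_heads(tosses):
    """Return the fraction of heads in a list of tosses ('H' or 'T')."""
    return tosses.count("H") / len(tosses)

# Five tosses with 3 heads and 2 tails; the order is an assumption.
observed = ["H", "T", "H", "H", "T"]
p_hat = estimate_p_heads(observed)
print(p_hat)  # 0.6
```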

But what is the conceptual background that got us to this approach?


The concept:

Let’s define some random variables to start with.

Define a random variable Xi for the i-th toss:

Xi = 1 if the toss is H, 0 otherwise.

Xi, as you might know, follows a Bernoulli distribution.

Now, let’s define another random variable, y.

“y” is the number of heads we will get in 5 trials.

Can you guess which distribution “y” follows?

“y” follows a binomial distribution, as it is the sum of 5 independent Bernoulli trials that share the same probability of heads.

If we change the probability of heads (theta), the distribution of “y” changes.
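To see how the distribution of “y” moves with theta, here is a small sketch that prints the binomial probabilities of getting 0 to 5 heads for a few values of theta. The helper `binomial_pmf` and the chosen theta values (0.2, 0.5, 0.8) are assumptions for illustration.

```python
from math import comb

def binomial_pmf(k, n, theta):
    """Probability of exactly k heads in n independent tosses with P(H) = theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

n = 5
for theta in (0.2, 0.5, 0.8):
    # One probability for each possible number of heads, 0 through 5.
    pmf = [binomial_pmf(k, n, theta) for k in range(n + 1)]
    print(theta, [round(p, 3) for p in pmf])
```

With a small theta most of the probability sits on low head counts, and with a large theta it shifts toward high head counts.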

To decide on the value of theta, we calculate the likelihood.

Likelihood:

Simply put, the likelihood is the probability of observing the data for a given value of theta.

To calculate the likelihood, we take the joint probability of the observed data D.

P(D | theta) is the likelihood.
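For the coin example, the joint probability can be written out as below. This is a sketch that assumes the 5 tosses are independent and that we observed 3 heads and 2 tails; x_i is the observed value of Xi.

```latex
\[
L(\theta) = P(D \mid \theta)
          = \prod_{i=1}^{5} \theta^{x_i} (1 - \theta)^{1 - x_i}
          = \theta^{3} (1 - \theta)^{2}
\]
```

If we work with the count “y” instead of the individual tosses, the expression picks up a constant factor of 10 (the number of ways to choose 3 tosses out of 5). That factor does not depend on theta, so it does not change which theta gives the maximum.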

We try different values of theta, and the theta that gives the maximum likelihood is the maximum likelihood estimate of the true value of theta (P(H)).
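The "try different values of theta" idea can be sketched as a simple grid search. The grid of 101 candidate values and the function name `likelihood` are assumptions for illustration; the data (3 heads, 2 tails) comes from the example.

```python
def likelihood(theta, heads=3, tails=2):
    """Joint probability of the observed tosses for a given theta."""
    return theta**heads * (1 - theta)**tails

# Candidate values of theta: 0.00, 0.01, ..., 1.00.
candidates = [i / 100 for i in range(101)]

# Keep the candidate with the largest likelihood.
theta_mle = max(candidates, key=likelihood)
print(theta_mle)  # 0.6
```

The search lands on 0.6, the same 3/5 we got by counting heads. Calculus gives the same answer: setting the derivative of the log-likelihood, 3/theta - 2/(1 - theta), to zero yields theta = 3/5.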

I made a video to make it easy. Watch this:

Difference between Likelihood and Probability:

Probability is a function of the data with the parameter held fixed, while likelihood is a function of the parameter with the data held fixed. In our case, the likelihood is a function of theta.
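One way to check this distinction numerically is sketched below, reusing the coin example with 5 tosses. With theta fixed, the probabilities over all possible head counts add up to 1. With the data fixed at 3 heads, the likelihood over all values of theta does not have to integrate to 1. The helper `binomial_pmf`, the fixed theta of 0.6, and the 1000-step midpoint rule are assumptions for illustration.

```python
from math import comb

def binomial_pmf(k, n, theta):
    """Probability of exactly k heads in n independent tosses with P(H) = theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

n = 5

# Probability: theta is fixed at 0.6 and the data (number of heads) varies.
prob_total = sum(binomial_pmf(k, n, 0.6) for k in range(n + 1))

# Likelihood: the data is fixed at 3 heads and theta varies from 0 to 1.
# Approximate the area under the likelihood curve with the midpoint rule.
steps = 1000
width = 1 / steps
likelihood_area = sum(
    binomial_pmf(3, n, (i + 0.5) * width) * width for i in range(steps)
)

print(round(prob_total, 6))       # 1.0
print(round(likelihood_area, 4))  # 0.1667
```

The area under the likelihood curve is about 1/6, not 1, which is a reminder that the likelihood is not a probability distribution over theta.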

Can you explain the above statement?

You might want to go through MLE in logistic regression now. Check it out here.

Happy Learning!

PS: Thanks to Professor Gourab Nath’s session on MLE. This entire post is a summary of his session.
