Logistic Regression – MLE

So far we have seen the workings of linear regression.

Now, can we solve a classification problem using linear regression?

Why or why not?

Binary Logistic Regression:

Logistic regression is a supervised machine learning algorithm that outputs the probability of a class. A business decides where to set the threshold, and by comparing the predicted probability against that threshold, we classify an observation into a class.

Logistic regression builds on linear regression: we tweak the model so that it outputs predicted probabilities instead of an unbounded real number.

Can you guess the range of output of Logistic Regression model?

The probability in logistic regression is given by the sigmoid function:

p = 1 / (1 + e^(−z))

where z is the linear equation: z = β0 + β1x1 + β2x2 + … + βnxn.

Logistic regression is therefore mathematically modelled with a sigmoid curve instead of a straight line, so its output always lies between 0 and 1.
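As a quick illustration, here is a minimal NumPy sketch of the sigmoid; the coefficients and inputs below are made up for the example:

```python
import numpy as np

def sigmoid(z):
    """Map any real number z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear equation z = b0 + b1*x for a single feature
b0, b1 = -1.0, 0.5
x = np.array([-10.0, 0.0, 2.0, 10.0])
z = b0 + b1 * x

print(sigmoid(z))  # all values fall strictly between 0 and 1
```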

Real-life situations rarely follow a mathematical equation exactly. Hence, by imposing this functional form we are introducing some bias into the model.

Maximum Likelihood Estimation (MLE) for Fitting the Model:

A maximum likelihood estimate gives us the optimum betas (the coefficients and the intercept) for our linear equation.

The idea behind MLE: for observed labels y_i (0 or 1) with predicted probabilities p_i, the likelihood of the whole sample is the product of the individual probabilities:

L(β) = Π p_i^(y_i) × (1 − p_i)^(1 − y_i)

Taking the logarithm turns this product into a sum, the log-likelihood:

log L(β) = Σ [ y_i log(p_i) + (1 − y_i) log(1 − p_i) ]

MLE finds the betas that maximize this quantity (equivalently, that minimize the negative log-likelihood).
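Below is a minimal sketch of MLE on a made-up one-feature dataset, minimizing the negative log-likelihood with scipy.optimize; all data and variable names here are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

# Toy data (purely illustrative): one feature, binary labels
x = np.array([0.5, 1.5, 2.0, 3.0, 3.5, 4.0])
y = np.array([0, 0, 1, 0, 1, 1])

def neg_log_likelihood(beta):
    """Negative log-likelihood of the binary logistic model."""
    b0, b1 = beta
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Minimizing the negative log-likelihood == maximizing the likelihood
opt = minimize(neg_log_likelihood, x0=np.zeros(2))
print(opt.x)  # MLE estimates of the intercept b0 and slope b1
```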

Assumptions of Logistic Regression:

  1. The response variable is binary (since we are talking about binary logistic regression).
  2. The observations are independent. Remember, we are multiplying the individual probabilities in the likelihood!
    • (How to check this assumption: the easiest way is to create a plot of residuals against time (i.e. the order of the observations) and observe whether or not there is a random pattern. If there is not a random pattern, then this assumption may be violated.)
  3. There is no multicollinearity among the explanatory variables.
    • How to check this assumption: the most common way to detect multicollinearity is the variance inflation factor (VIF), which measures the correlation and strength of correlation between the predictor variables in a regression model. (See the VIF sketch after this list.)
  4. Logistic regression assumes that there are no extreme outliers in the data set, since, as in linear regression, they will affect the model (they shift the fitted curve).
  5. There is a linear relationship between the logit of the response variable and the explanatory variables.
    • Logit(p) = log(p / (1 − p)), where p is the probability of a positive outcome.
    • The logit is the log(odds). It is equal to z, our linear equation.
    • Note: logistic regression comes under Generalized Linear Models; its errors are not normally distributed.
    • Log(odds) explanation: the odds are p / (1 − p), the ratio of the probability of the event to that of the non-event. For example, p = 0.8 gives odds of 0.8 / 0.2 = 4, and log(odds) = log(4). While p is confined to (0, 1), the log(odds) ranges over all real numbers, which is what lets us equate it to the linear equation z.

  6. Sample size is sufficiently large so as to draw reasonable conclusions about the population.

Say the sample data has:

  1. At least 10 cases with the least frequent outcome for each independent variable (the "10 events per variable" rule of thumb).
  2. 5 predictors, and the probability of the least frequent outcome is 1% (0.01).
  3. Then the minimum sample size required is (10 × 5) / 0.01 = 5,000.
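For assumption 3, here is a minimal sketch of a VIF check with statsmodels; the DataFrame and its column names are made up for the example:

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# X is assumed to be a DataFrame of explanatory variables (illustrative values)
X = pd.DataFrame({
    "age":    [25, 32, 47, 51, 62, 23, 44, 36],
    "income": [30, 42, 80, 95, 110, 28, 70, 55],
    "tenure": [1, 3, 10, 12, 20, 1, 8, 5],
})

Xc = add_constant(X)  # VIF calculation needs an intercept column
vif = pd.Series(
    [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])],
    index=X.columns,
)
print(vif)  # a common rule of thumb: VIF > 5 (or 10) signals multicollinearity
```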

Regression output:

The scikit-learn library is limited here: it does not produce a statistical summary. If we want to look at the model summary and understand the model, we have to use statsmodels.

The summary looks like this:
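Here is a minimal sketch of producing such a summary with statsmodels, using randomly generated data purely for illustration:

```python
import numpy as np
import statsmodels.api as sm

# Toy data: two illustrative features and a binary target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)

Xc = sm.add_constant(X)   # add the intercept term
model = sm.Logit(y, Xc)   # binary logistic regression
result = model.fit()      # fitted by maximum likelihood
print(result.summary())   # coef, Log-Likelihood, LL-Null, Pseudo R-squ., ...
```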

Explanation of some of the terms in the summary table:

  • coef: the coefficients of the independent variables in the regression equation.
  • Log-Likelihood: the value of the log-likelihood function at the fitted parameters. MLE is the optimization process of finding the set of parameters that results in the best fit.
  • LL-Null: the value of the log-likelihood of the model when no independent variable is included (only an intercept is included).
  • Pseudo R-squared: a substitute for the R-squared value in least squares linear regression. McFadden's version is 1 minus the ratio of the log-likelihood of the full model to that of the null model.

Source: https://www.geeksforgeeks.org/logistic-regression-using-statsmodels

An even more elaborate summary is available via statsmodels' summary2(), which additionally reports information criteria such as the AIC.

Akaike information criterion:

The AIC is defined as AIC = 2k − 2 log(L̂), where k is the number of estimated parameters and L̂ is the maximized value of the likelihood. It rewards goodness of fit while penalizing model complexity; when comparing candidate models, the one with the lower AIC is preferred.
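Continuing the illustrative statsmodels fit from above, the AIC can be read off the result object or recomputed from the log-likelihood:

```python
# result is the fitted model from the statsmodels sketch above
print(result.aic)              # AIC reported by statsmodels
k = len(result.params)         # number of estimated parameters
print(2 * k - 2 * result.llf)  # AIC recomputed: 2k - 2*log-likelihood
```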

Variable Importance:

  1. The first approach to finding whether a variable is important is to check the log-likelihood with and without the variable.
  2. The second is to check the R-squared with and without the variable.
  3. Using the varImp() function provided by the caret package (in R).
  4. The difference in AUROC with and without the feature (see the sketch after this list).
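Here is a minimal sketch of approach 4, comparing AUROC with and without a feature using scikit-learn; the data is randomly generated for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Illustrative data: 3 features, binary target
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X[:, 0] - X[:, 1] + rng.normal(size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def auroc(train, test):
    """Fit logistic regression and return AUROC on the test set."""
    clf = LogisticRegression().fit(train, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(test)[:, 1])

full = auroc(X_tr, X_te)
without_f0 = auroc(X_tr[:, 1:], X_te[:, 1:])  # drop feature 0
print(f"AUROC drop from removing feature 0: {full - without_f0:.3f}")
```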

Pseudo R_squared:

McFadden's pseudo R-squared is 1 − (log-likelihood of the full model / log-likelihood of the null model). Unlike the R-squared of ordinary least squares, its values are typically much lower; McFadden suggested that values between 0.2 and 0.4 already represent an excellent fit.
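Continuing the same illustrative statsmodels fit, these quantities are available directly on the result object:

```python
# result is the fitted model from the statsmodels sketch above
print(result.prsquared)                # McFadden's pseudo R-squared
print(1 - result.llf / result.llnull)  # the same value, computed by hand
```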

A detailed implementation in Python is available on my GitHub:

https://github.com/SreeKavyadurbaka/Logistic-Regression-
