If you need an introduction to linear regression, go here!
You must be familiar with the output of a linear regression model.
We get a p-value for each variable.
What is this p-value?
Hypothesis Testing:
For a simple linear regression, as you know, the equation looks like this:
y = Beta_0 + Beta_1 * X + error
If there is no linear relationship between X and y, then Beta_1 will be zero. Conversely, if Beta_1 is 0, the expected value of y does not change even if X changes.
Our claim here is that X has an effect on y in the population. Unless we test this, we can't be sure.
Step 1: Setting up the hypotheses.
H0 : Beta_1 = 0 # Null hypothesis: the coefficient is 0
H1 : Beta_1 <> 0 # This is our claim, the alternative hypothesis
Step 2: Defining the alpha (Significance Level). We are taking it as 0.05.
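Step 3: Finding the test statistic. In our case, the test statistic follows a t distribution when the null hypothesis is true, and it looks like this:
t = (estimated Beta_1 - 0) / (standard error of the estimated Beta_1)
For a simple linear regression with n observations, this t distribution has n - 2 degrees of freedom.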
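To make Step 3 concrete, here is a minimal sketch that computes the slope, its standard error and the t statistic by hand for a simple linear regression. The function name `slope_t_statistic` and the five data points are made up purely for illustration; in practice a regression library reports these numbers for you.

```python
import math

def slope_t_statistic(x, y):
    """Return (beta_1, standard_error, t) for a simple linear regression.

    Hypothetical helper for illustration; assumes len(x) == len(y) > 2.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Least-squares estimates of the slope and the intercept
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    beta_1 = sxy / sxx
    beta_0 = y_mean - beta_1 * x_mean

    # Residual variance uses n - 2 degrees of freedom (two estimated parameters)
    rss = sum((yi - (beta_0 + beta_1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)

    # Standard error of the slope, then t = (beta_1 - 0) / SE(beta_1)
    se = math.sqrt(sigma2 / sxx)
    return beta_1, se, beta_1 / se

# Made-up toy data, roughly y = 2 * x
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta_1, se, t = slope_t_statistic(x, y)
print(f"beta_1 = {beta_1:.3f}, SE = {se:.4f}, t = {t:.2f}")
```

A large absolute value of t means the estimated slope is many standard errors away from 0, which is evidence against the null hypothesis.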
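Step 4: Finding the p-value from the t distribution. The p-value is the probability, assuming the null hypothesis is true, of getting a test statistic at least as extreme as the one we observed.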
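As a rough sketch of Step 4, the two-sided p-value is the area in both tails of the t distribution beyond the observed statistic. The code below gets that area by numerically integrating the t density with Simpson's rule, using only the standard library. The function names `t_pdf` and `two_sided_p_value` are hypothetical, and statistical libraries compute this far more robustly.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # math.gamma overflows for large arguments, so this sketch assumes a modest df
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=10_000):
    """P(|T| >= |t_stat|) under H0, via Simpson's rule on [0, |t_stat|].

    steps must be even for Simpson's rule.
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    # The density is symmetric, so both tails together hold 1 - 2 * area
    return max(0.0, 1 - 2 * area_zero_to_t)

# Illustrative values: t = 2.228 with 10 degrees of freedom
p = two_sided_p_value(2.228, 10)
print(f"p-value = {p:.4f}")
```

The further the t statistic is from 0, the smaller the tail area, and so the smaller the p-value.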
If the p-value is > 0.05,
we fail to reject the null hypothesis.
This does not prove that there is no relationship; it means our data set does not give enough evidence of a relationship between Xi (in case of multiple variables) and y.
If the p-value is < 0.05,
then the null hypothesis is rejected,
meaning there is a statistically significant relationship between our Xi and y.
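The decision rule above can be written as a tiny function. This is only a sketch: the name `decide` is made up, and treating a p-value exactly equal to alpha as "fail to reject" is an assumed convention, since the rule above leaves that case open.

```python
ALPHA = 0.05  # significance level chosen in Step 2

def decide(p_value, alpha=ALPHA):
    """Apply the decision rule to a single p-value (hypothetical helper)."""
    if p_value < alpha:
        return "reject H0"
    # Assumption: p_value == alpha is treated as not significant
    return "fail to reject H0"

print(decide(0.03))
print(decide(0.20))
```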
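If there are multiple independent variables, the same test is run for each coefficient: for variable Xi, the null hypothesis is that Beta_i = 0 with the other variables kept in the model, and the test follows as above. That is why we get one p-value per variable.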
If the p-value for a variable comes out to be > 0.05, there is no statistically significant relationship between that variable and the target in our data set. Hence, that variable is a candidate for dropping.
If you look at the p-values above, you can drop the variables with very high p-values.
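Here is a small sketch of that filtering step. The variable names and p-values in the dictionary are invented for illustration and do not come from any real model output; the helper `split_by_p_value` is hypothetical too.

```python
def split_by_p_value(p_values, alpha=0.05):
    """Split variable names into (keep, drop) lists based on their p-values."""
    keep = [name for name, p in p_values.items() if p < alpha]
    drop = [name for name, p in p_values.items() if p >= alpha]
    return keep, drop

# Invented p-values, one per variable, as a regression summary would list them
p_values = {"x1": 0.001, "x2": 0.470, "x3": 0.030}

keep, drop = split_by_p_value(p_values)
print("keep:", keep)
print("drop:", drop)
```

After dropping variables it is worth refitting the model, because the remaining p-values can change once a variable is removed.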
Happy Learning!
🙂