Topic modelling

Topic modelling is an unsupervised machine learning technique that clusters documents (text) according to the topics they deal with. Let's go through topic modelling step by step using a text data set. We will be using articles from NPR (National Public Radio), obtained from their website, www.npr.org. Intuition: How would... Continue Reading →

Stationarity in time series data – What, Why and How

Before stationarity, let's understand white noise. What is white noise? A white noise series is a sequence of random numbers that cannot be predicted. More formally, a series is white noise if the variables are independent and identically distributed with a mean of zero and the same variance. Each value has zero correlation with other values in... Continue Reading →
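
The definition above can be checked numerically. The following is a minimal sketch, not taken from the post itself: it assumes Gaussian noise with unit variance, an arbitrary seed, and a hypothetical helper named `lag_autocorrelation`. It draws independent values and confirms that the sample mean and the lag-1 autocorrelation are both close to zero.

```python
import random
import statistics


def lag_autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag (illustrative helper)."""
    mean = statistics.fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum(
        (series[t] - mean) * (series[t - lag] - mean)
        for t in range(lag, len(series))
    )
    return numer / denom


# Assumption: Gaussian draws with mean 0 and standard deviation 1; the seed is arbitrary.
rng = random.Random(42)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]

sample_mean = statistics.fmean(white_noise)      # should be near 0
sample_var = statistics.pvariance(white_noise)   # should be near 1
lag1 = lag_autocorrelation(white_noise, 1)       # should be near 0

print(f"mean={sample_mean:.3f} variance={sample_var:.3f} lag-1 autocorr={lag1:.3f}")
```

For a series that is not white noise, such as a trending or strongly seasonal one, the lag autocorrelations would typically sit well away from zero.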

Linear Regression – A detailed introduction

What is Regression? We use this term very often in Machine Learning and Statistics. What is the meaning of this term? It is a method used in statistics to determine the relation between one variable (dependent) and the other variable (independent). The literal meaning of regression is stepping back towards the average. So, where from... Continue Reading →
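
As a small illustration of relating a dependent variable to an independent one, here is a sketch of ordinary least squares for a single predictor. The function name `fit_line` and the toy data are assumptions made for this example and do not come from the post.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel in the ratio).
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Toy data lying exactly on y = 2x + 1 (illustrative values only).
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

slope, intercept = fit_line(x, y)
print(slope, intercept)
```

Because the fitted line always passes through the point of means, the intercept follows directly from the slope.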

Understanding SST – ANOVA

When population means are the same or similar: When the population means are the same or close together, SSW will be high relative to SSB; in other words, most of SST will go into SSW. Checking it in Python: see that SSW is larger than SSB. When population means are different: When population means are different, then... Continue Reading →
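
The decomposition SST = SSW + SSB can be sketched in plain Python as follows. This is not the post's own code: the helper `sums_of_squares` and both sets of groups are made-up values chosen only to contrast the two cases.

```python
def sums_of_squares(groups):
    """Return (SST, SSW, SSB) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Total variation: every observation against the grand mean.
    sst = sum((v - grand_mean) ** 2 for v in all_values)
    # Within-group variation: every observation against its own group mean.
    ssw = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    # Between-group variation: group means against the grand mean, weighted by size.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    return sst, ssw, ssb


# Hypothetical samples: identical group means versus clearly separated means.
similar = [[10, 12, 11, 13], [11, 12, 10, 13], [12, 11, 13, 10]]
different = [[10, 12, 11, 13], [20, 22, 21, 23], [30, 32, 31, 33]]

for name, groups in (("similar", similar), ("different", different)):
    sst, ssw, ssb = sums_of_squares(groups)
    print(f"{name}: SST={sst:.2f} SSW={ssw:.2f} SSB={ssb:.2f}")
```

With similar means nearly all of SST lands in SSW, while with well-separated means SSB dominates.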

Multicollinearity – A detailed Understanding

Multicollinearity: It is the existence of correlation among the predictor variables. "Multicollinearity is the occurrence of high intercorrelations among two or more independent variables in a multiple regression model." (Investopedia) Why is it a problem? Multicollinearity among independent variables will result in less reliable statistical inferences. Multicollinearity increases the variability in coefficients, making the estimates sensitive to... Continue Reading →
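
One way to spot correlated predictors is to look at their pairwise correlation. The sketch below uses synthetic data and a hand-written `pearson` helper, both assumptions for illustration. It also computes the variance inflation factor (VIF), a commonly used diagnostic that the excerpt above does not mention; with only two predictors it reduces to 1 / (1 - r²).

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    cov = sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))
    a_ss = sum((x - a_mean) ** 2 for x in a)
    b_ss = sum((y - b_mean) ** 2 for y in b)
    return cov / math.sqrt(a_ss * b_ss)


rng = random.Random(0)  # arbitrary seed for repeatability
x1 = [rng.gauss(0, 1) for _ in range(500)]
x2 = [v + rng.gauss(0, 0.1) for v in x1]    # nearly a copy of x1: collinear
x3 = [rng.gauss(0, 1) for _ in range(500)]  # generated independently of x1

r12 = pearson(x1, x2)
r13 = pearson(x1, x3)

# Two-predictor VIF: 1 / (1 - r^2). Large values signal inflated coefficient variance.
vif_12 = 1 / (1 - r12 ** 2)
vif_13 = 1 / (1 - r13 ** 2)

print(f"r(x1, x2)={r12:.3f} VIF={vif_12:.1f}")
print(f"r(x1, x3)={r13:.3f} VIF={vif_13:.2f}")
```

With more than two predictors, the VIF of a variable would instead come from the R² of regressing it on all the other predictors.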

ANOVA – Analysis of Variance: Introduction

To carry out a comparison of the means of several populations, we use Analysis of Variance. "ANOVA is a statistical method for determining the existence of differences among several population means." (Aczel and Sounderpandian) Why is it called Analysis of Variance? Though we are comparing different population means for difference, the technique requires the analysis of different forms... Continue Reading →

Confidence Interval

If you want to make an inference about a population parameter from a sample statistic, then you use this concept. Let's say we want to find the average cost of PGs all over HSR Layout. Instead of going to every PG (population) and finding the mean price, we pick a sample randomly and find the... Continue Reading →
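
A rough sketch of the idea: estimate the population mean from a random sample and attach an interval to it. Everything here is an assumption for illustration. The prices are synthetic draws, not real PG rents, and the interval uses the normal approximation (z = 1.96 for roughly 95% coverage); for small samples a t-based interval would be the more usual choice.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Approximate confidence interval for the mean using a normal critical value."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean, based on the sample standard deviation.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * std_err, mean + z * std_err


# Synthetic monthly prices (hypothetical): 40 draws around 9000 with spread 1500.
rng = random.Random(7)
sample_prices = [rng.gauss(9000, 1500) for _ in range(40)]

lower, upper = mean_confidence_interval(sample_prices)
sample_mean = statistics.fmean(sample_prices)
print(f"sample mean={sample_mean:.0f}, approx. 95% CI=({lower:.0f}, {upper:.0f})")
```

A larger sample shrinks the standard error, so the interval narrows around the sample mean.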
