These standard deviations reflect the information in the response Y values (remember that these are averages), so in estimating a regression model we should downweight the observations with a large standard deviation and upweight the observations with a small standard deviation. In other words, we should use weighted least squares with weights that shrink as the standard deviation grows; the conventional choice is the inverse variance, $1/SD^2$. Note: the lowercase Greek letter sigma ($\sigma$) is used to represent the standard deviation of a population, while the letter $s$ is used to represent the standard deviation of a sample.

If all of the $x_i$ are drawn from the same distribution and the integer weights $w_i$ indicate frequency of occurrence in the sample, then the unbiased estimator of the weighted population variance is given by

$$s^2 = \frac{\sum_{i=1}^{N} w_i (x_i - \mu^*)^2}{\sum_{i=1}^{N} w_i - 1},$$

where $\mu^* = \sum_{i} w_i x_i / \sum_{i} w_i$ is the weighted mean. The standard deviation is simply the square root of the variance above. If all $x_i$ are unique, then $N$ counts the number of unique values, and $\sum_{i} w_i$ counts the number of samples.

For example, if a set of values containing repeats is drawn from the same distribution, then we can treat this set as an unweighted sample, or we can treat it as the weighted sample of its distinct values with the corresponding counts as weights, and we should get the same results.
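The equivalence between an unweighted sample and its frequency-weighted form can be checked numerically. The following is a minimal Python sketch, assuming the frequency-weight estimator divides the weighted sum of squared deviations by the total count minus one. The function names and the data values are illustrative choices, not taken from the text.

```python
from statistics import variance


def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def frequency_weighted_variance(values, weights):
    """Unbiased variance when integer weights are counts of occurrence.

    The weighted sum of squared deviations is divided by
    (sum of weights - 1), i.e. the total number of observations minus one.
    """
    total = sum(weights)
    if total <= 1:
        raise ValueError("need more than one observation in total")
    mu = weighted_mean(values, weights)
    squared_dev = sum(w * (x - mu) ** 2 for x, w in zip(values, weights))
    return squared_dev / (total - 1)


# Illustrative data (an assumption for this sketch): six observations with repeats
raw_sample = [2, 2, 4, 5, 5, 5]
# The same data written as distinct values with their counts
distinct_values = [2, 4, 5]
counts = [2, 1, 3]

unweighted = variance(raw_sample)  # ordinary N - 1 sample variance
weighted = frequency_weighted_variance(distinct_values, counts)
print(unweighted, weighted)  # the two numbers should agree up to rounding
```

Taking the square root of either result gives the corresponding standard deviation.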
The biased weighted sample variance is defined similarly to the normal biased sample variance:

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{N} w_i (x_i - \mu^*)^2}{V_1},$$

where $\mu^*$ is the weighted mean and $V_1 = \sum_{i=1}^{N} w_i$, which is 1 for normalized weights.

For small samples, it is customary to use an unbiased estimator for the population variance. In normal unweighted samples, the $N$ in the denominator (corresponding to the sample size) is changed to $N - 1$. While this is simple in unweighted samples, it is not straightforward when the sample is weighted. The unbiased estimator of a weighted population variance (assuming each $x_i$ is drawn from the same Gaussian distribution with variance $\sigma^2$) is given by:

$$s^2 = \frac{V_1}{V_1^2 - V_2} \sum_{i=1}^{N} w_i (x_i - \mu^*)^2,$$

where $V_2 = \sum_{i=1}^{N} w_i^2$. The degrees of freedom of the weighted, unbiased sample variance vary accordingly from $N - 1$ (all weights equal) down to 0 (a single weight dominates).
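As a rough illustration of the biased and unbiased weighted variances, the Python sketch below assumes that V1 is the sum of the weights, V2 is the sum of the squared weights, and the unbiased estimator applies the correction factor V1 / (V1^2 - V2), the standard form for fixed (non-random) weights. The function names and sample values are hypothetical.

```python
def weighted_variances(values, weights):
    """Return (biased, unbiased) weighted sample variances.

    biased   = sum(w_i * (x_i - mu)^2) / V1
    unbiased = sum(w_i * (x_i - mu)^2) * V1 / (V1^2 - V2)
    with V1 = sum(w_i) and V2 = sum(w_i^2).
    """
    v1 = sum(weights)
    v2 = sum(w * w for w in weights)
    denom = v1 * v1 - v2
    if denom <= 0:
        raise ValueError("weights leave zero degrees of freedom")
    mu = sum(w * x for x, w in zip(values, weights)) / v1
    squared_dev = sum(w * (x - mu) ** 2 for x, w in zip(values, weights))
    return squared_dev / v1, squared_dev * v1 / denom


def effective_dof(weights):
    """Effective degrees of freedom, V1^2 / V2 - 1.

    Equals N - 1 for equal weights and approaches 0 as one weight dominates.
    """
    v1 = sum(weights)
    v2 = sum(w * w for w in weights)
    return v1 * v1 / v2 - 1


# Hypothetical sample values
values = [1.0, 2.0, 4.0, 7.0]

# Equal weights: the unbiased result matches the usual N - 1 variance
equal = [1.0] * 4
biased_eq, unbiased_eq = weighted_variances(values, equal)

# Unequal weights: rescaling every weight leaves both estimates unchanged
uneven = [0.5, 1.0, 2.0, 0.5]
scaled = [10 * w for w in uneven]

print(biased_eq, unbiased_eq)
print(weighted_variances(values, uneven), weighted_variances(values, scaled))
print(effective_dof(equal), effective_dof(uneven))
```

Because both estimates are invariant to a common rescaling of the weights, normalizing the weights so that V1 = 1 is a convenience rather than a requirement.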
Typically, when a mean is calculated it is important to know the variance and standard deviation about that mean. When a weighted mean $\mu^*$ is used, the variance of the weighted sample is different from the variance of the unweighted sample.