
Statistics simplified – Part 2

Naveen Narayanan
February 18, 2013

Hello friends, in our last session we discussed basic statistics and the statistical concepts that give an overview of the subject. Let’s move on and understand some more about this interesting subject.

Statistical Inference

  • Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.
  • We use statistics (quantities computed from a sample) to make inferences about parameters (the corresponding quantities of the population).
  • Thus, we can apply what we know about a sample to the larger population from which it was drawn!

Rationale

  • Large populations make investigating each unit impractical and expensive. It is easier and cheaper to take a sample and make estimates about the population from the sample.
  • However, such conclusions and estimates are not always going to be correct. For this reason, we build “measures of reliability” into statistical inference, namely the confidence level and the significance level.

Confidence and Significance Levels

  • The confidence level is the proportion of times that an estimating procedure will be correct.
    E.g. a confidence level of 95% means that estimates based on this form of statistical inference will be correct 95% of the time.
  • When the purpose of the statistical inference is to draw a conclusion about a population, the significance level measures how often the conclusion is likely to be wrong in the long run.
    E.g. a 5% significance level means that, in the long run, this type of conclusion will be wrong 5% of the time.
  • We use α (the Greek letter “alpha”) to represent the significance level; the confidence level is then 1 − α.
    This relationship can also be stated as: Confidence Level + Significance Level = 1
  • Consider a statement from polling data you may hear about in the news:
    “This poll is considered accurate within 3.4 percentage points, 19 times out of 20.”
    In this case, our confidence level is 95% (19/20 = 0.95), while our significance level is 5%.
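The poll statement can be checked with a few lines of Python. This is only a minimal sketch, not the pollster's actual method: it assumes a simple random sample, the normal approximation for a sample proportion, and the worst-case proportion p = 0.5. The function names and the sample size of 831 are illustrative assumptions and do not come from the poll itself.

```python
from math import sqrt
from statistics import NormalDist


def confidence_level(alpha: float) -> float:
    """Confidence level implied by a significance level alpha."""
    return 1.0 - alpha


def margin_of_error(n: int, confidence: float = 0.95, p: float = 0.5) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation.
    """
    alpha = 1.0 - confidence
    # Two-sided critical value: alpha is split equally between both tails.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * sqrt(p * (1.0 - p) / n)


# "19 times out of 20" is a 95% confidence level, i.e. alpha = 0.05.
print(confidence_level(0.05))

# Hypothetical sample size: under the assumptions above, about 831
# respondents give a margin of error close to 3.4 percentage points.
print(round(100 * margin_of_error(831), 1))
```

Under these assumptions, a larger sample shrinks the margin of error, while asking for a higher confidence level widens it.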

Confidence Interval

A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.

If independent samples are taken repeatedly from the same population, and a confidence interval calculated for each sample, then a certain percentage (confidence level) of the intervals will likely include the unknown population parameter. Confidence intervals are usually calculated so that this percentage is 95%, but we can produce 90%, 99%, 99.9% (or whatever) confidence intervals for the unknown parameter.
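The repeated-sampling idea can be illustrated with a small simulation. The sketch below is an assumption-laden example rather than a general recipe: it draws samples from a normal population whose mean and standard deviation are chosen arbitrarily, treats the standard deviation as known, and counts how often the interval around the sample mean contains the true mean. The function name `coverage` and all default values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(trials: int = 2000, n: int = 30, confidence: float = 0.95,
             mu: float = 50.0, sigma: float = 10.0, seed: int = 1) -> float:
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)  # seeded so that the run is repeatable
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sigma / sqrt(n)  # sigma is treated as known here
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / trials


# The printed fraction should be close to the chosen confidence level.
print(coverage())
print(coverage(confidence=0.90))
```

Any single interval either contains the true mean or it does not; the confidence level describes how the procedure behaves over many samples.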

The width of the confidence interval gives us some idea about how uncertain we are about the unknown parameter. A very wide interval may indicate that more data should be collected before anything very definite can be said about the parameter.
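To see why collecting more data narrows the interval, consider the simplest case of a mean with a known population standard deviation, where the width is 2 · z · σ / √n. The sketch below assumes that setting; the function name and the numbers are purely illustrative.

```python
from math import sqrt
from statistics import NormalDist


def interval_width(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Width of a z-based confidence interval for a mean (sigma known)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return 2.0 * z * sigma / sqrt(n)


# Hypothetical standard deviation of 10: each fourfold increase in the
# sample size halves the width of the interval.
for n in (25, 100, 400):
    print(n, round(interval_width(10.0, n), 2))
```

Because the width shrinks with the square root of n, halving the uncertainty requires about four times as much data.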

Confidence intervals are more informative than the simple results of hypothesis tests (where we decide “reject H0” or “don’t reject H0”) since they provide a range of probable values for the unknown parameter.

Confidence Interval for the Difference Between Two Means

A confidence interval for the difference between two means specifies a range of values within which the difference between the means of the two populations may lie. These intervals may be calculated by, for example, a producer who wishes to estimate the difference in mean daily output from two machines; a medical researcher who wishes to estimate the difference in mean response by patients who are receiving two different drugs; etc.

The confidence interval for the difference between two means contains all the values of µ1 – µ2 (the difference between the two population means) that would not be rejected in the two-sided hypothesis test of

H0: µ1 = µ2

against

Ha: µ1 not equal to µ2

i.e.

H0: µ1 – µ2 = 0

against

Ha: µ1 – µ2 not equal to 0

If the confidence interval includes 0, we can say that there is no significant difference between the means of the two populations at the given level of confidence.

The width of the confidence interval gives us some idea about how uncertain we are about the difference in the means. A very wide interval may indicate that more data should be collected before anything definite can be said.

We calculate these intervals for different confidence levels, depending on how confident we want to be; a higher confidence level gives a wider interval. We interpret an interval calculated at a 95% level as follows: we are 95% confident that the interval contains the true difference between the two population means. Equivalently, 95% of all confidence intervals formed in this manner (from different samples of the population) will include the true difference.
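As a rough illustration of the two-machine example, the sketch below computes a confidence interval for µ1 – µ2 from summary statistics. It assumes two independent samples that are large enough for the normal approximation; with small samples a t-based interval would normally be used instead. The function names and the machine figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Large-sample confidence interval for the difference mu1 - mu2."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # Standard error of the difference between two independent sample means.
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    diff = mean1 - mean2
    return diff - z * se, diff + z * se


def includes_zero(interval):
    """True if 0 lies inside the interval (no significant difference)."""
    low, high = interval
    return low <= 0.0 <= high


# Hypothetical mean daily output of two machines, 100 days each.
ci = diff_means_ci(mean1=55.0, sd1=4.0, n1=100, mean2=54.0, sd2=5.0, n2=100)
print(ci, includes_zero(ci))
```

If the interval contains 0, H0: µ1 – µ2 = 0 is not rejected at the matching significance level; if it lies entirely on one side of 0, H0 is rejected.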

So friends, in this post I covered confidence levels, significance levels, and confidence intervals. I hope you are enjoying the journey along with me.

In the next part, I will cover hypothesis testing, so enjoy reading!
