So far we have considered summary statistics calculated from what we have treated as populations. Often, however, we cannot obtain data from every member of a population. There may simply be too many people, or statistical techniques may offer a more efficient approach in which only a proportion of the population needs to be contacted or experimented upon. The part of the population that we use is called a sample. To illustrate some useful properties of samples, we shall use the alcohol.sav file to obtain a number of random samples of birth weights (birthwt). To start with, every pair will obtain 10 samples of 15 cases and find the mean and standard deviation of birthwt for each sample. With 6 computers in use, we should obtain 60 sample means and standard deviations. The tutor will then illustrate the properties of these 60 sample means and standard deviations. The exercise will then be repeated using 60 samples of size 30.
SPSS takes a random sample of whole cases, including all variables, so we must ensure that only cases with valid (non-missing) values of birthwt are included in the samples. To do this we have to delete from the file those cases whose value of birthwt is missing. A frequency table shows that there are 462 valid values of birthwt. To select the cases with a valid birthwt in SPSS, choose Data, Select cases..., If condition is satisfied, If..., and place ~missing(birthwt) in the box (the '~' symbol means 'not', giving 'not missing'; see Figure 19), then Continue. So that further selections can be made later, the cases without a valid birthwt must be deleted rather than filtered: click on the Deleted option in the Unselected Cases Are box, then OK. We now have a data file containing all 462 valid values of birthwt. Do not save this file as alcohol.sav, since that would permanently lose the data on other variables for cases where only birthwt was missing.
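The effect of selecting ~missing(birthwt) and deleting the unselected cases can be sketched in Python as below. This is only an illustration, not SPSS itself: the birth weights are hypothetical values in grams (not taken from alcohol.sav), `None` is assumed to mark a missing value, and `drop_missing` is a hypothetical helper name.

```python
# Hypothetical birth weights in grams; None marks a missing value.
# These numbers are illustrative only and are NOT taken from alcohol.sav.
birthwt = [3410, None, 2980, 3655, None, 3120]


def drop_missing(values):
    """Keep only valid (non-missing) values, mimicking Select cases with
    ~missing(birthwt) and Unselected Cases Are: Deleted."""
    return [v for v in values if v is not None]


valid = drop_missing(birthwt)
print(len(valid), valid)
```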
We can now obtain the first random sample of 15. In SPSS, choose Data, Select cases..., and return the Unselected Cases Are box to Filtered. (Filtered means that unselected cases are only temporarily excluded and can be included again; if they were Deleted, we would need to re-open the file to retrieve them.) Then choose Random sample of cases, Sample... (see Figure 20), Exactly, type 15 in the first box and, because the file now contains 462 cases, type 462 in the cases from the first box, then Continue, OK. Now obtain the mean and standard deviation of the 15 birthwt values in your sample. To draw the next random sample of 15, simply choose Data, Select cases... again; all the settings remain from before, so just click OK. When you have obtained your 10 sample means and standard deviations, give them to the tutor.
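A rough Python analogue of drawing "Exactly 15 cases from the first 462 cases" ten times is sketched here. It assumes a hypothetical stand-in population of 462 normally distributed values with mean 3263 g and standard deviation 554 g, since the real file is not reproduced here, and uses `random.sample` (sampling without replacement) in place of SPSS's own random selection. `statistics.stdev` uses the n − 1 divisor, as SPSS does for a sample standard deviation.

```python
import random
import statistics


def sample_mean_sd(values, n, rng):
    """Draw exactly n cases without replacement and return (mean, sd)."""
    sample = rng.sample(values, n)
    return statistics.mean(sample), statistics.stdev(sample)


# Hypothetical stand-in for the 462 valid birthwt values (grams).
rng = random.Random(1)
population = [rng.gauss(3263, 554) for _ in range(462)]

# Ten samples of 15, as each pair is asked to obtain.
results = [sample_mean_sd(population, 15, rng) for _ in range(10)]
for mean, sd in results:
    print(f"mean = {mean:.0f} g, sd = {sd:.0f} g")
```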
You should now repeat the previous procedure, with a sample size of 30.
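The class exercise for both sample sizes can be simulated as follows. This is a sketch under the same assumption of a hypothetical normal stand-in population of 462 values; `sd_of_sample_means` is a hypothetical helper. It compares the standard deviation of 60 sample means with S/√n. Because the samples are drawn without replacement from only 462 cases, the observed spread tends to be very slightly smaller than S/√n.

```python
import math
import random
import statistics


def sd_of_sample_means(population, n, num_samples, rng):
    """Standard deviation of num_samples means, each from a sample of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(num_samples)]
    return statistics.stdev(means)


# Hypothetical stand-in population of 462 birth weights (grams).
rng = random.Random(2)
population = [rng.gauss(3263, 554) for _ in range(462)]
S = statistics.pstdev(population)  # standard deviation of this stand-in population

for n in (15, 30):
    observed = sd_of_sample_means(population, n, 60, rng)
    print(f"n = {n}: SD of 60 means = {observed:.0f}, S/sqrt(n) = {S / math.sqrt(n):.0f}")
```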
Histograms of the 60 sample means should look similar to Figure 21, but with different means and standard deviations. The standard deviation of the means of samples of size 15 is greater than that of the means of samples of size 30. From Figure 12, you will recall that the population mean was 3263 g with a standard deviation of 554 g. You can see that the means of both sets of sample means are similar to this. If you now divide the population standard deviation (554) by $\sqrt{15}$, the square root of the first sample size, you should obtain a value (143) similar to the standard deviation of the 60 means of samples of size 15. Then if you divide the population standard deviation by $\sqrt{30}$, you should obtain a value (101) similar to the standard deviation of the 60 means of samples of size 30. Dividing the population standard deviation ($S$) by the square root of the sample size ($n$) gives a value known as the standard error of the sample mean.
Standard error (SE) of the sample mean:

$$\mathrm{SE} = \frac{S}{\sqrt{n}}$$
The standard error measures how accurately the sample mean estimates the population mean. It depends on the population standard deviation and the sample size: a larger sample gives a smaller standard error and hence a more accurate estimate of the population mean. Because the sample size appears under a square root, quadrupling the sample size halves the standard error.
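The arithmetic behind the values 143 and 101 quoted above can be checked with a few lines of Python. `standard_error` is a hypothetical helper; the figure 554 g is the population standard deviation given in the text.

```python
import math


def standard_error(sd, n):
    """Standard error of the sample mean: SE = S / sqrt(n)."""
    return sd / math.sqrt(n)


S = 554  # population SD of birthwt quoted in the text (grams)
for n in (15, 30, 60):
    print(f"n = {n}: SE = {standard_error(S, n):.0f}")
```

Quadrupling the sample size from 15 to 60 halves the standard error, which the last line of output reflects.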
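Just as about 95% of individual values in a normal distribution lie within about 2 standard deviations of the mean, 95% of the means of samples of the same size would lie within approximately 2 SEs (1.96 SEs, to be precise) either side of the population mean. In other words, the mean of a random sample of a given size has a probability of 0.95 of lying within 1.96 SEs of the population mean. Equivalently, for 95% of samples the interval from 1.96 SEs below to 1.96 SEs above the sample mean contains the population mean; this interval is called a 95% confidence interval.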
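As a sketch of the 1.96 SE rule, the code below computes the range expected to contain 95% of the means of samples of 15, using the population values from the text, and then checks by simulation how often the interval around a sample mean contains the population mean. The simulation assumes a hypothetical normal population with mean 3263 g and standard deviation 554 g and a known population standard deviation; `interval` is a hypothetical helper.

```python
import math
import random
import statistics

MU, S = 3263, 554  # population mean and SD quoted in the text (grams)


def interval(centre, sd, n, z=1.96):
    """Return (centre - z*SE, centre + z*SE) with SE = sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    return centre - z * se, centre + z * se


low, high = interval(MU, S, 15)
print(f"95% of means of samples of 15 expected in {low:.0f} to {high:.0f} g")

# Simulation: how often does the interval around a sample mean contain MU?
rng = random.Random(3)
trials = 10_000
covered = 0
for _ in range(trials):
    sample = [rng.gauss(MU, S) for _ in range(15)]
    lo, hi = interval(statistics.mean(sample), S, 15)
    covered += lo <= MU <= hi
print(f"proportion of intervals containing the population mean: {covered / trials:.3f}")
```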
It is often the case that we must use a sample to estimate both the population mean and the population standard deviation. In this case we cannot simply say that the population mean is likely to lie within 2 SEs of the sample mean, because the standard error must itself be estimated from the sample standard deviation, and the accuracy of that estimate depends on the sample size. As will be seen later, the sample size can be taken into account.
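When the population standard deviation is unknown, the standard error is usually estimated as s/√n, where s is the sample standard deviation. The sketch below illustrates why the sample size matters: it measures how much these estimates vary from sample to sample (relative to their average) for small and larger samples. The normal population with mean 3263 g and standard deviation 554 g is an assumption for illustration, and both helper names are hypothetical.

```python
import math
import random
import statistics

MU, S = 3263, 554  # assumed population values (grams), taken from the text


def estimated_standard_error(sample):
    """Estimate the SE of the mean as s / sqrt(n), s being the sample SD (n - 1 divisor)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def relative_spread_of_se(n, reps, rng):
    """Spread of the SE estimates over many samples, relative to their average."""
    estimates = [
        estimated_standard_error([rng.gauss(MU, S) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.stdev(estimates) / statistics.mean(estimates)


rng = random.Random(4)
for n in (5, 15, 30):
    print(f"n = {n}: true SE = {S / math.sqrt(n):.0f}, "
          f"relative spread of estimates = {relative_spread_of_se(n, 2000, rng):.2f}")
```

Smaller samples give much less reliable estimates of the standard error, which is why the sample size has to be allowed for.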