Sampling

So far we have calculated summary statistics from data that we have treated as populations. Often, however, we cannot obtain data from every single person in a population: there may simply be too many people, or statistical techniques may offer a more efficient approach in which only a proportion of the population needs to be contacted or to take part in an experiment. The part of the population that we use is called a sample. To illustrate some useful properties of samples, we shall use the alcohol.sav file to obtain a number of random samples of birth weights (birthwt). To start with, every pair will obtain 10 samples of 15 cases, and the mean and standard deviation of birthwt will be obtained from each sample. With 6 computers in use we should therefore obtain 60 sample means and standard deviations. The properties of these 60 sample means and standard deviations will then be illustrated by the tutor. The exercise will then be repeated using 60 samples of size 30.


Figure 19

SPSS takes a random sample of whole cases, including all variables, so we must ensure that only cases with valid (non-missing) values of birthwt can be sampled. To do this we have to delete from the file those cases for which birthwt is missing. By obtaining a frequency table you can see that there are 462 valid values of birthwt. To select the cases with a valid birthwt, in SPSS, choose Data, Select cases..., If condition is satisfied, If..., place ~missing (birthwt) in the box (the '~' symbol means 'not', giving 'not missing'; see Figure 19), then Continue. So that further selections can be made later, the cases without a valid birthwt must be deleted rather than just filtered: click on the Deleted option in the Unselected Cases Are box, then OK. We now have a data file containing only the 462 cases with valid values of birthwt. Do not save the file as alcohol.sav, since this would lose the data on other variables for the cases in which only birthwt was missing.
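
For readers who want to see the same idea outside SPSS, the following is a minimal Python sketch of removing cases with a missing birthwt before sampling. The records and their values are invented for illustration and are not taken from alcohol.sav; None is used here to stand in for an SPSS missing value.

```python
# Hypothetical records standing in for rows of alcohol.sav (values invented).
# None plays the role of an SPSS missing value.
cases = [
    {"id": 1, "birthwt": 3100},
    {"id": 2, "birthwt": None},
    {"id": 3, "birthwt": 3650},
    {"id": 4, "birthwt": None},
    {"id": 5, "birthwt": 2890},
]

# Rough equivalent of the condition ~missing (birthwt) with unselected
# cases deleted: keep only the cases whose birthwt is present.
valid_cases = [case for case in cases if case["birthwt"] is not None]

print(len(cases), "cases before,", len(valid_cases), "cases with a valid birthwt")
```

As in SPSS, the original list is left untouched here, which corresponds to not saving the reduced file over alcohol.sav.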

We can now obtain the first random sample of 15. In SPSS, choose Data, Select cases..., then return the Unselected Cases Are box to Filtered. (Filtered means that non-selected cases are only temporarily excluded and can be included again; if they were Deleted we would need to re-open the file to retrieve them.) Then choose Random sample of cases, Sample (see Figure 20), Exactly, type 15 in the first box and, because the whole file contains 462 cases, type 462 in the 'cases from the first' box, then Continue, OK. You should now obtain the mean and standard deviation of the 15 birthwt values in your sample. To choose the next random sample of 15 you need only choose Data, Select cases..., where you will find that all the previous settings remain, then OK. When you have obtained your 10 sample means and standard deviations, give them to the tutor.


Figure 20

You should now repeat the previous procedure, with a sample size of 30.
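
The whole class exercise can also be imitated in a few lines of Python. This is only a sketch under stated assumptions: the real birthwt values are not available here, so a stand-in population of 462 values is generated from a normal distribution with mean 3263 and standard deviation 554, and the function name draw_sample_stats is made up for this example.

```python
import random
import statistics


def draw_sample_stats(values, sample_size, n_samples, rng):
    """Return one (mean, sd) pair per random sample drawn without replacement."""
    results = []
    for _ in range(n_samples):
        sample = rng.sample(values, sample_size)
        results.append((statistics.mean(sample), statistics.stdev(sample)))
    return results


rng = random.Random(1)  # fixed seed so the run is repeatable

# Assumption: synthetic stand-in for the 462 valid birthwt values,
# NOT the real data from alcohol.sav.
population = [rng.gauss(3263, 554) for _ in range(462)]

stats_15 = draw_sample_stats(population, 15, 60, rng)  # 60 samples of 15
stats_30 = draw_sample_stats(population, 30, 60, rng)  # 60 samples of 30

for label, stats in (("n=15", stats_15), ("n=30", stats_30)):
    means = [mean for mean, _ in stats]
    print(label,
          "mean of sample means:", round(statistics.mean(means)),
          "SD of sample means:", round(statistics.stdev(means)))
```

Each call corresponds to the 60 sample means and standard deviations collected by the tutor for one sample size.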

Properties of the sample means


Figure 21

Histograms of the 60 sample means should look similar to Figure 21, but with different means and standard deviations. The standard deviation of the means taken with a sample size of 15 is greater than for those taken with a sample size of 30. From Figure 12, you will recall that the population mean was 3263 g with a standard deviation of 554 g. You can see that the means of both sets of sample means are similar to this. If you now divide the population standard deviation (554) by $\sqrt{15}$, the square root of the first sample size, you should have a value (143) similar to the standard deviation of the 60 means from samples of size 15. Then if you divide the population standard deviation by $\sqrt{30}$, you should have a value (101) similar to the standard deviation of the 60 means from samples of size 30. Dividing the population standard deviation (S) by the square root of the sample size (n) gives a value known as the standard error of the sample mean.

Standard error (SE) of the sample mean = $S / \sqrt{n}$

The standard error is a measure of how accurately the sample mean estimates the population mean. It is related to the population standard deviation and the sample size. A larger sample size will decrease the standard error and give a more accurate estimate of the population mean.
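
The arithmetic can be checked with a short Python sketch. It uses the population standard deviation of 554 quoted in the text; the function name standard_error is chosen for this example only.

```python
import math


def standard_error(pop_sd, n):
    """Standard error of the sample mean: population SD / sqrt(sample size)."""
    return pop_sd / math.sqrt(n)


S = 554  # population standard deviation of birthwt, as quoted in the text

se_15 = standard_error(S, 15)
se_30 = standard_error(S, 30)

print(round(se_15), round(se_30))  # 143 101
```

Doubling the sample size from 15 to 30 reduces the standard error by a factor of the square root of 2, not by a half.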

As with the standard deviation, 95% of sample means obtained with the same sample size would lie within approximately 2 SEs (1.96 SEs to be precise) either side of the population mean. This range is called a confidence interval. It means that the mean of a random sample of a given size has a probability of 0.95 of lying within this range.
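
As a worked illustration, the range of 1.96 SEs either side of the population mean can be computed for both sample sizes. The sketch below uses the population mean (3263) and standard deviation (554) quoted in the text; the function name sample_mean_range is an assumption made for this example.

```python
import math

POP_MEAN = 3263  # population mean of birthwt, as quoted in the text
POP_SD = 554     # population standard deviation, as quoted in the text


def sample_mean_range(pop_mean, pop_sd, n, z=1.96):
    """Range expected to contain 95% of sample means for samples of size n."""
    se = pop_sd / math.sqrt(n)
    return pop_mean - z * se, pop_mean + z * se


for n in (15, 30):
    low, high = sample_mean_range(POP_MEAN, POP_SD, n)
    print(f"n={n}: 95% of sample means expected between {low:.0f} and {high:.0f}")
```

The range for samples of 30 is narrower than the range for samples of 15, because the standard error is smaller.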

Properties of the sample standard deviations

The distributions of the standard deviations from each of your sample size groups should be similar to Figure 22. Both have means similar to the population standard deviation. That is to say, we can estimate the population standard deviation (S = 554) by using the sample standard deviation (s). We can also use the sample standard deviation (s) to estimate the standard error of the sample mean if we do not (as is usual) know the population standard deviation. It can also be seen (from the respective standard deviations in Figure 22) that the larger sample size provides a more accurate estimate of the population standard deviation.
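
A minimal sketch of estimating the standard error from a single sample follows. The 15 values are invented for illustration (they are not drawn from alcohol.sav), and grams are assumed as the unit.

```python
import math
import statistics

# Hypothetical sample of 15 birth weights, grams assumed (invented values).
sample = [3120, 2850, 3610, 3340, 2990, 3780, 3050, 3400,
          2700, 3550, 3230, 4010, 3180, 2940, 3460]

n = len(sample)
sample_mean = statistics.mean(sample)
s = statistics.stdev(sample)      # sample SD, using the n - 1 divisor
estimated_se = s / math.sqrt(n)   # s replaces the unknown population SD

print(f"n={n}, mean={sample_mean:.0f}, s={s:.0f}, estimated SE={estimated_se:.0f}")
```

Here s takes the place of S in the standard error formula, so the result is itself only an estimate of the standard error.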


Figure 22
 

It is often the case that we need to use a sample to estimate the population mean and standard deviation. In this case we cannot simply say that the population mean is likely to lie within 2 SEs of the sample mean, because the accuracy of the estimate of the population standard deviation, and thus of the standard error of the mean, depends on the sample size. As will be seen later, the sample size can be taken into account.

