Evaluation of Samples
A sample of 7,200 households was drawn. The goal of the research team was to retain at least 5,000 households over the life of the panel survey. As Table 1 shows, this goal was achieved. The response rate started at 88.8 percent, remained above 80 percent for three rounds, and dropped to 76 percent only in the fourth round. These rates are quite satisfactory by international standards and outstanding by Western European standards.
Within each household, data were solicited for all individuals residing there, including unmarried minors attending school elsewhere. Adults responded on behalf of children. As the bottom row of Table 1 indicates, individual data were obtained from 89 to 97 percent of individuals in participating households during each round. Since information was sought about all household members rather than a sample of them, the figures represent the population of adult individuals (not just of households) without any special weighting except for nonresponse.
Table 1. Household and Individual Response Rates
| | Round I | Round II | Round III | Round IV |
|---|---|---|---|---|
| Dates of Field Work | | | | |
| Households Out of 7,200 | | | | |
| Household Response Rate (percent) | 88.8 | | | 76 |
| Individuals in Participating Households | | | | |
| Individuals from Whom Individual Questionnaires Were Obtained | | | | |
| Individual Response Rate within Participating Households (percent) | | | | |
The sample can also be evaluated informally by comparing demographic statistics with corresponding parameters from the census. Bear in mind that a certain amount of demographic change has occurred since the census, so the census itself does not constitute a perfect standard.
The correspondence between household size in the census and in the Phase I sample was very good. For example, in urban areas the census reported that two-person households constituted 33.1 percent of all multiperson households; in the survey sample the percentage ranged from 32.7 to 34.8 percent over the four waves. In the census 3.4 percent of all urban multiperson households consisted of six or more members; in the survey sample from 2.9 to 3.7 percent. Only occasionally were deviations as high as two percentage points observed. The deviations were somewhat higher in the rural areas. However, such deviations may be attributed to the fact that the definition of a household in the Soviet census differed from our definition, and the care taken in distinguishing multiple households living in a single-family residence was not the same in the census as in the study.
Next, consider some comparisons based on individual rather than household data. For example, in the 1989 census males from 0 to 14 years of age living in urban areas constituted 8.37 percent of the total population; in Rounds I through IV of the RLMS the corresponding percentages were 7.96, 8.56, 8.16, and 8.71 percent, respectively, without using any corrective post-stratification weights. There were many similarly heartening comparisons. However, one can find some sharper deviations. For example, the total percentage of rural citizens was 23.2 percent in Round I rather than the 26.57 percent recorded in the 1989 census.
Also, consider the distribution of respondents by nationality (ethnicity). Remember that the Phase I sample was not designed to represent all ethnic groups, any more than the Current Population Survey or the General Social Survey in the U.S. was designed to represent, say, Vietnamese, Eskimos, or Chinese in San Francisco. In the 1989 census 81.5 percent of the population of the Russian Federation claimed to be ethnic Russian; in Round I of this sample the figure was 82.7 percent. In the census 3.8 percent were Tatar and in the survey sample 3.1 percent. In the census 3.0 percent were Ukrainian and in the survey sample 2.5 percent.
Similarly, consider the distribution of respondents’ education. In the 1989 census 6.5 percent of the Russian population age 15 and older had completed three or fewer years of schooling, while in the survey sample 5.0 percent had done so. In the census 27.4 percent had completed general secondary school and in the survey sample 24.6 percent had done so. In the census 11.3 percent claimed to have completed higher education, while the survey reported 14 percent. This upward educational bias is far smaller than is typically observed in nonprobability-based surveys of the Russian Federation.
While the above figures were generally encouraging, they concern only demographic variables. One of the most important variables to assess is total household income, illustrated here with data from Round III. Since inflation occurred during field work, ruble amounts were deflated to June 1992 levels: the mean total household income was 7,950 rubles and the standard deviation was 12,585 rubles. (Incidentally, inflated to December 1994 levels, these amounts would be 628,050 and 994,215 rubles, respectively.)
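As a minimal sketch of the deflation arithmetic above: the December 1994 figures imply an inflation factor of about 79 relative to June 1992. The factor itself is not stated in the text; it is derived here from the reported amounts purely for illustration.

```python
# Recover the implied June 1992 -> December 1994 inflation factor
# from the reported mean incomes (628,050 / 7,950 = 79.0).
mean_income_jun92 = 7_950    # mean total household income, rubles (Round III)
sd_income_jun92 = 12_585     # standard deviation, rubles

inflation_factor = 628_050 / 7_950   # derived factor, exactly 79.0

mean_income_dec94 = mean_income_jun92 * inflation_factor
sd_income_dec94 = sd_income_jun92 * inflation_factor

print(round(mean_income_dec94))  # 628050
print(round(sd_income_dec94))    # 994215
```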
Most statistical packages (and consequently most analysts) disregard sample design effects, which are not easily calculated. Moreover, design effects can vary for every variable in a questionnaire as well as for all composite variables. Nevertheless, given the small number of PSUs in this survey, it seemed necessary to perform the calculations to provide some assurance of the level of precision achieved. The results appear in Table 2.
Table 2. Design Effects for Total Household Income
| Number of PSUs | Deft (Square Root of Design Effect) | Standard Error in June 1992 Rubles | Size of 95 Percent Confidence Interval (percent) |
|---|---|---|---|
| 20 (actual design) | 3.16 | 534 | ±13.2 |
| 40 | | | ±9.7 |
| 60 | | | ±7.2 |
| Simple Random Sample | 1.00 | 169 | ±4.2 |
Had this been a simple random sample of 5,546 households from the entire population of households in the Russian Federation, the design effect would have been precisely 1.00 by definition (see the bottom row). Using the standard formulas, the standard error would have been 169 rubles, and the 95 percent confidence interval, expressed as a percentage of the mean household income, would have been ±4.2 percent (i.e., (1.96 × 169 rubles) / 7,950 rubles).
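The simple-random-sample arithmetic above can be sketched in a few lines, using the Round III mean (7,950 rubles) and standard deviation (12,585 rubles) reported earlier:

```python
import math

# Standard error and 95 percent confidence interval under
# simple random sampling, using the figures quoted in the text.
mean_income = 7_950   # rubles, June 1992 levels (Round III)
sd_income = 12_585    # rubles
n = 5_546             # households

se_srs = sd_income / math.sqrt(n)                      # standard error, ~169 rubles
ci_half_width_pct = 1.96 * se_srs / mean_income * 100  # half-width as % of the mean

print(round(se_srs))               # 169
print(round(ci_half_width_pct, 1)) # 4.2
```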
All national samples were stratified and clustered to cut costs. The convenience and savings exacted a toll: the confidence interval around the results (or the standard deviation of the results) became larger, i.e., precision decreased. This decrease is measured with the design effect (DEFF), or with its square root (DEFT). In this survey, the design effect for total household income was about 9.975 based on data from Rounds I and III; the square root was 3.16 (see the top row). In other words, the standard error (534 rubles) was 3.16 times as large as it would have been had we obtained these results from a simple random sample. Consequently, precision decreased to ±13.2 percent. As the table reveals, had we employed 40 rather than 20 PSUs, we would have achieved an estimated precision of ±9.7 percent; had we employed 60 PSUs and kept the same sample size, ±7.2 percent. Such a clustered design constitutes a more reasonable point of comparison than a simple random sample, since no simple random sample of a large country is feasible.
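As a sketch, the relationships among DEFT, DEFF, and the widened confidence interval work out as follows; all inputs are figures quoted in the text, and small discrepancies against the reported 9.975 reflect rounding of the published standard errors.

```python
# Relate the clustered and simple-random-sample standard errors
# to the design effect and the confidence-interval width.
se_srs = 169          # standard error under simple random sampling, rubles
se_clustered = 534    # standard error under the actual 20-PSU design, rubles
mean_income = 7_950   # mean total household income, rubles

deft = se_clustered / se_srs   # ratio of standard errors, ~3.16
deff = deft ** 2               # ~10; reported as about 9.975 from unrounded inputs
ci_pct = 1.96 * se_clustered / mean_income * 100   # ~13.2 percent
```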
In conclusion, keep in mind that sampling error is only one of several kinds of error that can taint survey results. Also important are the quality of the questionnaire, the interviewers’ training and fieldwork, and the data entry and cleaning processes. All of these processes were handled in new ways in the RLMS Phase I project conducted with Goskomstat. Although we in fact pushed for a larger number of PSUs, in hindsight we recognize that we were perhaps unwittingly fortunate that Goskomstat’s resources limited the number of PSUs to 20. This limit allowed us to concentrate sufficiently on the somewhat unmeasurable nonsampling aspects of quality while giving up a tolerable and quantifiable amount of sampling precision.
For a detailed review of the Phase II sample, see the report Sample Attrition, Replenishment, and Weighting in Rounds V-VII. The household response rate exceeded 80 percent. In both Rounds V and VI, individual questionnaires were obtained from over 97 percent of the people listed on the household rosters. Response rates did vary across PSUs depending on the proportion of households in rural areas. However, because we anticipated this variation when over-sampling, the actual proportion of completed household interviews in each stratum compared well with that stratum’s proportion of the population. Most entries differed by less than 0.004; the largest exception was St. Petersburg (0.0294 rather than 0.0355).
The distribution of household size in the sample, within both rural and urban localities, corresponded well to the figures from the 1989 census. Bear in mind that single-member households are excluded from the comparison because the census includes many institutionalized people, while our sample explicitly excluded them. Thus, for such households there was no valid basis for comparison.
The multivariate distribution of the sample by sex, age, and urban-rural location compared quite well with the corresponding multivariate distribution of the 1989 census. Of course, because of random sampling error and changes in the distribution since the 1989 census, we would not expect perfect correspondence. Nevertheless, there was usually a difference of only one percentage point or less between the two distributions.
Another way to evaluate the adequacy (or efficiency) of the sample is to examine design effects. One of the important factors determining the precision of estimates in multistage samples is the mean ultimate cluster (PSU) size: all else being equal, the larger the cluster size, the less precise the results. In Phase I of the RLMS, the average cluster size approached 360, a large number dictated by constraints imposed by our collaborators. Thus, although the sample size hovered around 6,000 households, precision was less than we would have liked for a sample of that size.
In the Phase II sample, the situation was considerably better. Although there were only 4,000 households, the mean cluster size was much smaller than in the Phase I sample. There were 35 PSUs with about 100 households each; even this level was an improvement over the average of 360 in the Phase I design. Moreover, in the three self-representing areas, the respondents were drawn from 61 PSUs. Thus, the mean cluster size in the entire sample was about 42, i.e., 4,000/(35+61). Given these much smaller cluster sizes, we had reason to expect that precision in this survey would be as good as it was in Phase I, despite the smaller sample size. This, in fact, turned out to be the case in Round V, the first round of Phase II. The mean total household income was 510,146 rubles, with a 95 percent confidence interval of ±65,950 rubles (i.e., ±12.9 percent).
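The Phase II cluster-size and precision figures cited above can be checked with the same kind of back-of-the-envelope arithmetic:

```python
# Mean ultimate cluster size and relative confidence-interval width
# for the Phase II sample, from figures quoted in the text.
households = 4_000
psus = 35 + 61                           # general PSUs plus self-representing-area PSUs
mean_cluster_size = households / psus    # ~41.7, i.e., about 42

mean_income_round5 = 510_146             # mean total household income, rubles
ci_half_width = 65_950                   # 95 percent CI half-width, rubles
ci_pct = ci_half_width / mean_income_round5 * 100   # ~12.9 percent
```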