Even in the best circumstances, drawing a good sample of an entire country is a daunting exercise. Russia presents some of the most challenging circumstances: the territory is vast (spanning 11 time zones and covering more than one-tenth of the land mass of the world), the population is ethnically heterogeneous, and the residential patterns are complex. For example, a large fraction of the population–up to 10 or 12 percent–lives in dormitories or communal apartments. Many of the census statistics that Western samplers take for granted are inaccessible or nonexistent.
Compounding these problems is the fact that survey research has a weak tradition in Russia. For ideological reasons, social research was severely restricted until recently. Many powerful local authorities prohibited surveys of the adult population in their regions out of fear that surveys would bring problems to the attention of their superiors. Thus, despite Russia’s strong tradition in mathematical statistics, virtually no one has had the opportunity to conduct surveys. Consequently, with the exception of so-called “micro-censuses” conducted periodically by Goskomstat to update census results, there have been no probability samples of large areas of Russia (or the Soviet Union) until recently. Since large-scale independent surveys became permissible in 1989, many organizations have claimed to have drawn representative samples of Russia. Unfortunately, sample documentation is invariably inadequate and quotas are often invoked in the attempt to make the demographic characteristics of the sample correspond to those of the most recent census, under the mistaken impression that such correspondence attests to the high quality of the sample. The RLMS represents the first nationally representative random sample for Russia, albeit a highly clustered one.
Phase I (Rounds I – IV)
The goal was to develop a sample of households (excluding institutionalized people) that would meet accepted scientific standards of a true probability sample to the greatest extent possible, while taking into account the severe operational constraints of Goskomstat. With the advice of William Kalsbeek [a sampling expert at the University of North Carolina at Chapel Hill (UNC-CH)] and later with help from Leslie Kish, the project developed a replicated three-stage stratified cluster sample of residential addresses, excluding military, penal, and other institutionalized populations. Replication was designated for Stage 1 of sampling so that the number of primary sampling units (PSUs) could be kept manageable, with the understanding that later they would be expanded. The sample size of each replicate was set at 20 PSUs. The quality of this sample was statistically analyzed.
From the outset, the sampling team thought an ideal sample would include more than 20 PSUs. Given the logistical problems Goskomstat had to overcome, however, 20 PSUs turned out to be the maximum number that could be done well. Doubling the number of PSUs would have degraded other equally important determinants of quality. There was a high degree of clustering, the effects of which are well known. Using a relatively small number of PSUs enlarged standard errors, although it did not bias results. However, the sample was not as inefficient as it might seem. Stratification was employed in the selection of PSUs, and this stratification took advantage of considerable unpublished data from Goskomstat. The 20 PSUs embraced as much variability as possible, far more than would have been captured in a simple random sample of regions (in Russia the term for regions is “raions”). Also, clustering provided important research benefits, for it may have actually enhanced potential ancillary data collection (e.g., media monitoring) as well as comparative analyses of different sites.
In sampling Stage 1, the 2,335 official regions (raions) were implicitly stratified according to 10 quality-of-life regions and percent urban. Probability-proportional-to-size (PPS) sampling was then used to select PSUs. Moscow and St. Petersburg were selected with certainty as self-representing units, as is commonplace in such surveys.
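The Stage 1 procedure (sort the raions so that similar units sit next to each other, then step through the cumulative size measures at a fixed interval) can be sketched as follows. This is a minimal illustration of systematic PPS selection in general, not the program the project actually used; the function name `pps_systematic`, the argument `u`, and the size figures are hypothetical.

```python
import random

def pps_systematic(sizes, n, u):
    """Systematic PPS selection of n units from an ordered list.

    sizes: size measures (e.g., populations), already in the
           implicit-stratification sort order.
    u:     a random fraction in [0, 1) that fixes the random start.
    Returns the list positions of the selected units.
    """
    total = sum(sizes)
    interval = total / n  # sampling interval on the cumulative size scale
    points = [(u + k) * interval for k in range(n)]

    selected = []
    cumulative = 0.0  # combined size of all units before position idx
    idx = 0
    for p in points:
        # Advance until the selection point falls inside the current unit.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= p:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected

# Hypothetical size measures for seven units, sorted by the stratifiers.
example_sizes = [10, 20, 30, 40, 100, 50, 50]
example_pick = pps_systematic(example_sizes, 2, random.random())
```

A unit whose size measure exceeds the interval is certain to be hit at least once, which is why very large units are normally set aside as self-representing before this step.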
In sampling Stage 2, voting districts within each PSU were ordered according to size (and, in cities, according to relation to the city center), and PPS procedures were used to select 10 districts in each PSU. This approach yielded 200 secondary sampling units (SSUs).
In sampling Stage 3, a list of all household addresses in each SSU was compiled, where “household” was defined as a group of people living together and sharing income and expenditures. Adjustments were made to take into account single addresses at which several households lived (e.g., adult dormitories and communal apartments). Using an appropriate interval and a randomly selected starting point in the list of households, 36 households were selected systematically in each of the 200 SSUs, yielding a sample of 7,200 households. This number was selected with the expectation that, even after four rounds, at least 5,000 households would remain in the study. In fact, the project achieved an even higher response rate.
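As a rough sketch of the Stage 3 selection, the snippet below draws 36 households from one address list using a fixed interval and a random start, and checks the stage counts quoted above. The function name, the list of 900 addresses, and the seed are hypothetical; the source does not give the actual list sizes or intervals.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick n entries from an ordered list using a fixed interval.

    Assumes len(frame) >= n, so the interval is at least 1 and no
    entry can be picked twice.
    """
    interval = len(frame) / n
    start = rng.random() * interval  # random start within the first interval
    picks = []
    for k in range(n):
        position = min(int(start + k * interval), len(frame) - 1)
        picks.append(frame[position])
    return picks

# Design counts taken from the text.
PSUS, SSUS_PER_PSU, HOUSEHOLDS_PER_SSU = 20, 10, 36
total_ssus = PSUS * SSUS_PER_PSU                    # 200
total_households = total_ssus * HOUSEHOLDS_PER_SSU  # 7,200

# Hypothetical address list for a single SSU.
address_list = [f"household-{i:04d}" for i in range(900)]
sample = systematic_sample(address_list, HOUSEHOLDS_PER_SSU, random.Random(1))
```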
Of the 7,200 targeted households, 6,334 provided data for Round I (17,154 individuals, of whom 4,148 were age 55 and older), representing a response rate of 88.8 percent. An additional 40 households (or less than 1 percent of the sample) refused to participate in Round II interviews, while a number of Round I refusals agreed to be surveyed for Round II. Several approaches were used to reduce subsequent loss-to-follow-up (including providing honoraria to respondents and training interviewers to be courteous and respectful).
Rounds II through IV of the RLMS returned to the same addresses. Choosing to follow addresses had significant practical advantages. In particular, since interviewers had no discretion in the selection of households to be part of the survey, they could not reduce their workload by claiming that households were lost to follow-up. Also, independent checks of interviewers’ work were easier to institute.
Preliminary Results on Sampling
It is instructive to compare the demographic attributes of Phase I of the RLMS sample of individuals with those of the 1989 Russian census. In Table 1, the age distribution values of the RLMS sample compared favorably to those determined by the Soviet census four years earlier. Gender and education almost matched those found in the census.
Table 1. Age Distribution
|1989 Census (percent)
|RLMS Sample 3.5 Years Later (percent)
The distribution of respondents by nationality is shown in Table 2; note that the RLMS sample was not designed to represent all ethnic groups. It was statistically unlikely that small ethnic groups (of which there are more than 100 in Russia) would be proportionally represented in a national survey that was not designed specifically to represent them. It was likely, however, that large national groups would be proportionally represented, which was precisely what was observed for the distribution of respondents across the six largest nationality groups.
Table 2. Nationality (Ethnicity)
|1989 Census (percent)
|RLMS Sample 3.5 Years Later (percent)
From the outset, nationality (ethnicity) was considered a very salient issue. Indeed, for Round I, a battery of 15 questions about ethnic identity and language, fashioned after those in the Soviet Interview Project, was proposed. This battery of questions was vigorously opposed by Goskomstat officials who considered it too sensitive for a government survey, especially since the survey contained a PSU in a territory threatening secession. Rather than risk the entire survey over this issue, the project settled for a single question on ethnic identity that allowed respondents to volunteer more than one nationality if they wished. More detailed questions on ethnicity were asked in later rounds of the survey.
Based on census figures on nationality and language, the project anticipated some problems with delivering interviews only in Russian. The project was prepared to develop multiple language questionnaires, but colleagues in three Russian institutions insisted that it would be unnecessary. This proved to be right. In the first place, the two PSUs that fell in areas with high concentrations of non-Russians were known to have high concentrations of Russian speakers. Second, the questionnaire consisted primarily of questions of fact with lists of everyday nouns, not of opinion items where nuances make a crucial difference. Therefore, Russian-as-a-second-language was quite sufficient. The project acknowledged that, if the sample were expanded to represent ethnic enclaves, the questionnaire language would surely pose a greater problem than it did presently.
A related issue concerns the reaction of Muslim women to the abortion, pregnancy history, and other sensitive sections of the survey. It is important to note that Muslim women in Russia do not necessarily react as do women in places where Islam has been freely practiced. In fact, there was no appreciable difference in the willingness of these women to continue in the study, and the response rate for questions was the same. The overall loss-to-follow-up rate between Rounds I and II was 5.6 percent. This value was based on the total set of families who were interviewed in each oblast and the number interviewed who were included in the original Round I sample. The rates for the two oblasts with large Muslim populations, Kazan and Nal’chik, are provided as an example: Kazan had a loss-to-follow-up rate of 3.4 percent and Nal’chik had a rate of 1.9 percent.
Phase II (Rounds V – XXIII)
In Phase II of the RLMS, a multi-stage probability sample was employed. Please refer to the March 1997 review of the Phase II sample. First, a list of 2,029 consolidated raions was created to serve as PSUs. These were allocated into 38 strata based largely on geographical factors and level of urbanization but also based on ethnicity where there was salient variability. As in many national surveys involving face-to-face interviews, some remote areas were eliminated to contain costs; also, Chechnya was eliminated because of armed conflict. From among the remaining 1,850 raions (containing 95.6 percent of the population), three very large population units were selected with certainty: Moscow city, Moscow Oblast, and St. Petersburg city constituted self-representing (SR) strata. The remaining non-self-representing raions (NSR) were allocated to 35 equal-sized strata. One raion was then selected from each NSR stratum using the method “probability proportional to size” (PPS). That is, the probability that a raion in a given NSR stratum was selected was directly proportional to its measure of population size.
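The selection rule for the NSR strata (one raion per stratum, with probability proportional to its population) can be illustrated with a short sketch. The raion names and populations below are invented for illustration, and `random.choices` is used here simply as a convenient weighted draw; the source does not say how the draw was implemented.

```python
import random

def selection_probabilities(populations):
    """Probability of selecting each raion when exactly one is drawn by PPS."""
    total = sum(populations.values())
    return {name: pop / total for name, pop in populations.items()}

def draw_one_pps(populations, rng):
    """Draw a single raion with probability proportional to its population."""
    names = list(populations)
    weights = [populations[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Hypothetical stratum of three raions.
stratum = {"raion_a": 100_000, "raion_b": 300_000, "raion_c": 600_000}
probs = selection_probabilities(stratum)   # 0.1, 0.3, 0.6
chosen = draw_one_pps(stratum, random.Random(2024))
```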
The NSR strata were designed to have approximately equal sizes to improve the efficiency of estimates. The target population (omitting the deliberate exclusions described above) totaled over 140 million inhabitants. Ideally, one would use the population of eligible households, not the population of individuals. As is often the case, we were obliged to use figures on the population of individuals as a surrogate because of the unavailability of household figures in various regions.
Although the target sample size was set at 4,000, the number of households drawn into the sample was inflated to 4,718 to allow for a nonresponse rate of approximately 15 percent. The number of households drawn from each of the NSR strata was approximately equal (averaging 108), since the strata were of approximately equal size and PPS was employed to draw the PSUs in each one. However, because response rates were expected to be lower in urban areas than in rural areas, the extent of over-sampling varied. This variation accounted for the differences in households drawn across the NSR PSUs. It also accounted for the fact that 940 households were drawn in the three SR strata, more than the 14.6 percent (i.e., 689) that would have been allotted based on strict proportionality.
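The figures in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the text; the variable names are illustrative.

```python
target_households = 4000   # target number of completed households
drawn_total = 4718         # households actually drawn
drawn_sr = 940             # drawn in the three self-representing strata
nsr_strata = 35            # number of non-self-representing strata
sr_population_share = 0.146

# Nonresponse rate implied by inflating 4,000 to 4,718 (about 15 percent).
implied_nonresponse = 1 - target_households / drawn_total

# Average draw per NSR stratum (about 108).
average_nsr_draw = (drawn_total - drawn_sr) / nsr_strata

# SR allotment under strict proportionality (about 689).
proportional_sr_draw = sr_population_share * drawn_total

print(f"{implied_nonresponse:.3f} {average_nsr_draw:.1f} {proportional_sr_draw:.1f}")
```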
Since there was no consolidated list of households or dwellings in any of the 38 selected PSUs, an intermediate stage of selection was then introduced, as usual. Professional samplers will recognize that this is actually the first stage of selection in the three SR strata, since those units were selected with certainty. That is, technically, in Moscow, St. Petersburg, and Moscow oblast, the census enumeration districts were the PSUs. However, it was cumbersome to keep making this distinction throughout the description, and researchers followed the normal practice of using the terms “PSU” and “SSU” loosely. Needless to say, in the calculation of design effects, where the distinction is critical, the proper distinction was maintained. The selection of second-stage units (SSUs) differed depending on whether the population was urban (located in cities and “villages of the city type,” known as “PGTs”) or rural (located in villages). That is, within each selected PSU the population was stratified into urban and rural substrata, and the target sample size was allocated proportionately to the two substrata. For example, if 40 percent of the population in a given region was rural, 40 of the 100 households allotted to the stratum were drawn from villages.
In rural areas of the selected PSUs, a list of all villages was compiled to serve as SSUs. The list was ordered by size and (where salient) by ethnic composition. PPS was employed to select one village for each 10 households allocated to the rural substratum. Again, under the standard principles of PPS, once the required number of villages was selected, an equal number of households in the sample (10) was allocated to each village. Since villages maintain very reliable lists of households, in each selected village the 10 households were selected systematically from the household list. In a few cases, villages were judged to be too small to sustain independent interviews with 10 households; in such cases, three or four tiny villages were treated as a single SSU for sampling purposes.
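Taking the 40 percent rural example, the allocation to substrata and the resulting number of second-stage units might be computed as below. The function name is hypothetical, and the use of rounding and integer division for allocations that are not multiples of 10 is an assumption; the source does not say how remainders were handled.

```python
def allocate_substrata(households, rural_share, per_ssu=10):
    """Split a PSU's allotment between rural and urban substrata.

    Returns household counts per substratum and the number of SSUs
    needed at per_ssu households each (assumption: any remainder is
    dropped by integer division).
    """
    rural = round(households * rural_share)
    urban = households - rural
    return {
        "rural_households": rural,
        "urban_households": urban,
        "villages": rural // per_ssu,
        "urban_ssus": urban // per_ssu,
    }

# 100 households allotted, 40 percent of the population rural.
plan = allocate_substrata(100, 0.40)
# plan["rural_households"] is 40, so 4 villages of 10 households each.
```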
In urban areas, SSUs were defined by the boundaries of 1989 census enumeration districts, if possible. If the necessary information was not available, 1994 microcensus enumeration districts, voting districts, or residential postal zones were employed–in decreasing order of preference. Since census enumeration districts were originally designed to be roughly equal in population size, one district was selected systematically without using PPS for each 10 households required in the sample. In the few cases where postal zones were used, one zone was likewise selected systematically for each 10 households. However, where voting districts were used, to compensate for the marked variation in population size, PPS was employed to select one voting district for each 10 households required in the urban sub-stratum.
Given the lack of reliable official lists of households within the urban SSUs, we were obliged to develop the list of households from which 10 households were selected. First, a list of dwellings was made. Where more than one household was known to exist within a single dwelling (that is, in the communal apartments and enterprise dormitories that are relatively commonplace in the Russian Federation), the list was amended so that each household (or space within the dwelling) was enumerated in advance of selection. Then, the required number of households was drawn systematically, starting with a random selection in the first interval.
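One way to picture the urban listing step is to expand each dwelling into one entry per household before drawing, so that households in communal apartments and dormitories each get their own chance of selection. The sketch below is illustrative only: the addresses, household counts, seed, and function names are hypothetical.

```python
import random

def build_household_frame(dwellings):
    """Expand (address, number_of_households) pairs into one entry per household."""
    frame = []
    for address, n_households in dwellings:
        for slot in range(1, n_households + 1):
            frame.append((address, slot))
    return frame

def draw_systematically(frame, n, rng):
    """Systematic draw with a random start in the first interval (needs len(frame) >= n)."""
    interval = len(frame) / n
    start = rng.random() * interval
    return [frame[min(int(start + k * interval), len(frame) - 1)] for k in range(n)]

# Hypothetical SSU: 95 single-household apartments, one communal
# apartment with 3 households, and one dormitory with 12.
dwellings = [(f"apt-{i}", 1) for i in range(1, 96)]
dwellings += [("communal-1", 3), ("dormitory-1", 12)]

frame = build_household_frame(dwellings)   # 110 household entries
selected = draw_systematically(frame, 10, random.Random(7))
```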
In both urban and rural substrata, interviewers were required to visit each selected dwelling up to three times to secure the interviews. They were not allowed to make substitutions of any sort. The interviewers’ first task was to identify households at the designated dwellings. “Household” was defined as a group of people who live together in a given domicile, and who share common income and expenditures. Households were also defined to include unmarried children, 18 years of age or younger, who were temporarily residing outside the domicile at the time of the survey. If perchance the interviewer identified more than one household in the dwelling, he or she was obliged to select one using a procedure outlined in the technical report. The interviewer then administered a household questionnaire to the most knowledgeable and willing member of the household.
The interviewer then conducted interviews with as many adults as possible, acquiring data about their individual activities and health. Data for the children’s questionnaires were obtained from adults in the household. By virtue of the fact that an attempt was made to obtain individual questionnaires for all members of households, the sample constitutes a proper probability sample of individuals as well as of households, without any special weighting. Actually, the fact that we did not interview unmarried minors living temporarily outside the domicile slightly diminished the representativeness of the sample of individuals in that age group.
As described above, the sample frame was essentially based on dwellings in urban areas and households in rural areas. In conducting Rounds VI through XII, interviewers in both urban and rural areas attempted to conduct interviews in the same dwellings (or spaces within communal apartments and dormitories) that fell into the Round V sample. They returned to each Round V dwelling even if the household in the dwelling had refused to participate during previous rounds, and even if they found out that the household whom they interviewed in previous rounds had moved to a new dwelling prior to the interview.
Since the change in housing stock was minuscule between late 1994 and late 1995, this procedure ensured that the results in 1995 were approximately as representative as they were in 1994. The response rate was nearly the same: 84 percent in Round V and 80 percent in Round VI–both respectable figures in survey research requiring such substantial face-to-face interviews about every member of every household. Furthermore, by returning to every dwelling we actually obtained interviews from some 200 households who had declined to participate in Round V. This approach could eventually permit some analysis of the nature of non-response in Round V–an analysis that would be more sophisticated than merely comparing the demographic characteristics of households to those in the census.
It is especially important to note that this procedure did not appreciably vitiate our ability to conduct panel analyses with Round V and VI data. First, the data set rendered it quite easy to identify households and people who participated in both rounds. Second, as it turned out, only 250 households (6.3 percent) from Round V moved from their dwellings and were thus lost to Round VI–a low level of attrition for a panel survey of this sort. Nevertheless, we did gather data on their new addresses whenever possible in anticipation of a supplementary study to follow up.
As stated above, the household response rate exceeded 80 percent. As in Round V, individual questionnaires were obtained from over 97 percent of the individuals listed on the household rosters. The response rates did indeed vary across PSUs depending on the proportion of households in rural areas. However, because we anticipated this variation and over-sampled accordingly, the actual proportion of completed household interviews compared well to the proportion of the population in each stratum. The distribution of household size in the sample, within both rural and urban localities, corresponded well to the figures from the 1989 census. Bear in mind that single-member households were excluded from the comparison because the census included many institutionalized people, while our sample explicitly excluded them. Thus, there is no valid basis for comparing that category.
The multivariate distribution of the sample by sex, age, and urban-rural location compared quite well with the corresponding multivariate distribution of the 1989 census. Of course, because of random sampling error and changes in the distribution since the 1989 census, we did not expect perfect correspondence. Nevertheless, there was usually a difference of only one percentage point or less between the two distributions.
Another way to evaluate the adequacy (or efficiency) of the sample was to examine design effects. An important factor in determining the precision of estimates in multi-stage samples was the mean ultimate cluster (PSU) size. All else being equal, the larger the size the less precise the measure is. In Rounds I through IV of the RLMS, the average cluster size approached 360–a large number dictated by constraints imposed by our collaborators. Thus, although the sample size hovered around 6,000 households, precision was less than we would have liked for a sample of that size. In Rounds I and III of the RLMS, the 95 percent confidence interval for household income was about ±13 percent.
In the Phase II sample, the situation was considerably better. Although there were only 4,000 households, the mean size of clusters was much smaller than in Phase I. There were 35 PSUs with about 100 households each; even this result was an improvement over the average of 360 in the design of the RLMS Rounds I through IV. However, in the three self-representing areas, the respondents were drawn from 61 PSUs. Recall that Moscow city and oblast, as well as St. Petersburg city, were not sampled but were chosen with certainty. Therefore, the first stage of selection in them was the selection of census enumeration districts. Thus the mean cluster size in the entire sample was about 42, i.e., 4,000/(35+61). Given these much smaller cluster sizes, researchers had reason to expect that precision in this survey would be as good as it was in Rounds I through IV despite the smaller sample size, and this expectation, in fact, turned out to be the case in Rounds V through XIII.
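The link between cluster size and precision can be illustrated with Kish's common approximation for the design effect, deff = 1 + (m - 1) * rho, where m is the mean cluster size and rho is the intraclass correlation. This is a sketch under stated assumptions, not the project's own design-effect calculation: the value of rho below is purely hypothetical (the source reports none), and the cluster and sample sizes are the rounded figures from the text.

```python
def design_effect(mean_cluster_size, rho):
    """Kish's approximation: deff = 1 + (m - 1) * rho."""
    return 1 + (mean_cluster_size - 1) * rho

def effective_sample_size(n, mean_cluster_size, rho):
    """Size of a simple random sample with roughly the same precision."""
    return n / design_effect(mean_cluster_size, rho)

RHO = 0.05  # hypothetical intraclass correlation, NOT an RLMS estimate

# Mean cluster size in Phase II, as computed in the text (about 42).
phase2_cluster = 4000 / (35 + 61)

# Phase I: about 6,000 households in clusters of about 360.
phase1_neff = effective_sample_size(6000, 360, RHO)
# Phase II: about 4,000 households in clusters of about 42.
phase2_neff = effective_sample_size(4000, phase2_cluster, RHO)

print(round(phase2_cluster), round(phase1_neff), round(phase2_neff))
```

For any positive rho, the smaller Phase II clusters give a smaller design effect; how far this offsets the smaller sample depends on the intraclass correlation of the particular variable.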
The number of sampled households was increased to 6,000 in Round XIX (2010). New independent samples with the same design, but of smaller size, were drawn in every primary sampling unit (PSU) in addition to the 1994-2009 sample dwellings. The number of dwellings added in each PSU was set to yield the needed number of interviews from that PSU, corresponding to the share of each PSU in the population of the Russian Federation.