The RLMS project aimed to provide timely and accurate information about Russia, and it invested many of its resources in enhancing data quality. The project team was the first to note that Phase I data did not meet the standards of the data that the group had collected in other countries. Nevertheless, RLMS data are of high quality and are very useful in the context of research on the transition in Russia.
One area of concern was interviewer training. It was not unusual in Russia for this training to consist solely of a short lecture, with no practice. The training program for Phase I of the RLMS was the first of its kind to last for more than a few hours. One-week training sessions were held three times in preparation for Round I. One of the sessions was solely for supervisors and most interviewers were trained twice. During the next year, two more supervisor training sessions were held. Interviewer training was conducted in at least six different sites per round. High interviewer turnover during the first two rounds was reduced considerably as interviewers began to understand the expectations and close monitoring of the project staff. The project prepared videotapes for each round of data collection, including formal instruction in how to interview, examples of well-done interviews, and specific training for the RLMS.
Despite these efforts, there were considerable problems with the RLMS training in Phase I. The Russian Goskomstat interviewers who participated in Round I training were not those who actually collected the data in about half of the oblasts. In Rounds II and III, the project was able to train the actual interviewers, supervisors, and data entry personnel. For Round IV, Goskomstat officials at the time did not allow any training.
A second area of concern was monitoring and evaluation. During the fall of 1992, the project conducted the first review of sample implementation ever conducted in a Russian setting. The project’s collaborators at the Institute of Sociology of the Russian Academy of Sciences (ISRosAN) conducted elaborate checks of the way the sample was drawn in each oblast, including direct checks of lists and households. They then discussed the sampling procedures in detail with workers in each oblast. In doing so, they found that a quota system had been used in two oblasts. Fortunately, non-random procedures were used to fill quotas in only four voting districts, and then only to replace non-respondents rather than to draw the entire sample. Michael Swafford and Michael Kosolapov and his group (ISRosAN) continued to monitor sampling implementation. This monitoring included extensive training sessions on sampling for Goskomstat supervisors in March and April 1992, consultation by telephone with most oblast staff during the implementation phase, reviews of difficult issues and problems uncovered during the training program, and subsequent revisions of some sampling plans.
A third area of concern was the interviews themselves. The field supervisor was responsible for the interviews in his or her region. As a check on the quality of the interviewers' work, ISRosAN staff contacted and re-interviewed a random subset of households. For Round I, ISRosAN revisited and re-interviewed hundreds of households in eight oblasts and 24 voting districts. Remarkably, this review showed that all families interviewed by ISRosAN staff had previously been interviewed by Goskomstat interviewers.
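Drawing a random subset of households for re-interview can be sketched as follows. This is a minimal illustration in Python, not the procedure ISRosAN actually used; the household identifiers, the subset size, and the fixed seed are all assumptions made for the example.

```python
import random

def draw_reinterview_subset(household_ids, k, seed=0):
    """Return k household IDs chosen at random without replacement."""
    # A fixed seed (an assumption here) makes the draw reproducible for auditing.
    rng = random.Random(seed)
    return sorted(rng.sample(list(household_ids), k))

# Hypothetical household IDs for one voting district.
households = [f"H{n:03d}" for n in range(1, 41)]
subset = draw_reinterview_subset(households, k=5)
print(subset)
```

Sampling without replacement ensures that no household is revisited twice, and recording the seed lets a second reviewer regenerate the same list.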
In Phase II, it was the responsibility of local supervisors to gather the necessary information for sampling in accordance with written instructions, to arrange for training facilities, to invite people to be trained, to supervise their work, and to check the completed questionnaires. All local supervisors consulted by telephone with representatives in Moscow who could answer their questions in advance.
All interviewers underwent a demanding training regimen, outlined below. Any trainee whose performance during training revealed him or her to be unsuited for the job was dismissed before field work began. Interviewers were:
- Lectured on the general principles of face-to-face interviewing. (We provided a 70-minute video tape entitled “Introduction to Interviewing” to ensure that all interviewers received the same instructions and examples. Where there was no available VCR, we rented video salons.)
- Required to read through the entire questionnaire in advance, then to fill it out themselves.
- Shown an example of a good interview with commentary, again using a video tape. (The tape included a section on the diet portion of the questionnaire.)
- Introduced to the written questionnaire specifications, entitled “Interviewer Instructions.”
- Asked to play the role of respondent while fellow trainees took turns reading questions as they would in an actual interview.
- Asked to practice interviewing in groups of three. (One assumed the role of interviewer; another, the role of respondent; the third, the role of observer, watching to see whether the interviewer was working properly. The trainer and perhaps some other experienced interviewers circulated among the triads to observe.)
- Given written exercises that tested their ability to react properly to certain difficult situations in administering the questionnaire.
- Asked to review the administrative procedures pertaining to the survey.
- Given practice, through role-play, in persuading respondents to participate.
- Required to complete at least one practice interview with a household that was not in the sample, preferably not a household related to them, although they were allowed to practice with relatives first.
- Given work exams after each of their first three interviews, and after further interviews if necessary, until they demonstrated that they were competent.
At first, SPSS-DE was employed to reduce clerical error. Data were entered twice, and then the records were compared. Two training programs were held (the first for two weeks, the second for one week) for Goskomstat data entry personnel and their supervisors. UNC-CH programmers, along with ISRosAN and RCPM programmers, took part in this activity. Three problems were found in data entry for Round I. First, SPSS-DE software required that files be separated into many subfiles, thus increasing the risk of error. Second, the Russian computers used in addition to the project’s 386s for data entry could not replicate the picture of each page of the questionnaire on the screen as designed. Third, the initial menu-driven diet entry program was too slow. The project went to great lengths to address all these concerns. New Russian-language data entry software, which required only one record per questionnaire, was prepared specifically for Russian hardware.
In addition, we found that direct entry of the dietary data led to extensive delays during the cleaning process and did not allow for adequate editing. For Rounds III and IV the project used a more traditional system in which dietary data were hand-edited and coded before data entry.
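The double-entry check described above (entering the data twice and comparing the records) can be illustrated with a short sketch. It is written in Python purely for illustration and is not the SPSS-DE procedure the project used; the record IDs, field names, and values are hypothetical.

```python
def compare_passes(first_pass, second_pass):
    """List every (record_id, field, first_value, second_value) disagreement.

    Each pass maps a record ID to a dict of field name -> keyed value.
    A record or field present in only one pass is reported with None
    standing in for the missing value.
    """
    mismatches = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(record_id, {})
        b = second_pass.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((record_id, field, a.get(field), b.get(field)))
    return mismatches

# Hypothetical keyed values from two independent entry passes.
first = {"0001": {"age": "34", "sex": "1"}, "0002": {"age": "51", "sex": "2"}}
second = {"0001": {"age": "34", "sex": "1"}, "0002": {"age": "15", "sex": "2"}}
discrepancies = compare_passes(first, second)
print(discrepancies)
```

Each reported discrepancy would then be resolved against the paper questionnaire, since the comparison alone cannot say which pass is correct.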
The project also discovered problems with identification numbers. The same individual identification numbers appear to have been used for different respondents in different rounds of survey work. The Goskomstat interviewers and supervisors did not use the rosters from previous rounds, and it took additional time for Goskomstat staff to recognize the existence and importance of this problem. These snags required additional cleaning time and so delayed Round II. A system instituted for Phase II addressed this and other delays.
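One way to detect reused identification numbers is to compare attributes that should not change between rounds. The sketch below is an illustration under stated assumptions rather than the project's actual cleaning procedure: it assumes each round's roster records a birth year and sex for every individual ID, and all IDs and values are hypothetical.

```python
def find_id_conflicts(rounds):
    """Flag person IDs whose stable attributes differ between rounds.

    rounds maps a round label to {person_id: (birth_year, sex)}.
    Returns (person_id, first_round_seen, conflicting_round) tuples.
    """
    seen = {}  # person_id -> (round label, attributes) at first appearance
    conflicts = []
    for label, roster in rounds.items():
        for person_id, attrs in roster.items():
            if person_id not in seen:
                seen[person_id] = (label, attrs)
            elif seen[person_id][1] != attrs:
                conflicts.append((person_id, seen[person_id][0], label))
    return conflicts

# Hypothetical rosters: ID 102 is a different person in Round II.
rosters = {
    "I": {"101": (1950, "F"), "102": (1975, "M")},
    "II": {"101": (1950, "F"), "102": (1988, "F")},
}
conflicts = find_id_conflicts(rosters)
print(conflicts)
```

A flagged ID is only a candidate for review, because a keying error in birth year or sex would produce the same signal as a genuinely reused number.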
As with data entry, the project encountered challenges with data cleaning that were rectified only after Round I was completed and the senior Russian collaborators from Goskomstat and RCPM visited UNC-CH. The initial suggestion was to use statistical cleaning techniques, as is traditional in Russia. Subsequently, an agreement was reached to adopt a more thorough approach that included checking problematic codes against original questionnaires, checking all identification numbers, using checks of subsamples as a guide toward more detailed data checks, and checks on the data collection and entry work.
In Phase II, when questionnaires were returned to local supervisors, those supervisors were required to examine them to locate problems that could best be remedied in the field, e.g., by returning to get key demographic information or cleaning ID numbers so that the roster of individuals located in the household questionnaire matched those on the individual questionnaires from that household. The questionnaires were then transported to Moscow, where yet another ID check was performed.
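The roster-matching ID check can be sketched as a comparison of two sets of identifiers. This is a minimal Python illustration, not the project's actual routine; the function name and the person numbers are hypothetical.

```python
def check_household_ids(roster_ids, individual_ids):
    """Compare person IDs on the household roster with those on the
    individual questionnaires returned for the same household."""
    roster = set(roster_ids)
    individuals = set(individual_ids)
    return {
        # On the roster, but no individual questionnaire was returned.
        "missing_questionnaire": sorted(roster - individuals),
        # A questionnaire exists for an ID the roster does not list.
        "not_on_roster": sorted(individuals - roster),
    }

# Hypothetical household: roster lists persons 1-3, questionnaires cover 1, 3, 4.
report = check_household_ids([1, 2, 3], [1, 3, 4])
print(report)
```

An ID under `missing_questionnaire` may simply reflect individual non-response, whereas one under `not_on_roster` more likely points to a miscopied number that is easiest to correct while the supervisor can still return to the household.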
In Moscow, coders looked through all questionnaires to code so-called “other: specify” responses. However, open-ended questions (e.g., occupation questions) were not coded at this time. Instead, their texts were fully entered as long string variables. Entering the open-ended answers as character variables offered several advantages. First, it allowed data entry to begin immediately, with no delay for coding. Second, it permitted the use of computer programs to assist in coding the string variables. Third, the method allowed any user of the original data sets to recode the character variables to suit his or her purposes without going back to the paper copies of the questionnaires.
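Computer-assisted coding of string variables can be as simple as keyword matching that proposes a code for a human coder to confirm. The sketch below illustrates that idea in Python; the keywords and codes are hypothetical placeholders, not the classification used for the RLMS, and the programs the project actually used are not described here.

```python
# Hypothetical keyword-to-code table; a real scheme would be far larger.
KEYWORD_CODES = {"teacher": "T", "driver": "D", "nurse": "N"}

def suggest_code(text, keyword_codes=KEYWORD_CODES):
    """Return the code of the first keyword found in the answer text,
    or None when no keyword matches and a coder must decide by hand."""
    lowered = text.lower()
    for keyword, code in keyword_codes.items():
        if keyword in lowered:
            return code
    return None

print(suggest_code("School teacher, grades 1-4"))
print(suggest_code("collective farm worker"))
```

Because the original answer text stays in the data set, a later user can replace the keyword table with a different scheme and recode without returning to the paper questionnaires.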
All data entry was handled in-house using the SPSS data entry program on PCs. For the first survey of Phase II, Round V, the first pass of data entry began on December 20, 1994, and finished on February 1, 1995. The second (verification) pass overlapped with the first to speed up the process. It began on January 15, 1995, and was completed on February 8, 1995 (with the exception of the diet data). The second pass revealed an error rate of 1 percent in each pass. Rounds VI through XII used a similar timeframe.