NOTE: Data from Rounds I through IV cannot be merged with data from Rounds V through XIX, because entirely different samples were used for RLMS Phase I (Rounds I through IV) and Phase II (Rounds V through XIX). Attempting to merge Phase I data sets with Phase II data sets will generate erroneous results.
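As an illustration only, a script can refuse to combine files from different phases before any merge runs. The sketch below is a hypothetical helper, not part of the RLMS distribution. It represents rounds as integers (Round I = 1, Round V = 5). It also assumes that every round from V onward belongs to Phase II; the note above lists Rounds V through XIX explicitly.

```python
# Hypothetical helper: rounds are given as integers (Round I = 1, Round V = 5).
# Assumption: every round from V onward belongs to Phase II.


def rlms_phase(round_number: int) -> int:
    """Return 1 for Phase I (Rounds I-IV) or 2 for Phase II (Round V onward)."""
    if round_number < 1:
        raise ValueError(f"invalid round number: {round_number}")
    return 1 if round_number <= 4 else 2


def check_mergeable(round_a: int, round_b: int) -> None:
    """Raise ValueError if the two rounds come from different RLMS phases."""
    if rlms_phase(round_a) != rlms_phase(round_b):
        raise ValueError(
            f"Rounds {round_a} and {round_b} come from different RLMS phases "
            "(different samples) and must not be merged"
        )


check_mergeable(5, 12)  # passes: both rounds are in Phase II
```

Calling `check_mergeable(4, 5)` raises `ValueError`, because Round IV is in Phase I and Round V is in Phase II.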
In each of the RLMS data sets, the unit of analysis is a household, a household member, or a survey site, depending upon whether one is looking at household-, individual-, or community-level data. Note that Person 1 is not necessarily the head of the household.
Rounds I through IV
In Rounds I through IV, the variables SITE and FAMILY identify a unique household; SITE, FAMILY, and PERSON identify a unique individual. SITE is a geographic descriptor, and there are up to 360 families within each SITE. FAMILY numbers are repeated from site to site, and PERSON numbers are repeated from family to family. To prevent errors in merging data, sort household-level data by SITE (primary key) and FAMILY (secondary key), and sort individual-level data by SITE, FAMILY, and PERSON.
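Because FAMILY numbers repeat across sites, a merge must use the full composite key and never FAMILY alone. The sketch below attaches household-level data to individual records for Rounds I through IV. It uses the variable names from this section, but the records and the HH_SIZE field are made up for illustration. The sort reproduces the recommended key order, which matters for sort-merge tools. The dictionary lookup itself does not depend on record order.

```python
# Made-up records using the Rounds I-IV identifiers (SITE, FAMILY, PERSON).
# HH_SIZE is a hypothetical household-level field.
households = [
    {"SITE": 1, "FAMILY": 7, "HH_SIZE": 3},
    {"SITE": 2, "FAMILY": 7, "HH_SIZE": 2},  # same FAMILY number, different SITE
]
individuals = [
    {"SITE": 2, "FAMILY": 7, "PERSON": 1},
    {"SITE": 1, "FAMILY": 7, "PERSON": 2},
    {"SITE": 1, "FAMILY": 7, "PERSON": 1},
]

HH_KEY = ("SITE", "FAMILY")             # identifies a unique household
IND_KEY = ("SITE", "FAMILY", "PERSON")  # identifies a unique individual


def key(record, fields):
    """Build a composite key tuple from the named fields."""
    return tuple(record[f] for f in fields)


# SITE is the primary sort key, then FAMILY (and PERSON for individuals).
households.sort(key=lambda r: key(r, HH_KEY))
individuals.sort(key=lambda r: key(r, IND_KEY))

# Index households by the full (SITE, FAMILY) key and verify it is unique.
hh_lookup = {key(h, HH_KEY): h for h in households}
if len(hh_lookup) != len(households):
    raise ValueError("SITE/FAMILY does not uniquely identify households")

# Attach household fields to each person via the composite household key.
merged = [{**p, **hh_lookup[key(p, HH_KEY)]} for p in individuals]

for row in merged:
    print(row)
```

Keying on FAMILY alone would collapse both households into a single entry. Every person would then receive the same HH_SIZE, which is exactly the kind of silent merge error described above.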
In Rounds I through IV, as in all subsequent rounds, a code of “1” for gender indicates a male respondent, and a code of “2” indicates a female respondent. For birth year, a code of “00” means that the respondent was born in 1900, and a code of “99” means that the respondent was born in 1899.
Round V
In Round V, the variables SITE5 and FAMILY5 identify a unique household. SITE5, FAMILY5, and PERSON5 identify a unique individual. As in Round I, SITE5 is a geographic descriptor and the numbers for FAMILY5 and PERSON5 are repeated within their respective broader categories.
Rounds VI through XVII
Starting in Round VI, an additional variable was required for unique identification of a household or person. The variables SITE, CENSUSD, and FAMILY identify a unique household, while SITE, CENSUSD, FAMILY, and PERSON identify an individual. As in Round I, SITE is a geographic descriptor and the numbers for CENSUSD, FAMILY, and PERSON are repeated within their respective broader categories.
Rounds XVIII and Above
Starting in Round XVIII, the variables REGION and FAMILY identify a unique household, while REGION, FAMILY, and PERSON identify a unique individual. REGION is a geographic descriptor, and the numbers for FAMILY and PERSON are repeated within their respective broader categories.
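The identifying variables change from one block of rounds to the next, so it can help to keep them in one place. The sketch below is a hypothetical helper that returns the within-round identifier variables listed in this section for a given round (as an integer) and level. It is not part of any RLMS distribution.

```python
# Hypothetical helper: identifier variables per block of rounds, as listed
# in this section. Rounds are integers (Round I = 1, ..., Round XVIII = 18).


def id_variables(round_number: int, level: str) -> tuple:
    """Return the variables that uniquely identify a household or individual."""
    if level not in ("household", "individual"):
        raise ValueError("level must be 'household' or 'individual'")
    if 1 <= round_number <= 4:
        household, person = ("SITE", "FAMILY"), "PERSON"
    elif round_number == 5:
        household, person = ("SITE5", "FAMILY5"), "PERSON5"
    elif 6 <= round_number <= 17:
        household, person = ("SITE", "CENSUSD", "FAMILY"), "PERSON"
    elif round_number >= 18:
        household, person = ("REGION", "FAMILY"), "PERSON"
    else:
        raise ValueError(f"invalid round number: {round_number}")
    # Individual-level keys extend the household key with the person number.
    return household if level == "household" else household + (person,)


print(id_variables(9, "individual"))  # ('SITE', 'CENSUSD', 'FAMILY', 'PERSON')
```

These keys identify records within a single round only. Linking records across rounds relies on the round-specific identifiers and IDIND described below.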
Starting with Round V, each round has its own unique identifier for individuals and for households, and this identifier has the same name in both the household-level and the individual-level files. For Round V it is AID, for Round VI it is BID, and so on. To merge Round V with a later round, use AID (at either the household or the individual level); to merge Round VI with a later round, use BID; and so on.
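The sketch below pairs records from an earlier round with records from a later round on the earlier round's identifier. Several parts of it are assumptions rather than documented facts. It assumes the later round's file also carries the earlier round's ID variable (for example, AID in the Round VI file). It assumes the naming pattern continues alphabetically beyond the documented AID and BID. Both helper names and all records are hypothetical. Because the source does not say whether an earlier ID can appear more than once in a later file, the sketch allows one earlier record to match several later records instead of assuming uniqueness.

```python
from collections import defaultdict


def round_id_name(round_number: int) -> str:
    """Round V -> 'AID', Round VI -> 'BID', ...
    Assumption: the letters continue alphabetically in later rounds."""
    if round_number < 5:
        raise ValueError("round-specific IDs exist only from Round V onward")
    return chr(ord("A") + round_number - 5) + "ID"


def merge_rounds(earlier, later, earlier_round):
    """Pair records of an earlier round with records of a later round that
    carry the same value of the earlier round's ID variable."""
    id_var = round_id_name(earlier_round)
    later_by_id = defaultdict(list)
    for rec in later:
        value = rec.get(id_var)
        if value is not None:  # records with no earlier-round ID cannot match
            later_by_id[value].append(rec)
    # .get() on a defaultdict does not create keys, so misses yield [].
    return [
        (rec, match)
        for rec in earlier
        for match in later_by_id.get(rec.get(id_var), [])
    ]


# Made-up records: the Round VI file also carries Round V's AID.
round5 = [{"AID": 101, "X": 1}, {"AID": 102, "X": 2}]
round6 = [{"AID": 101, "BID": 501, "X": 5}, {"AID": None, "BID": 502, "X": 7}]
pairs = merge_rounds(round5, round6, earlier_round=5)
print(len(pairs))  # 1
```

The function returns pairs of records rather than flattened rows, so fields that share a name in both rounds (such as `X` here) are not silently overwritten.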
To help researchers construct longitudinal files, the variable IDIND was added to the individual-level data files; it allows individuals to be linked easily across rounds. If you have older files that do not include IDIND, we supply a file that links IDIND to each individual. That file, which contains IDIND and the variables needed to link it with each cross-sectional file, is Longitudinal_identifiers.zip in the Supplemental Files section of the RLMS-HSE Dataverse.
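Because IDIND follows a person across rounds, a longitudinal file can be assembled by grouping individual-level records on it. The sketch below is a hypothetical helper with made-up records. It assumes that each IDIND appears at most once per round's individual file and raises an error otherwise. The variable's capitalization may differ in the distributed files (the text above also refers to it as idind).

```python
from collections import defaultdict


def build_panel(files_by_round):
    """files_by_round: {round_number: list of individual-level records}.
    Returns {IDIND: {round_number: record}}."""
    panel = defaultdict(dict)
    for round_number, records in sorted(files_by_round.items()):
        for rec in records:
            idind = rec["IDIND"]
            # Assumption: one record per person per round.
            if round_number in panel[idind]:
                raise ValueError(
                    f"IDIND {idind} appears twice in round {round_number}"
                )
            panel[idind][round_number] = rec
    return dict(panel)


# Made-up individual-level records for two rounds.
files = {
    5: [{"IDIND": 1, "AGE": 30}, {"IDIND": 2, "AGE": 41}],
    6: [{"IDIND": 1, "AGE": 31}, {"IDIND": 3, "AGE": 19}],
}
panel = build_panel(files)
print(sorted(panel[1]))  # [5, 6]: person 1 is observed in both rounds
```

People who enter or leave the sample simply have fewer rounds in their entry, so attrition stays visible rather than being dropped by an inner join.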
Do NOT try to merge rounds using any other combination of variables. To do so will generate erroneous results.