The syntax examples below show how to perform some common data management tasks useful in analysing the Innovation Panel data files.
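Each task is illustrated with Stata code. Statements beginning with // are comments, as is any text enclosed between /* and */. The six tasks are: distributing household level information to individuals; summarising individual level information at the household level; matching spouse/partner information onto individuals; using the EGOALT file to create household composition variables; matching individual level files across waves into long format; and matching them into wide format.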
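Example 1: Distributing household level information to individuals
In this example we will distribute household level information to the individuals in those households. We can do this by merging a household level file (such as w_hhresp_ip) with an individual level file (such as w_indresp_ip) from the same wave. The code below uses the first wave, so the generic prefix w_ becomes a_.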
// open the household level file
use a_hidp a_hhsize using a_hhresp_ip, clear
// sort it on the household identifier, w_hidp (a_hidp in this wave)
sort a_hidp
// save this temporary file
save hhinfo, replace
// open the individual level file
use pidp a_hidp a_marstat using a_indresp_ip, clear
// sort it on the household identifier, w_hidp (a_hidp in this wave)
sort a_hidp
// merge it with the earlier saved file on a_hidp; the output shows how many cases matched
merge m:1 a_hidp using hhinfo
// drop _merge: a later merge stops with an error if _merge already exists
drop _merge
save final1, replace
// clean up unwanted files
erase hhinfo.dta
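The same merge can be written more compactly. The following is a minimal sketch of one possible variant, not part of the original syntax. It assumes Stata 11 or later (where merge sorts the data itself) and that the lines are run together from a do-file, because the tempfile name is a local macro.

```stata
// sketch only: same result as Example 1, without a named intermediate file
tempfile hhinfo
use a_hidp a_hhsize using a_hhresp_ip, clear
save `hhinfo'

use pidp a_hidp a_marstat using a_indresp_ip, clear
// keep(match master) retains every individual, matched or not;
// nogenerate stops merge from creating the _merge variable
merge m:1 a_hidp using `hhinfo', keep(match master) nogenerate
save final1, replace
```

The temporary file is deleted automatically when the do-file ends, so no erase step is needed.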
Example 2: Summarising individual level information at the household level
In this example we will summarise individual level information within each household (the number of 18-24 year olds in the household) and then match that summary onto the household level file.
use a_hidp a_hhsize using a_hhresp_ip, clear
sort a_hidp
save hhinfo, replace
use pidp a_hidp a_dvage using a_indall_ip, clear
// create a variable that counts the number of 18-24 year olds in each household
// (total() is the documented egen function; sum() is its older name)
bysort a_hidp: egen n1824 = total(a_dvage>=18 & a_dvage<=24)
// keep only first observation for every household
bysort a_hidp: keep if _n==1
// keep only household level information
keep a_hidp n1824
// now merge this household information with the household level file
sort a_hidp
merge 1:1 a_hidp using hhinfo
drop _merge
save final2, replace
erase hhinfo.dta
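An alternative way to build the household level count is with collapse, which reduces the data to one row per household in a single step. This is a sketch under the same assumptions as the example above (the files a_indall_ip and a_hhresp_ip are in the working directory); the indicator name is1824 is introduced here purely for illustration.

```stata
// sketch only: household count of 18-24 year olds using collapse
use pidp a_hidp a_dvage using a_indall_ip, clear
// inrange() returns 0 for ages outside 18-24, including missing values
gen byte is1824 = inrange(a_dvage, 18, 24)
// one row per household, holding the sum of the indicator
collapse (sum) n1824 = is1824, by(a_hidp)
// bring in household size directly from the household level file
merge 1:1 a_hidp using a_hhresp_ip, keepusing(a_hhsize)
// inspect the match before discarding the indicator
tabulate _merge
drop _merge
save final2, replace
```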
Example 3: Matching spouse/partner information onto individuals
In this example we will match the information of each person's spouse or partner onto that person's own record.
/* Open the dataset with information on all persons in responding households and keep only those persons who have a spouse/partner in the household */
use a_hidp a_pno a_hgpart a_sex a_dvage if a_hgpart>0 using a_indall_ip, clear
// rename the prefix a_ to something that would indicate that this information relates to the spouse or partner
renpfix a_ sp_
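/* sp_hgpart holds the person number of the spouse/partner's own partner, that is, the respondent. Rename it to a_pno so that it can be used as the key for matching onto the respondent information, and restore the name of the household identifier. The spouse/partner's own pno is no longer needed. Then sort and save the data */
rename sp_hgpart a_pno
rename sp_hidp a_hidp
drop sp_pno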
sort a_hidp a_pno
save spousepartner, replace
/* Again open the data with information on all persons in responding households, keeping those with a spouse/partner in the household */
use a_hidp a_pno a_hgpart a_sex a_dvage if a_hgpart>0 using a_indall_ip, clear
/* rename the prefix a_ to something that would indicate that this information relates to the respondent */
renpfix a_ r_
/* as we want to match on a_hidp and a_pno rename r_hidp and r_pno back to these */
rename r_hidp a_hidp
rename r_pno a_pno
// Now sort and merge with the spouse partner file
sort a_hidp a_pno
merge 1:1 a_hidp a_pno using spousepartner
drop _merge
save final3, replace
erase spousepartner.dta
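After a match like this it is worth checking the result. The sketch below runs a few plausibility checks on final3. It assumes the variable names produced by the code above (r_sex, r_dvage, sp_sex, sp_dvage), and it assumes that negative values of the age variable are missing-value codes, which should be confirmed against the data documentation. The name agegap is introduced here for illustration.

```stata
// sketch only: plausibility checks on the matched spouse/partner file
use final3, clear
// each person should appear once within a household
isid a_hidp a_pno
// records without a matching partner record have missing sp_ variables
count if missing(sp_dvage)
// cross-tabulate own sex against partner's sex
tabulate r_sex sp_sex, missing
// age gap, computed only where both ages are valid
// (assumption: negative ages are missing-value codes)
gen agegap = r_dvage - sp_dvage if r_dvage >= 0 & sp_dvage >= 0 & !missing(r_dvage, sp_dvage)
summarize agegap
```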
Example 4: Using the EGOALT file to create household composition variables
In this example we will create a variable that counts, for each person, the number of their siblings living in the same household, using the w_egoalt_ip file.
use b_hidp b_epno b_relationship using b_egoalt_ip, clear
// create a variable that counts each person's siblings in the household
// (total() is the documented egen function; sum() is its older name)
bysort b_hidp b_epno: egen nsiblings = total(b_relationship>=14 & b_relationship<=17)
lab var nsiblings "number of siblings in household"
// keep one observation per person
bysort b_hidp b_epno: keep if _n==1
sort b_hidp b_epno
save final4, replace
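Now this information can be merged with any individual level file from the same wave, matching on the household identifier and the person number (held in b_epno in this file).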
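As an illustration of that merge, the sketch below attaches nsiblings to the wave b file of all household members. It makes several assumptions that should be checked against the data: that the person number in b_indall_ip is called b_pno (by analogy with a_pno in a_indall_ip), that b_sex and b_dvage exist there, and that persons with no record in the EGOALT file have no co-resident siblings.

```stata
// sketch only: attach the sibling count to an individual level file
use final4, clear
keep b_hidp b_epno nsiblings
// assumption: the individual level file names the person number b_pno
rename b_epno b_pno
merge 1:1 b_hidp b_pno using b_indall_ip, keepusing(b_sex b_dvage)
// inspect which persons had no EGOALT record (_merge == 2)
tabulate _merge
// assumption: no EGOALT record means no other household members, hence no siblings
replace nsiblings = 0 if _merge == 2
drop _merge
```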
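Example 5: Matching individual level files across waves into long format
To match individual level files across two waves into a long format do the following (for more waves, add the wave specific prefixes to the foreach statements and extend the string "ab" that is used to number the waves):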
foreach w in a b {
    // open the individual level file
    use pidp `w'_jbhas using `w'_indresp_ip, clear
    // drop the wave prefix from all variables
    renpfix `w'_
    // create a wave variable (a = 1, b = 2)
    gen wave=strpos("ab", "`w'")
    // save one file for each wave
    save temp`w', replace
}
// open the file for the first wave (wave a_)
use tempa, clear
foreach w in b {
    // append the files for second wave onwards
    append using temp`w'
}
// save the long file
save final5, replace
// erase temporary files
foreach w in a b {
    erase temp`w'.dta
}
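Once the long file exists, a few checks confirm that it has the expected structure. The following sketch assumes final5 was created as above; the label name wavelbl and the variable nwaves are introduced here for illustration.

```stata
// sketch only: checking and declaring the long format file
use final5, clear
// each person should appear at most once per wave
isid pidp wave
// attach readable labels to the wave counter
label define wavelbl 1 "wave a" 2 "wave b"
label values wave wavelbl
tabulate wave
// number of waves in which each person was interviewed
bysort pidp: gen nwaves = _N
tabulate nwaves
// declare the panel structure for Stata's xt commands
xtset pidp wave
```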
Example 6: Matching individual level files across waves into wide format
The following code shows how to match individual level files across two waves into a wide format. The code can be adapted to handle more waves by adding wave specific prefixes in the foreach statement:
use pidp a_jbhas using a_indresp_ip, clear
sort pidp
save temp, replace
foreach w in b {
    use pidp `w'_jbhas using `w'_indresp_ip, clear
    sort pidp
    merge 1:1 pidp using temp
    drop _merge
    sort pidp
    save temp, replace
}
save final6, replace
erase temp.dta
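The wide file makes it easy to compare a person's answers across waves. The sketch below tabulates wave a against wave b values of the matched variable. It assumes final6 was created as above and that the source files contain no system missing values (.) in a_jbhas or b_jbhas, so that a system missing value in the wide file means the person was not interviewed in that wave. The name inboth is introduced here for illustration.

```stata
// sketch only: cross-wave comparison in the wide format file
use final6, clear
// flag persons with a record in both waves
// (assumption: . arises only from a non-matching merge)
gen byte inboth = !missing(a_jbhas) & !missing(b_jbhas)
tabulate inboth
// transitions between wave a and wave b for those present in both
tabulate a_jbhas b_jbhas if inboth, missing
```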