Stata commands for basic data tasks
Understanding Society provides syntax for Stata. Below are the Stata commands for basic data tasks to help you get started with using the dataset.
Load your data
Load specific variables from a specific wave (here Wave 1, identified by the a_ prefix): use pidp a_hidp a_pno a_hiqual_dv a_nchild_dv using a_indresp, clear
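The loading step can be tried without the survey files. This is a minimal sketch that assumes a tiny made-up dataset standing in for a_indresp; the values and the tempfile name indresp are hypothetical. It also shows that use accepts an if condition, so you can restrict rows as well as variables while loading. Run it as a do-file in one go, because tempfile names are local macros.

```stata
* Build a tiny stand-in for a_indresp (made-up values, not real survey data)
clear
input long pidp long a_hidp byte a_pno byte a_hiqual_dv byte a_nchild_dv
1001 501 1 1 2
1002 501 2 3 2
1003 502 1 -9 0
1004 503 1 2 1
end
tempfile indresp
save `indresp'

* Load only selected variables, and only rows with a valid qualification code
use pidp a_hidp a_hiqual_dv if a_hiqual_dv > 0 using `indresp', clear
describe, short
```

With the real data you would replace `indresp' with the path to a_indresp.dta.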
Commonly used commands to explore data
Check if pidp uniquely identifies the data: isid pidp
Find all duplicate values in household identifier variable: duplicates report a_hidp
Display variable type, format, and any value/variable labels: describe a_hiqual_dv
Count the number of rows (observations) for which a_hiqual_dv is missing; Understanding Society codes missing values as negative numbers. The if condition can be any logical expression: count if a_hiqual_dv<0
Recode the negative missing-value codes to Stata's system missing value [.]: recode a_hiqual_dv (-9/-1=.)
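The exploration commands above can be sketched on toy data. The values below are made up for illustration, and the local macro n_neg is a hypothetical name. The final assert checks that every negative code became a Stata missing value.

```stata
* Toy stand-in for a_indresp (made-up values)
clear
input long pidp long a_hidp byte a_hiqual_dv
1001 501 1
1002 501 3
1003 502 -9
1004 503 -8
end

* isid is silent when pidp is unique and stops with an error otherwise
isid pidp

* Household 501 appears twice, so duplicates report shows surplus copies
duplicates report a_hidp

* Count negative (missing-value) codes before recoding
count if a_hiqual_dv < 0
local n_neg = r(N)

* Recode them to system missing and confirm the counts agree
recode a_hiqual_dv (-9/-1 = .)
count if missing(a_hiqual_dv)
assert r(N) == `n_neg'
```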
Commonly used commands to view data organisation
Show a small histogram of the data and the number of negative, zero, positive and missing observations: inspect a_nchild_dv
Report summary statistics (mean, stdev, min, max) for variables: summarize a_hiqual_dv a_nchild_dv
Overview of variable type, stats, number of missing/unique values: codebook a_hiqual_dv a_nchild_dv
Sort first by household identifier (descending), then by person number (ascending): gsort -a_hidp +a_pno
List a_hidp, a_pno and a_hiqual_dv for observations with a_nchild_dv==2, drawing a separator line whenever the household identifier changes: list a_hidp a_pno a_hiqual_dv if a_nchild_dv==2, sepby(a_hidp)
Open the Data Editor in browse (read-only) mode: browse
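As a small illustration of the sorting and listing commands above, the sketch below uses made-up values in place of the survey data. Note that gsort takes a plain hyphen (-) for descending order. A typographic dash copied from a web page will cause a syntax error.

```stata
* Toy stand-in for a_indresp, deliberately unsorted (made-up values)
clear
input long a_hidp byte a_pno byte a_hiqual_dv byte a_nchild_dv
501 2 3 2
502 1 1 0
501 1 1 2
503 1 2 2
end

* Households in descending order, persons in ascending order within each
gsort -a_hidp +a_pno

* List people with two children, with a separator line between households
list a_hidp a_pno a_hiqual_dv if a_nchild_dv == 2, sepby(a_hidp)
```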
Commonly used commands to summarise data
One-way table: number of rows with each value of a_hiqual_dv, including missing values. The gen(hiqual) option also creates one binary indicator variable per value of a_hiqual_dv, named hiqual1, hiqual2, and so on: tabulate a_hiqual_dv, mi gen(hiqual)
Cross-tabulate number of observations for each combination of a_hiqual_dv and a_nchild_dv, including missing values (two-way table): tabulate a_hiqual_dv a_nchild_dv, mi
Create compact table of summary statistics: tabstat a_nchild_dv, by(a_hiqual_dv) stat(mean sd n)
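Create a flexible table of summary statistics; f() sets the display format for the statistics and row adds a total row across all observations (this contents() syntax applies to Stata 16 and earlier; from Stata 17 the table command uses statistic() and nformat() instead): table a_hiqual_dv, contents(mean a_nchild_dv sd a_nchild_dv) f(%9.2fc) row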
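To see what the gen() option of tabulate produces, here is a minimal sketch on made-up values. It assumes three qualification codes (1, 2 and 3), so three indicator variables are created. The table command is left out because its syntax differs between Stata versions.

```stata
* Toy stand-in for a_indresp (made-up values)
clear
input byte a_hiqual_dv byte a_nchild_dv
1 2
1 0
2 1
3 2
end

* One-way table; gen(hiqual) adds indicators hiqual1, hiqual2, hiqual3
tabulate a_hiqual_dv, missing generate(hiqual)
describe hiqual*

* Mean, standard deviation and count of children by qualification level
tabstat a_nchild_dv, by(a_hiqual_dv) statistics(mean sd n)
```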
Merge in additional variables from the household response data file a_hhresp; keepusing() lets you select the specific variables to add (here: housing tenure and net household income): merge m:1 a_hidp using a_hhresp.dta, keepusing(a_tenure_dv a_fihhmnnet1_dv)
Show how many records were found only in the master file (here: indresp) [_merge==1], only in the using file (here: hhresp) [_merge==2], or in both files [_merge==3]: tabulate _merge
Keep only observations that have both an individual and a household response record: keep if _merge==3
Delete the _merge variable: drop _merge
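The merge steps above can be sketched end to end with two toy files. All values are made up, and the tempfile name hhresp is hypothetical. Household 503 has no household record and household 504 has no individuals, so all three _merge categories appear.

```stata
* Household-level stand-in for a_hhresp (made-up values)
clear
input long a_hidp byte a_tenure_dv double a_fihhmnnet1_dv
501 1 2400.50
502 3 1800
504 2 3100
end
tempfile hhresp
save `hhresp'

* Individual-level stand-in for a_indresp (made-up values)
clear
input long pidp long a_hidp byte a_pno
1001 501 1
1002 501 2
1003 502 1
1004 503 1
end

* Many individuals per household, one household record: m:1
merge m:1 a_hidp using `hhresp', keepusing(a_tenure_dv a_fihhmnnet1_dv)
tabulate _merge

* Keep matched records only, then remove the bookkeeping variable
keep if _merge == 3
drop _merge
```

If you do not need to inspect _merge, adding the keep(match) and nogenerate options to merge does the last two steps in one go.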
Saving your data
Save your project dataset (save the data in your directory and give the data a name of your choice): save mydirectory/mydata.dta, replace
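A short sketch of saving and reloading follows, assuming a made-up two-row dataset and a file named mydata.dta written to the current working directory. The compress and label data steps are optional additions, not part of the original command.

```stata
* Toy dataset (made-up values)
clear
input long pidp byte a_hiqual_dv
1001 1
1002 3
end

* Optional: store variables in the smallest suitable type and label the file
compress
label data "Project extract"

* Save, overwriting any existing file of the same name, then reload to check
save "mydata.dta", replace
use "mydata.dta", clear
describe, short
```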