Basic data management for Understanding Society

Stata commands for basic data tasks

Understanding Society provides syntax for Stata. Below are the Stata commands for basic data tasks to help you get started with using the dataset.

Load your data

Load specific variables of a specific Wave: use pidp a_hidp a_pno a_hiqual_dv a_nchild_dv using a_indresp, clear
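
To try the selective load without access to the real files, the sketch below builds a small stand-in for a_indresp and reads back only some of its variables. The data values and the temporary file name toy_indresp are invented for illustration; only the variable names come from the command above. Run it as a do-file so that the temporary file macro is defined.

```stata
* Toy stand-in for a_indresp; all values are invented for illustration
clear
input long pidp long a_hidp byte a_pno byte a_hiqual_dv byte a_nchild_dv
1001 101 1  1 2
1002 101 2  3 2
1003 102 1 -9 0
1004 103 1  9 1
end
tempfile toy_indresp
save `toy_indresp'

* Load only three of the five variables from the saved file
use pidp a_hidp a_pno using `toy_indresp', clear
describe
```

Listing variables before using keeps memory use down, which matters because the real individual response files contain a very large number of variables.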

Commonly used commands to explore data

Check whether pidp uniquely identifies each observation (isid stops with an error if it does not): isid pidp

Report how many duplicate values the household identifier variable has: duplicates report a_hidp

Display variable type, format, and any value/variable labels: describe a_hiqual_dv

Count the number of rows (observations) for which a_hiqual_dv is missing, i.e. holds a negative code. The if condition can be any logical expression: count if a_hiqual_dv<0

Recode the negative missing-value codes to Stata's system missing value (.): recode a_hiqual_dv (-9/-1=.)
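
The checks above can be tried on a toy dataset. This is a minimal sketch with invented values; hiqual_clean is a hypothetical name for the recoded copy. It uses the generate() option of recode so that the original variable is left untouched, which is a choice made here and not something the commands above require.

```stata
clear
input long pidp long a_hidp byte a_hiqual_dv
1001 101  1
1002 101  3
1003 102 -9
1004 103 -8
end

* pidp is unique here, so isid completes silently
isid pidp

* a_hidp repeats (two people in household 101), so isid would stop with an
* error; capture suppresses it and _rc holds the non-zero return code
capture isid a_hidp
display "return code from isid a_hidp: " _rc

* Count the negative (missing) codes before recoding
count if a_hiqual_dv < 0

* Keep the original and write the recoded values to a new variable
recode a_hiqual_dv (-9/-1 = .), generate(hiqual_clean)
count if missing(hiqual_clean)
```

Stata's mvdecode command is an alternative when the same negative codes need converting across many variables at once.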

Commonly used commands to view data organisation

Show a small histogram of the data and the number of negative, zero, positive and missing observations: inspect a_nchild_dv

Report summary statistics (mean, stdev, min, max) for variables: summarize a_hiqual_dv a_nchild_dv

Overview of variable type, stats, number of missing/unique values: codebook a_hiqual_dv a_nchild_dv

Sort first by household identifier (descending), then by person number (ascending): gsort -a_hidp +a_pno

List a_hidp, a_pno and a_hiqual_dv for observations with a_nchild_dv==2, drawing a separator line between households: list a_hidp a_pno a_hiqual_dv if a_nchild_dv==2, sepby(a_hidp)

Open the Data Editor in browse (read-only) mode: browse
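
A short sketch of the sorting and listing commands on toy data follows. The values are invented and entered out of order on purpose, so the effect of gsort is visible in the listing.

```stata
clear
input long a_hidp byte a_pno byte a_hiqual_dv byte a_nchild_dv
101 2 3 2
103 1 9 1
101 1 1 2
102 1 2 0
end

* Households in descending order, people in ascending order within household
gsort -a_hidp +a_pno

* A separator line is drawn whenever a_hidp changes
list a_hidp a_pno a_hiqual_dv, sepby(a_hidp)

* Mean, standard deviation, minimum and maximum
summarize a_hiqual_dv a_nchild_dv
```

Because sepby() draws a line at every change in a_hidp, the list is easiest to read when the data are already sorted by household, as they are here.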

Commonly used commands to summarise data

One-way table: number of rows with each value of a_hiqual_dv, including missing values. gen(hiqual) also creates one binary indicator variable per tabulated value, named hiqual1, hiqual2 and so on: tabulate a_hiqual_dv, mi gen(hiqual)

Cross-tabulate number of observations for each combination of a_hiqual_dv and a_nchild_dv, including missing values (two-way table): tabulate a_hiqual_dv a_nchild_dv, mi

Create compact table of summary statistics: tabstat a_nchild_dv, by(a_hiqual_dv) stat(mean sd n)

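Create a flexible table of summary statistics; f() sets the display format for all statistics and row adds a totals row. This is the syntax used in Stata 16 and earlier; the table command was redesigned in Stata 17: table a_hiqual_dv, contents(mean a_nchild_dv sd a_nchild_dv) f(%9.2fc) row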
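
The summary commands can be compared side by side on toy data. This is a sketch with invented values. The table call branches on the running Stata version, on the assumption that version 17 and later expect the statistic() and nformat() options while earlier versions expect contents() and format(); check help table in your installation before relying on either form.

```stata
clear
input byte a_hiqual_dv byte a_nchild_dv
1 0
1 2
2 1
2 3
9 0
end

* One-way table plus indicator variables hiqual1, hiqual2, hiqual3
tabulate a_hiqual_dv, generate(hiqual)

* Mean, standard deviation and count of a_nchild_dv within each category
tabstat a_nchild_dv, by(a_hiqual_dv) statistics(mean sd n)

* table: pick the syntax that matches the running Stata version
if c(stata_version) >= 17 {
    table a_hiqual_dv, statistic(mean a_nchild_dv) statistic(sd a_nchild_dv) nformat(%9.2fc)
}
else {
    table a_hiqual_dv, contents(mean a_nchild_dv sd a_nchild_dv) format(%9.2fc) row
}
```

Each observation has exactly one of the generated indicators set to 1, so they can be used directly as dummy variables in a regression.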

Merge in additional variables from the household response data file a_hhresp. m:1 matches many individuals to one household record, and keepusing() selects the specific variables to add (here: housing tenure and net household income): merge m:1 a_hidp using a_hhresp.dta, keepusing(a_tenure_dv a_fihhmnnet1_dv)

Show how many records appear only in the master file (here: indresp; _merge==1), only in the using file (here: hhresp; _merge==2), or in both files (_merge==3): tabulate _merge

Keep only observations that have both an individual and a household response record: keep if _merge==3

Delete the _merge variable: drop _merge
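
The merge steps above can be followed end to end on two toy files. This is a minimal sketch: the values and the temporary file name toy_hhresp are invented, and the households are chosen so that each _merge category occurs. Run it as a do-file so that the temporary file macro is defined.

```stata
* Toy household file (stand-in for a_hhresp); values are invented
clear
input long a_hidp byte a_tenure_dv
101 1
102 3
104 2
end
tempfile toy_hhresp
save `toy_hhresp'

* Toy individual file (stand-in for a_indresp)
clear
input long pidp long a_hidp
1001 101
1002 101
1003 102
1004 103
end

* Many individuals to one household record
merge m:1 a_hidp using `toy_hhresp', keepusing(a_tenure_dv)

* 1 = individual file only (household 103)
* 2 = household file only (household 104)
* 3 = matched (households 101 and 102)
tabulate _merge

keep if _merge == 3
drop _merge
```

The last two steps can also be folded into the merge itself by adding the keep(match) and nogenerate options, at the cost of not seeing the _merge tabulation first.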

Saving your data

Save your project dataset (save the data in your directory and give the data a name of your choice): save mydirectory/mydata.dta, replace
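
As a quick check that a saved file can be read back, the sketch below saves a toy dataset and reloads it. The file name mydata_example.dta is hypothetical, and the file is written to the current working directory (shown by pwd); add a directory path in front of the name to save elsewhere.

```stata
clear
input long pidp byte a_nchild_dv
1001 2
1002 0
end

* Show the current working directory, where the file will be written
pwd

* replace overwrites the file if it already exists
save "mydata_example.dta", replace

* Reload the saved file to confirm it round-trips
use "mydata_example.dta", clear
describe, short
```

Quoting the file name is optional here but becomes necessary as soon as the path contains spaces.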