Stata Video Tutorials

Webpage Contents (navigate to the section you need)

Note: Some sections include example commands. Replace the placeholder words (listed below) with the names of your own variables.

    • catvar = categorical variable
    • groupvar = grouping (categorical) variable
    • intvar = interval variable
    • indvar = independent variable
    • depvar = dependent variable

 

ACCESSING STATA

 

ENTERING OR IMPORTING DATA

 

DATA MANAGEMENT AND PREP

  • Know your data
    • codebook var1 var2
  • To attach notes to your data set, whether about the data set as a whole or about particular variables, use the notes command. The first two commands below add and list notes for the data set itself; the last two do the same for a single variable.
    • notes: text
    • notes [this simple command will show you all of the notes attached to your data set]
    • notes var: text
    • notes var [lists the notes attached to that variable]
  • Do-files
  • Changing variable names from upper to lower case (and vice versa)
    • varcase
    • Variable names sometimes mix upper- and lower-case letters, either within a single name or across variables. In those instances, varcase simply reverses the casing. To switch every name to lower case regardless of its current casing, use the command below. The * tells Stata to apply the change to all variables; alternatively, specify one or more particular variables in its place. (The upper option works the same way in the other direction.)
      • rename *, lower
  • Cloning a variable and renaming a variable
    • clonevar newvar = oldvar
    • rename oldvarname newvarname
  • Recoding categorical variables (e.g., creating dummies; reordering response categories). In each rule inside the ( ), the value(s) to the left of the = are the category values in the variable you want to recode, and the number to the right is the value that category will be assigned in the new variable the command generates; the quoted text becomes that value's label. #/# specifies a range of values and # # a list of values. The test option warns you if rules overlap or are never used.
    • recode catvar (# = # "label1")(# = # "label2"), generate(newvar) label(newvar) test
    • recode catvar (#/# = # "label1")(# # = # "label2"), generate(newvar) label(newvar) test
    • tabulate catvar, gen(catvar) [generates one 0/1 dummy per category, named catvar1, catvar2, and so on]
  • Reverse coding categorical variables
    • revrs var [user-written; install once with ssc install revrs]
  • If you need to see the values assigned to all of the categories of a categorical variable (and codebook isn't showing all of them), use the following command:
    • fre catvar [user-written; install once with ssc install fre]
  • Converting numeric missing-value codes from other data formats (e.g., -9) to Stata's system missing value (.)
    • mvdecode _all, mv(-9) [all variables]
    • mvdecode var, mv(-9) [a single variable]
  • Adding variable and value labels
  • Generating new variables from existing variables
  • Creating a composite variable
    • Start by recoding items as needed (e.g., reverse-coding so that all items run in the same, intuitive direction)
    • Decide whether the items need to be standardized because they use different value sets, then run one of the following:
      • alpha var1 var2 var3 var4, item
        • or
      • alpha var1 var2 var3 var4, std item
    • Decide which items to include in the final composite by examining the alpha scores (with the item option, alpha reports what the scale's alpha would be with each item omitted). Then use the alpha command itself to generate the composite. The generated composite is the mean of the items a case answered (of their standardized values if you add std), and min() sets how many items must be present before a composite is calculated (e.g., min(2) requires at least 2 of the set).
      • alpha var1 var2 var3, gen(compvar) min(2)
        • or
      • alpha var1 var2 var3, gen(compvar) min(2) std
    • If all of your items share the same value set, no standardization is needed, and you can base the composite on either the mean of the items or their sum. If you sum them with rowtotal(), calculate the composite only for cases with no missing values on any component, because rowtotal() treats missing values as 0 and would understate those cases' totals. Instead of using alpha to generate the variable, first create a variable that counts each case's missing values across the components, then use it as the condition for calculating the composite:
      • egen float compmiss = rowmiss(var1 var2 var3)
      • tab compmiss
      • egen float compvar = rowtotal(var1 var2 var3) if compmiss==0
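The recoding, dummy-creation, and missing-value steps above can be tried on Stata's bundled auto data. This is a minimal sketch, not the tutorial's own example: rep78 (a 1 to 5 repair record) stands in for catvar, and the names goodrep, goodrep_lbl, rep, and rep78_raw are hypothetical. Because auto has no -9 codes, the sketch plants some before running mvdecode. Run it from a do-file (the // comments only work there).

```stata
* Minimal sketch on Stata's bundled auto data (rep78 plays the role of catvar)
sysuse auto, clear

* Recode 1-5 into a dummy: the range 1/3 becomes 0, the list 4 5 becomes 1.
* Values not covered by any rule (here, missing) are copied unchanged.
recode rep78 (1/3 = 0 "Fair or worse") (4 5 = 1 "Good"), generate(goodrep) label(goodrep_lbl) test
tabulate rep78 goodrep, missing

* One 0/1 dummy per observed category: rep1, rep2, ..., rep5
tabulate rep78, generate(rep)

* Hypothetical: imitate an imported file that used -9 for "no answer"
clonevar rep78_raw = rep78
replace rep78_raw = -9 if missing(rep78_raw)
mvdecode rep78_raw, mv(-9)      // every -9 becomes Stata's system missing (.)
notes rep78_raw: Copy of rep78; -9 codes converted with mvdecode
notes rep78_raw
```

The tabulate of rep78 against goodrep is a quick way to confirm that each original category landed in the intended new category.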
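The composite-variable steps can be sketched on a tiny made-up data set. The five cases and the items q1 to q3 are hypothetical, all on the same 1 to 5 scale, with a few values missing so you can see how min(2) and the rowmiss() condition treat incomplete cases differently. Run it from a do-file.

```stata
* Hypothetical survey items on a common 1-5 scale, with some values missing
clear
input id q1 q2 q3
1 4 5 4
2 2 . 3
3 5 5 5
4 1 2 .
5 3 3 4
end

* Reliability, with item-level diagnostics
alpha q1 q2 q3, item

* Mean-based composite: averaged over the items answered, if at least 2 are present
alpha q1 q2 q3, gen(compmean) min(2)

* Sum-based composite, calculated only for cases with no missing items
egen float compmiss = rowmiss(q1 q2 q3)
tab compmiss
egen float comptot = rowtotal(q1 q2 q3) if compmiss == 0
list id q1 q2 q3 compmean compmiss comptot
```

In the listing, cases 2 and 4 still get a mean-based score (from their two answered items) but no sum-based score, which is the behavior the two approaches above are meant to produce.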

 

DESCRIPTIVE STATISTICS AND GRAPHS

 

BIVARIATE ANALYSES

Cross-tabulations and Chi-Squared

T-tests (Note: Stata has changed the menu system for t-tests)

  • One-way analysis of variance (including the Bonferroni test and an effect-size measure)
    • oneway intvar catvar, tabulate bonferroni
    • For a measure of effect size, run the following anova command; in a one-way design its R-squared is eta-squared, the proportion of the variance in intvar explained by catvar:
      • anova intvar catvar
  • Complementary graphing options for ANOVA:
    • See the Box Plot video for another graphing option that complements ANOVA
    • Or, run the margins and marginsplot commands following an anova command.
      • anova intvar catvar
      • margins catvar
      • marginsplot, xdimension(catvar) recast(bar)
      • marginsplot, xdimension(catvar) recast(dot)
  • Kruskal-Wallis rank test (nonparametric alternative to ANOVA)
    • kwallis intvar, by(catvar)
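Here is a sketch of the one-way ANOVA workflow above on Stata's bundled auto data, assuming mpg as intvar and rep78 (repair record, 1 to 5) as catvar; the variable choice is illustrative only. estat esize is an additional built-in postestimation command (Stata 13 or later) that reports eta-squared directly rather than leaving you to read it off as R-squared. Run it from a do-file.

```stata
* One-way ANOVA sketch on the bundled auto data: mpg = intvar, rep78 = catvar
sysuse auto, clear

* Group means, F test, and Bonferroni-adjusted pairwise comparisons
oneway mpg rep78, tabulate bonferroni

* Same model via anova; in a one-way design R-squared equals eta-squared
anova mpg rep78
display "eta-squared = " %5.3f e(r2)
estat esize

* Predicted mean mpg for each repair category, graphed as bars
margins rep78
marginsplot, xdimension(rep78) recast(bar)

* Rank-based alternative that does not assume normality
kwallis mpg, by(rep78)
```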

Correlation and Simple Regression

 

MULTIVARIATE ANALYSES

Multiple OLS Regression

  • Multiple OLS regression
    • regress depvar indvar1 indvar2
    • regress depvar indvar1 indvar2, beta
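    • regress depvar indvar1 indvar2, robust [heteroskedasticity-robust standard errors; use if the error variance may not be constant]
    • nestreg: regress depvar (1stvar) (2ndvar) (3rdvar 4thvar) [adds the variables in blocks and reports the change in R-squared at each step]
  • Graphing options for OLS regression (run after your regression command; coefplot is user-written and can be installed with ssc install coefplot):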
    • coefplot, drop(_cons) xline(0) nolabel
    • coefplot, drop(_cons) xline(0) msymbol(d) mcolor(white) levels(99 95 90 80 70) ciopts(lwidth(3 ..) lcolor(*.2 *.4 *.6 *.8 *1)) legend(order(1 "99" 2 "95" 3 "90" 4 "80" 5 "70") row(1)) nolabel
    • coefplot, drop(_cons) xline(0) msymbol(d) cismooth nolabel
    • Or, run the margins and marginsplot commands after your regression:
      • margins, dydx(*) post
      • marginsplot, horizontal xline(0) yscale(reverse) recast(scatter)
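  • Semipartial correlations (the squared semipartial correlation for a variable is how much R-squared would drop if that variable were removed from the model)
    • pcorr2 depvar indvar1 indvar2 indvar3 [user-written; Stata's built-in pcorr also reports semipartial correlations]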
  • Regression diagnostics -- Checking for Multicollinearity and Outliers
    • Calculating Cook's d to identify influential cases (outliers):
      • predict cooksd, cooksd
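      • list id cooksd if cooksd > 4/e(N) & !missing(cooksd) [id is your case identifier; e(N) is the number of cases used in the regression]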
    • Checking for multicollinearity
      • estat vif
  • Adjusting for sampling weights
    • regress depvar indvar1 indvar2 [pweight = weightvar]
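The OLS commands and diagnostics above can be run end to end on the bundled auto data. This sketch assumes price as depvar and mpg, weight, and foreign as the independent variables; id is a hypothetical case identifier created only for the listing. It writes the robust option as vce(robust), which Stata treats the same as robust, and uses the built-in pcorr rather than the user-written pcorr2. Run it from a do-file.

```stata
* Multiple OLS sketch on the bundled auto data
sysuse auto, clear
generate id = _n                              // hypothetical case identifier

regress price mpg weight
regress price mpg weight, beta                // adds standardized coefficients
regress price mpg weight, vce(robust)         // heteroskedasticity-robust SEs

* Blocks entered one at a time, with the change in R-squared for each
nestreg: regress price (mpg) (weight) (foreign)

* Diagnostics: refit, then flag influential cases with the 4/N rule of thumb
regress price mpg weight
predict cooksd, cooksd
list id make cooksd if cooksd > 4/e(N) & !missing(cooksd)
estat vif

* Partial and semipartial correlations of price with each predictor
pcorr price mpg weight
```

The model is refit before predict and estat vif so that both work from the intended specification rather than from the last nestreg block.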

Multivariate Logistic Regression

  • Logistic regression
    • logistic depvar indvar1 indvar2
    • listcoef, help percent [from the user-written SPost package; reports the percent change in the odds]
    • nestreg: logistic depvar (1stvar) (2ndvar) (3rdvar 4thvar)
  • Graphing options for logistic regression results (run immediately after your logistic regression). These graph the predicted probability of Y across the values of the categorical and/or interval independent variable(s) you select. For margins catvar to work, catvar must enter the model as a factor variable (e.g., logistic depvar i.catvar indvar2).
    • margins catvar, atmeans
    • marginsplot, xdimension(catvar)
    • Or:
      • margins catvar, atmeans
      • marginsplot, xdimension(catvar) recast(bar)
      • marginsplot, xdimension(catvar) recast(dot)
    • Or, to graph odds ratios (immediately after a logistic regression):
      • coefplot, drop(_cons) xline(1) eform xtitle(Odds Ratio) nolabel
  • Calculating predicted probabilities
  • Ordered logistic regression
    • ologit depvar indvar1 indvar2
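A sketch of the logistic workflow on Stata's bundled nlsw88 data, assuming union membership (0/1) as depvar, race as catvar, and years of schooling (grade) as intvar; these choices are illustrative only. The i. prefix is what lets margins race compute a probability for each category. The at() line shows one way to get predicted probabilities across an interval variable, and predict with the pr option stores each case's predicted probability. listcoef is left out because it needs the user-written SPost package. Run it from a do-file.

```stata
* Logistic regression sketch on the bundled nlsw88 data
sysuse nlsw88, clear

* i.race marks race as a factor variable, which margins race requires
logistic union i.race grade

* Predicted Pr(union = 1) for each race, other covariates held at their means
margins race, atmeans
marginsplot, xdimension(race) recast(bar)

* Predicted probabilities across an interval predictor (grade = 8, 10, ..., 18)
margins, at(grade = (8(2)18)) atmeans
marginsplot

* Each case's predicted probability
predict p_union, pr
summarize p_union
```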
Last updated: 05/03/2022