1 item has been added to your cart.

Estimate mediation effects, analyze the relationship between an unobserved latent concept such as depression and the observed variables that measure depression, model a system with many endogenous variables and correlated errors, or fit a model with complex relationships among both latent and observed variables. Fit models with continuous, binary, count, ordinal, fractional, and survival outcomes. Even fit multilevel models with groups of correlated observations such as children within the same schools. Evaluate model fit. Compute indirect and total effects. Fit models by drawing a path diagram or using the straightforward command syntax.

Learn about structural equation modeling (SEM). Model specification

Use the SEM Builder or command language
SEM Builder uses standard path diagrams
Command language is a natural variation on path diagrams
Group estimation is as easy as adding group(sex). Also easily add or relax constraints—ginvariant(mcoef) constrains all coefficients in the measurement model to be equal across groups. Or add paths for some groups but not others.

SEM Builder

Drag, drop, and connect to create path diagrams
Estimate models from path diagrams
Display results on the path diagram
Save and modify diagrams
Tools to create measurement and regression components
Set constant and equality constraints by clicking
Complete control of how your diagrams look

Watch Using the SEM Builder in Stata tutorial.

Classes of models for linear SEM

Linear regression
Multivariate regression
Path analysis
Mediation analysis
Measurement models
Confirmatory factor analysis (CFA)
Multiple indicators and multiple causes (MIMIC) models
Latent growth curve models
Hierarchical CFA
Correlated uniqueness models
Arbitrary structural equation models

Additional classes of models for generalized SEM

Generalized linear models
Item response theory models
Measurement models with binary, count, and ordinal measurements
Multilevel CFA models
Multilevel mixed-effects models
Latent growth curve models with generalized-linear responses
Multilevel mediation models
Latent class analysis (LCA)
Selection models
- with random intercepts and slopes
- with binary, count, and ordinal outcomes
Endogenous treatment-effect models
Any multilevel structural equation models with generalized-linear responses

Latent predictors of survival outcomes
Path models, growth curve models, and more
Weibull, exponential, lognormal, loglogistic, or gamma models
Survival outcomes with other outcomes

Linear and generalized-linear responses

Models for continuous, binary, count, ordinal, and nominal outcomes
Thirteen distribution families
- Gaussian
- Bernoulli
- Binomial
- Poisson
- Negative binomial
- Ordinal
- Multinomial
- Beta
- Exponential
- Gamma
- Lognormal
- Loglogistic
- Weibull
Five links
- Identity
- Log
- Logit
- Probit
- Cloglog
Support for common regression models: linear, logistic, probit, ordered logit, ordered probit, Poisson, multinomial logistic, tobit, interval measurements, and more

Multilevel models

Two-, three-, and higher-level structural equation models
Multilevel mixed-effects models
Random intercepts and random slopes
Crossed and nested random effects

Estimation methods for linear SEM

ML—maximum likelihood
MLMV—maximum likelihood for missing values; sometimes called FIML
ADF—asymptotic distribution free, meaning GMM (generalized method of moments) using ADF weighting matrix

Estimation methods for generalized SEM

Maximum likelihood
Mean-variance or mode-curvature adaptive Gauss–Hermite quadrature
Nonadaptive Gauss–Hermite quadrature
Laplace approximation

Standard-error methods

OIM—observed information matrix
EIM—expected information matrix
OPG—outer product of gradients
Satorra—Bentler estimator
Robust—distribution-free linearized estimator
Cluster–robust—robust adjusting for correlation within groups of observations
Bootstrap—nonparametric bootstrap and clustered bootstrap
Jackknife—delete-one, delete-n, and clustered jackknife

Survey support for linear SEM and generalized SEM

Sampling weights and stage-level weights
Stratification and poststratification
Clustered sampling at one or more levels

Postestimation Selector

View and run all postestimation features for your command
Automatically updated as estimation commands are run

Summary statistics data (SSD)

Fit linear SEMs on observed or summary (SSD) data
Fit models on covariances or correlations and optionally variances and means
SSD may be group specific
Easily create and manage SSDs
Build SSDs from original (raw) data for distribution or publication
Automatic corruption/error checking and repairing
Electronic signatures

Starting values

Automatic
May specify for some or all parameters
Grid search available
May fit one model, subset or superset, and use fitted values for another model

Identification

Automatic normalization (anchoring) constraints provide scale for latent variables; may be overridden

Reliability

May specify fraction of variance not due to measurement error

Direct and indirect effects for linear SEM

Confidence intervals
Unstandardized or standardized units

Overall goodness-of-fit statistics for linear SEM

Model vs. saturated
Baseline vs. saturated
RMSEA, root mean squared error of approximation
AIC, Akaike's information criterion
BIC, Bayesian information criterion
CFI, comparative fit index
TLI, Tucker–Lewis index, a.k.a. nonnormed fit index
SRMR, standardized root mean squared residual
CD, coefficient of determination

Equation-level goodness-of-fit statistics for linear SEM

R-squared
Equation-level variance decomposition
Bentler–Raykov squared multiple-correlation coefficient

Group-level goodness-of-fit statistics for linear SEM

SRMR
CD
Model vs. saturated chi-squared contribution

Residual analysis for linear SEM

Mean residuals
Variance and covariance residuals
Raw, normalized, and standardized values available

Parameter tests

Modification indices
Wald tests
Score tests
Likelihood-ratio tests
Easy to specify single or joint custom tests for omitted paths, included paths, and relaxing constraints
Linear and nonlinear tests of estimated parameters
Tests may be specified in standardized or unstandardized parameter units

Group-level parameter tests

Group invariance by parameter class or user specified

Linear and nonlinear combinations of estimated parameters

Confidence intervals
Unstandardized or standardized units

Assess nonrecursive system stability

Predictions for linear SEM

Observed endogenous variables
Latent endogenous variables
Latent variables (factor scores)
Equation-level first derivatives
In- and out-of-sample prediction; may estimate on one sample and form predictions in another

Predictions for generalized SEM

Means of observed endogenous variables—probabilities for 0/1 outcomes, mean counts, etc.
Linear predictions of observed endogenous variables
Latent variables using empirical Bayes means and modes
Standard errors of empirical Bayes means and modes
Observed endogenous variables with and without predictions of latent variables
Density function
Distribution function
Survivor function
Predict observed endogenous variables marginally with respect to latent variables
User-defined nonlinear predictions

May be used with postestimation features
May be saved to disk for restoration and use later
Displayed in standardized or unstandardized units
Optionally display results in Bentler–Weeks form
Optionally display results in exponentiated form as odds ratios, incidence rate ratios, and relative risk ratios
All results accessible for community-contributed programs

Factor variables with generalized SEM

Automatically create indicators based on categorical variables
Include polynomial terms
Perform contrasts of categories/levels

Marginal analysis

Estimated marginal means
Marginal and partial effects
Average marginal and partial effects
Least-squares means
Predictive margins
Adjusted predictions, means, and effects
Works with multiple outcomes simultaneously
Integrates over latent variables
Contrasts of margins
Pairwise comparisons of margins
Profile plots
Graphs of margins and marginal effects

Contrasts for generalized SEM

Analysis of main effects, simple effects, interaction effects, partial interaction effects, and nested effects
Comparisons against reference groups, of adjacent levels, or against the grand mean
Orthogonal polynomials
Helmert contrasts
Custom contrasts
ANOVA-style tests
Contrasts of nonlinear responses
Multiple-comparison adjustments
Balanced and unbalanced data
Contrasts of means, intercepts, and slopes
Graphs of contrasts
Interaction plots

Pairwise comparisons for generalized SEM

Compare estimated means, intercepts, and slopes
Compare marginal means, intercepts, and slopes
Balanced and unbalanced data
Nonlinear responses
Multiple-comparison adjustments: Bonferroni, Sidak, Scheffe, Tukey HSD, Duncan, and Student–Newman–Keuls adjustments
Group comparisons that are significant
Graphs of pairwise comparisons

Explore more about SEM in Stata.

Additional resources

Structural Equation Modeling Reference Manual
Discovering Structural Equation Modeling Using Stata, Revised Edition by Alan C. Acock
In the spotlight: SEM for economists (and others who think they don't care)
In the spotlight: Path diagram for multinomial logit with random effects
In the spotlight: Meet Stata's new xtmlogit
Structural equation modeling using Stata training course
Structural equation modeling (SEM) flyer

See New in Stata 18 to learn about what was added in Stata 18.