Stata 16 New in Bayesian analysis—Multiple chains, predictions, and more
Multiple chains.
Bayesian inference based on an MCMC (Markov chain Monte Carlo) sample is valid only if the Markov chain has converged. One way we can evaluate this convergence is to simulate and compare multiple chains.
The new nchains() option can be used with both the bayes: prefix and the bayesmh command. For instance, you type
. bayes, nchains(4): regress y x1 x2
and four chains will be produced. The chains will be combined to produce a more accurate final result. Before interpreting the result, however, you can compare the chains graphically to evaluate convergence. You can also evaluate convergence using the Gelman–Rubin convergence diagnostic, which bayes: regress and other Bayesian estimation commands now report when multiple chains are simulated. If you are concerned about nonconvergence, you can investigate further using the bayesstats grubin command to obtain individual Gelman–Rubin diagnostics for each parameter in your model.
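The multiple-chain workflow described above can be sketched as follows. This is a minimal illustration, not a recommended analysis: it assumes the auto dataset that ships with Stata in place of y, x1, and x2, and the seed value is arbitrary.

```stata
* Sketch only: the shipped auto dataset stands in for y, x1, and x2
sysuse auto, clear

* Simulate four chains; rseed() makes the run reproducible
bayes, nchains(4) rseed(16): regress mpg weight length

* Gelman-Rubin statistic for each model parameter
bayesstats grubin

* Trace plot of one coefficient, to compare the chains visually
bayesgraph trace {mpg:weight}
```

Gelman–Rubin statistics close to 1 for every parameter, together with trace plots in which the chains overlap, are the usual informal signs that the chains have reached the same distribution.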
Bayesian predictions.
Bayesian predictions are simulated values from the posterior predictive distribution. These predictions are useful for checking model fit and for predicting out-of-sample observations. After you fit a model with bayesmh, you can use bayespredict to compute these simulated values or functions of them and save those in a new Stata dataset. For instance, you can type
. bayespredict (ymin:@min({_ysim})) (ymax:@max({_ysim})), saving(yminmax)
to compute minimums and maximums of the simulated values. You can then use other postestimation commands such as bayesgraph to obtain summaries of the predictions.
The dataset created by bayespredict may include thousands of simulated values for each observation in your dataset. Sometimes, you do not need all of these individual values. To instead obtain posterior summaries such as posterior means or medians, you can use bayespredict, pmean or bayespredict, pmedian. Alternatively, you may be interested in a random sample of the simulated values. You can use, for instance, bayesreps, nreps(100) to obtain 100 replicates.
Finally, you may want to evaluate model goodness of fit using posterior predictive p-values, also known as PPPs or Bayesian predictive p-values. PPPs measure agreement between observed and replicated data and can be computed using the new bayesstats ppvalues command. For instance, continuing our earlier example, you can type
. bayesstats ppvalues {ymin} {ymax} using yminmax
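The prediction steps above can be put together in one short session. The sketch below rests on several assumptions: the auto dataset, the priors, the seed, and the filenames mpg_mcmc and yminmax are illustrative choices, and the exact forms `bayespredict pmean, mean` and `bayesreps yrep*, nreps(3)` reflect my reading of the command syntax, so check them against the bayespredict manual entry before relying on them.

```stata
* Sketch only: dataset, priors, seed, and filenames are illustrative assumptions
sysuse auto, clear

* Fit the model and save the MCMC sample, which bayespredict reads later
bayesmh mpg weight, likelihood(normal({var}))          ///
    prior({mpg:}, normal(0, 10000))                    ///
    prior({var}, igamma(0.1, 0.1))                     ///
    rseed(16) saving(mpg_mcmc, replace)

* Simulate outcomes; keep the minimum and maximum of each simulated sample
bayespredict (ymin:@min({_ysim})) (ymax:@max({_ysim})), ///
    saving(yminmax, replace) rseed(16)

* Posterior predictive p-values for the two test quantities
bayesstats ppvalues {ymin} {ymax} using yminmax

* Posterior mean of the simulated outcome for each observation (assumed syntax)
bayespredict pmean, mean rseed(16)

* Three replicated outcome variables, yrep1 to yrep3 (assumed syntax)
bayesreps yrep*, nreps(3) rseed(16)
```

A PPP near 0 or 1 suggests that the model reproduces that feature of the observed data poorly, while values in between suggest no obvious conflict.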
Multiple datasets in memory in Stata 16
You can now load multiple datasets into memory. You type
. use people
and people.dta is loaded into memory. Next, you type
. frame create counties
. frame counties: use counties
and you have two datasets in memory. people.dta is in the frame named default, and counties.dta is in the frame named counties. Your current frame is still default. Most Stata commands use the data in the current frame. For example, if you typed
. list
then people.dta would be listed. If you typed
. frame counties: list
then counties.dta would be listed. Or you could make counties the current frame by typing
. frame change counties
and list will now list the counties data.
Navigating frames is easy and so is linking them. Imagine that both datasets have a variable named countycode that identifies counties in the same way. Type
. frlink m:1 countycode, frame(counties)
and each person in the default frame is linked to a county in the counties frame. This means you can now use the frget command to copy variables from the counties frame to the current frame. Or you can use the frval() function to directly access the values of variables in the counties frame. For instance, if we have each individual’s income in the default frame and median county income in the counties frame, we can generate a new variable containing relative income by typing
. generate rel_income = income / frval(counties, median_income)
This is just the beginning. While this example uses only two frames, you can have up to 100 frames in memory at once, and you can have many links among those frames.
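The frame commands above can be tried without any files on disk. In this sketch the two small datasets are hypothetical and typed in directly; only the variable names countycode, income, and median_income come from the example in the text.

```stata
* Sketch only: both datasets below are made up for illustration
frames reset

* Person-level data in the default frame
input id countycode income
1 101 42000
2 101 58000
3 102 36000
4 102 61000
end

* County-level data in a second frame
frame create counties
frame change counties
input countycode median_income
101 50000
102 45000
end
frame change default

* Link each person to a county; this creates a link variable named counties
frlink m:1 countycode, frame(counties)

* Copy a county variable into the current frame ...
frget median_income, from(counties)

* ... or read it through the link without copying it
generate rel_income = income / frval(counties, median_income)
list
```

frget is convenient when the linked variable is needed repeatedly, whereas frval() avoids adding a copy to the current frame.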

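You can reorganize data, manage variables, and collect statistics across groups and replicates. You can work with byte, integer, long, float, double, and string variables (including BLOBs and strings of up to 2 billion characters). Stata also has advanced tools for managing specialized data such as survival/duration data, time-series data, panel/longitudinal data, categorical data, multiple-imputation data, and survey data.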





Lasso is a machine-learning technique used for model selection, prediction, and inference.
The new lasso command selects “optimal” predictors for continuous, count, and binary outcomes using deviances from linear, Poisson, logit, or probit regression models.
For instance, if you type
. lasso linear y x1-x500
lasso will select a subset of the specified covariates—say, x2, x10, x11, and x21. You can then use the standard predict command to obtain predictions of y.
If you instead have a binary or count outcome, you can use lasso logit, lasso probit, or lasso poisson in the same way. And if you prefer to select variables using the elastic net or square-root lasso method, you can use the elasticnet or sqrtlasso command.
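A small runnable version of the selection-and-prediction steps is sketched below. It assumes the auto dataset and a handful of covariates rather than x1-x500, and the seed is arbitrary; it is set because lasso chooses its penalty by cross-validation by default, which involves random folds.

```stata
* Sketch only: the auto dataset replaces y and x1-x500
sysuse auto, clear

* Lasso for a continuous outcome; seed the cross-validation folds
lasso linear mpg weight length turn displacement headroom trunk gear_ratio, ///
    rseed(1234)

* Show which covariates were selected
lassocoef

* Predictions of the outcome from the selected model
predict mpg_hat
```

The same pattern applies with lasso logit, lasso probit, or lasso poisson when the outcome is binary or a count.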
Sometimes, variable selection or prediction is the final goal of lasso. Other times, you are interested in estimating and testing coefficients. Stata 16 provides 11 commands that allow you to estimate coefficients, standard errors, and confidence intervals and to perform tests for variables of interest while using lasso methods to select from among potential control variables. The commands are
dsregress, dslogit, dspoisson, poregress, pologit, popoisson, poivregress, xporegress, xpologit, xpopoisson, and xpoivregress.
The ds commands perform double-selection lasso, the po commands perform partialing-out lasso, and the xpo commands perform cross-fit partialing-out lasso. They do this for models with continuous, binary, and count outcomes. They can even handle endogenous covariates in models for continuous outcomes. The literature currently discusses many methods for lasso-based inference. We make some of these methods available so that researchers can select their favorite. In fact, there are even more lasso-based methods of inference in the literature, and often researchers may use the tools available in lasso, sqrtlasso, and elasticnet to implement other methods.
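The three approaches share one calling pattern: the outcome and the variables of interest come first, and the candidate controls go in the controls() option. The sketch below assumes the auto dataset, with weight as the variable of interest; the list of controls and the seed are illustrative choices only.

```stata
* Sketch only: dataset, variable of interest, and controls are illustrative
sysuse auto, clear
local controls length turn displacement headroom trunk gear_ratio price foreign

* Double-selection lasso
dsregress mpg weight, controls(`controls')

* Partialing-out lasso
poregress mpg weight, controls(`controls')

* Cross-fit partialing-out lasso; the folds are random, so set a seed
xporegress mpg weight, controls(`controls') rseed(1234)
```

Each command reports a coefficient, standard error, and confidence interval for weight only; the controls are used for selection and are not reported.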
The lasso and elasticnet commands are standard lasso tools often requested for variable selection and prediction. The lasso tools for inference implement newer methods developed primarily by econometricians. These inference methods are nonetheless relevant to all disciplines because they provide a way to test and interpret coefficients on variables of interest.
Users can easily learn all about the lasso features in the new Lasso Reference Manual.

