Posted on 16 September 2010 by Mitch Abdon
Stata 11 comes with an improved version of -merge-. -merge- joins the observations from another Stata data file (the using dataset) to the corresponding observations in the current data file (the master dataset). The observations are matched on one or more specified key variables.
What is different about the new syntax, and what does it add? First, the new syntax helps minimize merging errors. The Stata 11 syntax for -merge- is:
merge type varlist using filename, options
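where “type” can be 1:1, m:1, 1:m, or m:m:
1:1 - each observation in the master file matches at most one observation in the using file, so the key variables must uniquely identify observations in both files
m:1 - many observations in the master file can match one observation in the using file, so the key variables must uniquely identify observations in the using file
1:m - one observation in the master file can match many observations in the using file, so the key variables must uniquely identify observations in the master file
m:m - many observations in the master file can match many observations in the using file, so the key variables need not be unique in either file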
By specifying the type of merge, you tell Stata what to expect. It returns an error if the master or the using dataset is not consistent with that expectation (for example, if the key variables repeat in a file where the type says they must be unique), which reduces the chance of a merging mistake going unnoticed.
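To make the type check concrete, here is a minimal sketch using made-up data. The variable names (hhid, region, person, age) and the tempfile name are only for illustration. It runs an m:1 merge that should succeed, then tries a 1:1 merge on the same key, which Stata should refuse because hhid repeats in the master data.

```stata
* Hypothetical using file: one row per household
clear
input hhid region
1 1
2 2
3 1
end
tempfile households
save `households'

* Hypothetical master file: one row per person, so hhid repeats
clear
input hhid person age
1 1 34
1 2 31
2 1 58
3 1 45
3 2 12
end

* Many persons per household in the master, one household row in using
merge m:1 hhid using `households'

* Drop _merge before merging again, then try the wrong type.
* hhid is not unique in the master, so the 1:1 merge stops with an error;
* -capture noisily- shows the message without halting a do-file.
drop _merge
capture noisily merge 1:1 hhid using `households'
```

Run this as a do-file so the tempfile and its local macro stay in scope for the whole example.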
Second, the new command automatically sorts both the master and using datasets according to the variables in varlist, which is my favorite change. Gone are the days when you needed to open the using file just to sort and save it.
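As a small illustration of the automatic sorting, the sketch below builds two tiny files that are deliberately out of order on the key and merges them without any -sort- step. The data and the names (id, score, name) are invented for the example.

```stata
* Hypothetical using file, saved unsorted on id
clear
input id score
3 70
1 85
2 90
end
tempfile scores
save `scores'

* Hypothetical master file, also unsorted on id
clear
input id str5 name
2 "Ben"
3 "Cara"
1 "Ana"
end

* No prior -sort- on either file is needed with the new syntax
merge 1:1 id using `scores'
list
```

With the old syntax, the equivalent workflow would have meant opening the using file, sorting it on id, saving it, and then sorting the master before merging.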
Third, the output includes a table of _merge. With the pre-Stata 11 -merge-, we almost always typed “tab _merge” after every merge to make sure that we got it right. Stata 11 saves you this step by automatically reporting the match summary, unless you suppress it with the option “noreport”.
While the old -merge- syntax still works in Stata 11, the new syntax is worth learning.
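Here is a short sketch of the reporting behavior, again with invented data (id, qty, price). It merges with “noreport” so that no summary table is printed, then tabulates _merge by hand, which is roughly what the automatic report would otherwise have shown. In _merge, 1 marks observations found only in the master, 2 only in the using file, and 3 those matched in both.

```stata
* Hypothetical using file: prices for ids 1, 2 and 4
clear
input id price
1 10
2 12
4 9
end
tempfile prices
save `prices'

* Hypothetical master file: quantities for ids 1, 2 and 3
clear
input id qty
1 5
2 3
3 8
end

* Suppress the automatic match summary...
merge 1:1 id using `prices', noreport

* ...and inspect the result manually instead
tabulate _merge
```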