Functions and marker symbols


Stata’s -graph twoway function- draws the line plot of a specified function. For example, half of the parabola y = 4x^2 can be drawn by typing:

tw function y = 4*x^2


By default, the function is drawn over the range [0,1]. To draw the other side of the parabola or otherwise change the range, specify the range() option:

tw function y = 4*x^2, range(-2 2)



It is easy to change the line attributes of the plot. For example, to change the color, width, and pattern of the line, use the lcolor(), lwidth(), and lpattern() options:

tw function y = 4*x^2, range(-2 2) lcolor(red) lwidth(medthick) lpattern(-)



These options are particularly helpful when you have many functions to plot. For example:

tw function y = x^2, range(-2 2) || ///
function y = x^3, range(-2 2) lpattern(-) || ///
function y = x^4, range(-2 2) lpattern(.-) ///
legend(label(1 "y = x{sup:2}") ///
label(2 "y = x{sup:3}")  ///
label(3 "y = x{sup:4}") ///
cols(3) pos(5) ring(0) region(lcolor(none)))


[Note: For more text in graph options (e.g. bold, Greek letters, font type), see -help graph text-. See also Writing Greek letters and other symbols in graphs.]

But suppose you prefer to use marker symbols rather than line patterns to differentiate the line plots. How can you specify this? Use the recast() option (-help advanced_options-):

tw function y = x^2, range(-2 2) recast(connected) msymbol(O) || ///
function y = x^3, range(-2 2) recast(connected) msymbol(T) n(20) || ///
function y = x^4, range(-2 2) recast(connected) msymbol(S) n(20)  ///
legend(label(1 "y = x{sup:2}") ///
label(2 "y = x{sup:3}")  ///
label(3 "y = x{sup:4}") ///
cols(3) pos(5) ring(0) region(lcolor(none)))


The recast() option tells Stata to draw the plot as a different plot type. In our case, we specified that the new plot type is connected, for which you can specify marker symbols. What is n(#) for? It sets the number of points at which the function is evaluated, so n(20) draws the plot at 20 points. If it is not specified, as in the plot for y = x^2, the plot is drawn with 300 markers (the default), which looks ugly.
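
To see the effect of n() on its own, here is a minimal sketch that draws the same parabola twice and shows the two graphs side by side. The values 10 and 50 and the graph names n10 and n50 are arbitrary choices for illustration.

```stata
* Fewer evaluation points give fewer, more widely spaced markers
tw function y = x^2, range(-2 2) recast(connected) msymbol(O) n(10) name(n10, replace)
tw function y = x^2, range(-2 2) recast(connected) msymbol(O) n(50) name(n50, replace)

* Show the two graphs side by side
graph combine n10 n50
```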

The order() suboption makes label() redundant


Before, I thought that the order() suboption of the legend() option was used only to specify which legend keys are shown and in what order. But recently I learned that it also makes the label() suboption redundant. To illustrate, we use census.dta, which is installed with Stata.

sysuse census.dta, clear

#delimit ;
tw (scatter  marriage divorce, mlabel(state2))
(lfit marriage divorce),
ytitle("Number of marriages")
ylabel(, angle(0))
legend(label(2 "Fitted regression line") order(2) ring(0) pos(5))
;
#delimit cr

#delimit ;
tw (scatter  marriage divorce, mlabel(state2))
(lfit marriage divorce),
ytitle("Number of marriages")
ylabel(, angle(0))
legend(order(2 "Fitted regression line") ring(0) pos(5))
;
#delimit cr

Both commands above will result in the same graph (see below). The legend key for the first -tw- plot, which by default is the y-variable name or its label (if one exists), is not displayed.



Another advantage of using order() over label() is that label(), by default, is limited to 15 keys.
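
As a further illustration, order() can show, reorder, and relabel several keys at once. The sketch below uses the same census.dta; the key text "States" is an assumption added here for illustration.

```stata
sysuse census.dta, clear

* order() lists the keys to show, in display order, each followed by its text
tw (scatter marriage divorce) (lfit marriage divorce), ///
    legend(order(2 "Fitted regression line" 1 "States") ring(0) pos(5))
```

Here the fitted line's key is listed first and the scatter's key second, without any label() suboption.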


Thanks to Derek Wagner of Stata for his reply to our query on maximum number of keys using label() and for pointing out the capabilities of order().

Writing Greek letters and other symbols in graphs


Greek letters, math symbols, and other symbols (such as the copyright and trademark symbols) can be incorporated in Stata graphs with SMCL tags. SMCL (Stata Markup and Control Language; pronounced “smickle”) is the markup language Stata uses to format its text output.

Tags for Greek letters and symbols are of the form {&name}. To illustrate, we will use auto.dta:


#delimit ;

sysuse auto;  /* see note below */

sum price, meanonly;
local p=r(mean);
sum mpg, meanonly;
local m=r(mean);

tw scatter price mpg, xline(`m') yline(`p')
note("The vertical and horizontal lines correspond"
"to {&mu}{subscript:mpg} and {bf:{&mu}{subscript:price}}, respectively.",
pos(2) ring(0) size(vlarge));

#delimit cr




{&mu} --> lowercase Greek letter mu
{subscript:mpg} --> display mpg as a subscript
{bf:{&mu}{subscript:price}} --> display mu with subscript price in bold

For the complete list of symbols, type: “help graph_text”.
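
The same tags work in any graph text, not just notes. As a small sketch (the function and the coefficient values 1 and 2 are made up for illustration), a title can mix Greek letters with a superscript:

```stata
* {&alpha} and {&beta} are lowercase Greek letters; {sup:2} is a superscript
tw function y = 1 + 2*x^2, range(-2 2) ///
    title("y = {&alpha} + {&beta}x{sup:2}") ///
    note("Drawn with {&alpha} = 1 and {&beta} = 2")
```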

Note: -sysuse- is used to load example datasets that are installed with Stata. To list all datasets installed with Stata, type: “sysuse dir”.

Drawing scatter plots


The graph command -twoway scatter- (or -tw scatter-) draws scatter plots. Here we draw scatter plots of the share of electronics in the total exports of the Philippines and Malaysia over time.

In Figure 0, we draw a scatter plot of Philippine export shares in electronics. If not specified, Stata will use default options for marker color, size, and shape, axis titles, etc.

tw scatter exportshare year if reporter=="Philippines", title("Figure 0")

In Figure 1, we specified marker and axis options and added a note.

#delimit ;
tw (scatter exportshare year if reporter=="Philippines"),
ytitle("Export Share in Electronics* %") ylabel(0(10)60, angle(0))
xtitle("") xlabel(1965(10)1995 2007)
title("Figure 1")
note("*SITC 2-digit category 77")
;
#delimit cr

In Figure 2, we added Malaysia’s export shares for the same period. By default, Stata uses a new color for each added plot unless you specify otherwise. Here we moved the title to the bottom of the chart with the suboption pos(), which follows clock positions. The default for the title is the 12 o’clock position, pos(12), but we moved it to the 6 o’clock position, pos(6). We also added the legend() option.

#delimit ;
tw (scatter exportshare year if reporter=="Philippines",
msize(large) mlabel(year))
(scatter exportshare year if reporter=="Malaysia",
msize(large) mlabel(year)),
ytitle("Export Share in Electronics* %") ylabel(0(10)60, angle(0))
xtitle("") xlabel("")
title("Figure 2", pos(6))
note("*SITC 2-digit category 77")
legend(label(1 "Philippines") label(2 "Malaysia"))
;
#delimit cr

In Figure 3, we specified marker colors and labels, drew lines connecting the dots (connect), and changed the position of the marker labels (mlabpos). We also changed the legend(): cols(1) tells Stata to present the legend in 1 column, pos(9) moves the legend to the 9 o’clock position, and ring(0) moves the legend inside the chart.

#delimit ;
tw (scatter exportshare year if reporter=="Philippines",
msize(large) mcolor(dkgreen) mlabel(year) mlabpos(12) connect(l))
(scatter exportshare year if reporter=="Malaysia",
msize(large) mcolor(dkorange) mlabel(year) mlabpos(6) mlabcolor(black) connect(l)),
ytitle("Export Share in Electronics* %") ylabel(0(10)60, angle(0))
xtitle("") xlabel("")
title("Figure 3", pos(10) ring(0))
note("          *SITC 2-digit category 77")
legend(cols(1) label(1 "Philippines") label(2 "Malaysia") pos(9) ring(0))
;
#delimit cr

In Figure 4, we use export share as weights.

#delimit ;
tw (scatter exportshare year [aw=exportshare],
msymbol(Oh) mcolor(red) mlabpos(12)),
ytitle("Export Share in Electronics* %") ylabel(0(10)60, angle(0))
xtitle("") xlabel(1965(10)1995 2007)
title("Figure 4", pos(5) ring(0))
note("*SITC 2-digit category 77")
;
#delimit cr

Stacked bars


-graph bar- and -graph twoway bar- draw (vertical) bar charts in Stata. -graph bar- draws bar charts over a categorical X variable and has more options than -graph twoway bar-, which draws bar charts with numerical X and Y values. Here, we use -graph bar- to draw stacked bar graphs (figures 1 and 2).

In figure 1, we show the composition of Philippine exports according to Leamer’s classification of tradeable goods at different time periods. The command to draw this graph is:

Figure 1


#delimit ;
graph bar (sum) value,
over(leamer)
over(year)
asyvars stack
legend(cols(3) size(vsmall))
ytitle("Export Value" "(in million US$)")
ylabel(, angle(0) format(%12.0gc))
title(Philippines)
subtitle(1965-2007);
#delimit cr

The (stat) option, (sum), adds up the export values for commodities that belong to a group for a specific year, as we have specified in our over() options. The (stat) option, which allows you to specify all stats options used by the command -collapse- (e.g. mean (default), count, max, etc.) is very convenient as it saves you from  constructing another dataset just to draw the graphs.
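
For comparison, the manual route that (stat) saves you from might look like the sketch below. It assumes the same export dataset is in memory, with the variables value, leamer, and year used above, and it leaves out the cosmetic options.

```stata
preserve

* Build the summary dataset by hand: one observation per year and Leamer group
collapse (sum) value, by(year leamer)

* Each bar segment now rests on a single observation,
* so the default statistic (mean) simply returns the collapsed sum
graph bar value, over(leamer) over(year) asyvars stack

restore
```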

By typing “asyvars” and “stack”, we are specifying that the variable in the first over() option will be treated as Y-axis variables, and that these y-variables will be presented as stacked bars. Note that, with a long dataset,  the command:

graph bar value, over(leamer) asyvars

will draw the same graph as:

graph bar value_leamer1 value_leamer2 … value_leamer10

with a wide dataset.
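
The long and wide forms can be compared on a dataset that ships with Stata. This is only a sketch with auto.dta standing in for the export data. The graph names g_long and g_wide are arbitrary, and -separate- is used here to create the wide variables. The bars should have the same heights, although the legend text may differ.

```stata
sysuse auto, clear

* Long layout: one price variable, split by the categories of foreign
graph bar price, over(foreign) asyvars name(g_long, replace)

* Wide layout: -separate- creates price0 (domestic) and price1 (foreign)
separate price, by(foreign)
graph bar price0 price1, name(g_wide, replace)
```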

The ordering of over() matters. In figure 2, we have interchanged the order of over(leamer) and over(year).

Figure 2


#delimit ;
graph bar (sum) value,
percentages
over(year)
over(leamer, label(angle(90)))
asyvars stack
legend(cols(5) size(small) colfirst)
ytitle("Export Share, %")
ylabel(, angle(0) format(%10.0gc))
title(Philippines)
subtitle(by Leamer’s classification)
;
#delimit cr

By specifying “percentages” for figure 2, we are specifying that each value’s share of the total, instead of the absolute value, will be reported.

Also note how the legends in the 2 charts differ. The general option to control the legend for all Stata graphs is legend(legend options). In the examples above we have used the following legend options:

cols() //number of columns
size() //size of texts
colfirst //specifies that the order is top to bottom rather than left to right

For more legend options, type “help legend_option”.

-histogram- and the -addplot- option




The beauty of Stata graphics lies in their flexibility: they can be highly customized. To illustrate, the histogram of proximity (right photo) was drawn using the command:

#delimit ;
histogram proximity, bcolor(ltblue) width(0.025) start(0) freq
addplot(histogram proximity if proximity>0.4, bcolor(yellow) width(0.025) start(0) freq ||
histogram proximity if proximity>0.55, bcolor(blue) width(0.025) start(0) freq ||
histogram proximity if proximity>0.65, bcolor(red) width(0.025) start(0) freq)
ytitle("Number of Links")
ylabel(0(10000)30000, format(%8.0gc) angle(0))
xlabel(0.20 0.40 0.55 0.65 0.85)
xtitle("Proximity")
note("Note: The total number of links for the 779 products is (779×778)/2=303,031.")
legend(off)
;
#delimit cr

[Note: Since graph commands can be very long, they can be managed better by changing the delimiter to “;” ]

I know! Isn’t it easier to draw this in Microsoft Excel? If you already have a summary of the frequencies in Excel and you only need to draw it once, maybe it is easier to draw it there. But if you are to draw similar charts for 20 countries at different periods in time, Stata will make your life so much easier. Going back to our example above, this command draws 4 overlaying histograms of the same variable, proximity, in different colors.

The first line, “histogram proximity, bcolor(ltblue) width(0.025) start(0) freq”, draws the histogram in light blue with bins of width 0.025. By specifying start(0), we force the lower limit of the first bin to be zero instead of the actual minimum in the data; and by specifying “freq”, we draw frequencies rather than densities or percentages. The second line uses the -addplot- option to overlay 3 more histograms in different colors. -addplot- is an option that adds -twoway- plots to graphs drawn by commands other than Stata’s -graph- command, such as -histogram- (we will elaborate on this in a future post). If we only needed to show the histogram of proximity, -addplot- would not be necessary. But since we want to highlight different ranges of proximity values, we use -addplot- to create the different color effect. The rest of the options are explained below:

ytitle("Number of Links")
/* y-axis title */

ylabel(0(10000)30000, format(%8.0gc) angle(0))
/* specifies that: (1) the y-axis will be labeled from
0 to 30000 at intervals of 10000; (2) the format of the
labels is a general format with commas; and (3) the
orientation of the labels is horizontal */

xlabel(0.20 0.40 0.55 0.65 0.85)
/* specifies that the x-axis will be labeled with the
specified numbers only */

xtitle("Proximity")
/* x-axis title */

note("Note: The total number of links for the 779 products is (779×778)/2=303,031.")
/* creates a note */

legend(off)
/* suppresses the legend */

All these options are general -graph- options. Once you are familiar with these most commonly used options,  you can forget Excel charts.
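
The earlier point about repeating a chart for many countries can be sketched with a loop. This is hypothetical: it assumes the data contain a string variable named country, which is not part of the example above, and it draws a plain histogram for each country rather than the full overlaid version.

```stata
* Hypothetical: assumes a string variable named country exists in the data
levelsof country, local(countries)

local i = 0
foreach c of local countries {
    local ++i
    * One histogram per country, stored in memory as hist1, hist2, ...
    histogram proximity if country == "`c'", width(0.025) start(0) freq ///
        title("`c'") name(hist`i', replace)
}
```

A counter is used for the graph names because country names may contain spaces or other characters that are not allowed in a graph name.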