Using Stata to make sense of my Uber data

I tried Uber in late May, and since then I have taken 131 Uber rides covering 1,200 kilometers and 80 hours on the road. Uber (and GrabTaxi) have eliminated the wait in the heat (and rain) and the hassle of dealing with the assholeness of most taxi drivers here in Metro Manila. But what I love most about Uber, apart from their customer service, is the data they send. Trip receipts are sent automatically as soon as the trip ends. They show not only how much I was charged but also the time, the distance, the fare broken down by time and distance, and more. GrabTaxi receipts, on the other hand, show only the amount paid, which is manually encoded by the driver.

Putting observations in order

Is it necessary to put observations in a certain order? In a number of cases, yes. The most obvious case is when you are using the qualifier -in- to specify a subset in your data. For example,

drop in 1/100               /* Drops the observations from line 1 to line 100 */
keep in 30/l                /* Keeps the observations from line 30 to the last line, denoted by a lowercase letter l */

If the observations were in arbitrary order, then you wouldn’t know which ones were dropped or kept, would you? This is when -sort- and -gsort- come in handy. These two commands put the observations in a certain order. The -sort- command puts the observations in ascending order based on a specific variable or a set of variables. The basic syntax for -sort- is:

sort varlist
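Here is a minimal sketch of -sort- followed by the -in- qualifier. It assumes Stata's bundled auto dataset (loaded with -sysuse-), which is not the data used in this post; the variables price and foreign come from that dataset.

```stata
* Load Stata's bundled example dataset (an assumption for illustration only)
sysuse auto, clear

* Sort by one variable: the cheapest cars now come first
sort price
list make price in 1/5

* Sort by two variables: by origin first, then by price within each origin
sort foreign price
list make foreign price in 1/5
```

Once the data are sorted, -in 1/5- refers to a well-defined set of observations. Note that observations with tied values may end up in a different relative order from one run to the next unless you add the stable option (sort varlist, stable).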

If varlist is only one variable, then Stata will sort the observations in ascending order based on that variable. If two variables, var1 and var2, follow -sort-, Stata will sort the observations according to var1 first. Then, for observations with the same value of var1, Stata will sort them according to var2. If there are more than 2 variables, then the observations will be sorted by the first variable, then by the second within ties of the first, and so on. -gsort-, on the other hand, can sort the observations in either ascending or descending order. The basic syntax for -gsort- is:

gsort [+ or -] varname [+ or -] varname [+ or -] varname

A plus sign (+) before the varname instructs Stata to order the observations in ascending order, while a minus sign (-) implies descending order of observations. For example, to sort the countries by their geographical region (regn) in alphabetical order and by GDP per capita (gdppc), from highest to lowest:

gsort +regn -gdppc
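The same pattern can be tried on data that ships with Stata. This is only a sketch, assuming the bundled auto dataset rather than the country data above: foreign stands in for the grouping variable and price for the variable sorted from highest to lowest.

```stata
* Load Stata's bundled example dataset (an assumption for illustration only)
sysuse auto, clear

* Ascending by origin (foreign), then descending by price within each origin
gsort +foreign -price

* The most expensive cars in the first origin group are now at the top
list make foreign price in 1/5
```

Because -gsort- accepts a sign in front of each variable, you can mix ascending and descending keys in a single command, which -sort- alone cannot do.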

The -by varlist:- prefix also requires the observations to be sorted according to the varlist. But, as we have discussed in “_n, its big brother _N, and Super -bysort-,” this can be conveniently written as:

bysort varlist:

or

by varlist, sort:
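To see that the two forms do the same thing, here is a small sketch. It again assumes Stata's bundled auto dataset; the variable names n1 and n2 are made up for this illustration.

```stata
* Load Stata's bundled example dataset (an assumption for illustration only)
sysuse auto, clear

* Count the observations in each origin group, written both ways
bysort foreign: generate n1 = _N
by foreign, sort: generate n2 = _N

* The two variables should be identical; -assert- stops with an error if not
assert n1 == n2
```

Both forms sort the data by foreign before running the command for each group, so there is no need for a separate -sort- beforehand.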