Less than 2 hours flight…
In a Stata training, one of the students wondered why after importing an Excel file of financial indicators into Stata some were read as strings. A quick browse at the data indicates the presence of hyphens (“-“) and that these were used in different ways: one to indicate a negative number and another to indicate a missing observation.
How do we convert these variables to numeric as
destringreturns an error?
. destring GRChange NIChange, replace GRChange contains nonnumeric characters; no replace NIChange contains nonnumeric characters; no replace
First, we use
charlistto check whether “-” is the only non-numeric character that complicates our use of destring:
. charlist GRChange -.0123456789 . charlist Niinmill -.0123456789 . charlist NIChange -.0123456789
charlistresults show that for the variables we checked “-” is all that gives us trouble. What happens when we use the
. destring GRChange Niinmill NIChange, replace ignore(-) GRChange: characters - removed; replaced as double (47 missing values generated) Niinmill: characters - removed; replaced as double (1 missing value generated) NIChange: characters - removed; replaced as double (393 missing values generated)
Yes, the variables
GRChange Niinmill NIChangewere converted to numeric BUT we messed the data. BIG TIME. We cannot ignore the “-” that indicates a negative value.
One solution is to tag the negative values before using
destring, ignore(-). How do we discriminate between the hypthens used to indicate negative values and those that indicate missing values? In the data that we have, negative observations are those that starts with “-” and followed by a number or a decimal point (e.g., -2.5, -0.135, -.34) and missing observations are those with only “-” (there are no observations with “-.”).
Following the patterns we observed above, we use regular expressions to generate a new variable that identifies the negative observations, for example:
gen tagNEGGRchange = regexm(GRChange, "^-[. 0-9]")
This creates the variable tagNEGGRchange with values 1 for negative observations and 0 otherwise. Then we can destring the GRChange ignoring “-” and creating the new variable GRChangeNew.
destring GRChange, ignore(-) gen(GRChangeNew)
Then finally bring back the “-” to the originally negative observations by negating the “destringed” values using the generated tag as qualifier:
replace GRChangeNew = -GRChangeNew if tagNEGGRchange==1
See also -destring- complication.
Later… let’s catch that elusive bandwidth in this part of the world to post this in the world wide web.