-destring- uncomplicated


In a comment on the previous post, “destring complication: negative numbers”, Nick Cox pointed out the most important advice in using destring: “never destring, replace unless you are absolutely sure that you are right or are willing to do things again if you made the wrong decision. The generate() option is there for a purpose.”

In addition, his comment points to a simpler solution than using regular expressions.



Going back, I actually tried to replace “-” with “” for GRChange before using destring, but it didn’t work. And so I assumed it wouldn’t work for the other variables either. But what happened there?

In our example, we used charlist to identify non-numeric characters in the variables GRChange, Niinmill, and NIChange. Indeed, charlist returned “-” but also a space, “ ”, for GRChange, which can easily be missed. In the results below, do you see the space before “-” in the line following . charlist GRChange?
. charlist GRChange
 -.0123456789

. charlist  Niinmill
-.0123456789

. charlist  NIChange
-.0123456789

And so Nick Cox’s suggestion to use tab and the real() function makes it easier to see what is really going on in our data:
. tab GRChange if real(GRChange)==.

         GRChange  |      Freq.     Percent        Cum.
-------------------+-----------------------------------
              -    |          7       14.89       14.89
                 - |         40       85.11      100.00
-------------------+-----------------------------------
             Total |         47      100.00
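
Because leading and trailing blanks are hard to spot in tabulate output, one possible way to make them visible is to wrap each problematic value in brackets and record its length. This is only a sketch on made-up data: the three values and the helper variables shown and len are hypothetical, not taken from the original dataset.

```stata
* Toy data (hypothetical values); note that -clear- drops any data in memory
clear
input str8 GRChange
"3.2"
"-"
" -  "
end

* Wrap values that real() cannot convert in brackets and record their length
generate shown = "[" + GRChange + "]" if missing(real(GRChange))
generate len = strlen(GRChange) if missing(real(GRChange))
tabulate shown len
```

The brackets show exactly where each string begins and ends, so a lone dash and a dash padded with blanks appear as different rows with different lengths.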

Now, if I did the following:
. replace GRChange="" if GRChange==" -   "
(7 real changes made)

. replace GRChange="" if GRChange=="-"
(40 real changes made)

. destring GRChange, gen(GRChangeNew)
GRChange has all characters numeric; GRChangeNew generated as double
(47 missing values generated)

Everything turns out as we wanted. Another uncomplicated way is to generate a numeric variable using the real() function, which returns missing for any string that cannot be read as a number:

gen GRChangeNew = real(GRChange)

No complication at all… at least in this case.
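
To compare the two routes side by side, here is a minimal sketch on made-up data. The values and the names GRChangeTmp, GRChangeA and GRChangeB are hypothetical and chosen only for illustration. It follows the advice quoted at the top: the original string variable is never overwritten.

```stata
* Toy data (hypothetical values); note that -clear- drops any data in memory
clear
input str8 GRChange
"1.5"
"-2.25"
"-"
" -  "
"0"
end

* Inspect values that real() cannot convert before changing anything
tabulate GRChange if missing(real(GRChange))

* Route 1: blank out lone dashes in a copy, then destring into a new variable
clonevar GRChangeTmp = GRChange
replace GRChangeTmp = "" if trim(GRChangeTmp) == "-"
destring GRChangeTmp, generate(GRChangeA)

* Route 2: convert directly; nonnumeric strings become missing
generate double GRChangeB = real(GRChange)

* Both routes should agree, and GRChange itself is left untouched
assert GRChangeA == GRChangeB
count if missing(GRChangeB)
```

If the assert fails on real data, some value is being treated differently by the two routes, which is worth investigating before dropping the string variable.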

One Response

  1. Thanks for the follow-up. When you see spaces and/or unwanted characters, a safe replace would be

    replace GRChange = "" if trim(GRChange) == "-"

    Both the code and the documentation of charlist (SSC) have been updated in response to esoteric problems reported by users, esoteric to me that is.

    *! NJC 1.3.0 28 Feb 2014
    * NJC 1.2.1 28 Feb 2014
    * NJC 1.2.0 17 Jul 2008
    * NJC 1.1.0 17 Dec 2002
