Preserving numerical format after string transformation


Here is one example where you need to preserve the numerical format when converting to strings. Suppose you have a 6-digit numeric observation ID, code, where the first 2 digits represent a geographic code and the last 4 digits represent a unique observation code, and you want to generate a new variable, reg, that holds the 2-digit geographic code. The entries in variable code, with numeric format %06.0f, will look like: 010001, 010002,…, 250001,…,999999. For variable reg, entries will be: 01, 02,…,99.

[Note: The format %06.0f means that code is displayed as a 6-digit number with leading zeros (i.e. if code is less than 100000, it has 0’s before the first non-zero digit) and with nothing after the decimal point.]

How will you go about this? First, you need to transform the numeric code into a string. Why? Because there is no basic number operation that returns the first 2 digits of a number. You cannot P-E-M-D-A-S your way out of this. Second, you take a substring of that string and store it in a new variable. In Stata, this involves the following commands:

tostring code, gen(string1) format(%06.0f)        /* generates the string variable string1 and preserves the format with leading zeros */

gen string2=substr(string1,1,2)          /* generates string variable string2. It is a substring of string1, starting at character 1 with length 2 (in short, the first 2 digits). */

What happens if you only write “tostring code, gen(string1)”? This command will return the string without the leading zeros. For example, 010001 becomes “10001”. Then, for observations with code<100000, “gen string2=substr(string1,1,2)” will return the 2nd and 3rd digits of the 6-digit code instead of the first two. You’re screwed!
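To see the pitfall side by side, here is a minimal sketch on toy data. The three code values and the variable names s_bad, s_good, reg_bad and reg_good are made up for illustration; only tostring, substr() and the %06.0f format come from the discussion above.

```stata
* Toy data: three hypothetical 6-digit codes (010001 is entered as 10001)
clear
input long code
10001
250001
999999
end
format code %06.0f

* Without format(): the leading zero is dropped before substr() runs
tostring code, gen(s_bad)
gen reg_bad = substr(s_bad, 1, 2)

* With format(): the string keeps all 6 characters
tostring code, gen(s_good) format(%06.0f)
gen reg_good = substr(s_good, 1, 2)

* For code 010001, reg_bad is "10" while reg_good is "01"
list code s_bad reg_bad s_good reg_good
```

Only the first observation differs here, because it is the only code below 100000. In a real dataset, such errors are easy to miss for exactly that reason.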

Another way is to use:

gen string1=string(code, "%06.0f")                 /* generates string variable string1 and preserves the format with leading zeros */

gen string2=substr(string1,1,2)

Or (the most elegant of all):

gen string3=substr(string(code, "%06.0f"),1,2)
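If you want to convince yourself that the three routes give the same answer, a quick check along these lines should do it. This is only a sketch: the toy codes and the names s1, s2, reg1, reg2 and reg3 are hypothetical, chosen so the block runs on its own.

```stata
* Toy data: three hypothetical 6-digit codes (010001 is entered as 10001)
clear
input long code
10001
250001
999999
end

* Route 1: tostring with an explicit format, then substr()
tostring code, gen(s1) format(%06.0f)
gen reg1 = substr(s1, 1, 2)

* Route 2: string() with the same format, then substr()
gen s2 = string(code, "%06.0f")
gen reg2 = substr(s2, 1, 2)

* Route 3: the one-liner
gen reg3 = substr(string(code, "%06.0f"), 1, 2)

* assert stops with an error if any observation disagrees
assert reg1 == reg2 & reg2 == reg3
assert strlen(reg3) == 2
```

If both assert lines pass silently, the three variables are identical and every entry is exactly 2 characters long.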

In Stata, there are many ways to solve a problem, like there are many ways to prove the Pythagorean theorem. And, like the Pythagorean theorem proofs, there are the very long ones, the shorter ones, and the one that is the most elegant of all.
