Another case for -insheet-


Recently, a friend showed me a text file with data that looks like the one below (the data below is a subset of auto.dta from Stata website and saved as auto.txt):

“make,price,mpg,rep78,headroom”
“AMC Concord,4099,22,3,2.5”
“AMC Pacer,4749,17,3,3”
“AMC Spirit,3799,22,,3”
“Buick Century,4816,20,3,4.5”
“Buick Electra,7827,15,4,4”
“Buick LeSabre,5788,18,3,4”
“Buick Opel,4453,26,,3”
“Buick Regal,5189,20,3,2”
“Buick Riviera,10372,16,3,3.5”
“Buick Skylark,4082,19,3,3.5”
“Cad. Deville,11385,14,3,4”
“Cad. Eldorado,14500,14,2,3.5”
“Cad. Seville,15906,21,3,3”
“Chev. Chevette,3299,29,3,2.5”
“Chev. Impala,5705,16,4,4”

The data seems to be comma-delimited but -insheet- will fail to load this data properly. Why? Because of the quotation marks (). -insheet- will read everything inside a ” “ as a single string. The solution therefore is to delete all . To do this, you may open the text file using a notepad and then find-replace (Ctrl+H) all with a blank. Without the , you may now use -insheet- to load the file:

insheet using auto.txt, comma names
/* the option “comma” is optional /
/
the option “names” indicate the the first line of the text file — make,price,mpg,rep78,headroom — are the variable names */

An alternate solution involves 3 steps, but without using other programs (i.e. text editors) besides Stata:

(1) load the original text file using -insheet-;

insheet using auto.txt, nonames clear
/* -insheet- will read the data as a single variable and by specifying the option “nonames”, it will call the variable v1. This is important; otherwise, the variable name will be the concatenated variables names without the commas (because commas are not allowed in variable names): makepricempgrep78headroom, which will be difficult to separate. */

(2) save it (as a new file or replace the old file) without the quotation marks using -outsheet-; and,

outsheet using auto.txt, noquote nonames replace
/* without the “noquote” option, the quotation marks will be retained /
/
the “nonames” options will exclude the generated variable name “v1” */

(3) load the correct file comma-delimited file using -insheet-

insheet using auto.txt, comma names clear
/* this should load the file correctly */

This is particularly helpful when you have a huge text file that makes it difficult or impossible to open in text file editors such as Notepad.



5 Responses

  1. Haven’t encountered a .txt data file that would require me to do this though. Still, will keep this in mind.

  2. i like ur method more. its faster.

  3. another way, without resorting to an outside text editor, would be to replace text using regexr in stata itself.

Leave a Reply