Which data input command: -infile- or -insheet-? (Part 2)


In Which data input command: -infile- or -insheet-?, we introduced the input command -insheet-. Here, we introduce the more flexible data input command -infile-, which can read both delimited and free-format text files. For example, the comma-delimited file from our previous post, population.csv, with the following data:

1960,Philippines,27053834
1960,Thailand,27652013
2007,Philippines,87892094
2007,Thailand,63832135

can also be loaded into Stata by typing:

infile year str15 country population using population.csv, clear

Note that the first line of the data above does not contain variable names. -infile-, unlike -insheet-, does not read variable names from the data file, so the names must be listed in the command itself. The ability to read variable names is one advantage -insheet- has over -infile-. Also, by default -infile- reads every variable as numeric. If, for example, we fail to declare the second variable as a string:

infile year country population using population.csv, clear

Stata will still read the data but will store the second variable, country, as missing values. By inserting str15 before the string variable country, we tell Stata to read and store country as a string variable of at most 15 characters.
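A related point worth checking: to my knowledge, -infile- stores a numeric variable as float unless a storage type is given, and float holds only about 7 digits of accuracy, so eight-digit population counts such as 27652013 may be rounded. A minimal sketch, assuming population.csv is in the working directory, that declares population as long and then inspects the result:

```stata
* declare population as long so eight-digit counts are stored exactly
infile year str15 country long population using population.csv, clear

* check the storage types and the values that were read
describe
list
```

The type keyword works the same way as str15: it applies to the variable names that follow it in the list.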

The major advantage of -infile- is that it can read text files in free format. For example, suppose that instead of a comma-delimited file as above, we have a file, population.txt, whose values are separated only by white space:

1960 "Philippines" 27053834
1960 "Thailand" 27652013
2007 "Philippines" 87892094
2007 "Thailand" 63832135

If we use -insheet- to load this file, Stata will read it as a dataset with only one variable, v1, with the following entries:

“1960 Philippines 27053834”
“1960 Thailand 27652013”
“2007 Philippines 87892094”
“2007 Thailand 63832135”

To correctly load the data, use the -infile- command:

infile year str15 country population using population.txt
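As a quick check, and assuming both files are saved in the working directory as population.csv and population.txt, the same variable list should load either one, since free-format -infile- accepts commas as well as white space as separators:

```stata
* comma-delimited version
infile year str15 country population using population.csv, clear
list

* whitespace-delimited version: same variable list, different file
infile year str15 country population using population.txt, clear
list
```

The two listings should match. The clear option drops the data already in memory before the second file is read.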

Another advantage of -infile- is that it can read a fixed-format text file with the help of a dictionary. To be continued…
