Which data input command: -infile- or -insheet-?


Stata has more than one data input command; which one you use depends on the format of the data you wish to load. We have already discussed -use-, which loads Stata-format datasets (see “-forvalues- and the other -use- syntax”), and the user-written -usespss-, which loads SPSS-format datasets (see “Reading SPSS data file into Stata”). Two other data input commands are -insheet- and -infile-. Which one should you use?

Often, you need to load data that are not in Stata format. Examples include data downloaded from web-based databases such as the World Bank’s World Development Indicators, the UN’s Commodity Trade Statistics, and the IMF’s International Financial Statistics, which can be saved as tab- or character-delimited files. Some datasets come as undelimited text files (free format), with or without data dictionaries. The public-use file of the Family Income and Expenditure Survey (FIES) of the Philippines, for example, is a text file with a data dictionary.

The -insheet- command reads text files in which values are separated by tabs or by another character (usually a comma) and each line holds exactly one observation. Such files are typically created with a spreadsheet or database program such as Microsoft Excel. The general syntax of -insheet- is:

insheet [varlist] using filename [, options]

For example, the file population.csv is a comma-delimited file with the following data:

year,country,population
1960,Philippines,27053834
1960,Thailand,27652013
2007,Philippines,87892094
2007,Thailand,63832135

To read this data using -insheet-, type:

insheet using population.csv, comma names

The above command loads population.csv into Stata. The options comma and names declare, respectively, that the values in the file are separated by commas and that the first line of the file contains variable names. In most cases, however, these options are not necessary, because Stata is clever enough to recognize comma- and tab-delimited text files and to detect whether the first line contains variable names. For the other options available for -insheet-, type: help insheet.
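When a file uses a separator Stata may not guess, or when you simply prefer to be explicit, the delimiter can be stated with the delimiter() option (or tab for tab-delimited files), and clear discards any data already in memory. The sketch below assumes a hypothetical semicolon-delimited copy of the population data named population_semi.csv; its first lines only create that file so the example runs on its own.

```stata
* Create a small, hypothetical semicolon-delimited file for illustration
tempname fh
file open `fh' using "population_semi.csv", write text replace
file write `fh' "year;country;population" _n
file write `fh' "1960;Philippines;27053834" _n
file write `fh' "1960;Thailand;27652013" _n
file close `fh'

* delimiter(";") names the separator, names says line 1 holds
* variable names, and clear drops any data currently in memory
insheet using "population_semi.csv", delimiter(";") names clear

* For a tab-delimited file (hypothetical population.txt) you would use:
* insheet using population.txt, tab names clear

describe
```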

If the first line of population.csv does not contain variable names, the command:

insheet using population.csv

reads the file and names the variables v1, v2, and v3.

If you don’t want these v’s, you can name the variables yourself with the optional varlist:

insheet year country population using population.csv
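Another route is to let Stata assign v1, v2, and v3 and rename them afterward. This is a minimal sketch assuming a hypothetical header-less file named population_nohdr.csv, which the first lines create so the example is self-contained.

```stata
* Create a small, hypothetical header-less file for illustration
tempname fh
file open `fh' using "population_nohdr.csv", write text replace
file write `fh' "1960,Philippines,27053834" _n
file write `fh' "1960,Thailand,27652013" _n
file close `fh'

* nonames: the first line is data, so Stata assigns v1, v2, v3
insheet using "population_nohdr.csv", comma nonames clear

* Replace the generic names with meaningful ones
rename v1 year
rename v2 country
rename v3 population
list
```

Supplying the varlist to -insheet- does the same in one step; renaming afterward is handy when you only discover the column layout after loading the file.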

If the file contains more than the three variables you listed, Stata returns an error indicating that too few variables were specified. What happens if you use -insheet- on a file that is not delimited? Stata reads each line as a single value, so the resulting dataset contains only one variable.

Although a bit crude, you can also copy and paste data from a spreadsheet into the Stata Data Editor (to open the editor window, type: “edit”).

-infile-, on the other hand, reads files that are not delimited (or in free format)… to be continued.
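Until -infile- is covered in full, here is a small preview of its free-format syntax, assuming a hypothetical blank-separated file named population.raw (created by the first lines). Unlike -insheet-, -infile- takes the variable names up front, and string variables must be declared with a str# type. Large integers such as population are best declared long, because the default float type cannot hold every 8-digit integer exactly.

```stata
* Create a small, hypothetical blank-separated file for illustration
tempname fh
file open `fh' using "population.raw", write text replace
file write `fh' "1960 Philippines 27053834" _n
file write `fh' "1960 Thailand 27652013" _n
file close `fh'

* Free format: values separated by blanks; country is a string wide
* enough for "Philippines" (11 characters), population is stored as long
infile int year str11 country long population using "population.raw", clear
list
```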
