In this article, I’ll explain how to use the scan function to read data into R. Let’s first have a look at the basic R syntax and the definition of scan():
Basic R Syntax:
scan("data.txt", what = "character")
Definition:
The scan function reads data into a vector or list from a file or the R console.
Below, I’ll show you five examples of how to apply the scan function in R. So let’s get started…
Typically, the scan function is applied to text files (i.e. txt format). Let’s therefore create such a text file on our computers:
First, we are going to create an example data frame in R:
data <- data.frame(x1 = c(4, 4, 1, 9),    # Create example data frame
                   x2 = c(1, 8, 4, 0),
                   x3 = c(5, 3, 5, 6))
data                                      # Print data to RStudio console
#   x1 x2 x3
# 1  4  1  5
# 2  4  8  3
# 3  1  4  5
# 4  9  0  6
Our example data contains three columns and four rows with numeric values. Now, let’s write this data frame as a txt file to our computer:
write.table(data,                         # Write data as txt file to directory
            file = "data.txt",
            row.names = FALSE)
You can check whether everything worked by opening your current working directory, which is returned by getwd():
getwd()                                   # Get currently used directory
# "C:/Users/Your Path..."
This directory should now contain a txt file named data.txt.
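If you prefer to check from within R instead of browsing the folder, a small sketch like the following could help. It uses base R’s file.exists() and readLines() and assumes data.txt was written to the working directory as shown above; the object name txt_lines is only for illustration. Note that write.table() quotes character strings by default, so the header line contains quoted column names.

```r
# Recreate the example data and write it exactly as in the tutorial
data <- data.frame(x1 = c(4, 4, 1, 9),
                   x2 = c(1, 8, 4, 0),
                   x3 = c(5, 3, 5, 6))
write.table(data, file = "data.txt", row.names = FALSE)

file.exists("data.txt")               # TRUE if the file is in the working directory
# [1] TRUE

txt_lines <- readLines("data.txt")    # One character string per line of the file
writeLines(txt_lines)                 # Print the raw file content
# "x1" "x2" "x3"
# 4 1 5
# 4 8 3
# 1 4 5
# 9 0 6
```

Seeing the raw lines also explains the output of the following examples: scan() splits this content at white space and strips the quotes around the column names.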
Now we can apply the scan function to read this text file into R:
data1 <- scan("data.txt", what = "character")  # Apply scan function to txt file
data1                                          # Print scan output to RStudio console
# [1] "x1" "x2" "x3" "4" "1" "5" "4" "8" "3" "1" "4" "5" "9" "0" "6"
As you can see, the previous code created a character vector that contains all values of our data frame, including the column names. The values are read row by row, so the vector follows the row order of the file.
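Because the values arrive row by row, this vector can be reshaped by hand. The following is a minimal sketch, assuming the layout of data.txt from above (one header line with three column names followed by numeric rows); the names n_col, col_names, values, and data1_matrix are hypothetical and chosen only for illustration.

```r
# Recreate the file and the scan output from the previous steps
data <- data.frame(x1 = c(4, 4, 1, 9), x2 = c(1, 8, 4, 0), x3 = c(5, 3, 5, 6))
write.table(data, file = "data.txt", row.names = FALSE)
data1 <- scan("data.txt", what = "character")

n_col <- 3                                   # Number of columns in the file (assumed known)
col_names <- data1[1:n_col]                  # The first n_col values are the header
values <- as.numeric(data1[-(1:n_col)])      # Remaining values, converted to numbers
data1_matrix <- matrix(values, ncol = n_col, byrow = TRUE,  # Fill row by row
                       dimnames = list(NULL, col_names))
data1_matrix
#      x1 x2 x3
# [1,]  4  1  5
# [2,]  4  8  3
# [3,]  1  4  5
# [4,]  9  0  6
```

This works, but it requires knowing the number of columns in advance, which is one reason to let scan() build a list instead.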
So what if we want to read the data in a handier format? Keep on reading…
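The scan command also allows us to read data into a list. With the following R code, we create a list with three list elements, each containing one column of our original data frame. The empty strings "" in what tell scan that each element should be of type character, and scan assigns the values to the three elements in turn, so with three values per line each element collects one column: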
data2 <- scan("data.txt", what = list("", "", ""))  # Read txt file into list
data2                                               # Print scan output to RStudio console
# [[1]]
# [1] "x1" "4" "4" "1" "9"
#
# [[2]]
# [1] "x2" "1" "8" "4" "0"
#
# [[3]]
# [1] "x3" "5" "3" "5" "6"
Note: The column names are kept again. So what if we want to get rid of them? That’s what I’m going to show you next.
The scan function provides many additional arguments, and one of them is skip. The skip argument specifies how many lines at the beginning of the input file should be ignored. Since the column names are usually stored in the first line of a file, we can simply skip them with skip = 1:
data3 <- scan("data.txt", skip = 1)  # Skip first line of txt file
data3                                # Print scan output to RStudio console
# [1] 4 1 5 4 8 3 1 4 5 9 0 6
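Note: Of course, we could skip even more lines if we were not interested in the first n rows of our data. Also note that the output is now numeric rather than character: since we did not specify what, scan uses its default what = double().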
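The skip argument can also be combined with a list in what. A minimal sketch, assuming the data.txt layout from above: naming the list components gives the result the column names, and using 0 as the template makes each component numeric. The names data_list and data_df are hypothetical.

```r
# Recreate the example file
data <- data.frame(x1 = c(4, 4, 1, 9), x2 = c(1, 8, 4, 0), x3 = c(5, 3, 5, 6))
write.table(data, file = "data.txt", row.names = FALSE)

# Skip the header line and read the values into three named numeric components
data_list <- scan("data.txt",
                  what = list(x1 = 0, x2 = 0, x3 = 0),  # 0 = numeric template
                  skip = 1)
data_list$x1
# [1] 4 4 1 9

data_df <- as.data.frame(data_list)    # Convert the named list to a data frame
all.equal(data_df, data)               # Compare with the original data frame
# [1] TRUE
```

Here the column names are typed by hand instead of being read from the file, which is the trade-off for getting numeric columns directly.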
So far, we have only read txt files into R. However, scan can read other plain-text formats as well, for example csv files. Let’s create a csv file for the next example:
write.table(data,                         # Write data as csv file to directory
            file = "data.csv",
            row.names = FALSE)
If you go to your currently used directory, there should be a file with the name data.csv.
We can now apply the scan function to this csv file as we did before:
data4 <- scan("data.csv", what = "character")  # Apply scan function to csv file
data4                                          # Print scan output to RStudio console
# [1] "x1" "x2" "x3" "4" "1" "5" "4" "8" "3" "1" "4" "5" "9" "0" "6"
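As you can see, the output is exactly the same as for the txt file in the first example. This works because write.table separates values with a single space by default, so data.csv is actually space-delimited despite its file extension, and scan splits its input at white space by default.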
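For a file that is really comma-separated, scan needs to be told the separator. A minimal sketch, assuming the file is written with write.csv() to a temporary path (csv_file and data6 are hypothetical names): passing sep = "," makes scan split at commas, and the quotes that write.csv() puts around the column names are removed as before.

```r
# Recreate the example data
data <- data.frame(x1 = c(4, 4, 1, 9), x2 = c(1, 8, 4, 0), x3 = c(5, 3, 5, 6))

csv_file <- tempfile(fileext = ".csv")               # Temporary file path
write.csv(data, file = csv_file, row.names = FALSE)  # Truly comma-separated output

data6 <- scan(csv_file, what = "character", sep = ",")  # Split fields at commas
data6                                                   # Same values as data4
```

Without sep = ",", scan would only split at white space and return each line such as 4,1,5 as a single string.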
Another useful feature of scan is that it can read input directly from the RStudio console. To do that, we first need to execute the following line of code:
data5 <- scan("")                         # Scan RStudio console input
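Then we can type any input into the RStudio console, for example the values 5 7 20 8 13 2 2 2 5, separated by spaces or line breaks. An empty line (i.e. pressing Enter on a blank line) ends the input.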
If we now print data5, we get exactly the input that we have written to the console before:
data5                                     # Print scan output to RStudio console
# [1]  5  7 20  8 13  2  2  2  5
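Typing into the console cannot be repeated automatically in a script. As a sketch of a non-interactive alternative, scan’s text argument accepts the same input as a character string; the name data7 is hypothetical, and the line break inside the string stands in for pressing Enter in the console.

```r
# Supply the input as a string instead of typing it into the console
data7 <- scan(text = "5 7 20 8 13\n2 2 2 5")
data7
# [1]  5  7 20  8 13  2  2  2  5
```

This is also convenient for small reproducible examples, since no file and no user interaction are needed.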
In general, there are many different ways to read data into R. If you want to read a structured csv file, the most common functions are read.csv and read.table. If you want to read (unstructured) text data, then you could also have a look at functions such as readLines, n.readLines, and readline.
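To compare with scan, the following sketch reads the same data.txt with two of the base R functions mentioned above (data8 and raw_lines are hypothetical names). read.table() with header = TRUE returns a data frame and uses the first line as column names, whereas readLines() returns one unparsed string per line.

```r
# Recreate the example file
data <- data.frame(x1 = c(4, 4, 1, 9), x2 = c(1, 8, 4, 0), x3 = c(5, 3, 5, 6))
write.table(data, file = "data.txt", row.names = FALSE)

data8 <- read.table("data.txt", header = TRUE)  # Data frame, first line used as names
str(data8)                                      # Inspect column types

raw_lines <- readLines("data.txt")              # One string per line, no parsing
length(raw_lines)
# [1] 5
```

Note that read.table() converts whole-number columns to integer, while scan() without what returns doubles.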
Furthermore, you could also have a look at the following video of the Docworld Academy YouTube channel. In the video, the speaker explains how to use the readline function in a live R programming example. Have fun with the video and let me know in the comments, in case you have further questions or any feedback on the tutorial.