The R Data Analysis System
Writing from R to External Files
Importing and Exporting Data with R Commander
R is an open-source (i.e. free) system for statistics and data analysis. It is very powerful and flexible. It takes some time to learn to use R effectively, but it is worth the effort because R can do a lot more than most point-and-click systems based on spreadsheet layouts.
Go to the website for the R Project for Statistical Computing at http://www.r-project.org. Under “Getting Started” you will see a link to a menu of different CRAN (Comprehensive R Archive Network) mirror sites to download from. Click on that link and then select a CRAN site in the …
Near the top of the page you will see links for downloading precompiled binary files for three operating systems: Linux, macOS, and Windows. Choose the one for your operating system and download it. The rest of the installation procedure should be straightforward if you follow the pop-up instructions. Once you have installed R you will have a big blue “R” icon on your desktop. You can start R by double-clicking on that icon.
Start R by double clicking on the R icon on your desktop. After you have been using R for a while you will probably have R workspaces with icons saved in other folders or directories. You can start R from any of them as well by clicking on the icon. When you do that, R automatically recalls the files and objects you previously created in that workspace.
When you start R, a console window opens with some general information about the system at the top and a prompt at the bottom, which looks like this:
>
The prompt is where you type commands. At the top of the console window there are drop down menus for file tasks, editing, miscellaneous, windows, packages, and help. They are pretty self-explanatory. Explore them.
To see a list of the objects in your workspace, type
> ls( )
or use the "Misc" drop-down menu on the menu bar of the R console.
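As a quick illustration, the short script below creates three objects and then lists them. The names x, y, and greeting are made up for this example; exists() is a base R function that reports whether an object with a given name can be found.

```r
# The object names x, y, and greeting are made up for this example
x = 3
y = c(1, 2, 3)
greeting = "hello"

objs = ls()            # names of the objects in the workspace, sorted
objs
exists("greeting")     # TRUE: an object with this name exists
```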
When you are finished with an R session, you can exit by using the "File" drop-down menu and selecting “Exit” or by typing the following at the command line:
> quit( )
Some R functions also have short aliases. For example, “q” does exactly the same thing as “quit”. After you type “quit( )” or “q( )”, you will be asked whether you want to save your workspace image. If you choose “Yes”, the objects you created during your session will be saved (in a file named .RData in the working directory) and will be available for future sessions.
This document will focus on using the basic Rgui console with commands typed at the command line. Packages are available that provide a menu interface to the most common statistical tasks, but you will not need them right away. One such package is R Commander. If you want it, first install it: after you have installed R, go to the “Packages” menu at the top of the console window and choose “Install package(s)”. The name of the package is “Rcmdr”, and you will be prompted to choose a CRAN site to download it from. You will also be prompted for a couple of supporting packages that it requires. The whole process is straightforward. (You can also install it from the command line with install.packages("Rcmdr").) Once it is installed, load it in each session in which you want to use it by typing
> library(Rcmdr)
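Loading the package opens another window from which you can do things by menu selection if you like. However, your choices are limited, and you will always have to do some things from the Rgui console. If you close the R Commander window, you can reopen it later in the same session by typing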
> Commander( )
At the top of the console window there is a drop-down help menu. It includes HTML help, PDF manuals, FAQs, and an internal help function for obtaining information about permanent R objects. Make full use of these sources and you will have a much easier time learning R. You can call the internal help function from the command line by typing
> help(object)
where object is the name of the permanent R object you need to read about. (Note: Whenever a word is italicized like object above, it is just a stand-in for the real name of an object that you would substitute when actually using the command.)
Pay special attention to the HTML help in the help menu of the R console, especially the manual entitled “An Introduction to R”. It gives an easy-to-navigate description of basic R functions. The help menu also has a “Search help” selection for searching the R libraries and files for a particular phrase. You can invoke it at the command line like this:
> help.search("topic")
Almost all the work done by R is accomplished by built-in or user-defined functions. The functions “q”, “help”, and “help.search” used above are examples. When you call a function, its arguments must be enclosed in parentheses, as in help.search("topic"). Functions may have both required and optional arguments. If all the arguments of a function are optional and you don’t need them, you still have to include the empty parentheses when you call the function. That was the case with “q( )” above. To see the optional arguments that we did not use, call
> help(quit)
If you just type the name of a function without the parentheses, you will get a display of the R code for the function. In general, typing the name of an R object displays that object, or at least a summary of that object.
You can use R as a calculator by invoking its built-in mathematical functions. For example, to find the cosine of 30 degrees (pi/6 radians), type
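> cos(pi/6)
R's trigonometric functions measure angles in radians, which is why the argument is pi/6 rather than 30. To read about these functions, type
> help(cos)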
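If you prefer to think in degrees, you can convert before calling a trig function. The helper deg2rad below is a hypothetical name written for this sketch, not a built-in R function. The example also checks the answer against the exact value sqrt(3)/2, using all.equal() because floating-point results can differ in the last few digits.

```r
# deg2rad is a hypothetical helper written for this example, not a built-in
deg2rad = function(deg) deg * pi / 180

cos(deg2rad(30))           # the cosine of 30 degrees, same as cos(pi/6)
sqrt(3) / 2                # the exact value, for comparison

# Floating-point results may differ in the last digits,
# so compare with all.equal() instead of ==
all.equal(cos(deg2rad(30)), sqrt(3) / 2)
```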
R also has logarithmic and exponential functions, special functions like Bessel functions, and functions for manipulating matrices and vectors. The basic arithmetic operators are “+” for pairwise addition, “-” for subtraction, “*” for multiplication, “/” for division, “^” for exponentiation, and “%%” for the remainder upon division of one integer by another. Functions and operators can be combined in various ways, e.g.,
> asin(cos(3*pi/7))
> atan(2)+atan(3)
Complex arithmetic is permitted. Complex numbers are indicated, for example, by expressions like “2.20 -3.14i”. If you evaluate complex arithmetic expressions in R it is wise to use explicit real and imaginary parts and to use parentheses liberally.
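Here is a small sketch of complex arithmetic that follows that advice. The values are arbitrary; complex(), Re(), Im(), Mod(), and as.complex() are base R functions. Note that sqrt(-1) on its own returns NaN with a warning, because -1 is a real number; converting it to complex first gives the imaginary unit.

```r
# Build complex numbers from explicit real and imaginary parts
z1 = complex(real = 2.20, imaginary = -3.14)
z2 = 1 + 2i                # literal form: a number followed by i

(z1 + z2) * (z1 - z2)      # parentheses make the order of operations explicit
Re(z1); Im(z1)             # real and imaginary parts
Mod(z2)                    # modulus: sqrt(1^2 + 2^2)

# Convert to complex before taking the square root of a negative number
sqrt(as.complex(-1))       # the imaginary unit
```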
| Operation | R Function or Operator |
| --- | --- |
| Pairwise Addition | + |
| Pairwise Multiplication | * |
| Division | / |
| Exponentiation | ^ |
| Remainder | %% |
| Square Root | sqrt( ) |
| Absolute Value | abs( ) |
| Greatest Integer Less Than or Equal | floor( ) |
| Least Integer Greater Than or Equal | ceiling( ) |
| Exponential Function | exp( ) |
| Logarithm | log( ) |
| Trig Functions | sin( ), cos( ), tan( ), etc. |
| Inverse Trig Functions | asin( ), acos( ), atan( ), etc. |
| Factorial Function | factorial( ) |
| Gamma Function | gamma( ) |
| Binomial Coefficient (Number of Combinations) | choose( ) |
Some of these functions have optional arguments. Call help to read about them.
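A few worked calls to functions from the table may help. The inputs are arbitrary; the base argument of log() is one of the optional arguments mentioned above. Negative numbers are used for floor() and ceiling() because that is where the two are easiest to confuse.

```r
floor(-2.5)            # -3: the greatest integer <= -2.5
ceiling(-2.5)          # -2: the least integer >= -2.5
17 %% 5                # 2: the remainder when 17 is divided by 5
log(100)               # natural logarithm (base e) by default
log(100, base = 10)    # optional argument: logarithm to base 10, which is 2
choose(5, 2)           # 10 ways to choose 2 items from 5
gamma(5)               # 24, since gamma(n) equals factorial(n - 1)
factorial(4)           # 24
```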
If you call a function like this:
> factorial(3)
the returned value will be displayed on your screen. You can assign the returned value to a named object by using the assignment operator “=” like this:
> name = factorial(3)
Then if you type the name of the object, you will see its value displayed.
> name
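You can name your objects just about anything, but avoid using the name of a permanent (built-in) object. If you do, your object will hide (mask) the built-in one in many situations, which can lead to confusing errors. R generally does not warn you when this happens, so don't count on a warning. A few reserved words, such as if, function, TRUE, and FALSE, cannot be used as names at all.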
There is another assignment operator, “<-”, which at the command line does the same job as “=” and is the one used in the help files and most other references. It goes like this:
> name <- factorial(3)
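You can use it if you like, but it takes an extra keystroke. Keep in mind that inside the parentheses of a function call, “=” does not assign anything; it supplies a named argument, as in seq(from = 1, to = 20). You can rename an object like this: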
> name2 = name1
This duplicates the object, so now you have two identical objects. If you want to get rid of the old object, use
> remove(name1)
or just
> rm(name1)
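The sketch below runs the steps above in order and checks the result. It shows that name2 is an independent copy: changing name1 afterwards does not change name2, and after rm() the old name no longer exists.

```r
name1 = factorial(3)     # assign with =
name1 <- factorial(3)    # the same assignment written with <-

name2 = name1            # copy: name2 is an independent duplicate
name1 = 100              # changing name1 afterwards does not affect name2
name2                    # still 6

rm(name1)                # remove the original object
exists("name1")          # FALSE: it is gone from the workspace
```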
If you need a function and it isn’t built-in, you can define it yourself. For example, suppose you want to make up a function that accepts an argument x and returns x if x is positive or 0 if x is not positive. This is one way to do it:
> pospart=function(x) {
+ y=(x+abs(x))/2
+ return(y)
+ }
> pospart(2)
> pospart(-1)
The + sign at the beginning of the second through the fourth lines is R’s prompt for a continuation. It occurs when R thinks you haven’t finished the preceding command or group of commands. The braces { and } enclose a group of commands to be executed as a unit. This is the general syntax for defining a function, but in this example it did not have to be this complicated. All of the above could have been accomplished by
> pospart=function(x) (x+abs(x))/2
Once you have defined a function in R you can use it like any built-in function. If you save your workspace when you quit R, the function will be saved for future use.
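Because pospart is built only from arithmetic and abs(), which work component by component, it also accepts a whole vector of numbers. The check below compares it with the built-in function pmax(), which computes the componentwise maximum and so gives the same positive part; the test vector is arbitrary.

```r
pospart = function(x) (x + abs(x)) / 2   # the one-line definition from above

v = c(-2, 0, 3.5)
pospart(v)               # 0.0 0.0 3.5

# The built-in pmax() computes the same thing componentwise
pmax(v, 0)
identical(pospart(v), pmax(v, 0))
```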
R objects are classified as functions, vectors, matrices, arrays, lists, or data frames. A vector is an ordered sequence of numbers or character strings (character strings must be enclosed in quotes). All the entries, or components, of a vector are of the same mode, namely numeric, integer, character, logical, or complex. Single numbers or character strings are vectors of length 1. A component of a vector has a single index. If vector is the name of a vector, then vector[2] is the second component of vector.
> vector
will cause the entire vector to be displayed.
> vector[1]
will display only its first component.
> vector[1:4]
will cause the first 4 components of vector to be displayed.
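A matrix is a rectangular table of entries, all of the same mode, arranged in rows and columns, so each entry has two indices: a row index and a column index. If matrix is the name of a matrix, then
> matrix[2, 1]
displays the entry in the 2nd row, 1st column.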
> matrix[1:5,2:6]
displays the entries in rows 1 through 5 and columns 2 through 6.
> matrix
shows the whole thing.
Arrays are higher dimensional analogs of matrices. They may have any number of indices. Like vectors and matrices, the entries of an array must all be of the same mode.
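The following sketch creates a small matrix and a three-dimensional array and indexes them as described above. It uses the names m and a rather than matrix and array, because those are the names of the built-in functions that create them. matrix() fills its entries column by column unless told otherwise.

```r
# A 2-by-3 matrix filled column by column with the numbers 1 to 6
m = matrix(1:6, nrow = 2)
m
m[2, 1]          # row 2, column 1: the value 2
m[1:2, 2:3]      # rows 1-2, columns 2-3
dim(m)           # 2 3

# A 2-by-3-by-4 array needs three indices
a = array(1:24, dim = c(2, 3, 4))
a[2, 3, 4]       # 24, the last entry
```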
A data frame is a table of data arranged by rows and columns, much like a spreadsheet. The columns of a data frame have names, which are the variables addressed in applications of R functions to the data. The rows of a data frame often correspond to different cases in the study that generated the data, or different individuals on which the variables are recorded. The rows of a data frame might have names, or they might not.
The difference between a data frame and a matrix is that the entries of a matrix all have to have the same storage mode. A matrix cannot have some entries that are numeric and others that are character strings. In a data frame, the columns can have different modes, but the entries within a single column must have the same mode. The example below shows a small data frame.
Weight Time Chick Diet
1 164 14 25 B
2 98 8 36 C
3 116 14 8 A
4 238 21 46 D
5 261 18 48 D
6 57 4 7 A
7 72 6 30 B
8 199 20 43 D
9 89 10 17 A
10 202 21 3 A
In this example, there are four variables: “Weight”, “Time”, “Chick”, and “Diet”. There are ten cases or rows, labeled 1 through 10. Each row corresponds to one experimental run. The first three variables appear to be numeric in character, but in fact one of them is not. The numerals under “Chick” are mere identifiers and have no significance as numbers. They could just as well be letters or names. The variables “Chick” and “Diet” are called factors, as opposed to numeric variables like “Weight” and “Time”.
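One way to type the small data frame above into R is sketched below; the name chicks is made up for this example. Wrapping Chick and Diet in factor() tells R to treat them as factors rather than as numbers or plain text, which is exactly the distinction described above. The str() function then reports the mode of each column.

```r
chicks = data.frame(
  Weight = c(164, 98, 116, 238, 261, 57, 72, 199, 89, 202),
  Time   = c(14, 8, 14, 21, 18, 4, 6, 20, 10, 21),
  Chick  = factor(c(25, 36, 8, 46, 48, 7, 30, 43, 17, 3)),  # identifiers, not numbers
  Diet   = factor(c("B", "C", "A", "D", "D", "A", "B", "D", "A", "A"))
)
str(chicks)              # Weight and Time are numeric; Chick and Diet are factors
mean(chicks$Weight)      # 149.6
levels(chicks$Diet)      # "A" "B" "C" "D"
```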
A list is just a bag of named components. The components of a list can be objects of different types. Many functions return lists when they are called because they are designed to create several objects of different types.
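Here is a minimal sketch of a list with components of three different types; the name trial and its contents are hypothetical. The second part uses t.test(), one base R function whose result is a list, to show the kind of bundle such functions return.

```r
# A hypothetical list with components of three different types
trial = list(
  label   = "pilot study",      # character
  weights = c(164, 98, 116),    # numeric vector
  passed  = TRUE                # logical
)
names(trial)           # "label" "weights" "passed"
trial$weights          # extract a component by name
trial[["label"]]       # the same kind of extraction with double brackets

# t.test() is one function that returns a list of results
res = t.test(c(164, 98, 116, 238, 261))
is.list(res)           # TRUE
names(res)             # the components it contains
res$conf.int           # one component: the 95% confidence interval
```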
You can create a vector of small length by using the concatenate function. Here are two examples.
> vector1=c(2,5,21,-9)
> vector1
> officers = c("Abby", "Bobby", "Cleo", "Dinesh")
> officers
The function “c” can be used to combine previously created vectors by concatenating them.
> vector2=c(-3,8,10)
> vector3 = c(vector1, vector2)
> vector3
There are quick ways of creating special vectors. For example, to create a vector of length 10, all of whose components are equal to 1, type
> tenones=rep(1,10)
“rep” stands for replicate or repeat. To make a vector of consecutive integers starting with -2 and ending with 6, type
> -2:6
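Notice that this is different from
> -(2:6)
which gives -2, -3, -4, -5, -6. In -2:6 the minus sign is applied to 2 before the colon operator is applied, so R reads it as (-2):6.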
Also, the sequence can be decreasing rather than increasing.
> 6:-2
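The sketch below collects these shortcuts in one place. The times and each arguments of rep() are standard options that control whether the whole vector is repeated or each entry is repeated in turn.

```r
rep(1, 10)                  # ten 1s
rep(c(1, 2), times = 3)     # 1 2 1 2 1 2
rep(c(1, 2), each = 3)      # 1 1 1 2 2 2

-2:6                        # -2 -1 0 1 2 3 4 5 6
-(2:6)                      # -2 -3 -4 -5 -6
6:-2                        # 6 5 4 3 2 1 0 -1 -2
length(-2:6)                # 9
```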
A vector with constant differences between consecutive entries can be created with the “seq” function. This function can be used in two ways: by prescribing the difference between consecutive entries, or by prescribing the number of entries in the sequence. As an example of the first,
> seq(from=1, to=20, by=4)
or simply
> seq(1, 20, 4)
An example of the second usage is
> seq(1, 20, len=8)
This produces a vector of length 8 beginning with 1, ending with 20, and having constant differences.
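The argument written len above is an abbreviation of the full argument name length.out, which R accepts by partial matching. The sketch below spells it out and uses diff() to confirm that the spacing is constant. With by = 4 the sequence stops at 17, since the next value, 21, would pass the end point.

```r
seq(from = 1, to = 20, by = 4)        # 1 5 9 13 17
s = seq(1, 20, length.out = 8)        # 8 equally spaced values from 1 to 20
s
diff(s)                               # every difference equals 19/7
```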
An alternative to the concatenate function for entering data manually is the “scan” function. Call the function without arguments, like this:
> scan( )
1:
The “1:” above is a new prompt for you to enter the first element of the vector you are creating. Type the entries separated by blank spaces. You can press “Enter” (“Return”) at any time to start a new line; if you do, you will get a new numbered prompt like the one above showing the position of the next entry. When you are finished, press “Return” on an empty line (that is, twice in a row after the last entry) to signal that the input is complete. To keep the result, assign it to a name, as in x = scan( ). For example, if you wanted to create a vector whose entries are the integers strictly between 1 and 10, with the even ones coming first, this is what it might look like:
> scan( )
1: 2 4 6 8 (Now hit “Return” once)
5: 3 5 7 9 (Double “Return”)
Read 8 items
[1] 2 4 6 8 3 5 7 9
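The scan function can also read its input from a character string through its text argument, which is a convenient way to try it without typing interactively. The sketch below reproduces the session above and assigns the result so that it is kept. As with files, character entries need what = "character"; the names used are just sample data.

```r
# The same input as the interactive session above, given as a string
x = scan(text = "2 4 6 8
3 5 7 9")
x                                   # 2 4 6 8 3 5 7 9

# Character entries need what = "character"
who = scan(text = "Abby Bobby Cleo", what = "character")
who
```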
Perhaps the most convenient tool for creating a small matrix or data frame is the function “data.entry”. It is invoked like this:
> object = data.entry("variable1")
where variable1 is the name you want to give to the first column in your object (matrix or data frame). This opens an editing window in tabular form where you can enter new values for variable1 and add new named columns. Click on a column heading to change its name and mode. Double click on a cell to change its value. In current versions of R this function has some peculiarities that take getting used to.
Data stored in external files can be imported into the R workspace in several ways, and the best method depends on the amount and kind of data. To begin, suppose that you have an unformatted text file named file in your working directory (the directory containing your R workspace). Also suppose that the data in file are either all numeric or all character strings in quotes. Such a text file might be created with Microsoft Notepad, for example, or some other simple text editor. The “scan” function can be used to bring the file into the workspace, like this:
> object = scan("file.txt")
The quotes are required, and you must give the file's full name, including its extension (here “.txt”), exactly as it appears in your file system. If the file contains character strings rather than numbers, add the argument what = "character". If the file is in a directory other than your working directory, you can either change the working directory (with the File menu at the top of the R console, or with the function setwd) and change it back after you have imported the file, or give the complete pathname of the file. On Windows, write the pathname with forward slashes (for example C:/data/file.txt) or with doubled backslashes. You can also give the URL of a file to read it directly from the internet.
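The sketch below writes two small text files into R's temporary directory and reads them back with scan, using full pathnames instead of changing the working directory. The file names numbers.txt and officers.txt are made up for this example; tempdir(), file.path(), and writeLines() are base R functions. Quotes around character strings in the file are removed when the strings are read.

```r
# Write two small files to R's temporary directory (the file names are made up)
num_path = file.path(tempdir(), "numbers.txt")
writeLines(c("3 1 4", "1 5 9"), num_path)

chr_path = file.path(tempdir(), "officers.txt")
writeLines('"Abby" "Bobby" "Cleo"', chr_path)

nums = scan(num_path)                        # numeric is the default
who  = scan(chr_path, what = "character")    # the quotes are stripped
nums                                         # 3 1 4 1 5 9
who                                          # "Abby" "Bobby" "Cleo"

getwd()   # the working directory that relative file names refer to
```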
Data frames are usually imported from previously created tabular text files or spreadsheets. These are imported with the function “read.table” or a variant “read.csv”. Suppose that you have a spreadsheet named file in your working directory. Save or copy it as a comma separated values file with the same name. Then import it with the command
> frame = read.csv("file.csv", header = TRUE)
The argument “header = TRUE” tells R that the first line of the file contains the column headings of the spreadsheet. Since that is the default for read.csv, the argument is actually unnecessary in this case. If the first line does not contain headings, you must use “header = FALSE” instead. The column headings will become the variable names in the data frame frame. (You will often see TRUE and FALSE abbreviated as T and F. The full forms are safer, because T and F are ordinary objects that can be overwritten.)
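If the file you want to import does not come from a spreadsheet, it may be more convenient to use “read.table”, whose defaults assume that there is no header line and that fields are separated by white space. However, with the appropriate arguments (such as header and sep), either function can be used in any situation.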
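To see the difference in defaults, the sketch below reads the same tiny comma-separated table both ways. Instead of a file it passes the data through the text argument, which read.csv and read.table both accept; the column names are borrowed from the example data frame earlier. For read.table, the header and separator have to be stated explicitly.

```r
# A tiny CSV, given inline via the text argument instead of a file
csv_text = "Weight,Time,Diet
164,14,B
98,8,C
116,14,A"

frame = read.csv(text = csv_text)        # header = TRUE is the default here
str(frame)

# read.table needs to be told about the header line and the comma separator
frame2 = read.table(text = csv_text, header = TRUE, sep = ",")
all.equal(frame, frame2)
```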
Tabular data in an external file can be created from a matrix or data frame in R with the commands
> write.table(object, file="filename")
> write.csv(object, file="filename.csv", na=" ")
Here, object is the name of a matrix or data frame in your R workspace. The file filename will be written in the working directory containing your R workspace. If you want to put it somewhere else, you can use the complete pathname in your file system.
The “write.csv” option is probably more useful than “write.table” since it is more amenable to spreadsheet applications. Both these functions have optional arguments which you can read about with help.
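A round trip is an easy way to check what write.csv produces. In the sketch below a temporary path stands in for filename.csv, and the data frame out is made up. The argument row.names = FALSE leaves out the row numbers that write.csv otherwise writes as an extra first column, and na = "" writes missing values as empty fields, which read.csv reads back as NA in a numeric column.

```r
# A small data frame with one missing value (NA)
out = data.frame(Weight = c(164, 98, NA), Diet = c("B", "C", "A"))

# A temporary path stands in for "filename.csv"
path = file.path(tempdir(), "out.csv")
write.csv(out, file = path, na = "", row.names = FALSE)

readLines(path)       # the raw lines of text that were written
back = read.csv(path)
back                  # Weight comes back with NA in the third row
```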
One of the great advantages of R Commander is that it makes it easy to import and export data. To import a data set from an external source, such as an Excel file, a tab-delimited text file, a file created by another software package, or an internet site, just click on the "Data" menu at the top of the R Commander window and follow the instructions. They are straightforward. The data will be imported as a data frame in R.
Similarly, you can export data from R to a tabular text file in your local file system by clicking on "Data" and choosing "Active data set" and then "Export active data set".