
R Tutorial for Beginners: A Quick Start-Up Kit

Originally posted on Data Science Central, by Mitchell A. Sanders.


Learn R: A Statistical Programming Language


Here's my quick start-up kit for you.
  1. Install R
    1. Linux: "sudo apt-get install r-base" should do it
    2. Windows: download the installer from CRAN and run it
  2. Open a Script window alongside the Console window when you run R
    1. It should look something like this. The Console lets you type a command directly; hit <enter> and R runs that line. When it returns to the prompt (the red ">"), the command has been processed.
      [Screenshot: Console and Script windows]
    2. Your script file is for typing in as much as you want. To run whatever is there, highlight what you want to run and hit Ctrl+R or the icon on top. It will run in the console.
    3. This basic setup is all you need to begin.
  3. The quickest approach is to go to the Appendix of the Intro Manual and walk through typing in all the commands to see how it basically works. You'll quickly see that you feed equations, functions, values, objects, etc. from the right into the named variable or object on the left using the " <- " characters (see the short example below).
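
For example, typing these lines in the Console shows the pattern (x, y, and vals are just illustrative names):

> x <- 5 ## assign the value 5 to x
> y <- x * 2 ## use x on the right; y is now 10
> vals <- c(1, 2, 3) ## c() combines values into a vector
> mean(vals) ## returns 2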


After you get the very basics, one of the first things you'll want to be able to do is load a file. Use the sample data provided at the end of this post as a starter to practice on.


Set your working directory


From Console window:
File > Change dir...
In Console, type: "dir()" <enter>
You should see the file you saved as "mydata.csv" there (the data at the end of this post, cut and pasted into a file with that name). If not, change your working directory until it is right.
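
If you prefer typing to menus, you can also set the working directory straight from the Console (the path below is just a placeholder; point it at the folder where you saved the file):

> setwd("C:/Users/yourname/datafolder") ## placeholder path; use your own folder
> getwd() ## confirm where you are
> dir() ## "mydata.csv" should show up in this listing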


Load and view your data with:

> read.csv("mydata.csv")

Do it again, but load it into a variable named "data" and then view it by just typing "data" like this:
> data <- read.csv("mydata.csv")
> data
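
A few standard base R functions are handy for a quick look once the data is loaded:

> head(data) ## first six rows
> str(data) ## column names and types
> dim(data) ## number of rows and columns
> names(data) ## just the column names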

Now you're ready to start manipulating the data by pulling out only a subset of it:
Pull out a single column (the hash symbol marks a comment).
> oz <- data$Ozone ## pull the Ozone column out into a vector

Look at it by typing the variable name "oz" and hitting <enter>.
Now let's go find all the positions in oz where we have no data (i.e. "NA") and put those into a vector object called "badoz":
> badoz <- is.na(oz) ## TRUE wherever the column has an NA

Now let's use "oz" and "badoz" to get everything in the Ozone column ("oz") that DOESN'T have bad data, using the exclamation character, which means "NOT this thing":
> oz[!badoz] ## keep only the values that are not NA

You've just pulled out a whole column and then eliminated all the bad data from that column into its own data set.

Let's find the average of this data set:
> mean(oz[!badoz])

Let's get a summary:
> summary(oz[!badoz])
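
As a side note, many base R functions will skip NAs for you with the na.rm argument, which gives the same answer as filtering first:

> mean(oz, na.rm = TRUE) ## same result as mean(oz[!badoz])
> summary(oz) ## summary() reports the NA count on its own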

Make another subset of data from the original where Ozone was more than 31 and Temperature was over 90 degrees:
> myY <- subset(data, (Ozone > 31 & Temp > 90))

Let's get only the month of May into a subset:
> mM <- subset(data, Month == 5) ## subset for the month of May

So by now you get the idea here. The rest is mostly syntax.
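
If you're curious, the same subsets can also be written with bracket indexing, which you'll see often in other people's code (myY2 and mM2 are just illustrative names):

> myY2 <- data[data$Ozone > 31 & data$Temp > 90, ] ## rows meeting both conditions; unlike subset(), this keeps rows where the test is NA
> mM2 <- data[data$Month == 5, ] ## rows where Month is 5 (May)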


The next thing you'll want to do is write FUNCTIONS to manipulate your data.


Basic syntax is like this:
myfunction <- function() {
  x <- rnorm(100) ## draw 100 random values from a standard normal distribution
  mean(x)         ## the last expression evaluated is what the function returns
}
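
Once you've run that definition (highlight it and hit Ctrl+R), you call it by name. Each call draws a fresh set of 100 random values, so the result will be a number near 0 that changes every run:

> myfunction() ## returns the mean of 100 random normal draws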

Let's write a function that accepts two parameters, 1.) a filename and 2.) a folder, and returns that data to us. We'll add a third parameter, initially set to FALSE, that also prints a summary of the data when set to TRUE. It would look something like this:

getdata <- function(id, directory, summarize = FALSE) {
  file <- paste(directory, id, sep = "/") ## build the path to the named file
  data <- read.csv(file)
  if (summarize) {
    print(summary(data))
  }
  return(data)
}

Once this function is loaded in memory, you simply call it with the filename, the directory, and optionally TRUE (or leave that off, since it defaults to FALSE), like this:

> getdata("mydata.csv", "datafolder", TRUE)

Save your function code as getdata.R so you can reuse it in other scripts and use it as a basis to expand upon as you further explore and get better at using R.
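
For example, assuming getdata.R is saved in your working directory, you can pull it back into a later session and reuse it (airdata is just an illustrative name):

> source("getdata.R") ## loads the getdata() definition into memory
> airdata <- getdata("mydata.csv", "datafolder") ## summary is off by default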

Enjoy!
Mitchell
*********************


##############
##Here is a cut n' paste data set to work with. Save it as "mydata.csv":
##############

"Ozone","Solar.R","Wind","Temp","Month","Day"
41,190,7.4,67,5,1
36,118,8,72,5,2
12,149,12.6,74,5,3
18,313,11.5,62,5,4
NA,NA,14.3,56,5,5
28,NA,14.9,66,5,6
23,299,8.6,65,5,7
19,99,13.8,59,5,8
8,19,20.1,61,5,9
NA,194,8.6,69,5,10
7,NA,6.9,74,5,11
16,256,9.7,69,5,12
11,290,9.2,66,5,13
14,274,10.9,68,5,14
18,65,13.2,58,5,15
14,334,11.5,64,5,16
34,307,12,66,5,17
6,78,18.4,57,5,18
30,322,11.5,68,5,19
11,44,9.7,62,5,20
1,8,9.7,59,5,21
11,320,16.6,73,5,22
4,25,9.7,61,5,23
32,92,12,61,5,24
NA,66,16.6,57,5,25
NA,266,14.9,58,5,26
NA,NA,8,57,5,27
23,13,12,67,5,28
45,252,14.9,81,5,29
115,223,5.7,79,5,30
37,279,7.4,76,5,31
NA,286,8.6,78,6,1
NA,287,9.7,74,6,2
NA,242,16.1,67,6,3
NA,186,9.2,84,6,4
NA,220,8.6,85,6,5
NA,264,14.3,79,6,6
29,127,9.7,82,6,7
NA,273,6.9,87,6,8
71,291,13.8,90,6,9
39,323,11.5,87,6,10
NA,259,10.9,93,6,11
NA,250,9.2,92,6,12
23,148,8,82,6,13
NA,332,13.8,80,6,14
NA,322,11.5,79,6,15
21,191,14.9,77,6,16
37,284,20.7,72,6,17
20,37,9.2,65,6,18
12,120,11.5,73,6,19
13,137,10.3,76,6,20
NA,150,6.3,77,6,21
NA,59,1.7,76,6,22
NA,91,4.6,76,6,23
NA,250,6.3,76,6,24
NA,135,8,75,6,25
NA,127,8,78,6,26
NA,47,10.3,73,6,27
NA,98,11.5,80,6,28
NA,31,14.9,77,6,29
NA,138,8,83,6,30
135,269,4.1,84,7,1
49,248,9.2,85,7,2
32,236,9.2,81,7,3
NA,101,10.9,84,7,4
64,175,4.6,83,7,5
40,314,10.9,83,7,6
77,276,5.1,88,7,7
97,267,6.3,92,7,8
97,272,5.7,92,7,9
85,175,7.4,89,7,10
NA,139,8.6,82,7,11
10,264,14.3,73,7,12
27,175,14.9,81,7,13
NA,291,14.9,91,7,14
7,48,14.3,80,7,15
48,260,6.9,81,7,16
35,274,10.3,82,7,17
61,285,6.3,84,7,18
79,187,5.1,87,7,19
63,220,11.5,85,7,20
16,7,6.9,74,7,21
NA,258,9.7,81,7,22
NA,295,11.5,82,7,23
80,294,8.6,86,7,24
108,223,8,85,7,25
20,81,8.6,82,7,26
52,82,12,86,7,27
82,213,7.4,88,7,28
50,275,7.4,86,7,29
64,253,7.4,83,7,30
59,254,9.2,81,7,31
39,83,6.9,81,8,1
9,24,13.8,81,8,2
16,77,7.4,82,8,3
78,NA,6.9,86,8,4
35,NA,7.4,85,8,5
66,NA,4.6,87,8,6
122,255,4,89,8,7
89,229,10.3,90,8,8
110,207,8,90,8,9
NA,222,8.6,92,8,10
NA,137,11.5,86,8,11
44,192,11.5,86,8,12
28,273,11.5,82,8,13
65,157,9.7,80,8,14
NA,64,11.5,79,8,15
22,71,10.3,77,8,16
59,51,6.3,79,8,17
23,115,7.4,76,8,18
31,244,10.9,78,8,19
44,190,10.3,78,8,20
21,259,15.5,77,8,21
9,36,14.3,72,8,22
NA,255,12.6,75,8,23
45,212,9.7,79,8,24
168,238,3.4,81,8,25
73,215,8,86,8,26
NA,153,5.7,88,8,27
76,203,9.7,97,8,28
118,225,2.3,94,8,29
84,237,6.3,96,8,30
85,188,6.3,94,8,31
96,167,6.9,91,9,1
78,197,5.1,92,9,2
73,183,2.8,93,9,3
91,189,4.6,93,9,4
47,95,7.4,87,9,5
32,92,15.5,84,9,6
20,252,10.9,80,9,7
23,220,10.3,78,9,8
21,230,10.9,75,9,9
24,259,9.7,73,9,10
44,236,14.9,81,9,11
21,259,15.5,76,9,12
28,238,6.3,77,9,13
9,24,10.9,71,9,14
13,112,11.5,71,9,15
46,237,6.9,78,9,16
18,224,13.8,67,9,17
13,27,10.3,76,9,18
24,238,10.3,68,9,19
16,201,8,82,9,20
13,238,12.6,64,9,21
23,14,9.2,71,9,22
36,139,10.3,81,9,23
7,49,10.3,69,9,24
14,20,16.6,63,9,25
30,193,6.9,70,9,26
NA,145,13.2,77,9,27
14,191,14.3,75,9,28
18,131,8,76,9,29
20,223,11.5,68,9,30