Intro to R

Hello, and welcome to my blog. The goal of this blog is to introduce people to R in a way that is easy to grasp. Its command-line interface can be pretty intimidating, so hopefully this can help ease you into it. Chances are, if you're reading this, you're a close friend of mine (I don't have much of a reputation on the internet yet), but no matter who you are, I welcome comments, questions, suggestions, etc. So please speak up and tell me if you like it, ask me a question if you're having trouble following, or leave a witty “your mom” joke!
OK, let's jump right in. If you're going to learn R, you're going to need to install it first. Here comes the best part: R is free!
What???
Yes, totally free! It's open-source software, which means it's constantly being added to and improved by an ever-increasing community of programmers and enthusiasts.

So how do you get it?

You can install plain R and work from that, but I prefer to use a programming environment (often called an IDE), which is just a fancier way of using R that keeps everything more organized and adds sweet features such as spell check and automatically closing parentheses and quotes for you. This makes a command-line interface less intimidating. There are a few such environments available, but the one I use is RStudio, so I suggest you use it as well, and I'll walk you through installing it.

Installing R and RStudio

  1. Go to the R website
  2. Click the version of R appropriate for your computer (the options are front and center; you can't miss them)
  3. You will be given a list of further options. Mac users, choose the version compatible with your current operating system. Windows users, you want base R and that's it.
  4. Once you've made the right choice on the previous page, just click "Download R 2.15.xx for (my OS)" and run the installer. You have just seriously upgraded what your computer is capable of.
  5. Go to the RStudio website
  6. Click “Download RStudio” (upper right corner)
  7. Choose “Download RStudio Desktop”
  8. Click the link to download the version that is recommended for your system (the first hyperlink on the page).
  9. Save it somewhere on your computer. I put it in my Programs folder, but put it wherever you like.
  10. What you have just downloaded is the installer, so find it where you saved it and run it to actually put RStudio on your computer.

Now you've installed RStudio (and R), and you R ready to slay some code! (The puns get worse from here, people; just accept it and move on.)

Using R, the very basics

When you open RStudio, you'll see three panes within the window. This is part of the organizational beauty of RStudio. The big one on the left side is your command window (RStudio calls it the Console). This is where you interact with R: you enter something here, R evaluates it, and R returns a response to you. Let's not worry about the other two panes yet. These are the very basics, remember?
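A good first thing to type at the prompt is a quick check that RStudio found the R you just installed. R.version.string is a built-in base R variable that holds the version text; what it prints depends on your installation, so no output is shown here.

```r
# Ask R which version RStudio is running (the text differs by installation)
R.version.string
```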

R as a calculator

Let's enter something into that intimidating command line. The > prompt beckons!
Try entering:
6 * 6
[1] 36
Try:
6/3
[1] 2
Simple enough, but not particularly useful.
Now try:
6/0
[1] Inf
R returns Inf.
Now this is useful: it does not return an error, but infinity. Inf is actually a numeric value to R, so it will not necessarily wreck your program if some calculation ends up going to infinity. Cool!
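A few more lines show that Inf really does behave like a number in later calculations. This is a small sketch of standard base R arithmetic; each comment shows what the line prints after the [1]. The last line is an extra contrast not covered above: 0/0 has no sensible value, so R gives NaN (“not a number”) instead.

```r
# Dividing a positive or negative number by zero gives signed infinity
6 / 0            # Inf
-6 / 0           # -Inf
# Inf keeps working in later arithmetic and comparisons
Inf + 1          # Inf
1 / Inf          # 0
Inf > 1e308      # TRUE
is.numeric(Inf)  # TRUE
# 0/0 has no sensible value, so R returns NaN ("not a number")
0 / 0            # NaN
```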
The multiplication and division operators in R (* and /) are intuitive, and I'm sure you can go out on a limb and guess how to add and subtract as well, but what if we want to do something more complicated?
There are functions built into R that we can call to do fancier things.
Try:
sqrt(100)
[1] 10
log(10)
[1] 2.302585
log10(10)
[1] 1
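log() can also take a base argument, and exp() undoes it. This is a short sketch using only base R functions; each comment shows the printed result.

```r
# log() takes an optional base argument
log(100, base = 10)  # 2, same as log10(100)
log(8, base = 2)     # 3, same as log2(8)
# exp() raises e to a power, undoing log()
exp(1)               # 2.718282
log(exp(2))          # 2
```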
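Cool. Note that log() is the natural logarithm (base e), while log10() uses base 10. But now I want to get even fancier: I want to do math on several numbers at a time. In R, if you want to do an operation on a group of numbers, you must first concatenate (combine) those numbers into a single object.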
Try entering:
1, 2, 3, 4, 5
R returns an error.
Now try using R's concatenate function c():
c(1, 2, 3, 4, 5)
[1] 1 2 3 4 5
No error this time.
This is incredibly important. Concatenating numbers creates an object in R called a vector. You may have heard someone say that R is an object-oriented language; everything you work with in R is an object, and you've just created a very simple one. R has many, many built-in functions that you can now use on the object you have created.
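A couple of built-in functions let you ask R about the object c() just made. This sketch uses class() and is.vector(), which are both part of base R; the comments show what each line prints.

```r
# One object holding five numbers
c(1, 2, 3, 4, 5)             # 1 2 3 4 5
# Ask R what kind of object it is
class(c(1, 2, 3, 4, 5))      # "numeric"
is.vector(c(1, 2, 3, 4, 5))  # TRUE
# c() can also join vectors into one longer vector
c(c(1, 2), c(3, 4, 5))       # 1 2 3 4 5
```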
Try:
mean(c(1, 2, 3, 4, 5))
[1] 3
Also try:
sum(c(1, 2, 3, 4, 5))
[1] 15
There are tons of similar computations that can be done, such as max(), min(), length(), sum(), median(), etc. Each of these computations is called a function. A function is different from an operator (for example, the * symbol we used earlier to multiply). A function can be a lot more flexible than a simple mathematical calculation such as finding the mean, but we will discuss functions in more depth later. An operator is very rigid: it is purely mechanical. For example, 6 times 6 just IS 36. You add up six 6s, and that is the result. Now that we've covered concatenating numbers, let's try out some neat new operators.
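Before moving on to those operators, here are the functions listed above applied to the same vector, plus one example of the extra flexibility functions have. round() and its digits argument are not part of the post; they are included only to illustrate a function taking an option that changes what it does.

```r
max(c(1, 2, 3, 4, 5))     # 5
min(c(1, 2, 3, 4, 5))     # 1
length(c(1, 2, 3, 4, 5))  # 5, how many values there are
median(c(1, 2, 3, 4, 5))  # 3
# An optional argument changes what a function does:
# digits tells round() how many decimal places to keep
round(log(10), digits = 2)  # 2.3
```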
Try:
c(1, 2, 3, 4, 5) * 2
[1]  2  4  6  8 10
Now we've done an operation on 5 numbers at a time. Sure beats typing in 5 different multiplication commands.
Now try:
c(1, 2, 3, 4, 5) * c(1, 2, 3, 4, 5)
[1]  1  4  9 16 25
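See how this works? R multiplies the two vectors element by element: the first number by the first number, the second by the second, and so on. Alternatively, I could have squared each number with the ^ operator: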
c(1, 2, 3, 4, 5)^2
[1]  1  4  9 16 25
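The same element-by-element idea works for every arithmetic operator, and for many functions too. A short sketch in base R; the comments show the values R prints.

```r
c(1, 2, 3, 4, 5) + 10                # 11 12 13 14 15
c(1, 2, 3, 4, 5) - c(5, 4, 3, 2, 1)  # -4 -2  0  2  4
c(2, 4, 6) / 2                       #  1  2  3
# sqrt() is applied to each element, undoing the squaring above
sqrt(c(1, 4, 9, 16, 25))             #  1  2  3  4  5
# Operators and functions combine: the sum of the squares
sum(c(1, 2, 3, 4, 5)^2)              # 55
```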
Now try:
c(1, 2, 3, 4, 5) %in% c(1, 2, 3)
[1]  TRUE  TRUE  TRUE FALSE FALSE
%something% is the pattern for R's special operators, and there are a few of these in R. Here, %in% checks, element by element, whether each number on the left appears anywhere in the group of numbers on the right. Notice this evaluates to a series of TRUEs and FALSEs. Once again, very mechanical: the number 1 either IS or IS NOT in the second group of numbers. Other operators that evaluate to either TRUE or FALSE are <, >, ==, <=, >=, &, |, &&, and ||. I'm sure you can guess what the first few do. Try:
3 < 6
[1] TRUE
or:
c(1, 2, 3, 4, 5) <= 3
[1]  TRUE  TRUE  TRUE FALSE FALSE
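Two more comparisons on the same vector. Note that testing equality takes two equals signs (==). Because R treats TRUE as 1 and FALSE as 0 in arithmetic, calling sum() on a comparison counts how many values pass it.

```r
c(1, 2, 3, 4, 5) == 3       # FALSE FALSE  TRUE FALSE FALSE
c(1, 2, 3, 4, 5) >= 3       # FALSE FALSE  TRUE  TRUE  TRUE
# TRUE counts as 1 and FALSE as 0, so sum() tallies the TRUEs
sum(c(1, 2, 3, 4, 5) <= 3)  # 3
```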
Now try using the & operator:
3 < 4 & 5 < 6
[1] TRUE
3 < 4 & 5 < 4
[1] FALSE
See what that does? The & (“and”) operator returns TRUE only when the conditions on both sides are TRUE.
The vertical bar | (typed by holding Shift and pressing the key just above your Enter key on a US keyboard) is the “or” operator. It works similarly to the “and” operator (&), but returns TRUE when at least one side is TRUE:
3 < 4 | 5 < 4
[1] TRUE
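Like the comparisons, & and | work element by element, so you can combine two conditions on a whole vector at once. A small base R sketch; comparisons are evaluated before & and |, so no extra parentheses are needed.

```r
# Values strictly between 1 and 5: both conditions must hold
c(1, 2, 3, 4, 5) > 1 & c(1, 2, 3, 4, 5) < 5  # FALSE  TRUE  TRUE  TRUE FALSE
# Values at either end: at least one condition must hold
c(1, 2, 3, 4, 5) < 2 | c(1, 2, 3, 4, 5) > 4  #  TRUE FALSE FALSE FALSE  TRUE
```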
The “double and” (&&) and “double or” (||) operators are meant for single TRUE/FALSE values and stop evaluating as soon as the answer is known. They are mostly used when writing programs, so you don't need to worry about them for now.
I'll leave you with one challenge that I'll answer at the beginning of the next post. I'll do this at the end of each post to help you learn by doing.

Challenge:

Compute the Kronecker product of the vectors < 1, 2, 3, 4, 5 > and < 1, 2, 3 >

Hint: The Kronecker product of two matrices is given below. 

Another Hint: The Kronecker product of 2 vectors is still a vector. My example uses matrices only because it illustrates well what a Kronecker product is.

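Matrices:

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$

Kronecker product:

$$A \otimes B = \begin{pmatrix} 1 & 2 & 1 & 2 \\ 3 & 4 & 3 & 4 \\ 1 & 2 & 2 & 4 \\ 3 & 4 & 6 & 8 \end{pmatrix}$$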
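The rule behind the example, written for a 2×2 first matrix A with entries a_ij and any second matrix B: each block of the product is a full copy of B scaled by one entry of A, laid out in the same arrangement as the entries of A. That is why the bottom-right block of the example product (2 4 over 6 8) is the second matrix doubled.

$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B \\ a_{21}B & a_{22}B \end{pmatrix}$$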


Purpose

The caret package includes a function for data splitting, createTimeSlices(), that creates data partitions using a fixed or growing window. The main arguments to this function, initialWindow and horizon, allow the user to create training/validation resamples consisting of contiguous observations, with the validation set always consisting of n = horizon rows. If fixedWindow = TRUE, the training set always has n = initialWindow rows.

Understanding data.table Rolling Joins

Robert Norberg

June 5, 2016

Introduction

Rolling joins in data.table are incredibly useful, but not that well documented. I wrote this to help myself figure out how to use them and perhaps it can help you too.

library(data.table)

The Setup

Imagine we have an eCommerce website that uses a third party (like PayPal) to handle payments.

A Custom caret C5.0 Model for 2-Class Classification Problems with Class Imbalance

Robert Norberg

Monday, April 06, 2015

Introduction

In this post I share a custom model tuning procedure for optimizing the probability threshold for class-imbalanced data. This is done within the excellent caret package framework and is akin to the example on the package website, but that example shows an extension of the random forest (rf) method, while I present an extension to the C5.0 method.

Getting Data From One Online Source

Robert Norberg

Hello world. It’s been a long time since I posted anything here on my blog. I’ve been busy getting my master’s degree in statistical computing and I haven’t had much free time to blog. But I’ve been writing R code as much as ever. Now, with graduation approaching, I’m job hunting and I thought it would be good to put together a few things to show potential employers.

Generating Tables Using Pander, knitr, and Rmarkdown

I use a pretty common workflow (I think) for producing reports on a day-to-day basis. I write them in rmarkdown using RStudio, knit them into .html and .md documents using knitr, then convert the resulting .md file to a .docx file using pander, which is really just a way of communicating with Pandoc via my R terminal.

R vs. Perl/mySQL - an applied genomics showdown

Recently I was given an assignment for a class I'm taking that got me thinking about speed in R. This isn't something I'm usually concerned with, but the first time I tried to run my solution (using plyr's ddply()), it was going to take all night to compute.

Stop Sign Sampling Project

Post 1: Planning Phase

Welcome back to the blog y'all. It's been a while since my last post and I've got some fun stuff for you. I'm currently enrolled in a survey sampling methodology class and we've been given a semester-long project, which I will of course be doing entirely in R. My group's assignment is to estimate the proportion of cars that actually stop at a stop sign in Chapel Hill.

A while ago I was asked to give a presentation at my job about using R to create statistical graphics. I had also just read some reviews of the Slidify package in R and I thought it would be extremely appropriate to create my presentation about visualization in R, in R. So I set about breaking in the Slidify package and I've got to give a huge shout out to Ramnath Vaidyanathan who created this package.

In class today we were discussing several types of survey sampling and we split into groups and did a little investigation. We were given a page of 100 rectangles with varying areas and took 3 samples of size 10. Our first was a convenience sample. We just picked a group of 10 rectangles adjacent to each other and counted their area. Next, we took a simple random sample (SRS), numbering the rectangles 1 through 100 and choosing 10 with a random number generator.

For a class I'm taking this semester on genomics we're dealing with some pretty large data and for this reason we're learning to use mySQL. I decided to be a geek and do the assignments in R as well to demonstrate the ability of R to handle pretty large data sets quickly.