Open a new R script file. To create a new R script go to
File -> New File -> R Script
This should have opened a blank Script window called Untitled.
The Script window is a file where you are saving your code. This is where you will write, edit, delete, and re-write your code.
To follow along with the tutorial, you should type (for now, resist just copying and pasting) the lines of code I display in the tutorial into your script.
Save your empty script somewhere on your computer.
In R, everything that exists is an object, and everything you do to an object is done with a function. You can define an object using the assignment operator <-. The name on the left-hand side of <- is the object being created; whatever is on the right-hand side of <- is a function or value whose result is stored in that object. Go ahead and type the following two lines of code in your script:
string <- "hello"
string
[1] "hello"
You can execute (run) a line of code by placing the cursor anywhere on the line and pressing Ctrl + Enter. Go ahead and run the two lines of code.
In this example, the first line creates a new object called string with a value of “hello”. The second line simply prints the value of string to the Console window. In the second line there is no assignment operator. When there is no <-, you are essentially just printing to the console. You can’t do anything further with output that is only printed to the console; it is just for viewing purposes.
For instance, if I wanted to calculate 1 + 2, I could do this by printing it to the console:
1 + 2
[1] 3
However, if I wanted to do something else with the result of that calculation, I would not be able to unless I assigned the result to an object using <-:
result <- 1 + 2
result <- result * 5
result
[1] 15
The point is, you are almost always going to assign the result of some function or value to an object. Printing to the console alone is rarely useful, because the value is not kept. Almost every line of code, then, will have an object name on the left-hand side of <- and a function or value on the right-hand side.
In the first example above, notice how I included " " around hello. This tells R that hello is a string, not an object. If I did not include " ", R would think I am calling an object. And since there is no object with the name hello, it will print an error:
string <- hello
Error in eval(expr, envir, enclos): object 'hello' not found
Do not use " " for numerical values:
a <- "5" + "1"
Error in "5" + "1": non-numeric argument to binary operator
You can execute lines of code by:
Typing them directly into the Console window
Typing them into the Script window and then on that line of code pressing Ctrl+Enter. With Ctrl+Enter you can execute one line of your code at a time.
Clicking on Source at the top right of the Script window. This will run ALL the lines of code contained in the script file.
It is important to know that EVERYTHING in R is case sensitive.
a <- 5
a + 5
[1] 10
A + 5
Error in eval(expr, envir, enclos): object 'A' not found
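A minimal sketch of how case sensitivity plays out in practice. The object name my_value is hypothetical and chosen only for this illustration; exists() and tryCatch() are base R functions, used here to check whether an object exists and to capture the error message instead of stopping the script.

```r
my_value <- 5

# Names that differ only in case refer to different objects
exists("my_value")   # TRUE
exists("My_Value")   # FALSE, unless you have created it yourself

# Referencing the wrong case raises an error; tryCatch() captures the message
msg <- tryCatch(My_Value + 5, error = function(e) conditionMessage(e))
msg
```

The same rule applies to function names: typeof() works, but Typeof() does not exist.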
Classes are types of values that exist in R:
character: "hello", "19"
numeric (or double): 2, 32.55
integer: 5L, 99L (the L suffix makes the value an integer; a bare 5 is stored as a double)
logical: TRUE, FALSE
To check the type of an object you can use the typeof() function:
typeof(a)
[1] "double"
To change the class of the values in an object you can use the as.character(), as.numeric(), as.double(), as.integer(), and as.logical() functions.
as.integer(a)
[1] 5
as.character(a)
[1] "5"
as.numeric("hello")
[1] NA
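The conversion functions above can be combined with typeof() to check what you got back. The following is a small sketch with hypothetical object names (x, x_num, x_int, bad); it is one way to explore conversions, not the only one.

```r
x <- "19"
typeof(x)                    # "character"

x_num <- as.numeric(x)       # convert the string to a number
typeof(x_num)                # "double"
x_num + 1                    # 20

x_int <- as.integer(32.55)   # the decimal part is dropped: 32
typeof(x_int)                # "integer"

as.logical("TRUE")           # TRUE
as.logical(0)                # FALSE

# Values that cannot be converted become NA, and R issues a warning;
# suppressWarnings() is used here only to keep the output quiet
bad <- suppressWarnings(as.numeric("hello"))
is.na(bad)                   # TRUE
```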
Okay, so now I want to talk about creating more interesting objects than just a <- 5. If you are going to do anything in R, it is important that you understand the different data types and data structures you can use in R. I will not cover all of them in this tutorial. For more information on data types and structures, see this nice Introduction to R.
Vectors contain elements of data. The length of a vector is the number of elements in the vector. For instance, the variable a we created earlier is actually a vector of length 1. It contains one element with a value of 5. Now let’s create a vector with more than one element.
b <- c(1, 3, 5)
c() is a function. Functions take arguments, which are the inputs for the function. Arguments are separated by commas. In this example the c() function concatenates the arguments (1, 3, 5) into a vector. We are passing the result of this function to the object b. What do you think the output of b will look like?
b
[1] 1 3 5
You can see that we now have a vector that contains 3 elements: 1, 3, 5. If you want to reference the value of specific elements of a vector, you use brackets [ ]. For instance,
b[2]
[1] 3
The value of the second element in vector b is 3. Let’s say we want to grab only the 2nd and 3rd elements. We can do this in at least two different ways.
b[2:3]
[1] 3 5
b[-1]
[1] 3 5
Now, it is important to note that we have not been changing vector b. If we display the output of b, we can see that it still contains the 3 elements.
b
[1] 1 3 5
To change vector b we need to define b as vector b with the first element removed:
b <- b[-1]
b
[1] 3 5
Vector b no longer contains 3 elements. Now, let’s say we want to add an element to vector b.
c(5,b)
[1] 5 3 5
Here the c() function created a vector with the value 5 as the first element followed by the values in vector b. Or we can use the variable a that has a value of 5. Let’s add this to vector b:
b <- c(a, b)
b
[1] 5 3 5
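The indexing and modification steps above can be pulled together in one short sketch. The vector name v is hypothetical, and replacing a single element by index (v[2] <- 10) is an extra that the text above does not cover but that follows the same bracket notation.

```r
v <- c(1, 3, 5)

length(v)      # 3
v[c(1, 3)]     # first and third elements: 1 5
v[-2]          # everything except the second element: 1 5

# None of the lines above changed v; assignment is what changes it
v[2] <- 10     # replace the second element
v              # 1 10 5

v <- c(v, 7)   # add an element at the end
v              # 1 10 5 7
```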
What if you want to create a long vector with many elements? If there is a pattern to the sequence of elements in the vector then you can create the vector using seq()
seq(0, 1000, by = 4)
[1] 0 4 8 12 16 20 24 28 32 36 40 44 48 52 56
[16] 60 64 68 72 76 80 84 88 92 96 100 104 108 112 116
[31] 120 124 128 132 136 140 144 148 152 156 160 164 168 172 176
[46] 180 184 188 192 196 200 204 208 212 216 220 224 228 232 236
[61] 240 244 248 252 256 260 264 268 272 276 280 284 288 292 296
[76] 300 304 308 312 316 320 324 328 332 336 340 344 348 352 356
[91] 360 364 368 372 376 380 384 388 392 396 400 404 408 412 416
[106] 420 424 428 432 436 440 444 448 452 456 460 464 468 472 476
[121] 480 484 488 492 496 500 504 508 512 516 520 524 528 532 536
[136] 540 544 548 552 556 560 564 568 572 576 580 584 588 592 596
[151] 600 604 608 612 616 620 624 628 632 636 640 644 648 652 656
[166] 660 664 668 672 676 680 684 688 692 696 700 704 708 712 716
[181] 720 724 728 732 736 740 744 748 752 756 760 764 768 772 776
[196] 780 784 788 792 796 800 804 808 812 816 820 824 828 832 836
[211] 840 844 848 852 856 860 864 868 872 876 880 884 888 892 896
[226] 900 904 908 912 916 920 924 928 932 936 940 944 948 952 956
[241] 960 964 968 972 976 980 984 988 992 996 1000
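Printing a long vector floods the console, so it is often more practical to store it and inspect only parts of it. A brief sketch, assuming the hypothetical object names fours and quarters; head(), tail(), and the length.out argument of seq() are all part of base R.

```r
fours <- seq(0, 1000, by = 4)

length(fours)     # 251 elements
head(fours, 5)    # first five: 0 4 8 12 16
tail(fours, 1)    # last one: 1000

# For steps of 1, the colon operator is a shortcut
1:5               # 1 2 3 4 5

# Instead of a step size, you can ask for a number of evenly spaced values
quarters <- seq(0, 1, length.out = 5)
quarters          # 0.00 0.25 0.50 0.75 1.00
```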
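Vectors can only contain elements of the same “class”. If you mix classes, R converts every element to a common class; here the numbers become character strings:
d <- c(1, "2", 5, 9)
d
[1] "1" "2" "5" "9"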
as.numeric(d)
[1] 1 2 5 9
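To see this conversion happen, you can check the type of a vector after creating it. The sketch below uses hypothetical object names (mixed, nums) and shows a few common combinations.

```r
mixed <- c(1, "2", 5, 9)
typeof(mixed)                # "character": the numbers were turned into strings

typeof(c(1, TRUE, FALSE))    # "double": TRUE and FALSE become 1 and 0
typeof(c(1L, 2.5))           # "double": the integer is promoted

# Converting back makes arithmetic possible again
nums <- as.numeric(mixed)
sum(nums)                    # 17
```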
Factors are special types of vectors that can represent categorical data. You can change a vector into a factor object using factor()
factor(c("male", "female", "male", "male", "female"))
[1] male female male male female
Levels: female male
factor(c("high", "low", "medium", "high", "low"))
[1] high low medium high low
Levels: high low medium
f <- factor(c("high", "low", "medium", "high", "low"),
            levels = c("high", "medium", "low"))
f
[1] high low medium high low
Levels: high medium low
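In the examples above, the levels are sorted alphabetically by default, and the levels argument lets you set the order yourself. A short sketch of how to inspect a factor follows; levels(), as.integer(), and table() are base R functions, and the object name f matches the example above but is redefined here so the block runs on its own.

```r
f <- factor(c("high", "low", "medium", "high", "low"),
            levels = c("high", "medium", "low"))

levels(f)       # "high" "medium" "low", in the order we specified
as.integer(f)   # 1 3 2 1 3: the position of each value in the levels
table(f)        # counts per level: high 2, medium 1, low 2
```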
Lists are containers of objects. Unlike Vectors, Lists can hold different classes of objects.
list(1, "2", 2, 4, 9, "hello")
[[1]]
[1] 1
[[2]]
[1] "2"
[[3]]
[1] 2
[[4]]
[1] 4
[[5]]
[1] 9
[[6]]
[1] "hello"
You might have noticed that the output contains not only single brackets but also double brackets [[ ]]. This is because lists can hold not only single elements but also vectors, factors, lists, data frames, and pretty much any kind of object.
l <- list(c(1, 2, 3, 4), "2", "hello", c("a", "b", "c"))
l
[[1]]
[1] 1 2 3 4
[[2]]
[1] "2"
[[3]]
[1] "hello"
[[4]]
[1] "a" "b" "c"
You can see that the length of each element in a list does not have to be the same. To reference the elements in a list you need to use the double brackets [[ ]].
l[[1]]
[1] 1 2 3 4
To reference elements within list elements you use double brackets followed by single brackets:
l[[4]][2]
[1] "b"
You can even give names to the list elements:
person <- list(name = "Jason",
               phone = "123-456-7890",
               age = 23,
               favorite_colors = c("blue", "red", "brown"))
person
$name
[1] "Jason"
$phone
[1] "123-456-7890"
$age
[1] 23
$favorite_colors
[1] "blue" "red" "brown"
And you can use the names to reference elements in a list:
person[["name"]]
[1] "Jason"
person[["favorite_colors"]][3]
[1] "brown"
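The difference between single and double brackets on a list is easy to miss, so here is a small sketch that makes it visible. The list is a shortened copy of the person example above; the $ shortcut for named list elements is base R behavior.

```r
person <- list(name = "Jason",
               age = 23,
               favorite_colors = c("blue", "red", "brown"))

names(person)               # "name" "age" "favorite_colors"
length(person)              # 3

person$name                 # shortcut for person[["name"]]
class(person[["name"]])     # "character": double brackets extract the element
class(person["name"])       # "list": single brackets return a smaller list

person$favorite_colors[3]   # "brown"
```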
You are probably already familiar with data frames. SPSS and Excel use this type of structure. It is just rows and columns of data: a data table. This is the format on which statistical analyses are performed.
So let’s create a data frame so you can see what one looks like in RStudio.
data <- data.frame(id = 1:10,
                   x = c("a", "b"),
                   y = seq(10, 100, by = 10))
data
You can view the data frame by clicking on the object in the Environment window or by executing the command View(data).
Notice that it created three columns labeled id, x, and y. Also notice that since we only specified a vector of length 2 for x, this column is recycled (repeated) to fill 10 rows of alternating “a” and “b”. All columns in a data frame must have the same number of rows.
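The repetition of the short x vector only works when its length divides evenly into the number of rows. The sketch below illustrates this with a hypothetical object name (example_df); tryCatch() is used only to capture the error message so the script keeps running.

```r
example_df <- data.frame(id = 1:10,
                         x = c("a", "b"),
                         y = seq(10, 100, by = 10))

dim(example_df)   # 10 rows, 3 columns
example_df$x      # "a" "b" repeated five times

# A vector of length 3 cannot be repeated evenly over 10 rows
result <- tryCatch(
  data.frame(id = 1:10, x = c("a", "b", "c")),
  error = function(e) conditionMessage(e)
)
result            # an error message about differing numbers of rows
```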
You can use the $ notation to reference just one of the columns in the data frame:
data$y
[1] 10 20 30 40 50 60 70 80 90 100
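Alternatively, you can use single brackets with the column name. Note that this returns a one-column data frame rather than a vector:
data["y"]
To reference only certain rows within a column:
data$y[1:5]
[1] 10 20 30 40 50
data[1:5, "y"]
[1] 10 20 30 40 50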
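The different ways of referencing a data frame return different kinds of objects, which matters when you pass the result to another function. A minimal sketch, using a hypothetical copy of the data frame named example_df:

```r
example_df <- data.frame(id = 1:10,
                         x = c("a", "b"),
                         y = seq(10, 100, by = 10))

class(example_df$y)        # "numeric": a plain vector
class(example_df[["y"]])   # "numeric": same result as $y
class(example_df["y"])     # "data.frame": still a table, with one column

# The general form is [rows, columns]
first_five <- example_df[1:5, "y"]
first_five                      # 10 20 30 40 50
example_df[1:3, c("id", "y")]   # rows 1 to 3 of two columns
example_df[2, ]                 # the whole second row
```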
If…then statements are useful for when you need to execute code only if a certain statement is TRUE. For instance,…
First we need to know how to perform logical operations in R
Okay, we have this variable a:
a <- 5
Now let’s say we want to determine if the value of a is greater than 3:
a > 3
[1] TRUE
You can see that the output of this statement, a > 3, is TRUE.
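Greater-than is only one of several comparison operators. The sketch below lists the common ones in base R, along with the operators for combining conditions; the object name is_big is hypothetical and shows that a logical result can be stored like any other value.

```r
a <- 5

a > 3             # TRUE
a < 3             # FALSE
a >= 5            # TRUE
a == 5            # TRUE  (two equals signs test for equality)
a != 5            # FALSE (not equal)

a > 3 & a < 10    # TRUE: both conditions hold
a < 0 | a > 3     # TRUE: at least one condition holds
!(a > 3)          # FALSE: negation

is_big <- a > 3   # the result can be assigned to an object
is_big
```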
Now let’s write an if…then statement. If a is greater than 3, then multiply a by 2.
if (a > 3) {
  a <- a*2
}
a
[1] 10
The expression that is being tested is contained in parentheses, right after the if statement. If this expression is evaluated as TRUE, then R will run the next line(s) of code.
The { is just a way of encasing multiple lines of code within one if statement. The lines of code then need to be closed off with }. In this case, since we only had one line of code, a <- a*2, we could have just written it as:
a <- 5
if (a > 3) a <- a*2
a
[1] 10
What if we want to do something to a if a is NOT greater than 3? In other words, if a is greater than 3, then multiply a by 2; else set a to missing.
a <- 5
if (a > 3) {
  a <- a*2
} else {
  a <- NA
}
a
[1] 10
You can keep on chaining if…then… else… if… then statements together.
a <- 5
if (is.na(a)) {
  print("Missing Value")
} else if (a < 0) {
  print("A is less than 0")
} else if (a > 3) {
  print("A is greater than 3")
}
[1] "A is greater than 3"
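If you want to run the same chain of checks on several values, one option is to wrap it in a function. This goes a step beyond what the tutorial covers: the function name describe_value and the final else branch are assumptions added for illustration, not part of the original example.

```r
# Hypothetical helper: returns a description instead of printing it
describe_value <- function(a) {
  if (is.na(a)) {
    "Missing Value"
  } else if (a < 0) {
    "A is less than 0"
  } else if (a > 3) {
    "A is greater than 3"
  } else {
    "A is between 0 and 3"
  }
}

describe_value(5)    # "A is greater than 3"
describe_value(-1)   # "A is less than 0"
describe_value(NA)   # "Missing Value"
describe_value(2)    # "A is between 0 and 3"
```

The is.na() check comes first on purpose: comparing NA with a number gives NA, and an if statement stops with an error when its condition is NA.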
For additional tips in the basics of coding R see:
https://ramnathv.github.io/pycon2014-r/visualize/README.html
https://www.datacamp.com/courses/free-introduction-to-r/?tap_a=5644-dce66f&tap_s=10907-287229