Chapter 2 Introduction to the R language

Since this book includes a lot of R code examples, this chapter describes some basics for those who are not familiar with R. To understand the R code in this book you don’t need any deep knowledge of R. In case you want to learn more, there are a lot of good books on it. I will list only a few of them:

2.1 Installation

2.1.1 R installation

To download R, go to CRAN. Don’t try to pick a mirror that’s close to you; it is better to use the cloud mirror, https://cloud.r-project.org.

2.1.2 RStudio

RStudio is an integrated development environment, or IDE, for R programming. There are two ways to run code in it:

  • type R code in the R Console pane, and press Enter to run it;
  • type R code in the Code Editor pane, and press Control/Command + Enter to run the selected part. Code written this way is easier to correct, and it can be saved as a script.

Figure 2.1: RStudio layout

When you first launch RStudio, you most likely won’t see the Code Editor pane. You can shrink the R Console pane with the icons in its upper-right corner.

Everything in this book can be done without installing RStudio. There are many ways to work with R without RStudio, such as the R console, the command line, Jupyter Notebook, and plugins for Sublime, Vim, Emacs, Atom, Notepad++ and other programming text editors.

2.1.3 RStudio Cloud

It is also possible to install nothing on your own PC and use RStudio Cloud, a web-based interface for RStudio and R. RStudio Cloud also lets you share your R projects and collaborate with a select group in a private space. RStudio Cloud is currently free to use, but soon there will be free and paid options.

2.2 Basic elements, variables, vectors, dataframes

2.2.1 Basic elements

7
[1] 7
-5.7
[1] -5.7
"bonjour"
[1] "bonjour"
"bon mot"
[1] "bon mot"
TRUE
[1] TRUE
FALSE
[1] FALSE
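
Each of the values above has a type. As a small sketch (the function class() is part of base R, but it does not appear elsewhere in this chapter), you can ask R what kind of value you are looking at:

```r
# class() reports the type of a value
class(7)          # "numeric"
class(-5.7)       # "numeric"
class("bonjour")  # "character"
class(TRUE)       # "logical"
```

Numbers with and without a decimal part are both "numeric", text in quotes is "character", and TRUE/FALSE are "logical".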

2.2.2 Arithmetic operations

7+7
[1] 14
21-8
[1] 13
4*3
[1] 12
12/4
[1] 3
4^3
[1] 64
4**3
[1] 64
sum(1, 2, 3, 4)
[1] 10
prod(1, 2, 3, 4)
[1] 24
log(1)
[1] 0
log(100, base = 10)
[1] 2
pi
[1] 3.141593
exp(5)
[1] 148.4132
sin(13)
[1] 0.420167
cos(13)
[1] 0.9074468
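
Two details of the functions above are easy to miss: log() without a base argument is the natural logarithm, and sin() and cos() expect radians, not degrees. The following sketch illustrates both; the variable names degrees and half are only for illustration.

```r
# log() without a base argument is the natural logarithm
log(exp(1))                      # 1

# log10() is a shorthand for log(x, base = 10)
log10(100)                       # 2

# sin() takes radians: pi / 2 radians is 90 degrees
sin(pi / 2)                      # 1

# convert degrees to radians before calling sin()
degrees <- 30
half <- sin(degrees * pi / 180)
half                             # 0.5 (up to floating-point rounding)
```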

2.2.3 Variables

my_var <- 7
my_var
[1] 7
my_var+7
[1] 14
my_var
[1] 7
my_var <- my_var + 7
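
The example above shows that using a variable in a calculation does not change it; only the assignment arrow <- does. A short sketch that repeats the same steps and prints the variable after the last assignment:

```r
my_var <- 7
my_var + 7            # computes 14, but the result is not stored
my_var                # still 7
my_var <- my_var + 7  # assignment overwrites the old value
my_var                # now 14
```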

2.2.4 Vectors

5:9
[1] 5 6 7 8 9
11:4
[1] 11 10  9  8  7  6  5  4
numbers <- c(7, 9.9, 24)
multiple_strings <- c("the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog")
one_string <- c("the quick brown fox jumps over the lazy dog")
true_false <- c(TRUE, FALSE, FALSE, TRUE)
length(numbers)
[1] 3
length(multiple_strings)
[1] 9
length(one_string)
[1] 1
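
Most R functions and operators work on a whole vector at once, element by element. A minimal sketch, reusing the numbers vector from above and a shortened copy of multiple_strings:

```r
numbers <- c(7, 9.9, 24)
numbers + 1         # 8.0 10.9 25.0: 1 is added to every element
numbers * 2         # 14.0 19.8 48.0
sum(numbers)        # 40.9

multiple_strings <- c("the", "quick", "brown", "fox")
nchar(multiple_strings)   # 3 5 5 3: number of characters in each string
```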

2.2.5 Dataframes

my_df <- data.frame(latin = c("a", "b", "c"),
                    cyrillic = c("а", "б", "в"),
                    greek = c("α", "β", "γ"),
                    numbers = c(1:3),
                    is.vowel = c(TRUE, FALSE, FALSE),
                    stringsAsFactors = FALSE)
my_df
  latin cyrillic greek numbers is.vowel
1     a        а     α       1     TRUE
2     b        б     β       2    FALSE
3     c        в     γ       3    FALSE
nrow(my_df)
[1] 3
ncol(my_df)
[1] 5
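
A dataframe is a table whose columns are vectors of equal length, and each column can have its own type. The sketch below uses a reduced copy of my_df (only three of the five columns, to keep it short) and some base R functions for inspecting it; col_types is a name introduced here for illustration.

```r
my_df <- data.frame(latin = c("a", "b", "c"),
                    numbers = 1:3,
                    is.vowel = c(TRUE, FALSE, FALSE),
                    stringsAsFactors = FALSE)

dim(my_df)        # 3 3: number of rows, then number of columns
colnames(my_df)   # "latin" "numbers" "is.vowel"

# class of every column
col_types <- sapply(my_df, class)
col_types

# compact overview of the whole dataframe
str(my_df)
```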

2.2.6 Indexing

numbers[3]
[1] 24
multiple_strings[9]
[1] "dog"
my_df[2, 3]
[1] "β"
my_df[2,]
  latin cyrillic greek numbers is.vowel
2     b        б     β       2    FALSE
my_df[,3]
[1] "α" "β" "γ"
my_df$is.vowel
[1]  TRUE FALSE FALSE
my_df$is.vowel[2]
[1] FALSE
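
Besides single positions, square brackets also accept several positions, negative positions, column names and logical conditions. A minimal sketch, again with a reduced copy of my_df:

```r
numbers <- c(7, 9.9, 24)
numbers[c(1, 3)]    # 7 24: first and third elements
numbers[-1]         # 9.9 24.0: everything except the first element

my_df <- data.frame(latin = c("a", "b", "c"),
                    numbers = 1:3,
                    is.vowel = c(TRUE, FALSE, FALSE),
                    stringsAsFactors = FALSE)

my_df[, "latin"]                    # a column by name, same as my_df$latin
my_df[my_df$is.vowel, ]             # only rows where is.vowel is TRUE
my_df[my_df$numbers > 1, "latin"]   # "b" "c"
```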

2.3 Reading files

We can read a dataset on numeral classifiers from the AUTOTYP database into R.

new_df <- read.csv("https://raw.githubusercontent.com/autotyp/autotyp-data/master/data/Numeral_classifiers.csv")
head(new_df)
  LID NumClass.n NumClass.Presence
1 148          0             FALSE
2  65          0             FALSE
3  75          0             FALSE
4  85          0             FALSE
5 111         NA                NA
6 163          0             FALSE
tail(new_df)
     LID NumClass.n NumClass.Presence
250 1397          0             FALSE
251 2994          5              TRUE
252 2779          0             FALSE
253  192          0             FALSE
254  551          0             FALSE
255 2564          2              TRUE

It could also be a file on your computer: just provide the full path to the file. Windows users need to change backslashes \ to slashes /.

new_df_2 <- read.csv("/home/agricolamz/my_file.csv")
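
The backslash is an escape character inside R strings, which is why a path copied from Windows Explorer does not work as is. The sketch below shows one way to convert such a path; the path itself and the names win_path and r_path are hypothetical.

```r
# Inside an R string every backslash has to be doubled
win_path <- "C:\\Users\\me\\my_file.csv"   # hypothetical Windows path

# Replace every backslash with a forward slash
r_path <- gsub("\\", "/", win_path, fixed = TRUE)
r_path                # "C:/Users/me/my_file.csv"

# Check that the file exists before trying to read it
file.exists(r_path)
```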

2.4 Writing files from R

write.csv(new_df_2, "/home/agricolamz/my_new_file.csv",
          row.names = FALSE)
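
To see what row.names = FALSE does, here is a small round trip that should run on any computer: it writes a toy dataframe to a temporary file and reads it back. The names small_df, out_file and read_back are made up for this sketch.

```r
small_df <- data.frame(latin = c("a", "b", "c"),
                       numbers = 1:3,
                       stringsAsFactors = FALSE)

# tempfile() gives a path in a temporary folder, so nothing is overwritten
out_file <- tempfile(fileext = ".csv")

# With the default settings row names are written as an extra first column
write.csv(small_df, out_file)
names(read.csv(out_file))        # "X" "latin" "numbers"

# With row.names = FALSE only the actual columns are written
write.csv(small_df, out_file, row.names = FALSE)
read_back <- read.csv(out_file, stringsAsFactors = FALSE)
names(read_back)                 # "latin" "numbers"
```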

2.5 Missing data

In R, missing values are represented by the symbol NA (not available).

is.na(new_df$NumClass.Presence)
  [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
 [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [78] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[100] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
[111] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[122] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[144] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[155] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[166] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[177]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
[188] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[199] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[210] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[221] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[232] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[243] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[254] FALSE FALSE
sum(is.na(new_df$NumClass.Presence))
[1] 5
sum(is.na(new_df))
[1] 22
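
Missing values propagate through most calculations, so it is worth knowing how to skip them. A minimal sketch on a toy vector x (not part of the AUTOTYP data):

```r
x <- c(2, NA, 5, 0, NA)

is.na(x)                # FALSE TRUE FALSE FALSE TRUE
sum(is.na(x))           # 2 missing values

mean(x)                 # NA: one missing value makes the result missing
mean(x, na.rm = TRUE)   # 2.333333: NA values are removed first

x[!is.na(x)]            # 2 5 0: keep only the non-missing values
```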

2.6 How to get help in R

?nchar
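
The command ?nchar is a shorthand for help("nchar") and opens the documentation page of the function. As a quick sketch of what that page describes, here is nchar() in use, together with args(), which prints the arguments of a function without opening the help page:

```r
# nchar() counts the characters in each string
nchar("bonjour")                     # 7
nchar(c("the", "quick", "brown"))    # 3 5 5

# args() shows the argument list of a function
args(nchar)
```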

2.7 Packages

There are many R packages for solving many different problems. There are three ways to install them (you need an internet connection for the first two):

  • packages on CRAN are checked in multiple ways and should be stable
install.packages("lingtypology")
  • packages on GitHub are NOT checked and could contain anything, but it is the place where all package developers keep the latest version of their work.
install.packages("devtools")
devtools::install_github("ropensci/lingtypology")
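  • or install from a local package file (repos = NULL tells R to install from the given path instead of downloading from CRAN)
install.packages("/path/to/your/package",
                 repos = NULL, type = "source")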

After the package is installed you need to load the package using the following command:

library("lingtypology")
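
Calling library() for a package that is not installed stops with an error. One possible way to guard against this is sketched below; has_package() is a helper made up for this example, built on the base R function requireNamespace().

```r
# TRUE if the package is installed and can be loaded, FALSE otherwise
has_package <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

pkg <- "lingtypology"
if (has_package(pkg)) {
  # character.only = TRUE is needed because the name is stored in a variable
  library(pkg, character.only = TRUE)
} else {
  message("Package '", pkg, "' is not installed; install it first.")
}
```

The sketch only reports a missing package and leaves the choice of installation method to you.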

There is a nice picture from Phillips N. D. (2017) YaRrr! The Pirate’s Guide to R:

Figure 2.2: Lamp metaphor