5 Functions and R packages
We have studied the most important objects in R. There are other objects that we will study in upcoming advanced topics.
I left out an important type of object: functions. A function is an object that performs an operation based on an input. The function asks for input information and then gives you an output.
Functions are created with the command function(), which, in simple words, is a function that creates other functions. It sounds redundant, but it is an accurate statement!
myFirstFunction <- function(name){
  statement <- paste0(name, " ", "is a good student.") ## glue the name and the phrase together
  return(cat(statement)) ## cat() prints the phrase in the console
}
In the code above, I’m creating a new function by using function(). Inside the parentheses there is one argument named name. That’s the only information this function asks for: you need to provide a name or a vector of names. The function then prints a phrase. Let’s try it out:
myFirstFunction("Nicole")
Nicole is a good student.
You may notice that R does not ask what type of value name must be. The function paste0() converts whatever you give it into a character string, so in practice myFirstFunction() works with character vectors (a number such as 5 would simply become "5"). You should also pay attention to how I use the word “argument”. An argument is the input information a function needs to generate an output. Most functions require at least one argument, although some, such as data(), can be run without any.
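To see how arguments work, here is a minimal sketch of a second function with two arguments. The name describeStudent() and the argument adjective are hypothetical, made up for this example. Unlike myFirstFunction(), it returns the phrase instead of printing it with cat(), so the result can be stored in an object.

```r
## Hypothetical example: describeStudent() is not a built-in R function
describeStudent <- function(name, adjective = "good") {
  ## 'adjective' has a default value, so it can be left out when calling the function
  statement <- paste0(name, " is a ", adjective, " student.")
  return(statement) ## return the phrase so it can be saved in an object
}

describeStudent("Nicole")                        ## "Nicole is a good student."
describeStudent("Nicole", adjective = "curious") ## "Nicole is a curious student."
myPhrase <- describeStudent("Emma")              ## the output is stored, not just printed
```

Because adjective has a default, you only need to type it when you want something other than "good".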
We can try myFirstFunction() again, but this time with several names at the same time:
severalNames <- c("Nicole", "Emma", "Nina")
myFirstFunction(severalNames)
Nicole is a good student. Emma is a good student. Nina is a good student.
Great! Our function is smart enough to repeat the same statement for different names, because paste0() works on every element of a vector. As you can see, functions are useful objects in R; in fact, functions are everywhere in R.
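The reason the function handles several names is that paste0() is vectorized: it builds one phrase for each element of the vector, and cat() then prints all of them separated by spaces. A small sketch that skips the function and calls paste0() directly:

```r
severalNames <- c("Nicole", "Emma", "Nina")
statements <- paste0(severalNames, " is a good student.") ## one phrase per name
length(statements)          ## 3
statements[2]               ## "Emma is a good student."
cat(statements, sep = "\n") ## print each phrase on its own line
```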
5.1 Built-in functions
I have good news! You don’t have to create new functions in this course. We will use functions already programmed by other users called “developers”. R is an open-source language: anyone can create new functions and share them for free. R also comes with many of these functions already programmed and ready to use; for example, mean() comes from the base package and median() from the stats package, and both packages are loaded automatically when R starts.
Again, let’s see some examples:
mean(mtcars$hp) ### estimates mean or average
[1] 146.6875
R also has data sets that you can use for practicing. In this example, I’m estimating the mean horsepower in the data set called mtcars. You can see the list of available data sets with the function data().
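Before estimating anything, it helps to take a quick look at a data set. This sketch uses a few built-in functions on mtcars, which ships with R in the datasets package, so nothing needs to be installed:

```r
dim(mtcars)      ## number of rows and columns: 32 11
names(mtcars)    ## column names, such as "mpg", "cyl" and "hp"
head(mtcars, 3)  ## first three rows
mean(mtcars$mpg) ## average miles per gallon, another column of the same data set
```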
median(mtcars$hp) ### Estimates the median
[1] 123
The function median() is also an example of a built-in function in R.
There are many more functions already programmed in R. We will study more of them as each topic requires.
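As a preview, here are a few more built-in functions applied to the same horsepower column. They all come with R; the values in the comments are for mtcars$hp:

```r
hp <- mtcars$hp ## save the horsepower column in a shorter object
min(hp)         ## smallest value: 52
max(hp)         ## largest value: 335
range(hp)       ## both at once: 52 335
sd(hp)          ## standard deviation
summary(hp)     ## minimum, quartiles, median, mean and maximum in one call
```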
5.2 R Packages
R is a programming language, and programming languages are made to create software. In R, we call these pieces of software packages. A package is a set of functions that simplifies steps in your analysis. As a new R user in this class, you don’t need to know how to program a package; you will use packages already created by other programmers.
Packages in R are free, but you have to install each package in your local R environment. You only have to install a package once on your computer. Most packages are available in a huge repository called CRAN (The Comprehensive R Archive Network), and R downloads packages from CRAN by default. To install a new package, you only need to type the function install.packages() with the package name in quotes:
install.packages("ggplot2")
In the code above, I’m installing a new package called ggplot2. This package has many functions to create beautiful plots. We will learn more about ggplot2 in the next chapters.
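Because a package only needs to be installed once, some people guard install.packages() so it runs only when the package is missing. The helper below, installIfMissing(), is hypothetical (it is not part of R); it relies on requireNamespace(), which returns FALSE instead of an error when a package is not installed:

```r
## Hypothetical helper: installIfMissing() is not a built-in R function
installIfMissing <- function(pkg) {
  ## requireNamespace() returns TRUE if the package is installed, FALSE otherwise
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg) ## downloads the package from CRAN (needs internet)
  }
  invisible(pkg)
}

## Usage: installs ggplot2 only the first time you run it
## installIfMissing("ggplot2")
```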
5.2.1 Packages are like books on your shelf
After installing a package, you need to load it every time you open a new R session. For instance, in the next code, I’m creating a scatter plot using the package ggplot2. You will notice that I’m taking the package from the “bookshelf” with the function library(ggplot2). If you are a Pokemon fan, your package is a Pokemon, and you have to call your Pokemon to the battlefield.
Also, I will plot data from the package palmerpenguins (Horst et al., 2020).
### As a rule, load the packages your code requires at the beginning
## Packages loaded or called
library("jpeg") ## reads pictures into R
library(patchwork) ## more tools to add features in a plot
library(ggplot2) ## creates plots
library(palmerpenguins) ## This package has the penguin data
picture <- "penguins.jpg" ## path to a picture file in your working directory
img <- readJPEG(picture, native = TRUE) ## read the picture as a raster image
ggplot(penguins, aes(x = flipper_length_mm,
                     y = body_mass_g,
                     color = species)) +
  geom_point() +
  geom_smooth(se = FALSE, method = "lm") +
  theme_classic() +
  xlab("Flipper Length in millimeters") +
  ylab("Body Mass in grams") +
  inset_element(p = img,
                left = 0.05,
                bottom = 0.65,
                right = 0.5,
                top = 0.95)
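To check which “books” are already on the table in your current session, you can use search(), which lists the attached packages. A short sketch using tools, a package that is installed with R but usually not attached when a session starts:

```r
search()                      ## packages attached in this session
"package:tools" %in% search() ## usually FALSE in a new session
library(tools)                ## take the package from the bookshelf
"package:tools" %in% search() ## TRUE
```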
5.3 Pipes in R
Yes, you are right. We will use pipes in R. A pipe connects the left-hand side (LHS) with the right-hand side (RHS): the result of the LHS is passed as the first argument of the function on the RHS. You already know that you can write several lines of code in an R script. However, pipes help you join several statements together into a single chain. Each statement is an action performed on an object.
Taken from Isabella Velazquez
Take a look at the following code:
library(palmerpenguins)
penguins |>
  with(mean(bill_length_mm, na.rm = TRUE))
[1] 43.92193
In the above code, you’ll see the sign |>. This is the native pipe operator in R, available since R 4.1.0. You may also notice that we don’t refer to the data set penguins in the line where the mean is estimated. R knows that the operation takes place on penguins because the pipe passes penguins as the first argument of with(), so the code runs as with(penguins, mean(bill_length_mm, na.rm = TRUE)).
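You can ask R to show this rewrite. The native pipe is converted into an ordinary function call before the code runs, so quote() displays the nested version. The sketch uses mtcars instead of penguins so it runs without installing palmerpenguins (it assumes R 4.1.0 or later):

```r
quote(mtcars |> with(mean(hp))) ## with(mtcars, mean(hp))

piped  <- mtcars |> with(mean(hp)) ## with the pipe
nested <- with(mtcars, mean(hp))   ## without the pipe
identical(piped, nested)           ## TRUE
```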
The next code shows how we would estimate the mean without using a pipe:
mean(penguins$bill_length_mm, na.rm = TRUE)
[1] 43.92193
You might be thinking: Come on! That was easier! I don’t need pipes. Well, in this case the pipe does not look very useful. But let’s make the example more complicated. Let’s create a plot:
library(palmerpenguins) ## I'm loading the package "palmerpenguins". You should install
## the package "palmerpenguins" before loading the package.
plot(penguins$bill_length_mm, penguins$body_mass_g)
Now, let’s create the same plot using a pipe:
penguins |>
  with(plot(bill_length_mm, body_mass_g))
The last example includes a pipe, and it is arguably easier to read than the code for the first plot.
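Pipes help most when several steps are chained. A sketch, again with mtcars: the average horsepower of the 4-cylinder cars, rounded to two decimals, written once as nested calls (read from the inside out) and once with pipes (read from top to bottom):

```r
## Nested version: the first step, subset(), is the innermost call
nested <- round(with(subset(mtcars, cyl == 4), mean(hp)), 2)

## Piped version: one step per line
piped <- mtcars |>
  subset(cyl == 4) |> ## keep the 4-cylinder cars
  with(mean(hp)) |>   ## average horsepower
  round(2)            ## round to two decimals

identical(nested, piped) ## TRUE
piped                    ## 82.64
```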
5.3.1 Piping can be done differently
There is another way to create pipes in R. The package dplyr (Wickham et al., 2022) is a popular way to use the magrittr pipe (Bache & Wickham, 2022), which dplyr re-exports. This pipe is written with the operator %>%.
library(dplyr) ### run install.packages("dplyr") to install the package.
penguins %>%
  na.omit() %>%
  group_by(sex) %>%
  summarise(Mean = round(mean(body_mass_g), 2), ### body_mass_g is the penguin's weight in grams
            SD = round(sd(body_mass_g), 2))
# A tibble: 2 × 3
sex Mean SD
<fct> <dbl> <dbl>
1 female 3862. 666.
2 male 4546. 788.
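For comparison, a similar grouped summary can be written with the native pipe and base R only, using tapply() to apply a function to each group. This is a sketch, not the dplyr approach above, and it uses mtcars (horsepower by number of cylinders) so no extra package is needed:

```r
## Mean and standard deviation of horsepower for each number of cylinders
hpMean <- mtcars |> with(tapply(hp, cyl, mean)) |> round(2)
hpSD   <- mtcars |> with(tapply(hp, cyl, sd)) |> round(2)

data.frame(cyl  = names(hpMean),
           Mean = as.vector(hpMean),
           SD   = as.vector(hpSD))
```

The dplyr version keeps every step inside one pipe and returns a tibble, which is one reason it is popular for this kind of summary.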
6 Questions
R contains data sets available for practicing. You can see the list of data sets by running data(). Run the data() function, select one data set, and then run the function summary() on it. What did you see after running summary()?
### Example: I'm running summary() with the data set named 'mtcars'. Select
### another data set from data()
data()
summary(mtcars)