Questions

  1. How can I manipulate dataframes without repeating myself?

Objectives

  1. To be able to use the six main dataframe manipulation ‘verbs’ with pipes in dplyr.
  2. To understand how “group_by()” and “summarize()” can be combined to summarize datasets.
  3. To be able to analyze a subset of data using logical filtering.

For example, if we want to find the mean “gdpPercap” for several continents, we can do it like this:

mean(gapminder[gapminder$continent == "Africa", "gdpPercap"]) 

Warning message:
In mean.default(gapminder[gapminder$continent == "Africa", "gdpPercap"]) :
  argument is not numeric or logical: returning NA

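This warning appears when the subset is still a data frame (as happens when gapminder is a tibble) rather than a numeric vector. From R 3.0.0 onwards, “mean()” no longer accepts data frames, so use “colMeans()” instead: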

colMeans(gapminder[gapminder$continent == "Africa", "gdpPercap"])

gdpPercap 2193.755

colMeans(gapminder[gapminder$continent == "Americas", "gdpPercap"])

gdpPercap 7136.11

colMeans(gapminder[gapminder$continent == "Asia", "gdpPercap"])

gdpPercap 7902.15
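
To see why “mean()” fails where “colMeans()” succeeds, here is a minimal base R sketch. The data frame `toy` and all its values are made up for illustration and are not taken from gapminder. `drop = FALSE` is used only to keep the subset as a one-column data frame, which is assumed to mimic what a tibble returns.

```r
# Toy stand-in for gapminder; all values are invented for illustration.
toy <- data.frame(
  continent = c("Africa", "Africa", "Asia", "Asia"),
  gdpPercap = c(1000, 3000, 5000, 9000)
)

is_africa <- toy$continent == "Africa"

# drop = FALSE keeps the result as a one-column data frame.
africa_df <- toy[is_africa, "gdpPercap", drop = FALSE]

# mean() on a data frame returns NA with a warning (suppressed here).
mean_of_df <- suppressWarnings(mean(africa_df))

# colMeans() accepts a data frame and returns a named numeric vector.
col_mean <- colMeans(africa_df)

# Extracting the column as a plain vector lets mean() work directly.
vec_mean <- mean(toy$gdpPercap[is_africa])

print(mean_of_df)
print(col_mean)
print(vec_mean)
```

Either way, the same subsetting expression has to be typed again for every continent, which is the repetition that dplyr is meant to remove.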

The dplyr package

Commonly used functions

  1. select()
  2. filter()
  3. group_by()
  4. summarize()
  5. mutate()

Be sure to install the package (needed only once) and load it:

install.packages('dplyr')
library("dplyr")

Use “select()” to choose a few variables from the dataframe.

year_country_gdp <- select(gapminder, year, country, gdpPercap)
year_country_gdp

Now let’s try the same thing with a “pipe”, %>%. A pipe takes the output of the expression on its left and passes it to the function on its right as that function’s first argument.

year_country_gdp2 <- gapminder %>% select(year, country, gdpPercap)
year_country_gdp2
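
As a quick check that the piped and unpiped forms are equivalent, here is a small sketch. The data frame `toy`, its country labels and its values are hypothetical and stand in for gapminder.

```r
library(dplyr)

# Toy stand-in for gapminder; labels and values are invented.
toy <- data.frame(
  year = c(1952, 1957, 1952),
  country = c("Country A", "Country A", "Country B"),
  pop = c(6.5, 7.5, 86.5),
  gdpPercap = c(850, 940, 3200)
)

# Ordinary call: the data frame is the first argument.
nested <- select(toy, year, country, gdpPercap)

# Piped call: %>% supplies `toy` as the first argument of select().
piped <- toy %>% select(year, country, gdpPercap)

# Both forms produce the same data frame.
same_result <- identical(nested, piped)
print(same_result)
```

The pipe does not change what “select()” does. It only changes where the first argument is written, which makes longer chains of steps easier to read from top to bottom.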

Tip: Renaming dataframe columns in dplyr

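Use the “rename()” function in dplyr to rename columns. The new name goes on the left of the equals sign and the existing name on the right: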

rename(new_name = old_name)

tidy_gdp <- year_country_gdp %>% rename(gdp_per_capita = gdpPercap)
tidy_gdp

Using filter()

Use the “select()” and “filter()” functions together to look at European countries only:

year_country_gdp_euro <- gapminder %>%
  filter(continent == "Europe") %>%
  select(year, country, gdpPercap)

year_country_gdp_euro
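
Logical filtering can also combine several conditions. The sketch below uses a hypothetical data frame `toy` with invented labels and values. It relies on the fact that “filter()” treats comma-separated conditions as a logical AND.

```r
library(dplyr)

# Toy stand-in for gapminder; labels and values are invented.
toy <- data.frame(
  year = c(2002, 2007, 2007, 2007),
  country = c("A", "A", "B", "C"),
  continent = c("Europe", "Europe", "Europe", "Asia"),
  gdpPercap = c(20000, 25000, 30000, 10000)
)

# Keep rows that are in Europe AND from 2007, then keep three columns.
euro_2007 <- toy %>%
  filter(continent == "Europe", year == 2007) %>%
  select(year, country, gdpPercap)

print(euro_2007)
```

Here “filter()” has to come before “select()”, because “select()” drops the continent column that the filter condition depends on.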

Challenge 1

Write a single command, using multiple lines and pipes, that creates a dataframe showing lifeExp, country, and year for African countries only.

Afr_lifeExp_country_year <- gapminder %>%
  filter(continent == "Africa") %>%
  select(lifeExp, country, year)

Afr_lifeExp_country_year