For example, if we want to find the mean “gdpPercap” for several continents, we can try it like this:
mean(gapminder[gapminder$continent == "Africa", "gdpPercap"])
Warning message:
In mean.default(gapminder[gapminder$continent == "Africa", "gdpPercap"]) :
  argument is not numeric or logical: returning NA
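The subset above is still a one-column data frame rather than a numeric vector, and from R 3.0.0 onwards “mean()” no longer accepts a data frame. Use “colMeans()” instead: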
colMeans(gapminder[gapminder$continent == "Africa", "gdpPercap"])
gdpPercap 2193.755
colMeans(gapminder[gapminder$continent == "Americas", "gdpPercap"])
gdpPercap 7136.11
colMeans(gapminder[gapminder$continent == "Asia", "gdpPercap"])
gdpPercap 7902.15
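To see why “mean()” fails while “colMeans()” works, here is a minimal sketch in base R. The “toy” data frame and its values are made up for illustration and are not the gapminder data; “drop = FALSE” is used here only to imitate a subset that stays a data frame.

```r
# Hypothetical stand-in for gapminder; the values are invented
toy <- data.frame(
  continent = c("Africa", "Africa", "Asia", "Asia"),
  gdpPercap = c(1000, 3000, 2000, 6000)
)

# drop = FALSE keeps the result as a one-column data frame
africa_df <- toy[toy$continent == "Africa", "gdpPercap", drop = FALSE]
is.data.frame(africa_df)   # TRUE
colMeans(africa_df)        # column mean of the data frame

# [[ ]] extracts the column as a numeric vector, so mean() works
africa_vec <- toy[toy$continent == "Africa", ][["gdpPercap"]]
mean(africa_vec)           # 2000
```

Either approach still has to be repeated once per continent, which is the repetition the dplyr functions below are meant to reduce.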
Commonly used functions
Be sure to install and load dplyr first:
install.packages('dplyr')
library("dplyr")
Use “select()” to choose a few variables from the dataframe.
year_country_gdp <- select(gapminder, year, country, gdpPercap)
year_country_gdp
Now let’s try the same thing with a “pipe”, %>%. A pipe takes the output of the expression on its left and passes it to the function on its right as the first argument.
year_country_gdp2 <- gapminder %>% select(year, country, gdpPercap)
year_country_gdp2
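As a quick check that the piped and the nested forms do the same thing, the sketch below runs both on a small made-up data frame. The “toy” data and the names “nested” and “piped” are illustrative only and do not come from the lesson.

```r
library(dplyr)

# Hypothetical data frame with the same column names used above
toy <- data.frame(
  year = c(1952, 1957),
  country = c("A", "B"),
  gdpPercap = c(100, 200),
  pop = c(10, 20)
)

# Ordinary call: the data frame is the first argument
nested <- select(toy, year, country, gdpPercap)

# Piped call: the data frame on the left becomes the first argument
piped <- toy %>% select(year, country, gdpPercap)

# Compare the two results
all.equal(nested, piped)
```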
Tip: Renaming dataframe columns in dplyr
The “rename()” function in dplyr renames columns using the following syntax:
rename(new_name = old_name)
tidy_gdp <- year_country_gdp %>% rename(gdp_per_capita = gdpPercap)
tidy_gdp
Using filter()
Use the “select()” and “filter()” functions together to look at European countries only:
year_country_gdp_euro <- gapminder %>%
  filter(continent == "Europe") %>%
  select(year, country, gdpPercap)
year_country_gdp_euro
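The order of the two steps matters: “filter()” needs the “continent” column, so it has to run before “select()” drops that column. The sketch below illustrates this on a made-up data frame; “toy”, “euro” and “wrong_order” are hypothetical names, and “tryCatch()” is used only to capture the error instead of stopping the script.

```r
library(dplyr)

# Hypothetical data; the values are invented for illustration
toy <- data.frame(
  continent = c("Europe", "Africa", "Europe"),
  country = c("A", "B", "C"),
  year = c(2007, 2007, 2007),
  gdpPercap = c(30000, 1500, 25000)
)

# Filter first, then select: continent is still available to filter()
euro <- toy %>%
  filter(continent == "Europe") %>%
  select(year, country, gdpPercap)
nrow(euro)   # 2

# Select first, then filter: continent has been dropped, so filter() fails
wrong_order <- tryCatch(
  toy %>%
    select(year, country, gdpPercap) %>%
    filter(continent == "Europe"),
  error = function(e) "filter() could not find continent"
)
wrong_order
```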
Challenge 1
Write a single command, using multiple lines and pipes, that creates a dataframe showing lifeExp, country, and year for African countries only.
Afr_lifeExp_country_year <- gapminder %>%
  filter(continent == "Africa") %>%
  select(lifeExp, country, year)
Afr_lifeExp_country_year