Open RStudio.
Open a new R script and save it as wpa_3_LastFirst.R (where Last and First are your last and first names). Be careful about capitalization, the order of last and first names, and using _ instead of -.
At the top of your script, write the following (with appropriate changes):
# Assignment: WPA 3
# Name: Laura Fontanesi
# Date: 29 March 2022
Up to this point, I have given you the code to load datasets in R. Say instead you have your own data saved on your computer or somewhere online. How can you analyze these data in R?
You have two main ways to do it:
- using the “Import Dataset” button in the “Environment” tab in RStudio
- using code
Today, we will learn how to import data using code, within the tidyverse package. This allows us to import data directly into tidyverse objects (i.e., tibbles, as we will see in the next 2 lessons).
The specific sub-package for importing data is called readr.
Data can come from different sources, e.g.:
- text files stored locally
- text files from a website
The functions you will use depend on the specific format the data were written in:
- read_delim() is the principal and most general means of reading tabular data into R
- read_csv() sets the default separator to a comma
- read_csv2() is its European cousin, using a comma for decimal places and a semicolon as a separator
- read_tsv() imports tab-delimited files
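To see how these functions relate, here is a minimal, self-contained sketch (the file contents are made up for the example): it writes a small tab-separated file to a temporary location and reads it back with both read_delim() and read_tsv().

```r
library(readr)

# write a tiny tab-separated file to a temporary location (made-up data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tscore", "1\t10", "2\t12"), tmp)

# these two calls are equivalent: read_tsv() is read_delim() with delim = "\t"
d1 <- read_delim(tmp, delim = "\t")
d2 <- read_tsv(tmp)

# both are 2x2 tibbles with columns id and score, containing the same data
all(d1 == d2)
```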
Excel files: https://readxl.tidyverse.org/reference/read_excel.html
STATA, SPSS, SAS files: https://haven.tidyverse.org/
At this point, you should have a folder on your laptop for our R course, where you have stored all your scripts. Create a subfolder called data. When you are done, download the content of this folder ‘https://www.dropbox.com/sh/kw6o7ztouwpiawk/AACG5YtjeF58YaKjkK9h428Ka?dl=0’ into your data folder, so that the 5 data files are in your data folder.
To load these files in R, we need to write the path to your data folder. We can do this using code completion (Tab key):
- On Mac, you can start from read_delim('~/') (or similar functions for loading data) and press Tab, to start navigating from your home folder
- On Windows, you can do the same, but starting from read_delim('C:/Users/') (note that in R you should write paths with forward slashes, or doubled backslashes, because a single backslash inside a string is an escape character)
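You can also build paths with file.path(), which joins folder names with a separator that works on Mac, Linux, and Windows alike. The folder and file names below are just placeholders for the sketch:

```r
# build a path from its components; the same code works on every platform
# (folder and file names here are placeholders)
path_to_file <- file.path("~", "Dropbox", "teaching", "r-course22",
                          "data", "data_to_import_a.txt")
path_to_file
# "~/Dropbox/teaching/r-course22/data/data_to_import_a.txt"
# you can then pass this path to read_delim(), read_csv(), etc.
```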
In my case, this folder was on Dropbox:
library(tidyverse)
data_a = read_delim('~/Dropbox/teaching/r-course22/data/data_to_import_a.txt', delim='\t')
##
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## cols(
## index = col_double(),
## participant = col_double(),
## gender = col_character(),
## age = col_double(),
## options = col_character(),
## accuracy = col_double(),
## RT_msec = col_double()
## )
head(data_a)
## # A tibble: 6 x 7
## index participant gender age options accuracy RT_msec
## <dbl> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
## 1 1 8 male 18 CD 1 2381
## 2 2 8 male 18 CD 1 1730
## 3 3 8 male 18 AB 1 1114
## 4 4 8 male 18 AC 1 600
## 5 5 8 male 18 CD 1 683
## 6 6 8 male 18 AC 0 854
data_b = read_csv('~/Dropbox/teaching/r-course22/data/data_to_import_b.csv')
##
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## cols(
## id = col_character(),
## gender = col_double(),
## age = col_double(),
## income = col_double(),
## p1 = col_double(),
## p2 = col_double(),
## p3 = col_double(),
## p4 = col_double(),
## p5 = col_double(),
## p6 = col_double(),
## p7 = col_double(),
## p8 = col_double(),
## p9 = col_double(),
## p10 = col_double(),
## task = col_double(),
## havemore = col_double(),
## haveless = col_double(),
## pcmore = col_double()
## )
# same as: data_b = read_delim('~/Dropbox/teaching/r-course22/data/data_to_import_b.csv', delim = ",")
head(data_b)
## # A tibble: 6 x 18
## id gender age income p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 task havemore haveless
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 R_3PtNn51L… 2 26 7 1 1 1 1 1 1 1 1 1 1 0 NA 50
## 2 R_2AXrrg62… 2 32 4 1 1 1 1 1 1 1 1 1 1 0 NA 25
## 3 R_cwEOX3Hg… 1 25 2 0 1 1 1 1 1 1 1 0 0 0 NA 10
## 4 R_d59iPwL4… 1 33 5 1 1 1 1 1 1 1 1 1 1 0 NA 50
## 5 R_1f3K2HrG… 1 24 1 1 1 0 1 1 1 1 1 1 1 1 99 NA
## 6 R_3oN5ijzT… 1 22 2 1 1 0 0 1 1 1 1 0 1 0 NA 20
## # … with 1 more variable: pcmore <dbl>
data_c = read_csv2('~/Dropbox/teaching/r-course22/data/data_to_import_c.csv')
## ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.
##
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## cols(
## .default = col_character(),
## age = col_double(),
## Medu = col_double(),
## Fedu = col_double(),
## traveltime = col_double(),
## studytime = col_double(),
## failures = col_double(),
## famrel = col_double(),
## freetime = col_double(),
## goout = col_double(),
## Dalc = col_double(),
## Walc = col_double(),
## health = col_double(),
## absences = col_double(),
## G1 = col_double(),
## G2 = col_double(),
## G3 = col_double()
## )
## ℹ Use `spec()` for the full column specifications.
head(data_c)
## # A tibble: 6 x 33
## school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason guardian traveltime studytime failures
## <chr> <chr> <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 GP F 18 U GT3 A 4 4 at_home teacher course mother 2 2 0
## 2 GP F 17 U GT3 T 1 1 at_home other course father 1 2 0
## 3 GP F 15 U LE3 T 1 1 at_home other other mother 1 2 0
## 4 GP F 15 U GT3 T 4 2 health services home mother 1 3 0
## 5 GP F 16 U GT3 T 3 3 other other home father 1 2 0
## 6 GP M 16 U LE3 T 4 3 services other reput… mother 1 2 0
## # … with 18 more variables: schoolsup <chr>, famsup <chr>, paid <chr>, activities <chr>, nursery <chr>,
## # higher <chr>, internet <chr>, romantic <chr>, famrel <dbl>, freetime <dbl>, goout <dbl>, Dalc <dbl>, Walc <dbl>,
## # health <dbl>, absences <dbl>, G1 <dbl>, G2 <dbl>, G3 <dbl>
library(readxl)
# maybe try first: install.packages("readxl")
data_d = read_excel('~/Dropbox/teaching/r-course22/data/data_to_import_d.xls')
head(data_d)
## # A tibble: 6 x 9
## Year `Average population` `Live births` Deaths `Natural change` `Crude birth ra… `Crude death ra… `Natural change…
## <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1900 3,300,000 94,316 63,606 30,710 28.6 19.3 9.3
## 2 1901 3,341,000 97,028 60,018 37,010 29 18 11.1
## 3 1902 3,384,000 96,480 57,702 38,778 28.5 17.1 11.5
## 4 1903 3,428,000 93,824 59,626 34,198 27.4 17.4 10
## 5 1904 3,472,000 94,867 60,857 34,010 27.3 17.5 9.8
## 6 1905 3,516,000 94,653 61,800 32,853 26.9 17.6 9.3
## # … with 1 more variable: Total fertility rates <dbl>
library(haven)
# maybe first: install.packages("haven")
data_e = read_sav('~/Dropbox/teaching/r-course22/data/data_to_import_e.sav')
head(data_e)
## # A tibble: 6 x 54
## case_ID wave year weight_wave weight_aggregate happening cause_original cause_other_text cause_recoded
## <dbl> <dbl+lbl> <dbl+lbl> <dbl> <dbl> <dbl+lbl> <dbl+lbl> <chr> <dbl+lbl>
## 1 2 1 [Nov 2008] 1 [2008] 0.54 0.294 3 [Yes] 1 [Caused mos… "" 6 [Caused mo…
## 2 3 1 [Nov 2008] 1 [2008] 0.85 0.463 2 [Don't… 1 [Caused mos… "" 6 [Caused mo…
## 3 5 1 [Nov 2008] 1 [2008] 0.49 0.267 2 [Don't… 2 [Caused mos… "" 4 [Caused mo…
## 4 6 1 [Nov 2008] 1 [2008] 0.29 0.158 3 [Yes] 2 [Caused mos… "" 4 [Caused mo…
## 5 7 1 [Nov 2008] 1 [2008] 1.29 0.702 3 [Yes] 1 [Caused mos… "" 6 [Caused mo…
## 6 8 1 [Nov 2008] 1 [2008] 2.56 1.39 2 [Don't… 2 [Caused mos… "" 4 [Caused mo…
## # … with 45 more variables: sci_consensus <dbl+lbl>, worry <dbl+lbl>, harm_personally <dbl+lbl>, harm_US <dbl+lbl>,
## # harm_dev_countries <dbl+lbl>, harm_future_gen <dbl+lbl>, harm_plants_animals <dbl+lbl>, when_harm_US <dbl+lbl>,
## # reg_CO2_pollutant <dbl+lbl>, reg_utilities <dbl+lbl>, fund_research <dbl+lbl>, reg_coal_emissions <dbl+lbl>,
## # discuss_GW <dbl+lbl>, hear_GW_media <dbl+lbl>, gender <dbl+lbl>, age <dbl>, age_category <dbl+lbl>,
## # generation <dbl+lbl>, educ <dbl+lbl>, educ_category <dbl+lbl>, income <dbl+lbl>, income_category <dbl+lbl>,
## # race <dbl+lbl>, ideology <dbl+lbl>, party <dbl+lbl>, party_w_leaners <dbl+lbl>, party_x_ideo <dbl+lbl>,
## # registered_voter <dbl+lbl>, region9 <dbl+lbl>, region4 <dbl+lbl>, religion <dbl+lbl>, …
Let’s say we want to load some data into R directly from a website (without first saving it to a file). In this case, we get some data from the website “https://support.spatialkey.com/spatialkey-sample-csv-data/”. Instead of a local address, we can simply call the same functions with the web address.
data_transactions = read_csv("https://support.spatialkey.com/wp-content/uploads/2021/02/Sacramentorealestatetransactions.csv")
##
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## cols(
## street = col_character(),
## city = col_character(),
## zip = col_double(),
## state = col_character(),
## beds = col_double(),
## baths = col_double(),
## sq__ft = col_double(),
## type = col_character(),
## sale_date = col_character(),
## price = col_double(),
## latitude = col_double(),
## longitude = col_double()
## )
head(data_transactions)
## # A tibble: 6 x 12
## street city zip state beds baths sq__ft type sale_date price latitude longitude
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 3526 HIGH ST SACRAMENTO 95838 CA 2 1 836 Residential Wed May 21 00:00… 59222 38.6 -121.
## 2 51 OMAHA CT SACRAMENTO 95823 CA 3 1 1167 Residential Wed May 21 00:00… 68212 38.5 -121.
## 3 2796 BRANCH ST SACRAMENTO 95815 CA 2 1 796 Residential Wed May 21 00:00… 68880 38.6 -121.
## 4 2805 JANETTE WAY SACRAMENTO 95815 CA 2 1 852 Residential Wed May 21 00:00… 69307 38.6 -121.
## 5 6001 MCMAHON DR SACRAMENTO 95824 CA 2 1 797 Residential Wed May 21 00:00… 81900 38.5 -121.
## 6 5828 PEPPERMILL CT SACRAMENTO 95841 CA 3 1 1122 Condo Wed May 21 00:00… 89921 38.7 -121.
If we want, we can then save it to a file from R, using a similar set of functions that start with write_ instead of read_. You can also use these functions to save your data in a different format from the original for later use.
# save it to file
write_csv(data_transactions, file = "~/Dropbox/teaching/r-course22/data/Sacramentorealestatetransactions.csv")
# load it again
data_transactions = read_csv("~/Dropbox/teaching/r-course22/data/Sacramentorealestatetransactions.csv")
##
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## cols(
## street = col_character(),
## city = col_character(),
## zip = col_double(),
## state = col_character(),
## beds = col_double(),
## baths = col_double(),
## sq__ft = col_double(),
## type = col_character(),
## sale_date = col_character(),
## price = col_double(),
## latitude = col_double(),
## longitude = col_double()
## )
head(data_transactions)
## # A tibble: 6 x 12
## street city zip state beds baths sq__ft type sale_date price latitude longitude
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 3526 HIGH ST SACRAMENTO 95838 CA 2 1 836 Residential Wed May 21 00:00… 59222 38.6 -121.
## 2 51 OMAHA CT SACRAMENTO 95823 CA 3 1 1167 Residential Wed May 21 00:00… 68212 38.5 -121.
## 3 2796 BRANCH ST SACRAMENTO 95815 CA 2 1 796 Residential Wed May 21 00:00… 68880 38.6 -121.
## 4 2805 JANETTE WAY SACRAMENTO 95815 CA 2 1 852 Residential Wed May 21 00:00… 69307 38.6 -121.
## 5 6001 MCMAHON DR SACRAMENTO 95824 CA 2 1 797 Residential Wed May 21 00:00… 81900 38.5 -121.
## 6 5828 PEPPERMILL CT SACRAMENTO 95841 CA 3 1 1122 Condo Wed May 21 00:00… 89921 38.7 -121.
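For instance, here is a minimal sketch of saving data in a different format from the original, using write_tsv() to produce a tab-separated file. The data and the temporary file are made up so the example is self-contained:

```r
library(readr)

# made-up data, so the example is self-contained
d <- data.frame(id = c(1, 2), rt = c(530, 612))

# save as tab-separated instead of comma-separated
tmp <- tempfile(fileext = ".tsv")
write_tsv(d, file = tmp)

# read it back with the matching read_ function to check the round trip
d_again <- read_tsv(tmp)
head(d_again)
```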
Go to the data folder where I load the datasets for the seminar: ‘https://github.com/laurafontanesi/r-seminar22/tree/main/data’. Click on tdcs.csv.
To be able to load these data in R, we first need to get to the raw data. You can get there by clicking on View raw. Note that for some files, instead of getting to the raw data page, you can directly download them to a local directory. From there, you can simply load them in R using the appropriate read_ function.
Copy the address of the page containing the raw data. It should be https://raw.githubusercontent.com/laurafontanesi/r-seminar22/main/data/tdcs.csv
You can now use this URL with one of our read_ functions:
data_tdcs = read_csv("https://raw.githubusercontent.com/laurafontanesi/r-seminar22/main/data/tdcs.csv")
##
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## cols(
## RT = col_double(),
## acc_spd = col_character(),
## accuracy = col_double(),
## angle = col_double(),
## block = col_double(),
## coherence = col_double(),
## dataset = col_character(),
## id = col_character(),
## left_right = col_double(),
## subj_idx = col_double(),
## tdcs = col_character(),
## trial_NR = col_double()
## )
head(data_tdcs)
## # A tibble: 6 x 12
## RT acc_spd accuracy angle block coherence dataset id left_right subj_idx tdcs trial_NR
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <chr> <dbl>
## 1 799 spd 1 180 1 0.417 berkeley S1.1 2 1 sham 1
## 2 613 spd 1 180 1 0.417 berkeley S1.1 2 1 sham 2
## 3 627 spd 1 180 1 0.417 berkeley S1.1 1 1 sham 3
## 4 1280 acc 0 180 1 0.417 berkeley S1.1 1 1 sham 4
## 5 800 spd 1 180 1 0.417 berkeley S1.1 2 1 sham 5
## 6 760 acc 1 180 1 0.417 berkeley S1.1 2 1 sham 6
Task A
From the data folder on Github, get the data sets in the list below. Load them in R, giving them the respective names: qualtrics_data, data_f, data_g, data_h. Inspect them using head() or glimpse(). Finally, save them to your local data directory (which you should have as a sub-directory of your R course directory) as csv files.
Task B
Go to this website: https://www.britishelectionstudy.com/data-objects/cross-sectional-data/ (you can register for free). Download the 2017 Face-to-face Post-election Survey Version 1.5 SPSS file into your local data directory (see above). Then, load it in R, assigning it to the name british_cross_sectional_data, using the appropriate function for SPSS files, and inspect it using head() or glimpse().
Save and email your script to me at laura.fontanesi@unibas.ch by the end of Friday.