Open RStudio.
Open a new R script in R and save it as
wpa_2_LastFirst.R
(where Last and First is your last and
first name).
Careful about capitalizing, and using _
: DO NOT
write WPA-2-LastFirst.R
or
WPA_2_LastFirst.R
or wpa_2_FirstLast.R
At the top of your script, write the following (with appropriate changes):
# Assignment: WPA 2
# Name: Laura Fontanesi
# Date: 22 March 2022
Technically R is an expression language with a very simple syntax. It
is case sensitive as are most UNIX based packages, so A
and
a
are different symbols and would refer to different
variables.
In this course, we will try to avoid numbers in R names, as well as
capital letters and .
. We will thus use names such as:
my_data
, anova_results
,
square_root
, etc.
Elementary commands consist of either expressions or assignments.
If an expression is given as a command, it is evaluated, printed, and the value is lost:
57*2 + 100
## [1] 214
An assignment also evaluates an expression and passes the value to a variable but the result is not automatically printed:
some_variable = 57*2 + 100
print(some_variable)
## [1] 214
Commands are separated by a newline.
Comments can be put almost anywhere, starting with a hashmark
(#
), everything to the end of the line is a comment.
In the console, if a command is not complete at the end of a line, R
will give a different prompt (by default +
) on second and
subsequent lines and continue to read input until the command is
syntactically complete.
R operates on named data structures. The simplest such structure is the numeric vector, which is a single entity consisting of an ordered collection of numbers. To set up a vector named x, say, consisting of five numbers, namely 10.4, 5.6, 3.1, 6.4 and 21.7, use the R command:
x = c(10.4, 5.6, 3.1, 6.4, 21.7)
print(x)
## [1] 10.4 5.6 3.1 6.4 21.7
This is an assignment statement using the function c()
which in this context can take an arbitrary number of vector arguments
and whose value is a vector got by concatenating its arguments end to
end.
If an expression is used as a complete command, the value is printed and lost. So now if we were to use the command:
1/x + 5
## [1] 5.096154 5.178571 5.322581 5.156250 5.046083
the reciprocals of the five values would be printed at the terminal (and the value of x, of course, unchanged).
The further assignment:
y = c(x, 0, x)
would create a vector y with 11 entries consisting of two copies of x with a zero in the middle place:
print(y)
## [1] 10.4 5.6 3.1 6.4 21.7 0.0 10.4 5.6 3.1 6.4 21.7
Vectors can be used in arithmetic expressions, in which case the operations are performed element by element. Vectors occurring in the same expression need not all be of the same length. If they are not, the value of the expression is a vector with the same length as the longest vector which occurs in the expression. Shorter vectors in the expression are recycled as often as need be (perhaps fractionally) until they match the length of the longest vector. In particular, a constant is simply repeated. So with the above assignments the commands:
x
## [1] 10.4 5.6 3.1 6.4 21.7
z = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
print(z)
## [1] 1 2 3 4 5 6 7 8 9 10
v = 2*x + z + 1
print(v)
## [1] 22.8 14.2 10.2 17.8 49.4 27.8 19.2 15.2 22.8 54.4
generates a new vector v
of length 10 constructed by
adding together, element by element, 2*x
repeated 2 times,
z
repeated just once, and 1
repeated 10
times.
However, when trying:
print(y)
## [1] 10.4 5.6 3.1 6.4 21.7 0.0 10.4 5.6 3.1 6.4 21.7
w = 2*x + y + 1
## Warning in 2 * x + y: longer object length is not a multiple of shorter object length
print(w)
## [1] 32.2 17.8 10.3 20.2 66.1 21.8 22.6 12.8 16.9 50.8 43.5
It gives us the Warning that y
(the longest object, of
lenght 11) is not a multiple of x
(the shortest, with
length 5).
The elementary arithmetic operators are the usual +
,
-
, *
, /
and ^
for
raising to a power.
In addition all of the common arithmetic functions are available.
log
, exp
, sin
, cos
,
tan
, sqrt
, and so on, all have their usual
meaning.
max
and min
select the largest and
smallest elements of a vector respectively.
range
is a function whose value is a vector of
length two, namely c(min(x), max(x))
.
length(x)
is the number of elements in
x
.
sum(x)
gives the total of the elements in
x
, and prod(x)
their product.
Two statistical functions are mean(x)
which
calculates the sample mean, which is the same as
sum(x)/length(x)
, and var(x)
which gives
sum((x-mean(x))^2)/(length(x)-1)
or sample
variance.
sort(x)
returns a vector of the same size as x with
the elements arranged in increasing order.
min(y)
## [1] 0
max(y)
## [1] 21.7
range(y)
## [1] 0.0 21.7
log(y)
## [1] 2.341806 1.722767 1.131402 1.856298 3.077312 -Inf 2.341806 1.722767 1.131402 1.856298 3.077312
length(y)
## [1] 11
sort(y)
## [1] 0.0 3.1 3.1 5.6 5.6 6.4 6.4 10.4 10.4 21.7 21.7
mean(x)
## [1] 9.44
sd(x)
## [1] 7.33846
var(x)
## [1] 53.853
R has a number of facilities for generating commonly used sequences of numbers.
For example:
4:12
## [1] 4 5 6 7 8 9 10 11 12
The colon operator has high priority within an
expression, so, for example 2*1:15
is the vector
c(2, 4, ..., 28, 30)
.
Put n = 10 and compare the sequences 1:n-1
and
1:(n-1)
.
The construction 5:1
may be used to generate a sequence
backwards:
n = 10
1:n-1
## [1] 0 1 2 3 4 5 6 7 8 9
1:(n-1)
## [1] 1 2 3 4 5 6 7 8 9
The function seq()
is a more general
facility for generating sequences. It has five arguments, only some of
which may be specified in any one call. The first two arguments, if
given, specify the beginning and end of the sequence, and if these are
the only two arguments given the result is the same as the colon
operator. That is seq(2,10)
is the same vector as
2:10
.
Arguments to seq(), and to many other R functions, can also be given
in named form, in which case the order in which they appear is
irrelevant. The first two arguments may be named from=value
and to=value
; thus seq(1,30)
,
seq(from=1, to=30)
and seq(to=30, from=1)
are
all the same as 1:30
.
The next two arguments to seq() may be named by=value
and length=value
, which specify a step size and a length
for the sequence respectively. If neither of these is given, the default
by=1
is assumed.
For example:
seq(from=-1, to=10, by=3)
## [1] -1 2 5 8
seq(length=5, from=-1, by=.5)
## [1] -1.0 -0.5 0.0 0.5 1.0
A related function is rep()
which can
be used for replicating an object.
It has two main arguments, times
and each
.
The first tells how many copies do we want of the same object, and the
second tells how many copies of each element of the object we want to
repeat.
a = 1:4
print(a)
## [1] 1 2 3 4
rep(a, times=3)
## [1] 1 2 3 4 1 2 3 4 1 2 3 4
rep(a, each=3)
## [1] 1 1 1 2 2 2 3 3 3 4 4 4
rep(a, times=3, each=2)
## [1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4
As well as numerical vectors, R allows manipulation of logical
quantities. The elements of a logical vector can have the values
TRUE
, FALSE
, and NA
(for “not
available”). The first two are often abbreviated as T
and
F
, respectively. Note however that T
and
F
are just variables which are set to TRUE
and
FALSE
by default, but are not reserved words and hence can
be overwritten by the user. Hence, you should always use
TRUE
and FALSE
.
Logical vectors are generated by conditions. For example:
x
## [1] 10.4 5.6 3.1 6.4 21.7
x > 10
## [1] TRUE FALSE FALSE FALSE TRUE
x <= 10
## [1] FALSE TRUE TRUE TRUE FALSE
x == 10.4
## [1] TRUE FALSE FALSE FALSE FALSE
x != 10.4
## [1] FALSE TRUE TRUE TRUE TRUE
temp = (x > 10)
temp
## [1] TRUE FALSE FALSE FALSE TRUE
sets temp
as a vector of the same length as
x
with values FALSE
corresponding to elements
of x
where the condition is not met and TRUE
where it is.
The logical operators are <
, <=
,
>
, >=
, ==
for exact
equality and !=
for inequality.
In addition, if c1
and c2
are logical
expressions, then c1 & c2
is their intersection
(“and”), c1 | c2
is their union (“or”), and
!c1
is the negation of c1.
temp_opposite = (x <= 10)
temp_opposite
## [1] FALSE TRUE TRUE TRUE FALSE
!temp
## [1] FALSE TRUE TRUE TRUE FALSE
temp & temp_opposite
## [1] FALSE FALSE FALSE FALSE FALSE
temp | temp_opposite
## [1] TRUE TRUE TRUE TRUE TRUE
Logical vectors may be used in ordinary arithmetic,
in which case they are coerced into numeric vectors, FALSE
becoming 0 and TRUE
becoming 1. However, there are
situations where logical vectors and their coerced numeric counterparts
are not equivalent, for example see the next subsection.
sum(temp)
## [1] 2
mean(temp)*100
## [1] 40
In some cases the components of a vector may not be completely known.
When an element or value is “not available” or a “missing value” in the
statistical sense, a place within a vector may be reserved for it by
assigning it the special value NA
. In general, any
operation on an NA
becomes an NA
. The
motivation for this rule is simply that if the specification of an
operation is incomplete, the result cannot be known and hence is not
available.
The function is.na(x)
gives a logical vector of the same
size as x
with value TRUE
if and only if the
corresponding element in x
is NA
.
z = c(1:3, NA)
z
## [1] 1 2 3 NA
ind = is.na(z)
ind
## [1] FALSE FALSE FALSE TRUE
Note that there is a second kind of “missing” values which are
produced by numerical computation, the so-called Not a Number,
NaN
, values. Examples are:
0/0
## [1] NaN
30/0
## [1] Inf
30/0 - 30/0
## [1] NaN
which give NaN
since the result cannot be defined
sensibly.
In summary, is.na(xx)
is TRUE
both for
NA
and NaN
values. To differentiate these,
is.nan(xx)
is only TRUE
for
NaNs
.
is.na(NA)
## [1] TRUE
is.nan(NA)
## [1] FALSE
Character quantities and character vectors are used frequently in R, for example as plot labels. Where needed they are denoted by a sequence of characters delimited by the double quote character, e.g., “x-values”, “New iteration results”.
Character strings are entered using either matching double (“) or
single (’) quotes, but are printed using double quotes (or sometimes
without quotes). Useful escape sequences are \n
, newline
and \t
, tab.
Character vectors may be concatenated into a vector by the
c()
function.
char_first = c('a', 'b', 'c')
char_second = c('E', 'F', 'G')
char_third = c(char_first, char_second)
print(char_third)
## [1] "a" "b" "c" "E" "F" "G"
print("This is the assginmenet of Laura Fontanesi.\nIt was done on the 9th of March.")
## [1] "This is the assginmenet of Laura Fontanesi.\nIt was done on the 9th of March."
cat("This is the assginmenet of Laura Fontanesi.\nIt was done on the 9th of March.")
## This is the assginmenet of Laura Fontanesi.
## It was done on the 9th of March.
cat("This is the assginmenet of Laura Fontanesi.\n\tIt was done on the 9th of March.")
## This is the assginmenet of Laura Fontanesi.
## It was done on the 9th of March.
The paste()
function takes an arbitrary
number of arguments and concatenates them one by one into character
strings. Any numbers given among the arguments are coerced into
character strings in the evident way, that is, in the same way they
would be if they were printed. The arguments are by default separated in
the result by a single blank character, but this can be changed by the
named argument, sep=string
, which changes it to string,
possibly empty.
For example:
labs = paste(c("X", "Y"), 1:10, sep="-")
labs
## [1] "X-1" "Y-2" "X-3" "Y-4" "X-5" "Y-6" "X-7" "Y-8" "X-9" "Y-10"
Note particularly that recycling of short lists takes place here too;
thus c("X", "Y")
is repeated 5 times to match the sequence
1:10
.
Subsets of the elements of a vector may be selected by appending to the name of the vector an index vector in square brackets. More generally, any expression that evaluates to a vector may have subsets of its elements similarly selected by appending an index vector in square brackets immediately after the expression.
Such index vectors can be any of 3 main types:
In this case the index vector is recycled to the same length as the
vector from which elements are to be selected. Values corresponding to
TRUE
in the index vector are selected and those
corresponding to FALSE
are omitted. For example:
z
## [1] 1 2 3 NA
!is.na(z)
## [1] TRUE TRUE TRUE FALSE
y = z[!is.na(z)]
print(y)
## [1] 1 2 3
creates (or re-creates) an object y
which will contain
the non-missing values of z
, in the same order. Note that
if z
has missing values, y
will be shorter
than z
.
Also:
y = (z+1)[(!is.na(z)) & (z+1)>1]
print(y)
## [1] 2 3 4
creates an object y
and places in it the values of the
vector z+1
for which the corresponding value in
z
was both non-missing and positive.
You can also create a new variable new_z
as a step
before creating y
:
new_z = z + 1
y = new_z[(!is.na(new_z)) & new_z>1]
print(y)
## [1] 2 3 4
In this case, the values in the index vector must lie in the set {1,
2, . . ., length(x)
}. The corresponding elements of the
vector are selected and concatenated, in that order, in the result.
The index vector can be of any length and the result is of the same length as the index vector.
For example x[6]
is the sixth component of
x
and x[1:10]
selects the first 10 elements of
x
(assuming length(x)
is not less than
10).
Also:
c(12, 4, 67)[2]
## [1] 4
weird_indices = rep(c(1,2,2,1), times=3)
print(weird_indices)
## [1] 1 2 2 1 1 2 2 1 1 2 2 1
c("x","y")[weird_indices]
## [1] "x" "y" "y" "x" "x" "y" "y" "x" "x" "y" "y" "x"
(an admittedly unlikely thing to do) produces a character vector of length 12 consisting of “x”, “y”, “y”, “x” repeated 3 times.
Or, more likely:
x = seq(from=-3, to=3, by=1)
print(x)
## [1] -3 -2 -1 0 1 2 3
length(x)
## [1] 7
x[c(1, length(x))]
## [1] -3 3
to get the first and last elements of x
.
Such an index vector specifies the values to be excluded rather than included. Thus:
print(x)
## [1] -3 -2 -1 0 1 2 3
x[-1]
## [1] -2 -1 0 1 2 3
x[-(1:3)]
## [1] 0 1 2 3
gives all but the first 3 elements of x
.
A factor is a vector object used to specify a discrete classification (grouping) of the components of other vectors of the same length. R provides both ordered and unordered factors. While the “real” application of factors is with model formulae (we will see this when fitting regressions, ANOVAs, etc.), we here look at a specific example.
Suppose, for example, we have a sample of 30 tax accountants from all the states and territories of Australia and their individual state of origin is specified by a character vector of state mnemonics as:
state = c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa",
"wa", "qld", "vic", "nsw", "vic", "qld", "qld",
"sa", "tas", "sa", "nt", "wa", "vic", "qld", "nsw",
"nsw", "wa", "sa", "act", "nsw", "vic", "vic", "act")
sort(state)
## [1] "act" "act" "nsw" "nsw" "nsw" "nsw" "nsw" "nsw" "nt" "nt" "qld" "qld" "qld" "qld" "qld" "sa" "sa" "sa"
## [19] "sa" "tas" "tas" "vic" "vic" "vic" "vic" "vic" "wa" "wa" "wa" "wa"
Notice that in the case of a character vector, “sorted” means sorted in alphabetical order.
A factor is similarly created using the factor()
function:
state_factor = factor(state)
print(state_factor)
## [1] tas sa qld nsw nsw nt wa wa qld vic nsw vic qld qld sa tas sa nt wa vic qld nsw nsw wa sa act nsw vic
## [29] vic act
## Levels: act nsw nt qld sa tas vic wa
To find out the levels of a factor the function levels()
can be used.
levels(state_factor)
## [1] "act" "nsw" "nt" "qld" "sa" "tas" "vic" "wa"
The levels of factors are stored in alphabetical order, or in the order they were specified to factor if they were specified explicitly.
Sometimes the levels will have a natural ordering that we want to
record and want our statistical analysis to make use of. The
ordered()
function creates such ordered factors but is
otherwise identical to factor
. For most purposes the only
difference between ordered and unordered factors is that the former are
printed showing the ordering of the levels, but the contrasts generated
for them in fitting linear models are different.
x = factor(c("Man", "Male", "Man", "Lady", "Female"))
print(x)
## [1] Man Male Man Lady Female
## Levels: Female Lady Male Man
x = factor(c("Man", "Male", "Man", "Lady", "Female"), levels = c("Man", "Female", "Lady"))
print(x)
## [1] Man <NA> Man Lady Female
## Levels: Man Female Lady
sort(x)
## [1] Man Man Female Lady
## Levels: Man Female Lady
A data frame is a list where its components are vectors (numeric, character, or logical) or factors of equal length.
n = c(2, 3, 5)
s = c("aa", "bb", "cc")
b = c(TRUE, FALSE, TRUE)
df = data.frame(n, s, b)
print(df)
## n s b
## 1 2 aa TRUE
## 2 3 bb FALSE
## 3 5 cc TRUE
Select one column by name using the $
operator:
df$n
## [1] 2 3 5
Select one or more columns by name using the brackets:
df[,"n"]
## [1] 2 3 5
df[,c("n", "s")]
## n s
## 1 2 aa
## 2 3 bb
## 3 5 cc
Select one or more columns by index using the brackets:
df[,1]
## [1] 2 3 5
df[,c(1, 3)]
## n b
## 1 2 TRUE
## 2 3 FALSE
## 3 5 TRUE
Select one or more rows by index using the brackets:
df[1,]
## n s b
## 1 2 aa TRUE
df[c(1,3),]
## n s b
## 1 2 aa TRUE
## 3 5 cc TRUE
Finally, you can select both rows and columns:
df[2, c("n", "s")]
## n s
## 2 3 bb
Note that you can use all the ways to index vectors, both for rows and columns:
design = data.frame(
stimulus = rep(c('a', 'b', 'c'), each=4),
item = seq(from=1, by=1, length.out=12),
correct_response = rep(c(0, 1), times=6)
)
print(design)
## stimulus item correct_response
## 1 a 1 0
## 2 a 2 1
## 3 a 3 0
## 4 a 4 1
## 5 b 5 0
## 6 b 6 1
## 7 b 7 0
## 8 b 8 1
## 9 c 9 0
## 10 c 10 1
## 11 c 11 0
## 12 c 12 1
design[design$stimulus == 'c',]
## stimulus item correct_response
## 9 c 9 0
## 10 c 10 1
## 11 c 11 0
## 12 c 12 1
design[design$stimulus == 'c' & design$correct_response == 0,]
## stimulus item correct_response
## 9 c 9 0
## 11 c 11 0
design[design$correct_response == 1, "item"]
## [1] 2 4 6 8 10 12
design[seq(from=1, to=5),]
## stimulus item correct_response
## 1 a 1 0
## 2 a 2 1
## 3 a 3 0
## 4 a 4 1
## 5 b 5 0
design[-seq(from=1, to=5),]
## stimulus item correct_response
## 6 b 6 1
## 7 b 7 0
## 8 b 8 1
## 9 c 9 0
## 10 c 10 1
## 11 c 11 0
## 12 c 12 1
To add a column after creating the dataframe, you have 3 options.
$
operator:df$first = seq(from=7, by=10, length.out=3)
print(df)
## n s b first
## 1 2 aa TRUE 7
## 2 3 bb FALSE 17
## 3 5 cc TRUE 27
df['second'] = seq(from=70, by=2, length.out=3)
print(df)
## n s b first second
## 1 2 aa TRUE 7 70
## 2 3 bb FALSE 17 72
## 3 5 cc TRUE 27 74
cbind
:df= cbind(df, seq(from=700, by=5, length.out=3))
print(df)
## n s b first second seq(from = 700, by = 5, length.out = 3)
## 1 2 aa TRUE 7 70 700
## 2 3 bb FALSE 17 72 705
## 3 5 cc TRUE 27 74 710
colnames(df)
## [1] "n" "s"
## [3] "b" "first"
## [5] "second" "seq(from = 700, by = 5, length.out = 3)"
colnames(df) = c("n", "s", "b", "first", "second", "third")
print(df)
## n s b first second third
## 1 2 aa TRUE 7 70 700
## 2 3 bb FALSE 17 72 705
## 3 5 cc TRUE 27 74 710
design_second = data.frame(
stimulus = rep(c('d', 'e', 'f'), each=2),
item = seq(from=13, by=1, length.out=6),
correct_response = rep(c(0, 1), times=3)
)
print(design_second)
## stimulus item correct_response
## 1 d 13 0
## 2 d 14 1
## 3 e 15 0
## 4 e 16 1
## 5 f 17 0
## 6 f 18 1
design_new = rbind(design, design_second)
print(design_new)
## stimulus item correct_response
## 1 a 1 0
## 2 a 2 1
## 3 a 3 0
## 4 a 4 1
## 5 b 5 0
## 6 b 6 1
## 7 b 7 0
## 8 b 8 1
## 9 c 9 0
## 10 c 10 1
## 11 c 11 0
## 12 c 12 1
## 13 d 13 0
## 14 d 14 1
## 15 e 15 0
## 16 e 16 1
## 17 f 17 0
## 18 f 18 1
design_new[13, "correct_response"] = NaN
print(design_new)
## stimulus item correct_response
## 1 a 1 0
## 2 a 2 1
## 3 a 3 0
## 4 a 4 1
## 5 b 5 0
## 6 b 6 1
## 7 b 7 0
## 8 b 8 1
## 9 c 9 0
## 10 c 10 1
## 11 c 11 0
## 12 c 12 1
## 13 d 13 NaN
## 14 d 14 1
## 15 e 15 0
## 16 e 16 1
## 17 f 17 0
## 18 f 18 1
design_third = data.frame(
missing_value = is.nan(design_new$correct_response)
)
print(design_third)
## missing_value
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## 7 FALSE
## 8 FALSE
## 9 FALSE
## 10 FALSE
## 11 FALSE
## 12 FALSE
## 13 TRUE
## 14 FALSE
## 15 FALSE
## 16 FALSE
## 17 FALSE
## 18 FALSE
design_final = cbind(design_new, design_third)
print(design_final)
## stimulus item correct_response missing_value
## 1 a 1 0 FALSE
## 2 a 2 1 FALSE
## 3 a 3 0 FALSE
## 4 a 4 1 FALSE
## 5 b 5 0 FALSE
## 6 b 6 1 FALSE
## 7 b 7 0 FALSE
## 8 b 8 1 FALSE
## 9 c 9 0 FALSE
## 10 c 10 1 FALSE
## 11 c 11 0 FALSE
## 12 c 12 1 FALSE
## 13 d 13 NaN TRUE
## 14 d 14 1 FALSE
## 15 e 15 0 FALSE
## 16 e 16 1 FALSE
## 17 f 17 0 FALSE
## 18 f 18 1 FALSE
Select and change values:
design_final[3, "stimulus"] = "f"
design_final
## stimulus item correct_response missing_value
## 1 a 1 0 FALSE
## 2 a 2 1 FALSE
## 3 f 3 0 FALSE
## 4 a 4 1 FALSE
## 5 b 5 0 FALSE
## 6 b 6 1 FALSE
## 7 b 7 0 FALSE
## 8 b 8 1 FALSE
## 9 c 9 0 FALSE
## 10 c 10 1 FALSE
## 11 c 11 0 FALSE
## 12 c 12 1 FALSE
## 13 d 13 NaN TRUE
## 14 d 14 1 FALSE
## 15 e 15 0 FALSE
## 16 e 16 1 FALSE
## 17 f 17 0 FALSE
## 18 f 18 1 FALSE
design_final[design_final$stimulus == "a", "stimulus"] = "f"
design_final
## stimulus item correct_response missing_value
## 1 f 1 0 FALSE
## 2 f 2 1 FALSE
## 3 f 3 0 FALSE
## 4 f 4 1 FALSE
## 5 b 5 0 FALSE
## 6 b 6 1 FALSE
## 7 b 7 0 FALSE
## 8 b 8 1 FALSE
## 9 c 9 0 FALSE
## 10 c 10 1 FALSE
## 11 c 11 0 FALSE
## 12 c 12 1 FALSE
## 13 d 13 NaN TRUE
## 14 d 14 1 FALSE
## 15 e 15 0 FALSE
## 16 e 16 1 FALSE
## 17 f 17 0 FALSE
## 18 f 18 1 FALSE
R provides convenient functions to simulate observations from different distributions.
For example:
Distribution | R Name | Arguments |
---|---|---|
uniform | runif | n, min, max |
normal | rnorm | n, mean, sd |
binomial | rbinom | n, size, prob |
runif(n=10, min=1, max=10)
## [1] 5.558175 1.006282 2.728632 6.323042 8.255786 3.737572 5.240531 8.433854 7.949573 1.578601
x = rnorm(n=20, mean=20, sd=5)
mean(x)
## [1] 19.50263
range(x)
## [1] 11.54756 32.98557
rbinom(n=6, size=1, prob=.6)
## [1] 0 0 1 0 0 1
Task A
participants
, containing
integer numbers from 1 to 20, using c()
and
seq()
.conditions
, of length
20, containing alternating values of “a” and “b” (“a”, “b”, “a”, “b”,
“a”, …), using rep()
.first_half
, containing only the
first half of the participants
vector’s values.participants
and
conditions
have length 20, and that first_half
has length 10, using length()
.conditions
, insert a
missing value. Print the conditions
after the change.participants_cond
, by pasting
together (using paste()
) participants
and
conditions
, separated by _
. So, the first
element should be “1_a”.Task B
my_data
, with as columns the
vectorsparticipants
, conditions
and
participants_cond
that you created before. Print the data
frame to have a look if it worked.response_times
made of 20 samples
from the normal distribution, with mean .8 and standard deviation 1.
Print the data frame to have a look if it worked.response_times
column that are
negative and set them to 0. Print the data frame to have a look if it
worked.log_response_times
, made of
the logarithm of response_times
.correct_response
made of 20 samples
from the binomial distribution, with size 1 and probability of success
.65. Print the data frame to have a look if it worked.data_correct
and
data_incorrect
made of, respectively, the subset of
my_data
where correct_response
is 1, and the
subset of my_data
where correct_response
is 0.
Print the data frame to have a look if it worked. Print the result to
check.Save and email your script to me at laura.fontanesi@unibas.ch by the end of Friday.