Primary exercises
- Dietary intakes.
Four patients had daily dietary intakes of 2314, 2178, 1922, 2004 kcal.
Make a vector intakesKCal of these four values.
What is the class of this vector?
Convert the values into kJ using 1 kcal = 4.184 kJ.
intakesKCal <- c( 2314, 2178, 1922, 2004 )
intakesKCal
[1] 2314 2178 1922 2004
class( intakesKCal )
[1] "numeric"
intakesKCal * 4.184
[1] 9681.776 9112.752 8041.648 8384.736
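The conversion works because arithmetic in R is vectorised: the single factor 4.184 is applied to every element of the vector. A minimal sketch of the same step with the factor stored under a name; kJPerKCal and intakesKJ are illustrative names, not part of the exercise:

```r
intakesKCal <- c( 2314, 2178, 1922, 2004 )
kJPerKCal <- 4.184                       # conversion factor given in the exercise (name is illustrative)
intakesKJ <- intakesKCal * kJPerKCal     # each element is multiplied by the factor
intakesKJ
round( intakesKJ )                       # rounded to whole kJ for display: 9682 9113 8042 8385
```

Storing the result in a new vector leaves intakesKCal unchanged, so the later exercises can still use the kcal values.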
- Combining (appending) vectors.
An additional set of intakes is provided: 2122, 2616, NA, 1771 kcal.
Use c() to append the new intakes after the values in intakesKCal and store the result in allIntakesKCal.
Print the combined vector and print its calculated length.
intakesKCal2 <- c( 2122, 2616, NA, 1771 )
allIntakesKCal <- c( intakesKCal, intakesKCal2 )
allIntakesKCal
[1] 2314 2178 1922 2004 2122 2616 NA 1771
length( allIntakesKCal )
[1] 8
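Two details of the result above may be worth checking explicitly: c() joins its arguments into one flat vector, and the NA occupies a position like any other element. A short sketch, reusing the vectors from the exercise:

```r
intakesKCal <- c( 2314, 2178, 1922, 2004 )
intakesKCal2 <- c( 2122, 2616, NA, 1771 )
allIntakesKCal <- c( intakesKCal, intakesKCal2 )
class( allIntakesKCal )                                   # "numeric": the NA does not change the class
length( allIntakesKCal )                                  # 8: the NA is counted as an element
length( intakesKCal ) + length( intakesKCal2 )            # 8: lengths of the parts add up
```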
- Mean and sum.
Calculate the mean intake for patients in vector intakesKCal.
Next, calculate the mean intake for patients in vector allIntakesKCal.
Can you explain the result?
Check help for ?mean, in particular the na.rm argument.
Use the extra argument na.rm=TRUE to calculate the mean of non-NA elements of allIntakesKCal.
Check help for ?sum to see how to omit NA elements in the sum calculation.
Now, calculate the total sum of allIntakesKCal intakes, ignoring the NA element.
mean( intakesKCal )
[1] 2104.5
mean( allIntakesKCal )
[1] NA
# since one element is missing, the mean is unknown
# ?mean, adding argument na.rm=TRUE will omit NA elements
mean( allIntakesKCal, na.rm = TRUE )
[1] 2132.429
# ?sum also allows na.rm=TRUE argument to skip NA elements
sum( allIntakesKCal, na.rm = TRUE )
[1] 14927
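The NA result follows from how missing values propagate: any arithmetic that involves an unknown value is itself unknown. The sketch below illustrates this and then rebuilds the na.rm=TRUE mean by hand; knownTotal and knownCount are illustrative names, and is.na() is the function explored in the next exercise:

```r
allIntakesKCal <- c( 2314, 2178, 1922, 2004, 2122, 2616, NA, 1771 )
NA + 1                                            # NA: adding to an unknown value gives an unknown value
sum( allIntakesKCal )                             # NA for the same reason
knownTotal <- sum( allIntakesKCal, na.rm = TRUE ) # total of the known intakes
knownCount <- sum( !is.na( allIntakesKCal ) )     # number of known intakes
knownTotal / knownCount                           # same value as mean( allIntakesKCal, na.rm = TRUE )
```

Note that the divisor is 7, not 8: with na.rm=TRUE the missing element is dropped before the mean is taken, not treated as zero.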
- Selecting and counting (non)missing elements.
Understand the result of is.na( allIntakesKCal ).
Now, negate the above result with the ! operator.
Use the above vectors as arguments to sum to calculate the number of missing and non-missing elements in allIntakesKCal.
Understand allIntakesKCal[ !is.na( allIntakesKCal ) ].
is.na( allIntakesKCal ) # TRUE marks positions with missing data
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
!is.na( allIntakesKCal ) # TRUE marks positions with available data
[1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
sum( is.na( allIntakesKCal ) ) # number of missing elements
[1] 1
sum( !is.na( allIntakesKCal ) ) # number of non-missing elements
[1] 7
allIntakesKCal[ !is.na( allIntakesKCal ) ] # keeps elements which are not NA
[1] 2314 2178 1922 2004 2122 2616 1771
sum( allIntakesKCal[ !is.na( allIntakesKCal ) ] ) # same as sum( allIntakesKCal, na.rm = TRUE )
[1] 14927
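Counting with sum works because a logical vector is treated as numbers in arithmetic: TRUE counts as 1 and FALSE as 0. A brief sketch of that coercion, with two related base R helpers (which and anyNA) shown as optional extras that the exercise itself does not require:

```r
allIntakesKCal <- c( 2314, 2178, 1922, 2004, 2122, 2616, NA, 1771 )
as.numeric( c( TRUE, FALSE ) )         # 1 0: how logical values are counted by sum
mean( is.na( allIntakesKCal ) )        # 0.125: proportion of missing elements (1 out of 8)
which( is.na( allIntakesKCal ) )       # 7: position of the missing element
anyNA( allIntakesKCal )                # TRUE: at least one element is missing
```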
- Descriptive statistics of a vector; normally distributed random numbers.
The code v <- rnorm( 10 ) samples 10 numbers from the standard normal distribution (mean 0, standard deviation 1) and stores them as a vector in v.
Print v. Then repeat v <- rnorm( 10 ) and print v again. Has v changed?
Print v and find by eye the smallest and the largest of these numbers.
Try the functions min and max on v. Did you find the same numbers?
Calculate the mean, median and the standard deviation (sd) of v.
v <- rnorm( 10 ) # a vector of random numbers
v
[1] 1.0511405 2.2616264 0.9624446 -0.2308904 -0.1442357 2.1462364 0.4030651 -0.8069300 -1.6214931 0.5083432
v <- rnorm( 10 ) # another vector of random numbers
v
[1] -0.6114128 0.3804211 1.1583897 -0.0909023 -0.2836115 0.3570499 0.7524309 -1.0042217 1.6144101 0.2439055
min( v )
[1] -1.004222
max( v )
[1] 1.61441
mean( v )
[1] 0.2516459
median( v )
[1] 0.3004777
sd( v )
[1] 0.7946881
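Because rnorm draws new random numbers on every call, the values printed above will differ from run to run. If reproducible numbers are wanted, the random number generator can be seeded with set.seed first. A minimal sketch; the seed value 1 and the names v1 and v2 are arbitrary choices for illustration:

```r
set.seed( 1 )              # fix the state of the random number generator (seed value is arbitrary)
v1 <- rnorm( 10 )
set.seed( 1 )              # reset to the same state
v2 <- rnorm( 10 )
identical( v1, v2 )        # TRUE: same seed, same numbers
range( v1 )                # smallest and largest value in one call, same as c( min( v1 ), max( v1 ) )
```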
- Selecting and counting elements by a condition.
Type v < 0 and understand the result.
How do you interpret the number produced by sum( v < 0 )?
How do you interpret the number produced by sum( !( v < 0 ) )?
v < 0 # TRUE corresponds to elements of vector v smaller than 0
[1] TRUE FALSE FALSE TRUE TRUE FALSE FALSE TRUE FALSE FALSE
sum( v < 0 ) # calculates the number of negative elements in vector v
[1] 4
sum( !( v < 0 ) ) # calculates the number of non-negative (so, positive OR ZERO) elements in vector v
[1] 6
sum( v >= 0 ) # same as above
[1] 6
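The same logical vector can also be used for selecting, not only counting: placed inside square brackets it keeps the elements where the condition is TRUE. A small sketch with fixed values (rounded, illustrative numbers rather than a fresh random sample, so the results are predictable):

```r
v <- c( -0.61, 0.38, 1.16, -0.09, -0.28, 0.36, 0.75, -1.00, 1.61, 0.24 )  # illustrative values
v[ v < 0 ]                                     # the negative elements: -0.61 -0.09 -0.28 -1.00
length( v[ v < 0 ] )                           # 4, same as sum( v < 0 )
sum( v < 0 ) + sum( v >= 0 ) == length( v )    # TRUE: every element falls in exactly one group
```

The last check holds here because v contains no NA; a missing value would satisfy neither condition.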
- Head and tail.
Often there is a need to quickly look at the beginning (head) or at the end (tail) of a vector.
Try these functions to show the first 5 and the last 7 elements of a randomly generated vector v <- rnorm( 20 ).
v <- rnorm( 20 )
v
[1] -1.6516896 -0.5787457 -0.7727924 -1.7989574 1.2496351 0.3954721 2.0918303 1.9987067 1.3590783 -0.1880865 -0.8593891 -0.5041967 -0.6520981 0.2147365
[15] -0.4676657 0.9303386 1.9522221 -0.2483244 -0.4018048 -1.1981813
head( v, 5 )
[1] -1.6516896 -0.5787457 -0.7727924 -1.7989574 1.2496351
tail( v, 7 )
[1] 0.2147365 -0.4676657 0.9303386 1.9522221 -0.2483244 -0.4018048 -1.1981813
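Two further behaviours of head and tail can be handy: without a second argument they return 6 elements, and a negative count means "all but" that many elements from the opposite end. A sketch using the integers 1 to 20 in place of random numbers, so that positions are easy to read off:

```r
v <- 1:20            # illustrative vector; each value equals its position
head( v )            # default n = 6: 1 2 3 4 5 6
tail( v, 7 )         # last 7 elements: 14 15 16 17 18 19 20
head( v, -15 )       # all but the last 15: 1 2 3 4 5
tail( v, -13 )       # all but the first 13: 14 15 16 17 18 19 20
```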