Primary exercises
- Dietary intakes. (Creating a vector and using it in a calculation.)
Four patients had daily dietary intakes of 2314, 2178, 1922 and 2004 kcal.
Make a vector intakesKCal of these four values.
What is the class of this vector?
Convert the values to kJ using 1 kcal = 4.184 kJ.
intakesKCal <- c( 2314, 2178, 1922, 2004 )
intakesKCal
[1] 2314 2178 1922 2004
class( intakesKCal )
[1] "numeric"
intakesKCal * 4.184 # convert each kcal value to kJ (element-wise multiplication)
[1] 9681.776 9112.752 8041.648 8384.736
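As an optional extra, here is a minimal sketch that stores the converted values instead of only printing them. The names kcalToKJ and intakesKJ are hypothetical and chosen only for illustration. Arithmetic on a vector applies to every element, so one multiplication converts all four intakes. Dividing by the same factor recovers the original kcal values.

```r
intakesKCal <- c( 2314, 2178, 1922, 2004 )
kcalToKJ <- 4.184                     # hypothetical name for the conversion factor (1 kcal = 4.184 kJ)
intakesKJ <- intakesKCal * kcalToKJ   # element-wise: each of the four values is multiplied
intakesKJ
round( intakesKJ, 1 )                 # rounded to one decimal place for reporting
intakesKJ / kcalToKJ                  # dividing by the same factor gives back the kcal values
```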
- More dietary intakes. (Combining/appending/merging vectors.)
An additional set of intakes is provided: 2122, 2616, NA and 1771 kcal.
Use c() to append the new intakes after the values in intakesKCal and store the result in allIntakesKCal.
Print the combined vector and its length, calculated with length().
intakesKCal2 <- c( 2122, 2616, NA, 1771 )
allIntakesKCal <- c( intakesKCal, intakesKCal2 )
allIntakesKCal
[1] 2314 2178 1922 2004 2122 2616 NA 1771
length( allIntakesKCal )
[1] 8
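The order of the arguments to c() decides the order of the elements in the result. Base R also has append(), which by default adds the new values at the end (the same as c() here) and can insert them after a chosen position with its after argument. The variable names reversedOrder and viaAppend below are hypothetical, used only for this sketch.

```r
intakesKCal  <- c( 2314, 2178, 1922, 2004 )
intakesKCal2 <- c( 2122, 2616, NA, 1771 )
allIntakesKCal <- c( intakesKCal, intakesKCal2 )     # new values after the old ones
reversedOrder  <- c( intakesKCal2, intakesKCal )     # swapping arguments puts the new values first
viaAppend      <- append( intakesKCal, intakesKCal2 ) # append() adds at the end by default
identical( allIntakesKCal, viaAppend )               # TRUE: same result as c() here
append( intakesKCal, intakesKCal2, after = 2 )       # insert the new values after the 2nd element
length( allIntakesKCal ) == length( intakesKCal ) + length( intakesKCal2 ) # TRUE
```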
- The average and total intakes. (Calculating means and sums, skipping missing values.)
Calculate the mean intake for the patients in vector intakesKCal.
Next, calculate the mean intake for the patients in vector allIntakesKCal.
Can you explain the result?
Check the help page ?mean, in particular the na.rm argument.
Use the extra argument na.rm=TRUE to calculate the mean of the non-NA elements of allIntakesKCal.
Check the help page ?sum to find out how to omit NA elements from a sum.
Now, calculate the total sum of the allIntakesKCal intakes, ignoring the NA element.
mean( intakesKCal )
[1] 2104.5
mean( allIntakesKCal )
[1] NA
# one element is missing (NA), so the mean is unknown and NA is returned
# see ?mean: the argument na.rm = TRUE removes NA elements before the mean is computed
mean( allIntakesKCal, na.rm = TRUE )
[1] 2132.429
# sum() also accepts na.rm = TRUE to skip NA elements (see ?sum)
sum( allIntakesKCal, na.rm = TRUE )
[1] 14927
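The na.rm argument is not specific to mean and sum. Other summary functions used later in these exercises, such as max, min and median, accept it too. Without it, a single NA makes the whole result NA. A short sketch on the same eight intakes:

```r
allIntakesKCal <- c( 2314, 2178, 1922, 2004, 2122, 2616, NA, 1771 )
max( allIntakesKCal )                  # NA: the missing value propagates
max( allIntakesKCal, na.rm = TRUE )    # 2616
min( allIntakesKCal, na.rm = TRUE )    # 1771
median( allIntakesKCal, na.rm = TRUE ) # 2122, the middle of the 7 non-missing values
```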
- Selecting valid intakes. (Selecting non-missing elements; logical vectors.)
Understand the result of is.na( allIntakesKCal ).
Now, negate this result with the ! operator.
Use the two logical vectors above as arguments to sum to count the missing and non-missing elements in allIntakesKCal (sum counts each TRUE as 1 and each FALSE as 0).
Understand allIntakesKCal[ !is.na( allIntakesKCal ) ].
is.na( allIntakesKCal ) # TRUE marks positions with missing data
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
!is.na( allIntakesKCal ) # TRUE marks positions with available data
[1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
sum( is.na( allIntakesKCal ) ) # number of missing elements
[1] 1
sum( !is.na( allIntakesKCal ) ) # number of non-missing elements
[1] 7
allIntakesKCal[ !is.na( allIntakesKCal ) ] # keeps elements which are not NA
[1] 2314 2178 1922 2004 2122 2616 1771
sum( allIntakesKCal[ !is.na( allIntakesKCal ) ] ) # same as sum( allIntakesKCal, na.rm = TRUE )
[1] 14927
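The pieces above can be combined to reproduce mean( allIntakesKCal, na.rm = TRUE ) by hand. Divide the sum of the non-missing values by their count. The mean of a logical vector is the proportion of TRUE values, because each TRUE counts as 1. The names valid, nValid and manualMean are hypothetical helpers for this sketch.

```r
allIntakesKCal <- c( 2314, 2178, 1922, 2004, 2122, 2616, NA, 1771 )
valid <- !is.na( allIntakesKCal )                      # logical vector, TRUE = value present
nValid <- sum( valid )                                 # TRUE counts as 1, FALSE as 0: 7
manualMean <- sum( allIntakesKCal[ valid ] ) / nValid  # 14927 / 7
manualMean                                             # same value as mean( allIntakesKCal, na.rm = TRUE )
mean( is.na( allIntakesKCal ) )                        # proportion of missing values: 1/8 = 0.125
```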
- Generating random kcal intakes. (Generating normally distributed random numbers; descriptive statistics.)
The code v <- rnorm( 10 ) samples 10 numbers from the standard normal distribution (mean 0, standard deviation 1) and stores them as a vector in v.
Print v. Then repeat v <- rnorm( 10 ) and print v again. Has v changed?
Next, read the manual of rnorm (?rnorm) and find out how to generate random numbers with a given mean and standard deviation (sd).
Now, simulate kcal intakes in v by generating 15 random numbers with mean=2000 and sd=300.
Print v and find the smallest and the largest of these numbers by eye.
Try the functions min and max on v: did you find the same numbers by eye?
Calculate the mean, median and standard deviation (sd) of v.
v <- rnorm( 10 ) # a vector of random numbers
v
[1] -0.08684587 -0.12305461 0.93499636 0.69911305 0.96411882 1.28574192 -0.49437674 1.90782428 -0.20086221 -0.26340391
v <- rnorm( 10 ) # another vector of random numbers
v
[1] -0.17723685 0.64198290 -1.74998522 -0.67957231 1.48208103 -1.23709930 -0.54653992 -1.24279637 0.06258482 1.51158781
v <- rnorm( n = 15, mean = 2000, sd = 300 )
v
[1] 2340.5955 1977.0104 999.0417 2336.3170 1751.8906 1667.8493 2281.3080 1359.0786 2182.9799 2264.8113 1774.7514 1947.8293 1745.1161 1812.9882 2245.1084
min( v )
[1] 999.0417
max( v )
[1] 2340.595
mean( v ) # is it close to 2000? try several random v vectors and see the effect of growing n
[1] 1912.445
median( v )
[1] 1947.829
sd( v ) # is it close to 300? try several random v vectors and see the effect of growing n
[1] 386.8787
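Because rnorm draws new numbers on every call, your output will differ from the values shown above. If you need repeatable results (for example, to compare with a classmate), set.seed() fixes the state of the random number generator. The seed value 42 and the sample sizes in the loop are arbitrary choices for this sketch. The loop also illustrates the hint in the comments above: with growing n, the sample mean and sd tend to move closer to the requested 2000 and 300.

```r
set.seed( 42 )                 # fix the state of the random number generator
a <- rnorm( 10 )
set.seed( 42 )                 # reset to the same state ...
b <- rnorm( 10 )               # ... so the same 10 numbers are drawn again
identical( a, b )              # TRUE
d <- rnorm( 10 )               # no reset: the generator has moved on
identical( a, d )              # FALSE

# Larger samples: sample mean and sd tend to approach the requested values
for ( n in c( 15, 1500, 150000 ) ) {
  v <- rnorm( n = n, mean = 2000, sd = 300 )
  cat( "n =", n, " mean =", round( mean( v ), 1 ), " sd =", round( sd( v ), 1 ), "\n" )
}
```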
- Selecting and counting some kcal intakes. (Selecting elements by a condition; logical vectors.)
In v, simulate kcal intakes by generating 15 random numbers with mean=2000 and sd=300.
Type v < 2000 and understand the result.
How do you interpret the number produced by sum( v < 2000 )?
How do you interpret the number produced by sum( !( v < 2000 ) )?
v <- rnorm( n = 15, mean = 2000, sd = 300 )
v
[1] 1916.076 1740.645 1586.926 1797.370 1883.537 2220.240 1719.392 2464.522 1889.610 1670.940 2368.731 2298.471 2415.978 2179.834 1944.372
v < 2000 # TRUE corresponds to elements of vector v SMALLER THAN 2000
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE
v[ v < 2000 ] # selected elements of v smaller than 2000
[1] 1916.076 1740.645 1586.926 1797.370 1883.537 1719.392 1889.610 1670.940 1944.372
sum( v < 2000 ) # number of elements in vector v smaller than 2000
[1] 9
sum( !( v < 2000 ) ) # number of elements in vector v GREATER THAN OR EQUAL TO 2000
[1] 6
sum( v >= 2000 ) # same as above
[1] 6
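A few related tools work on the same logical vectors. which() returns the positions of the TRUE elements. mean() of a logical vector gives the proportion of TRUE values. The & operator combines two conditions. So that the results are repeatable, this sketch copies the 15 values printed above into v instead of drawing new random numbers. The 1800 to 2200 range is an arbitrary example.

```r
v <- c( 1916.076, 1740.645, 1586.926, 1797.370, 1883.537, 2220.240, 1719.392,
        2464.522, 1889.610, 1670.940, 2368.731, 2298.471, 2415.978, 2179.834, 1944.372 )
which( v < 2000 )              # positions of elements below 2000: 1 2 3 4 5 7 9 10 15
mean( v < 2000 )               # proportion below 2000: 9/15 = 0.6
v[ v >= 1800 & v < 2200 ]      # & keeps elements where BOTH conditions are TRUE
sum( v >= 1800 & v < 2200 )    # how many intakes fall in [1800, 2200): 5
```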
- Head and tail.
You often need a quick look at the beginning (head) or the end (tail) of a vector.
Use these functions to show the first 5 and the last 7 elements of a randomly generated vector v <- rnorm( 20 ).
v <- rnorm( 20 )
v
[1] -0.07448741 1.78763658 -0.24168726 0.22382714 0.90324773 0.85413009 0.98366668 0.34567395 0.56134277 -1.28184912 -0.29374576 1.90902040 -2.21578592
[14] -0.05379028 -0.49776397 1.13799609 -0.27835452 -1.50099517 0.11999368 -0.05882277
head( v, 5 )
[1] -0.07448741 1.78763658 -0.24168726 0.22382714 0.90324773
tail( v, 7 )
[1] -0.05379028 -0.49776397 1.13799609 -0.27835452 -1.50099517 0.11999368 -0.05882277
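head and tail are shortcuts for indexing with square brackets. They also accept a negative n, which drops elements instead of keeping them. This sketch uses v <- 1:20 instead of random numbers so that the positions are easy to see.

```r
v <- 1:20                              # simple vector so positions are obvious
head( v, 5 )                           # 1 2 3 4 5
v[ 1:5 ]                               # the same with indexing
tail( v, 7 )                           # 14 15 16 17 18 19 20
v[ ( length( v ) - 6 ):length( v ) ]   # the same with indexing
head( v, -17 )                         # negative n: all but the last 17 -> 1 2 3
tail( v, -17 )                         # negative n: all but the first 17 -> 18 19 20
```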
- Elements of a vector.
Let’s assume that eight persons had caloric intakes of 2122, 2616, NA, 1771, 2314, 2178, 1922 and 2004 kcal.
Make a vector intakesKCal of these eight values (in the given order).
Use square brackets to get the 4th element of intakesKCal.
Use square brackets and the colon operator (:) to get the elements from the second to the fifth (inclusive).
Define another vector poses with the values 1, 3, 5, 7 and use it to get the 1st, 3rd, 5th and 7th elements of intakesKCal.
Finally, get the 1st, 3rd, 5th and 7th elements of intakesKCal by typing the numbers directly inside [...] (without the extra poses variable).
intakesKCal <- c( 2122, 2616, NA, 1771, 2314, 2178, 1922, 2004 )
intakesKCal
[1] 2122 2616 NA 1771 2314 2178 1922 2004
intakesKCal[ 4 ]
[1] 1771
intakesKCal[ 2:5 ]
[1] 2616 NA 1771 2314
poses <- c( 1, 3, 5, 7 )
intakesKCal[ poses ]
[1] 2122 NA 2314 1922
intakesKCal[ c( 1, 3, 5, 7 ) ]
[1] 2122 NA 2314 1922
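There are other ways to build and use position vectors. seq() generates regularly spaced positions without typing them. A negative index removes elements instead of selecting them. A position beyond the end of the vector returns NA. The following sketch uses the same eight intakes.

```r
intakesKCal <- c( 2122, 2616, NA, 1771, 2314, 2178, 1922, 2004 )
seq( 1, 7, by = 2 )                  # 1 3 5 7: positions generated instead of typed
intakesKCal[ seq( 1, 7, by = 2 ) ]   # 2122 NA 2314 1922, same as intakesKCal[ c( 1, 3, 5, 7 ) ]
intakesKCal[ -4 ]                    # negative index: everything except the 4th element
intakesKCal[ -c( 1, 3, 5, 7 ) ]      # drop the 1st, 3rd, 5th and 7th: 2616 1771 2178 2004
intakesKCal[ 10 ]                    # position beyond the length returns NA
```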