Vectors (introduction) (solutions)

Primary exercises

Dietary intakes. (Create a vector, use it in a calculation.)
Four patients had daily dietary intakes of 2314, 2178, 1922, 2004 kcal.
Make a vector intakesKCal of these four values.
What is the class of this vector?
Convert the values into kJ using 1 kcal = 4.184 kJ.

intakesKCal <- c( 2314, 2178, 1922, 2004 )
intakesKCal

[1] 2314 2178 1922 2004

class( intakesKCal )

[1] "numeric"

intakesKCal * 4.184

[1] 9681.776 9112.752 8041.648 8384.736
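Storing the conversion factor in a variable makes the arithmetic easier to reuse. The sketch below uses illustrative names (kcalToKJ and intakesKJ are not part of the exercise). It shows that the multiplication is applied element by element. It also shows that dividing by the same factor recovers the original kcal values:

```r
intakesKCal <- c( 2314, 2178, 1922, 2004 )
kcalToKJ <- 4.184                       # 1 kcal = 4.184 kJ
intakesKJ <- intakesKCal * kcalToKJ     # each element is multiplied separately
round( intakesKJ )                      # rounded to whole kJ: 9682 9113 8042 8385
all.equal( intakesKJ / kcalToKJ, intakesKCal )   # TRUE: converting back recovers kcal
```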

More dietary intakes. (Combining/appending/merging vectors.)
An additional set of intakes is provided: 2122, 2616, NA, 1771 kcal.
Use c() to append the new intakes after the values in intakesKCal and store the result in allIntakesKCal.
Print the combined vector and its length.

intakesKCal2 <- c( 2122, 2616, NA, 1771 )
allIntakesKCal <- c( intakesKCal, intakesKCal2 )
allIntakesKCal

[1] 2314 2178 1922 2004 2122 2616   NA 1771

length( allIntakesKCal )

[1] 8
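The order of the arguments to c() determines the order of the result. Base R also provides append(), whose after argument inserts values at a chosen position. The sketch below shows one way to use it:

```r
intakesKCal  <- c( 2314, 2178, 1922, 2004 )
intakesKCal2 <- c( 2122, 2616, NA, 1771 )
c( intakesKCal2, intakesKCal )                   # new intakes first
# 2122 2616 NA 1771 2314 2178 1922 2004
append( intakesKCal, intakesKCal2, after = 2 )   # inserted after the 2nd element
# 2314 2178 2122 2616 NA 1771 1922 2004
```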

The average and total intakes. (Calculating means and sums, skipping missing values.)
Calculate the mean intake for the patients in vector intakesKCal.
Next, calculate the mean intake for the patients in vector allIntakesKCal.
Can you explain the result?
Check help for ?mean, in particular the na.rm argument.
Use the extra argument na.rm=TRUE to calculate the mean of the non-NA elements of allIntakesKCal.
Check ?sum to find out how to omit NA elements from the sum.
Now, calculate the total sum of the allIntakesKCal intakes, ignoring the NA element.

mean( intakesKCal )

[1] 2104.5

mean( allIntakesKCal )

[1] NA

# since one element is missing, the mean is unknown
# ?mean, adding argument na.rm=TRUE will omit NA elements
mean( allIntakesKCal, na.rm = TRUE )

[1] 2132.429

# ?sum also allows na.rm=TRUE argument to skip NA elements
sum( allIntakesKCal, na.rm = TRUE )

[1] 14927
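With na.rm = TRUE, the mean is the sum of the available values divided by their count, so 14927 / 7 gives the same 2132.429. Other summary functions such as min and max accept the same na.rm argument. A short check:

```r
allIntakesKCal <- c( 2314, 2178, 1922, 2004, 2122, 2616, NA, 1771 )
max( allIntakesKCal )                     # NA: the maximum is unknown
max( allIntakesKCal, na.rm = TRUE )       # 2616
min( allIntakesKCal, na.rm = TRUE )       # 1771
sum( allIntakesKCal, na.rm = TRUE ) / 7   # 7 non-missing values; equals the mean above
```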

Selecting valid intakes. (Selecting non-missing elements; logical vectors.)
Understand the result of is.na( allIntakesKCal ).
Now, negate the above result with ! operator.
Use the above vectors as arguments to sum to count the missing and non-missing elements in allIntakesKCal.
Understand allIntakesKCal[ !is.na( allIntakesKCal ) ].

is.na( allIntakesKCal )         # TRUE marks positions with missing data

[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE

!is.na( allIntakesKCal )        # TRUE marks positions with available data

[1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE

sum( is.na( allIntakesKCal ) )                # number of missing elements

[1] 1

sum( !is.na( allIntakesKCal ) )               # number of non-missing elements

[1] 7

allIntakesKCal[ !is.na( allIntakesKCal ) ]    # keeps elements which are not NA

[1] 2314 2178 1922 2004 2122 2616 1771

sum( allIntakesKCal[ !is.na( allIntakesKCal ) ] )    # same as sum( allIntakesKCal, na.rm = TRUE )

[1] 14927
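Two related idioms can be sketched on the same vector. which() returns the positions of the TRUE elements. Because TRUE counts as 1 and FALSE as 0, mean() of a logical vector gives the proportion of TRUE values:

```r
allIntakesKCal <- c( 2314, 2178, 1922, 2004, 2122, 2616, NA, 1771 )
which( is.na( allIntakesKCal ) )   # 7: position of the missing value
mean( is.na( allIntakesKCal ) )    # 0.125: 1 missing out of 8
```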

Generating random kcal intakes. (Generating normally distributed random numbers; descriptive statistics.)
The code v <- rnorm( 10 ) samples 10 numbers from the normal distribution and stores them as a vector in v.
Print v. Then repeat v <- rnorm( 10 ) and print v again. Has v changed?
Next, read the manual for rnorm and find out how to generate random numbers with a given mean and standard deviation (sd).
Now, simulate kcal intakes in v by generating 15 random numbers with mean=2000 and sd=300.
Print v and find by eye the smallest and the largest of these numbers.
Use the functions min and max on v. Did you find the same numbers by eye?
Calculate the mean, median and the standard deviation (sd) of v.

v <- rnorm( 10 ) # a vector of random numbers
v

 [1] -0.08014027  0.02781421 -1.14643535 -1.37640016  1.04083668 -0.45170997 -0.09980696 -0.88568256 -0.17339554  0.92416339

v <- rnorm( 10 ) # another vector of random numbers
v

 [1]  1.0508364  0.9383004 -1.7972574 -0.4477479 -0.4582001 -0.1555337 -0.6572334  0.2961003  0.6409897  1.0179417

v <- rnorm( n = 15, mean = 2000, sd = 300 )
v

 [1] 1674.487 1820.581 2202.785 1740.477 1732.053 2495.922 2048.361 2057.338 2077.960 1728.177 2195.785 1620.888 2332.465 2460.538 1968.616

min( v )

[1] 1620.888

max( v )

[1] 2495.922

mean( v )    # is it close to 2000? try several random v vectors and see the effect of growing n

[1] 2010.429

median( v )

[1] 2048.361

sd( v )      # is it close to 300? try several random v vectors and see the effect of growing n

[1] 287.1024
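rnorm draws new numbers on every call, so results differ between runs. This is why v changed above. Calling set.seed() with the same value before sampling makes the draws reproducible. The seed 42 below is an arbitrary choice. A larger n also tends to bring the sample mean and sd closer to 2000 and 300:

```r
set.seed( 42 )
a <- rnorm( n = 15, mean = 2000, sd = 300 )
set.seed( 42 )
b <- rnorm( n = 15, mean = 2000, sd = 300 )
identical( a, b )        # TRUE: same seed, same numbers
range( a )               # smallest and largest value, like c( min( a ), max( a ) )

big <- rnorm( n = 100000, mean = 2000, sd = 300 )
mean( big )              # typically very close to 2000
sd( big )                # typically very close to 300
```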

Selecting and counting some kcal intakes. (Selecting elements by a condition; logical vectors.)
Simulate kcal intakes in v by generating 15 random numbers with mean=2000 and sd=300.
Type v < 2000 and understand the result.
How should the number produced by sum( v < 2000 ) be interpreted?
How should the number produced by sum( !( v < 2000 ) ) be interpreted?

v <- rnorm( n = 15, mean = 2000, sd = 300 )
v

 [1] 2011.623 2468.275 1659.662 2692.553 1652.411 1808.490 2355.260 2104.421 2339.715 2077.445 1653.405 1702.933 1802.469 2348.825 1552.633

v < 2000             # TRUE corresponds to elements of vector v SMALLER THAN 2000

 [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE

v[ v < 2000 ]        # selected elements of v smaller than 2000

[1] 1659.662 1652.411 1808.490 1653.405 1702.933 1802.469 1552.633

sum( v < 2000 )      # number of elements in vector v smaller than 2000

[1] 7

sum( !( v < 2000 ) ) # number of elements in vector v GREATER THAN OR EQUAL TO 2000

[1] 8

sum( v >= 2000 )     # same as above

[1] 8
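Conditions can be combined with & (and) or | (or). mean() of a logical vector gives the proportion of elements that meet the condition. The sketch below reuses the 15 values printed above, rounded to three decimals, so the results are fixed:

```r
v <- c( 2011.623, 2468.275, 1659.662, 2692.553, 1652.411, 1808.490, 2355.260, 2104.421,
        2339.715, 2077.445, 1653.405, 1702.933, 1802.469, 2348.825, 1552.633 )
sum( v >= 1700 & v < 2000 )   # 3 elements from 1700 (inclusive) up to 2000
v[ v < 1600 | v > 2600 ]      # 2692.553 1552.633: the extreme intakes
mean( v < 2000 )              # 0.4666667: 7 of 15 elements are below 2000
```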

Head and tail.
It is often useful to take a quick look at the beginning (head) or the end (tail) of a vector.
Use the functions head and tail to show the first 5 and the last 7 elements of a randomly generated vector v <- rnorm( 20 ).

v <- rnorm( 20 )
v

 [1] -0.10178879  1.87577347 -1.24796881 -0.22497185  0.94161188 -0.65941366 -0.23865700  0.88255762  0.66689656 -0.91123660  0.36221385
[12] -0.68092834  0.46454017  0.12073366 -0.58593021  1.25987244 -2.38299901  1.05236804 -0.35539605  0.03865801

head( v, 5 )

[1] -0.1017888  1.8757735 -1.2479688 -0.2249719  0.9416119

tail( v, 7 )

[1]  0.12073366 -0.58593021  1.25987244 -2.38299901  1.05236804 -0.35539605  0.03865801
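head and tail give the same result as indexing with square brackets. A negative n drops elements instead of keeping them. A sketch on the fixed vector 1:10:

```r
x <- 1:10
head( x, 3 )     # 1 2 3, same as x[ 1:3 ]
tail( x, 3 )     # 8 9 10, same as x[ ( length( x ) - 2 ):length( x ) ]
head( x, -3 )    # all but the last 3: 1 2 3 4 5 6 7
tail( x, -3 )    # all but the first 3: 4 5 6 7 8 9 10
```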

Elements of a vector.
Let’s assume that eight persons had caloric intakes of 2122, 2616, NA, 1771, 2314, 2178, 1922, 2004 kcal.
Make a vector intakesKCal of these eight values (in the given order).
Use the square brackets to get the 4th element of intakesKCal.
Use the square brackets and the colon operator (:) to get the elements from the second to the fifth (inclusive).
Define another vector poses with values 1, 3, 5, 7. Use it to get the 1st, 3rd, 5th and 7th elements of intakesKCal.
Finally, get the 1st, 3rd, 5th and 7th element of intakesKCal typing numbers directly inside [...] (without using an extra poses variable).

intakesKCal <- c( 2122, 2616, NA, 1771, 2314, 2178, 1922, 2004 )
intakesKCal

[1] 2122 2616   NA 1771 2314 2178 1922 2004

intakesKCal[ 4 ]

[1] 1771

intakesKCal[ 2:5 ]

[1] 2616   NA 1771 2314

poses <- c(1,3,5,7)
intakesKCal[ poses ]

[1] 2122   NA 2314 1922

intakesKCal[ c(1,3,5,7) ]

[1] 2122   NA 2314 1922
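A few more indexing forms can be sketched on the same vector. A negative index drops elements. seq() can generate the positions instead of typing them. length() gives the position of the last element:

```r
intakesKCal <- c( 2122, 2616, NA, 1771, 2314, 2178, 1922, 2004 )
intakesKCal[ -3 ]                       # everything except the 3rd (NA) element
intakesKCal[ seq( 1, 7, by = 2 ) ]      # positions 1, 3, 5, 7 as above
intakesKCal[ length( intakesKCal ) ]    # last element: 2004
```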

Extra exercises

Sequences of numbers.
Read the help (see the Help pane) for the seq function.
Use it to generate the sequence: 10, 7, 4, 1, -2, -5.
Understand the error message produced by seq( 10, -5, 3 ).

seq( 10, -5, -3 )

[1] 10  7  4  1 -2 -5

seq( from = 10, to = -5, by = -3 )

[1] 10  7  4  1 -2 -5
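seq can also take the number of elements via length.out instead of the step. seq( 10, -5, 3 ) fails because a positive step can never get from 10 down to -5. The sketch captures the error message with tryCatch so the script keeps running. The exact wording shown assumes an English locale:

```r
seq( 10, -5, length.out = 6 )     # 10 7 4 1 -2 -5: step chosen automatically
msg <- tryCatch( seq( 10, -5, 3 ), error = function( e ) conditionMessage( e ) )
msg                               # "wrong sign in 'by' argument" (English locale)
```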

Repetitions.
Read the help (see the Help pane) for the rep function.
Use it to generate the sequence: 0, 0, 1, 0, 0, 1, 0, 0, 1.

rep( c( 0, 0, 1 ), 3 )

[1] 0 0 1 0 0 1 0 0 1
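rep has further arguments. each repeats every element in place. times may also be a vector that gives a separate count for each element. A short sketch:

```r
rep( c( 0, 0, 1 ), times = 3 )                  # 0 0 1 0 0 1 0 0 1, as above
rep( c( 0, 1 ), times = c( 2, 1 ) )             # 0 0 1: two zeros, then one 1
rep( rep( c( 0, 1 ), times = c( 2, 1 ) ), 3 )   # the exercise sequence built from c( 0, 1 )
rep( 1:3, each = 2 )                            # 1 1 2 2 3 3
```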

Type conversion to a character vector.
Sometimes it is necessary to convert a numerical vector to a character vector.
Understand what the function as.character does for the argument 1:5.

1:5

[1] 1 2 3 4 5

as.character( 1:5 )

[1] "1" "2" "3" "4" "5"

class( 1:5 )

[1] "integer"

class( as.character( 1:5 ) )

[1] "character"
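The type affects operations such as sorting. Numbers sort by value, while text sorts character by character. A minimal sketch with arbitrary values:

```r
x <- c( 10, 9, 100 )
sort( x )                    # 9 10 100
sort( as.character( x ) )    # "10" "100" "9": compared as text
```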

Type conversion to a numerical vector.
Sometimes it is necessary to convert a character vector to a numerical vector.
Understand what the function as.numeric does for the argument c( "1", "-1", "x" ).
Note the warning message. Why is there an NA?

as.numeric( c( "1", "-1", "x" ) )

Warning: NAs introduced by coercion

[1]  1 -1 NA
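To find which text elements could not be converted, one approach is to convert once and silence the expected warning with suppressWarnings. Then select the positions that became NA. The element "1e3" is added here to show that scientific notation is understood:

```r
x <- c( "1", "-1", "x", "1e3" )
num <- suppressWarnings( as.numeric( x ) )
num                     # 1 -1 NA 1000
x[ is.na( num ) ]       # "x": the element that is not a number
```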

Naming vector elements.
It is possible to give names to vector elements.
Define ages <- c( Amy = 10, 'Dan' = 6, "Eve" = 11, "Eve 2" = 3, Grandma = NA ).
Print ages and understand names( ages ).
Use square brackets to access the age of Dan. Try the same for Eve 2.

ages <- c( Amy = 10, 'Dan' = 6, "Eve" = 11, "Eve 2" = 3, Grandma = NA )
ages

    Amy     Dan     Eve   Eve 2 Grandma 
     10       6      11       3      NA

names( ages )

[1] "Amy"     "Dan"     "Eve"     "Eve 2"   "Grandma"

ages[ 'Dan' ]

Dan 
  6

ages[ 'Eve 2' ]

Eve 2 
    3

# Another way to create a vector with named elements
ages2 <- c( 10, 6, 11, 3, NA )
names( ages2 ) <- c( "Amy", "Dan", "Eve", "Eve 2", "Grandma" )
ages2

    Amy     Dan     Eve   Eve 2 Grandma 
     10       6      11       3      NA
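Names also allow selecting several elements at once. unname() strips the names when only the values are needed. A short sketch:

```r
ages <- c( Amy = 10, Dan = 6, Eve = 11, "Eve 2" = 3, Grandma = NA )
ages[ c( "Amy", "Eve" ) ]        # two elements selected by name
names( ages )[ is.na( ages ) ]   # "Grandma": whose age is missing
unname( ages )                   # 10 6 11 3 NA without names
```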

(ADV) Write a text vector to a file and read it back.
This exercise demonstrates writing a single-column vector (multicolumn tables will be discussed later).
First choose a name for the file (e.g. test.txt) and store it in the variable fileName.
Next, create a character/text vector v with several text elements.
Check the manual for writeLines and try writeLines( v ) to see in the console what would be written to a file.
Now, set the argument con = fileName to write to the file.
Use readLines( con = fileName ) to read the file back and store the result in the variable w.
Understand identical( v, w ).

fileName <- "test.txt"
v <- c( "First line", "Second", "Third", "4th", "5th", "6th" )
v

[1] "First line" "Second"     "Third"      "4th"        "5th"        "6th"

writeLines( v )                     # writes to the console

First line
Second
Third
4th
5th
6th

writeLines( v, con = fileName )     # writes to a file
w <- readLines( con = fileName )
identical( v, w )   # checks whether v and w are exactly equal

[1] TRUE

unlink( fileName )  # removes the file
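Writing "test.txt" puts the file into the current working directory. The variant sketched below uses tempfile() to get a unique file name in the session's temporary directory. It then uses file.exists() to confirm that unlink() removed the file:

```r
fileName <- tempfile( fileext = ".txt" )   # unique path in the temporary directory
v <- c( "First line", "Second", "Third" )
writeLines( v, con = fileName )
w <- readLines( con = fileName )
identical( v, w )           # TRUE
unlink( fileName )
file.exists( fileName )     # FALSE: the file is gone
```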

(ADV) Write/read a numerical vector; problems.
In the previous exercise change v to be a vector of some numbers.
Use as.character to make writeLines work (do not change v).
Why does identical( v, w ) fail? Check class( v ) and class( w ).
What conversions of w would be needed to make identical work?

fileName <- "test.txt"
v <- sample( 1:100, 10 )
v

 [1]  73  57 100  59  18  52  34  13  48  11

writeLines( as.character( v ) )                 # conversion to character needed

73
57
100
59
18
52
34
13
48
11

writeLines( as.character( v ), con = fileName )
w <- readLines( con = fileName )
identical( v, w )     # numbers are not the same as their text representation

[1] FALSE

w <- as.numeric( w )
identical( v, w )     # still not identical: class(v) is "integer", class(w) is "numeric"

[1] FALSE

w <- as.integer( w )
identical( v, w )     # now identical

[1] TRUE

unlink( fileName )    # removes the file
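Two other ways to get an exact round trip are sketched below. Both assume that v is an integer vector, as produced by sample( 1:100, 10 ). scan() with what = integer() parses the numbers as integers while reading. saveRDS()/readRDS() store the R object itself, so its class is preserved without any conversion:

```r
fileName <- tempfile( fileext = ".txt" )
v <- c( 73L, 57L, 100L, 59L, 18L )          # integers, like sample( 1:100, 10 )
writeLines( as.character( v ), con = fileName )
w <- scan( fileName, what = integer(), quiet = TRUE )   # read as integers
identical( v, w )                           # TRUE

rdsFile <- tempfile( fileext = ".rds" )
saveRDS( v, file = rdsFile )                # stores the R object, class included
identical( v, readRDS( rdsFile ) )          # TRUE
unlink( c( fileName, rdsFile ) )            # removes both files
```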

(ADV) Merging data from corresponding vectors.
Let’s assume that we have data on the incomes and spending of several persons.
The data are provided in three vectors: nms, incomes and spendings (as shown below).
Each person is described by the corresponding elements of the three vectors.
Find a way to calculate:
- the balance (income minus spending) of each person;
- the name of the person with the largest balance;
- the balances sorted in descending order, together with the names of the persons in that order.
Hints: which.max, names, sort, decreasing.

nms <- c( "Amy", "Bob", "Carl", "Dany", "Ela", "Fred" )
incomes <- c( 1380, 2589, 1586, 2622, 2849, 2226 )
spendings <- c( 1198, 2111, 1224, 780, 3266, 2200 )

balance <- incomes - spendings
balance

[1]  182  478  362 1842 -417   26

max(balance)

[1] 1842

which.max(balance)

[1] 4

nms[which.max(balance)]

[1] "Dany"

names(balance) <- nms
balance

 Amy  Bob Carl Dany  Ela Fred 
 182  478  362 1842 -417   26

sort(balance)

 Ela Fred  Amy Carl  Bob Dany 
-417   26  182  362  478 1842

sort(balance, decreasing = TRUE)

Dany  Bob Carl  Amy Fred  Ela 
1842  478  362  182   26 -417

names(sort(balance, decreasing = TRUE))

[1] "Dany" "Bob"  "Carl" "Amy"  "Fred" "Ela"
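order() returns the positions that would sort a vector, so it can reorder nms directly without naming balance first. Logical selection then picks out the persons with a negative balance. A sketch using the same data:

```r
nms <- c( "Amy", "Bob", "Carl", "Dany", "Ela", "Fred" )
incomes <- c( 1380, 2589, 1586, 2622, 2849, 2226 )
spendings <- c( 1198, 2111, 1224, 780, 3266, 2200 )
balance <- incomes - spendings
order( balance, decreasing = TRUE )          # 4 2 3 1 6 5
nms[ order( balance, decreasing = TRUE ) ]   # "Dany" "Bob" "Carl" "Amy" "Fred" "Ela"
nms[ balance < 0 ]                           # "Ela": spending exceeds income
```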
