Primary exercises

Apply the following to the survey data:

  1. Select the personal information variables {name, age, gender, height} into a new tibble, survey_personal_info.

  2. Select the same personal information as in the previous exercise into a new tibble, survey_personal_info, but with the first letter of each variable name in uppercase, e.g. Name, Age, etc.

  3. Reorder the variables in the survey dataset such that name, age and gender appear as the first, second and third columns, followed by the remaining variables.

  4. Deselect the variables that relate to hand and/or arm (e.g. span1, span2, hand, etc.). See also the description of the survey data.

  5. Select the top 20 names along with gender.

  6. Reproduce the following tibbles (note that variables are renamed and reshuffled):

    6.1 First 5 observations.

    # A tibble: 5 × 13
      SPAN1 SPAN2 name   gender hand  fold    pulse clap    exercise smokes height m.i        age
      <dbl> <dbl> <chr>  <chr>  <chr> <chr>   <dbl> <chr>   <chr>    <chr>   <dbl> <chr>    <dbl>
    1  18.5  18   Alyson female right right      92 left    some     never    173  metric    18.2
    2  19.5  20.5 Todd   male   left  right     104 left    none     regul    178. imperial  17.6
    3  18    13.3 Gerald male   right left       87 neither none     occas     NA  <NA>      16.9
    4  18.8  18.9 Robert male   right right      NA neither none     never    160  metric    20.3
    5  20    20   Dustin male   right neither    35 right   some     never    165  metric    23.7

    6.2 Last 3 observations.

    # A tibble: 3 × 13
      Hand  Fold  Clap  name   gender span1 span2 pulse exercise smokes height m.i      age
      <chr> <chr> <chr> <chr>  <chr>  <dbl> <dbl> <dbl> <chr>    <chr>   <dbl> <chr>  <dbl>
    1 right right right Tracey female  17.5  16.5    NA some     never    170  metric  18.6
    2 right right right Keith  male    21    21.5    90 some     never    183  metric  17.2
    3 right right right Celina female  17.6  17.3    85 freq     never    168. metric  17.8

Extra exercises

  1. Rename the m.i variable to system.

  2. Select name along with all categorical variables into a new tibble survey_cats.

  3. Create a new tibble survey_nums with name and all numerical variables.

  4. For this exercise you’ll need an additional helper function, where(), explained here.

    4.1 Reproduce the result from the previous exercise (3) without typing out all the numerical variable names. Hint: you’ll also need the is.numeric function (see ?is.numeric for help).

    4.2 Select all non-numerical variables.
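where() takes a predicate function and selects every column for which that function returns TRUE. A minimal sketch, assuming a small made-up tibble (toy and its columns are hypothetical, not the survey data); it uses is.character so that the numerical case in 4.1 is left for you:

```r
library(dplyr)

# Hypothetical toy data (made-up columns and values) mixing
# character and numeric variables
toy <- tibble(
  name   = c("Ann", "Bob"),
  age    = c(30, 41),
  smoker = c("no", "yes"),
  height = c(168, 181)
)

# where() applies the predicate to each column and keeps the
# columns for which it returns TRUE: here, all character columns
toy_chars <- select(toy, where(is.character))
names(toy_chars)  # "name" "smoker"
```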

Selection by pattern matching

In data sets with a large number of variables, finding variables by name becomes tedious. Several helper functions are available to speed up the search for variable names.

starts_with(), ends_with() and contains()

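These functions match fixed patterns in variable names: starts_with() matches a prefix, ends_with() a suffix and contains() a fixed string anywhere in the name.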
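A minimal sketch of the three helpers, assuming a small made-up tibble, pulse_toy, loosely modelled on the pulse data used in the lecture (its columns and values are hypothetical). Note that by default these helpers ignore case (ignore.case = TRUE), so starts_with("a") would also match a column called Age.

```r
library(dplyr)

# Hypothetical toy data loosely modelled on the lecture's pulse data;
# the column names and values are made up for illustration.
pulse_toy <- tibble(
  age     = c(21, 19, 25),
  alcohol = c("yes", "no", "yes"),
  height  = c(180, 165, 172),
  weight  = c(75, 60, 68),
  pulse1  = c(64, 72, 70),
  pulse2  = c(88, 95, 90)
)

# starts_with(): names beginning with a fixed prefix
by_prefix <- select(pulse_toy, starts_with("a"))
names(by_prefix)     # "age" "alcohol"

# ends_with(): names ending with a fixed suffix
by_suffix <- select(pulse_toy, ends_with("ht"))
names(by_suffix)     # "height" "weight"

# contains(): names containing a fixed string anywhere
by_substring <- select(pulse_toy, contains("pulse"))
names(by_substring)  # "pulse1" "pulse2"
```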

The helper functions can be combined with the logical operators {!, |, &}, which will be explained later. You have already encountered one of them, the negation operator !, in the lecture on Useful R functions. In short, it takes the complement of a selection. For example, select(pulse, starts_with("a")) selects the variables whose names start with the character ‘a’, resulting in a tibble with the two variables age and alcohol. Putting ! in front of the helper function, as in select(pulse, !starts_with("a")), produces the complement of that result, namely all variables that do not start with ‘a’.
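A minimal sketch of negating a helper with !, assuming a small made-up tibble, pulse_toy, that stands in for the pulse data (its columns and values are hypothetical):

```r
library(dplyr)

# Hypothetical toy data (made-up columns and values, for illustration only)
pulse_toy <- tibble(
  age     = c(21, 19, 25),
  alcohol = c("yes", "no", "yes"),
  height  = c(180, 165, 172),
  weight  = c(75, 60, 68)
)

# Variables whose names start with "a"
with_a <- select(pulse_toy, starts_with("a"))
names(with_a)     # "age" "alcohol"

# ! negates the selection: every variable whose name does NOT start with "a"
without_a <- select(pulse_toy, !starts_with("a"))
names(without_a)  # "height" "weight"
```

The two selections never overlap, and together they cover every column of pulse_toy.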

Note that age and alcohol do not occur in the result.

There are several other helper functions that fall beyond the scope of this lecture; visit here for more details.

  5. Select variables from the survey data by pattern matching.

    5.1 Select variables that end with ‘e’.

    5.2 Select variables that start with ‘s’.

    5.3 Select hand span variables using a helper function.



Copyright © 2024 Biomedical Data Sciences (BDS) | LUMC