Declaring conditions in the multiverse
Abhraneel Sarma
2024-10-06
Source:vignettes/conditions.Rmd
library(knitr)
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(gganimate)
library(cowplot)
library(multiverse)
The data
We will be using the same data as in the vignette “A complete implementation of a multiverse analysis” (see vignette("complete-multiverse-analysis")).
data("durante")
df_durante <- durante |>
mutate(
Abortion = abs(7 - Abortion) + 1,
StemCell = abs(7 - StemCell) + 1,
Marijuana = abs(7 - Marijuana) + 1,
RichTax = abs(7 - RichTax) + 1,
StLiving = abs(7 - StLiving) + 1,
Profit = abs(7 - Profit) + 1,
FiscConsComp = FreeMarket + PrivSocialSec + RichTax + StLiving + Profit,
SocConsComp = Marriage + RestrictAbortion + Abortion + StemCell + Marijuana
)
Specifying conditions in the multiverse analysis
In a multiverse analysis, the value of one variable may depend on the value of another variable defined previously. In our example, we exclude participants based on their cycle length. This can be done in two ways: we can use the values of either ComputedCycleLength or ReportedCycleLength. If we use ComputedCycleLength to exclude participants, then we should not calculate the variable NextMenstrualOnset (the date of onset of the next menstrual cycle) using the ReportedCycleLength value. Similarly, if we use ReportedCycleLength to exclude participants, it is inconsistent to calculate NextMenstrualOnset using ComputedCycleLength.
We should be able to express these conditions in the multiverse. We can do this in two ways:
1. %when%: when declaring a branch, we can use this operator to specify the conditional as A %when% B. The conditional is also referred to as the connective A ⟹ B. This has the meaning “if A is true, then B is also true” and is an abbreviation for ¬A ∨ B.
2. branch_assert: this function allows the user to specify any condition in the form of a logical operation.
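The reading of a conditional as “if A is true, then B is also true” can be checked with a small truth table. The following is a minimal base R sketch, independent of the multiverse package; the names truth, implies and not_A_or_B are made up for this illustration.

```r
# Truth table for two logical propositions A and B.
truth <- expand.grid(A = c(TRUE, FALSE), B = c(TRUE, FALSE))

# "If A then B" fails only when A is TRUE and B is FALSE.
truth$implies <- !(truth$A & !truth$B)

# The same statement written as a plain logical operation: (not A) or B.
truth$not_A_or_B <- !truth$A | truth$B

print(truth)

# TRUE if the two formulations agree on every row of the table.
all_equal <- identical(truth$implies, truth$not_A_or_B)
print(all_equal)
```

The second formulation is the shape a condition takes when it is written as an explicit logical operation rather than with %when%.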
The %when% operator
There are two ways in which you can specify the %when% operator. The first is to specify it at the end of the branch. This will work even if you omit the branch option name.
df <- df |>
mutate(NextMenstrualOnset = branch(menstrual_calculation,
"mc_option1" ~ (StartDateofLastPeriod + ComputedCycleLength) %when% (cycle_length != "cl_option3"),
"mc_option2" ~ (StartDateofLastPeriod + ReportedCycleLength) %when% (cycle_length != "cl_option2"),
"mc_option3" ~ StartDateNext)
)
The other is to specify it at the head of the branch, right after the option name:
df |>
mutate(NextMenstrualOnset = branch(menstrual_calculation,
"mc_option1" %when% (cycle_length != "cl_option3") ~ StartDateofLastPeriod + ComputedCycleLength,
"mc_option2" %when% (cycle_length != "cl_option2") ~ (StartDateofLastPeriod + ReportedCycleLength),
"mc_option3" ~ StartDateNext)
)
Note: In this example we use the inside() syntax to enter code into the multiverse instead of multiverse code blocks. This highlights the syntax of the conditional declaration, because a multiverse code block shows the code for a single universe and hides the code actually declared by the user.
We can write the complete analysis by specifying the condition with the %when% operator as:
M = multiverse()
inside(M, {
df.1 <- df_durante |>
mutate( ComputedCycleLength = StartDateofLastPeriod - StartDateofPeriodBeforeLast ) |>
dplyr::filter( branch(cycle_length,
"cl_option1" ~ TRUE,
"cl_option2" ~ ComputedCycleLength > 25 & ComputedCycleLength < 35,
"cl_option3" ~ ReportedCycleLength > 25 & ReportedCycleLength < 35
)) |>
mutate(NextMenstrualOnset = branch(menstrual_calculation,
"mc_option1" %when% (cycle_length != "cl_option3") ~ StartDateofLastPeriod + ComputedCycleLength,
"mc_option2" %when% (cycle_length != "cl_option2") ~ StartDateofLastPeriod + ReportedCycleLength,
"mc_option3" ~ StartDateNext)
)
})
In multiverse code chunks, conditionals can be declared in much the same way:
```{multiverse default-m-1, inside = M}
df <- df_durante |>
mutate( ComputedCycleLength = StartDateofLastPeriod - StartDateofPeriodBeforeLast ) |>
dplyr::filter( branch(cycle_length,
"cl_option1" ~ TRUE,
"cl_option2" ~ ComputedCycleLength > 25 & ComputedCycleLength < 35,
"cl_option3" ~ ReportedCycleLength > 25 & ReportedCycleLength < 35
)) |>
mutate(NextMenstrualOnset = branch(menstrual_calculation,
"mc_option1" %when% (cycle_length != "cl_option3") ~ StartDateofLastPeriod + ComputedCycleLength,
"mc_option2" %when% (cycle_length != "cl_option2") ~ StartDateofLastPeriod + ReportedCycleLength,
"mc_option3" ~ StartDateNext)
)
```
Note: in this vignette we used the script-oriented inside() function for implementing the multiverse. However, we can implement the exact same multiverse in RMarkdown using the multiverse-code-block for more interactive programming. To implement this using a multiverse-code-block, we can simply place the code passed into the inside() function (the second argument) inside a code block of type multiverse, provide it with the appropriate labels and multiverse object, and execute it. See (multiverse-in-rmd) and (branch) for more details and examples.
As the condition implies, the parameter menstrual_calculation (which is used for calculating the variable NextMenstrualOnset) cannot take the value “mc_option1” when cycle_length takes the value “cl_option3”. Similarly, menstrual_calculation cannot take the value “mc_option2” when cycle_length takes the value “cl_option2”. In the multiverse table below, you can see that those two parameter combinations are absent, leaving 7 of the 9 possible combinations.
expand(M)
## # A tibble: 7 × 7
## .universe cycle_length menstrual_calculation .parameter_assignment
## <int> <chr> <chr> <list>
## 1 1 cl_option1 mc_option1 <named list [2]>
## 2 2 cl_option1 mc_option2 <named list [2]>
## 3 3 cl_option1 mc_option3 <named list [2]>
## 4 4 cl_option2 mc_option1 <named list [2]>
## 5 5 cl_option2 mc_option3 <named list [2]>
## 6 6 cl_option3 mc_option2 <named list [2]>
## 7 7 cl_option3 mc_option3 <named list [2]>
## # ℹ 3 more variables: .code <list>, .results <list>, .errors <list>
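To see why exactly seven rows remain, the same filtering can be mimicked outside the multiverse. This is only a base R sketch of the logic, not how the multiverse package builds its table; the object names grid, keep, consistent and dropped are assumptions made for the illustration.

```r
# All 3 x 3 combinations of the two parameters, before any condition is applied.
# expand.grid() varies its first argument fastest, so cycle_length changes slowest.
grid <- expand.grid(
  menstrual_calculation = c("mc_option1", "mc_option2", "mc_option3"),
  cycle_length = c("cl_option1", "cl_option2", "cl_option3"),
  stringsAsFactors = FALSE
)

# The two conditions, each written as "(not A) or B":
#   mc_option1 is only allowed when cycle_length != "cl_option3"
#   mc_option2 is only allowed when cycle_length != "cl_option2"
keep <- (grid$menstrual_calculation != "mc_option1" | grid$cycle_length != "cl_option3") &
  (grid$menstrual_calculation != "mc_option2" | grid$cycle_length != "cl_option2")

consistent <- grid[keep, c("cycle_length", "menstrual_calculation")]
dropped <- grid[!keep, c("cycle_length", "menstrual_calculation")]

print(nrow(consistent))  # number of combinations that satisfy both conditions
print(dropped)           # the combinations ruled out by the conditions
```

The dropped rows are the pairs (cl_option2, mc_option2) and (cl_option3, mc_option1), which are the ones missing from the table above.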
The branch_assert() function
The same can be performed with the branch_assert() function. The benefit of using this is that, within the branch_assert() function, the user can specify any logical operation. For example, the first of the conditions above can be specified as:
branch_assert((menstrual_calculation != "mc_option1" | cycle_length != "cl_option3"))
Both these operations have the same result, but the first may not be as easily interpretable. We specify the conditionals using the branch_assert() function in our example as:
df |>
branch_assert( (menstrual_calculation != "mc_option1" | (cycle_length != "cl_option3")) ) |>
branch_assert( (menstrual_calculation != "mc_option2" | (cycle_length != "cl_option2")) )
Using the branch_assert() function, we can perform the exact same analysis:
M = multiverse()
inside(M, {
df.2 <- df_durante |>
mutate( ComputedCycleLength = StartDateofLastPeriod - StartDateofPeriodBeforeLast ) |>
filter( branch(cycle_length,
"cl_option1" ~ TRUE,
"cl_option2" ~ ComputedCycleLength > 25 & ComputedCycleLength < 35,
"cl_option3" ~ ReportedCycleLength > 25 & ReportedCycleLength < 35
)) |>
mutate(NextMenstrualOnset = branch(menstrual_calculation,
"mc_option1" ~ StartDateofLastPeriod + ComputedCycleLength,
"mc_option2" ~ StartDateofLastPeriod + ReportedCycleLength,
"mc_option3" ~ StartDateNext)
) |>
branch_assert( (menstrual_calculation != "mc_option1" | (cycle_length != "cl_option3")) ) |>
branch_assert( (menstrual_calculation != "mc_option2" | (cycle_length != "cl_option2")) )
})
## Warning in FUN(X[[i]], ...): error in universe 1
##
## Error in FUN(X[[i]], ...) : object 'df.2' not found
## base::tryCatch -> tryCatchOne -> tryCatchList -> base::withCallingHandlers -> base::saveRDS -> base::do.call -> -> -> rmarkdown::render -> knitr::knit -> process_file -> xfun:::handle_error -> withCallingHandlers -> process_group -> call_block -> block_exec -> eng_r -> in_input_dir -> in_dir -> evaluate -> evaluate::evaluate -> withRestarts -> withRestartList -> withOneRestart -> withRestartList -> withOneRestart -> withRestartList -> withOneRestart -> with_handlers -> eval -> eval -> withCallingHandlers -> withVisible -> eval -> eval -> inside -> execute_universe -> lapply -> FUN -> tryStack -> lapply -> FUN -> FUN -> FUN(X[[i]], ...)
As we can see, this results in the same multiverse, from which the two incompatible parameter combinations are excluded.
expand(M)
## # A tibble: 7 × 7
## .universe cycle_length menstrual_calculation .parameter_assignment
## <int> <chr> <chr> <list>
## 1 1 cl_option1 mc_option1 <named list [2]>
## 2 2 cl_option1 mc_option2 <named list [2]>
## 3 3 cl_option1 mc_option3 <named list [2]>
## 4 4 cl_option2 mc_option1 <named list [2]>
## 5 5 cl_option2 mc_option3 <named list [2]>
## 6 6 cl_option3 mc_option2 <named list [2]>
## 7 7 cl_option3 mc_option3 <named list [2]>
## # ℹ 3 more variables: .code <list>, .results <list>, .errors <list>
Implementing conditions in a complete analysis
Specifying these conditions allows us to exclude inconsistent combinations from our analyses. Let’s update the example from the home page by including these conditions:
M = multiverse()
```{multiverse default-m-2, inside = M, echo = FALSE}
df <- df_durante |>
mutate( ComputedCycleLength = StartDateofLastPeriod - StartDateofPeriodBeforeLast ) |>
dplyr::filter( branch(cycle_length,
"cl_option1" ~ TRUE,
"cl_option2" ~ ComputedCycleLength > 25 & ComputedCycleLength < 35,
"cl_option3" ~ ReportedCycleLength > 25 & ReportedCycleLength < 35
)) |>
dplyr::filter( branch(certainty,
"cer_option1" ~ TRUE,
"cer_option2" ~ Sure1 > 6 | Sure2 > 6
)) |>
mutate(NextMenstrualOnset = branch(menstrual_calculation,
"mc_option1" %when% (cycle_length != "cl_option3") ~ StartDateofLastPeriod + ComputedCycleLength,
"mc_option2" %when% (cycle_length != "cl_option2") ~ StartDateofLastPeriod + ReportedCycleLength,
"mc_option3" ~ StartDateNext)
) |>
mutate(
CycleDay = 28 - (NextMenstrualOnset - DateTesting),
CycleDay = ifelse(CycleDay > 1 & CycleDay < 28, CycleDay, ifelse(CycleDay < 1, 1, 28))
) |>
mutate( Fertility = branch( fertile,
"fer_option1" ~ factor( ifelse(CycleDay >= 7 & CycleDay <= 14, "high", ifelse(CycleDay >= 17 & CycleDay <= 25, "low", NA)) ),
"fer_option2" ~ factor( ifelse(CycleDay >= 6 & CycleDay <= 14, "high", ifelse(CycleDay >= 17 & CycleDay <= 27, "low", NA)) ),
"fer_option3" ~ factor( ifelse(CycleDay >= 9 & CycleDay <= 17, "high", ifelse(CycleDay >= 18 & CycleDay <= 25, "low", NA)) ),
"fer_option4" ~ factor( ifelse(CycleDay >= 8 & CycleDay <= 14, "high", "low") ),
"fer_option45" ~ factor( ifelse(CycleDay >= 8 & CycleDay <= 17, "high", "low") )
)) |>
mutate(RelationshipStatus = branch(relationship_status,
"rs_option1" ~ factor(ifelse(Relationship==1 | Relationship==2, 'Single', 'Relationship')),
"rs_option2" ~ factor(ifelse(Relationship==1, 'Single', 'Relationship')),
"rs_option3" ~ factor(ifelse(Relationship==1, 'Single', ifelse(Relationship==3 | Relationship==4, 'Relationship', NA))) )
)
```
After excluding the inconsistent choice combinations, 210 choice combinations remain:
## [1] 210
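The count can be sanity-checked by hand without executing the multiverse. The sketch below is plain base R and assumes only the number of options declared for each parameter in the chunk above; the remaining three parameters are represented by integer indices because only their counts matter here, and the names grid, keep, n_all and n_consistent are hypothetical.

```r
# Every combination of the five parameters, ignoring the conditions.
grid <- expand.grid(
  cycle_length = c("cl_option1", "cl_option2", "cl_option3"),
  menstrual_calculation = c("mc_option1", "mc_option2", "mc_option3"),
  certainty = seq_len(2),            # 2 options
  fertile = seq_len(5),              # 5 options
  relationship_status = seq_len(3),  # 3 options
  stringsAsFactors = FALSE
)
n_all <- nrow(grid)  # 3 * 3 * 2 * 5 * 3 combinations

# Apply the two conditions declared on the menstrual_calculation branch.
keep <- (grid$menstrual_calculation != "mc_option1" | grid$cycle_length != "cl_option3") &
  (grid$menstrual_calculation != "mc_option2" | grid$cycle_length != "cl_option2")
n_consistent <- sum(keep)

print(c(all = n_all, consistent = n_consistent))
```

Of the 270 unconstrained combinations, the conditions remove 2 of the 9 (cycle_length, menstrual_calculation) pairs, so 7 × 2 × 5 × 3 = 210 remain.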
Now, we’ve created the complete multiverse that was presented as example #2 from Steegen et al.’s paper.