uncertainty-representation.Rmd
This vignette describes how to use the vmc package to specify the visual representations used to describe the distribution of model predictions and observed data. We developed vmc to support visual representations of three types: area/extent, visual variable, and countable icon. For a more general introduction to vmc and its use in a standard model check workflow, see vignette("vmc").
The following libraries are required to run this vignette:
library(dplyr)
library(purrr)
library(vmc)
library(ggplot2)
library(ggdist)
library(cowplot)
library(rstan)
library(brms)
library(gganimate)
library(beeswarm)
theme_set(theme_tidybayes() + panel_border())
These options help Stan run faster:
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
We use the built-in model mpg_model, a brmsfit object fitted with the Gaussian model family and push-forward transformations mu and sigma, to demonstrate the visual representations that vmc defines for showing the distribution of model predictions and observed data.
mpg_model
#>  Family: gaussian
#>   Links: mu = identity; sigma = log
#> Formula: mpg ~ disp + vs + am
#>          sigma ~ vs + am
#>    Data: mtcars (Number of observations: 32)
#>   Draws: 4 chains, each with iter = 6000; warmup = 3000; thin = 1;
#>          total post-warmup draws = 12000
#>
#> Regression Coefficients:
#>                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept          23.25      2.87    17.57    28.92 1.00     6198     6420
#> sigma_Intercept     0.87      0.20     0.50     1.30 1.00    10336     7471
#> disp               -0.02      0.01    -0.04    -0.01 1.00     6404     6672
#> vs                  2.74      1.74    -0.73     6.14 1.00     6770     6991
#> am                  2.74      1.81    -0.76     6.36 1.00     6100     7119
#> sigma_vs            0.27      0.34    -0.38     0.95 1.00     6544     7801
#> sigma_am            0.34      0.36    -0.38     1.03 1.00     6925     7174
#>
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
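The summary reports an identity link for mu and a log link for sigma. As a rough illustration of what those links imply, the sketch below plugs the posterior mean estimates from the table into both linear predictors and simulates one set of predictions for mtcars. It uses base R only, and the point estimates stand in for a real posterior draw, which is a simplification. The names b_mu, b_sigma, and y_sim are introduced here for illustration and are not part of vmc or brms.

```r
# Posterior mean estimates copied from the summary table above.
# Using point estimates instead of posterior draws is a simplification.
b_mu    <- c(Intercept = 23.25, disp = -0.02, vs = 2.74, am = 2.74)
b_sigma <- c(Intercept = 0.87, vs = 0.27, am = 0.34)

# Identity link: mu is the linear predictor itself.
mu <- with(mtcars, b_mu[["Intercept"]] + b_mu[["disp"]] * disp +
                   b_mu[["vs"]] * vs + b_mu[["am"]] * am)

# Log link: sigma is the exponential of its linear predictor.
sigma <- with(mtcars, exp(b_sigma[["Intercept"]] + b_sigma[["vs"]] * vs +
                          b_sigma[["am"]] * am))

# One simulated outcome per car, drawn from Normal(mu, sigma).
set.seed(1)
y_sim <- rnorm(nrow(mtcars), mean = mu, sd = sigma)

head(data.frame(mpg = mtcars$mpg, mu = mu, sigma = sigma, y_sim = y_sim))
```

Because sigma depends on vs and am, the simulated spread differs between groups of cars, which is the kind of pattern the conditional plots later in this vignette are meant to reveal.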
vmc supports the following visual representations that encode uncertainty using area or extent. Each function for model predictions is listed with its counterpart for observed data in parentheses:

- mc_model_slab() (mc_obs_slab())
- mc_model_ccdf() (mc_obs_ccdf())
- mc_model_cdf() (mc_obs_cdf())
- mc_model_eye() (mc_obs_eye())
- mc_model_halfeye() (mc_obs_halfeye())
- mc_model_histogram() (mc_obs_histogram())
- mc_model_pointinterval() (mc_obs_pointinterval())
- mc_model_interval() (mc_obs_interval())
- mc_model_lineribbon() (mc_obs_lineribbon())
- mc_model_ribbon() (mc_obs_ribbon())
Examples:
mpg_model %>%
mcplot() +
mc_model_slab(alpha = .5) +
mc_obs_slab(alpha = .5) +
mc_gglayer(coord_flip())
mpg_model %>%
mcplot() +
mc_model_slab(alpha = .5) +
mc_obs_slab(alpha = .5) +
mc_condition_on(x = vars(vs)) +
mc_gglayer(coord_flip())
mpg_model %>%
mcplot() +
mc_model_slab(alpha = .5) +
mc_obs_interval(alpha = .5) +
mc_condition_on(x = vars(vs)) +
mc_gglayer(coord_flip())
#> Warning: Duplicated aesthetics after name standardisation: alpha
mpg_model %>%
mcplot() +
mc_model_lineribbon() +
mc_obs_pointinterval() +
mc_condition_on(x = vars(disp))
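To make the difference between the slab, CDF, and CCDF representations concrete, here is a minimal base R sketch that computes the three summaries for the observed mpg values. It only illustrates the quantities being drawn and is not how vmc builds its layers; the variable names are chosen for this example.

```r
mpg <- mtcars$mpg

# Slab: a kernel density estimate, drawn as a filled area.
dens <- density(mpg)

# CDF: proportion of observations at or below each value.
cdf_fun <- ecdf(mpg)
grid    <- seq(min(mpg), max(mpg), length.out = 50)
cdf     <- cdf_fun(grid)

# CCDF: the complement, i.e. the proportion of observations above each value.
ccdf <- 1 - cdf

# Draw the three summaries side by side.
op <- par(mfrow = c(1, 3))
plot(dens, main = "slab (density)")
plot(grid, cdf, type = "s", main = "CDF")
plot(grid, ccdf, type = "s", main = "CCDF")
par(op)
```

All three panels encode the same distribution through area or height, which is why they belong to the area/extent family.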
vmc supports the following visual representations that encode uncertainty using different visual variables:

- mc_model_gradientinterval() (mc_obs_gradientinterval())
- mc_model_interval() (mc_obs_interval())
- mc_model_lineribbon() (mc_obs_lineribbon())
- mc_model_ribbon() (mc_obs_ribbon())
- mc_model_tile() (mc_obs_tile())
Examples:
mpg_model %>%
mcplot() +
mc_model_gradientinterval() +
mc_obs_point(shape = '|', size = 10) +
mc_condition_on(x = vars(vs)) +
mc_gglayer(coord_flip())
mpg_model %>%
mcplot() +
mc_model_gradientinterval() +
mc_obs_gradientinterval() +
mc_condition_on(x = vars(vs)) +
mc_layout_nested() +
mc_gglayer(coord_flip())
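Interval-style representations typically map the width of nested central intervals to a visual variable such as color or opacity. The sketch below computes such nested intervals in base R. It assumes central quantile intervals, and the widths of 50%, 80%, and 95% are picked for illustration; random numbers stand in for model predictions, and none of this is taken from the vmc source.

```r
set.seed(1)
# Stand-in for model predictions; a real check would use posterior predictive draws.
pred <- rnorm(4000, mean = 20, sd = 6)

# Interval widths chosen for illustration.
widths <- c(0.50, 0.80, 0.95)

# Central interval for each width: cut (1 - w) / 2 from each tail.
bounds <- t(sapply(widths, function(w) {
  quantile(pred, probs = c((1 - w) / 2, 1 - (1 - w) / 2), names = FALSE)
}))

intervals <- data.frame(width = widths, lower = bounds[, 1], upper = bounds[, 2])
intervals
```

Each wider interval contains the narrower ones, so drawing them in progressively lighter shades gives the banded look of an interval or ribbon plot.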
vmc supports the following visual representations that encode uncertainty using countable icons:

- mc_model_dots() (mc_obs_dots())
- mc_model_dotsinterval() (mc_obs_dotsinterval())
- mc_model_point() (mc_obs_point())
- mc_model_line() (mc_obs_line())
Examples:
mpg_model %>%
mcplot() +
mc_model_dots(n_sample = 100) +
mc_obs_dots() +
mc_gglayer(coord_flip())
mpg_model %>%
mcplot() +
mc_model_slab() +
mc_obs_interval() +
mc_obs_dots(side = "left") +
mc_condition_on(x = vars(vs)) +
mc_gglayer(coord_flip())
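The idea behind countable icons is that each dot stands for one outcome, so the reader can tally frequencies. The base R sketch below mimics this with 100 sampled values, on the assumption that n_sample = 100 in the first example plays a similar role by limiting the number of predictions drawn as dots. Random numbers stand in for model predictions, and the bin width of 2 is arbitrary.

```r
set.seed(1)
# Stand-in for model predictions (random numbers, not real posterior draws).
pred <- rnorm(4000, mean = 20, sd = 6)

# Keep 100 predictions so that each dot stands for one sampled outcome.
dots <- sample(pred, size = 100)

# Bin the dots; the height of each stack is a count the reader can tally.
breaks <- seq(floor(min(dots)), ceiling(max(dots)) + 2, by = 2)
counts <- table(cut(dots, breaks = breaks, include.lowest = TRUE))
counts

# A stacked dot plot of the same 100 values, rounded to whole numbers.
stripchart(round(dots), method = "stack", pch = 16, offset = 0.5)
```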
If vmc does not directly support the visual representation you need, you can use mc_model_custom() and mc_obs_custom() to pass in a geom or stat function that specifies the visual representation of model predictions and observed data. For example, we could use mc_model_custom(stat_boxplot) to specify a box plot for model predictions and mc_obs_custom(geom_swarm) to specify a swarm plot for observed data.
mpg_model %>%
mcplot() +
mc_model_custom(stat_boxplot, notch = TRUE) +
mc_obs_custom(geom_swarm) +
mc_condition_on(x = vars(vs)) +
mc_gglayer(coord_flip())
vmc can show uncertainty information grouped by two sources: samples and input data points. Take the slab representation as an example. In the first example of this vignette, we collapsed all samples for all input data points into a single distribution.
mpg_model %>%
mcplot() +
mc_model_slab(group_sample = "collapse") +
mc_obs_slab() +
mc_gglayer(coord_flip()) +
mc_layout_juxtaposition()
You can also group by sample when visualizing with slabs (i.e., draw an individual slab for each sample).
mpg_model %>%
mcplot() +
mc_model_slab(group_sample = "group", alpha = .2) +
mc_obs_slab() +
mc_gglayer(coord_flip()) +
mc_layout_juxtaposition()
Or you can group by input data point.
mpg_model %>%
mcplot() +
mc_model_slab(group_sample = "group", group_on = "row", alpha = .2) +
mc_obs_slab() +
mc_gglayer(coord_flip()) +
mc_layout_juxtaposition()
#> Warning in layer_slabinterval(data = data, mapping = mapping, stat = StatSlab,
#> : Ignoring unknown parameters: `group_on`
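The three grouping choices can be pictured as different ways of slicing a matrix of predictions that has one row per sample and one column per input data point. The base R sketch below is a simplified picture under that assumption and does not reflect vmc internals; the matrix pred is filled with random numbers that stand in for real predictions.

```r
set.seed(1)
n_draws  <- 50
n_points <- nrow(mtcars)

# Hypothetical prediction matrix: one row per sample (draw),
# one column per input data point.
pred <- matrix(rnorm(n_draws * n_points, mean = 20, sd = 6),
               nrow = n_draws, ncol = n_points)

# Collapse: pool every prediction into a single distribution.
collapsed <- density(as.vector(pred))

# Group by sample: one distribution per draw, taken over all data points.
by_sample <- lapply(seq_len(n_draws), function(i) density(pred[i, ]))

# Group by input data point: one distribution per data point, taken over all draws.
by_point <- lapply(seq_len(n_points), function(j) density(pred[, j]))

length(by_sample)  # one density per draw
length(by_point)   # one density per input data point
```

Collapsing gives the smoothest single curve, while grouping exposes variation between draws or between data points at the cost of a busier plot.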
Instead of overlaying all samples or rows, you can use hypothetical outcome plots (HOPs) to show each one in its own animation frame.
mpg_model %>%
mcplot() +
mc_draw(ndraws = 50) +
mc_model_slab(group_sample = "hops") +
mc_obs_slab(alpha = .5) +
mc_gglayer(coord_flip())
#> `nframes` and `fps` adjusted to match transition
You can also aggregate the model predictions across samples or input data points with a function.
mpg_model %>%
mcplot() +
mc_model_slab(group_sample = mean) +
mc_obs_slab(alpha = .5) +
mc_gglayer(coord_flip())
mpg_model %>%
mcplot() +
mc_observation_transformation(mean) +
mc_model_slab(group_sample = mean, group_on = "row") +
mc_obs_reference_line() +
mc_gglayer(coord_flip())
#> Warning in layer_slabinterval(data = data, mapping = mapping, stat = StatSlab,
#> : Ignoring unknown parameters: `group_on`
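As a rough picture of aggregation, the base R sketch below applies mean along each dimension of a hypothetical prediction matrix (samples in rows, input data points in columns). Which direction each vmc argument selects is our reading of the examples above and not something confirmed from the package source; random numbers stand in for model predictions.

```r
set.seed(1)
# Hypothetical prediction matrix: 50 samples (rows) by 32 data points (columns).
pred <- matrix(rnorm(50 * nrow(mtcars), mean = 20, sd = 6), nrow = 50)

# Aggregate over samples: one mean prediction per input data point.
mean_per_point <- colMeans(pred)

# Aggregate over input data points: one mean per sample. Their spread shows
# the model's uncertainty about the average outcome.
mean_per_sample <- rowMeans(pred)

# The matching observed statistic, usable as a reference line.
obs_mean <- mean(mtcars$mpg)

# Share of samples whose mean is at least as large as the observed mean.
mean(mean_per_sample >= obs_mean)
```

Comparing the distribution of mean_per_sample with obs_mean is the same idea as the last example, where the observed data are reduced to their mean and drawn as a reference line against a slab of aggregated predictions.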