comparative-layout.Rmd
This vignette describes how to use the vmc package to specify a
comparative layout, which arranges the visual representations of model
predictions and observed data so that they are easier to compare. vmc
supports three types of comparative layout: juxtaposition,
superposition, and explicit encoding. For a more general introduction to
vmc and a demonstration of its use in a standard model check workflow,
see vignette("vmc").
The following libraries are required to run this vignette:
library(dplyr)
library(purrr)
library(vmc)
library(ggplot2)
library(ggdist)
library(cowplot)
library(rstan)
library(brms)
library(gganimate)
theme_set(theme_tidybayes() + panel_border())
These options help Stan run faster:
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
We use the built-in model mpg_model, a brmsfit object fitted with the
Gaussian model family and checkable push-forward transformations mu and
sigma, to demonstrate the different visual representations that vmc
defines to show the distributions of model predictions and observed
data.
mpg_model
#> Family: gaussian
#> Links: mu = identity; sigma = log
#> Formula: mpg ~ disp + vs + am
#> sigma ~ vs + am
#> Data: mtcars (Number of observations: 32)
#> Draws: 4 chains, each with iter = 6000; warmup = 3000; thin = 1;
#> total post-warmup draws = 12000
#>
#> Regression Coefficients:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept 23.25 2.87 17.57 28.92 1.00 6198 6420
#> sigma_Intercept 0.87 0.20 0.50 1.30 1.00 10336 7471
#> disp -0.02 0.01 -0.04 -0.01 1.00 6404 6672
#> vs 2.74 1.74 -0.73 6.14 1.00 6770 6991
#> am 2.74 1.81 -0.76 6.36 1.00 6100 7119
#> sigma_vs 0.27 0.34 -0.38 0.95 1.00 6544 7801
#> sigma_am 0.34 0.36 -0.38 1.03 1.00 6925 7174
#>
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
Juxtaposition puts the visual representations of model predictions
and observed data side by side. This comparative layout presents each
object (the visual representation of model predictions or of observed
data) separately, but requires the viewer to scan across the plots to
relate them. We specify the juxtaposition layout for a model check with
mc_layout_juxtaposition().
mpg_model %>%
  mcplot() +
  mc_layout_juxtaposition() +
  mc_gglayer(coord_flip())
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
#> No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
#> No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
#> No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
vmc also supports a variant of juxtaposition, nested juxtaposition,
which reduces the amount of scanning required of viewers by nesting the
juxtaposed representations within a single plot when the model check has
a discrete conditional variable. We specify this with
mc_layout_nested().
mpg_model %>%
  mcplot() +
  mc_condition_on(x = vars(vs)) +
  mc_model_auto(n_sample = 1) +
  mc_layout_nested()
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
Superposition overlays the objects (the visual representations of model
predictions and observed data) in a single coordinate system. vmc uses
superposition as the default when users do not specify a comparative
layout.
mpg_model %>%
  mcplot() +
  mc_layout_superposition() +
  mc_gglayer(coord_flip())
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
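To make the difference between the two layouts concrete outside of vmc, here is a minimal sketch in plain ggplot2. It is not how vmc builds its plots. The "predicted" values are simulated stand-ins rather than draws from mpg_model, and the names obs, pred, both, p_super, and p_juxta are introduced only for this illustration.

```r
library(ggplot2)

set.seed(1)

# Observed response values from mtcars.
obs <- data.frame(source = "observed", mpg = mtcars$mpg)

# Simulated stand-in for one draw of predictions (NOT output of mpg_model).
pred <- data.frame(
  source = "predicted",
  mpg = rnorm(nrow(mtcars), mean = mean(mtcars$mpg), sd = sd(mtcars$mpg))
)

both <- rbind(obs, pred)

# Superposition: both densities share a single coordinate system.
p_super <- ggplot(both, aes(x = mpg, colour = source)) +
  geom_density()

# Juxtaposition: the same data, one panel per source, placed side by side.
p_juxta <- p_super + facet_wrap(vars(source))
```

Printing p_super and p_juxta shows the trade-off described above: the overlaid version makes differences in shape directly visible, while the faceted version keeps each object uncluttered at the cost of scanning between panels.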
Explicit encoding computes the difference between model predictions and observed data and encodes that relationship directly in the plot. For example, a residual plot shows the difference between model predictions and observed data around a central line that represents perfect prediction. A quantile-quantile (QQ) plot computes the quantiles of the residuals to encode the comparison.
Explicit encoding also lets the user check a specific model feature directly, rather than relying on the viewer’s perception of the distributions of model predictions and observed data shown in the raw coordinates of the response and independent variables. For example, a residual plot conditional on the response variable supports a check of linearity, and a QQ plot supports a check of the normality of the residuals.
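The quantities behind these two encodings can be sketched in base R. This is an illustration under stated assumptions, not vmc's internal computation: an ordinary least-squares fit stands in for the mean structure of mpg_model, and the names fit, res, and qq are introduced only for this example.

```r
# Ordinary least-squares stand-in for the mean structure of mpg_model.
fit <- lm(mpg ~ disp + vs + am, data = mtcars)

# Residual encoding: prediction minus observation, to be compared
# against a reference line at zero.
res <- fitted(fit) - mtcars$mpg

# QQ encoding: sample quantiles of the standardized residuals paired with
# theoretical standard-normal quantiles, to be compared against y = x.
qq <- qqnorm(as.numeric(scale(res)), plot.it = FALSE)
head(data.frame(theoretical = qq$x, sample = qq$y))
```

In both cases the comparison is reduced to the distance from a reference line, so the viewer no longer has to estimate differences between two separately drawn distributions.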
Let’s check the linearity of the model by specifying a residual plot
conditional on the response variable mpg. We use two visual
representations for the model predictions, mc_model_lineribbon() and
mc_model_point(), to reveal the trend of the residuals while also
showing a set of raw data points.
mpg_model %>%
  mcplot() +
  mc_condition_on(x = vars(mpg)) +
  mc_layout_encoding("residual") +
  mc_model_lineribbon(alpha = .2) +
  mc_model_point(n_sample = 1) +
  mc_gglayer(geom_hline(yintercept = 0))
#> Warning: Duplicated aesthetics after name standardisation: alpha
Next, let’s check the normality of the residuals by drawing a QQ plot.
We can specify this with mc_layout_encoding(transform = "qq").
mpg_model %>%
  mcplot() +
  mc_layout_encoding("qq") +
  mc_model_point(n_sample = 1) +
  mc_gglayer(geom_abline())
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
We can also use vmc to specify a customized transformation function
that computes the comparison between model predictions and observed
data. The function takes as input a data frame generated from the
newdata data frame in mc_draw() (if newdata is not specified, the data
used to fit the model, which you can get with
insight::get_data(model)), with additional columns .row, .draw,
prediction, and observation (the last only if newdata has a column for
the response variable). The function should return a data frame in
which the variable shown on the y-axis is named y_axis and, optionally,
the variable shown on the x-axis is named x_axis.
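The input and output contract described above can be sketched with a small hand-made data frame. Everything here is hypothetical: toy mimics the shape of the input (two draws for three rows), its numbers are made up rather than produced by vmc, and abs_err_func is a name chosen only for this illustration.

```r
# Hypothetical input: 2 draws for 3 rows of newdata, with the extra columns
# .row, .draw, prediction, and observation. Values are made up.
toy <- data.frame(
  .row = rep(1:3, times = 2),
  .draw = rep(1:2, each = 3),
  mpg = rep(c(21.0, 22.8, 18.7), times = 2),
  prediction = c(20.1, 24.0, 17.9, 22.3, 21.5, 19.6),
  observation = rep(c(21.0, 22.8, 18.7), times = 2)
)

# Hypothetical transformation: absolute error on the y-axis and the
# observed response on the x-axis.
abs_err_func <- function(data) {
  data$y_axis <- abs(data$prediction - data$observation)
  data$x_axis <- data$observation
  data
}

out <- abs_err_func(toy)
out[, c(".row", ".draw", "x_axis", "y_axis")]
```

The function keeps one row per combination of .row and .draw and only adds the y_axis and x_axis columns, so the result still lines up with the draws it was computed from.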
Here is an example of a customized explicit encoding, where we want to check the variance of residuals.
var_res_func = function(data) {
  data %>%
    # residual: model prediction minus observation
    mutate(y_axis = prediction - observation) %>%
    # square root of the absolute standardized residual
    mutate(y_axis = sqrt(abs(y_axis / sd(y_axis))))
}
mpg_model %>%
  mcplot() +
  mc_condition_on(x = vars(mpg)) +
  mc_layout_encoding(var_res_func) +
  mc_model_lineribbon(alpha = .2) +
  mc_model_point(n_sample = 1)
#> Warning: Duplicated aesthetics after name standardisation: alpha