--- title: "Introduction" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r run example model} library(rstan) library(bennu) library(bayesplot) library(ggplot2) rstan_options(auto_write = TRUE) options(mc.cores = 2) ## basic example code d <- generate_model_data() # note iter should be at least 2000 to generate a reasonable posterior sample fit <- est_naloxone(d, iter = 100) mcmc_pairs(fit, pars = c("sigma", "mu0", "zeta"), off_diag_args = list(size = 1, alpha = 0.5) ) ``` Return median and 90th percentiles for posterior samples ```{r summarize draws} library(posterior) summarise_draws(fit, default_summary_measures()) ``` We can plot the posterior predictive distribution of the probability of prior kit use and compare to simulated data using the `plot_kit_use` function ```{r plot draws} plot_kit_use(model = fit, data = d) ``` We can also plot the reported number of kits used along with the posterior predictive distribution using the `plot_kit_use` function with the `reported` argument set to `TRUE` ```{r plot reported draws} plot_kit_use(model = fit, reported = TRUE, data = d) ``` Alternatively, if we only want to plot certain regions we can include that as an optional argument in the `plot_kit_use` function ```{r plot draws filter regions} plot_kit_use( model = fit, data = d, regions_to_plot = c("2") ) ``` We can compare the posterior predictive check to the prior predictive check by re-running the model without the likelihood by setting the `run_estimation` parameter to `FALSE` ```{r run prior} # note iter should be at least 2000 to generate a reasonable posterior sample prior_fit <- est_naloxone(d, run_estimation = FALSE, iter = 100 ) mcmc_pairs(prior_fit, pars = c("sigma", "mu0", "zeta"), off_diag_args = list(size = 1, alpha = 0.5) ) ``` We can compare the prior and posterior predictive distribution of the probability of prior kit use using the `plot_kit_use` function ```{r plot compare prior posterior} plot_kit_use(prior = prior_fit, posterior = fit, data = d) ``` ## Incorporating different priors The default priors for the model are set to be weakly uninformative. To change the default priors we can specify a `list` as follows. Note that `c` is the prior of the region coefficients, `mu0` is the intercept, `ct0` is the initial conditions of the random walk, `zeta` is the standard deviation of the random walk, and `sigma` is the standard deviation of the error. In this example let's set the variation month to month to be smaller by setting the standard deviation of the prior for `zeta` to 0.01. ```{r nondefault prior} nondefault_priors <- list( c = list(mu = 0, sigma = 1), ct0 = list(mu = 0, sigma = 1), zeta = list(mu = 0, sigma = 0.01), mu0 = list(mu = 0, sigma = 1), sigma = list(mu = 0, sigma = 1) ) nondefault_prior_fit <- est_naloxone(d, iter = 100, priors = nondefault_priors) ``` Compare the posterior using non-default priors to the posterior for default priors, ```{r plot compare prior posterior nondefault prior} plot_kit_use(`Default prior` = fit, `Non-default prior` = nondefault_prior_fit, data = d) ``` ## Random walk of order 2 model The second model uses a random walk of order 1 to characterize the time-dependence structure of the inverse logit probability of naloxone use, $$\Delta^2c_i = c_i - c_{i+1} \sim N(0,\tau^{-1}) $$ The model can also be fit using a random walk of order 2 increments for the inverse logit probability of naloxone use using the following independent second order increments, $$\Delta^2c_i = c_i - 2c_{i+1} + c_{i+2} \sim N(0,\tau^{-1}) $$ This version of the model will produce smoother estimates of probability of naloxone use in time. ```{r rw order 2 model} # note iter should be at least 2000 to generate a reasonable posterior sample # note use `rw_type` to specify order of random walk rw2_fit <- est_naloxone(d, rw_type = 2, iter = 100 ) mcmc_pairs(rw2_fit, pars = c("sigma", "mu0", "zeta"), off_diag_args = list(size = 1, alpha = 0.5) ) ``` Summarize random walk order 2 model draws ```{r rw order 2 summary draws} summarise_draws(rw2_fit, default_convergence_measures()) ``` We can compare two different model fits of the probability of prior kit use to simulated data using the `plot_kit_use` function ```{r plot draws comparison} plot_kit_use(rw_1 = fit, rw_2 = rw2_fit, data = d) ``` ## fit missing distribution data As distribution data are reported from sites some months may not be reported. This would lead to a difference in the frequency of the orders data and the distribution data. The model can account for these differences and infer the distribution during missing months. See the following example for generating distribution data once every three months ```{r data generation lower frequency} ## basic example code d_missing <- generate_model_data(reporting_freq = 3) ggplot(aes(x = times, y = Reported_Used, color = as.factor(regions)), data = d_missing ) + geom_point() ``` ```{r missing data inference} missing_fit <- est_naloxone(d_missing, rw_type = 2, iter = 100 ) mcmc_pairs(missing_fit, pars = c("sigma", "mu0", "zeta"), off_diag_args = list(size = 1, alpha = 0.5) ) ``` ```{r plot draws with missing data} plot_kit_use(model = missing_fit, data = d_missing) ``` ```{r plot reported draws with missing data} plot_kit_use(model = missing_fit, data = d_missing, reported = TRUE) ``` ## Change scale of data the original model was designed for data aggregated into months. If the order and distribution data is aggregated differently then this can be incorporated into the model by updating the delay in reporting distribution and the delay in ordering to distributing distribution. The delay in ordering to distributing distribution is composed of a gamma distribution with shape parameter $\alpha$ and inverse scale parameter $\beta = 1/\theta$. In order to update for example from monthly to weekly data, we can use the following property of the gamma distribution, $$\Gamma(cx;\alpha,\beta) = \Gamma(x;\alpha,\beta/c)$$ To convert the delay distribution from month into week set $c = 1/4$ as one over the approximate number of weeks in a month. The reporting delay distribution is a simple vector of length 3 denoting the empirical delay distribution. This can be expanded into months through interpolation and normalization to retain its property as a probability distribution. Example code (not run) is shown below. Note that these distributions will need to be updated when considering another jurisdiction, ```{r example change units of data, eval = FALSE} # monthly reporting delay distribution psi_vec <- c(0.7, 0.2, 0.1) # convert to weeks using interpolation weekly_psi_vec <- rep(psi_vec, 4) / sum(psi_vec) # properties of order to distributed delay distribution in months max_delays <- 3 delay_alpha <- 2 delay_beta <- 1 # convert to weeks weekly_max_delays <- max_delays * 4 weekly_delay_alpha <- delay_alpha weekly_delay_beta <- 0.25 * delay_beta result <- est_naloxone(d, psi_vec = weekly_psi_vec, max_delays = weekly_max_delays, delay_alpha = weekly_delay_alpha, delay_beta = weekly_delay_beta ) ```