---
title: "Introduction"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r run example model}
library(rstan)
library(bennu)
library(bayesplot)
library(ggplot2)

rstan_options(auto_write = TRUE)
options(mc.cores = 2)

## basic example code
d <- generate_model_data()
# note iter should be at least 2000 to generate a reasonable posterior sample
fit <- est_naloxone(d, iter = 100)
mcmc_pairs(fit,
  pars = c("sigma", "mu0", "zeta"),
  off_diag_args = list(size = 1, alpha = 0.5)
)
```

Return median and 90th percentiles for posterior samples

```{r summarize draws}
library(posterior)

summarise_draws(fit, default_summary_measures())
```

We can plot the posterior predictive distribution of the probability of prior 
kit use and compare to simulated data using the `plot_kit_use` function

```{r plot draws}
plot_kit_use(model = fit, data = d)
```

We can also plot the reported number of kits used along with the posterior
predictive distribution using the `plot_kit_use` function with the `reported`
argument set to `TRUE`

```{r plot reported draws}
plot_kit_use(model = fit, reported = TRUE, data = d)
```

Alternatively, if we only want to plot certain regions we can include that
as an optional argument in the `plot_kit_use` function

```{r plot draws filter regions}
plot_kit_use(
  model = fit, data = d,
  regions_to_plot = c("2")
)
```

We can compare the posterior predictive check to the prior predictive check
by re-running the model without the likelihood by setting the `run_estimation`
parameter to `FALSE`

```{r run prior}
# note iter should be at least 2000 to generate a reasonable posterior sample
prior_fit <- est_naloxone(d,
  run_estimation = FALSE,
  iter = 100
)
mcmc_pairs(prior_fit,
  pars = c("sigma", "mu0", "zeta"),
  off_diag_args = list(size = 1, alpha = 0.5)
)
```


We can compare the prior and posterior predictive distribution of the 
probability of prior kit use using the `plot_kit_use` function

```{r plot compare prior posterior}
plot_kit_use(prior = prior_fit, posterior = fit, data = d)
```

## Incorporating different priors

The default priors for the model are set to be weakly uninformative.
To change the default priors we can specify a `list` as follows. Note
that `c` is the prior of the region coefficients, `mu0` is the intercept,
`ct0` is the initial conditions of the random walk, `zeta` is the standard 
deviation of the random walk, and `sigma` is the standard deviation of the 
error. In this example let's set the variation month to month to be smaller
by setting the standard deviation of the prior for `zeta` to 0.01.

```{r nondefault prior}
nondefault_priors <- list(
  c = list(mu = 0, sigma = 1),
  ct0 = list(mu = 0, sigma = 1),
  zeta = list(mu = 0, sigma = 0.01),
  mu0 = list(mu = 0, sigma = 1),
  sigma = list(mu = 0, sigma = 1)
)
nondefault_prior_fit <- est_naloxone(d, iter = 100, priors = nondefault_priors)
```

Compare the posterior using non-default priors to the posterior for
default priors,

```{r plot compare prior posterior nondefault prior}
plot_kit_use(`Default prior` = fit,
             `Non-default prior` = nondefault_prior_fit, data = d)
```

## Random walk of order 2 model

The second model uses a random walk of order 1 to characterize the time-dependence
structure of the inverse logit probability of naloxone use,

$$\Delta^2c_i = c_i - c_{i+1} \sim N(0,\tau^{-1})  $$

The model can also be fit using a random walk of order 2 increments for the inverse logit probability of naloxone use using the following independent second order increments,

$$\Delta^2c_i = c_i - 2c_{i+1} + c_{i+2} \sim N(0,\tau^{-1}) $$

This version of the model will produce smoother estimates of probability of
naloxone use in time.

```{r rw order 2 model}
# note iter should be at least 2000 to generate a reasonable posterior sample
# note use `rw_type` to specify order of random walk
rw2_fit <- est_naloxone(d,
  rw_type = 2,
  iter = 100
)
mcmc_pairs(rw2_fit,
  pars = c("sigma", "mu0", "zeta"),
  off_diag_args = list(size = 1, alpha = 0.5)
)
```


Summarize random walk order 2 model draws

```{r rw order 2 summary draws}
summarise_draws(rw2_fit, default_convergence_measures())
```

We can compare two different model fits of the probability of prior kit use
to simulated data using the `plot_kit_use` function

```{r plot draws comparison}
plot_kit_use(rw_1 = fit, rw_2 = rw2_fit, data = d)
```


## fit missing distribution data

As distribution data are reported from sites some months may not be reported.
This would lead to a difference in the frequency of the orders data and the
distribution data. The model can account for these differences and infer the 
distribution during missing months. See the following example for generating
distribution data once every three months

```{r data generation lower frequency}
## basic example code
d_missing <- generate_model_data(reporting_freq = 3)
ggplot(aes(x = times, y = Reported_Used, color = as.factor(regions)),
  data = d_missing
) +
  geom_point()
```


```{r missing data inference}
missing_fit <- est_naloxone(d_missing,
  rw_type = 2,
  iter = 100
)
mcmc_pairs(missing_fit,
  pars = c("sigma", "mu0", "zeta"),
  off_diag_args = list(size = 1, alpha = 0.5)
)
```


```{r plot draws with missing data}
plot_kit_use(model = missing_fit, data = d_missing)
```

```{r plot reported draws with missing data}
plot_kit_use(model = missing_fit, data = d_missing, reported = TRUE)
```

## Change scale of data

the original model was designed for data aggregated into months. If the order and distribution data is aggregated differently then this can be incorporated into the model by updating the delay in reporting distribution and the delay in ordering to distributing distribution. 

The delay in ordering to distributing distribution is composed of a gamma distribution with shape parameter $\alpha$ and inverse scale parameter $\beta = 1/\theta$. In order to update for example from monthly to weekly data, we can use the following property of the gamma distribution,

$$\Gamma(cx;\alpha,\beta) = \Gamma(x;\alpha,\beta/c)$$

To convert the delay distribution from month into week set $c = 1/4$ as one over the approximate number of weeks in a month. The reporting delay distribution is a simple vector of length 3 denoting the empirical delay distribution. This can be expanded into months through interpolation and normalization to retain its property as a probability distribution. Example code (not run) is shown below. Note that these distributions will need to be updated when considering another jurisdiction,

```{r example change units of data, eval = FALSE}
# monthly reporting delay distribution
psi_vec <- c(0.7, 0.2, 0.1)
# convert to weeks using interpolation
weekly_psi_vec <- rep(psi_vec, 4) / sum(psi_vec)

# properties of order to distributed delay distribution in months
max_delays <- 3
delay_alpha <- 2
delay_beta <- 1

# convert to weeks
weekly_max_delays <- max_delays * 4
weekly_delay_alpha <- delay_alpha
weekly_delay_beta <- 0.25 * delay_beta

result <- est_naloxone(d,
  psi_vec = weekly_psi_vec,
  max_delays = weekly_max_delays,
  delay_alpha = weekly_delay_alpha,
  delay_beta = weekly_delay_beta
)
```