## Abstract

We propose a Bayesian multivariate model in which a single linear combination of the covariates predict multiple outcomes simultaneously. The single linear combination is a data-derived score along the lines of the Apache or Charlson index scores for critically ill patients, the Karnofsky or Eastern Cooperative Oncology Group score for cancer patients or Euro-score for cardiac patients that may be used to predict multiple outcomes. Outcomes may be discrete or continuous and we use a composition of generalized linear models for the marginal distribution for each outcome. We explain how to set the prior distribution and we use Markov chain Monte Carlo methods to calculate the posterior distribution. We propose two types of expanded models to diagnose whether each outcome indeed has predictor effects common with the other outcomes, and whether a particular predictor is commonly predictive for all outcomes. We determine a final model based on the diagnostic models. The method is applied to a study yielding multiple psychometric outcomes of mixed type measured in young people living with human immunodeficiency virus.

## 1. Introduction

Multiple outcomes are common in medical, psychological and sociological studies. Joint analysis of continuous multiple outcomes is common in recent decades [1–18]. Difficulty arises when the outcomes comprise a mix of continuous, binary and non-negative integer variables. In recent years, a number of models have been developed to handle mixed outcomes, e.g. [19–24].

Little & Schluchter [19] and Gueorguieva & Agresti [24] have proposed maximum-likelihood procedures for analysing mixed continuous and binary data with missing values. Little & Schluchter model *q* categorical variables defining a *q*-way contingency table with *C* = ∏*I*_{i} cells where the *i*th categorical variable has *I*_{i} levels. They assumed a multinomial distribution over the *C* cells of the contingency table. Given the contingency table cells, the continuous outcomes follow a multivariate normal distribution. Sparse data can make inference problematic. Gueorguieva & Agresti proposed an underlying multivariate probit model for the binary variables that correlate directly with the continuous normal variables.

Sammel *et al.* [20] proposed a latent variable model for mixed discrete and continuous outcomes with the multiple outcomes correlated through subject-specific latent variables. Conditional on the latent variables, outcomes were assumed independent with each outcome arising from a one parameter exponential family, with its mean a function of the latent variable and other covariates. An expectation-maximization (EM) algorithm was used to fit the model.

Arminger & Kusters [21] assumed each outcome had a latent variable and jointly, the latent variables follow a multivariate normal distribution with expectation potentially dependent on covariates. Dunson [23] proposed a Bayesian latent variable model for clustered mixed outcomes. It generalized Arminger & Kusters' work to accommodate non-normal latent variables for multi-level data. It allowed nonlinear relationships between the covariates and latent variables and used multiple latent variables for each type of outcome and covariate-dependent modification of the relationship between the latent and covariates.

In this paper, we propose a double link common predictor effect (COPE) model in which a single linear combination of the covariates is used to predict all outcomes simultaneously. Generalized linear models are used to model the marginal distribution for each outcome. Outcome-specific double link functions connect each outcome's mean to the single linear predictor, where a double link refers to the composition of one link function in a second link function. Correlations among outcomes are modelled by latent variables. Markov chain Monte Carlo (MCMC) algorithms are used to estimate the posterior distribution of the parameters and the latent variables.

Compared with the model proposed by Sammel *et al.* [20] in which each outcome has an outcome-specific set of fixed effect coefficients, our model furnishes a parsimonious common set of fixed effects for all outcomes, thus we have greater efficiency in our parameter estimates. Compared with Dunson's model, in which all outcomes have a common set of fixed effects, our model returns greater flexibility for modelling COPE through the double link function between the linear predictor and outcomes. In general, the usual multiple outcome model is the *saturated* model, which has a separate set of coefficients for each outcome. Our model, when appropriate, is much more efficient as there are many fewer coefficients.

Frequentist methods are based on asymptotic maximum-likelihood theory and the standard errors and confidence interval are based on asymptotic normality assumptions. In Bayesian inference, posterior distribution of parameters and latent variables can be estimated by MCMC methods, and means and variances for parameters can be appropriately estimated without requiring a large sample size. For our proposed model, a key inference involves the product of parameters; using a maximum-likelihood approach requires the delta method to estimate point estimates and variances of the product of parameters. With the Bayesian approach, inferences can be drawn directly from posterior samples of the product of the parameters.

Index scores for evaluating disease severity are common in medicine. Examples include the Apache [25,26] and Charlson index scores [27,28] for critically ill patients, the Karnofsky and Eastern Cooperative Oncology Group scores for cancer patients [29,30] and EuroScore [31,32] for cardiac patients. These scores are often used for predicting multiple outcomes beyond the outcome used for the initial derivation. The single linear combination estimated by our model is a principled data-derived score along the lines of these scores that may be used to predict multiple outcomes. Our Bayesian methodology using MCMC methodology makes it easy to produce both (i) index calculations for an individual complete with uncertainty intervals and (ii) predictions for single or multiple outcomes. Construction of online calculators is straightforward using MCMC methodology.

This paper is organized as following: §2 proposes the double link COPE model for mixed outcomes. Section 3 explains the prior specification. In §4, we relax our insistence on only one linear combination of the covariates for predicting all outcomes, and we describe two types of extended models both as model diagnostics and as potentially valuable models in their own right. Section 5 applies the method to multiple, mixed-type psychometric outcomes measured on young people living with human immunodeficiency virus (YPLH) and the paper concludes with a discussion in §6.

## 2. Common predictor effect model specification

We observe *k*_{1} continuous outcomes, *k*_{2} binary outcomes, *k*_{3} count outcomes and *L* covariates on subject *i*, *i* = 1, …, *n*. Let *y*_{ik} denote the *k*th outcome for subject *i*, *k* = 1, …, *K*, *K* = *k*_{1} + *k*_{2} + *k*_{3}, and define the *i*th observation *y*_{i} = (*y*_{i1}, …, *y*_{iK})′; let *x*_{il} denote the *l*th predictor, *l* = 1, …, *L*, for subject *i* with *x*_{i} = (*x*_{i1}, …, *x*_{iL})′, where *x*_{i} does not include an intercept. Covariates are the same for all outcomes and a single linear predictor, meaning a single linear combination of covariates, *x*_{i}′*α*, will be used to predict all outcomes.

An outcome-specific double link function is needed for each outcome in our proposed model: the usual link function *g*_{k}(·) for a generalized linear model that connects the expected value of subject *i* outcome *k*, *E*(*Y*_{ik}) = *θ*_{ik}, to the natural parameter *η*_{ik}; and a location-scale link function *ψ*_{k}(·) that connects its natural parameter with the single linear predictor. In our model, we take *ψ*_{k}(·) to be linear, , which allows the COPE for each outcome to be modified by a scale parameter. The usual link functions for generalized linear models are the identity link for continuous outcomes *η*_{ik} = *g*_{k}(*θ*_{ik}) = *θ*_{ik}; the logit link for binary outcomes *η*_{ik} = *g*_{k}(*θ*_{ik}) = logit(*θ*_{ik}) = log(*θ*_{ik}(1−*θ*_{ik})^{−1}), where logit(*x*) = log(*x*(1 − *x*)^{−1}) and its inverse, called expit, expit(*x*) = exp(*x*)(1 + exp(*x*))^{−1}; and the log link for count outcomes *η*_{ik} = *g*_{k}(*θ*_{ik}) = log(*θ*_{ik}). We model the correlations across the multivariate outcomes on the linear predictor scale through a random effect variable *β*_{ik}, .

The double link COPE model for mixed outcomes is
2.1
2.2a
2.2band
2.2cwhere *α* is an *L* × 1 vector of fixed-effect regression coefficients for all outcomes, *γ*_{0k} and *γ*_{1k} are the intercept and scale parameter in the link for outcome *k*, respectively, *γ*_{0} = (*γ*_{01}, … , *γ*_{0K})′, *γ*_{1} = (*γ*_{11}, … , *γ*_{1K})′, *β*_{i} = (*β*_{i1}, …, *β*_{iK})′ follows a multivariate normal distribution with mean 0 and covariance *Σ*(*ξ*), *β*_{i} ∼ *N*_{K}(0,*Σ*(*ξ*)), where *Σ*(*ξ*) models the correlations among the multiple outcomes on the linear predictor scale, and *ξ* is a vector of unknown parameters for the components of the correlation matrix.

As a whole, can be estimated in model (2.1), however there is an identifiability problem if we want to estimate *α* and *γ*_{1} individually. For example, *α* and *γ*_{1} can be replaced by and for an arbitrary constant *c* ≠ 0 and we still have the same prediction. We need to constrain *α* and/or *γ*_{1} to solve the problem. The solution we propose is to divide the regression coefficients by their Euclidean norm ‖*α*‖ = (*α*′*α*)^{1/2} and fix the sign of the regression coefficient for one pre-specified predictor whose regression coefficient is significantly different from 0, without lack of generality, say *α*_{L}. We modify *η*_{ik} in model (2.1) as
2.3

## 3. Prior specification

We specify prior distributions for the parameters in our proposed model to complete the Bayesian modelling. Many researchers have discussed different ways to specify priors [33–36]. Prior information may come from previous studies, published results from similar studies, scientific reasoning about the measured variables, other related data or expert opinions. For any model, it is helpful to have an automatic prior that requires relatively little in the way of substantive input to allow data and model exploration to begin. Our own preference is to add scientific/statistical reasoning on top of that to allow for sensible informative proper priors particularly for parameters not of primary scientific interest. In contrast, specifying a fully subjective prior requires a large investment in time and training of subject matter specialists and is not an activity that can be lightly undertaken.

For models (2.2*a*)–(2.3), we specify the prior distributions for common regression coefficients *α*, intercepts *γ*_{0}, scale parameters *γ*_{1} and the covariance *Σ* of the latent variables *β*_{i}. We first standardize each continuous predictor by subtracting its mean and dividing by its standard deviation, the resulting continuous predictor has mean 0 and standard deviation 1. We assign a normal prior to *α*, *α* ∼ *N*(*μ*_{α,} *Σ*_{α}), where *μ*_{α} is a vector of location parameters and *Σ*_{α} is a covariance matrix. We set *μ*_{α} = 0_{L}, an *L*-vector of 0s, and *α*'s to be independent, , is the *L* × *L* identity matrix, and we give an arbitrary value *c*. This is a flat prior for *α* /‖*α*‖ on the unit ball in ℜ^{L}. The sampling distribution of Y is not affected by the length ‖*α*‖ of *α* and the posterior of ‖*α*‖ is equal to its prior.

We discuss priors for the intercept *γ*_{0k} and scale parameters *γ*_{1k} separately for each type of outcome: Bernoulli, count and continuous. For Bernoulli outcomes, a wide range of probabilities of a response equal to 1 is from 0.001 to 0.999 (0.001 < *P*(*y*_{ik} = 1) < 0.999), which gives a range after logit transformation of −6.9 to 6.9. We assign a normal prior to *γ*_{0k} with mean 0 and variance ((6.9 − (−6.9))/4)^{2} =11.9 for binary outcomes, which is the square of the range divided by 4, inspired by the idea that in the prior, the mean ± 2 s.d. should cover 95 per cent of the prior range. For count outcomes, we take the mean of the response to be somewhere between 0.01 and 100. After a log transformation, we have a range of the linear predictor of −4.6 to 4.6. We assign a normal prior to *γ*_{0k} with mean 0 and variance ((4.6 − (−4.6))/4)^{2} = 5.3 for count outcomes. For continuous outcomes, we assign a normal prior to *γ*_{0k} with mean and variance . We consider these priors to be vague and not particularly informative although proper. The prior for continuous outcome uses the data, but the information in the prior is limited. Another approach for continuous outcomes is to guess at the range = (max−min) of each outcome and use the mid-range, (max + min)/2, as prior mean and ((max−min)/4)^{2} as prior variance.

In equation (2.3), scale parameter *γ*_{1k} is the slope of the regression of *η*_{ik} on . When the linear predictor is perfectly predictive, the absolute value of the slope can be estimated as the range (max − min) of *η*_{ik} divided by the range (max − min) of , with an appropriate sign either positive or negative. When the linear predictor is not predictive at all, *γ*_{1k} = 0. We assign a normal prior with mean 0 and variance the square of the range of the natural parameter divided by 2 times the range of single linear predictor
3.1

We previously discussed the range of the natural parameters for binary, Poisson and continuous outcomes when specifying priors for the intercept. We now discuss the range of . We deal with continuous predictors and binary predictors separately. The standardized continuous predictors, the norm of the fixed effect regression coefficients equal to 1 implies that the absolute value of each regression coefficient *α*_{l} /‖*α*‖ is less than 1; an approximate range of *x*_{il} *α*_{l} /‖*α*‖ is then at most roughly −2 to 2. For binary predictors, we have
3.2

Here *l*^{′} indexes the binary predictors, *L*′ is the total number of binary predictors, **1** is a vector of 1 with length *L*′ and *α*_{b} /‖ *α* ‖ is the vector of regression coefficients for the binary predictors. The range of *x*′*α* /‖ *α* ‖ is therefore roughly , where *L* − *L*′ is the number of continuous predictors. We substitute this range in place of the denominator of equation (3.1) to calculate the prior variances for *γ*_{1k}.

We use a Wishart prior *Σ*^{−1} ∼ Wishart(*ν*, * Λ*) for precision matrix

*Σ*

^{−1}, with degrees of freedom

*ν*and precision

*, where*

*Λ**Σ*

^{−1}is the inverse of covariance matrix

*Σ*in model (2.3). We set

*ν*to be a number somewhat bigger than

*K*, the dimension of

*, and we set*

*Λ**to be a diagonal matrix. We set diagonal element*

*Λ*

*Λ*_{kk}to be

*ν*−

*K*− 1 divided by a prior estimate of the variance of

*η*

_{ik}. For continuous outcomes, we use the prior-estimated range divided by 4 as our prior estimate of the variance of

*η*

_{ik}; for binary outcomes, we use ((6.9–(−6.9))/4)

^{2}= 11.9 and for count outcomes, we use ((4.6–(−4.6))/4)

^{2}= 5.3.

## 4. Diagnostic models

*A priori*, outcomes are chosen to share a single linear predictor. In practice, we may choose outcomes which we hope are appropriate. Similarly, we may have chosen predictors that are generally predictive of all outcomes, however some of them may not belong to a single-shared linear predictor. The COPE model assumes that all outcomes are predicted by the single linear predictor and all predictors are commonly predictive of all outcomes. There are models whose number of fixed effects parameters is in between the COPE model and the saturated model, and we propose two extended models to check the COPE model assumptions. Model A checks whether one pre-specified outcome belongs with other outcomes, and model B checks whether one pre-specified predictor is commonly predictive. We apply model A to all outcomes in turn. Any outcomes that do not belong to the COPE model are removed and are fit separately barring some value to continuing to fit a multivariate model. We then apply model B to all predictors in turn. In contrast to outcomes, predictors often can not be arbitrarily dropped for statistical reasons and must be kept in the model for scientific reasons. Thus, we may actually prefer the model B version of the COPE model—a kind of relaxed COPE model where most covariates belong to a single linear predictor, but a few covariates are not required to be part of the COPE linear predictor.

### 4.1. Extended model A: one outcome has a separate set of predictor effects

For a pre-specified outcome *s*, we modify the linear predictor in model (2.3) by adding an extra set of regression coefficients for outcome *s* on top of the common regression coefficients, to allow the fixed effects for outcome *s* to differ from the common effects while keeping the linear predictor for other outcomes the same as in model (2.3)
4.1where *α*_{s} = (*α*_{s1}, …, *α*_{sL}) is the extra set of regression coefficients, *α*_{sl} is the departure from the common effect for all outcomes of the effect of predictor *l* on outcome *s*, *l* = 1,…, *L*. We assign *α*_{s} a normal prior with mean 0_{L} and variance **I**_{L}.

Models (2.2*a*)–(4.1) have an identifiability problem, *γ*_{1s} and *α*_{s} can be replaced by and for an arbitrary constant *c*. To solve the problem, we use the Gram–Schmidt process to split *α*_{s} into two components, one component is parallel to *α*, where *c*^{*} = *α*_{s}′*α*/*α*′*α* and the other component, denoted *α*^{*}_{s}, *α*^{*}_{s} = *α*_{s} − *c*^{*}*α*, is perpendicular to *α*, *α*^{*}^{′}_{s} *α* = 0. We use MCMC sampling for computations and we calculate *α*^{*}_{s} for each posterior sample of *α*_{s}.

Having fit the model, we test the hypothesis *H*_{0}: *α*^{*}_{s} = 0_{L} against the alternative hypothesis *H*_{A}:*α*^{*}_{s} ≠ 0_{L} using a Bayesian Wald test. The test statistic is calculated as
4.2where is the posterior mean of *α*^{*}_{s}, the posterior variance Var(*α*^{*}_{s}) is estimated by , *N* is the total number of posterior samples, *j* is the index of *j*th posterior sample, and *W*_{α} is compared with a *χ*^{2}-distribution with degree of freedom *L*. If *W*_{α} > *χ*_{L}^{2}(0.95), we conclude that the *s*th outcome does not have regression parameters in common with other outcomes.

### 4.2. Extended model B: one predictor has a distinct effect for each outcome

Suppose we have *K*^{*} outcomes remaining in our model. For a pre-specified predictor *l*, we modify model (2.3) to allow the *l*th predictor *x*_{l} to have a separate coefficient for each outcome *k*. We modify *η*_{ik} for outcome *k* in model (2.3)
4.3where *ϕ*_{k}, *k* = 1, … , *K*^{*}, is the departure from the common effect of the fixed effect of predictor *l* on outcome *k* and *x*_{il} is the *l*th component of *x*_{i} for the *i*th subject; also recall that *α*_{l} is the *l*th component of *α*. Define *ϕ* = (*ϕ*_{1}, … , *ϕ*_{K*})′ and *γ*_{1} = (*γ*_{11}, … , *γ*_{1K*})′. We assign *ϕ* a normal prior with mean 0_{K*} and variance **I**_{K}^{*}.

As with model B, we have an identifiability problem because *α*_{l} and *ϕ* can be replaced by *α*_{l}* = *α*_{l} * *c* and *ϕ*^{*} = *ϕ*+ (*γ*_{1}*α*_{l} (1 − *c*))/‖ *α*‖ for an arbitrary constant *c*, thus *γ*_{1l} and *ϕ*_{l} are confounded. We again use the Gram–Schmidt process to split *ϕ* into two components, one component *c***γ*_{1} is parallel to *γ*_{1}, where *c** = *ϕ*′*γ*_{1}/*γ*_{1}′*γ*_{1} and the other component *ϕ** = *ϕ*− *c***γ*_{1} are perpendicular to *γ*_{1}.

We test the hypothesis *H*_{0} : *ϕ*^{*} = 0_{k*} against the alternative hypothesis *H*_{A} : *ϕ*^{*} ≠ 0_{K*}. The Wald test statistic is
4.4here is the posterior mean of *ϕ**, posterior variance Var(*ϕ**) is estimated by and *W*_{ϕ} is compared with a *χ*^{2}-distribution with degree of freedom *K*^{*}. If , we conclude that the *l*th predictor is not commonly predictive for all outcomes.

### 4.3. Final model

We fit extended model A for each outcome in turn and calculate each Wald test. We remove the outcomes with significant Wald test results from our COPE model. We then fit extended model B with the remaining outcomes for each predictor in turn and calculate each Wald test statistic as well. We remove predictors with significant Wald tests from the common set of predictors and these predictors are allowed to have separate coefficients for each outcome. Suppose, we have *K*^{*} outcomes left for the COPE model, *L*^{*} predictors are common and *L* − *L*^{*} predictors have outcome-specific regression coefficients. Without loss of generality, suppose the first *L*^{*} predictors are common for all outcomes, denoted as *x*′_{i−} = (*x*_{i1}, … , *x*_{iL}^{*}). Then our final model is
4.5where *α*_{−} = (*α*_{1}, … , *α*_{L}^{*}) is the common regression coefficient vector, *α*_{mk} is the regression coefficient of predictor *m*, *m* = *L*^{*} + 1, … , *L*, for outcome *k*, *k* = 1, … , *K*^{*}, *γ*_{0k}, *γ*_{1k} and *β*_{ik} are the same as in model (2.3). If the Wald test for every outcome is significant, the simple solution is to not use the COPE model, though some B-type diagnostic model might still be appropriate. If all predictors are significant, we get the usual saturated model.

## 5. Data analysis: teens linked to care

We illustrate the methodology by examining baseline observations from the teens linked to care (TLC) study on YPLH, which was conducted in eight adolescent clinical care sites in Los Angeles, New York, and San Francisco from 1995 to 1996 [37,38]. In this paper, we study outcomes that are subscales from the psychometric measure brief symptom inventory (BSI). The BSI consists of 53 items covering nine symptom dimensions: somatization, obsessive-compulsive disorder (OCD), interpersonal sensitivity, depression, anxiety, hostility, phobic anxiety, paranoid ideation and psychoticism. Each item is given an integer-valued score ranging from 0 to 4.

We generate continuous, binary and Poisson variables from the 53 items by the following rules: continuous subscale scores are calculated by summing the values for the items included in that dimension and dividing by the number of items; binary variables are calculated from a single item score by setting the binary variable to 0 if the original item score is 0 else setting the binary variable to 1 for an item score of 1–4; and Poisson variables are calculated by summing the score of the items included in that dimension.

These are all commonly used distributions for items and subscales except for the Poisson distribution, which we think merits greater consideration than it has received in the past. For example, many items on the BSI are skewed with respondents choosing mostly 0, occasionally 1 and rarely 2 to 4. For an item scored on a 0 to 4 scale, a Poisson random variable with mean less than 1.27 has less than a 1 per cent chance of being 5 or larger, and thus the Poisson approximation to the sampling distribution has the potential to be surprisingly accurate; this must necessarily be confirmed for each item in the dataset. Summing several approximately Poisson-distributed random variables then produces a random variable that, in practice, is indistinguishable from a Poisson, even in the presence of modest correlation. Poisson regression is a commonly accepted model for non-negative integer-valued data; our model introduces the random effect to model both the correlation and the overdispersion, for when a non-negative integer outcome being modelled as Poisson has variance bigger than its mean.

We then apply the proposed COPE model on a set of generated outcomes that include continuous, binary and Poisson variables. Higher scores on every scale indicate a worse psychological state.

We use data collected at baseline interview in our analysis and start with a candidate model with eight outcomes, two treated as continuous subscales: depression and OCD; two treated as Poisson subscales: anxiety and somatization; and four dichotomized as binary items: BS11 (poor appetite), BS13 (temper uncontrollable), BS21 (people are unfriendly) and BS31 (avoid things that frighten you). No items are common between BSI subscales and the four binary items belong to subscales not otherwise used in this analysis. The two continuous outcomes have long right tails, we transform each outcome as log_{2} (*x* + *c*), where *c* is the smallest non-zero value for that outcome, in our example, *c* is 1/6 for both outcomes. For response variables analysed on the log scale, the exponentiated regression coefficients can be interpreted as the multiplicative increase in the unlogged response per unit increase in the predictor.

The predictors are gender (73% male); age standardized to have mean 0 and standard deviation 1 by subtracting the mean 20.8 and then dividing by the standard deviation 2.1; food (87% yes), yes means that a subject can get enough food easily daily; finance (66% yes), yes means that a subject has the necessities to live comfortably and no means that the subject is struggling to survive and has difficulty paying bills; marijuana use in the past three months (79% yes); acquired immune deficiency syndrome (AIDS) symptoms (37% yes), yes means that the subject has physical health symptoms resulting from HIV infection; hard drug use (62% yes), yes means that the subject used at least one type of hard drug from stimulants, LSD, inhalants, coke, crack and heroin in the past three months; and attempted suicide (41% yes), yes means that the subject attempted suicide at least once prior to being enrolled in this study.

We complete a Bayesian prior specification of the model by following the recommendations from §3. We assign a normal prior with mean 0_{8} and variance 9**I**_{8} to *α*, *α* ∼ N(0,9**I**_{8}), here 9 can be changed to any arbitrary value. For the intercepts *γ*_{0k}, for continuous outcomes, we use the centre of the range (max + min)/2 as the prior mean and 1/4 of the range (max−min)/4 as the prior standard deviation; for binary outcomes, we use 0 as the prior mean and 3.4 as the prior standard deviation; for Poisson data, we use 0 as the prior mean and 2.3 as the prior standard deviation. A guess of the ranges of log depression and OCD are both from −2.58 to 2.06. The centre of the range is −0.26 and the range divided by 4 is 1.16. Therefore, we set normal independent priors for *γ*_{0} with means (−0.26,−0.26,0,0,0,0,0,0) and standard deviations (1.16,1.16,3.4,3.4,3.4,3.4,2.3,2.3).

We have one standardized continuous predictor and seven binary predictors, the range of *x*_{i}′ *α* /‖ *α* ‖ is then from to which is −4.6 to 4.6 and the length of the range is 9.2. A guess of the range of *η*_{ik} is 5.2 for continuous outcomes after logarithm transformation, 13.8 for binary outcomes and 9.2 for Poisson outcomes. We calculate the prior standard deviation for scale parameters by dividing the range of *η*_{ik} by 4 times the range of *x*′*α* /‖*α*‖ and we get 0.14 for continuous outcomes, 0.38 for binary outcomes and 0.25 for count outcomes. We set normal independent priors for scale parameters *γ*_{1} with mean (0,0,0,0,0,0,0,0,0,0) and standard deviations (0.28,0.28,0.76,0.76,0.76,0.76,0.50,0.50).

We set a Wishart prior for the inverse of the covariance matrix for the random effects *β*_{i} with degrees of freedom 20 and we calculate the *k*th diagonal element by dividing *ν* − *K* − 1 = 20 − 8 − 1 = 11 by a guess of the variance of *η*_{ik}, which are (5.2/4)^{2} = 1.7 for continuous outcomes, (13.8/4)^{2} = 11.9 for binary outcomes and (9.2/4)^{2} = 5.3 for count outcomes. We get the prior precision matrix * Λ* = diag(6.5,6.5,0.9,0.9,0.9,0.9,2.1,2.1).

We run the proposed candidate model in WinBUGS [39] and generate 500 000 posterior samples, discarding the first 4000 samples as a burn-in. We examine autocorrelation plots and find that the autocorrelations for most parameters are close to 0 after 300 lags. The posterior density plots show that the posterior distributions for *α*, *γ*_{0} and *γ*_{1} are bell-shaped and unimodal.

We diagnose the candidate model by running extended model A, models (2.2*a*)–(4.1), for each outcome in turn to test whether the outcome has a separate set of regression coefficients. This results in eight extended models corresponding to the eight outcomes. We calculate the Bayesian Wald test statistic for each model and compare it with *χ*_{8}^{2}(0.95) = 15.51, where the degree of freedom is equal to the number of predictors in the model. Results show that the Wald test statistic is significant for anxiety (*p*-value = 0.048), but not for any of the other outcomes. We exclude the outcome anxiety then run extended model B, models (2.2*a*)–(2.2*c*) and (4.3) for each predictor in turn to test whether the predictor is not commonly predictive for all outcomes, but has a different regression coefficient for each outcome, resulting in another eight extended models corresponding to the eight predictors. We calculate the Bayesian Wald test statistic for each model and compare it with *χ*_{7}^{2}(0.95) = 14.07, where the degree of freedom is the number of outcomes in the model, which is 7 now because we excluded anxiety. Results show that the Wald test statistic is significant for suicide attempt (*p*-value = 0.040) but not for the other predictors. The Bayesian Wald test statistics and *p*-values for these two sets of extended models are shown in table 1.

Our final model has anxiety by itself in a separate model and a relaxed COPE model for the other seven outcomes, and eight predictors among which suicide attempt has an outcome-specific effect and the other seven predictors constitute a single linear predictor commonly predictive for the seven remaining outcomes. Posterior summaries for the seven common regression coefficients *α* and eight scale parameters *γ*_{1} are shown in table 2. Having enough food and a good financial situation imparts negative effects on all seven outcomes. Using marijuana and having AIDS symptoms have positive effects on all seven outcomes. The magnitude of the effects is modified by the scale parameter for each outcome. The scale parameter for the depression and OCD subscales is 0.58 and 0.71, respectively, which means that 1 unit change in the single linear predictor would associate with a change of 2^{0.58} = 1.49 units in depression and a change of 2^{0.71} = 1.64 for the OCD subscale because we used a base 2 logarithm transformation for these two continuous variables prior to analysis. The scale parameters for binary outcomes B11, B13, B21 and B31 are 1.42, 0.47, 0.87 and 0.71, respectively, which means that 1 unit change in the single linear predictor would associated with *e*^{1.42} =4.13, *e*^{0.47} =1.59, *e*^{0.87} =2.39 and *e*^{0.71} = 2.04 odds ratio change for B11, B13, B21 and B31. The scale parameter for Poisson outcome somatization is 0.71, indicting a one unit change in linear predictor is associated with *e*^{0.71} = 2.04 unit change in somatization.

Table 3 shows posterior summaries for the regression coefficients of each predictor for each outcome. We fit a separate model for anxiety and a relaxed COPE model for the other seven outcomes. For the first seven common predictors, we calculate *α*_{l}*γ*_{1k} from the posterior samples of *α*_{l} and *γ*_{1k}, then draw inference from the calculated posterior samples of *α*_{l}*γ*_{1k}. We have the outcome-specific regression coefficient for suicide attempt, the one predictor that has a separate regression coefficient for each outcome. Suicide attempt has significant positive effects on depression, OCD, B31 and somatization. Food and finance have significant negative effects on anxiety while hard drugs, AIDS symptoms and suicide attempt have significant positive effects. We compare the standard deviation of the regression coefficients in our final relaxed COPE model with that in the saturated model and find that on average, the standard deviation in our final model is 29 per cent less than that in the saturated model. This is evidence that compared with the saturated model, the COPE model has greater efficiency.

## 6. Discussion

We began with a candidate COPE model that assumed all outcomes were similar and shared one common linear predictor. We then fit flexible diagnostic models to identify outcomes that did not belong to the COPE model, and which were then excluded from the final model. The model with one linear predictor is a one cluster model. When we have a large number of outcomes, more than one linear predictor may be needed to predict all outcomes. We may expand our one cluster model to a multi-cluster model by clustering outcomes and introducing cluster-specific regression coefficients for each cluster [40].

We can use a graphical method to identify which outcomes belong to the same cluster and how many clusters we shall have. We first run a univariate generalized linear model for each outcome *y*_{k} and get estimates of regression coefficients omitting the intercept. We normalize by dividing by its Euclidean norm, , so that has length 1 and can be compared across outcomes. We plot a profile plot of the , plotting points (*j*, ) and drawing line segments connecting point to , and inspect the plot to identify potential outcome clusters.

Finally, we can adapt the COPE model to longitudinal mixed outcome data by introducing two latent variables, one to model the correlation among outcomes at the same time and the other to model the correlation among outcomes over time. Consideration of the COPE model when it cannot be written as a linear model, as in model (2.3), could be an area of future work.

In this paper, we proposed a Bayesian Wald test to determine whether one outcome or one predictor belongs to the COPE model. There are alternative methods for model selection. The Bayes factor is a widely used statistic in which prior and posterior information are combined in a ratio that provides evidence in favour of one model versus another. We can calculate a Bayes factor as the marginal likelihood of the candidate model divided by the marginal likelihood of a diagnostic type A model or type B model, where the marginal likelihood of a model is the probability of the data with all the model parameters integrated out. We can also use Akaike information criterion (AIC), Bayesian information criterion (BIC) or deviance information criterion (DIC) to do model selection. In a Bayesian framework, DIC is easily calculated from the samples generated by a MCMC simulation while AIC and BIC require calculating the likelihood.

## Acknowledgements

Weiss was supported in part by the Center for HIV Identification, Prevention and Treatment Services, NIH/NIMH P30MH58107. Suchard was supported in part by a John Simon Guggenheim Memorial Foundation Fellowship and a research gift from Google.

## Footnotes

One contribution of 9 to a Theme Issue ‘Inference in complex systems’.

- Received April 28, 2011.
- Accepted August 9, 2011.

- This journal is © 2011 The Royal Society