The interface between life and physical sciences provides an abundant habitat for mathematical models. These are often complex to our feeble minds yet ridiculously simplistic in comparison with nature's subtlety. They nevertheless often succeed in extracting important insights into, and sometimes quantitative measures of, nature's ways. The effectiveness of mathematics in the natural sciences was dubbed ‘unreasonable’ in the title of a famous essay over 50 years ago, and is no less so today. The models come in a range of flavours, with elements from deterministic and stochastic mathematics, and from algebra, combinatorics and graph theory among other branches of the mathematical tree. To evaluate the hypotheses about how nature works embodied in such models, their properties, most obviously the outcomes they predict, must be compared with observations. At a minimum, the observations can be at the level of qualitative descriptions of natural systems, but increasingly today models are evaluated on the basis of detailed comparisons with carefully measured data. Moreover, rather than coming at the end of a process of model building and computation, comparisons with data are today usually integrated into every level of the model building process, from choosing the broad features of the model to fine-tuning of parameter values.
This issue of Interface Focus is devoted to showcasing some of the many possibilities for innovative approaches to inference in complex models. The emphasis here is not on fully general exposition of methods or rigorous proofs of validity. There is of course some introduction and review of advanced techniques that are more fully developed elsewhere. However, our focus is on application to cutting edge scientific problems. For this showcase, we have selected an unusually broad range of scientific areas, the unifying features being complex models and challenging inference problems tackled by novel techniques.
This year marks the centenary of the death of Francis Galton, the great Victorian polymath who has a claim to be regarded as the first applied statistician in the modern sense. He enthusiastically collected and analysed data to illuminate a range of scientific problems from climate to human psychology and morphology, the latter including fingerprints for forensic use. His bequest generated another important centenary this year—the founding of the world's first academic department of statistics at University College London. Academic statistics has flourished in the intervening century, but the number of card-carrying statisticians remains far too small to address the ubiquitous challenges of inference in every branch of science. The toolkit of inference methods available to scientists has constantly been enriched by contributions from other disciplines, including students of insurance, games of chance, psychology, agriculture and genetics. In recent decades, entire schools of inferential thought have evolved in various engineering and informatics disciplines, largely independent of mainstream statistics, and identified with terms such as machine learning, signal processing, fuzzy logic, data mining and sometimes the curious term ‘inverse problems’. As computational advances allowed each of these fields to expand in achievements and ambition, a ‘tower of Babel’ problem has been evident, with different schools tackling essentially the same problems, but with differing cultures and vocabularies restricting the free flow of ideas. With this issue, I believe that we see evidence of a new vigour in the field arising from a coming together in recent years of major inference schools with different origins. 
While the mathematical rigour traditionally associated with academic statistics remains important in all schools of inference, its emphasis has shifted. Proofs of convergence results as sample size increases, once a mainstay of academic statistics, are of diminished relevance today, whereas proofs of the validity and rate of convergence of algorithms, for example, are of great importance. All but one of the papers published here work to some extent within the Bayesian paradigm of statistical inference, reflecting the success of Bayesian inference ideas in many complex models, which acts as a unifying tendency across the different schools. Working with important marginal distributions of a high-dimensional probability distribution spawned by a complex model yields, albeit often only approximately, interpretable measures of uncertainty about a diverse range of inferences.
Experimental design has been a bedrock of statistical thought for most of the twentieth century. I know from personal experience on grant funding committees that its principles are, sadly, far less widely understood by scientists than should be the case. Michael Stumpf and colleagues consider a new approach to experimental design, concerned not with the number and types of experiments required to achieve a specified precision of estimation from future data, but instead with how to specify a system in synthetic biology to generate specified ‘design objectives’: characteristics of the outputs of future systems. Penfold & Wild assess the performance of several novel approaches to inferring the topology of gene regulatory networks, finding that non-parametric approaches, combining nonlinear dynamical system formalism with Bayesian learning strategies, perform well overall, while dynamic Bayesian networks perform well for smaller systems. Annibale & Coolen examine the effects of sampling on topological features of signalling networks, using ensembles of tailored random graphs. Their overall goal is to help evaluate biases in the networks available in public databases and their effects on inferences drawn from analysis of these databases. Systems of nonlinear differential equations have been of fundamental importance in deterministic mathematical modelling. Calderhead & Girolami here advance recent efforts to put this approach into a statistical framework, allowing principled parameter estimation and hypothesis evaluation for differential equation models, as well as measures of uncertainty in model predictions. Their implementation uses Markov chain Monte Carlo (MCMC) with Riemannian geometry to model the local covariance structure of the parameter space, and is illustrated with application to cell signalling pathways and enzymatic circadian control.
Jesus & Chandler review the statistical technique of estimating functions for analysing and modelling complex systems, including the special case of the generalized method of moments. Estimating function methods are particularly useful when working within a traditional formal statistical model, but of such complexity that the likelihood function is intractable. The authors illustrate its application to point process rainfall models. The technique of approximate Bayesian computation, adopted by Stumpf and colleagues, has also become widely used in recent years for ‘likelihood-free’ approximate inference. Golightly & Wilkinson use a related likelihood-free approach within their particle-MCMC framework for inferring the parameters of a stochastic chemical kinetic model. They apply the methodology to a Lotka–Volterra system and a prokaryotic auto-regulatory network model. Suchard and colleagues use a Bayesian multivariate analysis to predict multiple medical outcomes of different data types from a single combination of predictors, using a composition of generalized linear models and using MCMC for parameter inference. They use the method to predict outcomes from a study of young people living with HIV. Finally, Tavaré and colleagues further recent efforts to put phylogeographic inference on a sound statistical footing, in their case within a clustering framework based on a coalescent model within a subdivided population, with a fixed but unknown number of migrations. This approach allows the incorporation of covariates such as climate information.
I hope that this issue will stimulate awareness of some of the diverse range of possibilities for evaluating complex mathematical models, estimating their parameters and testing hypotheses embedded within them. Many thanks to the authors for agreeing to contribute to this issue to tight deadlines, even though the rigorous reviewing process of Interface Focus meant that invitation generated no guarantee of acceptance. My thanks also to the reviewers for their crucial efforts to maintain quality and improve presentation of the authors' work.
One contribution to a Theme Issue ‘Inference in complex systems’.
- Received September 14, 2011.
- Accepted September 14, 2011.
- This journal is © 2011 The Royal Society