Abstract
Inferring the topology of a gene-regulatory network (GRN) from genome-scale time-series measurements of transcriptional change has proved useful for disentangling complex biological processes. To address the challenges associated with this inference, a number of competing approaches have previously been used, including examples from information theory, Bayesian and dynamic Bayesian networks (DBNs), and ordinary differential equation (ODE) or stochastic differential equation (SDE) models. The performance of these competing approaches has previously been assessed using a variety of in silico and in vivo datasets. Here, we revisit this work by assessing the performance of more recent network inference algorithms, including a novel nonparametric learning approach based upon nonlinear dynamical systems. For larger GRNs, containing hundreds of genes, these nonparametric approaches infer network structures more accurately than traditional approaches do, but at significant computational cost. For smaller systems, DBNs are competitive with the nonparametric approaches with respect to computational time and accuracy, and both of these approaches appear to be more accurate than Granger-causality-based methods and those using simple ODE models.
1. Introduction
The phenotypic expression of a genome, including an organism's response to its external environment, represents an emergent property of a complex, dynamical system with multiple levels of regulation. This regulation includes controls over the transcription of messenger RNA (mRNA) and the translation of mRNA into protein via gene-regulatory networks (GRNs). Within a GRN, specific genes encode transcription factors (TFs): proteins that can bind DNA (either independently or as part of a complex), usually in the upstream regions of target genes (promoter regions), and so regulate their transcription. Since the targets of a TF can include genes encoding other TFs, as well as those encoding proteins of other functions, interactions arise between the transcriptional and translational levels of the system to form a complex network (figure 1). Additional influences may arise in the protein and/or gene space including, for example, post-translational and epigenetic effects, further complicating matters.
Identifying the structure of GRNs, particularly with regard to the targets of TFs, has become a major focus of the systems approach to biology. While the genome-wide targets of a particular TF can be elucidated experimentally using ChIP-chip or ChIP-sequencing (ChIP-seq) [2], generating a ChIP-chip/seq dataset for the many thousands of TFs that may be transcriptionally active in cellular function remains technically and often financially prohibitive. Alternatively, identifying the downstream promoter sequences binding a TF can be achieved using the bacterial one-hybrid system [3] and protein-binding microarrays [4], while TFs that can bind a particular promoter sequence in yeast can be identified via the yeast one-hybrid system [5]. As with ChIP-chip/seq, one-hybrid systems do not currently lend themselves to high-throughput experimentation, with the added caveat that binding in vitro or in yeast/bacteria does not necessarily imply that the TF can or does bind in its native environment or under particular experimental conditions.
As an alternative (or complementary) approach to the experimental identification of TF targets, measuring the dynamic response of transcription and/or translation can provide useful information about the underlying GRN on a more global scale, provided there is a way to infer the topology of the network (often referred to as ‘reverse-engineering’ in the literature). While measuring the level of protein present in biological systems remains difficult to do on a large scale, measuring the relative abundance of mRNA is comparatively straightforward. Consequently, the generation of highly resolved time-series measurements of transcript levels via multiple microarray experiments has become an increasingly powerful tool for investigating complex biological processes, including developmental programmes [6] and responses to abiotic or biotic stress. The inference of GRNs from such time course data has proven useful for disentangling (suspected causal) relationships between genes [7,8], identifying network components important to the biological response, including highly connected genes (so-called ‘hubs’), and capturing the behaviour of the system under novel perturbations [9,10].
To address the various challenges associated with inferring GRNs from transcriptional time series, a number of theoretical approaches have been applied, including ordinary and stochastic differential equations (ODEs/SDEs) [10–12], Bayesian and dynamic Bayesian networks (BNs/DBNs) [7,8,11–13] and information-theoretic or correlation-based methods [11,14]. While the relative merits of these competing approaches have previously been discussed in the context of in silico [11,13,15,16] and in vivo networks [8,12,13,15], there remains disagreement as to the best approach. Previous work by Bansal et al. [11], published in 2007, suggests that while ‘further improvements are needed’, network inference algorithms have become ‘practically useful’. Here, we revisit the work of Bansal et al. [11] by assessing the performance of more recent examples of network inference algorithms, using a range of in silico and in vivo datasets.
2. Inference of gene-regulatory networks from time course data
Inference of GRNs from transcriptomic datasets is a notoriously difficult task, primarily because the number of probes (genes) on modern microarrays can be several orders of magnitude greater than the number of independent measurements, even for highly resolved and replicated time course studies [6]. The sheer number of genes, combined with the observation that a significant proportion of expression profiles can be correlated over time [6], means that inferring GRNs from microarray data alone may be inherently unidentifiable. To overcome this problem, the number of genes to be modelled is often artificially reduced by removing flat or uninteresting profiles [6] or by clustering together groups of genes that might be co-expressed [17,18]. Alternatively, when investigating an organism's response to a particular stimulus, in which time series are measured under both control and perturbed conditions, the number of genes can be reduced by including only those that are differentially expressed, i.e. those whose expression profiles differ between the control and perturbed datasets [19]. Removing genes in this way can greatly limit the number to be modelled, typically yielding figures in the range of hundreds to tens of thousands. In most cases, this number is still prohibitively large, requiring prior knowledge or heuristic approaches to reduce the pool further. Problems may nevertheless arise when artificially reducing the number of genes in this manner, for example, by erroneously labelling a particular gene as not differentially expressed. Additionally, even modern microarrays may lack probes for important genes. In these cases, the absence of an otherwise important gene could have a negative impact on the accuracy of the reconstructed network.
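As a deliberately crude illustration of this filtering step, differentially expressed genes can be selected by comparing control and perturbed profiles. The function below and its threshold are illustrative assumptions, not a method used in any of the studies cited here; real pipelines use dedicated differential-expression tests.

```python
import numpy as np

def filter_differential(control, treated, z_thresh=2.0):
    """Keep genes whose mean absolute difference between control and
    treated time courses exceeds z_thresh noise standard deviations.
    A crude illustration of differential-expression filtering."""
    diff = treated - control                    # (time points, genes)
    noise_sd = control.std(axis=0) + 1e-12      # per-gene variability proxy
    score = np.abs(diff).mean(axis=0) / noise_sd
    return np.where(score > z_thresh)[0]

# Toy data: 3 genes over 10 time points; only gene 1 responds to the stimulus
rng = np.random.default_rng(1)
control = rng.normal(size=(10, 3))
treated = control.copy()
treated[:, 1] += 5.0                            # strong differential expression
kept = filter_differential(control, treated)
```

Only the responding gene survives the filter; in practice the threshold trades off the risk of wrongly discarding a regulator against the size of the modelled gene pool.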
Other notable sources of error arise when inferring networks from transcriptional data alone. Such models tend to assume protein behaviour to be correlated with transcription, which often appears not to be the case [20], owing to a variety of effects including protein degradation and post-translational modifications.
In subsequent sections, we briefly outline a variety of approaches used to infer networks from time course data, including ODEs, BNs and DBNs, nonparametric approaches using Bayesian learning of nonlinear dynamical systems (NDSs), and approaches based upon statistical hypothesis tests (e.g. t-tests), such as Granger causality.
2.1. Ordinary differential equations
Within the context of GRNs, a coupled set of ODEs represents a deterministic model relating the instantaneous change in mRNA concentration of a gene to the mRNA concentrations of all other genes in the system:

Ẋ_{i} = f(X_{1}, … , X_{N}, Y; Θ),  (2.1)

where X_{i} = {x_{1}, … , x_{M}} denotes the vector of mRNA concentrations of the ith gene (of N) at times {t_{1}, … , t_{M}}, with time-derivative Ẋ_{i} = {dx_{1}/dt, … , dx_{M}/dt}, Y represents additional time course variables including, for example, protein and metabolite concentrations or other external perturbations, and f(·) represents an appropriate function capturing the relationship between all variables and their instantaneous change, with Θ the vector of model parameters.
In the simplest ODE models, the rate of change in transcription depends linearly on all other transcripts via:

dx_{i}/dt = Σ_{j=1}^{N} a_{ij}x_{j},  (2.2)

where a_{ij} represents a real-valued parameter capturing the influence of gene j upon gene i, with positive values indicating genes that promote transcription of i, negative values indicating repressors of i and zero entries representing non-parents. In these cases, inference of network structure requires inference of the parameters, a_{ij}, from time-series data, potentially accompanied by post-inference processing to ensure the matrix of a_{ij} is sparse. Notable extensions to this basic model include additional terms representing, for example, effects arising from degradation of the mRNA or protein [21], or the influence of external perturbations including drug concentrations [22,23], which may additionally be estimated from the data, or fixed using prior knowledge or experimentation.
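A minimal sketch of how the a_{ij} in this simplest model might be estimated, assuming finite-difference derivative estimates and ordinary least squares (an illustrative approach, not the TSNI algorithm discussed later):

```python
import numpy as np

def infer_linear_ode(X, t):
    """Estimate the interaction matrix a_ij of dx_i/dt = sum_j a_ij * x_j
    by least squares against finite-difference derivative estimates.
    X: (M, N) expression levels at M time points; t: (M,) sample times."""
    dXdt = np.gradient(X, t, axis=0)            # (M, N) derivative estimates
    A, *_ = np.linalg.lstsq(X, dXdt, rcond=None)
    return A.T                                  # A[i, j]: influence of gene j on gene i

# Toy two-gene system: gene 0 decays and activates gene 1
t = np.linspace(0.0, 5.0, 100)
true_A = np.array([[-0.5, 0.0],
                   [ 1.0, -0.8]])
X = np.zeros((100, 2))
X[0] = [1.0, 0.0]
for m in range(1, 100):                         # simple Euler simulation
    X[m] = X[m - 1] + (t[m] - t[m - 1]) * true_A @ X[m - 1]
A_hat = infer_linear_ode(X, t)
```

On noiseless, densely sampled data the estimate recovers the signs and approximate magnitudes of the true interactions; with realistic microarray noise and sparse sampling, the derivative estimates degrade and regularization or sparsity constraints become necessary.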
When the network structure is known a priori, the functional form of f(·) can instead be chosen to reflect more realistically the underlying assumptions about the nature of GRNs by including, for example, transcriptional delays and rate-limiting effects designed to mimic the chemical nature of interactions using Michaelis–Menten or Hill-type kinetics. In these models, time-series data might be used to infer a subset of the physically meaningful parameters, such as transcription and translation rates and protein-binding rates and affinities. The number of parameters in these physically motivated ODE systems can be very large and solutions can be unidentifiable. In these cases, only a small subset of parameters might be estimated, requiring time-series data from single cells. Additionally, the computational cost of solving such systems of equations can be large, and while nonparametric methods can be employed to speed up inference of parameters [24], these types of ODE systems are better suited to small, well-characterized systems, such as the Arabidopsis clock [25–27] or the NF-κB signalling pathway [28,29]. However, the realistic nature of such models allows the user to make predictions about the behaviour of the system under new perturbations [10], an approach that has previously been fruitful in identifying novel components of the system [25].
2.2. Bayesian and dynamic Bayesian networks
BNs are directed acyclic graphs (DAGs) in which individual nodes represent random variables, with directed connections (edges) between nodes representing probabilistic conditional dependencies. For a GRN, BNs are used to relate the mRNA expression of a gene to the mRNA expression of other genes. Since the network structure of a BN is a DAG, the joint probability density that a network G generated the observations, p(X|G), may be factorized into a product of conditional distributions:

p(X|G) = ∏_{i=1}^{N} p_{i}(X_{i} | X_{Pa(i)}, Θ),  (2.3)

where Pa(i) denotes the parental set of node i, X_{Pa(i)} the mRNA/protein/metabolite concentrations of that parental set and Θ represents the free parameters of the probability density function p_{i}(·). A key limitation of assuming a DAG is the inability to describe feedback loops, an important constituent of GRNs [30,31]. The importance of this is illustrated in figure 2, in which the GRN represented in figure 2a,b may only be represented in terms of a BN as indicated in figure 2c. To overcome these limitations, DBNs were suggested as a model of time course observations [32,33]. For a DBN, an individual gene is no longer represented by a single node but by a series of nodes, obtained by unfolding the DAG over time (figure 2d). In the simplest implementation of a DBN, the observation at time t_{i} depends only upon observations at time t_{i−1}, and equation (2.3) becomes:

p(X|G) = ∏_{i=1}^{N} ∏_{m=2}^{M} p_{i}(X_{i}(t_{m}) | X_{Pa(i)}(t_{m−1}), Θ),  (2.4)

where G may now include cyclic structures, N represents the number of genes and M the number of time points in the time series.
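To make the first-order DBN factorization concrete, the sketch below scores candidate parental sets using linear-Gaussian conditionals with a BIC penalty; these scoring choices are illustrative assumptions, not the method of any package discussed later.

```python
import numpy as np
from itertools import combinations

def dbn_score(X, i, parents):
    """BIC-penalized log-likelihood of gene i at time t modelled as a
    linear-Gaussian function of `parents` at time t-1 (first-order DBN)."""
    y = X[1:, i]
    P = X[:-1, list(parents)] if parents else np.zeros((len(y), 0))
    D = np.column_stack([P, np.ones(len(y))])   # design matrix with intercept
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    sigma2 = max(resid.var(), 1e-12)
    ll = -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1.0)
    return ll - 0.5 * D.shape[1] * np.log(len(y))

def best_parents(X, i, max_indegree=2):
    """Exhaustively score all parental sets of gene i up to max_indegree."""
    others = [j for j in range(X.shape[1]) if j != i]
    cands = [c for k in range(max_indegree + 1) for c in combinations(others, k)]
    return max(cands, key=lambda c: dbn_score(X, i, c))

# Toy data: gene 1 is driven by gene 0 at the previous time point
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[1:, 1] = 0.9 * X[:-1, 0] + 0.1 * rng.normal(size=199)
```

The true regulator is recovered for gene 1; limiting the in-degree keeps the exhaustive search over parental sets tractable, a device that reappears in the nonparametric methods of §2.3.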
2.2.1. Missing parameters and variables
A key advantage of BNs and DBNs is the ability to incorporate missing values, including missing time-point observations and missing variables. The influence of a missing variable, such as an unmeasured TF, X′, may be incorporated into a BN by noting that the distribution of the observed variables follows from the joint distribution:

p(X|G) = ∫ p(X, X′|G) dX′.  (2.5)
The influence of the missing variable X′ may either be integrated out (marginalized), or inferred as part of the algorithm (figure 2e). A notable example of a DBN that infers missing variables is the Bayesian state-space model (SSM) of Beal et al. [7], which uses a variational approach to infer the hidden states and free parameters of an SSM.
2.2.2. Inference in Bayesian networks and dynamic Bayesian networks and prior knowledge
Reverse-engineering of a GRN using a BN or DBN typically involves inferring the graph structure, G, the free parameters of the probability density functions, Θ, or a combination of both. In theory, the joint probability of a particular network structure, G, and model parameters, Θ, may be calculated via Bayes' rule:

p(G, Θ|X) = p(X|G, Θ) p(Θ) p(G) / Σ_{G′∈𝒢} ∫ p(X|G′, Θ′) p(Θ′) p(G′) dΘ′,  (2.6)

where 𝒢 denotes the set of all allowable graph structures, p(Θ) represents the prior density of Θ and p(G) the prior probability of G. Enumerating all possible combinations of network structures is often computationally infeasible, while exact marginalization of free parameters can be analytically intractable. Rather than performing an exact calculation of equation (2.6), approximations are often employed, including maximum-likelihood and maximum a posteriori (MAP) approaches, approximations via expectation–maximization (EM) or sampling via Monte Carlo techniques (for a concise introduction to these methods see, e.g. [34]).
A key feature of the Bayesian approach is the ability to incorporate prior information. Within equation (2.6), prior information about known connections in a GRN is encoded using the term p(G).
2.3. Nonparametric approaches to nonlinear dynamical systems
In recent years, a novel class of nonparametric algorithms has been suggested that combines the flexibility and advantages of Bayesian learning with NDSs [15,35]. A common class of NDS evolves in continuous time as:

Ẋ_{i} = f(X_{Pa(i)}),  (2.7)

where f(·) denotes an unknown, nonlinear function. Rather than explicitly trying to choose a parametrized form for f(·), as would be the case in a system of ODEs, an alternative strategy is to use a nonparametric approach to capture the effect of the function directly from the data. When the parents of a particular gene, Pa(i), are known a priori, the task becomes one of identifying the potentially nonlinear relationship producing a series of outputs, Ẋ_{i}, from a series of inputs, X_{Pa(i)}, which may be achieved by assigning the function f(·) a Gaussian process prior [36]. The conditional probability density of the observations, Ẋ_{i}, is then defined by a multivariate normal distribution:

p(Ẋ_{i} | X_{Pa(i)}, Θ) = 𝒩(μ, K),  (2.8)

where μ represents the process mean and K the covariance. A probability distribution over possible parental sets for the ith gene can then be calculated via Bayes' rule as:

p(M_{j} | X, Ẋ_{i}) = p(Ẋ_{i} | X, M_{j}, Θ_{j}) p(M_{j}) / Σ_{k} p(Ẋ_{i} | X, M_{k}, Θ_{k}) p(M_{k}),  (2.9)

where M_{j} represents the jth possible parental set of gene i and Θ_{j} represents the hyperparameters of the Gaussian process. This calculation requires a separate Gaussian process model of the dynamics for each possible set of parents, the number of which scales super-exponentially with the number of TFs. By limiting the maximum in-degree for each node, i.e. limiting the maximum number of parents, this scaling becomes polynomial [15,35], and the denominator in equation (2.9) may be feasibly calculated. Equation (2.9) may be applied node-by-node to all genes in a system, to evaluate the distribution over network structures.
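A minimal sketch of this Gaussian process scoring of parental sets, assuming a fixed RBF kernel with illustrative hyperparameters and a uniform prior over candidate sets (not the CSI or GP4GRN implementations themselves):

```python
import numpy as np
from itertools import combinations

def gp_log_marginal(P, y, ell=1.0, sig_f=1.0, sig_n=0.1):
    """Log marginal likelihood of targets y under a zero-mean Gaussian
    process with an RBF kernel over the parental inputs P."""
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
    K = sig_f**2 * np.exp(-0.5 * d2 / ell**2) + sig_n**2 * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)

def parental_posterior(X, dX, i, max_indegree=2):
    """Posterior over parental sets of gene i via Bayes' rule,
    with a uniform prior over sets up to max_indegree."""
    others = [j for j in range(X.shape[1]) if j != i]
    cands = [c for k in range(1, max_indegree + 1) for c in combinations(others, k)]
    logp = np.array([gp_log_marginal(X[:, list(c)], dX[:, i]) for c in cands])
    w = np.exp(logp - logp.max())
    return dict(zip(cands, w / w.sum()))

# Toy data: the derivative of gene 1 is a nonlinear function of gene 0 alone
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
dX = np.zeros_like(X)
dX[:, 1] = np.tanh(X[:, 0]) + 0.05 * rng.normal(size=50)
post = parental_posterior(X, dX, 1)
top = max(post, key=post.get)
```

The posterior mass concentrates on sets containing the true parent, and the marginal likelihood automatically penalizes superfluous inputs, one of the appeals of the Bayesian nonparametric formulation.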
The behaviour of each of the separate Gaussian process models depends upon the choice of hyperparameters Θ. Although it would be preferable to integrate out these hyperparameters, the solution is not analytically tractable and would require sampling-based approaches. Aïjö & Lähdesmäki [15] instead infer a separate set of hyperparameters for each of the models in the denominator by maximizing the marginal likelihood:

Θ_{j} = argmax_{Θ} p(Ẋ_{i} | X, M_{j}, Θ).  (2.10)
Alternatively, studies by Klemm [35] suggest using a common hyperparameter for all models in the denominator of equation (2.9), which can be inferred along with the distribution over parental sets using an EM approach.
2.3.1. Derivative estimation for continuous time systems
Microarray time course datasets measure the mRNA expression profiles, X_{i}, and consequently, the time-derivative must first be estimated from the data in order to apply the continuous nonparametric models outlined in §2.3. Studies by Aïjö & Lähdesmäki [15] suggest using a discrete approximation:

Ẋ_{i}(t_{m}) ≈ (X_{i}(t_{m+1}) − X_{i}(t_{m}))/(t_{m+1} − t_{m}).  (2.11)
Alternatively, by assuming the mRNA expression profile of the ith gene corresponds to a Gaussian process (e.g. [19]), and noting that the derivative of a Gaussian process is itself a Gaussian process [24,36,37], Klemm [35] suggests that the derivative can be approximated by the derivative process:

Ẋ_{i} | X_{i}, Θ ∼ 𝒩(μ, K),  (2.12)

where Θ is taken to be the MAP estimator, and μ and K represent the mean and covariance, respectively, of the derivative process.
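Assuming an RBF kernel with fixed, illustrative hyperparameters (rather than the MAP estimates described above), the posterior mean of the derivative process can be computed in closed form, as in this sketch:

```python
import numpy as np

def gp_derivative(t, y, ell=1.0, sig_f=1.0, sig_n=0.1):
    """Posterior mean of dy/dt at the observation times, exploiting the fact
    that the derivative of a Gaussian process is itself a Gaussian process.
    Kernel hyperparameters are fixed here for illustration."""
    d = t[:, None] - t[None, :]
    K = sig_f**2 * np.exp(-0.5 * d**2 / ell**2)
    Ky = K + sig_n**2 * np.eye(len(t))
    dK = -(d / ell**2) * K        # cross-covariance d/dt_i k(t_i, t_j)
    return dK @ np.linalg.solve(Ky, y)

t = np.linspace(0.0, 2.0 * np.pi, 40)
y = np.sin(t)
dy = gp_derivative(t, y)          # approximates cos(t) away from the ends
```

Unlike the discrete approximation of equation (2.11), this estimator smooths over measurement noise and returns derivative values at the observation times themselves rather than at midpoints.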
2.3.2. Discrete time nonlinear dynamical systems
A second class of NDSs outlined in Klemm [35] evolves over discrete time as:

X_{i}(t_{m}) = f(X_{Pa(i)}(t_{m−1})),  (2.13)

where, again, f(·) represents an appropriate nonlinear function. Where the parental set is known, learning this function essentially involves learning a relationship between a vector of observations, X_{i} = {X_{i}(t_{2}), X_{i}(t_{3}), … , X_{i}(t_{M})}, and the matrix of observations composed of the expression levels of the parental set at the preceding time points, which may again be achieved by assigning the function a Gaussian process prior. As with the continuous time NDS, a distribution over the parental sets may be calculated via Bayes' rule (see equation (2.9)).
2.4. Granger causality
The notion of Granger causality was first introduced within the context of autoregression models [38]. Informally, a random variable X_{1} is said to Granger-cause a second variable, X_{2}, if knowledge of the past behaviour of X_{1} improves prediction of X_{2} compared with knowing the past of X_{2} alone. Typically, this involves calculating the predictive error of X_{2} with and without the variable X_{1} and performing a suitable statistical test to determine whether or not the more complex model (with X_{1}) has better explanatory power than the simpler case. Granger causality can readily be extended to multivariate datasets (e.g. [39]), allowing its use in GRN inference [13].
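A minimal pairwise sketch of this test, computing the F-statistic comparing the restricted and full autoregressive models (illustrative code; the GCCA toolbox used later implements the multivariate case):

```python
import numpy as np

def granger_fstat(x1, x2, lag=1):
    """F-statistic testing whether the past of x1 improves prediction of x2
    beyond the past of x2 alone (pairwise Granger causality of order `lag`)."""
    n = len(x2) - lag
    y = x2[lag:]
    past2 = np.column_stack([x2[lag - k - 1:len(x2) - k - 1] for k in range(lag)])
    past1 = np.column_stack([x1[lag - k - 1:len(x1) - k - 1] for k in range(lag)])
    ones = np.ones((n, 1))

    def rss(D):
        beta, *_ = np.linalg.lstsq(D, y, rcond=None)
        r = y - D @ beta
        return r @ r

    rss_r = rss(np.hstack([ones, past2]))            # restricted: past of x2 only
    rss_f = rss(np.hstack([ones, past2, past1]))     # full: plus past of x1
    return ((rss_r - rss_f) / lag) / (rss_f / (n - 1 - 2 * lag))

# Toy data: x2 is driven by the previous value of x1, not vice versa
rng = np.random.default_rng(3)
x1 = rng.normal(size=300)
x2 = np.zeros(300)
x2[1:] = 0.8 * x1[:-1] + 0.5 * rng.normal(size=299)
```

The F-statistic in the causal direction is large, and in the reverse direction is small; in practice it would be compared against an F-distribution to obtain a p-value, and the multivariate extension conditions on all other genes simultaneously.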
2.5. Implemented algorithms
In order to evaluate the performance of simple ODEs on time-series data, we use the time-series network identification (TSNI) algorithm of Bansal et al. [23], while Granger causality analysis is implemented using the Granger Causal Connectivity Analysis (GCCA) toolbox [39]. An example of a linear DBN using first-order conditional dependencies can be found in the G1DBN package [40], while a DBN with hidden states is investigated using the variational Bayesian SSM (VBSSM) of Beal et al. [7]. An example of inference using nonparametric NDSs can be found in the Matlab package GP4GRN [15]. Additionally, we implement a Matlab version of the algorithms proposed in Klemm [35], which includes an EM implementation of a Gaussian process model for NDSs evolving over continuous time (causal structure identification, CSI^{c}; §2.3) and one evolving over discrete time (CSI^{d}; §2.3.2).
3. Network evaluation
Establishing the accuracy of an inferred GRN requires knowledge of a ground-truth (or gold-standard) network, against which the inferred network can be ‘benchmarked’. Benchmarking involves counting the number of links correctly predicted by the algorithm (true positives, TP), the number of incorrectly predicted links (false positives, FP), the number of true links missed in the inferred network (false negatives, FN) and the number of correctly identified non-links (true negatives, TN), as illustrated in figure 3. Performance of the algorithm can then be summarized by calculating the true positive rate (TPR = TP/(TP + FN)), also known as recall, the false positive rate (FPR = FP/(FP + TN)) and the positive predictive value (PPV = TP/(TP + FP)), also known as precision.
In most cases, however, inference does not identify a single network, but a fully connected network with edges weighted according to, for example, the strength or probability of that particular interaction (compare figure 4b with figure 4a), which may be interpreted in terms of a ranked list of connections. A sparser network can be extracted from the list by removing all links with strength or probability below a certain threshold (figure 4c), counting the number of TP, FP, TN and FN and evaluating the TPR/FPR/PPV. By evaluating the performance of the algorithm over a range of such thresholds, the performance can be summarized by: (i) the receiver operating characteristic (ROC) curve, which plots the TPR against the FPR (1 − FPR being equivalent to the specificity) over all thresholds; and (ii) the precision–recall (PR) curve, which plots the PPV against the TPR over all thresholds. In all subsequent benchmarking, we evaluate both the ROC and PR curves.
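The thresholding and counting procedure above can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np

def roc_pr_points(scores, truth):
    """TPR, FPR and PPV at every threshold for a weighted adjacency matrix
    `scores` against a binary gold-standard matrix `truth` (diagonal ignored)."""
    off = ~np.eye(truth.shape[0], dtype=bool)        # drop self-interactions
    s, t = scores[off], truth[off].astype(bool)
    points = []
    for thr in sorted(set(s), reverse=True):
        pred = s >= thr
        tp = int((pred & t).sum()); fp = int((pred & ~t).sum())
        fn = int((~pred & t).sum()); tn = int((~pred & ~t).sum())
        points.append((tp / (tp + fn), fp / (fp + tn), tp / (tp + fp)))
    return points                                    # (TPR, FPR, PPV) per threshold

# A perfect ranking: the two true edges receive the two highest scores
truth = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]])
scores = np.array([[0.0, 0.9, 0.1],
                   [0.2, 0.0, 0.8],
                   [0.3, 0.4, 0.0]])
pts = roc_pr_points(scores, truth)
```

Integrating the resulting points gives the AUROC and AUPR summary statistics used throughout the benchmarking below; a perfect ranking passes through the point TPR = 1, FPR = 0 with a precision of 1.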
4. Results
4.1. In silico networks
While the targets of TFs may be identified using a variety of experimental techniques, including ChIP-chip/seq [2] and one-hybrid systems [3,5], allowing the validity of individual connections to be verified, establishing a complete gold-standard GRN in biological systems remains technically and financially difficult. This has led to the emergence of a diverse range of in silico networks from which observations can be generated for benchmarking purposes. Foremost among the in silico networks are those released as part of the dialogue on reverse-engineering assessment and methods (DREAM, see also the DREAM homepage^{1}) [10,16].
4.1.1. DREAM4 10-gene networks
As an example of a small in silico network, we make use of the 10-gene networks released as part of the DREAM4 challenge [16,41,42], available for download at the DREAM project homepage. The datasets consist of five separate networks, whose topologies were extracted from known GRNs in Escherichia coli and Saccharomyces cerevisiae. Time-series observations were generated using parametrized SDEs, with observations uniformly sampled (21 time points, single replicate) under five different perturbations, for a total of 105 observations per gene. The datasets include additive measurement noise, modelled upon existing microarray noise.
Depending upon the capability of the algorithms, networks were inferred using either a complete dataset (all five perturbations simultaneously), or a separate network was inferred for each perturbation, with a consensus network evaluated as the average of the five inferred networks. Self-interactions were removed from all networks and performance was evaluated by calculating the area under the ROC curve (AUROC) and the area under the PR curve (AUPR). For comparison, we evaluated the average AUROC and AUPR scores obtained from 1000 randomly generated adjacency matrices (again removing self-interactions), which yielded scores of approximately AUROC = 0.55 and AUPR = 0.16–0.19. Additional networks were inferred from steady-state measurements following systematic knockouts or knockdowns of each of the 10 genes using pipeline 1 of Greenfield et al. [10], which yielded AUROC scores of between 0.73 and 0.9 and AUPR scores in the range 0.27–0.78.
The performance of each of the methods used is summarized in table 1. In general, all methods appeared to perform better than random, with DBNs, nonparametric NDS approaches and networks recovered from steady-state data each performing best in at least one of the five datasets.
DBNs were significantly better than random on all five datasets, with the DBN that incorporates hidden states (VBSSM) outperforming the one that does not (G1DBN) in datasets 2 and 3, while G1DBN performed better on datasets 4 and 5. The two methods performed similarly on dataset 1. In contrast to previous studies of Cantone et al. [12], DBNs were here found to consistently outperform simple ODE-based approaches (TSNI). DBNs also possessed better AUROC/AUPR scores in general than those following reconstruction using Granger causality (GCCA; four out of five datasets), consistent with previous observations that DBNs were better than Granger causality when the number of observations was low [13].
The nonparametric NDS approaches were consistently found to perform better than random, and were the top-ranked time-series approach in three out of five of the datasets. Interestingly, the EM algorithm of Klemm [35], which used a Gaussian process to estimate the time-derivative and jointly constrained the hyperparameters, performed better than the maximum-likelihood-based approach of Aïjö & Lähdesmäki [15], which used a discrete approximation to the gradient, in three of the five datasets. The discrete NDS proposed in Klemm [35] performed better than both approaches on four of the datasets. In general, however, the nonparametric NDS methods did not greatly outperform other, linear methods such as the DBNs.
4.1.2. DREAM4 100-gene networks
As an example of a larger in silico network, we make use of the 100-gene systems of the DREAM4 challenge [16,41,42]. The datasets again consist of five networks whose topologies are taken from E. coli or S. cerevisiae networks. Time series consist of uniformly sampled measurements (21 time points, single replicate) simulated using a parametrized set of SDEs under 10 different perturbations. Additive noise modelled upon microarray noise was included.
Recovered networks were inferred as for the 10-gene networks, again removing all self-interactions. For comparison purposes, 100 random networks were generated, yielding scores of approximately AUROC = 0.50 and AUPR = 0.002. Additional networks were again inferred from steady-state measurements following systematic knockouts or knockdowns of each of the 100 genes using pipeline 1 of Greenfield et al. [10], which yielded AUROC scores of between 0.74 and 0.88 and AUPR scores in the range 0.15–0.46. Performance of the individual approaches is summarized in table 2.
The nonparametric NDS algorithms performed best of all the time course methods, with scores significantly greater than other approaches, including DBNs. The performance of the discrete CSI algorithm (CSI^{d}) was superior to a previous parametrized ODE model, which had been one of the top contenders in the previous DREAM3 network inference challenge, and had AUPR scores of (approx.) 0.23, 0.11, 0.21, 0.19 and 0.14, respectively, in the five DREAM4 100-gene datasets (see fig. 2 of Greenfield et al. [10]). In general, the discrete CSI algorithm (CSI^{d}) [35] was found to outperform the continuous CSI algorithm (CSI^{c}) [35] and the related maximum-likelihood approach of Aïjö & Lähdesmäki [15] (GP4GRN). While DBNs did not perform as well as the nonparametric approaches, they did perform better than methods based upon Granger causality and simple ODE systems (TSNI). The latter has previously been acknowledged to perform poorly on larger datasets [11,23], and is not strictly designed for inferring networks but rather the targets of perturbations. Interestingly, while methods using time-series data did not outcompete reconstruction using systematic knockouts, nonparametric and Bayesian methods were competitive with networks reconstructed following systematic knockdowns.
Network inference using systematic knockouts was superior to all methods using time-series data tested here. Indeed, networks constructed solely from the knockout data produced the most accurate network topologies in the original DREAM4 challenge. Previous studies have suggested that combining predictions from different inference methods can result in a better, or at least more robust, performance compared with the individual methods alone [42]. In the DREAM4 challenge, a simple combination of time course and knockout data was achieved by ranking the connections identified from the two datasets and calculating an averaged rank (see pipeline 3 in Greenfield et al. [10]), but this did not appear to perform better than the networks produced from systematic knockouts alone. Subsequent analysis by Greenfield et al. [10] did, however, demonstrate that the two datasets could be used together in a mutually beneficial way. In table 3, we demonstrate the effect of combining time-series data with knockout data for each of the network inference algorithms previously described, using the pipeline 3 approach of Greenfield et al. [10]. For the DBNs, the combination of time series with knockout data outperformed reconstruction using knockout data alone in one out of five cases. Simple ODE and Granger-causality-based methods using time-series data did not appear to add anything to the knockout data. Conversely, for the NDS-based approaches the performance using both datasets appears to be better than the knockout dataset alone in two of five cases, and comparable in a third, suggesting that more recent methods can be advantageously combined with knockout data. Producing a knockout for each gene in the network is, however, unrealistic in practice for many organisms, requiring additional biological experiments and explicit knowledge of the genes involved in the network prior to inference.
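In the spirit of that rank-averaging combination (an illustrative sketch, not the Greenfield et al. pipeline itself), two edge-score matrices can be merged as:

```python
import numpy as np

def average_rank_combine(score_a, score_b):
    """Combine two edge-score matrices by averaging each edge's rank in the
    two rankings (rank 0 = strongest edge); higher output = stronger edge."""
    def ranks(s):
        flat = s.ravel()
        order = np.argsort(-flat)              # best-scoring edge first
        r = np.empty_like(order)
        r[order] = np.arange(len(flat))
        return r.reshape(s.shape)
    return -(ranks(score_a) + ranks(score_b)) / 2.0

# Toy scores from, say, a time-series method (a) and knockout data (b):
# edge (0, 1) is ranked top by both, so it tops the combined ranking too
a = np.array([[0.0, 0.9],
              [0.2, 0.0]])
b = np.array([[0.0, 0.8],
              [0.1, 0.0]])
combined = average_rank_combine(a, b)
```

Working with ranks rather than raw scores sidesteps the problem that different inference methods produce edge weights on incommensurable scales.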
4.2. Synthetic and in vivo networks
While in silico datasets readily allow benchmarking of algorithms on arbitrarily large systems, Werhli et al. [8] note that the data used in such benchmarking represent a simplification of biological systems, suggesting that conclusions based solely upon in silico benchmarking might lead to biased evaluations. While the topology of large-scale, gold-standard biological networks remains unknown, advances in experimental techniques including ChIP-chip/seq and one-hybrid systems mean that small-scale biological networks, including the Arabidopsis clock, are now well characterized [25,27]. Similarly, advances in synthetic biology have allowed small networks to be constructed in vivo [12], providing opportunities for testing algorithms in real biological systems.
4.2.1. In vivo reverse-engineering and modelling assessment yeast network
A synthetic GRN has previously been constructed in the budding yeast S. cerevisiae [12]. This in vivo reverse-engineering and modelling assessment (IRMA) network consists of five genes, each of which regulates at least one other gene in the network, and whose expression is negligibly influenced by endogenous genes. Expression within the network is activated in the presence of galactose and inhibited in the presence of glucose, allowing two time course datasets to be generated by Cantone et al. [12]. In the first of these, cells were grown in a glucose medium and switched to galactose (switch on), with gene expression monitored over 16 time points using quantitative RT-PCR. In the second dataset, cells were grown in a galactose medium and switched to glucose (switch off), with expression monitored over 21 time points.
The performance of each of the inference algorithms was evaluated in terms of AUROC and AUPR using both the switch on and switch off datasets, again removing self-interactions. For comparison purposes, the performance of randomly generated networks, and networks reconstructed from the steady-state data following systematic knockouts, are also included. A summary of performance in terms of AUROC and AUPR can be seen in table 4, with networks inferred from the switch off dataset depicted in figure 5.
Previous studies by Cantone et al. [12] had suggested that DBNs (BANJO [43]) performed better than random only on the switch off dataset, based upon calculation of sensitivity. In this study, we note that one of the two DBNs (VBSSM) performed significantly better than random on both datasets, and was the top performer on the switch on dataset. The second DBN performed better than random on the switch on dataset only. Recent studies using non-stationary DBNs appear to perform significantly better than random on the switch on dataset, but not on the switch off [44]. In both instances, however, the AUPR for the non-stationary DBN does not appear to be greater than for the stationary DBNs evaluated in this paper (AUPR = 0.51 and 0.38, respectively). The nonparametric methods all performed significantly better than random on the switch off dataset, with CSI^{d} the top performer. On the switch on dataset, only GP4GRN was found to perform better than random, with both CSI^{c} and CSI^{d} yielding low AUROC and AUPR scores. Network recovery based upon Granger causality possessed good AUROC and AUPR scores that were competitive with the Bayesian approaches on both datasets, and superior to the performance of simple ODE models (TSNI).
In general, methods based upon time-series data appeared to be competitive with reconstruction based upon systematic knockouts, with VBSSM, TSNI, GP4GRN and GCCA all outperforming knockout networks on the switch-on dataset, and CSI^{d} outperforming knockout networks on the switch-off dataset.
4.2.2. The Arabidopsis thaliana clock network
The network structure for the Arabidopsis thaliana clock was inferred using time-series data consisting of 24 uniformly sampled microarrays, taken at 2 h intervals, with four biological replicates (three technical replicates). The networks were inferred from time-series information for seven clock-related genes, as used in previously published studies [13], and consist of LHY, CCA1, PRR7, PRR9, TOC1, GI and ELF4. While a gold-standard clock model is not known with absolute certainty, here we use the model of Pokhilko et al. [27] as a yardstick for evaluating performance.
Following inference of the fully connected network structure, networks were identified by taking the top 10 interactions (neglecting self-interactions). While the number of connections taken was arbitrary, it ensured all algorithms could be compared on a level footing. Within the Pokhilko et al. [27] model, LHY and CCA1 act as a single node, while in the original Locke et al. [25] model so do PRR7 and PRR9, and indeed most of the inferred networks capture a link between LHY and CCA1 and between PRR7 and PRR9.
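The thresholding step described above amounts to ranking all off-diagonal entries of the inferred interaction matrix and keeping the k strongest; a minimal sketch (illustrative names, not the code used here):

```python
def top_k_edges(weights, k):
    """Given a square matrix where weights[i][j] scores the regulation
    of gene j by gene i, return the k highest-scoring interactions,
    neglecting self-interactions (the diagonal)."""
    n = len(weights)
    edges = [(weights[i][j], i, j)
             for i in range(n) for j in range(n) if i != j]
    edges.sort(reverse=True)           # strongest interactions first
    return [(i, j) for _, i, j in edges[:k]]
```

With k fixed across all algorithms, every method contributes the same number of candidate interactions to the comparison, regardless of how its confidence scores are scaled.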
In figure 6, we summarize the inferred networks produced when treating LHY/CCA1 and PRR7/PRR9 as single nodes. This step did not involve generating new networks with a reduced gene list, but simply involved taking the union of all connections. Most of the inferred networks, as well as networks inferred using non-stationary DBNs [44], capture a feedback loop between the morning elements LHY/CCA1 and the evening elements TOC1/GI. All methods appeared to capture the link from TOC1 to PRR9 save CSI, where the link direction was reversed. All algorithms, again including networks identified using non-stationary DBNs [44], capture or partially capture a loop between LHY/CCA1 and PRR7/PRR9, with GCCA and CSI capturing the loop in the direction from LHY/CCA1 to PRR7/9, and all other algorithms capturing the loop in the direction from PRR7/9 to LHY/CCA1. Of the methods tested, only CSI^{d} appears to (partially) capture the loop between GI and TOC1, although networks inferred using non-stationary DBNs also appear to capture this loop [44].
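The node-merging step can be sketched as the following union operation (an illustrative implementation under the assumption that each network is an edge set over gene names):

```python
def merge_nodes(edges, groups):
    """Collapse genes into combined nodes (e.g. LHY and CCA1 into
    'LHY/CCA1') by taking the union of all connections. Self-loops
    created by the merge are discarded, since self-interactions were
    already neglected during inference."""
    merged = {(groups.get(a, a), groups.get(b, b)) for a, b in edges}
    return {(a, b) for a, b in merged if a != b}
```

For example, with `groups = {'LHY': 'LHY/CCA1', 'CCA1': 'LHY/CCA1', 'PRR7': 'PRR7/9', 'PRR9': 'PRR7/9'}`, the edges LHY→CCA1 (now internal to a single node) vanish, while CCA1→PRR9 and TOC1→PRR9 both survive as edges into the merged PRR7/9 node.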
Interestingly, ELF4, a clock-related gene that has not yet been incorporated into the canonical clock network, has several prominent interactions in the recovered networks. Nearly all algorithms identified ELF4 as a target of LHY/CCA1, consistent with previous modelling using ELF4 alongside the clock genes [13]. Although a few inferred networks had ELF4 feeding back into the clock, several methods, including non-stationary DBNs [44], suggested it was a terminal node in the network. TSNI predicts that ELF4 modulates TOC1, with both TSNI and G1DBN predicting an interaction between ELF4 and PRR7/9, while CSI predicts ELF4 regulation of GI.
5. Discussion and outlook
A variety of approaches exist for inferring the structure and dynamics of GRNs from time-series observations. While the relative merits of these competing approaches have previously been discussed [11–16], the pool of available algorithms is constantly growing. With the increasing prevalence of large computer clusters, this has led to the emergence of novel approaches to inferring GRNs, including nonparametric methods based upon NDSs [15,35]. Here, we have revisited earlier work evaluating the performance of available algorithms, using more recent approaches. In general, the nonparametric approaches outperform other methods, demonstrating the advantages that can be achieved when combining an NDS formalism with Bayesian learning strategies.
For smaller networks, including the 10-gene in silico networks of the DREAM4 challenge [16,41,42] and the 5-gene synthetic yeast network of [12], the best-performing methods for inferring network structure from time-series data were split between the DBN methods and the nonparametric NDS approaches, with VBSSM, G1DBN and CSI^{d} each performing best in terms of AUROC/AUPR on at least one dataset. Networks recovered using other methods, including simple ODE models (TSNI) and networks based upon Granger causality (GCCA), were nearly always found to score higher than randomly generated networks. Crucially, the best networks recovered using (in silico and synthetic) time-series data were, in some cases, superior to those recovered following systematic knockout. Furthermore, in the in silico systems, recovery of networks from time course data nearly always outperformed recovery following systematic knockdowns. This confirms earlier work by Bansal et al. [11] suggesting that inferring network structure from time-series observations can be of practical use, and further suggests that for small systems, i.e. when looking to disentangle the relationship between relatively few genes, a few time series collected under different perturbations can be more powerful than single measurements comprising complete, systematic interventions.
The ability of these various approaches to recover network structure has also been demonstrated on the well-characterized Arabidopsis clock network using replicated time course microarrays. Again, as with the small in silico and synthetic networks, the DBNs and nonparametric approaches appeared to accurately recover known links, including the feedback loop between morning and evening elements of the clock, as well as partially capturing loops between LHY/CCA1 and PRR7/9. Additionally, most methods appeared to suggest that ELF4 was, in some way, regulated by LHY/CCA1, consistent with previous models using the same set of genes [13].
In larger systems, i.e. the 100-gene in silico networks of DREAM4, the nonparametric NDS approaches were found to outperform all other methods based upon time-series data, with AUROC/AUPR scores in keeping with previously published results using parametrized ODE systems [10]. While DBNs did not perform as well as these nonparametric models, they did outperform methods based upon Granger causality, consistent with observations in Zou et al. [13], and were more accurate at recovering network structure than TSNI. In contrast to the 10-gene networks, recovery of networks following systematic knockout in the 100-gene systems consistently outperformed reconstruction based upon time-series data. Indeed, recovery based solely upon knockout data was the top scorer in the DREAM4 in silico network challenge [10,16]. This is perhaps not surprising considering that the time course datasets contained only 10 perturbations, while recovery from knockouts relied upon 100 separate interventions. By comparison, the 10-gene systems contained five time series, each under a separate perturbation, versus 10 interventions in the knockout datasets. While previous studies [10], along with work presented here, have demonstrated that recovery of networks based upon systematic interventions and upon time-series observations can be combined in ways that are beneficial to the accuracy of the recovered network, this approach appears to be infeasible in practice for larger networks. In particular, an Affymetrix microarray time series consisting of 25 time points with three replicates currently costs in the region of £30 000. Assuming an identical number of replicates, a similar figure would allow single time point microarray analysis of just 25 knockouts, requiring extensive knowledge of the genes most likely to be involved in the network prior to inference. Furthermore, the generation and validation of knockouts requires additional time and resources.
The combined use of timeseries and genetic datasets should, however, prove useful in more accurately identifying smaller networks.
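The rough arithmetic behind the cost comparison above can be made explicit (the per-array cost is inferred from the quoted total, not a stated price):

```python
# Assumed figures, taken from the comparison in the text.
budget = 30_000                      # approximate cost (GBP) of one time series
time_points, replicates = 25, 3
arrays_in_series = time_points * replicates     # arrays bought for the series
cost_per_array = budget / arrays_in_series      # implied cost per microarray

# The same budget, at the same replication level, covers single
# time point analysis of only this many knockout lines:
knockouts_affordable = budget / (cost_per_array * replicates)
```

At these figures, the implied cost is roughly £400 per array, so a budget that buys a 25-point, three-replicate time series buys single-time-point profiles of only 25 knockouts, before any cost of generating and validating the knockout lines themselves.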
Interestingly, and despite the obvious similarity of the two approaches, the EM algorithm outlined by Klemm [35], which employs just one set of hyperparameters per gene, performs better than the approach of Äijö & Lähdesmäki [15], which employs one set of hyperparameters per parental set. This appears to be true regardless of the size of the network, suggesting that the increased number of hyperparameters in the latter approach might negatively impact upon the accuracy of the algorithm. One possible explanation for this observation is that the increased number of hyperparameters in Äijö & Lähdesmäki [15] somehow leads to overfitting in the Bayesian identification of parental sets; indeed, further studies in Klemm [46], using a framework identical to that of Äijö & Lähdesmäki [15], suggest that the method incorrectly favours parental sets with larger cardinality. The approach of Klemm [35] additionally appears to perform better when assuming discrete observations, rather than using a separate Gaussian process to estimate the time derivative. A possible explanation for this is that using an independent Gaussian process to first estimate the time derivative may lead to overfitting, smoothing out genuine biological signals and in some cases leading to an inaccurate reconstruction of the phase-space.
5.1.1. Outlook
The studies in this paper, using a variety of network inference techniques with the DREAM4 100-gene datasets, suggest that time-series data collected under 10 perturbations allow for relatively accurate reconstruction of networks in large systems, with nonparametric approximations of ODEs providing the best overall performance. The relatively low cost of microarray experiments means that highly resolved, highly replicated time course datasets are increasingly being generated under a variety of experimental perturbations [6], making network inference algorithms of great use in systems approaches to biology. The usefulness of such time-series data nonetheless depends upon both the quality of the data and precisely how dynamic the time-series profiles are. In particular, time-series profiles that spend significant time in stationary phases are evidently going to be less informative than profiles that change over the full range of the experimental time course.
Over the next few years, it seems likely that such microarray datasets will be increasingly augmented with more sophisticated experimental datasets, including protein and metabolite time series, protein–protein interactions via yeast two-hybrid, and information about protein–DNA interactions derived from ChIP-chip/seq or one-hybrid systems, as well as targeted knockouts and knockdowns. Indeed, several current consortia, including the encyclopedia of DNA elements^{2} (ENCODE), the immunological genome project^{3} (ImmGen) and plant response to environmental stress in Arabidopsis thaliana^{4} (PRESTA), will provide multiple datasets using a variety of experimental techniques. To take advantage of these increasingly rich datasets, network inference algorithms must be able to learn (in a principled way) from large varieties of diverse data. Approaches modelling such diverse datasets in an integrated fashion are already underway, including work by Savage et al. [47], which combines gene expression data with TF binding information (ChIP-chip) to identify transcriptional modules in a Bayesian framework. Likewise, previous studies by Greenfield et al. [10], as well as work in this paper, have demonstrated that, in some cases, knockout data can be advantageously combined with time-series measurements of gene expression by calculating an average ranking of connections over multiple network predictions. The nonparametric learning strategies presented in this paper represent ideal candidates for addressing the future challenges of combining datasets. Such algorithms readily incorporate underlying assumptions about GRNs by adopting an NDS formalism, benefit from the flexibility of the Gaussian process framework, and can readily deal with multiple time course perturbations, with the Bayesian treatment of network structure inference allowing for the leveraging of prior information.
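The averaged-ranking combination mentioned above can be sketched as follows (an illustrative implementation, assuming each method outputs a confidence score per candidate edge; edges missing from a prediction are ranked last):

```python
def combine_by_average_rank(predictions):
    """Combine several network predictions, each a dict mapping a
    candidate edge (regulator, target) to a confidence score, by
    ranking edges within each prediction (1 = most confident) and
    sorting edges by their average rank across predictions."""
    edges = sorted(set().union(*predictions))   # sorted for determinism
    total_rank = dict.fromkeys(edges, 0.0)
    for pred in predictions:
        ranked = sorted(edges, key=lambda e: -pred.get(e, float('-inf')))
        for rank, e in enumerate(ranked, start=1):
            total_rank[e] += rank
    n = len(predictions)
    return sorted(edges, key=lambda e: total_rank[e] / n)
```

Because only ranks are combined, the scheme is insensitive to the differently scaled (and often incomparable) confidence scores produced by knockout-based and time-series-based methods.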
These nonparametric approaches do, however, suffer from scaling issues. In particular, the reliance upon Gaussian processes to recover the unknown functions of the NDS requires calculations that scale with n, the number of observations, as O(n^{3}), rendering the method impractical in cases where very many time courses are generated. The explicit enumeration over all possible parental sets of limited in-degree represents a deeper problem, however, as even when limiting the maximum cardinality of the parental sets, their number scales polynomially with the number of prospective TFs. This makes computation in the algorithms of O(N^{p+1}/(p − 1)!) complexity, where N is the number of genes and p the maximum number of parents per gene. For the 100-gene system containing 210 observations per gene, the computational time for most methods tended to lie in the range of minutes to hours. For the nonparametric models, however, computational time could take upwards of 48 h per gene. While the approach is embarrassingly parallel, allowing fast calculations provided the user has access to large clusters, it is currently able to deal with networks containing hundreds, rather than thousands, of TFs. The increasing use of ChIP-chip/seq and one-hybrid systems might, however, play a role here, by increasingly allowing hard constraints to be placed over which TFs can bind particular genes. In such cases, nonparametric methods could allow for accurate reconstruction of very large networks.
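The combinatorial bottleneck can be made concrete by counting candidate parental sets under the stated in-degree limit; a small sketch:

```python
from math import comb

def parental_sets_per_gene(N, p):
    """Number of candidate parental sets of cardinality at most p for
    one gene, drawn from the other N - 1 genes (no self-regulation).
    The dominant term, comb(N - 1, p), grows polynomially in N."""
    return sum(comb(N - 1, k) for k in range(p + 1))
```

For the 100-gene DREAM4 networks with p = 2, this already gives 4951 candidate sets per gene, i.e. roughly half a million Gaussian process model evaluations across the network, each carrying the O(n^{3}) cost in the number of observations; raising p to 3 multiplies the per-gene count by a further factor of roughly N/3.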
Acknowledgements
We thank Katherine Denby for the high-resolution Arabidopsis thaliana microarray data, and Sandy Klemm, Claire Hill, Steven Kiddle, Dafyd Jenkins, Katherine Denby and David Rand for helpful comments and discussion. We also thank the anonymous reviewers for their valuable comments. We acknowledge support from grants BBSRC BB/F005806/1 (Plant Response to Environmental Stress in Arabidopsis; C.A.P. and D.L.W.) and EPSRC EP/G021163/1 (Mathematics of Complexity Science and Systems Biology; D.L.W.).
Footnotes
One contribution of 9 to a Theme Issue ‘Inference in complex systems’.

1 http://wiki.c2b2.columbia.edu/dream/index.php/The_DREAM_Project.

4 http://www2.warwick.ac.uk/fac/sci/lifesci/research/presta/project/.
 Received May 31, 2011.
 Accepted July 12, 2011.
 This journal is © 2011 The Royal Society