## Abstract

Allopatric speciation is a mechanism to evolve reproductive isolation; it is caused by the accumulation of genetic differences between populations while they are geographically isolated. Here, we studied a simple stochastic model for the time until speciation caused by geographical isolation in fragmented populations that experience recurrent but infrequent migration between subpopulations. We assumed that mating incompatibility is controlled by a number of loci that behave as neutral characters in the accumulation of novel mutations within each population. Genetic distance between populations was defined as the number of incompatibility-controlling loci that differ between them. Genetic distance increases through the separate accumulation of mutations in different populations, but decreases after a successful migration event followed by genetic mixing between migrants and residents. We calculated the time to allopatric speciation, which occurs when the genetic distance exceeds a specified threshold. If the number of invasive individuals relative to the resident population is not very large, diffusion approximation provides an accurate prediction. There is an intermediate optimal rate of migration that maximizes the rate of species creation by recurrent invasion and diversification. We also examined cases that involved more than two populations.

## 1. Introduction

Species diversity is determined by the balance between speciation and extinction. The mechanism of speciation has been a central topic of evolutionary sciences. However, questions that are rarely asked include how the speed at which novel species are created depends on the life history of the organism, geographical structure, habitat stability, mating system, species interactions, etc.

One of the simplest mechanisms that can lead to the creation of novel species is allopatric speciation [1,2]. If two or more populations of the same species are isolated geographically, they will accumulate different mutations and become quite different from each other after many generations. When individuals from these populations encounter each other, they may no longer be able to perform proper sexual reproduction if they have become too different. Then, the two populations can be considered different species. Allopatric speciation is regarded as a major process for species creation [3,4].

However, very few theoretical studies have examined the process of allopatric speciation because it tends to be regarded as ‘theoretically trivial’ [5,6]. As a consequence, almost all of the theoretical studies of speciation conducted within the last few decades have focused on sympatric speciation in which species diverge spontaneously as a result of adaptation to different niches, habitats or mate choice [7–15].

Nei *et al.* [16] studied a simple population genetics model for allopatric speciation. They assumed that the possibility of sexual reproduction for a pair of individuals was controlled by one or two loci. For example, suppose that there are three alleles (labelled 1, 2 and 3) in a single compatibility-controlling locus. All homozygotes (*A*_{1}*A*_{1}, *A*_{2}*A*_{2} and *A*_{3}*A*_{3}) are viable and have the same fitness. Heterozygotes *A*_{1}*A*_{2} and *A*_{2}*A*_{3} are also viable. But heterozygotes *A*_{1}*A*_{3} are not viable (they have zero fitness). Initially, the two populations are both composed of *A*_{2}*A*_{2} only. One population might experience mutation to *A*_{1}, and ultimately be replaced by *A*_{1}*A*_{1}, which takes place through neutral evolution because their heterozygotes *A*_{1}*A*_{2} have the same fitness as the homozygotes. Similarly, the other population might be replaced by *A*_{3}*A*_{3} by neutral evolution. After these changes, the two populations are no longer compatible with each other because any breeding between individuals from different populations ends in reproductive failure. Nei *et al.* [16] also studied a two-locus version of this model (cf. [17]). A simple two-locus model for incompatibility was proposed earlier by Dobzhansky [18] and Muller [19], where the original genotype is aabb and is replaced by AAbb in one population, which is possible because Aabb is viable. In the second population, the original population is dominated by aabb and is replaced by aaBB, where the evolution is also neutral as aaBb is viable. However, the two populations do not cross each other because offspring between AAbb and aaBB are AaBb, which is non-viable. Many theoretical works on allopatric speciation have studied various extensions of these simple models (cf. [20–23]).

Orr & Orr [24] studied the effect of population size on the waiting time until two populations would be considered different species, assuming the Dobzhansky model. They concluded that the waiting time to speciation caused by genetic drift was independent of the sizes of the populations. Gavrilets & Hastings [25] concluded that the founder effect promotes speciation, based on the Dobzhansky model. They measured the degree of reproductive isolation in terms of the proportion of non-viable hybrids when randomly chosen pairs from the two populations are crossed.

In these studies, it was assumed that no migration occurred between the populations when calculating the time to speciation. Gavrilets [26] extended the model to incorporate recurrent migration between two populations. He examined several models, including selection to the local environment for each population. If we focus on the case without selection, Gavrilets [26] assumed that the incompatibility was controlled by multiple loci. As a population is monomorphic with respect to these loci, the state of a population is characterized by the ‘distance’ *i* between two populations, which indicates the number of incompatibility-controlling loci that differ between them. When *K*+1 loci are different (*K* is a ‘threshold’ specifying the genetic architecture of reproductive isolation), the two populations become different species. Gavrilets calculated the waiting time assuming a birth-and-death process in which the distance *i* could increase by increments of one at a rate that was equal to the mutation rate. He also modelled the effect of migration between two populations as a decrease in distance *i* at a rate that was proportional to the product of the current distance *i* and the migration rate.

The time to speciation should increase as the migration between populations becomes larger. In the complete absence of migration, two populations will become different species in the shortest period of time. However, no additional species will be created by this mechanism. If instead there is infrequent but recurrent migration between islands, the populations on the islands will become different species during the intervals between migration events, and the next migration (colonization) by one of the species will cause both islands to contain the same species in addition to another species. This may be followed by genetic differentiation. Repeated events of migration and genetic differentiation that occur on different islands will increase the number of novel species recurrently. Endler [27] reviewed empirical studies and concluded that the most general mode of speciation is parapatric speciation, in which different populations are isolated but not perfectly. Avise [28] also concluded that the geographical structure of most species is consistent with a meta-population, being composed of many local populations that are connected by infrequent migration (see also [29] for mathematical modelling of speciation and diversification in a meta-population).

In the work reported in this paper, we examined a simple stochastic model that determined the time until speciation as a result of geographical isolation in fragmented populations that experience recurrent but infrequent migration among them. We assumed that mating incompatibility was controlled by a number of loci that behave as neutral characters in the accumulation of novel mutations within each population [30]. Genetic distance between populations was defined as the number of incompatibility-controlling loci that differ between two populations. Genetic distance increases through the accumulation of mutations, which occurs separately in different populations, but is reduced after a successful migration event followed by genetic mixing of migrants and residents (cf. [31]). We calculated the time to allopatric speciation, which occurred when the genetic distance exceeded a threshold [32]. If the number of immigrating individuals relative to the resident population is not very large, diffusion approximation provides an accurate prediction. There is an intermediate optimal rate of migration that maximizes the rate of species creation by recurrent invasion and diversification. We also examined cases that included more than two populations.

## 2. Dynamics of genetic distance between populations

Consider two populations of a sexual haploid species with non-overlapping generations. The populations live on two islands or island-like habitats. For simplicity, we assume that the two populations are of the same size, *N*. Successful migration of individuals between populations occurs very infrequently. Most migration attempts result in failure. However, when an attempt is successful, a certain number of individuals (instead of a single individual) from one population arrive in the other population.

We assume that the possibility of successful mating between immigrants and residents is controlled by differences in a set of *l* loci that control incompatibility. To be specific, the possibility of interbreeding between two individuals is determined by the number of loci that differ between them, denoted by *x*. Two individuals cannot reproduce sexually if *x* exceeds a threshold value *x _{c}*. By contrast, they can mate and produce offspring if

*x*is less than

*x*.

_{c}We assume that the size of each population *N* is clearly smaller than the inverse of the mutation rate *u* for these incompatibility-controlling loci (weak mutation limit). Under this assumption, each population is monomorphic most of the time with respect to these loci, except for relatively short periods of replacement following a successful migration event (e.g. [33–42]; see [43] for review). Then, we can define the genetic distance between the two populations in terms of the number of incompatibility-controlling loci that differ between them. If a population is polymorphic with respect to these loci, then we can use the average genetic distance between the two populations [44].

Between successive migration events, the genetic differences between the two populations increase owing to the accumulation of different mutations in different populations. Hence, their genetic distance, or the degree of genetic differentiation, increases with time. With population size *N*, the mutation rate per loci *u* and the neutrality of the mutations in the process of their accumulation, the rate of accumulation of novel mutations is *k* = *ul*. The time required for fixation is of the order of *N* generations [45], which is much shorter than the time scale on which the speciation process is occurring. As each population would accumulate novel mutations independently, the rate of divergence is 2*k*.

Note that the average genetic distance increases only if the replacement of an allele occurs at a locus for which the two populations have the same allele. If replacement occurs in one population at a locus for which the two populations have different fixed alleles, the change does not increase their genetic distance. Hence, the rate of increase in genetic distance owing to replacement by novel mutations should slow down as the distance between two populations increases. Considering this effect, the mean rate of divergence per generation is given by the following equation: 2.1between successive migration events.

The timing of successful migrations from one population to the other follows a Poisson point process at a rate *m* per generation. Here *m* is very small because the interval between successful invasions is very long.

From the assumption of incompatibility, if immigrants and residents have different alleles at *x* loci, and if *x* is greater than the threshold *x _{c}*, then the immigrants and residents do not mix sexually and they should be treated as different species.

By contrast, if *x* is smaller than the threshold *x _{c}*, then individuals from the two populations can freely exchange their genomes through sexual reproduction. According to the stochastic theory of population genetics for neutral loci, a population eventually becomes monomorphic where one allele dominates and the other goes extinct. The probability of fixation for two alleles is proportional to their initial frequencies [46,47]. If the population size of the immigrants is

*N’*and the population size of the residents is

*N*, then immediately after the invasion, there are

*x*loci that are polymorphic, where immigrants and residents have different alleles. After some number of generations, which is of the order of

*N*[45], the population again becomes monomorphic, and the expectation of loci that show the immigrant allele is equal to the fraction of immigrants . We here assume that the loci are unlinked for simplicity. If the loci behave independently, the number of loci that still differ after the fixation process follows a binomial distribution This quantity is equal to the decrease in the distance:

*x*

_{before}−

*x*

_{after}. Considering the situation in which the number of loci controlling the incompatibility is very large, the distance after mixing is well approximated by the mean value. Hence, we have the genetic distance after one successful migration event as 2.2Normally, is much smaller than 1.

The model that we explored in this paper was the stochastic dynamics of genetic distance *x*, which was given by equations (2.1) and (2.2). We focus on the situation in which the number of loci controlling the incompatibility is large (*l* and *x _{c}* are large). In addition, concerning the magnitude of rate parameters, we assume: . Successful migrations are less frequent than mutation accumulation. We here focus on the stochasticity caused by the random timing of successful migration events, while we treat other processes as deterministic: namely, the increase of the genetic distance between populations, equation (2.1), and the reduction of the distance after successful migration events, equation (2.2). We name this the ‘population-based model’. In the following, we study several different approximations of the population-based model. We also ran direct computer simulations of an individual-based model of population genetic dynamics, which is described in appendix A. As will be explained later, most of the conclusions of the simplified stochastic dynamics of genetic distance model and the individual-based model were very similar, although the latter model required considerably more computational time to run.

## 3. Evolutionary trajectories

Figure 1 provides an example of changes in the genetic distance between two populations with time. Among several curves in figure 1, the black curve, labelled population-based model, was generated by direct computer simulations of the stochastic model. Grey curves and the broken line are for the individual-based model and the deterministic model, which is explained later. The black curve indicates that genetic distance increases with time and the rate of increase slows down and eventually converges to a final level (cf. [48–51]) around which it continues to fluctuate. This fluctuation is caused by the stochasticity of the timing of successful migration events.

Two different threshold levels are indicated by horizontal lines. Open circles show the time at which genetic distance first reaches the threshold level. The two populations are no longer compatible and when migrants from one population arrive in the other population they cannot mate successfully. Hence, we regard the two populations as two separate species. We can consider that speciation occurs at *τ* when the genetic distance *x* reaches the threshold *x _{c}*, which is indicated by open circles in figure 1. Here, we focus on

*τ*, the time of this first passage to speciation.

After genetic distance exceeds the threshold, migrants from one population can immigrate to the other population and they may quickly evolve mating isolation mechanisms to avoid reproductive interference [52–54].

If the threshold to speciation *x _{c}* is below the equilibrium, the first passage time to speciation is rather short. If the threshold

*x*is above the equilibrium level, stochastic fluctuations above equilibrium in the genetic difference allow the threshold to be reached in finite time.

_{c}In the following section, we analyse the stochastic model of genetic distance by introducing approximations and compare the results with direct computer simulations and an individual-based model of the corresponding population genetic process.

### 3.1. Deterministic approximation

For the two islands population-based model, under approximations that the migration rate *m* is large and the impact of migration *ɛ* is small, we can approximate the decrease in genetic distance as a deterministic process. Then, the ordinary differential equation for genetic distance is given by
3.1where *t* is time measured in terms of generations. Here, we neglect terms of the order of *m*^{2} because *m* is assumed to be small.

In figure 1, the dashed curve indicates the change in genetic distance predicted by the deterministic model. It increases smoothly with time and converges to a final level. Given the initial condition *x*(0) = 0, the solution of equation (3.1) is
3.2At equilibrium, the mean genetic distance is
3.3which increases with the rate of mutation accumulation *k* but decreases with the migration rate *m* and the impact of migration events *ε*. The time until speciation is
3.4aand
3.4bEquation (3.4*b*) implies that speciation never occurs if the threshold genetic distance is greater than the saturation level in the deterministic model.

### 3.2. Diffusion approximation

Note that the stochastic model for genetic distance is a Markov process, which implies that once the value of *x* at time *t* is known, the behaviour of the model after *t* becomes independent of any additional information from the past trajectory. If, in addition, the trajectory is continuous with a probability of one, the model is equivalent to a diffusion process [55,56]. In the present model, there is a discontinuous decline in *x* after every successful migration event, as is shown by equation (2.2). Hence, the diffusion approximation is accurate if the magnitude of each drop is small—that is the impact of invasion *ɛ* is not large.

The change that occurs in a short time interval of length *Δ**t* is given as follows:
3.5

Then, the mean rate of increase in *x*, , and the rate of variance generation, , are given by
3.6aand
3.6b

The probability density function of genetic distance is defined by
3.7awhich satisfies the following forward equation (or Fokker–Planck equation):
3.7bThe stationary distribution at equilibrium can be derived from , which is
3.8where *C* is a normalization constant that makes hold. Here *a* is a constant given by

Figure 2 shows the distribution predicted by equation (3.8) and the histogram of genetic distance from the direct computer simulation using a population-based model. The two distributions have the same mean genetic distance but differ in variance. If the impact of migration event *ɛ* is not very large (say less than 0.1), the diffusion approximation is quite accurate. However, when *ɛ* is larger, the distribution from the diffusion approximation becomes deformed and has a fat tail to the right.

## 4. Average time to speciation

Speciation between two subpopulations occurs when the genetic distance between them reaches the threshold *x _{c}*. If the genetic distance

*x*is smaller than the threshold, immigrants can mate with residents without any disadvantage. On the other hand, if genetic distance

*x*exceeds the threshold, individuals from different subpopulations cannot mate. The speciation threshold

*x*may differ between species; moreover, here we assumed that all of the neutral loci contributed equally to reproductive isolation. The initial value of genetic distance is

_{c}*x*

_{0}and the probability that the genetic distance from

*x*

_{0}reaches the threshold

*x*within time

_{c}*t*is given by . The initial and boundary conditions are and (

*t*> 0), respectively. These conditions satisfy the backward equation 4.1Solving the above equation (see appendix B for the derivation), we finally obtain the average time to speciation

*T*, as follows: 4.2awhere 4.2band is an upper incomplete gamma function .

The deterministic approximation predicts that speciation will never occur if the equilibrium value *x** given by equation (3.3) is smaller than the threshold *x _{c}*. In reality, owing to stochastic fluctuations, the genetic distance becomes higher than

*x** by chance and might reach

*x*within a finite period of time. The diffusion approximation can take this into consideration. It predicts that the mean time to speciation is finite even when

_{c}*x*>

_{c}*x**.

Figure 3 shows the average time to speciation *T* from the initial genetic distance under four different thresholds. The four groups of curves represent the different threshold values *x _{c}*. Within each group, curves show the predictions from the diffusion approximation and from the deterministic approximation. Solid symbols indicate the results from direct computer simulations using a population-based model, where different symbols indicate the results for different threshold levels. When the speciation threshold was below the equilibrium level (

*x*<

_{c}*x**), all of the values were similar, which suggests that the deterministic model was simple but provided relatively accurate predictions. However, when the speciation threshold was above or equal to the equilibrium level (

*x*≥

_{c}*x**), the deterministic model predicted that speciation was impossible, as is shown by equation (3.4

*a*,

*b*), but the direct computer simulation (filled symbols) provided a threshold at a finite time, which was not very different from the prediction using the diffusion approximation (solid curve).

In contrast to these results, the individual-based model produced a shorter estimated time to speciation in all cases, as is indicated by the open symbols that are located below the others. Intuitively, the individual-based model shows greater stochastic fluctuations in genetic distance, which reduces the time to reach the threshold relative to the population-based model that showed weaker stochasticity.

Figure 4 shows the average time to speciation, where the horizontal axis is the threshold level for speciation *x _{c}*. The average waiting time increased with the threshold level, but the results of the computer simulation of the population-based model were close to the predictions of the diffusion approximation. The deterministic model was able to predict divergence time at equilibrium.

## 5. Migration between two populations affects their genetic distances with other populations

Migration between two populations can affect the genetic distance between one of them and another population. This effect is quite important when considering the effect of migration when there are more than two populations.

Suppose that there are three islands (*A*, *B* and *C*), all of which are connected with each other equally. There exist six directions of migration and three genetic distances *x _{ij}* provided that , and

*x*=

_{ij}*x*. For this model, the reduction in genetic distance because of migration from island A to B is described by the two following formulae: 5.1aand 5.1bEquation (5.1

_{ji}*a*) is the same as equation (2.2), which means that there is a direct effect of migration and it invariably reduces genetic distance. However, equation (5.1

*b*) is peculiar to situations when there are three or more islands. This equation represents indirect effects on genetic distance that are connected with the migrating population; the effect is caused by changes in the genetic composition of the migrating population. In this case, after migration, the population on the island

*B*now shares a fraction (

*ɛ*) of alleles with the residents of

*A*on average. Hence, the genetic distance caused by this event is calculated by comparing the relationship between

*B*and

*C*. This effect sometimes causes a reduction in genetic distance and at other times it causes an increase. This rule can be applied to situations with different numbers of islands and different structures among them because we can decompose all of the island structures into closed relationships among two or three (a triangle) islands.

### 5.1. Reduction in variance in genetic distance and increased waiting time

Here, we consider a situation in which there are more than two populations that are connected to each other by migration events. It is clear that the average waiting time is strongly dependent on the value of genetic distance and the threshold for speciation. Therefore, using a population-based model we simulated dynamics in genetic distance for different numbers and structures of subpopulations. We considered Wright's island model [57] in which there are *n* populations, each of which is connected by migration with all of the other populations. The graph of connections among the populations forms a ‘complete graph’ with equal migration rates and migration impacts. Figure 5*a* shows the distribution of genetic distance at equilibrium with the number of islands by sampling a random pair of populations and then calculating their genetic distances. As the number of islands increases, the average genetic distance between islands remains unchanged, but the variance between pairs of islands decreases. This result might be caused by cancellation between direct and indirect effects on genetic distance, which intensifies when the number of islands increases.

The variance in a stationary distribution of genetic distance is important when considering the average waiting time to speciation, when the threshold is higher than the equilibrium value. In such situations, we can intuitively expect a large variance to lead to a high potential for speciation. This pattern was shown in the distribution of average waiting times to first speciation with an increasing number of islands under the assumption that all of the islands had the same genetic distance at equilibrium (figure 5). Later we will discuss the relationship between waiting time to speciation and the number of islands in more detail. A stationary distribution of genetic distance that has low variance requires a longer period of time to undergo speciation when the threshold is larger than the equilibrium value. Note that the distribution of waiting times has a heavy tail towards longer times because some populations seem simply unable to reach the threshold through chance migration events. This pattern is not observed when the threshold is lower than the equilibrium value.

## 6. Discussion

In this paper, we present a simple model of allopatric speciation in which the possibility of sexual reproduction is controlled by the number of loci. Suppose that two individuals have different alleles at *x* loci that control incompatibility. They cannot engage in successful sexual reproduction if *x* exceeds a threshold value *x _{c}*, but they mate and breed without problems if

*x*is smaller than

*x*. Assuming that incompatibility-controlling loci are monomorphic within populations, we can define the genetic distance between populations. Genetic distance increases with time through the accumulation of mutations, but decreases when migration from one population to the other occurs. When successful migration events occur very infrequently but many individuals immigrate during each successful migration event, the stochasticity of the migration events causes genetic distance to fluctuate and the time until speciation can be obtained from stochastic process calculations.

_{c}### 6.1. Dynamics of genetic distance

The distance between two populations tends to increase rapidly when the distance is short, but the rate of increase tends to slow and eventually saturate. There are two reasons for this saturation. First, there is a maximum number of loci *l* and the speed of the increase in genetic distance between two populations slows down as the number of loci with alleles that are common to both populations becomes smaller. Second, successful migrations should cause a drop in genetic distance with a magnitude that is proportional to the genetic distance immediately before the migration event. The process of saturation can even be represented by the simplest deterministic model, equation (3.2). The equilibrium level *x** is given by equation (3.3).

If the threshold for speciation is below the equilibrium, speciation occurs rather quickly and the waiting time can be accurately predicted by the deterministic model, equation (3.4) (figure 3). By contrast, if the threshold for speciation is above the equilibrium level *x**, then the time to speciation is critically dependent on the magnitude of the fluctuations. In fact, in the absence of fluctuations, speciation will not occur. By contrast, in the stochastic model, which we termed a ‘population-based model’, owing to stochasticity in the timing of migration, genetic distance fluctuates after it reaches the equilibrium and it can cross the speciation threshold after a finite period of time. The mean time to speciation in this stochastic model can be accurately calculated using a diffusion approximation, as shown in figures 3 and 4.

In the model that considered population processes, which we termed an ‘individual-based model’, in figures 1, 3 and 4, genetic distance fluctuated with a greater variance than in the corresponding population-based model. This was caused by stochasticity in the mutation accumulation process and stochasticity in the fixation fraction after migration events, which were neglected in the simple population-based model.

As figures 3 and 4 show, the individual-based model resulted in shorter times to speciation than the population-based model with corresponding parameters.

We believe that the population-based model is worthy of study by itself. First, even if the individual-based model is a true one, the corresponding population-based model can be a reasonably accurate approximation when the number of loci controlling incompatibility is sufficiently large owing to the law of large numbers. Traditionally most theoretical models of speciation, such as Dobzhansky's and Nei *et al.*'s models, assumed that the incompatibility is controlled by a small number of loci (such as one or two); in this paper, we study the opposite case with a large number of loci controlling incompatibility. Second, a useful approach of analysing a given model's property is to first study a simplified version, and then to compare the original complex model with the simplified model. In a similar manner, we do believe that it is worth studying the population-based model and its property carefully, and then study the individual-based model which includes additional stochasticity by comparing their difference and similarity. Third, the population-based model represents an idea that different populations while separated accumulate the difference with time at a constant rate. Accumulation of neutral mutations is one of the candidate processes causing this increase in the distance, but it is also possible that different populations may change their genetic structure by experiencing different environment, which is not covered by the individual-based model.

### 6.2. Three or more populations

We also studied a case in which more than two populations were connected by migration. Figure 5*a* shows the stationary distribution of genetic distance *x*. We can see that the mean distance remains unchanged as the number of populations increases. However, the variance of genetic distance becomes smaller. If we think of a pair of populations, each population is connected with many others through migration, and migration events that occur through these additional bonds would have impacts that could increase or decrease the genetic distance between the focal pair of populations. Figure 5*b* shows the earliest time for one of the bonds to reach the threshold *x _{c}*. As there are multiple bonds, the graph actually includes

*n*(

*n*− 1)/2 bonds in total and when one of these many bonds reaches the threshold

*x*, it is counted as the event of interest. If the genetic distances for the different bonds are independent of each other and they have the same probability distribution, then the average time to the first event in any one bond that exceeds the threshold should become shorter as the number of populations increases. In fact, the average time increases over time. If we focus on any one of the bonds and estimate the time until it reaches the threshold, the estimate should be much longer as the number of populations increases. This is owing to the smaller variance in genetic distance, as indicated in figure 5

_{c}*a*.

Figure 5*c,d* shows the waiting time until *n* islands are split into two groups between which no bond has a genetic distance that is shorter than the threshold. The waiting time clearly increases with the number of populations.

From these simulations, increases in the number of populations would make it more difficult to complete speciation. This suggests a relationship between biodiversity and the spatial structure of the habitats, which may give testable hypotheses.

### 6.3. Optimal migration rate for species creation

Migration events reduced the genetic distance between populations, and hence the time to speciation increased with the migration rate *m*. By contrast, after genetic distance reaches the threshold, we must wait for the next successful colonization event to observe the creation of a novel species. The waiting time to the next colonization event is longer when the migration rate is smaller. We may ask, ‘What is the possible rate of species creation in multiple populations between which there are recurrent migration events and where genetic divergence between two populations occurs through the accumulation of mutations in incompatibility loci?’ We suspect that there is an intermediate optimal migration rate between two islands that achieves the maximum rate of species creation.

In figure 6, the dashed line shows the mean time interval for migration events between two islands and the dashed-and-dotted line indicates the waiting time to first speciation starting from two populations that have a genetic distance of zero. The horizontal axis is the migration rate *m*. The bars indicate the number of species that are created between two islands after 10 000 generations have passed (see appendix C for details of the simulation). The initial condition for the two islands is that there is only one species that is common to both immediately after geographical isolation. We can clearly see that the rate of creation of novel species by this mechanism occurs most frequently around an intermediate rate of migration *m.* If the migration rate is too low, the waiting time until the next migration event that results in the two islands having populations of the same species is longer. By contrast, if the migration rate is too fast, genetic divergence slows down and it takes longer for the genetic distance to reach the speciation threshold. The optimal rate of migration that achieves the maximum chance of species creation was close to the rate that was predicted by the shortest sum of waiting time to migration and speciation.

Incompatibility may be achieved by postzygotic mechanisms (e.g. lower viability of hybrids), by prezygotic mechanisms (e.g. mate choice) or both [58]. Most models of allopatric speciation that have calculated waiting time to speciation can be interpreted in any of these situations, although they explain the results in terms of the lower viability of hybrids [16]. Our discussion of the rate of species creation (e.g. figure 6) is valid for prezygotic mechanisms. However, if a prezygotic mechanism is not effective and a postzygotic mechanism is important, then a newly arrived population would suffer severe reproductive interference by the resident species, which could result in the extinction of the invader. Yamaguchi & Iwasa [54] analysed quick evolution in prezygotic isolation (the evolution of female mate choice to avoid mating with males from different species) and discussed the possibility of extinction caused by reproductive interference. This idea is supported by the observation that the degree of prezygotic isolation is stronger between sympatric species than between allopatric species [59,60].

In this paper, we considered a case in which incompatibility was controlled by multiple loci and could be treated as a quantitative trait. By contrast, most theoretical studies of allopatric speciation have assumed a small number of incompatibility loci [16,18,24,25], except for Gavrilets [26]. Coyne & Orr [61] reviewed studies of the number of genes that controls incompatibility (or isolation mechanism) and concluded that the number of incompatibility-controlling loci ranges from few to many.

Although most theoretical studies of allopatric speciation have assumed that no migration occurs between populations, Gavrilets [26] studied a case that involved some migration as parapatric speciation without local adaptation. He assumed that migration caused a stochastic decrease in the genetic distance between populations. As he adopted a birth-and-death process, genetic distance could decrease in only single steps. However, migrants from one population and residents in another should differ in a number of loci, and it is possible for multiple loci to fix in a population, which would result in a population with substitutions at multiple loci.

### 6.4. Future work

As shown in figure 1, the population-based model had a smaller variance than the individual-based model. There are two reasons for this difference. First, the mutation accumulation process was treated as a deterministic increase in genetic distance. However, the time points for replacements should occur as a Poisson point process; therefore there is variance in terms of the number of replacements that occurred within a fixed length of time. Second, after a successful migration event that is followed by genetic mixing between migrants and residents, there is stochasticity concerning which of the two alleles will become fixed. If different segregating loci evolve completely independently, the number of different loci after migration *x*_{after} is a stochastic variable that follows a binomial distribution with mean (1 − *ɛ*)*x*_{before} and variance (1 − *ɛ*)*ɛx*_{before}. The maximum recombination rate per generation is one-half, and there might be some correlation between different loci concerning which one will eventually be fixed. This could further increase the variance after migration events. This represents an interesting future theoretical problem.

## Funding statement

This work was performed under the support of the Environment Research and Technology Development Fund (S9) of the Ministry of the Environment, Japan; and also supported by a Grant-in-Aid for Scientific Research to Y.I.

## Appendix A

Consider two populations, each of which is composed of *N* individuals of a species that is sexual and haploid. Each individual has *l* loci that are not linked (i.e. located on different chromosomes). During sexual reproduction, two individuals are paired and each offspring has a genome that includes one copy of an allele from each of the two parents at the corresponding locus. During sexual reproduction, there is also a chance of mutation. The mutation rate per locus per generation is *u*. When a mutation occurs in a locus, it is a novel mutant that is different from any other alleles that have been created before.

In the initial population, all of the individuals in the two subpopulations have the same allele for all loci. The total mutation rate per individual per generation is *k* = *lu*, and the total number of mutants in a population per generation is *Nlu*. As mutants are neutral within a population, the fixation probability is 1/*N*.

We define *x ^{αβ}*, the genetic distance between individuals

*α*and

*β*, as the number of loci that have different alleles between these two haploid individuals. It should satisfy .

If each population is close to monomorphic, we can define the genetic distance between these populations in terms of the average value of *x ^{αβ}*, where

*α*and

*β*are randomly chosen from each population. The average value of

*x*when

^{αβ}*α*and

*β*are chosen from the same population is the within-population distance, which indicates the degree of polymorphism.

During a migration event, *N’* individuals from population *i* successfully arrive in population *j*. These immigrants are randomly selected from population *i* and participate in mating in population *j*. The selected immigrants are removed from population *i*. We adopted a Wright–Fisher model [57,62] for random sampling in each generation. The number of offspring was kept at a constant *N* during every generation.

## Appendix B

We define the average waiting time to speciation:
B1Then, from equation (4.1) with the initial condition *u*(*x*, 0) = 0 (0 ≤ *x* < *x _{c}*) and boundary condition

*u*(

*x*,

_{c}*t*) = 1 (

*t*> 0), we can derive the following ordinary differential equation: B2Here, we use the replacement for equation (B 2) and the equation becomes B3In addition, if we write and multiply both sides by , we obtain B4The solution with , which comes from the assumption that the average waiting time to speciation

*T*does not change with slight increases in initial genetic distance from 0, is B5Next, we substitute and into equation (B 5). In addition, let us replace the lower limit of the outer integration with 0 =

*δ*to allow us to obtain B6

Using the variable transformation , the domain of integration is changed. Hence, equation (B 6) becomes B7

Taking the limit when *δ →* 0, equation (B 7) becomes equation (5.1) in the text.

## Appendix C

Consider two populations of a haploid and sexual species with non-overlapping generations. These populations live on two islands or island-like habitats. Initially, both islands include the same species and the genetic distance between them is *x*_{0} = 0. The genetic distance increases under the rules of equations (2.1) and (2.2). After some number of generations passes, the genetic distance exceeds the threshold (*x* > *x _{c}*). Then, we treat the two populations as different species. Subsequently, migrants from a population on one island successfully arrive on the other island. As the latter island does not contain the same species as the migrant, the migrants do not mix with the residents. The migrants become established as a new population. Then, we redefine the genetic distance between the two islands as the difference between the new population on the second island and the original population on the first island and the same process proceeds as before. We assume that there is no extinction because after the genetic distance exceeds the threshold, immigrants that move from one island to the other quickly evolve mating isolation mechanisms to avoid reproductive interference [54]. Hence, we can count the number of species that are created by this mechanism whenever the distance between two populations reaches the threshold.

## Footnotes

One contribution of 11 to a Theme Issue ‘Modelling biological evolution: recent progress, current challenges and future direction’.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.