Abstract
The paper reviews two computing models by DNA selfassembly whose proof of principal have recently been experimentally confirmed. The first model incorporates DNA nanodevices and triple crossover DNA molecules to algorithmically arrange nonDNA species. This is achieved by simulating a finitestate automaton with output where golden nanoparticles are assembled to readout the result. In the second model, a complex DNA molecule representing a graph emerges as a solution of a computational problem. This supports the idea that in molecular selfassembly computing, it may be necessary to develop the notion of shape processing besides the classical approach through symbol processing.
1. Algorithmic selfassembly
In the last few decades, research performed in biology, chemistry and physics has resulted in an explosion of new findings about molecular interactions. These findings often reveal transfer of information at a molecular level resulting in proliferation of the science of computing within established, on a first glance unrelated, scientific fields. The notion of ‘computing’—up until recently a theoretical concept—acquires a new meaning within these intrinsically experimental disciplines. The perception of ‘computation’ has evolved to mean an environment modification through repetitive applications of a finite set of simple local rules. At a molecular scale, these local interactions are often covalent, hydrogen or ionic bonds. Computing models based on molecular interactions have been developed theoretically since Head's [1] introduction of splicing systems in the mid 1980s, and experimentally since Adleman's [2] first experiment in 1994 solving a computational problem solely by DNA molecules. Today, we have annual scientific conferences on biomolecular computing [3] and unconventional computing [4], where many new models are being introduced and developed.
Molecular selfassembly appears as the central principle in many models of biomolecular computing. Selfassembly is currently an imprecisely defined notion understood to be a process in which individual elements selforganize in a larger more complex structure. This selforganization ranges from cosmic levels (such as galactic formations) to molecular and atomic levels (such as crystal formations or protein folding). Much of designed biomolecular selfassembly is based on DNA selfassembly, although other examples of selfassembly—such as those involving proteins—exist in nature. The inherently informational character and its predictable doublehelical geometry makes DNA an attractive molecule in applications that entail targeted assembly. DNAbased bottomup assembly, pioneered roughly 30 years ago [5], has been explored by at least 60 laboratories worldwide in recent years. Within the past 15 years, much of these explorations have been in search of understanding molecular information processing through algorithmic, programmable selfassembled structures.
Synthetic DNA molecules have been designed and shown to assemble into branched species [6,7], and more complex, structurally robust, species that entail the lateral fusion of DNA double helices [8], such as DNA double crossover (DX) molecules [9], triple crossover (TX) molecules [10] or paranemic crossover (PX) molecules. DX and TX molecules have been used as tiles and building blocks for large nanoscale arrays [11,12]. Winfree [11] introduced the tile assembly model and showed that twodimensional selfassembled arrays made of DX or TX DNA tiles can simulate the dynamics of a bounded onedimensional cellular automaton and so are capable of potentially performing computations as a universal Turing machine. Several successful experiments have confirmed computation by arraylike DNA selfassembly, such as the binary addition (simulation of XOR) using TX molecules (tiles) [13], Sierpinski triangle assembly [14,15] and a binary counter [16] by DX molecules.
In addition to the twodimensional arrays, threedimensional structures such as a cube [17], a truncated octahedron [18], tetrahedron [19] and arbitrary graphs [20–22] have been constructed from DNA duplex and junction molecules. In addition to structural targets, a variety of topological figures have been constructed using the methods of structural DNA nanotechnology [23]. The construction of approximately 100 × 100 nm DNA nanostructures was significantly facilitated by Rothemund's [24] DNA origami where a standard singlestranded vector plasmid is used to outline a shape, whereas short DNA strands connect portions of the plasmid, fixing its shape in a rigid form. Since its appearance, DNA origami has been used as a seed for arranging DX tiles in an array [16], nanotube constructions [25] and most recently as templates for large DNA based tiles arranged in a twodimensional crystalline array [26].
‘DNA fuel’ strands introduced by Yurke [27] provided devices whose activity is controlled isothermally by DNA strands and hence were sequencedependent [27,28]. Also based on strand displacement, structures that can perform simple ‘walking’ on an arranged platform have been reported [29–32]. The strand displacement techniques were recently used for a large number of enzymefree logic gates [33,34] and also for simulating a neural network type of computation [35].
This review describes two realizations of DNA selfassembly computation. Both are recently reported experiments [22,36] depicting two distinct approaches. Section 2 summarizes the results reported by Chakraborty et al. [36] on molecular simulation of a finitestate automaton that divides by 3. This molecular computing model incorporated strand displacement devices to algorithmically arrange TX molecules, and moreover, the result of the computation was an algorithmic arrangement of nonDNA (gold) nanoparticles. Although arrangements of gold nanoparticles in DNAscaffolded periodic arrays have been reported earlier [37–39], this is, to our knowledge, the first algorithmic arrangement of nonDNA particles as a result of computation by selfassembly.
Section 3 describes an experiment where an assembly of a complex DNA nanoobject representing a graph structure embedded in space represents a solution of a nontrivial graph theoretic problem [22]. The threecolourability of a graph with six vertices was demonstrated by assembling a DNA molecule with connectivity of the graph structure itself. Moreover, the general method used in the construction distinguishes by: (i) requiring only four laboratory steps for a solution of an NPcomplete problem regardless of the size of the problem, and (ii) a graphlike structure survives an enzymatic system if and only if the graph is threecolourable. These results imply that natural processes and environment in which threedimensional molecular structures with various functionality can emerge perform ‘structural’ computation. Hence, information processing and computation through molecular selfassembly is a venue where ‘shape processing’ is a viable alternative to classical ‘symbolbased’ computing.
2. DNA selfassembly of an automaton
This section summarizes the experimental results given by Chakraborty et al. [36]. A more general discussion about transducergenerated arrays and potential arrangements of robotic arms within an array capable of simultaneous movements is described in Dolzhenko et al. [40].
2.1. Formal description
A transducer, also known as a finitestate automaton with output, is a formal system consisting of an input/output alphabet, a transition function, one designated initial state and a finite set of states, some of which are designated terminal. More formally, a finitestate automaton with output or a transducer is a fivetuple where Σ is a finite alphabet, Q is a finite set of states, q_{0} is an initial state (), F() is a set of final (terminal) states and is the set of transitions. From a given state and an input symbol, the transition function determines the next state, and an output symbol.
We associate a directed labelled graph to a transducer: the set of vertices is the set of states Q and the directed edges are transitions in δ, such that an edge starts at q and terminates at q′, has an input label a and an output label a′. The input and the output labels of the edges are naturally extended to input and output labels of paths in the transducer.
We say that a word w is accepted by a transducer if there is a path with input label w which starts at q_{0}, and terminates with a state in F. The path p in this case is called an accepting path for w. The output label of p is said to be an output of τ.
Transducers can be simulated by assembly of Wang tiles [41,42]. A finite set of distinct unit squares with coloured edges is called a set of Wang prototiles. Each prototile has an arbitrarily large number of copies called tiles. A tile ξ with left edge coloured l, bottom edge coloured b, top edge coloured t and right edge coloured r is denoted with . No rotation or reflexion of the tiles is allowed. Two tiles and can be placed next to each other, ξ to the left of ξ′ if and only if , and ξ′ on top of ξ if and only if .
To each transducer τ we associate a collection T_{τ} of Wang tiles consisting of input, output and computational tiles. Given a transducer τ, to each transition of the form we associate a prototile in as presented in figure 1. For each symbol a ∈ Σ, the collection T_{τ} contains an input prototile and an output prototile . The colours , called borders, represent one of the left, bottom, right and top border, respectively, and are connect colours used to connect input and output tiles. Additional auxiliary tiles indicating start of the input, end of the input, start of computation and end of computation are also included in but are not an essential part of the computation.
With these sets of input and output prototiles, every computation of τ can be modelled as a tiled rectangle surrounded by boundary colours. In particular, one run of the transducer τ reading an input of length n and producing an output is obtained with a tiled n × 3 rectangle (n > 2) such that the sides of the rectangle are coloured with boundary colours. As an example, consider the transducer τ presented in figure 2a. It has initial and terminal state s_{0} with input/output alphabet {0,1}. It accepts the set of binary strings that represent numbers divisible by 3. The states s_{0}, s_{1}, s_{2} represent the possible remainder (0, 1, 2, respectively) of the division of the input string with 3. The output is the result of the division of the binary string with 3. On input 10101 (21 in decimal), the transducer gives the output 00111 (7 in decimal). The 7 × 3 rectangle of assembled Wang tiles simulating this division is illustrated in figure 2b. The bottom row of tiles consists of the input tiles, the middle row corresponds to the computation by the transducer and the top row consists of the output tiles. It is not difficult to see that the tiles can be modified to simulate composition and iteration of transducers which is known to have computational power of the universal Turing machine [42].
2.2. Molecular implementation
The input of the transducer is established by a sequence of robust twostates [28] DNA devices attached to a six domainflat (6DF) motif tiles encoding input symbols at their ends. The sequencedependent twostate nanomechanical device called PXJX_{2} whose machine cycle is shown in figure 3a, is guided by the addition of set strands to the solution that forms its environment. The two states of the device differ by rotation of a halfturn established by the green and magenta set strands. The set strands have short, unpaired extensions on them such that the state of the device is changed by binding the full biotintailed complements of green or magenta strands, removing them from the solution by magnetic streptavidin beads, and then adding the other strands. The sequencedriven nature of the device means that many different devices can be constructed, each of which is individually addressable [43,44]. Figure 3 shows a PXJX_{2} device connected to two 6DF motifs. The opposite ends of the 6DF motifs contain sticky ends encoding symbols 0 and 1, respectively. The rotational character of the device allows all possible pairs of bits: 00, 01, 10, 11 to be used as input.
The computational Wang tiles representing transitions of the transducer are simulated with TX DNA molecules. It has been shown that such TX molecules can selfassemble in an array [10], and they have been linearly assembled such that a cumulative XOR operation has been executed [13].
Each prototile may appear in large number of molecular copies as separately annealed DNA TX molecules. No rotation of the tiles is assured by the coding of the sticky ends. The tile representing one transition of the transducer is schematically shown in figure 4a, whereas an example of a TX DNA molecule is presented in figure 4b with 3′ends indicated with arrowheads. The ‘connector’ is a sticky part that does not contain any information but is necessary for the correct assembly of the tiles. These TX tiles have the middle duplex longer than the other two to fit on top of the input (figure 5).
A chelator DNA DX motif to which a gold nanoparticle has been attached is used as an output tile. The chelator tile fits on the top of the TX computational tile; there are two different chelator tiles corresponding to 0 and 1 output domains. The chelator that binds to 0 contains a 5 nm gold nanoparticle, whereas the chelator that binds to 1 contains a 10 nm gold nanoparticle. Thus, the output can be read in a transmission electron microscopy [36]. This layer of chelator tiles corresponds to the top row of the 7 × 3 assembled rectangle in figure 2b.
A schematic of the computational setup corresponding to the Wang tiles in figure 2b is illustrated schematically in figure 5. In figure 5, a base row of PXJX_{2} devices setting up the input of 21 binary is depicted (as in the example shown in figure 2); the middle row of computational tiles is represented by DNA TX molecules encoding the transducer transitions; and the top row of Wang tiles containing the output of the computation is obtained by the sequence of chelator DNA DX motifs with attached gold nanoparticles. Different input sequences are obtained by setting the PXJX_{2} devices in different states. Iteration and composition of transducers can be obtained if instead of the chelator tiles, we apply another layer of TX computational tiles which computes on the obtained output as a new input. The gaps between the top duplexes of two consecutive TX tiles fit precisely the bottom duplex of any TX tile. After several layers of TX tiles are applied, the chelator DNA DX tiles with the attached gold nanoparticles can be added.
3. Computing by selfassembling shapes
This section summarizes results of Wu et al. [22]. As mentioned in §1, the review reports assembly of a complex DNA molecule whose structure encodes a solution to a computational problem. The problem addressed by this experiment is a wellknown NPcomplete problem, the threecolourability of a graph. A DNA graph structure is selfassembled by junction building blocks if and only if the given instance of the problem has a solution.
Let G be a graph with vertices V and edges E. The graph G is threecolourable if there is a function such that whenever two vertices v,w ∈ V are adjacent (connected by an edge), . The function f is a colouring of the vertices of G by three colours. So, we say that G is threecolourable if it is possible to colour each vertex such that no two adjacent vertices are coloured with the same colour. The ncolourability is defined similarly.
Figure 6a shows an example of a graph with a possible threecolouring. The illustrated solution was obtained experimentally by construction of a DNA molecule whose helical axes conform to the shape of the graph.
A general procedure for solving this and other graphtheoretic problems is described in Jonoska et al. [20], and the theoretical computing model based on this method with introduction of certain complexity classes is described in Jonoska & McColm [45]. In this approach, each vertex of degree k in a given graph G is represented with a karmed branched junction molecule. In the case of k = 1, the vertex is represented by a DNA hairpin. The 5′ends of each arm of every karmed junction molecule end with singlestranded extensions. These extensions consist of three parts (labelled a, b, c) and are used to uniquely identify each edge that connects two adjacent vertices. The first and the third part, parts a and c are specific encodings for the edge that is represented by the given arm of the molecule, whereas the middle portion, the part b, encodes the colour of the vertex. The middle part of the encoding is the same for all arms of the vertex molecule fixing the colour of the vertex. For each vertex, three types of molecules are needed, each corresponding to one of the three possible colours of the vertex. They are identical except for the middle portion of the singlestranded extensions of the arms identifying the colour of the vertex.
For the example presented in figure 6a, one two, four three and one fourarmed branched junction molecule are needed to identify the graph (figure 6b).
If vertices v and w are connected by an edge e, the a and c DNA sequence of the arm of the vbuilding block representing e is complementary to the corresponding a and c DNA sequence of the wbuilding block arm representing e. This is illustrated in the arms of the three and fourarm molecules corresponding to the edge connecting vertex four with vertex six in figure 6b. Therefore, when the two building blocks are in a solution, parts of the vbuilding block will anneal with the complementary parts of the wbuilding block. If in addition, the middle portions b in v and wbuilding blocks are also complementary, they would anneal and would indicate that the two vertices v and w are coloured with the same colour. Each colour sequence contains a cleavage site for a restriction enzyme, such that when two arms of identically coloured vertex building blocks are hybridized the restriction site is exposed and can be cut by an appropriate enzyme. But if the vertex building blocks encode distinctly coloured vertices, then the mismatched restriction sites form a ‘bubble’ that cannot be cut by a restriction enzyme (figure 6c). One restriction enzyme is used for each colour, hence for the threecolourability, three enzymes are needed. In the experiment reported by Wu et al. [22], the enzymes used to cleave the monochromatic edges were HindIII representing red, BamHI representing blue and EcoRI representing green.
Given the vertex building blocks of a graph, the solution of the threecolourability problem then needs the following four steps, regardless of the size of the graph:

— anneal: combine all vertex building blocks molecules in a single test tube and allow the complementary ends to hybridize,

— ligate: join the open nicks with a ligase,

— cleave: apply three restriction enzymes to destroy the molecular structures containing identically coloured vertices joined by an edge, and

— extract: determine whether any DNA graph structure of the size of the input remains; the graph is threecolourable if the structure is detected, otherwise it is not.
Figure 7a shows a possible formation of a DNA graph structure. The vertex colours are indicated and if all nicks are sealed at the location of the arrowheads, there is a singlestranded circular molecule that traverses the whole graph structure. Such a molecule forms a knot in threedimensional space. Preliminary experiments detecting the graph structure identified two topoisomers [20], proving that the graph structure may conform to two nonidentical knots. In the computational experiment [22], only some of the nicks were ligated such that a linear singlestranded molecule traversing all edges at least once, a reporter strand, was formed and the graph structure was easily detectable. The reporter strand is illustrated in figure 7b indicated in purple. Although it is not possible for every graph to be represented as a single circular molecule [46], it has been shown that the vertex building blocks can be designed such that a reporter strand traversing every edge of a given graph at least once exists for every graph [47].
The current techniques limit the abovedescribed method to smaller size graphs (graphs with about 13 vertices and 28 edges [22]), but this experimental solution of the threecolourability shows that, in principle, a complex molecular shape, including mismatched pairs, results from a computation and this complex molecule represents an answer of the computation. Hence, the complex molecular structures, such as those of proteins, resulting from natural selfassembly that undergo enzymatic selection processes can be seen as products of computations.
4. Concluding remarks
There are many models for computing by DNA selfassembly that are not covered in this review. Some of them are quite stimulating, such as the autonomous simulation of finitestate automata by Benenson [48] where a finitestate automaton is simulated by applying a restriction enzyme and a ligase, or Stojanovic's DNAzymebased automaton that plays tic–tac–toe [49], or the algorithmic tile assembly model introduced by Winfree [11] where DNA tiles simulate the dynamics of a cellular automaton. Some other self assembly models such as Rothemund's DNA origami [24] turned out to advance designs in DNA nanotechnology enabling new shapes to be reported on a daily basis. Yurke's [27] ‘fuel strands’ and isothermal strand displacementbased devices are now taken to a new level by constructing a large number of logic gates that can potentially operate simultaneously [34], and to small walkers that can carry a molecular cargo [29]. Each one of these aspects of algorithmic DNA selfassembly has the potential to develop and possibly drastically influence new material designs, as well as to provide platforms for further studies of molecular interactions. Unfortunately, at the moment, there is no unified theoretical approach that describes all of the different selfassembly models. This task remains as a difficult and challenging problem.
Acknowledgements
We acknowledge support by grant no. CCF1117254 from the NSF to both N.J. and N.C.S., grant no. DMS0900671 to N.J., and the following grants to N.C.S: GM29554 from NIGMS, grant no. CTS0608889 from the NSF, grant no. W911FF08C0057 from ARO, via Pegasus Corporation, MURI W911NF0710439 from ARO, grant nos. N000140910181 and N000140911118 from ONR and a grant from the W.M. Keck Foundation.
Footnotes
One contribution to a Theme Issue ‘Computability and the Turing Centenary’.
 Received November 17, 2011.
 Accepted January 10, 2012.
 This journal is © 2012 The Royal Society