The DNA and Natural Algorithms Group:

DNA Design Toolbox beta beta beta version... use at your own risk!!!!

In our laboratory, when we wish to implement some scheme for biomolecular computing -- be it self-assembly or transcriptional networks -- we must design DNA molecules that can be expected to carry out the task reliably. Many factors can come into play for specific projects, so we have not found any uniformly satisfying approach to the design problem. The purpose of this page, therefore, is to give examples of our experience with DNA design, and to make our code available to those who may wish to use it or to compare it to their own approaches. Everything on this page is offered "as is", with no guarantee that it is useful, or even correct. It is certainly not stable: the code is evolving all the time.

Please see our Supplementary Material page for examples of DNA design code particular to specific papers.

(Note: Many researchers are interested in designing DNA sequences, for uses such as effective PCR primers, specific probes for DNA arrays, construction of plasmids, re-coding protein sequences, and other biological applications. We do not do such things in this laboratory, and consequently, you should not expect our software to be useful for those applications -- although you may find something of interest for DNA arrays.)

We have two related collections of routines. "DNAdesign" is a suite of MATLAB procedures for manipulating and analyzing DNA sequences, including calculations of extinction coefficients, nearest-neighbor binding energy predictions, dot plots, searches for Watson-Crick complementary subsequences, and the like. It includes a Monte-Carlo optimizer, allowing user-defined scoring functions. "SpuriousDesign" developed from the realization that MATLAB is too slow for optimizing DNA sequences; consequently, key routines were translated into efficient C. Both of these programs make use of (or require) the Vienna RNA package, by Ivo Hofacker et al.

Often, however, we wish to include other factors in our design that are specific to a particular experiment -- and then it is easier, alas, to code up special-purpose design routines or to do the design by hand.

You may also wish to compare to our published work on DNA design (in collaboration with Niles Pierce's group), which appears on our publications page. That work used software developed in Pierce's group for obtaining sequence designs to meet single-strand secondary structure requirements, and is less general (but more thorough) than what we offer here. However, they are actively developing the software.

Also, John Reif's group at Duke has used our code as the design engine for their graphical interface, which they call TileSoft.

Highlights of the DNAdesign MATLAB toolbox and SpuriousDesign C code

An elementary interface to the ViennaRNA program.
Expressing degenerate sequences using the IUPAC base types RYWSMKBDHVN.
Choosing a random DNA sequence: S = randbase(template);
Computing the Watson-Crick complement: Sc = WC(S);
Calculating UV_260 extinction coefficients based on nucleotide parameters and nearest-neighbor parameters, ssODe(S) and ssODeNN(S) respectively.
Computing binding energies for hybridization based on nearest-neighbor parameters, nnHS(S) and oligoHS(S)
Calculating melting curves based on the 2-state model. [Figure: example melting curves]
Calculating dot plots showing regions of complementarity within a strand or between strands. [Figure: example dot plot]
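
To make two of the basic utilities listed above concrete, here is a minimal sketch in plain MATLAB of reverse-complementing a strand and of drawing a random sequence from a degenerate template. It is not the toolbox code: the variable names are invented for illustration, and whether the toolbox's WC() and randbase() behave exactly this way (for example, whether WC() reverses the strand) is an assumption.

```matlab
% Hypothetical stand-ins for WC() and randbase(); plain MATLAB, no toolbox needed.

% --- Watson-Crick (reverse) complement ---
bases = 'ACGT';
comp  = 'TGCA';                      % complement of each entry in 'bases'
S = 'GATTACA';
[~, idx] = ismember(S, bases);       % position of each base within 'ACGT'
Sc = fliplr(comp(idx));              % complement each base, then reverse (5'->3')
disp(Sc)                             % TGTAATC

% --- Random sequence from a degenerate template (IUPAC codes) ---
codes = 'ACGTRYWSMKBDHVN';
sets  = {'A','C','G','T','AG','CT','AT','CG','AC','GT', ...
         'CGT','AGT','ACT','ACG','ACGT'};
template = 'NNNNSWRYNNNN';
R = blanks(numel(template));
for k = 1:numel(template)
    allowed = sets{codes == template(k)};     % bases permitted at position k
    R(k) = allowed(randi(numel(allowed)));    % pick one uniformly at random
end
disp(R)
```

Each position of the template is treated independently here; pairing and equality constraints between positions are a separate step.
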
Suppose we want to design a strand that folds into Mickey-Mouse ears.

We specify the sequence template, Watson-Crick pairing regions (three helix domains), and equality constraints (none in this case). This gets converted into more detailed constraint information, St, wc, and eq. This data is also used by the SpuriousDesign programs.

St = 'N'*ones(1,72);
helices = [1 1 1 72 5; 1 13 1 34 5; 1 39 1 60 5];
[St,wc,eq] = constraints(St, helices, []);
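
One way to read the helices matrix is sketched below. The row layout is an assumption inferred from the numbers, not taken from the toolbox documentation: each row is taken to be [strandA startA strandB endB length], meaning that base startA+k of strand A pairs with base endB-k of strand B for k = 0..length-1. Under that reading, the three rows expand into a partner table and a dot-bracket string (the notation used by the Vienna RNA package) for the outer stem and the two ears.

```matlab
% Sketch under an assumed row format: [strandA startA strandB endB length].
% Single-strand case only; the strand columns are ignored here.
helices = [1 1 1 72 5; 1 13 1 34 5; 1 39 1 60 5];
N = 72;
partner = zeros(1, N);               % partner(i) = j if i pairs with j, else 0
for h = 1:size(helices, 1)
    i0  = helices(h, 2);
    j0  = helices(h, 4);
    len = helices(h, 5);
    for k = 0:len-1
        partner(i0 + k) = j0 - k;    % 5' side of the helix
        partner(j0 - k) = i0 + k;    % 3' side of the helix
    end
end

% Dot-bracket string: '(' opens a pair, ')' closes it, '.' is unpaired.
db = repmat('.', 1, N);
db(partner > (1:N)) = '(';
db(partner > 0 & partner < (1:N)) = ')';
disp(db)
```

Under this assumption, bases 1-5 pair with 72-68, bases 13-17 with 34-30, and bases 39-43 with 60-56, leaving 42 of the 72 bases unpaired.
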

Trying a random sequence that satisfies the base-pairing requirements does not give the desired structure, according to the Vienna RNAfold program:

S = constrain(randbase(St),wc,eq);
DNAfold(S); unix('sed y/U/T/ <rna.ps >fold_S.ps');

Two tries, and no Mickey Mouse ears. How can we improve this? Our first attempt tries to minimize "spurious" interactions -- subsequences that are complementary (or nearly so), but aren't meant to be. This is known as negative design. In the context of DNA design, it was initially conceived of by Nadrian Seeman. Here, we use a variant (called exponentially-weighted sequence symmetry minimization, or expSSM), which tries random modifications of the initial sequence, and accepts those changes which decrease the penalty score. For speed, we use the compiled C program 'spuriousC', from the SpuriousDesign package.

save_spuriousC_files(S,St,wc,eq,'Mmouse');
unix('spuriousC wc=Mmouse.wc tmax=30 score=spurious W_verboten=1');
% cut-and-paste output to MATLAB variable 'bestS'
DNAfold(bestS); unix('sed y/U/T/ <rna.ps >fold_spuriousS.ps');
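
The accept-if-better search described above can be sketched in a few lines of plain MATLAB. This is a toy, not the spuriousC algorithm: the function names are hypothetical, there are no template, wc, or eq constraints (so every complementary match counts as spurious), and the penalty is a simplified stand-in that counts pairs of windows that are exact reverse complements, weighting longer matches exponentially.

```matlab
function [S, score, startScore] = toy_negative_design(N, nSteps)
% Try random single-base changes; keep those that lower the penalty.
% Toy illustration only (hypothetical names, no constraints).
    x = randi(4, 1, N);                       % 1..4 encode A, C, G, T
    score = toy_spurious_score(x, 3, 8, 5);
    startScore = score;
    for step = 1:nSteps
        y = x;
        y(randi(N)) = randi(4);               % propose one random base change
        newScore = toy_spurious_score(y, 3, 8, 5);
        if newScore < score                   % accept only if the penalty decreases
            x = y;
            score = newScore;
        end
    end
    alphabet = 'ACGT';
    S = alphabet(x);
end

function s = toy_spurious_score(x, kmin, kmax, beta)
% Sum over window lengths k of beta^(k-kmin) times the number of window
% pairs (i <= j) where window i equals the reverse complement of window j.
    N = numel(x);
    s = 0;
    for k = kmin:min(kmax, N)
        idx = bsxfun(@plus, (0:N-k)', 1:k);   % each row indexes one window
        W = reshape(x(idx), N-k+1, k);        % windows read 5'->3'
        RC = 5 - fliplr(W);                   % reverse complements (A<->T, C<->G)
        place = 4 .^ (k-1:-1:0)';             % base-4 place values
        keyW = (W - 1) * place;               % one integer key per window
        keyRC = (RC - 1) * place;
        matches = bsxfun(@eq, keyW, keyRC');  % matches(i,j): window i == rc(window j)
        s = s + beta^(k - kmin) * nnz(triu(matches));
    end
end
```

Saved as toy_negative_design.m, a call such as [S, score, startScore] = toy_negative_design(40, 2000) returns a sequence whose penalty is no higher than that of the random starting point. Because only improving moves are accepted, the search can stall in a local minimum.
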

It sounded good, but maybe it didn't work too well this time. (We'll see that it's useful, later on.) The problem, of course, is that the scoring function was a heuristic -- there was no rigorous connection to a model of DNA folding. So, a better idea is to use a scoring function based on the standard model of secondary structure thermodynamics, as implemented by the Vienna RNA package. Here, we try to minimize the expected number (at equilibrium) of nucleotides that are not correctly base-paired (or unpaired). This is combined positive and negative design. Again, we use Monte-Carlo optimization.

unix('spuriousC wc=Mmouse.wc tmax=30 score=struct W_verboten=1')
% cut-and-paste output to MATLAB variable 'bestS'
DNAfold(bestS); unix('sed y/U/T/ <rna.ps >fold_structS.ps');
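
The quantity being minimized, the expected number of nucleotides in the wrong state, can be written down directly once base-pair probabilities are available. The sketch below assumes a symmetric matrix P of equilibrium pair probabilities (such as a partition-function calculation like RNAfold -p would provide) and a target partner table. The numbers are made up for illustration, and this is not a description of how spuriousC computes its score internally.

```matlab
% Sketch: expected number of incorrectly paired/unpaired nucleotides.
% P(i,j) = probability that base i pairs with base j (made-up values here).
N = 6;
partner = [6 5 0 0 2 1];             % target: 1-6 and 2-5 paired, 3 and 4 unpaired
P = zeros(N);
P(1,6) = 0.90;  P(6,1) = 0.90;
P(2,5) = 0.80;  P(5,2) = 0.80;
P(1,5) = 0.05;  P(5,1) = 0.05;       % a small amount of mispairing

pUnpaired = 1 - sum(P, 2)';          % probability that each base is unpaired
pCorrect  = pUnpaired;               % correct state for bases meant to be unpaired
paired = find(partner > 0);
pCorrect(paired) = P(sub2ind([N N], paired, partner(paired)));

expectedWrong = N - sum(pCorrect);   % about 0.6 for these numbers
disp(expectedWrong)
```

A score of 0 would mean every base is in its target state with probability 1; the largest possible value is N.
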

Our only serious disappointment now is that Mickey Mouse is upside down. Note that the initial stem, which was supposed to be 5 base-pairs in size, turns out to be 6 base-pairs. Apparently, this is required to stabilize the stem. Also note that, in the second example, the lonely C-G pairs in the multiloop are extremely weak. We're not concerned about them.

Now consider a more challenging task: we want to design four strands that fold into a double-crossover molecule.

The idea is basically the same. Specify the strand lengths, any fixed sequence elements you may have, and the Watson-Crick complementary regions. In this case, there are six such regions for each double-crossover molecule.

...
DAOhelixA = [1 6 2 48 8; 3 25 2 40 16; 3 41 4 13 8; 2 1 1 21 8; 2 9 3 24 16; 4 14 3 8 8];
...
[St, wc, eq] = constraints(DAOseq, [DAOhelices; DAOsticky], []);
save_spuriousC_files(S,St,wc,eq,'DAO');
unix('spuriousC template=DAO.St wc=DAO.wc eq=DAO.eq tmax=180 score=spurious W_verboten=1')

Unfortunately, there is no well-developed model for multi-strand, pseudoknotted DNA secondary structure thermodynamics, so how do we know whether this is a good sequence or not? One way is to make the molecules in the lab, run a non-denaturing gel, and see how well they hold together. A less reliable way: inspect dot plots for potential problems, such as long diagonals that aren't in the target structure. Can you see that the designed sequence is better than a random sequence?

[Mwc,Mbp] = dotplot(bestS,wc,eq,Inf);
subplot(1,2,1); imagesc(Mwc); axis square; title('designed sequence')
subplot(1,2,2); imagesc(Mbp); axis square; title('target')
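
For readers without the toolbox, the idea behind a complementarity dot plot can be sketched in plain MATLAB. This is an illustration, not the toolbox's dotplot routine: M(i,j) is set when bases i and j are Watson-Crick complementary, and a stretch of k potential base pairs appears as an anti-diagonal run of length k. The example sequence is made up.

```matlab
% Sketch of a complementarity dot plot (hypothetical, not the toolbox routine).
S = 'GGGGAAAACCCC';
[~, x] = ismember(S, 'ACGT');        % A=1, C=2, G=3, T=4
M = bsxfun(@plus, x', x) == 5;       % A+T = 5 and C+G = 5 mark complementary pairs
N = numel(S);

% L(i,j) = length of the anti-diagonal run (i,j), (i+1,j-1), ... of set entries.
L = zeros(N);
for i = N:-1:1
    for j = 1:N
        if M(i,j)
            if i < N && j > 1
                L(i,j) = 1 + L(i+1, j-1);
            else
                L(i,j) = 1;
            end
        end
    end
end
longest = max(L(:));                 % 4 here: GGGG can pair with CCCC
disp(longest)
% imagesc(M); axis square;           % uncomment to view the plot
```

Comparing the longest run that is not part of the target structure for a designed and a random sequence gives a crude numerical counterpart to the visual comparison.
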

For more information, see the MATLAB help files in the DNAdesign directory, notes-examples, and notes-spuriousC. Also, "spuriousC --" will give an extensive list of options.

Download compressed tar files of DNAdesign (5/19/2004 version) and DNAdesign (6/18/2004 version) and SpuriousDesign (5/19/2004 version) here.

Note for users of the DNA cluster: these files are already installed in /research/src/DNAdesign and /research/src/SpuriousDesign.

Erik Winfree, 8/25/02, 5/19/04, 6/03/04, 6/18/04