DNA Design Toolbox
beta beta beta version...
In our laboratory, when we wish to implement some scheme
for biomolecular computing -- be it self-assembly or transcriptional networks
-- we must design DNA molecules that can be expected to carry out the task
reliably. Many factors can come into play for specific projects,
so we have not found any uniformly satisfying approach to the design problem.
The purpose of this page, therefore, is to give examples of our experience
with DNA design, and to make our code available to those who may wish to
use it or to compare it to their own approaches. Everything
on this page is offered "as is", with no guarantee that it is useful, or
even correct. It is certainly not stable: the code is evolving all
the time.
Please see our
Supplementary Material page for examples of DNA design code particular to specific papers.
(Note: Many researchers are interested in designing
DNA sequences, for uses such as effective PCR primers, specific probes
for DNA arrays, construction of plasmids, re-coding protein sequences,
and other biological applications. We do not do such things in this
laboratory, and consequently, you should not expect our software to be
useful for those applications -- although you may find something of interest
for DNA arrays.)
We have two related collections of routines. "DNAdesign"
is a suite of MATLAB procedures for manipulating and analyzing DNA sequences,
including calculations of extinction coefficients, nearest-neighbor binding
energy predictions, dot plots, searches for Watson-Crick complementary
subsequences, and the like. It includes a Monte-Carlo optimizer,
allowing user-defined scoring functions. "SpuriousDesign" developed
from the realization that MATLAB is too slow for optimizing DNA sequences;
consequently, key routines were translated into efficient C. Both
of these programs make use of (or require) the Vienna RNA package, by Ivo
Hofacker et al.
Often, however, we wish to include other factors in our design
that are specific to a particular experiment -- and then it is easier to code up
special-purpose design routines or, alas, to do the design by hand.
You may also wish to compare with our published work on DNA design (in collaboration with Niles Pierce's group),
which appears on our publications page. That work
used software developed in Pierce's group for obtaining sequence designs that meet single-strand secondary
structure requirements; it is less general (but more thorough) than what we offer here.
However, Pierce's group is actively developing that software.
Also, John Reif's group at Duke has used our code as the
design engine for their graphical interface, which they call
TileSoft.
Highlights of the DNAdesign MATLAB toolbox and SpuriousDesign C code
-
An elementary interface to the ViennaRNA program.
-
Expressing degenerate sequences using base types RYWSMKBDHVN
-
Choosing a random DNA sequence: S = randbase(template);
-
Computing the Watson-Crick complement: Sc = WC(S);
-
Calculating UV_260 extinction coefficients based on nucleotide
parameters and nearest-neighbor parameters, ssODe(S) and ssODeNN(S) respectively.
-
Computing binding energies for hybridization based on nearest-neighbor
parameters, nnHS(S) and oligoHS(S)
-
Calculating melting curves based on the 2-state model,
such as this:
-
Calculating dot plots showing regions of complementarity within
a strand or strands, such as this:
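As a rough illustration of what such a dot plot contains, the following self-contained MATLAB sketch marks every pair of positions whose bases are Watson-Crick complementary. It does not use the toolbox's own dotplot routine, and the sequence and variable names are made up for the example. A potential helix shows up as a run along an anti-diagonal.

```matlab
% Minimal dot plot sketch (illustrative; not the DNAdesign dotplot routine).
S = 'GGCATGCAAAAGCATGCC';            % made-up hairpin-like sequence
[~, n] = ismember(S, 'ACGT');        % A,C,G,T -> 1,2,3,4; complement of x is 5-x

% M(i,j) is true when base i and base j are Watson-Crick complementary.
M = bsxfun(@plus, n', n) == 5;

imagesc(double(M)); axis square; colormap(gray);
xlabel('position j'); ylabel('position i');
title('Watson-Crick complementarity');
```

Coding the bases as 1..4 makes the complement test a single comparison, which keeps the matrix construction to one line.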
-
Suppose we want to design a strand that folds into Mickey-Mouse
ears.
-
We specify the sequence template, Watson-Crick pairing regions
(three helix domains), and equality constraints (none in this case).
These get converted into more detailed constraint information, St, wc,
and eq. This data is also used by the SpuriousDesign programs.
St = 'N'*ones(1,72);                                 % template: 72 positions, all 'N'
helices = [1 1 1 72 5; 1 13 1 34 5; 1 39 1 60 5];   % the three helix domains
[St,wc,eq] = constraints(St, helices, []);           % [] = no equality constraints
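The helix list is compact, so it may help to see it expanded. The sketch below assumes that each row reads [strand, start, strand, end, length], with base start+k pairing with base end-k. That reading is inferred from the numbers in the example and should be checked against the help text for constraints. The variables partner and db are hypothetical and are not the toolbox's wc format.

```matlab
% ASSUMPTION: a helix row is [strandA startA strandB endB len], meaning that
% base startA+k pairs with base endB-k for k = 0..len-1 (inferred, not documented here).
helices = [1 1 1 72 5; 1 13 1 34 5; 1 39 1 60 5];
N = 72;
partner = zeros(1, N);            % partner(i) = index paired with i, 0 if unpaired
for h = 1:size(helices, 1)
    i0 = helices(h, 2);  j0 = helices(h, 4);  len = helices(h, 5);
    for k = 0:len-1
        partner(i0 + k) = j0 - k;
        partner(j0 - k) = i0 + k;
    end
end

% Dot-bracket rendering of the target structure (single strand, no pseudoknots).
db = repmat('.', 1, N);
db(partner > (1:N)) = '(';
db(partner > 0 & partner < (1:N)) = ')';
disp(db);
```

Under this assumption the three rows describe a 5-base-pair closing stem and two 5-base-pair hairpin stems, which matches the "ears" picture.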
-
Trying a random sequence that satisfies the base-pairing requirements does
not give the desired structure, according to the Vienna RNAfold program:
S = constrain(randbase(St),wc,eq);
DNAfold(S); unix('sed y/U/T/ <rna.ps >fold_S.ps');
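Roughly speaking, this step draws a random base for every template position and then forces the paired positions to be complementary. The sketch below illustrates that idea in plain MATLAB. It is not the code of randbase or constrain, and the pairing table partner is a hypothetical data structure. The helix positions assume that each row of helices reads [strand, start, strand, end, length], with base start+k pairing with base end-k.

```matlab
% Random sequence obeying a template and a pairing table (illustrative sketch only).
codes = 'ACGTRYWSMKBDHVN';        % IUPAC degenerate base codes
sets  = {'A','C','G','T','AG','CT','AT','CG','AC','GT','CGT','AGT','ACT','ACG','ACGT'};

template = repmat('N', 1, 72);    % plays the role of St above
partner  = zeros(1, 72);          % assumed pairing for the three helices
partner(1:5)   = 72:-1:68;  partner(68:72) = 5:-1:1;
partner(13:17) = 34:-1:30;  partner(30:34) = 17:-1:13;
partner(39:43) = 60:-1:56;  partner(56:60) = 43:-1:39;

% Draw each base uniformly from the set allowed by its template code.
S = template;
for i = 1:numel(template)
    allowed = sets{codes == template(i)};
    S(i) = allowed(randi(numel(allowed)));
end

% Impose Watson-Crick pairing: the 3' partner is overwritten with the complement
% of the 5' partner (this sketch ignores the template at overwritten positions).
bases = 'ACGT';  comp = 'TGCA';
for i = find(partner > (1:72))
    S(partner(i)) = comp(bases == S(i));
end
disp(S);
```

Overwriting the 3' partner is the simplest choice here; a more careful version would intersect the template codes of the two paired positions before drawing.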
-
Two tries, and no Mickey Mouse ears. How can we improve this? Our first attempt tries to minimize "spurious"
interactions -- subsequences that are complementary (or nearly so), but aren't meant to be. This is known as
negative design. In the context of DNA design, it was initially conceived of by Nadrian Seeman. Here, we use
a variant (called exponentially-weighted sequence symmetry minimization, or expSSM), which tries random
modifications of the initial sequence, and accepts those changes which decrease the penalty score. For speed, we use
the compiled C program 'spuriousC', from the SpuriousDesign package.
save_spuriousC_files(S,St,wc,eq,'Mmouse');
unix('spuriousC wc=Mmouse.wc tmax=30 score=spurious W_verboten=1');
% cut-and-paste output to MATLAB variable 'bestS'
DNAfold(bestS); unix('sed y/U/T/ <rna.ps >fold_spuriousS.ps');
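To make the idea concrete, here is a toy version of this kind of search in plain MATLAB. It is only a sketch. The penalty (a count of subsequence pairs that are exact reverse complements, weighted exponentially by length) is a simplified stand-in for the spuriousC score. The parameters kmin, kmax, beta, and nsteps are invented for the example. No intended helices are protected, so it designs a strand that is meant to be unstructured.

```matlab
% Toy negative design by random mutation (illustrative; not spuriousC).
N = 30;  kmin = 3;  kmax = 6;  beta = 4;  nsteps = 500;   % all assumed values
bases = 'ACGT';
S = randi(4, 1, N);        % bases coded 1..4 = A,C,G,T; complement of x is 5-x
best = Inf;

for step = 0:nsteps
    cand = S;
    if step > 0
        cand(randi(N)) = randi(4);                % propose a single-base change
    end

    % Penalty: for each window length k, count pairs of windows that are exact
    % reverse complements of each other, weighted by beta^(k-kmin).
    score = 0;
    for k = kmin:kmax
        idx = bsxfun(@plus, (1:N-k+1)', 0:k-1);   % row r indexes window r
        W   = cand(idx);                          % windows as rows
        RC  = 5 - W(:, end:-1:1);                 % their reverse complements
        p   = (4 .^ (0:k-1))';                    % encode each window as an integer
        hit = bsxfun(@eq, (W-1)*p, ((RC-1)*p)');  % hit(i,j): window i = revcomp(window j)
        score = score + beta^(k-kmin) * nnz(triu(hit));
    end

    if step == 0, initial = score; end
    if score < best                               % keep only changes that lower the penalty
        S = cand;  best = score;
    end
end

fprintf('penalty %g -> %g\n%s\n', initial, best, bases(S));
```

Even this small example re-scores the whole sequence at every step, which hints at why the real optimizer was moved from MATLAB to C.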
- It sounded good, but maybe it didn't work too well this time.
(We'll see that it's useful, later on.) The problem, of course, is
that the scoring function was a heuristic -- there was no rigorous
connection to a model of DNA folding. So, a better idea is to use a
scoring function based on the standard model of secondary structure
thermodynamics, as implemented by the Vienna RNA package. Here, we
try to minimize the expected number (at equilibrium) of nucleotides that are not
correctly base-paired (or unpaired). This is combined positive and negative design.
Again, we use Monte-Carlo optimization.
unix('spuriousC wc=Mmouse.wc tmax=30 score=struct W_verboten=1');
% cut-and-paste output to MATLAB variable 'bestS'
DNAfold(bestS); unix('sed y/U/T/ <rna.ps >fold_structS.ps');
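Written out, the quantity being minimized can be computed from base-pair probabilities as in the sketch below. The assumptions are these: P is a symmetric matrix of equilibrium pair probabilities (here a made-up toy matrix; in practice it would come from a partition-function calculation such as the one in the Vienna RNA package), and partner is a hypothetical table giving each base's intended partner (0 for unpaired). The example does not claim that spuriousC computes its score in exactly this way.

```matlab
% Expected number of incorrectly paired or unpaired nucleotides (illustrative sketch).
N = 8;
partner = [8 7 0 0 0 0 2 1];      % target: 1-8 and 2-7 paired, 3..6 unpaired
P = zeros(N);                     % toy pair probabilities (made up)
P(1,8) = 0.9;  P(2,7) = 0.8;  P(3,6) = 0.1;
P = P + P';                       % make the matrix symmetric

paired   = find(partner > 0);
unpaired = find(partner == 0);

pcorrect = zeros(1, N);
% A base meant to be paired is correct when it pairs with its intended partner.
pcorrect(paired) = P(sub2ind([N N], paired, partner(paired)));
% A base meant to be unpaired is correct when it pairs with nothing at all.
pcorrect(unpaired) = 1 - sum(P(unpaired, :), 2)';

nwrong = sum(1 - pcorrect);
fprintf('expected number of incorrect nucleotides: %.2f\n', nwrong);
```

For the toy matrix this comes to 0.80: the two intended pairs contribute 0.1 and 0.2 per base, and the stray 3-6 pairing contributes 0.1 for each of its two bases.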
- Our only serious disappointment now is that Mickey
Mouse is upside down. Note that the initial stem, which was supposed
to be 5 base-pairs in size, turns out to be 6 base-pairs. Apparently,
this is required to stabilize the stem. Also note that, in the second
example, the lonely C-G pairs in the multiloop are extremely weak. We're not
concerned about them.
-
Now consider a more challenging task: we want to design
four strands that fold into a double-crossover molecule.
- The idea is
basically the same. Specify the strand lengths, any fixed sequence
elements you may have, and the Watson-Crick complementary regions. In
this case, there are six such regions for each double-crossover
molecule.
...
DAOhelixA = [1 6 2 48 8; 3 25 2 40 16; 3 41 4 13 8; 2 1 1 21 8; 2 9 3 24 16; 4 14 3 8 8];
...
[St, wc, eq] = constraints(DAOseq, [DAOhelices; DAOsticky], []);
save_spuriousC_files(S,St,wc,eq,'DAO');
unix('spuriousC template=DAO.St wc=DAO.wc eq=DAO.eq tmax=180 score=spurious W_verboten=1');
See here and here for more details.
- Unfortunately, there is no well-developed model
for multi-strand, pseudoknotted DNA secondary structure
thermodynamics, so how do we know whether this is a good sequence or
not? One way is to make the molecules in the lab, run a
non-denaturing gel, and see how well they hold together. A less
reliable way: look at dot plots and look for potential problems, such
as long diagonals that aren't in the target structure. Can you see
that the designed sequence is better than a random sequence?
[Mwc,Mbp] = dotplot(bestS,wc,eq,Inf);
subplot(1,2,1); imagesc(Mwc); axis square; title('designed sequence')
subplot(1,2,2); imagesc(Mbp); axis square; title('target')
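The "look for long diagonals" check can also be automated in a crude way. The following sketch is plain MATLAB, independent of the toolbox's dotplot function, and uses a made-up sequence and a hypothetical pairing table partner. It removes the intended pairs from the complementarity matrix and reports the longest remaining anti-diagonal run.

```matlab
% Longest unintended complementary run (illustrative; not the toolbox dotplot).
S = 'GGCATGCAAAAGCATGCC';                 % made-up sequence
N = numel(S);
partner = zeros(1, N);                    % intended structure: bases 1-7 pair with 18-12
partner(1:7) = 18:-1:12;  partner(12:18) = 7:-1:1;

[~, n] = ismember(S, 'ACGT');             % A,C,G,T -> 1..4; complement of x is 5-x
M = bsxfun(@plus, n', n) == 5;            % all Watson-Crick complementary position pairs

target = false(N);
idx = find(partner > 0);
target(sub2ind([N N], idx, partner(idx))) = true;
off = M & ~target;                        % complementary but not intended

% L(i,j) = length of the anti-diagonal run of off-target pairs ending at (i,j).
L = zeros(N);
for i = 1:N
    for j = 1:N
        if off(i,j)
            if i > 1 && j < N
                L(i,j) = L(i-1,j+1) + 1;
            else
                L(i,j) = 1;
            end
        end
    end
end
fprintf('longest unintended run: %d bases\n', max(L(:)));
```

In this example the stem contains the self-complementary stretch GCATGC, which is the kind of unintended match the check is meant to flag.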
For more information, see the MATLAB help files in the DNAdesign directory,
notes-examples, and notes-spuriousC. Also, "spuriousC --" will give an extensive
list of options.
Download compressed tar files of
DNAdesign (5/19/2004 version),
DNAdesign (6/18/2004 version), and
SpuriousDesign (5/19/2004 version) here.
Note for users of the DNA cluster: these files are already installed in
/research/src/DNAdesign and /research/src/SpuriousDesign.
Erik Winfree, 8/25/02, 5/19/04, 6/03/04, 6/18/04