
DNA extension and analysis with rolling primers

United States Patent 5,962,228

Brenner
 
October 5, 1999
 
Abstract
 
A novel "primer walking" method for DNA sequencing is provided that comprises repeated cycles of nucleotide identification by selective extension and primer advancement along a template by template mutation. An important feature of the invention is providing a set of primers, referred to herein as "rolling primers" that contain complexity-reducing nucleotides for reducing the number of primers required for annealing to every possible primer binding site on a sequencing template. Another important feature of the invention is the systematic replacement of at least one of the four nucleotides in the target polynucleotide with its cognate complexity-reducing nucleotide or complement thereof. Sequencing is initiated by annealing rolling primers differing only in their terminal nucleotides to a primer binding site of a sequencing template so that only the rolling primer whose terminal nucleotide forms a perfect complement with the template leads to the formation of an extension product. After amplifying the double stranded extension product to form an amplicon, the terminal nucleotide, and hence its complement in the template, is identified by the identity of the amplicon. The primer binding site of the template of the successfully amplified polynucleotide is then mutated by, for example, oligonucleotide-directed mutagenesis so that a subsequent rolling primer may be selected from the set that forms a perfectly matched duplex with the mutated template at a site which is shifted towards the direction of extension by one nucleotide relative to the binding site of the previous rolling primer. The steps of selective extension, amplification and identification are then repeated. In this manner, the primers "roll" along the polynucleotide during the sequencing process, moving a base at a time along the template with each cycle.
 
Inventors:
 
Brenner; Sydney (Cambridge, GB)
 
Assignee:
 
Lynx Therapeutics, Inc. (Hayward, CA)
 
Appl. No.:
 
916120
 
Filed:
 
August 22, 1997
 
Current U.S. Class:
 
435/6; 536/23.1; 536/24.3
 
Intern'l Class:
 
C12Q 001/68; C07H 021/02; C07H 021/04; C12N 015/00
Field of Search:
 
435/6 536/23.1,24.3 935/76,77,78
 
References Cited [Referenced By]
 
U.S. Patent Documents
 
4942124    Jul., 1990    Church          435/6.
5405746    Apr., 1995    Uhlen           435/6.
5407799    Apr., 1995    Studier         435/6.
5427911    Jun., 1995    Ruano           435/6.
5496699    Mar., 1996    Sorenson        435/6.
5554517    Sep., 1996    Davey et al.    435/91.
5780231    Jul., 1998    Brenner         435/6.
 
Other References
 
Sanger et al., PNAS 74(12): 5463-5467 (1977).
Primary Examiner: Jones; W. Gary 
 
Assistant Examiner: Whisenant; Ethan 
 
Attorney, Agent or Firm: Macevicz; Stephen C.
Parent Case Text
 
This is a continuation-in-part of U.S. patent application Ser. No. 08/611,155 filed Mar. 5, 1996, now U.S. Pat. No. 5,780,231 which is a continuation-in-part of U.S. patent application Ser. No. 08/560,313 filed Nov. 17, 1995, now U.S. Pat. No. 5,763,175.
 
Claims
 
I claim: 
 
1. A method for determining the nucleotide sequence of a polynucleotide, the method comprising the steps of:

(a) providing a set of first primers, each first primer of the set having a 3'-terminal nucleotide, a template positioning segment, and an extension region comprising one or more complexity-reducing nucleotides;

(b) providing a double stranded DNA template comprising a first primer binding site, a promoter, the polynucleotide, and a second primer binding site, the first primer binding site being capable of forming an extendable duplex with at least one of the first primers;

(c) generating a population of RNA transcripts from the double stranded DNA template with an RNA polymerase that recognizes the promoter;

(d) mutating the first primer binding site in the RNA transcripts by extending a first primer forming an extendable duplex therewith, so that the first primer binding site is shifted one nucleotide in the direction of extension and so that a single stranded DNA template is formed;

(e) forming an amplicon from the single stranded DNA template;

(f) identifying the 3'-terminal nucleotide of the first primer extended to form the single stranded DNA template by the identity of the amplicon; and

(g) repeating steps (b) through (f) until the nucleotide sequence of the polynucleotide is determined.
 
2. The method of claim 1 wherein said step of generating said population of RNA transcripts includes removing DNA from said population.

3. The method of claim 2 wherein said amplicon is formed by amplifying said double stranded DNA by a polymerase chain reaction.

4. The method of claim 3 wherein said one or more complexity-reducing nucleotides in said extension region of said first primers are selected from the group consisting of 2'-deoxyinosine, 8-oxo-2'-deoxyadenosine, and 8-oxo-2'-deoxyguanosine.

5. The method of claim 4 wherein said removing DNA from said population of RNA transcripts includes treating said population with a DNase.

6. The method of claim 5 wherein said RNA polymerase is T7 RNA polymerase.

Description
 
FIELD OF THE INVENTION 
 
The invention relates generally to a method of DNA sequencing and analysis, and more particularly, to a method of base-by-base sequencing by successive extensions of an oligonucleotide primer.
 
BACKGROUND 
 
Large-scale sequencing projects typically involve the generation of libraries of progressively smaller clones of portions of the polynucleotide whose sequence is to be determined. Genomic DNA is fragmented and inserted into yeast artificial chromosomes (YACs) or cosmids whose inserts, in turn, are fragmented and inserted into phage or plasmid vectors for sequencing, e.g. Hunkapiller et al, Science, 254: 59-67 (1991). Although large-scale sequencing projects can be carried out by either so-called "directed" or "random" strategies, both approaches involve at least one or two labor intensive steps where templates are prepared for sequencing by one or another variant of the Sanger chain-termination method.
 
Many proposals have been made for reducing or eliminating these labor intensive steps. For example, one directed strategy involves an initial round of sequencing with a vector-specific "universal" primer followed by repetitive cycles of synthesis of a new sequencing primer generated from the just-acquired sequence information and subsequent new sequence determination with the new primer. In such a manner, one may "walk" along a relatively large sequencing template with a succession of newly determined primers without the need to fragment and subclone the template. A drawback of such an approach is the difficulty of acquiring the new primer at each cycle for making the next round of extensions. Either the process is rendered intolerably slow while one waits for the next primer to be synthesized, or the process is rendered impractical by the need to maintain a library of primers of every possible sequence which, for example, could be more than 1×10^9 for a primer 15 nucleotides in length. A proposal to mitigate this difficulty has been made that calls for primers that are assembled from a library of shorter oligonucleotides, such as pentamers or hexamers, e.g., Kotler et al, Proc. Natl. Acad. Sci., 90: 4241-4245 (1993); Kieleczawa et al, Science, 258: 1787-1791 (1992); and the like. But even with hexamers, a library of at least 4096 oligonucleotides is required.
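The library sizes quoted in the preceding paragraph follow from counting all sequences of a given length over the four natural nucleotides. The short Python sketch below merely reproduces that arithmetic; the function name `library_size` is a hypothetical helper chosen for illustration.

```python
# Illustrative arithmetic only: the number of distinct oligonucleotides of a
# given length over the four natural nucleotides A, C, G, and T is 4**length.

def library_size(length: int) -> int:
    """Return the number of possible sequences of `length` nucleotides."""
    return 4 ** length

full_15mer_library = library_size(15)  # exceeds 1 x 10^9, as stated in the text
hexamer_library = library_size(6)      # 4096, as stated in the text

print(full_15mer_library)
print(hexamer_library)
```

A complete hexamer library is thus smaller than a complete 15-mer library by a factor of 4^9, yet it still runs to several thousand oligonucleotides.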
 
Besides the problem of template preparation, as mentioned above, both directed and random approaches employ the Sanger chain-termination method of sequencing which requires the generation of sets of labeled DNA fragments, each fragment having a common origin and terminating with a known base. The sets of fragments are typically separated by high resolution gel electrophoresis, which must have the capacity of distinguishing very large fragments differing in size by no more than a single nucleotide. Unfortunately, several significant technical problems have seriously impeded efficient scale-up of Sanger-based approaches, either for accommodating longer sequences or for accommodating high-volume sequencing absent massive capital and labor investment. Such problems include i) the gel electrophoretic separation step which is labor intensive, is difficult to automate, and introduces an extra degree of variability in the analysis of data, e.g. band broadening due to temperature effects, compressions due to secondary structure in the DNA sequencing fragments, inhomogeneities in the separation gel, and the like; ii) nucleic acid polymerases whose properties, such as processivity, fidelity, rate of polymerization, rate of incorporation of chain terminators, and the like, are often sequence dependent; iii) detection and analysis of DNA sequencing fragments which are typically present in fmol quantities in spatially overlapping bands in a gel; iv) lower signals because the labeling moiety is distributed over the many hundred spatially separated bands rather than being concentrated in a single homogeneous phase, and v) in the case of single-lane fluorescence detection, the availability of dyes with suitable emission and absorption properties, quantum yield, and spectral resolvability, e.g. Trainor, Anal. Biochem., 62: 418-426 (1990); Connell et al, Biotechniques, 5: 342-348 (1987); Karger et al, Nucleic Acids Research, 19: 4955-4962 (1991); Fung et al, U.S. Pat. No. 4,855,225; and Nishikawa et al, Electrophoresis, 12: 623-631 (1991).
 
An important advance in sequencing technology could be made if an alternative approach was available for sequencing DNA (i) that did not require high resolution electrophoretic separations of DNA fragments, (ii) that reduced the number of templates required in large-scale sequencing projects, and (iii) that was amenable to simultaneous, or parallel, application to multiple target polynucleotides.
 
SUMMARY OF THE INVENTION 
 
An object of my invention is to provide a new method and approach for determining the sequence of polynucleotides.

Another object of my invention is to provide a new "primer walking" approach to sequencing that requires fewer primers for implementation.

Still another object of my invention is to provide a method and kits for reducing the number of templates required in large-scale sequencing projects.

Another object of my invention is to provide a method for rapidly analyzing patterns of gene expression in normal and diseased tissues and cells.

A further object of my invention is to provide a method, kits, and apparatus for simultaneously analyzing and/or sequencing a population of many thousands of different polynucleotides, such as a sample of polynucleotides from a cDNA library or a sample of fragments from a segment of genomic DNA.

Still another object of my invention is to provide a method, kits, and apparatus for identifying populations of polynucleotides.

Another object of my invention is to provide a method for sequencing segments of DNA in a size range corresponding to typical cosmid or YAC inserts.

The method of my invention achieves these and other objectives by repeated cycles of nucleotide identification by selective extension and primer advancement along a template by template mutation. An important feature of the invention is providing a set of primers, referred to herein as "rolling primers" that contain complexity-reducing nucleotides for reducing the number of primers required for annealing to every possible primer binding site on a sequencing template. Another important feature of the invention is the systematic replacement of at least one of the four nucleotides in the target polynucleotide with its cognate complexity-reducing nucleotide or complement thereof. Sequencing is initiated by annealing rolling primers differing only in their terminal nucleotides to a primer binding site of a sequencing template so that only the rolling primer whose terminal nucleotide forms a perfect complement with the template leads to the formation of an extension product. After amplifying the double stranded extension product to form an amplicon, the terminal nucleotide, and hence its complement in the template, is identified by the identity of the amplicon. For example, in a simple embodiment, a terminal nucleotide may be identified by the presence or absence of amplicon in four vessels that are used for separate extension and amplification reactions. The primer binding site of the template of the successfully amplified polynucleotide is then mutated by, for example, oligonucleotide-directed mutagenesis so that a subsequent rolling primer may be selected from the set that forms a perfectly matched duplex with the mutated template at a site which is shifted towards the direction of extension by one nucleotide relative to the binding site of the previous rolling primer. The steps of selective extension, amplification and identification are then repeated. In this manner, the primers "roll" along the polynucleotide during the sequencing process, moving a base at a time along the template with each cycle.
 
Generally, this aspect of my invention is carried out with the following steps: (a) providing a set of primers, i.e. the rolling primers, each primer of the set having an extension region comprising one or more complexity-reducing nucleotides and a terminal nucleotide; (b) forming a template comprising a primer binding site and the polynucleotide whose sequence is to be determined, the primer binding site being complementary to the extension region of at least one primer of the set; (c) annealing a primer from the set to the primer binding site, the extension region of the primer forming a perfectly matched duplex with the template and extending the primer to form a double stranded DNA; (d) amplifying the double stranded DNA to form an amplicon; (e) identifying the terminal nucleotide of the extension region of the primer by the identity of the amplicon; (f) mutating the primer binding site of the template so that the primer binding site is shifted one or more nucleotides in the direction of extension, thereby effectively shortening the target polynucleotide by one or more nucleotides; and (g) repeating steps (c) through (f) until the nucleotide sequence of the polynucleotide is determined.
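The cycle of steps (c) through (g) can be pictured with a toy Python model. This is only a sketch under strong simplifying assumptions, not the laboratory procedure: the template is a plain string listed in the order its bases are read, the four separate extension and amplification vessels are modeled as a dictionary, template mutation is modeled simply as advancing one position, and complexity-reducing nucleotides are ignored. The names `identify_terminal_nucleotide` and `roll_along` are hypothetical.

```python
# Toy model: in each cycle, four "vessels" each receive a primer with a
# different terminal nucleotide N. Only the vessel whose N pairs with the
# next template base yields an amplicon, which identifies that base.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def identify_terminal_nucleotide(template: str, position: int) -> str:
    """Return the terminal nucleotide N whose vessel would show an amplicon."""
    amplicon_present = {
        n: COMPLEMENT[n] == template[position] for n in "ACGT"
    }
    winners = [n for n, present in amplicon_present.items() if present]
    if len(winners) != 1:
        raise ValueError("expected exactly one vessel with amplicon")
    return winners[0]

def roll_along(template: str) -> str:
    """Read the template one base per cycle; return the extended-strand bases."""
    identified = []
    for position in range(len(template)):
        identified.append(identify_terminal_nucleotide(template, position))
        # The mutation step, which shifts the primer binding site by one
        # nucleotide, is modeled here by the loop advancing `position`.
    return "".join(identified)

terminal_bases = roll_along("ACGT")
template_bases = "".join(COMPLEMENT[n] for n in terminal_bases)
print(terminal_bases, template_bases)
```

Each identified terminal nucleotide is the complement of the template base it paired with, so complementing the result recovers the template sequence in read order.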
An important feature of my invention is the capability of applying the method to many different polynucleotides in parallel by the use of oligonucleotide tags. In accordance with this aspect of my invention, each polynucleotide of a population is conjugated with an oligonucleotide tag for transferring sequence information to a tag complement on a spatially addressable array of such complements. That is, a unique tag is attached to each polynucleotide of a population which can be copied and used to shuttle sequence information to its complement at a fixed position on an array of such complements. After a tag hybridizes with its complement, a signal is generated that is indicative of the transferred sequence information. Sequences of the tagged polynucleotides are determined by repeated cycles of information transfer and signal detection at the positions of the corresponding tag complements.
At least two major advantages are gained by using tags to shuttle information to discrete spatial locations rather than sorting an entire population of target polynucleotides to such locations: First, tags are much smaller molecular entities so that the kinetics of diffusion and hybridization are much more favorable. Second, tag loading at the spatially discrete locations only need be sufficient for detection, while target polynucleotide loading would need to be sufficient for both biochemical processing and detection; thus, far less tag needs to be loaded on the spatially discrete sites.
An important feature of this embodiment of my invention is the attachment of an oligonucleotide tag to each polynucleotide of a population such that substantially all different polynucleotides have different tags. As explained more fully below, this is achieved by taking a sample of a full ensemble of tag-polynucleotide conjugates wherein each tag has an equal probability of being attached to any polynucleotide.
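Why sampling tends to give different polynucleotides different tags can be illustrated with a simple probability sketch in Python. The model is an assumption made here for illustration and is not taken from the text: each sampled conjugate is taken to carry a tag drawn independently and uniformly from the repertoire, and the repertoire and sample sizes are hypothetical numbers. The function name `fraction_uniquely_tagged` is likewise hypothetical.

```python
# Assumed model: each of `sample_size` conjugates carries a tag chosen
# independently and uniformly from `repertoire_size` tags. The chance that a
# given conjugate shares its tag with none of the other sampled conjugates is
# (1 - 1/repertoire_size) ** (sample_size - 1).

def fraction_uniquely_tagged(repertoire_size: int, sample_size: int) -> float:
    """Expected fraction of sampled conjugates whose tag is unique in the sample."""
    if repertoire_size < 1 or sample_size < 1:
        raise ValueError("sizes must be positive")
    return (1.0 - 1.0 / repertoire_size) ** (sample_size - 1)

# Hypothetical figures: a repertoire of one million tags, sampled 10,000 times.
print(fraction_uniquely_tagged(1_000_000, 10_000))
```

Under this model the uniquely tagged fraction approaches one as the sample becomes small relative to the repertoire, which is the sense in which a small sample leaves substantially all different polynucleotides with different tags.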
 
Oligonucleotide tags employed in the invention are capable of hybridizing to complementary oligomeric compounds consisting of subunits having enhanced binding strength and specificity as compared to natural oligonucleotides. Such complementary oligomeric compounds are referred to herein as "tag complements." Subunits of tag complements may consist of monomers of non-natural nucleotide analogs or they may comprise oligomers having lengths in the range of 3 to 6 nucleotides or analogs thereof, the oligomers being selected from a minimally cross-hybridizing set. In such a set, a duplex made up of an oligomer of the set and the complement of any other oligomer of the set contains at least two mismatches. In other words, an oligomer of a minimally cross-hybridizing set at best forms a duplex having at least two mismatches with the complement of any other oligomer of the same set. The number of oligonucleotide tags available in a particular embodiment depends on the number of subunits per tag and on the length of the subunit, when the subunit is an oligomer from a minimally cross-hybridizing set. In the latter case, the number is generally much less than the number of all possible sequences the length of the tag, which for a tag n nucleotides long would be 4^n. Preferred monomers for tag complements include peptide nucleic acid monomers and nucleoside phosphoramidates having a 3'-NHP(=O)(O^-)O-5' linkage with its adjacent nucleoside. The latter compounds are referred to herein as N3'→P5' phosphoramidates. Preferably, both the oligonucleotide tags and their tag complements comprise a plurality of subunits selected from a minimally cross-hybridizing set consisting of natural oligonucleotides of 3 to 6 nucleotides in length.
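The two-mismatch condition and the resulting tag count can be sketched in Python. This is a simplified illustration under stated assumptions: duplexes are compared in a single aligned register only (shifted or overhanging alignments are ignored), the four-word example set is hypothetical rather than taken from the text, and the function names are invented for this sketch. In an aligned duplex between one oligomer and the complement of another, a position is mismatched exactly where the two oligomers differ, so the check reduces to a pairwise position-by-position comparison.

```python
from itertools import combinations

def mismatches(a: str, b: str) -> int:
    """Count mismatches in the aligned duplex of `a` with the complement of `b`."""
    if len(a) != len(b):
        raise ValueError("subunits must have equal length")
    return sum(x != y for x, y in zip(a, b))

def is_minimally_cross_hybridizing(subunits, min_mismatches: int = 2) -> bool:
    """True if every pair of distinct subunits differs in at least `min_mismatches` positions."""
    return all(mismatches(a, b) >= min_mismatches
               for a, b in combinations(subunits, 2))

def repertoire_size(subunit_count: int, subunits_per_tag: int) -> int:
    """Number of tags built by concatenating subunits drawn from the set."""
    return subunit_count ** subunits_per_tag

# Hypothetical 4-nucleotide words chosen only to satisfy the condition.
words = ["AAAA", "ACCA", "CACA", "CCAA"]
print(is_minimally_cross_hybridizing(words))   # True
print(repertoire_size(len(words), 3))          # 64 tags of 12 nucleotides
print(4 ** 12)                                 # all possible 12-mers
```

With this example set, three-word tags give 64 sequences, far fewer than the 4^12 possible 12-mers, which illustrates why the number of available tags is generally much less than 4^n.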
 
Generally, this embodiment of my invention is carried out by the following steps: (a) attaching an oligonucleotide tag from a repertoire of tags to each polynucleotide of a population to form tag-polynucleotide conjugates such that substantially all different polynucleotides have different oligonucleotide tags attached; (b) labeling each tag according to the identity of the terminal nucleotides of the respective polynucleotides selectively amplified with a rolling primer; (c) cleaving the tags from the tag-polynucleotide conjugates; and (d) sorting the labeled tags onto a spatially addressable array of tag complements for detection. Preferably, the process is repeated a sufficient number of times to uniquely identify each polynucleotide being sequenced, or to reconstruct a larger polynucleotide from randomly generated fragments.

In summary, my invention provides a novel "primer walking" method for DNA sequencing. Moreover, my invention is readily automated for parallel application and is particularly useful in operations requiring the generation of massive amounts of sequence information, such as large-scale sequencing of genomic DNA fragments, mRNA and/or cDNA fingerprinting, and highly resolved measurements of gene expression patterns.

 
BRIEF DESCRIPTION OF THE DRAWINGS 
 
FIG. 1 diagrammatically illustrates the steps of a preferred embodiment employing RNA template selection.

FIG. 2a diagrammatically illustrates the steps of a preferred method of the invention employing simultaneous analysis of multiple tagged polynucleotides.

FIG. 2b illustrates the extension regions of rolling primers for subsequent steps that are selected based on the identity of the rolling primer extension region of the current step.

FIG. 3 diagrammatically illustrates an apparatus for detecting labeled tags on a spatially addressable array of tag complements.

FIGS. 4a and 4b illustrate how a sequencing template changes in successive steps of a preferred embodiment of the method.

FIGS. 5a-5c illustrate the effect of dNTP concentration on the selectivity of rolling primer extension on an RNA template by reverse transcriptase.

DEFINITIONS

"Complement" or "tag complement" as used herein in reference to oligonucleotide tags refers to an oligonucleotide to which an oligonucleotide tag specifically hybridizes to form a perfectly matched duplex or triplex. In embodiments where specific hybridization results in a triplex, the oligonucleotide tag may be selected to be either double stranded or single stranded. Thus, where triplexes are formed, the term "complement" is meant to encompass either a double stranded complement of a single stranded oligonucleotide tag or a single stranded complement of a double stranded oligonucleotide tag.

The term "oligonucleotide" as used herein includes linear oligomers of natural or modified monomers or linkages, including deoxyribonucleosides, ribonucleosides, α-anomeric forms thereof, peptide nucleic acids (PNAs), and the like, capable of specifically binding to a target polynucleotide by way of a regular pattern of monomer-to-monomer interactions, such as Watson-Crick type of base pairing, base stacking, Hoogsteen or reverse Hoogsteen types of base pairing, or the like. Usually monomers are linked by phosphodiester bonds or analogs thereof to form oligonucleotides ranging in size from a few monomeric units, e.g. 3-4, to several tens of monomeric units. Whenever an oligonucleotide is represented by a sequence of letters, such as "ATGCCTG," it will be understood that the nucleotides are in 5'→3' order from left to right and that "A" denotes deoxyadenosine, "C" denotes deoxycytidine, "G" denotes deoxyguanosine, and "T" denotes thymidine, unless otherwise noted. Analogs of phosphodiester linkages include phosphorothioate, phosphorodithioate, phosphoranilidate, phosphoramidate, and the like. It is clear to those skilled in the art when oligonucleotides having natural or non-natural nucleotides may be employed, e.g. where processing by enzymes is called for, usually oligonucleotides consisting of natural nucleotides are required.

"Extendable duplex" in reference to a primer annealing to a template means that in a duplex formed by such annealing the 3'-terminal nucleotide and the 3'-penultimate nucleotide of the primer form Watson-Crick basepairs with their adjacent nucleotides in the template and the duplex is sufficiently stable to permit extension of the primer along the template with a polymerase. The term contemplates that there may be multiple mismatches in the duplex formed between the primer and template.

"Perfectly matched" in reference to a duplex means that the poly- or oligonucleotide strands making up the duplex form a double stranded structure with one another such that every nucleotide in each strand undergoes Watson-Crick basepairing with a nucleotide in the other strand. The term also comprehends the pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine bases, and the like, that may be employed. In reference to a triplex, the term means that the triplex consists of a perfectly matched duplex and a third strand in which every nucleotide undergoes Hoogsteen or reverse Hoogsteen association with a basepair of the perfectly matched duplex. Conversely, a "mismatch" in a duplex between a tag and an oligonucleotide means that a pair or triplet of nucleotides in the duplex or triplex fails to undergo Watson-Crick and/or Hoogsteen and/or reverse Hoogsteen bonding.

As used herein, "nucleoside" and "nucleotide" include the natural nucleosides and nucleotides, including 2'-deoxy and 2'-hydroxyl forms, e.g. as described in Kornberg and Baker, DNA Replication, 2nd Ed. (Freeman, San Francisco, 1992). "Natural nucleotide" as used herein refers to the four common natural deoxynucleotides A, C, G, and T. "Analogs" in reference to nucleosides includes synthetic nucleosides having modified base moieties and/or modified sugar moieties, e.g. described by Scheit, Nucleotide Analogs (John Wiley, New York, 1980); Uhlman and Peyman, Chemical Reviews, 90: 543-584 (1990), or the like, with the only proviso that they are capable of specific hybridization. Such analogs include synthetic nucleosides designed to enhance binding properties, reduce complexity of probes, increase specificity, and the like.

As used herein, "amplicon" means the product of an amplification reaction. That is, it is a population of identical polynucleotides, usually double stranded, that are replicated from a few starting sequences. Preferably, amplicons are produced in a polymerase chain reaction (PCR).

As used herein, "complexity-reducing nucleotide" refers to a natural or non-natural nucleotide (i) that, when paired with either of more than one natural nucleotides, can form a duplex of substantially equivalent stability to that of the same duplex containing cognate natural nucleotide, i.e. the natural nucleotide it replaces, and (ii) that can be processed by enzymes substantially the same as its cognate natural nucleotide. Preferably, complexity-reducing nucleotides do not display degeneracy or ambiguity when processed by DNA polymerases. That is, when a complexity-reducing nucleotide is in a template that is being copied by a polymerase, the polymerase incorporates a unique nucleotide at the site of a complexity-reducing nucleotide. Likewise, when a complexity-reducing nucleotide triphosphate is a substrate for a DNA polymerase, it is incorporated only at the site of a single kind of nucleotide, i.e. one or another of its complements, but not both. Candidate complexity-reducing nucleotides are readily tested in straightforward hybridization assays, e.g. with melting temperature comparisons, and in incorporation assays in which test polymerizations are checked by conventional sequencing or by incorporation of radio-labeled complexity-reducing nucleotides, e.g. Bessman et al, Proc. Natl. Acad. Sci., 44: 633 (1958). Preferably, "substantially equivalent stability," as used herein means that the melting temperature of a test 13-mer duplex, as described in Kawase et al, Nucleic Acids Research, 14: 7727-7736 (1986), is within twenty percent of that of the same duplex containing a natural cognate nucleotide.
 
DETAILED DESCRIPTION OF THE INVENTION 
 
The invention provides a "primer walking" approach to DNA sequencing in which a special set of primers is used for template copying and mutation. The number of different primers in the set is minimized by a combined use of primers with complexity-reducing nucleotides and the process of template mutation. Within each cycle of copying and mutation, a nucleotide of the polynucleotide is identified and the sequencing template is shortened by one. The shortening of the template results from the mutation that, in effect, converts a nucleotide of target sequence to a nucleotide of primer binding site.

In an important aspect, the invention provides a method of sequencing large numbers of polynucleotides in parallel by using oligonucleotide tags to shuttle sequence information obtained in "bulk" or solution phase biochemical processes to discrete spatially addressable sites on a solid phase. Signals generated at the spatially addressable sites convey the sequence information carried by the oligonucleotide tag. As explained more fully below, sequencing is preferably carried out by alternating cycles of identifying nucleotides and shortening the target polynucleotides by use of rolling primers.

In one aspect, the oligonucleotide tags of the invention comprise a plurality of "words" or subunits selected from minimally cross-hybridizing sets of subunits. Subunits of such sets cannot form a duplex or triplex with the complement of another subunit of the same set with fewer than two mismatched nucleotides. Thus, the sequences of any two oligonucleotide tags of a repertoire that form duplexes will never be "closer" than differing by two nucleotides. In particular embodiments, sequences of any two oligonucleotide tags of a repertoire can be even "further" apart, e.g. by designing a minimally cross-hybridizing set such that subunits cannot form a duplex with the complement of another subunit of the same set with fewer than three mismatched nucleotides, and so on. Usually, oligonucleotide tags of the invention and their complements are oligomers of the natural nucleotides so that they may be conveniently processed by enzymes, such as ligases, polymerases, nucleases, terminal transferases, and the like.
 
In another aspect of the invention, tag complements consist of non-natural nucleotide monomers which encompass a range of compounds typically developed for antisense therapeutics that have enhanced binding strength and enhanced specificity for polynucleotide targets. As mentioned above under the definition of "oligonucleotide," the compounds may include a variety of different modifications of the natural nucleotides, e.g. modification of base moieties, sugar moieties, and/or monomer-to-monomer linkages. Such compounds also include oligonucleotide loops, oligonucleotide "clamps," and like structures that promote enhanced binding and specificity.

Rolling Primers 
 
Preferably, rolling primers are from 15 to 30 nucleotides in length and have the following form:
 
X_1 X_2 . . . X_k YY . . . YN
 
where the X_i's are nucleotides, preferably arranged in repetitive subunits; Y's are complexity-reducing nucleotides or their complements; and N is a terminal nucleotide of either A, C, G, or T, or a complexity-reducing nucleotide, such as deoxyinosine. The segments of X_i nucleotides, referred to herein as the "template positioning segments," are preferably arranged in repetitive subunits so that the primer is properly registered on the primer binding site with the terminal nucleotide juxtaposed with the first nucleotide of target polynucleotide. Preferably, the repeat subunit is long enough so that if the primer is out of register by one or more repeat subunits, it will be too unstable to remain annealed to the template. Preferably, the repeat subunit is from 4 to 8 nucleotides in length. As will become more apparent below, arranging the template positioning segment as a series of identical subunits reduces the overall number of primers required in a set of rolling primers. Preferably, the template positioning segments are selected from a group of no more than two nucleotides, at least one of which is a complement of a complexity-reducing nucleotide being employed. In preferred embodiments, the underlined X_k indicates the position at which the template is mutated by way of oligonucleotide-directed mutagenesis, e.g. a technique fully described in Current Protocols in Molecular Biology (John Wiley & Sons, New York, 1995). The segment YY . . . YN is referred to herein as the "extension region" of the primer, as the primer is extended from this end along the template. Preferably, extension is carried out by a polymerase so that YY . . . YN is in a 5'→3' orientation. However, the orientation could be 3'→5' with other methods of extension, e.g. by ligating oligonucleotide blocks as described in U.S. Pat. No. 5,114,839. An important feature of the invention is that extension only takes place when the terminal nucleotide, N, forms a Watson-Crick base pair with the adjacent nucleotide in the template. The extension region comprises the minimal number of nucleotides greater than two that can form a stable duplex with the template, even if there is a mismatch at the X_k position. That is, in the preferred embodiments, the duplex between the extension region and the template must be stable enough to carry out the oligonucleotide-directed mutagenesis. Preferably, the extension region comprises from 3 to 6 nucleotides, and most preferably, it comprises 4 nucleotides. Preferably, Y is selected from the group consisting of deoxyadenosine (A) and deoxyinosine (I).

The number of rolling primers required for a particular embodiment depends
on several factors, including the type of complexity-reducing nucleotides
employed, the length of the primer, the length of the extension region, and
the repeat subunit length of the template positioning segment. For example,
the following set of primers (SEQ ID NO: 1 through SEQ ID NO: 6) has a template
positioning segment 18 nucleotides in length made up of subunits of G's and
A's 6 nucleotides in length.
 
    ______________________________________
    Subgroup  Rolling Primer Sequence
    ______________________________________
    (1)       GGAAGAGGAAGAGGAAGAYYYN
    (2)        GAAGAGGAAGAGGAAGAGYYYN
    (3)         AAGAGGAAGAGGAAGAGGYYYN
    (4)          AGAGGAAGAGGAAGAGGAYYYN
    (5)           GAGGAAGAGGAAGAGGAAYYYN
    (6)            AGGAAGAGGAAGAGGAAGYYYN
    ______________________________________
If Y is A or I and N is A, C, I, or T, then the above set of rolling primers
includes 192 (=6.times.2.sup.3.times.4) primers. In particular, each "YYY"
represents any one of the following sequences: AAA, AAI, AII, AIA, IAI, IAA, IIA,
and III. As can be seen from the above example, a template positioning segment
is available for shifting the primer one nucleotide in the direction of
extension after any cycle. That is, if a primer from subgroup (5) were employed
in a cycle, the next primer employed would be selected from subgroup (6);
if a primer from subgroup (6) were employed in a cycle, the next primer employed
would be selected from subgroup (1), and so on. When PCR is used to copy and
amplify the template, the template is, in effect, shortened by one nucleotide
in each cycle.
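
The count of 192 primers and the cycling of subgroups can be checked with a
short enumeration. The following Python sketch is illustrative only: the
function and variable names are hypothetical, and the sequences are simply
those of the table above with Y drawn from A and I and N drawn from A, C, I,
and T.

```python
from itertools import product

# Template positioning segments for subgroups (1)-(6), copied from the table.
POSITIONING_SEGMENTS = {
    1: "GGAAGAGGAAGAGGAAGA",
    2: "GAAGAGGAAGAGGAAGAG",
    3: "AAGAGGAAGAGGAAGAGG",
    4: "AGAGGAAGAGGAAGAGGA",
    5: "GAGGAAGAGGAAGAGGAA",
    6: "AGGAAGAGGAAGAGGAAG",
}
Y_CHOICES = "AI"    # Y is deoxyadenosine (A) or deoxyinosine (I)
N_CHOICES = "ACIT"  # terminal nucleotide N


def rolling_primer_set():
    """Return a list of (subgroup, primer sequence) for the whole set."""
    primers = []
    for subgroup, segment in POSITIONING_SEGMENTS.items():
        for yyy in product(Y_CHOICES, repeat=3):
            for n in N_CHOICES:
                primers.append((subgroup, segment + "".join(yyy) + n))
    return primers


def next_subgroup(current):
    """Subgroup used in the following cycle: (1)->(2)-> ... ->(6)->(1)."""
    return current % len(POSITIONING_SEGMENTS) + 1


primers = rolling_primer_set()
print(len(primers))                        # 6 x 2^3 x 4 = 192
print(next_subgroup(5), next_subgroup(6))  # 6 1
```

Each positioning segment is the previous one rotated by a single nucleotide,
which is what lets the primer advance one base per cycle.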
 
Alternatively, the binding strength of the extension region can be improved
by substituting G for I and diaminopurine (D) for A in all positions, except
those immediately adjacent to the terminal nucleotide. That is, an alternative
set of "YYY" sequences includes DDA, DDI, DGI, DGA, GDI, GDA, GGI, and GGA.
In another embodiment, the template positioning segment may contain
complexity-reducing analogs for mutating the rolling primer binding site as sequencing
progresses so that fewer such segments are required. For example, the following
template positioning segments may be employed with an extension region that
converts all template nucleotides to C's. ##STR1## Primers p1 and p2 are used
in an alternating fashion. Both primer p1 and p2 convert C's to A's at position
1 and C's or A's to C's at position 3, which maintains the two segments of
very stable GC basepairs at either end of the primers when they anneal. Primer
p2 contains an additional deoxyinosine within an interior segment of repeating
GT dimers. Note that the respective repeat units are exactly one nucleotide
out of phase. The deoxyinosine at position 2 converts the primer binding site
to one that forms a perfectly matched duplex with primer p1 with a shift of
one nucleotide in the direction of extension. Thus, by alternating the use
of primer p1 and p2 one may cause the primer to advance by one nucleotide
in each cycle.
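
The alternative "YYY" set listed above follows mechanically from the stated
substitution rule. A minimal Python sketch, assuming that "adjacent to the
terminal nucleotide" means the last Y position (the function name is
hypothetical):

```python
from itertools import product

SUBSTITUTE = {"A": "D", "I": "G"}  # D = diaminopurine


def strengthened(yyy):
    """Replace A->D and I->G in every Y except the last one, which abuts N."""
    return "".join(SUBSTITUTE[y] for y in yyy[:-1]) + yyy[-1]


original = ["".join(p) for p in product("AI", repeat=3)]
alternative = sorted(strengthened(y) for y in original)
print(alternative)
```

Under that assumption the eight results are exactly the sequences DDA, DDI,
DGI, DGA, GDI, GDA, GGI, and GGA given in the text.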
 
Sequencing with Rolling Primers 
 
Prior to sequencing, a target polynucleotide is treated so that one or more
kinds of nucleotide are substituted with their cognate complexity-reducing
nucleotides. In a preferred embodiment, this is conveniently accomplished
by replicating the target polynucleotide in a PCR wherein dGTP is replaced
with dITP. A template for sequencing is then prepared by joining the target
polynucleotide to a primer binding site. Typically, this is accomplished by
inserting the target polynucleotide into a vector which carries the primer
binding site. Preferably, the primer binding site is in the 3' direction
relative to the target polynucleotide so that primer extensions can be carried
out with a DNA polymerase. Such insertion is conveniently carried out using
a blunt-end-cutting restriction endonuclease, such as Stu I or Ecl 136 II,
if the rolling primers described above are employed. These enzymes leave a
three-base sequence adjacent to the beginning of the target polynucleotide
that is complementary to the primers described above. Preferably, a primer,
referred to herein as the "T" primer, is located at the other end of the target
polynucleotide so that it can be amplified by PCR. For example, sequencing
can be initiated on such a template (SEQ ID NO: 9) in four separate reactions
as shown below, assuming the use of the primers described above.
    __________________________________________________________________________
    Reaction 1

              GGAAGAGGAAGAGGAAGAAIIA.fwdarw.
        . . . CCTTCTCCTTCTCCTTCTTCCNNNN . . . NNNBBBB . . . BB . . .

    Reaction 2

              GGAAGAGGAAGAGGAAGAAIIC.fwdarw.
        . . . CCTTCTCCTTCTCCTTCTTCCNNNN . . . NNNBBBB . . . BB . . .

    Reaction 3

              GGAAGAGGAAGAGGAAGAAIII.fwdarw.
        . . . CCTTCTCCTTCTCCTTCTTCCNNNN . . . NNNBBBB . . . BB . . .

    Reaction 4

              GGAAGAGGAAGAGGAAGAAIIT.fwdarw.
        . . . CCTTCTCCTTCTCCTTCTTCCNNNN . . . NNNBBBB . . . BB . . .

    __________________________________________________________________________
where "NNNN . . . NNN" represents the target polynucleotide and "BBBB . .
. BB" represents the complement of a T primer binding site for amplifying
the sequences by PCR. The underlined sequences indicate the extension regions
of the rolling primers. The template positioning segment of the primers was
arbitrarily chosen to correspond to a primer from subgroup (1) described above.
If it is assumed, to illustrate the method, that the sequence of the
polynucleotide adjacent to the rolling primer binding site is "TAIC," then only Reaction
1 will result in the formation of an amplicon, and the first nucleotide of
the polynucleotide is identified as T. Preferably, prior to amplification,
the primer is extended with a high fidelity DNA polymerase, such as Sequenase,
in the presence of dATP, dCTP, dITP, and dTTP in the preferred embodiments.
It should be understood that selective extension may also be carried out in
a single vessel, for example, if labeled primers are employed and the extension
products are separated from the primers that fail to extend. The important
feature is that only primers whose terminal nucleotide forms a correct
Watson-Crick basepair with the template are extended. Preferably, after extension,
any single stranded DNA in the reaction mixture is digested with a single
stranded nuclease, such as Mung bean nuclease. After such extension and
digestion, the remaining double stranded DNA is then amplified, again in the presence
of dATP, dCTP, dITP, and dTTP in the preferred embodiments, to produce an
amplicon. Preferably, this amplification is accomplished by 5-10 cycles of
PCR so that there is little or no likelihood of anomalous amplification products
being produced.
 
Samples of the amplicon from Reaction 1 are removed and aliquoted into four
new vessels containing the following primers from subgroup (2):
    __________________________________________________________________________
    Reaction 5

               GAAGAGGAAGAGGAAGAGIIAA.fwdarw.
        . . . CCTTCTCCTTCTCCTTCTTCCTNNN . . . NNNBBBB . . . BB

    Reaction 6

               GAAGAGGAAGAGGAAGAGIIAC.fwdarw.
        . . . CCTTCTCCTTCTCCTTCTTCCTNNN . . . NNNBBBB . . . BB

    Reaction 7

               GAAGAGGAAGAGGAAGAGIIAI.fwdarw.
        . . . CCTTCTCCTTCTCCTTCTTCCTNNN . . . NNNBBBB . . . BB

    Reaction 8

               GAAGAGGAAGAGGAAGAGIIAT.fwdarw.
        . . . CCTTCTCCTTCTCCTTCTTCCTNNN . . . NNNBBBB . . . BB

    __________________________________________________________________________
Since the first nucleotide of the target polynucleotide was determined in
the previous cycle, one selects primers from subgroup (2) whose extension
regions have the form "IIAN," as shown. This creates a mismatch at the
underlined T in the lower strands, which is mutated to C in any amplicon produced
by oligonucleotide-directed mutagenesis. That is, the primer is the
oligonucleotide directing the mutation of the site in the amplicon. Thus, the "T" is
converted into a "C" in the amplicons. Since the second nucleotide of the
target is A, both Reactions 7 and 8 lead to the production of amplicons. Either
amplicon may be sampled for the next cycle since only a single target
polynucleotide is presently being considered. As explained more fully below, an
additional "pooling" step must be carried out when multiple polynucleotides are
simultaneously sequenced.
 
As before, samples of one of the two amplicons are distributed into four new
vessels containing primers from subgroup (3) with an extension region having
the form "IAIN".
 
    __________________________________________________________________________
    Reaction 9

                AAGAGGAAGAGGAAGAGGIAIA.fwdarw.
        . . . CCTTCTCCTTCTCCTTCTCCCTANN . . . NNNBBBB . . . BB

    Reaction 10

                AAGAGGAAGAGGAAGAGGIAIC.fwdarw.
        . . . CCTTCTCCTTCTCCTTCTCCCTANN . . . NNNBBBB . . . BB

    Reaction 11

                AAGAGGAAGAGGAAGAGGIAII.fwdarw.
        . . . CCTTCTCCTTCTCCTTCTCCCTANN . . . NNNBBBB . . . BB

    Reaction 12

                AAGAGGAAGAGGAAGAGGIAIT.fwdarw.
        . . . CCTTCTCCTTCTCCTTCTCCCTANN . . . NNNBBBB . . . BB
    __________________________________________________________________________
 
Both Reactions 9 and 10 will produce amplicons; thus, the third base is
identified as an "I." For the next cycle, this then leads to the selection of primers
from subgroup (4) having an extension region with the form "AIAN," and the
process is continued.
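
The three cycles walked through above can be traced with a short simulation.
The Python sketch below is only an illustration of the bookkeeping: the
function names are hypothetical, the pairing rows for template T, A, and I are
taken from the worked example (Reaction 1 alone; Reactions 7 and 8; Reactions
9 and 10), and the row for template C is an assumption added here so that the
table is complete.

```python
# Primer terminal nucleotides (N) that extend opposite each template base.
# Rows T, A and I follow the worked example; row C is an assumption.
EXTENDS_OPPOSITE = {
    "T": {"A"},
    "A": {"I", "T"},
    "I": {"A", "C"},
    "C": {"I"},  # assumed for illustration, not stated in the text
}
# Y nucleotide (A or I) that joins the extension region once a base is known.
# T->A, A->I and I->A reproduce AII -> IIA -> IAI -> AIA; C->I is assumed.
Y_OPPOSITE = {"T": "A", "A": "I", "I": "A", "C": "I"}


def identify(amplifying):
    """Call the template base from the set of terminal N's that amplified."""
    for base, terminals in EXTENDS_OPPOSITE.items():
        if terminals == set(amplifying):
            return base
    raise ValueError("pattern of amplicons does not match any base")


def walk(target, yyy="AII", subgroup=1, cycles=3):
    """Return per-cycle records and the extension region for the next cycle."""
    records = []
    for base in target[:cycles]:
        amplifying = {n for n in "ACIT" if n in EXTENDS_OPPOSITE[base]}
        called = identify(amplifying)
        records.append((subgroup, yyy + "N", sorted(amplifying), called))
        yyy = yyy[1:] + Y_OPPOSITE[called]  # roll forward by one nucleotide
        subgroup = subgroup % 6 + 1         # (6) is followed by (1)
    return records, yyy


records, next_yyy = walk("TAIC")
for record in records:
    print(record)
print(next_yyy + "N")
```

With the target "TAIC" the sketch calls T, A, and I using extension regions
AIIN, IIAN, and IAIN, and selects AIAN for the fourth cycle, as in the text.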
 
Sequencing with RNA Template Selection 
 
A significant increase in selectivity can be achieved by using an RNA template
and a reverse transcriptase to extend rolling primers. The gain in selectivity
comes about in part from the facile removal of undesired DNA by nuclease
digestion after RNA templates are synthesized. The general scheme of the embodiment
is illustrated in FIG. 1. Double stranded DNA (dsDNA) template (100) to be
sequenced is ligated between an RNA polymerase promoter and a rolling primer
binding site, e.g. by cloning into an appropriate vector containing such
elements. Using standard protocols, the vector is linearized downstream of dsDNA
template (100) and rolling primer binding site, and RNA copies (120) of dsDNA
template (100) and binding site are synthesized (110) using an RNA polymerase,
such as T7 RNA polymerase. After synthesis, the reaction mixture is treated
with a DNase to remove extraneous DNA and the RNA copies are purified. To
the purified RNA the appropriate rolling primers, referred to herein as the
"first primers," are added (130) and those forming extendable duplexes with
the RNA template are extended with a reverse transcriptase. After such
extension, the RNA is removed by hydrolysis, e.g. by heating and/or action by RNase
H activity of the reverse transcriptase, and the resulting ssDNA (140) is
amplified, preferably by PCR. Preferably, one of the primers in the PCR,
referred to herein as a "second primer," contains the promoter sequence for the next
round of transcription; and in further preference, the other primer binds
to the template positioning segment of the rolling primer binding site.

A preferred set of rolling primers, i.e. first primers, for this embodiment
has the following form:
 
X.sub.1 X.sub.2. . . X.sub.k IRZNN 
 
where X.sub.1 X.sub.2. . . X.sub.k is a template positioning segment as
described above, I is deoxyinosine, R is selected from the group consisting of G
and diaminopurine ("D"), Z is selected from the group consisting of
8-oxo-2-deoxyadenosine ("oxo-A") and 8-oxo-2-deoxyguanosine ("oxo-G"), N is selected
from the group consisting of A, C, G, and T. In this embodiment, any nucleotide
of the template is converted to either C or T by pairing with Z in the extension
and amplification steps. This is because whenever a primer is selected with
oxo-A at the Z position it may pair with either G or T, but when used as a
template it only allows incorporation of T. Likewise, whenever a primer is
selected with oxo-G at the Z position it may pair with either A or C, but
when used as a template it only allows incorporation of C. R merely serves
as a "place saver" which provides a stable basepair with either T or C
(diaminopurine being preferred over T for the greater stability of the TD basepair).
Finally, I converts T's to C's. Clearly, G could be also used at this position.
When the template positioning segments of primers p1 and p2, described above,
are used, then the total number of rolling primers required for sequencing
is 128 (=2.times.2.times.2.times.16).
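
The figure of 128 and the conversion performed by Z can be made concrete with
a brief enumeration. In this Python sketch the names are hypothetical, "p1" and
"p2" are placeholders for the two positioning segments (whose sequences are
given only in the ##STR1## structure), and the pairing sets are those stated
in the paragraph above.

```python
from itertools import product

SEGMENTS = ("p1", "p2")         # placeholders for the two positioning segments
R_CHOICES = ("G", "D")          # D = diaminopurine
Z_CHOICES = ("oxo-A", "oxo-G")
NN_CHOICES = ["".join(nn) for nn in product("ACGT", repeat=2)]

# Template bases each Z can pair with, and the base Z directs when copied.
Z_PAIRS_WITH = {"oxo-A": {"G", "T"}, "oxo-G": {"A", "C"}}
Z_DIRECTS = {"oxo-A": "T", "oxo-G": "C"}


def first_primers():
    """List every (segment, I, R, Z, NN) combination of the form IRZNN."""
    return [
        (seg, "I", r, z, nn)
        for seg, r, z, nn in product(SEGMENTS, R_CHOICES, Z_CHOICES, NN_CHOICES)
    ]


def converted(template_base):
    """Base found at a template position after pairing with Z and copying."""
    for z, partners in Z_PAIRS_WITH.items():
        if template_base in partners:
            return Z_DIRECTS[z]
    raise ValueError("unexpected template base: " + template_base)


print(len(first_primers()))  # 2 x 2 x 2 x 16 = 128
print({base: converted(base) for base in "ACGT"})
```

Every template base maps to either C or T, which is the reduction in
complexity that keeps the primer set small.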
 
Rolling primers of the above form are readily synthesized on an automated
DNA synthesizer using conventional chemistries and phosphoramidite monomers
for the various nucleotide analogs, which are available commercially, e.g.
from Glen Research (Sterling, Va.).
 
Constructing Oligonucleotide Tags from Minimally Cross-Hybridizing Sets of Subunits
 
As mentioned above, an important embodiment of the invention includes
simultaneous sequencing of multiple target polynucleotides by way of oligonucleotide
tags of the type disclosed by Brenner, in U.S. Pat. Nos. 5,604,097; 5,635,400;
and 5,654,413; and in International application PCT/US96/09513, which references
are incorporated by reference.
 
Oligonucleotide tags and their complements used in the present method may
range in length from 12 to 60 nucleotides or basepairs; more preferably, they
range in length from 18 to 40 nucleotides or basepairs; and most preferably,
they range in length from 25 to 40 nucleotides or basepairs. When constructed
from antisense monomers, oligonucleotide tags and their complements preferably
range in length from 10 to 40 monomers; and more preferably, they range in
length from 12 to 30 monomers. Most preferably, oligonucleotide tags are single
stranded and specific hybridization occurs via Watson-Crick pairing with a
tag complement.
 
After chemical synthesis, libraries of tags are conveniently maintained as
PCR amplicons that include primer binding regions for amplification and
restriction endonuclease recognition sites to facilitate excision and attachment
to polynucleotides. Preferably, the composition of the primers is selected
so that the right and left primers have approximately the same melting and
annealing temperatures. In some embodiments, either one or both of the primers
and other flanking sequences of the tags consist of three or fewer of
the four natural nucleotides in order to allow the use of a "stripping" and
exchange reaction to render a construct containing a tag single stranded in
a selected region. Such reactions usually employ the 3'.fwdarw.5' exonuclease
activity of a DNA polymerase, such as T4 DNA polymerase, or like enzyme, and
are described in Sambrook et al, Molecular Cloning, Second Edition (Cold Spring
Harbor Laboratory, New York, 1989).
 
As mentioned above, an important use of the tags is for "shuttling" information
from a target polynucleotide to a solid phase support containing tag complements.
Preferably, this step is carried out by excising the tag-containing segment
of a double stranded template, e.g. with one or more restriction endonucleases,
separating it from the reaction mixture, denaturing and labelling the excised
tag, and applying it to the solid phase support for detection. This step can
be carried out in a variety of ways using standard molecular biological
techniques, one of which is exemplified below. Likewise, the excised tags can be
labeled in a variety of ways, including the direct or indirect attachment
of radioactive moieties, fluorescent moieties, colorimetric moieties,
chemiluminescent markers, and the like. Many comprehensive reviews of methodologies
for labeling DNA and constructing DNA probes provide guidance applicable to
labelling tags of the present invention. Such reviews include Kricka, editor,
Nonisotopic DNA Probe Techniques (Academic Press, San Diego, 1992); Haugland,
Handbook of Fluorescent Probes and Research Chemicals (Molecular Probes, Inc.,
Eugene, 1992); Keller and Manak, DNA Probes, 2nd Edition (Stockton Press,
New York, 1993); and Eckstein, editor, Oligonucleotides and Analogues: A
Practical Approach (IRL Press, Oxford, 1991); Kessler, editor, Nonradioactive
Labeling and Detection of Biomolecules (Springer-Verlag, Berlin, 1992); and the
like.
 
Preferably, the tags are labeled with one or more fluorescent dyes, e.g. as
disclosed by Menchen et al, U.S. Pat. No. 5,188,934; and Begot et al
International application PCT/US90/05565.
 
Solid Phase Supports for Tag Complements
 
Preferably, detection of sequence information takes place at spatially discrete
locations where tags hybridize to their complements. It is important that
the detection of signals from successive cycles of tag transfer be associated
with the same tag complement location throughout the sequencing operation.
Otherwise, the sequence of signals will not be a faithful representation of
the sequence of the polynucleotide corresponding to the tag and tag complement.
This requirement is met by providing a spatially addressable array of tag
complements. As used herein "spatially addressable" means that the location
of a particular tag complement can be recorded and tracked throughout a
sequencing operation. Knowledge of the identity of a tag complement is not crucial;
it is only important that its location be identifiable from cycle to cycle
of tag transfers. Preferably, the regions containing tag complements are
discrete, i.e. non-overlapping with regions containing different tag complements,
so that signal detection is more convenient. Generally, spatially addressable
arrays are constructed by attaching or synthesizing tag complements on solid
phase supports.
 
Solid phase supports for use with the invention may have a wide variety of
forms, including microparticles, beads, and membranes, slides, plates,
micromachined chips, and the like. Likewise, solid phase supports of the invention
may comprise a wide variety of compositions, including glass, plastic, silicon,
alkanethiolate-derivatized gold, cellulose, low cross-linked and high
cross-linked polystyrene, silica gel, polyamide, and the like. Preferably, either
a population of discrete particles is employed such that each has a uniform
coating, or population, of complementary sequences of the same tag (and no
other), or a single or a few supports are employed with spatially discrete
regions each containing a uniform coating, or population, of complementary
sequences to the same tag (and no other). In the latter embodiment, the area
of the regions may vary according to particular applications; usually, the
regions range in area from several .mu.m.sup.2, e.g. 3-5, to several hundred
.mu.m.sup.2, e.g. 100-500.
 
Tag complements may be used with the solid phase support that they are
synthesized on, or they may be separately synthesized and attached to a solid phase
support for use, e.g. as disclosed by Lund et al, Nucleic Acids Research,
16: 10861-10880 (1988); Albretsen et al, Anal. Biochem., 189: 40-50 (1990);
Wolf et al, Nucleic Acids Research, 15: 2911-2926 (1987); or Ghosh et al,
Nucleic Acids Research, 15: 5353-5372 (1987). Preferably, tag complements
are synthesized on and used with the same solid phase support, which may
comprise a variety of forms and include a variety of linking moieties. Such supports
may comprise microparticles or arrays, or matrices, of regions where uniform
populations of tag complements are synthesized. A wide variety of microparticle
supports may be used with the invention, including microparticles made of
controlled pore glass (CPG), highly cross-linked polystyrene, acrylic
copolymers, cellulose, nylon, dextran, latex, polyacrolein, and the like, disclosed
in the following exemplary references: Meth. Enzymol., Section A, pages 11-147,
vol. 44 (Academic Press, New York, 1976); U.S. Pat. Nos. 4,678,814; 4,413,070;
and 4,046,720; and Pon, Chapter 19, in Agrawal, editor, Methods in Molecular
Biology, Vol. 20, (Humana Press, Totowa, N.J., 1993). Microparticle supports
further include commercially available nucleoside-derivatized CPG and
polystyrene beads (e.g. available from Applied Biosystems, Foster City, Calif.);
derivatized magnetic beads; polystyrene grafted with polyethylene glycol (e.g.,
TentaGel.TM., Rapp Polymere, Tubingen, Germany); and the like. Selection of
the support characteristics, such as material, porosity, size, shape, and
the like, and the type of linking moiety employed depends on the conditions
under which the tags are used. Exemplary linking moieties are disclosed in
Pon et al, Biotechniques, 6: 768-775 (1988); Webb, U.S. Pat. No. 4,659,774;
Barany et al, International patent application PCT/US91/06103; Brown et al,
J. Chem. Soc. Commun., 1989: 891-893; Damha et al, Nucleic Acids Research,
18: 3813-3821 (1990); Beattie et al, Clinical Chemistry, 39: 719-722 (1993);
Maskos and Southern, Nucleic Acids Research, 20: 1679-1684 (1992); and the
like. As described more fully below, when tag complements are attached or
synthesized on microparticles, populations of microparticles are fixed to
a solid phase support to form a spatially addressable array.

As mentioned above, tag complements may also be synthesized on a single (or
a few) solid phase support to form an array of regions uniformly coated with
tag complements. That is, within each region in such an array the same tag
complement is synthesized. Techniques for synthesizing such arrays are disclosed
in McGall et al, International application PCT/US93/03767; Pease et al, Proc.
Natl. Acad. Sci., 91: 5022-5026 (1994); Southern and Maskos, International
application PCT/GB89/01114; Maskos and Southern (cited above); Southern et
al, Genomics, 13: 1008-1017 (1992); and Maskos and Southern, Nucleic Acids
Research, 21: 4663-4669 (1993).
 
Preferably, the invention is implemented with microparticles or beads uniformly
coated with complements of the same tag sequence. Microparticle supports and
methods of covalently or noncovalently linking oligonucleotides to their
surfaces are well known, as exemplified by the following references: Beaucage and
Iyer (cited above); Gait, editor, Oligonucleotide Synthesis: A Practical
Approach (IRL Press, Oxford, 1984); and the references cited above. Generally,
the size and shape of a microparticle is not critical; however, microparticles
in the size range of a few, e.g. 1-2, to several hundred, e.g. 200-1000 .mu.m
diameter are preferable, as they facilitate the construction and manipulation
of large repertoires of oligonucleotide tags with minimal reagent and sample
usage.
 
Preferably, commercially available controlled-pore glass (CPG) or polystyrene
supports are employed as solid phase supports in the invention. Such supports
come available with base-labile linkers and initial nucleosides attached,
e.g. Applied Biosystems (Foster City, Calif.). Preferably, microparticles
having pore size between 500 and 1000 angstroms are employed.

In other preferred applications, non-porous microparticles are employed for
their optical properties, which may be advantageously used when tracking large
numbers of microparticles on planar supports, such as a microscope slide.
Particularly preferred non-porous microparticles are the glycidyl methacrylate
(GMA) beads available from Bangs Laboratories (Carmel, Ind.). Such
microparticles are useful in a variety of sizes and derivatized with a variety of linkage
groups for synthesizing tags or tag complements. Preferably, for massively
parallel manipulations of tagged microparticles, 5 .mu.m diameter GMA beads are
employed.
 
Attaching Tags to Target Polynucleotides
 
An important aspect of the invention is tagging of polynucleotides of a
population, e.g. a cDNA library, such that the same tag is not attached to different
polynucleotides. This latter condition can be essentially met by ligating
a repertoire of tags to a population of polynucleotides followed by cloning
and sampling of the ligated sequences. A repertoire of oligonucleotide tags
can be ligated to a population of polynucleotides in a number of ways, such
as through direct enzymatic ligation, amplification, e.g. via PCR, using primers
containing the tag sequences, and the like. The initial ligating step produces
a very large population of tag-polynucleotide conjugates such that a single
tag is generally attached to many different polynucleotides. However, by taking
a sufficiently small sample of the conjugates, the probability of obtaining
"doubles," i.e. the same tag on two different polynucleotides, can be made
negligible. (Note that it is also possible to obtain different tags with the
same polynucleotide in a sample. This case simply leads to a polynucleotide
being processed, e.g. sequenced, twice. Also, where patterns of gene expression
are being analyzed, multiple tags with the same polynucleotide will be a common,
and expected, occurrence because of differences in mRNA abundances). As explained
more fully below, the probability of obtaining a double in a sample can be
estimated by a Poisson distribution since the number of conjugates in a sample
will be large, e.g. on the order of thousands or more, and the probability
of selecting a particular tag will be small because the tag repertoire is
large, e.g. on the order of tens of thousands or more. Preferably, the size
of the tag repertoire is about 100 times the number of distinct species of
polynucleotide in the population being analyzed. Or, in other words, the
complexity of the tag repertoire is preferably about 100 times that of the population
of polynucleotides being analyzed. Generally, the larger the sample the greater
the probability of obtaining a double. Thus, a design trade-off exists between
selecting a large sample of tag-polynucleotide conjugates (which, for example,
ensures adequate coverage of a target polynucleotide in a shotgun sequencing
operation) and selecting a small sample, which ensures that a minimal number
of doubles will be present. In most embodiments, the presence of doubles merely
adds an additional source of noise or, in the case of sequencing, a minor
complication in scanning and signal processing, as regions of tag complements
simultaneously giving multiple signals can simply be ignored. As used herein,
the term "substantially all" in reference to attaching tags to polynucleotides
is meant to reflect the statistical nature of the sampling procedure employed
to obtain a population of tag-molecule conjugates essentially free of doubles.
The meaning of substantially all in terms of actual percentages of tag-molecule
conjugates depends on how the tags are being employed. Preferably, for nucleic
acid sequencing, substantially all means that at least eighty percent of the
tags have unique polynucleotides attached. More preferably, it means that
at least ninety percent of the tags have unique polynucleotides attached.
Still more preferably, it means that at least ninety-five percent of the tags
have unique polynucleotides attached. And, most preferably, it means that
at least ninety-nine percent of the tags have unique polynucleotides attached.

In a preferred embodiment, tags, polynucleotides to be sequenced, primer binding
sites, and other elements for manipulating the sequences are inserted into
a cloning vector to establish a base library that may be sampled and amplified
as needed. For example, such a construct could have the following form: ##STR2##
where the "T" or tag primer binding site and the "S" or sequencing primer
binding site are used with the appropriate primers to amplify the insert of
the cloning vector to form PCR amplicons for subsequent analysis. The cleavage
sites are used to excise the tag from the amplicons, after steps of PCR
amplification and identification of a terminal nucleotide. As noted below, after
amplifications, it is important that the target polynucleotides be protected
from undesired cleavage by the nucleases employed in the identification step.
Preferably, this is accomplished by methylation and careful selection of
restriction endonucleases.
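
The trade-off between sample size and doubles can be illustrated with the
Poisson estimate mentioned above. The Python sketch below is an assumption-laden
illustration rather than the calculation given later in the specification: it
assumes every tag in the repertoire is equally likely to be drawn, treats the
number of sampled conjugates carrying a given tag as Poisson with mean
(sample size)/(repertoire size), and uses a hypothetical repertoire of 100,000
tags. The function name is likewise hypothetical.

```python
import math


def fraction_unique(sample_size, repertoire_size):
    """Fraction of tags present in the sample that occur exactly once.

    With k ~ Poisson(lam), P(k = 1) / P(k >= 1) = lam / (exp(lam) - 1).
    """
    lam = sample_size / repertoire_size
    if lam == 0:
        return 1.0
    return lam / math.expm1(lam)


REPERTOIRE = 100_000  # hypothetical repertoire size, for illustration only
for sample in (1_000, 10_000, 50_000):
    share = fraction_unique(sample, REPERTOIRE)
    print(f"sample of {sample:>6}: {share:.3f} of sampled tags are unique")
```

Under these assumptions a sample equal to about one percent of the repertoire
leaves roughly 99.5 percent of the sampled tags unique, and the fraction falls
steadily as the sample grows, which is the design trade-off described above.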
 
Sequencing Tagged Polynucleotides 
 
A preferred embodiment for simultaneously sequencing a population of tagged
polynucleotides is diagrammed in FIG. 2a. Preferably, the population of tagged
polynucleotides is amplified from a vector as described above in the presence
of dATP, dCTP, dITP, and dTTP to give a population of double stranded DNAs
(10) containing T primer binding site (12), cleavage site (14) (which, as shown
below, is optional), tag (16), cleavage site (18), target polynucleotides (20),
and rolling primer binding site (22).
 
In the initial population, rolling primer binding site (22) contains a known
complement to the extension region (24), for example, AGG as shown in the
example below. Samples of the initial population are preferably transferred
(26) to four separate vessels (28-34) where they are combined with the rolling
primers of subgroup (1), described above, having extension regions -AIIA,
-AIIC, -AIIG, and -AIIT. (The four rolling primers could be placed in a single
vessel and allowed to compete against one another for extension; however,
errors are less likely if the primers are used separately). The rolling primers
of subgroups (1)-(6) are used here to exemplify the invention. Clearly, many
alternative forms of the rolling primers could be used. In subsequent cycles,
as described more fully below, the transferring step (26) becomes more complex
because more than four vessels, i.e. up to 32 (=4×8) in the embodiment
exemplified here, are required for the extension reactions. After the double
stranded DNAs (10) are combined with the appropriate rolling primers the
following steps (36) are taken: the double stranded DNAs are denatured, e.g. by
heating; the temperature is lowered to permit the rolling primers to anneal
to the rolling primer binding sites; the primers are extended with a high
fidelity DNA polymerase, such as Sequenase, in the presence of dATP, dCTP,
dITP, and dTTP; preferably, any remaining single stranded DNA is digested,
e.g. with a single stranded nuclease, such as Mung bean nuclease, to reduce
the likelihood of interference from the left over single stranded DNA in the
subsequent amplification; T primer is added; and the double stranded extension
products are amplified, preferably with 5-10 cycles of PCR, to generate amplicon
A (38), amplicon C (40), amplicon G (42), and amplicon T (44), respectively.
As an alternative, and/or supplement, to the step of digesting with a single
stranded nuclease, the double stranded DNA (10) can be treated with a methylase
(or equivalently, amplified in the presence of 5-methylcytosine triphosphate).
After such treatment, any double stranded DNA that is not the product of at
least two extension reactions will be hemi-methylated or fully methylated
at cleavage site (18). Thus, a nuclease that recognizes site (18) on those
sequences will not cleave it. If a sample of amplicon is taken by way of a
capture agent, such as biotin, on the T primer, tags may be released for
analysis by cleaving with the nuclease for cleavage site (18). However, those
sites that are methylated or hemi-methylated will not be cleaved to give a
spurious signal upon application of the tags to a solid phase support (48).
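
The methylation-based protection described above can be pictured with a small bookkeeping model. The following is only an illustrative sketch: the `Duplex` class, the `extend` helper, and the rule that the nuclease cuts only fully unmethylated duplexes are assumptions introduced here for illustration, not elements of the disclosed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duplex:
    """Methylation state of the two strands at cleavage site (18)."""
    template_methylated: bool
    copy_methylated: bool

    def cleavable(self) -> bool:
        # Assumption: the nuclease cuts only when neither strand is methylated.
        return not (self.template_methylated or self.copy_methylated)


def extend(template_methylated: bool) -> Duplex:
    """Copy a template strand; the new strand is assumed to be unmethylated."""
    return Duplex(template_methylated=template_methylated, copy_methylated=False)


start = Duplex(True, True)                 # methylase-treated DNA (10)
first = extend(start.template_methylated)  # one extension: hemi-methylated
second = extend(first.copy_methylated)     # copy of the new strand: unmethylated

for name, duplex in [("start", start), ("first", first), ("second", second)]:
    print(name, "cleavable" if duplex.cleavable() else "protected")
```

Under these assumptions only the product of two successive extensions loses protection, which is the behavior the paragraph relies on to avoid spurious tag signals.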
After a sample is taken from each amplicon, tags are excised by way of cleavage
sites (14) and/or (18) and labeled (46), as described more fully below. The
labeled tags are then either applied separately to their tag complements on
solid phase support (48) or pooled and applied to the support, depending on
the labeling system employed, the complexity of the tag mixture, and like
factors. Samples of the amplicons are also taken for further processing (50-56)
in accordance with the method of the invention. Depending on the identity
of the most recently determined nucleotide and the identity of the current
extension region, a sample either may be separately aliquotted into vessels
with rolling primers for the next cycle, or a sample may be combined with
one or more other samples and aliquotted into vessels with rolling primers
for the next cycle. Unlike the single polynucleotide case, when a population
of polynucleotides is sequenced every vessel will almost always contain an
amplicon at the conclusion of the amplification reaction. Thus, after extension,
digestion, and amplification, the amplicons in the vessels 28, 30, 32, and
34 correspond to target polynucleotides having a T, G (or I), C, and A at
their initial positions (or more generally, at the nucleotide position adjacent
to the rolling primer binding site), respectively. With this information,
and a knowledge of the sequence of the extension region of the current amplicon,
the rolling primers of the next cycle can be selected. As in the single
polynucleotide case, in each successive cycle a rolling primer is selected that
shifts, or advances, the rolling primer binding site one or more nucleotides
along the template in the direction of rolling primer extension. Preferably,
a single nucleotide shift takes place in each cycle. As described above, the
rolling primers selected for the extension step also serve to generate a
mutation in the template upon amplification. The mutation changes the
interior-most nucleotide of the extension region to one that is complementary
to the template positioning segment of the rolling primer of the current cycle.
In the tables below, the pattern of primer selection and amplicon pooling in
cycles 2 through 4 of a sequencing operation is illustrated for the above
embodiment. In the first cycle, the original template is distributed to four
vessels for denaturation and extension. ##STR3## The nucleotide to the right
of the line between nucleotides in the second column is the terminal nucleotide
of the rolling primer used to produce the amplicon. Generally, the algorithm
for determining the rolling primers of the next cycle is as follows: (i) drop
the nucleotide distal to the terminal nucleotide in the extension region of
the current rolling primer (the leftmost "I" of the "IIA" sequences in the
second column), (ii) determine which nucleotide, I or A, is complementary to
the nucleotide paired with the terminal nucleotide (i.e. for the above example:
"A" for amplicon A, "A" for amplicon C--since A will pair with I as well as C,
"I" for amplicon G--since I will pair with C, and "I" for amplicon T--since
I will also pair with A), (iii) insert the determined nucleotide, I or A, to
the left of the terminal nucleotide. For this embodiment, the general pattern
of transitions between extension region sequences is illustrated in FIG. 2b.
Longer extension regions lead to more complex patterns, but the basic algorithm
defining permissible transitions remains the same. ##STR4## Typically, by the
eighth cycle thirty-two reactions are required, and continue to be required,
in each cycle until sequencing is halted.
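
Steps (i) through (iii) of the primer-selection algorithm can be sketched in a few lines. This is a minimal illustration under stated assumptions: the extension region is modeled as a string of interior I/A positions followed by one terminal nucleotide, the function name `next_extension_regions` is hypothetical, and the enumeration of four candidate terminal nucleotides for the next cycle is inferred from the use of four rolling primers per amplicon. The replacement table simply restates the assignments given in step (ii).

```python
# Nucleotide (I or A) inserted for each identified terminal nucleotide,
# as listed in step (ii) of the text for this embodiment.
REPLACEMENT = {"A": "A", "C": "A", "G": "I", "T": "I"}


def next_extension_regions(interior: str, terminal: str) -> list[str]:
    """Return candidate extension regions for the next cycle.

    interior -- the I/A positions preceding the terminal nucleotide (e.g. "II")
    terminal -- terminal nucleotide of the primer that produced the amplicon
    """
    if terminal not in REPLACEMENT:
        raise ValueError(f"unexpected terminal nucleotide: {terminal!r}")
    # (i) drop the position distal to the terminal nucleotide,
    # (ii)/(iii) place the determined I or A where the old terminal sat.
    shifted = interior[1:] + REPLACEMENT[terminal]
    # Assumption: the next cycle tests all four terminal nucleotides.
    return [shifted + candidate for candidate in "ACGT"]


print(next_extension_regions("II", "A"))
print(next_extension_regions("II", "G"))
```

Because amplicons A and C map to the same replacement nucleotide, as do amplicons G and T, samples that share a shifted interior can be pooled before the next set of extension reactions, in line with the pooling described above.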
 
Clearly, additional steps to those outlined above may be implemented, for
example, to separate the initial extension product from extraneous single
stranded DNA and/or the single stranded nuclease, if one is employed.
Manipulation of polynucleotides and other reagents, temperature control for
PCRs, and the like, may be carried out on commercially available laboratory
robots, e.g. Biomek 1000 (Beckman Instruments, Fullerton, Calif.).
Rolling primers and T primers may be constructed to have a double stranded
segment capable of binding to an anchored single stranded oligonucleotide
via triplex formation for separation, e.g. as taught by Ji et al, Anal. Chem.
65: 1323-1328 (1993); Cantor et al, U.S. Pat. No. 5,482,836; or the like.
Thus, for example, magnetic beads carrying such a single stranded
oligonucleotide can be used to capture the amplicons and transfer them to a
separate vessel containing a nuclease to cleave the tag, e.g. at cleavage
site 18, of those double stranded DNAs that have been selectively amplified
(other DNAs remain unamplified and therefore hemi-methylated so no cleavage
occurs). Preferably, the T primer contains a 5' biotin which permits the
released tag to be captured and conveniently labeled. After capture, e.g. via
avidinated magnetic beads, the 3' strands of the double stranded segment are
stripped back to the tag by the use of T4 DNA polymerase, or like enzyme, in
the presence of a deoxynucleoside triphosphate (dNTP) corresponding to the
nucleotide flanking the tag. Thus, provided that the flanking nucleotides are
not present elsewhere along the strand to the 3' ends, the 3'→5' exonuclease
activity of the polymerase will strip back the 3' strand to the flanking
nucleotides, at which point an exchange reaction will be initiated that
prevents further stripping past the flanking nucleotides. The 3' ends of the
tag can then be labeled in an extension reaction with labeled dNTPs. After
labeling the non-biotinylated strand can be removed by denaturation and
applied to the spatially addressable array for detection.
 
After the labeled tags are hybridized to their tag complements and detected,
the tags are removed by washing so that labeled tags from the next set of
amplicons can be applied.
 
Apparatus for Observing Detection Signals at Spatially Addressable Sites
Preferably, a spatially addressable array is established by fixing
microparticles containing tag complements to a solid phase surface. A variety
of apparatus may be used to detect hybridized tags and/or enzymatic events on
such an array whenever light-generating signals, e.g. chemiluminescent,
fluorescent, or the like, are employed. For example, a scanning system, such
as described in International patent applications PCT/US91/09217,
PCT/NL90/00081, and PCT/US95/01886, may be employed. Preferably,
microparticles containing tag complements are loaded as a fluid-particle slurry
into a flow chamber where they are held in place by a combination of
nonspecific binding of the DNA on the microparticle to the substrate and a
gentle flow which pushes the particles against a dam in the flow chamber. An
exemplary apparatus is illustrated in FIG. 3: Flow chamber (500) is prepared
by etching a cavity having a fluid inlet (502) and outlet (504) in a glass
plate (506) using standard micromachining techniques, e.g. Ekstrom et al,
International patent application PCT/SE91/00327; Brown, U.S. Pat. No.
4,911,782; Harrison et al, Anal. Chem. 64: 1926-1932 (1992); and the like.
The dimensions of flow chamber (500) are such that loaded microparticles (508),
e.g. GMA beads, may be disposed in cavity (510) in a closely packed planar
monolayer of 100-200 thousand beads. Cavity (510) is made into a closed
chamber with inlet and outlet by anodic bonding of a glass cover slip (512)
onto the etched glass plate (506), e.g. Pomerantz, U.S. Pat. No. 3,397,279.
With the glass cover slip in place, cavity (510) has a height a few tens of
percent greater than the diameter of the microparticles being loaded to ensure
that a monolayer is formed. A dam or shelf adjacent to outlet (504) is present
in glass plate (506), which forms a barrier to the microparticles in the
slurry, but at the same time allows the fluid component of the slurry, or other
reagents, to pass freely. Reagents are metered into the flow chamber from
syringe pumps (514 through 520) through valve block (522) controlled by a
microprocessor as is commonly used on automated DNA and peptide synthesizers,
e.g. Bridgham et al, U.S. Pat. No. 4,668,479; Hood et al, U.S. Pat. No.
4,252,769; Barstow et al, U.S. Pat. No. 5,203,368; Hunkapiller, U.S. Pat.
No. 4,703,913; or the like.
 
Specifically hybridized tags are detected by exciting their fluorescent labels
with illumination beam (524) from light source (526), which may be a laser,
mercury arc lamp, or the like. Illumination beam (524) passes through filter
(528) and excites the fluorescent labels on tags specifically hybridized to
tag complements in flow chamber (500). Resulting fluorescence (530) is
collected by confocal microscope (532), passed through filter (534), and
directed to CCD camera (536), which creates an electronic image of the bead
array for processing and analysis by workstation (538). Preferably, tags at
about a 25 nM concentration are passed through the flow chamber at a flow
rate of 1-2 μL per minute for 10 minutes at 20°C in a hybridization
buffer consisting of 50 mM NaCl, 3 mM Mg, 10 mM Tris-HCl (pH 8.5), after which
fluorescent labels carried by the tags are illuminated and fluorescence is
collected. The tags are melted from the tag complements by passing
hybridization buffer through the flow chamber at a flow rate of 1-2 μL per
minute at 55°C for 10 minutes.
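
As a quick check on the scale of the hybridization step, the volume and amount of tag delivered to the flow chamber follow directly from the stated flow rate, duration, and concentration. The helper below is a sketch for illustration only; the function name `delivered` is hypothetical, and it relies on the unit identity 1 nM = 1 fmol/μL.

```python
def delivered(flow_ul_per_min: float, minutes: float, conc_nM: float):
    """Return (volume in uL, amount of tag in fmol) passed through the chamber."""
    volume_ul = flow_ul_per_min * minutes
    # 1 nM = 1 nmol/L = 1 fmol/uL, so nM x uL gives fmol directly.
    amount_fmol = conc_nM * volume_ul
    return volume_ul, amount_fmol


# Lower and upper ends of the preferred 1-2 uL/min range, 10 min, 25 nM tags.
low = delivered(1, 10, 25)
high = delivered(2, 10, 25)
print(low, high)
```

With the preferred values, 10 to 20 μL of tag solution, or roughly 0.25 to 0.5 pmol of tags, passes over the bead array in each hybridization.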
 
In sequencing applications, microparticles can be fixed to the surface of
a substrate in a variety of ways. The fixation should be strong enough to allow
the microparticles to undergo successive cycles of reagent exposure and washing
without significant loss. When the substrate is glass, its surface may be
derivatized with an alkylamino linker using commercially available reagents,
e.g. Pierce Chemical, which in turn may be cross-linked to avidin, again using
conventional chemistries, to form an avidinated surface. Biotin moieties can
be introduced to the microparticles in a number of ways.
Kits for Implementing the Method of the 
Invention 
The invention includes kits for carrying out the various embodiments of the
invention. Preferably, kits of the invention include a set of rolling primers
for carrying out the extensions and amplifications in accordance with the
invention. Kits may also include a repertoire of tag complements attached
to a solid phase support. Additionally, kits of the invention may include
the corresponding repertoire of tags, e.g. as primers for amplifying the
polynucleotides to be sorted or as elements of cloning vectors. Preferably, the
repertoire of tag complements is attached to microparticles. Kits may also
contain appropriate buffers for enzymatic processing, detection chemistries,
e.g. fluorescent or chemiluminescent components for labelling tags, and the
like, instructions for use, processing enzymes, such as ligases, polymerases,
transferases, and so on. In an important embodiment for sequencing, kits may
also include substrates, such as avidinated microscope slides or microtiter
plates, for fixing microparticles for processing.
EXAMPLE 1 
 
Construction of a Tag Library 
 
An exemplary tag library is constructed as follows to form the chemically
synthesized 9-word tags of nucleotides A, G, and T defined by the formula:

3'-TGGC-[⁴(A,G,T)₉]-CCCCp
 
where "[⁴(A,G,T)₉]" indicates a tag mixture where each tag consists of nine
4-mer words of A, G, and T; and "p" indicates a 5' phosphate. This mixture is
ligated to the following right and left primer binding regions
(SEQ ID NO: 10 and SEQ ID NO: 11):
 
    ______________________________________________________________
    5'- AGTGGCTGGGCATCGGACCG          5'- GGGGCCCAGTCAGCGTCGAT
        TCACCGACCCGTAGCCp                     GGGTCAGTCGCAGCTA
              LEFT                                RIGHT
    ______________________________________________________________
The right and left primer binding regions are ligated to the above tag mixture,
after which the single stranded portion of the ligated structure is filled
with DNA polymerase then mixed with the right and left primers indicated below
and amplified to give a tag library.
 
    ______________________________________________________________________________
        Left primer:

    5'- AGTGGCTGGGCATCGGACCG

    5'- AGTGGCTGGGCATCGGACCG-[⁴(A,G,T)₉]-GGGGCCCAGTCAGCGTCGAT
        TCACCGACCCGTAGCCTGGC-[⁴(A,G,T)₉]-CCCCGGGTCAGTCGCAGCTA

                                         CCCCGGGTCAGTCGCAGCTA-5'
                                             Right primer
    ______________________________________________________________________________
The underlined portion of the left primer binding region indicates a Rsr II
recognition site. The left-most underlined region of the right primer binding
region indicates recognition sites for Bsp 120I, Apa I, and Eco O 109I, and
a cleavage site for Hga I. The right-most underlined region of the right primer
binding region indicates the recognition site for Hga I. Optionally, the right
or left primers may be synthesized with a biotin attached (using conventional
reagents, e.g. available from Clontech Laboratories, Palo Alto, Calif.) to
facilitate purification after amplification and/or cleavage.
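
The tag formula of this example, 3'-TGGC-[nine 4-mer words of A, G, and T]-CCCC, can be expressed as a pattern for checking a candidate tag strand written in the 3' to 5' direction. This is only a sketch: the names `TAG_PATTERN` and `matches_tag_formula` are hypothetical, the 5' phosphate is not represented, and the pattern accepts any 4-mer of A, G, and T as a word, which is an assumption since the particular words used are not specified in this passage.

```python
import re

# 3'-TGGC, then nine 4-nucleotide words drawn from A, G, T, then CCCC.
# Assumption: any 4-mer over {A, G, T} counts as a valid word.
TAG_PATTERN = re.compile(r"TGGC(?:[AGT]{4}){9}CCCC")


def matches_tag_formula(strand_3to5: str) -> bool:
    """True if the strand (written 3'->5') fits the tag formula."""
    return TAG_PATTERN.fullmatch(strand_3to5) is not None


example = "TGGC" + "AGTA" * 9 + "CCCC"   # hypothetical tag for illustration
print(len(example), matches_tag_formula(example))
```

A strand that fits the formula is 44 nucleotides long: 4 for TGGC, 36 for the nine words, and 4 for CCCC.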
EXAMPLE 2 
 
Construction of a Plasmid Library of Tag-Polynucleotide Conjugates for cDNA
"Signature" Sequencing
 
cDNA is produced from an mRNA sample by conventional protocols using
pGGCCCT₁₅(A or G or C) as a primer for first strand synthesis anchored at the
boundary of the poly A region of the mRNAs and N₈(A or T)GATC as the primer for
second strand synthesis. That is, both are degenerate primers such that the
second strand primer is present in two forms and the first strand primer is
present in three forms. The GATC sequence in the second strand primer
corresponds to the recognition site of Mbo I; other four base recognition sites
could be used as well, such as those for Bam HI, Sph I, Eco RI, or the like. The
presence of the A and T adjacent to the restriction site of the second strand
primer ensures that a stripping and exchange reaction can be used in the next
step to generate a five-base 5' overhang of "GGCCC". The first strand primer
is annealed to the mRNA sample and extended with reverse transcriptase, after
which the RNA strand is degraded by the RNase H activity of the reverse
transcriptase leaving a single stranded cDNA. The second strand primer is
annealed and extended with a DNA polymerase using conventional protocols. After
second strand synthesis, the resulting cDNAs are methylated with CpG methylase
(New England Biolabs, Beverly, Mass.) using the manufacturer's protocols. The
3' strands of the cDNAs are then cut back with the above-mentioned stripping
and exchange reaction using T4 DNA polymerase in the presence of dATP and dTTP,
after which the cDNAs are ligated to the tag library of Example 1 previously
cleaved with Hga I to give the following construct: ##STR5## Separately, the
following cloning vector (SEQ ID NO: 12) is constructed, e.g. starting from a
commercially available plasmid, such as a Bluescript phagemid (Stratagene,
La Jolla, Calif.). ##STR6## The rolling primer binding site corresponds to a
rolling primer of subgroup (1), described above. The plasmid is cleaved with
Ppu MI and Pme I (to give a Rsr II-compatible end and a flush end so that the
insert is oriented) and then methylated with DAM methylase. The tag-containing
construct is cleaved with Rsr II and then ligated to the open plasmid, after
which the conjugate is cleaved with Mbo I and Bam HI to permit ligation and
closing of the plasmid. The plasmid is then amplified and isolated for use as
a template for extensions and amplifications in accordance with the invention.
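
The statement that the first strand primer is present in three forms and the second strand primer in two forms can be made concrete by writing the forms out. The sketch below is illustrative only: the function names are hypothetical, the 5' phosphate is omitted, and the eight N positions of the second strand primer are left as the literal letter N rather than expanded.

```python
def first_strand_primers() -> list[str]:
    """GGCCC + T15 + one anchoring nucleotide (A, G, or C): three forms."""
    return ["GGCCC" + "T" * 15 + anchor for anchor in "AGC"]


def second_strand_primers() -> list[str]:
    """N8 + (A or T) + GATC: two forms, with N left unexpanded."""
    return ["N" * 8 + variable + "GATC" for variable in "AT"]


for primer in first_strand_primers() + second_strand_primers():
    print(len(primer), primer)
```

Each first strand form is 21 nucleotides long and each second strand form is 13 nucleotides long, with GATC (the Mbo I site) at the 3' end of the latter.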
EXAMPLE 3 
 
Signature Sequencing of a cDNA Library 
 
The plasmid constructed in Example 2 is used for generating extension products
and amplicons with the rolling primers described above and the following T
primer (SEQ ID NO: 13):
 
biotin-5'-IIIIIIIIAAAAGGAGGAGGCCTTGA 
 
where the I's are deoxyinosines added to balance the annealing and melting
temperatures of the T primers and rolling primers. Preferably, the annealing
temperature is about 55°C. Clearly, many other sequences could be
employed in the implementation of the invention. The rolling primers described
above are employed.
 
The segment containing the T primer binding site through the rolling primer
binding site is excised and separated from the plasmid of Example 2. (This
can be accomplished in a variety of ways known to those skilled in the art, for
example, by engineering the plasmid to contain restriction sites flanking the
segment, or by simply amplifying directly by PCR). After replacing
deoxyguanosines with deoxyinosines, e.g. by PCR in the presence of dITP, the
segment is aliquotted into four vessels, denatured, and the appropriate rolling
primer is added. Conditions are adjusted to permit the rolling primers to
anneal, after which the primers are extended with Sequenase, or like high
fidelity polymerase, in the presence of dATP, dCTP, dITP, and dTTP, using the
manufacturer's protocol. The remaining single stranded DNA is digested with a
single stranded nuclease, such as Mung bean nuclease. Optionally, the double
stranded DNA extension product may be separated from the reaction mixture, e.g.
by capture via the formation of a triplex between, for example, the T primer
binding region, and an appropriate single stranded complement attached to
a magnetic bead.
 
The double stranded DNA is combined with T primer (and rolling primer if a
separation step was used) and amplified by 5-10 cycles of PCR in the presence
of dATP, dCTP, dITP, and dTTP to form the four initial amplicons. Samples
of these are combined and re-distributed into vessels with the appropriate
rolling primers for the next cycle of extension. Samples are also drawn off
for analysis.
 
Preferably, the samples for analysis are separately captured on magnetic beads
carrying a single stranded sequence that forms a triplex with the S primers.
The beads are then transferred to reaction mixtures containing Apa I, which
cleaves the tags from the target polynucleotide. The released strands (SEQ
ID NO: 14) containing the tags are next captured via their biotinylated T
primers with magnetic beads coated with avidin and transferred to reaction
vessels where their 3' ends are stripped in the presence of T4 DNA polymerase
and dGTP, as shown below:
 
After cleavage with Apa I: 
 
    ______________________________________________________________________________
biotin-5'-IIIIIIII[AG]₁₂ TAGAGAGGACCG[ TAGS ]GGGGCC
          CCCCCCCC[TC]₁₂ ATCTCTCCTGGC[ SGAT ]CC

                               ↓  T4 polymerase + dGTP

biotin-5'-IIIIIIII[AG]₁₂ TAGAGAGGACCG[ TAGS ]GG
                                  GGC[ SGAT ]CC

                               ↓  Add dUTP*, dCTP, ddATP

biotin-5'-IIIIIIII[AG]₁₂ TAGAGAGGACCG[ TAGS ]GG
                        dAUCUCUCCUGGC[ SGAT ]CC
                          * * *  *

                               ↓  Heat denature

                        dAUCUCUCCUGGC[ SGAT ]CC-5'
                          * * *  *
    ______________________________________________________________________________
Here dUTP* represents a labeled dUTP and ddATP represents dideoxyadenosine
triphosphate. Preferably, dUTP is labeled with a separate spectrally resolvable
fluorescent dye for each of the four amplicons. The released tags (SEQ ID
NO: 15) for each of the amplicons are mixed and are applied to the spatially
addressable array for hybridization to their complements and detection.
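
The stripping step in the scheme above can be mimicked with a simple string operation: the 3' strand is shortened from its 3' end until the terminal nucleotide matches the single dNTP supplied, at which point exchange halts further stripping. The following is a sketch under stated assumptions: `strip_back` is a hypothetical helper, the strand is written 3' to 5', and the tag complement `TCAT` is a made-up placeholder that contains no G, standing in for the bracketed tag region.

```python
def strip_back(strand_3to5: str, dntp_base: str) -> str:
    """Remove nucleotides from the 3' end until the supplied base is terminal.

    Returns an empty string if the base never occurs, i.e. nothing halts
    the exonuclease in this simplified model.
    """
    position = strand_3to5.find(dntp_base)
    return "" if position == -1 else strand_3to5[position:]


TAG_COMPLEMENT = "TCAT"  # hypothetical placeholder for the tag region
bottom = "CCCCCCCC" + "TC" * 12 + "ATCTCTCCTGGC" + TAG_COMPLEMENT + "CC"

print(strip_back(bottom, "G"))
```

In this model the strand is cut back to the GGC immediately preceding the tag region, matching the second stage of the scheme. The model also shows why the flanking nucleotide must not occur earlier along the strand: stripping would halt at the first G encountered.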
EXAMPLE 4 
 
Sequencing a Target Polynucleotide with 
Conversion to RNA 
In this example, a dsDNA template is sequenced with rolling primers in the
embodiment employing cyclical conversion of the template into RNA. The following
vector (SEQ ID NO: 16) is prepared from a standard cloning vector, such as
pUC 19, by inserting a T7 promoter element (double underline) and a rolling
primer binding site (single underline) into the indicated restriction sites
of the polylinker region: ##STR7## After insertion of a target polynucleotide
into the Bam HI site, the beginning template (200) of FIG. 2a is obtained.
RNA transcription takes place after the vector has been linearized by cleaving
with Hind III. FIGS. 2a and 2b illustrate the changes that occur in sequences
of the template and rolling primer binding site as the process is taken through
eight cycles. In each cycle a single nucleotide in the template is identified.
Arrows (210) indicate the nucleotide positions where mutations take place,
and the double underlined nucleotide of the "converted template" indicates
the resulting change. In this example, the p1 and p2 primers described above
with the indicated extension regions are employed with reverse transcriptase
(Promega, Madison, Wis.). Lower case "a" in the extension region indicates
oxo-A, and lower case "g" in the extension region indicates oxo-G. The
amplification step is carried out with PCR using a forward primer (shown below)
containing the T7 promoter element (underlined) and the following p1 and p2
reverse primers:
 
Forward primer (SEQ ID NO: 17): 
 
5'-AATTTAATACGACTCACTATAGGGAGAATTCGAGCTCGGTACCCGGG 
p1 reverse primer (SEQ ID NO: 18): 
 
5'-IGIGGIGTGTITTTTTIGIGG 
 
p2 reverse primer (SEQ ID NO: 19): 
 
5'-IGIGGITGTGTITTTTIGIGG 
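
Because the underlining that marked the T7 promoter element does not survive in plain text, the element can be located in the forward primer by searching for the promoter sequence. The sketch below is illustrative: the 18-nucleotide string `T7_PROMOTER` is the commonly cited T7 promoter consensus and is an assumption brought in from outside this text, not a sequence the text itself identifies.

```python
FORWARD_PRIMER = "AATTTAATACGACTCACTATAGGGAGAATTCGAGCTCGGTACCCGGG"
P1_REVERSE = "IGIGGIGTGTITTTTTIGIGG"
P2_REVERSE = "IGIGGITGTGTITTTTIGIGG"

# Assumption: commonly cited T7 promoter consensus (not given in the text).
T7_PROMOTER = "TAATACGACTCACTATAG"

start = FORWARD_PRIMER.find(T7_PROMOTER)
print("forward primer length:", len(FORWARD_PRIMER))
print("promoter starts at 0-based index:", start)
print("reverse primer lengths:", len(P1_REVERSE), len(P2_REVERSE))
```

The forward primer length of 47 nucleotides agrees with the length recorded for SEQ ID NO: 17 in the sequence listing.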
 
RNA is produced from the dsDNA template with a RiboMax RNA production system
(Promega, Madison, Wis.) using the manufacturer's protocol. Briefly, in a
50 μl reaction volume, 0.1 pmol of dsDNA template is combined with 30 U/μl
T7 RNA polymerase, 1.5 U/μl human placental ribonuclease inhibitor, and
19 U/μl inorganic pyrophosphatase and the mixture is incubated at 37°C
for 2-4 hours in transcription buffer (80 mM HEPES-KOH (pH 7.5), 24 mM
MgCl₂, 2 mM spermidine-HCl, 40 mM DTT, and 7.5 mM each of the four
ribonucleoside triphosphates). After adding 47 μl H₂O and 1 μl 100 mM
MnCl₂ and heating at 65°C for 5 min., 2 μl (4.2 U) of DNase
I (U.S. Biochemical) is added and the mixture is incubated at 37°C
for 30 min. RNA is then purified from the reaction mixture using a QIAGEN
(Santa Clarita, Calif.) RNA purification system (elution volume 30 μl).
Four separate reverse transcription reaction mixtures are formed, each
containing a 1-10 pmol aliquot of the transcribed RNA and 5 pmol of the
appropriate rolling primer labeled with fluorescein, e.g. FAM (available from
Perkin-Elmer Corp. Applied Biosystems Division, Foster City, Calif.). After
heating to 65°C for 5 min. to denature the RNA, the reaction mixture is cooled
on ice and reverse transcriptase (0.1 U/μl) and RNase inhibitor (0.85 U/μl)
are added in a buffer consisting of 50 mM Tris-HCl (pH 8.1), 8 mM MgCl₂,
50 mM NaCl, 10 mM DTT, and 25-50 μM each of the four deoxynucleoside
triphosphates, so that a reaction volume of 10 μl is obtained. The reaction
mixtures are incubated at 50-55°C for 5 min., after which they are
incubated at 95°C for 5 min., thereby effectively destroying any
remaining RNA. The four reaction mixtures correspond to rolling primers with
a 3'-terminal A, C, G, and T, respectively. Thus, in each cycle, only one
of the four reactions will result in the synthesis of a ssDNA extension product.
The product is identified by separating the reaction components by gel
electrophoresis, after which the band containing the extension product is
excised and the ssDNA recovered.
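
The volumes given for the DNase I treatment imply particular final concentrations, which can be checked with simple dilution arithmetic. The sketch below is illustrative only; the helper name `diluted` is hypothetical, and the calculation assumes the volumes are additive and that no volume is lost during the 37°C and 65°C incubations.

```python
def diluted(stock_conc: float, added_ul: float, total_ul: float) -> float:
    """Concentration after adding `added_ul` of stock into `total_ul` total."""
    return stock_conc * added_ul / total_ul


transcription_ul = 50
water_ul, mncl2_ul, dnase_ul = 47, 1, 2

before_dnase_ul = transcription_ul + water_ul + mncl2_ul   # 98 uL
after_dnase_ul = before_dnase_ul + dnase_ul                # 100 uL

mncl2_mM = diluted(100, mncl2_ul, after_dnase_ul)          # 100 mM stock
dnase_u_per_ul = 4.2 / after_dnase_ul                      # 4.2 U total

print(after_dnase_ul, mncl2_mM, dnase_u_per_ul)
```

Under these assumptions the digestion proceeds in a 100 μl volume at about 1 mM MnCl₂ and 0.042 U/μl DNase I.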
 
The dsDNA template is re-formed by amplifying the ssDNA extension product
in a conventional PCR using the primers listed above.
EXAMPLE 5 
 
Effect of dNTP Concentration on Primer Selection on an RNA Template
In this example, the effect of different deoxynucleoside triphosphate
concentrations on primer selection was examined through three cycles of RNA
synthesis, primer selection, and amplification. The steps of each cycle were
carried out as described in Example 4, with the exception that the extension
reactions employed a mixture of four primers each, so that the primers having
3'-terminal A's, C's, G's and T's competed against one another for extension by
reverse transcriptase at the indicated concentrations of dNTPs. The results,
illustrated in FIGS. 5a-5c, show that dNTP concentrations at about 50 μM or
below lead to the greatest selectivity in primer extension by reverse
transcriptase. For FIG. 5a the correct primer was the following:
5'- . . . GIGaTC . . . CCCUAGagaa . . . 
 
For FIG. 5b, the correct primer was the 
following: 
5'- . . . GIDgCT . . . CCCUAGAgaa . . . 
 
For FIG. 5c the correct primer was the following:
5'- . . . GIGaTC . . . CCCUCGAGaa . . . 
 
    ______________________________________________________________________________
                  SEQUENCE LISTING

    - (1) GENERAL INFORMATION:

    -    (iii) NUMBER OF SEQUENCES:  19

    - (2) INFORMATION FOR SEQ ID NO: 1:

    -      (i) SEQUENCE CHARACTERISTICS:

              (A) LENGTH: 22 nucleotides
              (B) TYPE: nucleic acid
              (C) STRANDEDNESS: single
              (D) TOPOLOGY: linear

    -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
    #                 22ANN NN

    - (2) INFORMATION FOR SEQ ID NO: 2:

    -      (i) SEQUENCE CHARACTERISTICS:

              (A) LENGTH: 22 nucleotides
              (B) TYPE: nucleic acid
              (C) STRANDEDNESS: single
              (D) TOPOLOGY: linear

    -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
    #                 22GNN NN

    - (2) INFORMATION FOR SEQ ID NO: 3:

    -      (i) SEQUENCE CHARACTERISTICS:

              (A) LENGTH: 22 nucleotides
              (B) TYPE: nucleic acid
              (C) STRANDEDNESS: single
              (D) TOPOLOGY: linear

    -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
    #                 22GNN NN

    - (2) INFORMATION FOR SEQ ID NO: 4:

    -      (i) SEQUENCE CHARACTERISTICS:

              (A) LENGTH: 22 nucleotides
              (B) TYPE: nucleic acid
              (C) STRANDEDNESS: single
              (D) TOPOLOGY: linear

    -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
    #                 22ANN NN

    - (2) INFORMATION FOR SEQ ID NO: 5:

    -      (i) SEQUENCE CHARACTERISTICS:

              (A) LENGTH: 22 nucleotides
              (B) TYPE: nucleic acid
              (C) STRANDEDNESS: single
              (D) TOPOLOGY: linear

    -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
    #                 22ANN NN

    - (2) INFORMATION FOR SEQ ID NO: 6:

    -      (i) SEQUENCE CHARACTERISTICS:

              (A) LENGTH: 22 nucleotides
              (B) TYPE: nucleic acid
              (C) STRANDEDNESS: single
              (D) TOPOLOGY: linear

    -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
    #                 22GNN NN
 
    - (2) INFORMATION FOR SEQ ID NO: 7:

    -      (i) SEQUENCE CHARACTERISTICS:

              (A) LENGTH: 26 nucleotides
              (B) TYPE: nucleic acid
              (C) STRANDEDNESS: single
              (D) TOPOLOGY: linear

    -     (ix) FEATURE:

              (A) NAME/KEY: primer
              (B) LOCATION: n.a.
              (C) IDENTIFICATION METHOD: n.a.
              first "N" is preferably deoxyinosine

    -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
    #              26  GGGG GNNNNN

    - (2) INFORMATION FOR SEQ ID NO: 8:

    -      (i) SEQUENCE CHARACTERISTICS:

              (A) LENGTH: 26 nucleotides
              (B) TYPE: nucleic acid
              (C) STRANDEDNESS: single
              (D) TOPOLOGY: linear

    -     (ix) FEATURE:

              (A) NAME/KEY: primer
              (B) LOCATION: n.a.
              (C) IDENTIFICATION METHOD: n.a.
              first and second "N's" are preferably deoxyinosine

    -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
    #              26  GGGG GNNNNN

    - (2) INFORMATION FOR SEQ ID NO: 9:

    -      (i) SEQUENCE CHARACTERISTICS:

              (A) LENGTH: 25 nucleotides
              (B) TYPE: nucleic acid
              (C) STRANDEDNESS: single
              (D) TOPOLOGY: linear

    -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
    #               25 CTTC CNNNN
 
    - (2) INFORMATION FOR SEQ ID NO: 10:
 
    -      (i) SEQUENCE CHARACTERISTICS:
 
              (A) LENGTH: 20 nucleoti - 
#des
              (B) TYPE: nucleic acid
 
              (C) STRANDEDNESS:  doub - 
#le
              (D) TOPOLOGY:  linear
 
    #10:  (xi) SEQUENCE DESCRIPTION: SEQ
 ID NO:
    # 20               ACCG
 
    - (2) INFORMATION FOR SEQ ID NO: 11:
 
    -      (i) SEQUENCE CHARACTERISTICS:
 
              (A) LENGTH: 20 nucleoti - 
#des
              (B) TYPE: nucleic acid
 
              (C) STRANDEDNESS:  doub - 
#le
              (D) TOPOLOGY:  linear
 
    #11:  (xi) SEQUENCE DESCRIPTION: SEQ
 ID NO:
    # 20               CGAT
 
    - (2) INFORMATION FOR SEQ ID NO: 12:
 
    -      (i) SEQUENCE CHARACTERISTICS:
 
              (A) LENGTH: 62 nucleoti - 
#des
              (B) TYPE: nucleic acid
 
              (C) STRANDEDNESS:  doub - 
#le
              (D) TOPOLOGY:  linear
 
         (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
                   50TTGATA GAGAGGACCT GTTTAAACGG ATCCGCTGCT
            62
 
    - (2) INFORMATION FOR SEQ ID NO: 13:
 
    -      (i) SEQUENCE CHARACTERISTICS:
 
              (A) LENGTH: 26 nucleotides
              (B) TYPE: nucleic acid

              (C) STRANDEDNESS: single
              (D) TOPOLOGY:  linear
 
         (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
                   26  GAGG CCTTGA
 
    - (2) INFORMATION FOR SEQ ID NO: 14:
 
    -      (i) SEQUENCE CHARACTERISTICS:
 
              (A) LENGTH: 43 nucleotides
              (B) TYPE: nucleic acid

              (C) STRANDEDNESS: double
              (D) TOPOLOGY:  linear
 
         (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
      43               AGAG GAGAGAGAGA GTAGAGAGGA CCG
    - (2) INFORMATION FOR SEQ ID NO: 15:
 
    -      (i) SEQUENCE CHARACTERISTICS:
 
              (A) LENGTH: 12 nucleotides
              (B) TYPE: nucleic acid

              (C) STRANDEDNESS: single
              (D) TOPOLOGY:  linear
 
         (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
            12
 
    - (2) INFORMATION FOR SEQ ID NO: 16:
 
    -      (i) SEQUENCE CHARACTERISTICS:
 
              (A) LENGTH: 104 nucleotides
              (B) TYPE: nucleic acid

              (C) STRANDEDNESS: single
              (D) TOPOLOGY:  linear
 
         (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
                   50CACTAT AGGGAGAATT CGAGCTCGGT ACCCGGGGAT
                  100CACACC CCCGTCGACC TGCAGGCATG CAAGCTTGGC
                 104
 
    - (2) INFORMATION FOR SEQ ID NO: 17:
 
    -      (i) SEQUENCE CHARACTERISTICS:
 
              (A) LENGTH: 47 nucleotides
              (B) TYPE: nucleic acid

              (C) STRANDEDNESS: single
              (D) TOPOLOGY:  linear
 
         (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
                     47CTAT AGGGAGAATT CGAGCTCGGT ACCCGGG
    - (2) INFORMATION FOR SEQ ID NO: 18:
 
    -      (i) SEQUENCE CHARACTERISTICS:
 
              (A) LENGTH: 21 nucleotides
              (B) TYPE: nucleic acid

              (C) STRANDEDNESS: single
              (D) TOPOLOGY:  linear
 
         (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
     21                NGNG G
 
    - (2) INFORMATION FOR SEQ ID NO: 19:
 
    -      (i) SEQUENCE CHARACTERISTICS:
 
              (A) LENGTH: 21 nucleotides
              (B) TYPE: nucleic acid

              (C) STRANDEDNESS: single
              (D) TOPOLOGY:  linear
 
         (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
     21                NGNG G
 