METHODS FOR ACCURATELY DETERMINING ALLELE CONTENT IN DNA POOLS
Carolyn Phillips
Writer’s comment:
From the moment I learned of Mendel and his pea plants in a freshman
high school biology class, genetics has fascinated me. I have pursued
this interest at UC Davis through genetics classes and a laboratory
assistant job in a genetics lab. When assigned to write a review paper
in English 104E (Scientific Writing), I knew immediately that I wanted
to make use of the assignment to further my understanding of the
research aspects of genetics. I chose the topic of DNA pooling because
of its importance in my own research experience. In fact, examining the
literature devoted to pooling analysis techniques proved to be both
interesting and immensely valuable.
- Carolyn Phillips
Instructor’s comment:
For the review assignment in English 104E: Scientific Writing, Carolyn
Phillips wisely chose a topic based in genetics, a field that holds a
keen interest for her and in which she already had significant
experience. That background enabled her to identify a focus for her
paper that is narrow enough to be clearly explainable within the
constraints of the assignment. Based on a chronological organizing
scheme, Carolyn’s paper describes developments and refinements in an
effort to correct for errors inherent in the DNA pooling methods
employed in genetics research. I admire the clarity and efficiency with
which she explains these developments.
- Sondra Reid, English Department
Introduction
The loci underlying many genetic diseases have yet to be mapped on the
human genome, for reasons that include the involvement of multiple loci
and incomplete penetrance. To assign these loci to particular regions of
the chromosomes, association studies, which compare allele frequency
between affected individuals (probands) and controls, must be performed
across the entire human genome. With markers spaced approximately 0.4 cM
apart, 10,000 microsatellite markers would be necessary to fully
saturate the genome (Collins et al., 2000). For a study of 1000
probands and 1000 controls, 20 million genotypings would be required
(Collins et al., 2000). Because a task of that size is impractical, a
technique called DNA pooling has been developed. DNA pooling involves
mixing equal amounts of DNA from each individual in a group and then
proceeding as one would with an individual sample: performing PCR and
running the products out on a gel. DNA pooling can
determine total allele content of a group, for one microsatellite,
without the need to individually genotype each individual in that
group. As such, it is an effective way to decrease the number of
genotypings required, reducing the workload by factors of tens or
hundreds.
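The workload arithmetic above can be sketched in a few lines. The marker and sample counts are those cited from Collins et al. (2000); the pool size of 100 individuals and the two replicate PCRs per pool are hypothetical values chosen only for illustration, as are the function names.

```python
import math

def genotypings_individual(n_markers, n_individuals):
    """One genotyping per marker per individual."""
    return n_markers * n_individuals

def genotypings_pooled(n_markers, group_size, n_groups, pool_size, replicates):
    """One genotyping per marker per pool, repeated for each replicate PCR.

    pool_size and replicates are assumed design choices, not values
    taken from the literature reviewed here.
    """
    pools_per_group = math.ceil(group_size / pool_size)
    return n_markers * pools_per_group * n_groups * replicates

# 10,000 markers, 1000 probands plus 1000 controls
individual = genotypings_individual(10_000, 1000 + 1000)

# Hypothetical design: pools of 100 individuals, each amplified twice
pooled = genotypings_pooled(10_000, group_size=1000, n_groups=2,
                            pool_size=100, replicates=2)

print(individual)            # 20000000
print(pooled)                # 400000
print(individual // pooled)  # 50
```

Under these assumed settings the reduction is fifty-fold; larger pools or fewer replicates would push the factor higher.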
Although the benefits of DNA pooling are immense, two
significant sources of error must be addressed. The first, known as
stutter peaks, results from a slippage of the DNA polymerase during the
PCR replication process. Stutter peaks appear as progressively smaller
peaks before each real allele peak and result in artificially inflated
values for the smaller allele sizes. This error is consistent and
reproducible for a particular marker (Perlin et al., 1995). The other
major source of error is caused by preferential amplification of some
alleles over others. In this situation, uneven PCR replication and
amplification result from the differing sizes of the fragments being
replicated. Over the years, several methods have been developed to
overcome these sources of error, making DNA pooling a practical method
of screening for disease loci.
Early Methods for Stutter Correction
LeDuc et al. (1995) developed one of the first methods to
correct for the stutter artifact by measuring the allele and stutter
peak heights for the smallest allele of the pool and any allele not
immediately adjacent to other alleles. This information was used to
calculate a stutter ratio, which was then used to correct the rest of
the allele peak heights, providing estimates of the total allele
content of the pool (LeDuc et al., 1995).
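The idea of a stutter ratio can be illustrated with a minimal sketch. It is not the procedure of LeDuc et al. (1995), whose details are not given here. It assumes a simplified model in which each allele produces a single stutter band one repeat shorter than itself, with the same ratio for every allele. All function names and peak heights are hypothetical.

```python
def stutter_ratio(allele_height, stutter_height):
    """Ratio of stutter peak to true peak, measured on an isolated allele."""
    return stutter_height / allele_height

def correct_stutter(observed, ratio):
    """Remove stutter from peak heights ordered from smallest to largest allele.

    Assumes alleles differ by one repeat and that each allele adds
    ratio * (its true height) to the peak one position smaller.
    """
    corrected = [0.0] * len(observed)
    carry = 0.0  # stutter contributed by the next-larger allele
    for i in range(len(observed) - 1, -1, -1):
        corrected[i] = max(observed[i] - carry, 0.0)
        carry = ratio * corrected[i]
    return corrected

def frequencies(heights):
    """Convert corrected peak heights to allele frequencies."""
    total = sum(heights)
    return [h / total for h in heights]

# An isolated allele of height 400 with a stutter peak of height 100
ratio = stutter_ratio(400, 100)          # 0.25

# Observed pool peaks: the two smaller alleles are inflated by stutter
observed = [150.0, 225.0, 100.0]
corrected = correct_stutter(observed, ratio)
print(corrected)                # [100.0, 200.0, 100.0]
print(frequencies(corrected))   # [0.25, 0.5, 0.25]
```

Working from the largest allele downward matters because the largest peak receives no stutter from any other allele and can therefore be trusted first.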
Also correcting for stutter peaks, Perlin et al. (1995)
developed a more complex and accurate method, which utilized the
stutter peaks’ consistent nature to predict and subsequently eliminate
the error. The stutter for each allele of a microsatellite was measured
and put into columns of a matrix. Once the matrix had been developed, a
number of mathematical algorithms were enlisted to deconvolute the
stutter patterns and produce the allele frequencies. The stutter
patterns detected were distinctive enough for each marker that Perlin
et al. (1995) hypothesized that multiple markers could be run on top of
one another on a gel and then separated and genotyped using the
matrices already developed.
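A matrix-based deconvolution of this kind can be sketched as follows. This is a simplified illustration, not the set of algorithms used by Perlin et al. (1995). It assumes noise-free data, a square matrix, and stutter that falls only on sizes shorter than the true allele, so the system can be solved by back-substitution. The stutter patterns and signal values are hypothetical.

```python
def deconvolve(pattern_columns, observed):
    """Recover allele amounts from an observed signal.

    pattern_columns[j][i] is the signal that one unit of allele j
    produces at size position i (its measured stutter pattern).
    Assumes pattern_columns[j][i] == 0 for i > j, i.e. stutter only
    appears at shorter sizes, and a nonzero main peak for each allele.
    """
    n = len(observed)
    # Build the matrix with one stutter pattern per column
    matrix = [[pattern_columns[j][i] for j in range(n)] for i in range(n)]
    amounts = [0.0] * n
    # Back-substitution, starting from the largest allele
    for i in range(n - 1, -1, -1):
        from_larger = sum(matrix[i][j] * amounts[j] for j in range(i + 1, n))
        amounts[i] = (observed[i] - from_larger) / matrix[i][i]
    return amounts

# Hypothetical allele-specific stutter patterns (one list per allele)
patterns = [
    [1.0,   0.0, 0.0],   # allele 0: no visible stutter in range
    [0.25,  1.0, 0.0],   # allele 1: 25% stutter one position down
    [0.125, 0.5, 1.0],   # allele 2: stronger, two-band stutter
]
observed = [20.0, 40.0, 40.0]

amounts = deconvolve(patterns, observed)
print(amounts)  # [10.0, 20.0, 40.0]
total = sum(amounts)
print([a / total for a in amounts])
```

With real, noisy gel data a least-squares or constrained solver would be a more natural choice than exact back-substitution.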
Stutter and Amplification Correction Method
Using previously developed methods to eliminate stutter,
Barcellos et al. (1997) created their own statistical method, the
stutter and amplification correction method, to eliminate error
resulting from stutter peaks and preferential amplification. This
method involves the individual typing of 20 individuals for each marker
prior to use in pooling so that a correction matrix can be developed
(Barcellos et al., 1997). This matrix can be used on all pools
subsequently genotyped for that marker, significantly increasing
pooling accuracy over previous methods. Although the individual
genotyping required by this method needs to be performed only once,
after which the information remains constant, Collins et al. (2000)
believe that this extra step could increase the work necessary by more
than an order of magnitude. Barcellos et al. (1997) suggest that
collaboration among several laboratories, with the genotyping divided
among them, would increase efficiency.
Allele Image Patterns Method
Daniels et al. (1998) wanted a technique that could
correct for stutter peaks and differential amplification without
requiring the labor-intensive individual typing required in previous
methods. They developed a method they called the allele image patterns
(DAIP) method. This method involves overlaying two images, one each
from the control and proband pools, obtained through the GENOTYPER
program, which converts fluorescently labeled bands from a gel into an
easily viewed image. The pools are compared for allele differences by
summing the areas that the two images do not share. The test statistic,
DAIP, is calculated by dividing this unshared area by the total area
under the curves (Daniels et al., 1998).
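The overlay calculation can be sketched as below. The sketch assumes that the two traces have already been aligned and sampled at the same positions on a common scale, and it takes "total area under the curves" to mean the area covered by either trace. Both points are assumptions, and the function name and trace values are hypothetical.

```python
def daip(trace_probands, trace_controls):
    """Unshared area of two overlaid traces divided by their total area.

    Each trace is a list of signal heights sampled at the same positions.
    Total area is taken as the area covered by either trace (an assumption).
    """
    if len(trace_probands) != len(trace_controls):
        raise ValueError("traces must be sampled at the same positions")
    pairs = list(zip(trace_probands, trace_controls))
    unshared = sum(abs(p - c) for p, c in pairs)
    total = sum(max(p, c) for p, c in pairs)
    if total == 0:
        raise ValueError("traces contain no signal")
    return unshared / total

probands = [0.0, 2.0, 4.0, 2.0, 0.0]
controls = [0.0, 4.0, 2.0, 2.0, 0.0]

print(daip(probands, controls))   # 0.4
print(daip(probands, probands))   # 0.0
```

Identical traces give 0 and traces with no overlap give 1, so larger values flag markers whose allele patterns differ more between pools.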
The method is intended to compare the total allele
difference between the pools rather than to actually determine the
allele frequencies of the pools themselves. Instead of trying to
correct for the errors of stutter peaks and differential amplification,
the DAIP method simply eliminates the problems through the comparison
of images; the errors essentially cancel each other out (Daniels et
al., 1998). Although the DAIP method provides less information about
the marker, the difference between pools is what most interests
researchers, because it indicates whether the marker shows large allele
differences and thus merits further study. However, the method gives
only a total allele difference and does not reveal which alleles differ,
information that can also be quite useful to researchers. Further,
errors can result from the failure of
this technique to correct for baseline variation, for shifts in allele
running size resulting in inaccurate image overlays, and for scaling
problems (Collins et al., 2000). As a result of these and other
problems, Daniels et al. (1998) recommend using the technique only as
an initial screening method and not as the basis of research results.
Total Allele Content Method
Collins et al. (2000) combined the attributes of the
stutter and amplification correction method and the DAIP method to create
the total allele content (DTAC) method. This method involves
measuring the peak height of each allele in a pool and determining
that peak’s percent contribution to that pool. These individual allele
content (DIAC) values are compared between pools and added to discern
the total allele difference between the pools, the DTAC (Collins et
al., 2000). As with the DAIP method, the DTAC method finds the
difference between two pools rather than an absolute value of allele
content in each pool. Unlike the DAIP method, however, the individual
allele differences between pools can also be examined. When compared
with the actual allele content of the pools, the DTAC value showed good
correlation, and a correlation coefficient could be determined.
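The calculation described above can be sketched as follows. The sketch assumes that the per-allele differences are combined as a plain sum of absolute values; any normalization used by Collins et al. (2000) is not stated here. The function names, allele sizes, and peak heights are hypothetical.

```python
def allele_content(peak_heights):
    """Percent contribution of each allele's peak height to its pool."""
    total = sum(peak_heights.values())
    return {allele: 100.0 * h / total for allele, h in peak_heights.items()}

def dtac(pool_a, pool_b):
    """Return per-allele differences (DIAC) and their absolute sum (DTAC).

    Pools are dicts mapping allele size to peak height. Summing the
    absolute differences without rescaling is an assumption.
    """
    content_a = allele_content(pool_a)
    content_b = allele_content(pool_b)
    alleles = sorted(set(content_a) | set(content_b))
    diac = {a: content_a.get(a, 0.0) - content_b.get(a, 0.0) for a in alleles}
    return diac, sum(abs(d) for d in diac.values())

# Hypothetical peak heights keyed by allele size in base pairs
probands = {150: 300.0, 152: 500.0, 154: 200.0}
controls = {150: 100.0, 152: 500.0, 154: 400.0}

diac, total_difference = dtac(probands, controls)
print(diac)              # {150: 20.0, 152: 0.0, 154: -20.0}
print(total_difference)  # 40.0
```

Keeping the signed per-allele values alongside the total shows which alleles drive the difference, which is the information the DAIP method does not provide.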
Collins et al. (2000) also compared the accuracy of their
method with that of the stutter and amplification correction method and
the DAIP method. They found that the DTAC method was similar in
accuracy to the stutter and amplification correction method and that
both were far more accurate than the DAIP method. The DTAC method had
the advantage over the stutter and amplification correction method in
preparation, needing no initial individual genotyping. As a result of
these comparisons, Collins et al. (2000) conclude that the DTAC method
is currently the best available approach for correcting the errors of
DNA pooling.
Conclusion
The methods examined above showcase several techniques
currently used to determine total allele content in DNA pools.
Accurately finding the allele frequencies is vital to the value of DNA
pooling. Errors are costly: false positives may lead to time lost
individually genotyping markers that show no significant differences.
Even more important, false negatives may result in the discarding of
markers with true allelic differences. On the other hand, a pooling
analysis method that requires too much preparatory work negates the
principal advantage of pooling, that of reducing workload. Thus, the
ideal method needs to be both highly accurate and efficient. To date,
the DTAC method seems to come closest to this balance, but only further
experimentation will tell whether it lives up to the ease and accuracy
claimed for it.
References
Barcellos, L.F., Klitz, W., Field, L.L., Tobias, R., Bowcock, A.M., Wilson, R., Nelson, M.P., Nagatomi, J., Thomson, G. (1997). Association mapping of disease loci by use of a pooled DNA genomic screen. Am J Hum Genet 61:734–747.
Collins, H.E., Li, H., Inda, S.E., Anderson, J., Laiho, K., Tuomilehto, J., Seldin, M. (2000). A simple and accurate method for determination of microsatellite total allele content differences between DNA pools. Hum Genet, in press (accessible online through Springer Link, DOI:10.1007/s004399900213).
Daniels, J., Holmans, P., Williams, N., Turic, D., McGuffin, P., Plomin, R., Owen, M.J. (1998). A simple method for analyzing microsatellite allele image patterns generated from DNA pools and its application to allelic association studies. Am J Hum Genet 62:1189–1197.
LeDuc, C., Miller, P., Lichter, J., Parry, P. (1995). Batched analysis of genotypes. PCR Methods Appl 4:331–336.
Perlin, M.W., Lancia, G., Ng, S.K. (1995). Toward fully automated genotyping: Genotyping microsatellite markers by deconvolution. Am J Hum Genet 57:1199–1210.