Methods for Accurately Determining Allele Content in DNA Pools
Carolyn Phillips
Writer’s comment: From the moment I learned of Mendel and his pea plants in a freshman high school biology class, genetics has fascinated me. I have pursued this interest at UC Davis through genetics classes and a laboratory assistant job in a genetics lab. When assigned to write a review paper in English 104E (Scientific Writing), I knew immediately that I wanted to make use of the assignment to further my understanding of the research aspects of genetics. I chose the topic of DNA pooling because of its importance in my own research experience. In fact, examining the literature devoted to pooling analysis techniques proved to be both interesting and immensely valuable.
- Carolyn Phillips
Instructor’s comment: For the review assignment in English 104E: Scientific Writing, Carolyn Phillips wisely chose a topic based in genetics, a field that holds a keen interest for her and in which she already had significant experience. That background enabled her to identify a focus for her paper that is narrow enough to be clearly explainable within the constraints of the assignment. Based on a chronological organizing scheme, Carolyn’s paper describes developments and refinements in an effort to correct for errors inherent in the DNA pooling methods employed in genetics research. I admire the clarity and efficiency with which she explains these developments.
- Sondra Reid, English Department
Introduction
The loci underlying many genetic diseases have yet to be mapped on the human genome, for reasons that include the involvement of multiple loci and incomplete penetrance. To assign these loci to particular chromosomal regions, association studies, which compare allele frequencies between affected individuals (probands) and controls, must be performed across the entire human genome. With markers spaced approximately 0.4 cM apart, 10,000 microsatellite markers would be necessary to fully saturate the genome (Collins et al., 2000). For a study of 1000 probands and 1000 controls, typing every individual at every marker would require 20 million genotypings (Collins et al., 2000). Because a task of this size is impractical, a technique called DNA pooling has been developed. DNA pooling involves mixing equal amounts of DNA from each individual in a group and then treating the mixture as one would an individual sample: performing a PCR and running the product out on a gel. In this way, the total allele content of a group can be determined for one microsatellite without genotyping each member of that group individually. As such, pooling is an effective way to decrease the number of genotypings required, reducing the workload by factors of tens or hundreds.
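The workload figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the number of pools per marker and the number of replicate runs per pool (`pools` and `replicates`) are hypothetical parameters chosen for the example, not values reported in the cited studies.

```python
def individual_genotypings(markers: int, probands: int, controls: int) -> int:
    """Genotypings needed when every individual is typed at every marker."""
    return markers * (probands + controls)


def pooled_genotypings(markers: int, pools: int = 2, replicates: int = 1) -> int:
    """Genotypings needed when each pool is typed `replicates` times per marker.

    `pools` and `replicates` are hypothetical settings, not figures from the source.
    """
    return markers * pools * replicates


full = individual_genotypings(markers=10_000, probands=1000, controls=1000)
pooled = pooled_genotypings(markers=10_000, pools=2, replicates=5)

print(full)            # 20000000, the 20 million genotypings quoted above
print(pooled)          # 100000
print(full // pooled)  # 200
```

With one proband pool and one control pool, each run five times, the assumed design needs 200 times fewer genotypings than individual typing.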
Although the benefits of DNA pooling are immense, two significant sources of error must be addressed. The first, known as stutter peaks, results from slippage of the DNA polymerase during PCR. Stutter peaks appear as progressively smaller peaks just before each real allele peak, at shorter fragment sizes, and result in artificially inflated values for the smaller allele sizes. This error is consistent and reproducible for a particular marker (Perlin et al., 1995). The other major source of error is preferential amplification of some alleles over others. In this situation, uneven PCR replication and amplification result from the differing sizes of the fragments being replicated. Over the years, several methods have been developed to overcome these sources of error, making DNA pooling a practical method of screening for disease loci.
Early Methods for Stutter Correction
LeDuc et al. (1995) developed one of the first methods to correct for the stutter artifact. They measured the allele and stutter peak heights for the smallest allele of the pool and for any allele not immediately adjacent to other alleles. This information was used to calculate a stutter ratio, which was then used to correct the rest of the allele peak heights, providing estimates of the total allele content of the pool (LeDuc et al., 1995).
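The idea of a single stutter ratio can be illustrated with a simplified sketch. It is not the published procedure of LeDuc et al. (1995). It assumes that each allele produces one stutter peak exactly one repeat shorter than itself, that the ratio of stutter height to allele height is the same for every allele, and that the observed peaks cover consecutive repeat sizes. The function names are hypothetical.

```python
from typing import List, Sequence


def estimate_stutter_ratio(stutter_height: float, allele_height: float) -> float:
    """Ratio measured on an isolated allele and its own stutter peak."""
    return stutter_height / allele_height


def correct_stutter(observed: Sequence[float], ratio: float) -> List[float]:
    """Remove stutter from peak heights ordered from smallest to largest allele.

    Assumed model: observed[i] = true[i] + ratio * true[i + 1].
    The largest allele carries no stutter, so the correction works downward.
    """
    corrected = [0.0] * len(observed)
    carry = 0.0  # stutter contributed by the next-larger allele
    for i in range(len(observed) - 1, -1, -1):
        corrected[i] = max(observed[i] - carry, 0.0)
        carry = ratio * corrected[i]
    return corrected


def to_frequencies(heights: Sequence[float]) -> List[float]:
    """Convert corrected peak heights to allele frequencies."""
    total = sum(heights)
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return [h / total for h in heights]


ratio = estimate_stutter_ratio(stutter_height=8.0, allele_height=32.0)  # 0.25
corrected = correct_stutter([12.0, 24.0, 32.0], ratio)
print(corrected)  # [8.0, 16.0, 32.0]
```

In this made-up pool the smallest allele appears to contribute 12/68 (about 18%) of the signal before correction and 8/56 (about 14%) afterward, which shows how stutter inflates the smaller allele sizes.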
Also correcting for stutter peaks, Perlin et al. (1995) developed a more complex and accurate method, which used the stutter peaks’ consistent nature to predict and subsequently eliminate the error. The stutter pattern for each allele of a microsatellite was measured and entered as a column of a matrix. Once the matrix had been built, mathematical algorithms were applied to deconvolve the stutter patterns and produce the allele frequencies. The individual stutter patterns detected were so complicated that Perlin et al. (1995) hypothesized that multiple markers could be run on top of one another on a gel and then separated and genotyped using the developed matrices.
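The matrix formulation can be sketched as a small linear system. This is a minimal illustration under stated assumptions, not the algorithms of Perlin et al. (1995): the observed signal is modeled as the response matrix multiplied by the true allele amounts, and stutter is assumed to produce only shorter fragments, so the matrix is upper triangular and can be solved by back-substitution. The name `deconvolve` and the example numbers are hypothetical.

```python
from typing import List, Sequence


def deconvolve(observed: Sequence[float],
               response: Sequence[Sequence[float]]) -> List[float]:
    """Recover allele amounts from a stutter-blurred signal.

    response[i][j] is the signal seen at size bin i per unit of allele j,
    so column j holds the measured stutter pattern of allele j.
    Assumption: response[i][j] == 0 whenever i > j (stutter only shortens).
    """
    n = len(observed)
    amounts = [0.0] * n
    for j in range(n - 1, -1, -1):
        # Signal in bin j that is really stutter from larger alleles.
        spill = sum(response[j][k] * amounts[k] for k in range(j + 1, n))
        amounts[j] = max((observed[j] - spill) / response[j][j], 0.0)
    return amounts


# Hypothetical stutter patterns for three alleles (columns), smallest first.
response = [
    [1.0, 0.25, 0.125],
    [0.0, 1.0, 0.5],
    [0.0, 0.0, 1.0],
]
observed = [8.0, 16.0, 16.0]

amounts = deconvolve(observed, response)
print(amounts)  # [4.0, 8.0, 16.0]
```

Because each allele has its own column, this form allows a different stutter pattern per allele, which a single shared stutter ratio cannot express.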
Stutter and Amplification Correction Method
Building on previously developed methods for eliminating stutter, Barcellos et al. (1997) created their own statistical method, the stutter and amplification correction method, to eliminate error resulting from both stutter peaks and preferential amplification. This method requires that 20 individuals be typed individually for each marker before that marker is used in pooling, so that a correction matrix can be developed (Barcellos et al., 1997). The matrix can then be applied to all pools subsequently genotyped for that marker, significantly increasing pooling accuracy over previous methods. The individual genotyping needs to be performed only once per marker, after which the information remains constant; even so, Collins et al. (2000) believe that this extra step could increase the work necessary by more than an order of magnitude. Barcellos et al. (1997) suggest that collaboration among several laboratories, with the genotyping divided among them, would increase efficiency.
Allele Image Patterns Method
Daniels et al. (1998) wanted a technique that could correct for stutter peaks and differential amplification without the labor-intensive individual typing required by previous methods. They developed what they called the allele image patterns (DAIP) method. This method involves overlaying two images, one each from the control and proband pools, obtained through the GENOTYPER program, which converts fluorescently labeled bands from a gel into an easily viewed image. The pools are compared for allele differences by summing the areas that the two images do not share. The test statistic, DAIP, is calculated by dividing the unshared area by the total area under the curves (Daniels et al., 1998).
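The overlay calculation described above can be sketched numerically. The sketch rests on several assumptions that are not taken from Daniels et al. (1998): both traces are sampled at the same fragment sizes, each trace is first scaled to unit area so that the overlay is comparable, and "total area under the curves" is read as the area covered by either curve. The function names are hypothetical.

```python
from typing import List, Sequence


def normalize(trace: Sequence[float]) -> List[float]:
    """Scale a trace so that its total area is 1 (assumed preprocessing step)."""
    total = sum(trace)
    if total <= 0:
        raise ValueError("trace must have positive total signal")
    return [v / total for v in trace]


def daip(trace_a: Sequence[float], trace_b: Sequence[float]) -> float:
    """Unshared area of two overlaid traces divided by the area they cover."""
    if len(trace_a) != len(trace_b):
        raise ValueError("traces must be sampled at the same positions")
    a, b = normalize(trace_a), normalize(trace_b)
    unshared = sum(abs(x - y) for x, y in zip(a, b))  # area under only one curve
    covered = sum(max(x, y) for x, y in zip(a, b))    # area under either curve
    return unshared / covered


probands = [0.0, 2.0, 0.0, 2.0]
controls = [0.0, 2.0, 2.0, 0.0]
print(round(daip(probands, controls), 3))  # 0.667
print(daip(probands, probands))            # 0.0
```

Under this reading the statistic runs from 0 for identical images to 1 for images with no overlap. Because stutter and amplification bias affect both traces in the same way, neither needs to be modeled explicitly.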
The method is intended to compare the total allele difference between the pools rather than to determine the allele frequencies of the pools themselves. Instead of trying to correct for the errors of stutter peaks and differential amplification, the DAIP method simply eliminates the problems through the comparison of images; the errors essentially cancel each other out (Daniels et al., 1998). Although the DAIP method provides less information about the marker, the difference between pools is what interests researchers most, because it indicates whether the marker shows large allele differences and thus merits further study. However, this method gives only a total allele difference and does not reveal which alleles show the differences, information that can also be quite useful to researchers. Further, errors can result from the failure of this technique to correct for baseline variation, for shifts in allele running size resulting in inaccurate image overlays, and for scaling problems (Collins et al., 2000). As a result of these and other problems, Daniels et al. (1998) recommend using the technique only as an initial screening method and not as the basis of research results.
Total Allele Content Method
Collins et al. (2000) combined the attributes of the stutter and amplification correction method and the DAIP method to create the total allele content (DTAC) method. This method involves measuring the peak height of each allele in a pool and determining that peak’s percent contribution to that pool. These individual allele content (DIAC) values are compared between pools and added to give the total allele difference between the pools, the DTAC (Collins et al., 2000). As with the DAIP method, the DTAC method finds the difference between two pools rather than an absolute value of allele content in each pool. Unlike the DAIP method, however, it also allows the individual allele differences between pools to be examined. When compared with the actual allele content of the pools, the DTAC value showed good correlation, and a correlation coefficient could be determined.
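The calculation described above can be sketched as follows. This is an illustration, not the exact formula of Collins et al. (2000). It assumes that the per-allele differences are summed as absolute values, since signed percentage differences between two pools always sum to zero. The function names and peak heights are hypothetical.

```python
from typing import Dict


def percent_contribution(peak_heights: Dict[int, float]) -> Dict[int, float]:
    """Each allele's peak height as a percentage of the pool's total signal."""
    total = sum(peak_heights.values())
    if total <= 0:
        raise ValueError("pool must have positive total peak height")
    return {allele: 100.0 * h / total for allele, h in peak_heights.items()}


def diac(pool_a: Dict[int, float], pool_b: Dict[int, float]) -> Dict[int, float]:
    """Per-allele difference in percent contribution (pool_a minus pool_b)."""
    pa, pb = percent_contribution(pool_a), percent_contribution(pool_b)
    alleles = sorted(set(pa) | set(pb))  # an allele may be absent from one pool
    return {al: pa.get(al, 0.0) - pb.get(al, 0.0) for al in alleles}


def dtac(pool_a: Dict[int, float], pool_b: Dict[int, float]) -> float:
    """Sum of absolute per-allele differences (assumed convention)."""
    return sum(abs(d) for d in diac(pool_a, pool_b).values())


# Hypothetical peak heights keyed by allele size in base pairs.
probands = {100: 50.0, 102: 30.0, 104: 20.0}
controls = {100: 20.0, 102: 30.0, 104: 50.0}

print(diac(probands, controls))  # {100: 30.0, 102: 0.0, 104: -30.0}
print(dtac(probands, controls))  # 60.0
```

Keeping the per-allele values alongside the total shows which alleles drive the difference, which is the information the image overlay approach does not provide.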
Collins et al. (2000) also compared the accuracy of their method with that of the stutter and amplification correction method and the DAIP method. They found that the DTAC method was similar in accuracy to the stutter and amplification correction method, and that both were far more accurate than the DAIP method. The DTAC method had the advantage over the stutter and amplification correction method in preparation, since it needs no initial individual genotyping. As a result of these comparisons, Collins et al. (2000) conclude that the DTAC method is currently the best option for correcting the errors of DNA pooling.
Conclusion
The methods examined above showcase several techniques currently used to determine total allele content in DNA pools. Accurately finding the allele frequencies is vital to the value of DNA pooling. Errors can be costly; false positives may lead to time lost individually genotyping markers that show no significant differences. Even more important, false negatives may result in the discarding of markers with true allelic differences. On the other hand, a pooling analysis method that requires too much preparatory work negates the principal advantage of pooling, that of reducing workload. Thus, the ideal method needs to be both highly accurate and efficient. So far, the DTAC method seems to lie closest to this middle ground, but only further experimentation will tell whether it lives up to the ease and accuracy it claims.
Literature Cited
Barcellos, L.F., Klitz, W., Field, L.L., Tobias, R., Bowcock, A.M., Wilson, R., Nelson, M.P., Nagatomi, J., Thomson, G. (1997). Association mapping of disease loci by use of a pooled DNA genomic screen. Am J Hum Genet 61:734–747.
Collins, H.E., Li, H., Inda, S.E., Anderson, J., Laiho, K., Tuomilehto, J., Seldin, M. (2000). A simple and accurate method for determination of microsatellite total allele content differences between DNA pools. Human Genetics, in press (accessible online through Springer Link, DOI:10.1007/s004399900213).
Daniels, J., Holmans, P., Williams, N., Turic, D., McGuffin, P., Plomin, R., Owen, M.J. (1998). A simple method for analyzing microsatellite allele image patterns generated from DNA pools and its application to allelic association studies. Am J Hum Genet 62:1189–1197.
LeDuc, C., Miller, P., Lichter, J., Parry, P. (1995). Batched analysis of genotypes. PCR Methods Appl 4:331–336.
Perlin, M.W., Lancia, G., Ng, S.K. (1995). Toward fully automated genotyping: Genotyping microsatellite markers by deconvolution. Am J Hum Genet 57:1199–1210.