PCR-based versus Shotgun-Clone DNA sequence analysis
(a) A large genome (10^6 to 10^7 bp or more) is cut into random fragments, which are cloned into a plasmid vector to form a library. A large number of these plasmids are sequenced at random, using a plasmid-specific sequencing primer (purple squiggle). The final genomic sequence is obtained by assembling the sequences of multiple overlapping fragments. In this example, the base call at a particular nucleotide position (red line) is the consensus of three overlapping clones (in practice, any given position may be read 5 to 10 or more times). The goal of the sequencing effort is to obtain the complete sequence of a single large genome with great accuracy. Full-length reading of any given clone is not essential, since genome coverage is redundant and regions with low sequence accuracy can be automatically excluded from the consensus.
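The consensus step in (a) can be illustrated with a minimal sketch. It assumes the reads have already been placed on the genome (real assemblers must first find the overlaps), and the function name `consensus`, the `(start, sequence, qualities)` read layout, and the quality cutoff of 20 are all hypothetical choices made for illustration, not part of any particular assembler.

```python
from collections import Counter

def consensus(reads, genome_length, min_quality=20):
    """Majority-vote consensus from pre-aligned reads.

    reads: list of (start, sequence, qualities) tuples, where start is the
    0-based genome position of the first base (an assumed layout).
    Base calls below min_quality are excluded from the vote.
    """
    columns = [Counter() for _ in range(genome_length)]
    for start, seq, quals in reads:
        for offset, (base, q) in enumerate(zip(seq, quals)):
            pos = start + offset
            if q >= min_quality and 0 <= pos < genome_length:
                columns[pos][base] += 1
    # Positions with no acceptable base call are reported as N.
    return "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )

# Three overlapping toy reads; the last base of the second read is low quality.
reads = [
    (0, "ACGTAC", [30, 30, 30, 30, 30, 30]),
    (2, "GTACGG", [30, 30, 30, 30, 30, 10]),
    (4, "ACGTTA", [30, 30, 30, 30, 30, 30]),
]
result = consensus(reads, 10)
print(result)  # ACGTACGTTA
```

In this toy data, positions 4 and 5 are covered by all three reads, and the low-quality G at position 7 is dropped, so the call there comes from the third read alone. This is why an unreliable read end costs little when coverage is redundant.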
(b) A small mitochondrial genome (~1.7 x 10^4 bp) is PCR-amplified as a set of ~17 overlapping fragments of ~1 kbp each. The PCR primer pairs (red and blue line segments) are designed from knowledge of the sequence of the same or a related species. Multiple genomes are analyzed: in the example, 10 different individuals are sequenced (in practice, this may be hundreds from any one species). Each nucleotide is read exactly twice, once from the forward and once from the reverse primer of a particular fragment. The goal of the sequencing effort is to identify single-nucleotide polymorphisms (SNPs) among many individuals (red blocks). Accurate, full-length reading of longer sequences (green squiggles) is essential to identify genuine SNPs within populations from a minimum number of PCR fragments.
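The two-read logic in (b) can be sketched as follows. This is an illustrative sketch only: it assumes the forward and reverse reads of a fragment span exactly the same bases with no insertions or deletions, and the helper names (`reverse_complement`, `confirmed_sequence`, `find_snps`) and the toy sequences are hypothetical. A base is accepted only when both reads agree, and a site counts as a SNP only when confirmed bases differ among individuals.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def confirmed_sequence(forward, reverse):
    """Keep a base only where the forward read and the reverse-complemented
    reverse read agree; disagreements become N."""
    rc = reverse_complement(reverse)
    return "".join(f if f == r else "N" for f, r in zip(forward, rc))

def find_snps(sequences):
    """Map each variable position to its sorted alleles, ignoring N."""
    snps = {}
    for pos, column in enumerate(zip(*sequences.values())):
        alleles = set(column) - {"N"}
        if len(alleles) > 1:
            snps[pos] = sorted(alleles)
    return snps

# Toy (forward, reverse) read pairs for one fragment in three individuals.
read_pairs = {
    "ind1": ("AACCGGTA", "TACCGGTT"),
    "ind2": ("AACTGGTA", "TACCAGTT"),  # genuine C>T difference at position 3
    "ind3": ("AACCGGAA", "TACCGGTT"),  # forward-read error at position 6
}
confirmed = {name: confirmed_sequence(f, r) for name, (f, r) in read_pairs.items()}
snps = find_snps(confirmed)
print(confirmed["ind3"])  # AACCGGNA
print(snps)               # {3: ['C', 'T']}
```

The error in the forward read of `ind3` is not confirmed by its reverse read, so it is masked rather than reported as a SNP. With only two reads per position there is no redundancy beyond this check, which is why read accuracy matters more here than in (a).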