Question: Re: Assemble A Consensus Genome From NGS Data
Benjamin Dickins • 20 wrote:
Hi David,
Sorry for the slow response. I recently solved a problem quite like
this one and would be happy to share more information with you.
If your genome is small I think it makes sense to map to a reference
and identify variant sites. (In my opinion de novo assembly isn't
needed - see below).
A basic approach is: groom FASTQ file -> map with BWA -> filter SAM
(uniquely mapped reads only) -> SAM-to-BAM -> generate pileup ->
filter pileup
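
As an illustration of the "filter SAM" step, here is a minimal sketch in
Python. It is not the filter I actually ran: it assumes "uniquely mapped"
can be approximated by dropping unmapped and secondary alignments and
requiring a mapping quality of at least 1 (BWA typically gives ambiguously
placed reads a MAPQ of 0). The function names and the min_mapq default are
hypothetical.

```python
def is_uniquely_mapped(sam_line, min_mapq=1):
    """Return True if a SAM alignment line looks uniquely mapped.

    Assumption: "unique" means mapped, primary, and MAPQ >= min_mapq.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    unmapped = bool(flag & 0x4)
    secondary = bool(flag & 0x100)
    return (not unmapped) and (not secondary) and mapq >= min_mapq


def filter_sam(lines, min_mapq=1):
    """Yield header lines unchanged plus alignments that pass the filter."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("@") or is_uniquely_mapped(line, min_mapq):
            yield line
```

Applied to an open SAM file, filter_sam(handle) yields the lines to write
to the filtered file; raising min_mapq makes the filter stricter.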
This gives you a position-by-position summary relative to the
reference. The last step is important and needs the most care:
you can have it print out the differences and the total number of
non-reference bases at each position. I can share some information
about thresholding, i.e. how many non-reference bases constitute
significant evidence that a non-reference base is actually present
at that position (basically I use a binomial distribution and ask
whether the observed split of ref/non-ref bases would occur by
chance). Given that coverage of small genomes tends to be high, your
first question about determining the actual genome sequence (or the
quasispecies consensus if you prefer!) can be answered by majority
rule: i.e., use a small script (or the tools under the "Text
Manipulation" heading) to read off the base with the most support at
each position and then test whether that base == the base in the
reference nucleotide column.
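
To make the majority-rule call and the binomial threshold concrete, here
is a rough sketch in Python. It is one possible reading of the idea, not
my exact script: it assumes you already have per-position base counts
from the pileup, and the error_rate and alpha defaults are placeholder
values you would need to choose for your own data. The tail probability
is computed in log space so that it does not overflow at high coverage.

```python
from math import exp, lgamma, log


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return min(total, 1.0)


def call_position(ref_base, counts, error_rate=0.01, alpha=0.001):
    """Summarise one pileup position.

    counts maps base -> number of reads supporting it.
    error_rate and alpha are assumed values, not recommendations.
    """
    depth = sum(counts.values())
    if depth == 0:
        return {"consensus": "N", "matches_ref": False,
                "p_value": 1.0, "significant": False}
    # Majority rule: the base with the most support (ties go to the
    # base that appears first in counts).
    consensus = max(counts, key=counts.get)
    non_ref = depth - counts.get(ref_base, 0)
    # Chance of seeing at least this many non-reference bases if every
    # one of them were an independent error at rate error_rate.
    p_value = binomial_tail(non_ref, depth, error_rate)
    return {"consensus": consensus,
            "matches_ref": consensus == ref_base,
            "p_value": p_value,
            "significant": p_value < alpha}
```

For example, call_position("A", {"A": 2, "G": 98}) reports a consensus
of "G" that does not match the reference, with a very small p-value.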
It's probably also worth thinking about PCR duplicates (from library
prep) as these could be a significant source of error, but they are
tricky to identify when many reads will be identical anyway in the
input DNA.
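
For illustration only, here is a small Python sketch of the usual
position-based heuristic: treat reads that share reference, start
position and strand as duplicates and keep the first one. The tuple
layout and function name are hypothetical. With deep coverage of a small
genome, many independent fragments will share a start position, so this
will over-collapse; it is more useful for gauging how many candidate
duplicates you have than as a definitive filter.

```python
def collapse_duplicates(reads):
    """Keep the first read seen for each (reference, start, strand).

    reads: iterable of (ref_name, start, strand, sequence) tuples.
    Returns (kept_reads, number_of_reads_dropped).
    """
    seen = set()
    kept = []
    dropped = 0
    for read in reads:
        ref_name, start, strand, _sequence = read
        key = (ref_name, start, strand)
        if key in seen:
            dropped += 1  # same coordinates as an earlier read
        else:
            seen.add(key)
            kept.append(read)
    return kept, dropped
```

Comparing the consensus calls with and without this collapsing is one
way to see whether duplicates are affecting your results.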
Feel free to get in touch with me if you need a bit more clarity
and/or some more specifics...
cheers,
Ben
Benjamin Dickins
Postdoctoral Researcher
Center for Comparative Genomics and Bioinformatics
The Pennsylvania State University
302 Wartik Laboratory
University Park, PA 16802, USA
Cell/mobile: +1 814 777 1852
Office tel: +1 814 863 2185
Office fax: +1 814 865 9131
Website: http://www.bendickins.net/
Weblog: http://www.open.ac.uk/blogs/ideasblog/
modified 7.6 years ago by David Matthews • 630 • written 7.6 years ago by Benjamin Dickins • 20