Question: Mappability
Brown, Stuart • 30 wrote:
I want to intersect a few hundred genomic intervals (predicted
translocation sites from SVDetect) with low-mappability regions of the
genome (we are working with mm9 right now).
UCSC has an excellent mappability track that exactly matches our
sequencing data (50 bp k-mers), but it seems very difficult to get that
data into Galaxy. I want a BED file that summarizes intervals of low
mappability (i.e. less than 0.5 on the scale used by UCSC). The UCSC
Table Browser has an output limit of 10M lines, which seems to cover
just part of chromosome 1. It would be very messy to fetch the whole
genome piece by piece this way and then concatenate the pieces back
together.
UCSC Help suggests downloading the mappability data for the whole
genome as a bigWig file and then converting it to BED. I gave this a
try, but the result is a 4 GB file with intervals of just one or two
base pairs. Again, it would take a lot of work to get back to the nicer
BED that I could make with the UCSC tools over smaller genomic regions.
It is also very painful to upload such a huge file to Galaxy, and I
would rather not write my own parsers to filter and smooth it.
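For what it is worth, the filter-and-smooth step can be fairly small. Below is a minimal Python sketch, not a tested pipeline: it assumes the bigWig has already been converted to four-column bedGraph text (chrom, start, end, score), for example with the UCSC bigWigToBedGraph utility, and that rows are sorted by chromosome and start. The function name and the `max_gap` and `min_length` parameters are made up for illustration, and the example rows are invented values, not real mm9 data.

```python
from typing import Iterable, Iterator, Tuple

Interval = Tuple[str, int, int]


def low_mappability_intervals(
    lines: Iterable[str],
    threshold: float = 0.5,   # keep rows with score strictly below this
    max_gap: int = 0,         # merge low-score runs separated by <= max_gap bp
    min_length: int = 1,      # drop merged intervals shorter than this
) -> Iterator[Interval]:
    """Yield merged (chrom, start, end) intervals from sorted bedGraph rows."""
    current = None
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if len(fields) < 4 or line.startswith("#") or fields[0] in ("track", "browser"):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if float(fields[3]) >= threshold:
            continue
        if current and current[0] == chrom and start - current[2] <= max_gap:
            # Adjacent or nearby low-score row: extend the open interval.
            current = (chrom, current[1], max(current[2], end))
        else:
            if current and current[2] - current[1] >= min_length:
                yield current
            current = (chrom, start, end)
    if current and current[2] - current[1] >= min_length:
        yield current


# Invented example rows (tab-separated bedGraph), only to show the behaviour.
example_rows = [
    "chr1\t0\t10\t1.0",
    "chr1\t10\t12\t0.25",
    "chr1\t12\t13\t0.3333",
    "chr1\t13\t20\t1.0",
    "chr1\t22\t25\t0.1",
    "chr2\t5\t6\t0.0",
]
for chrom, start, end in low_mappability_intervals(example_rows):
    print(f"{chrom}\t{start}\t{end}")
```

For a real file, an open file handle can be passed in place of `example_rows`, so the 4 GB of text is streamed rather than loaded into memory. Raising `max_gap` smooths over short mappable stretches between low-mappability runs, and raising `min_length` drops the one- or two-base intervals; the merged three-column output should be far smaller and easier to upload to Galaxy as BED.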
Any other suggestions? Maybe someone else knows where to find a
mappability file (for mm9) that has nice intervals in a
Galaxy-compatible format.
Stuart Brown
modified 6.6 years ago by Jennifer Hillman Jackson ♦ 25k • written 6.6 years ago by Brown, Stuart • 30