Hello everyone,
I am stuck at the very beginning of an exome sequencing analysis, probably because of a problem with the data format. I have a patient with an unknown mutation, and I want to compare the patient's data with the parents' data to find candidate genes. We commissioned a company to do the whole-exome sequencing, and they sent us the BAM files.
I tried to follow the described workflow for exome sequencing and wanted to use FreeBayes, but it did not work, and neither did the other tools. Every time, the following error message appeared: "Sequences are not currently available for the specified build." I then went one step back and converted the BAM files to SAM files, which worked. However, converting these SAM files back to BAM files fails with the same error message.
Does anyone have an idea what is wrong with these BAM files and how I can solve this problem? I had to upload the files via FTP because of their size; could this be the cause? I am very inexperienced in this field and would be deeply grateful for any hint.
Thanks a lot and best regards, Nadja
Hi Nadja
I presume this data is on usegalaxy.org? If it is, could you please run the BAM files through the Flagstat tool and report on the statistics? And which species are these reads from?
Thank you very much. Yes, the data is on usegalaxy.org. I will try this Flagstat tool now. These reads are from human.
This is the result for my first BAM file with the flagstat tool.
57702389 + 0 in total (QC-passed reads + QC-failed reads)
402873 + 0 secondary
0 + 0 supplementary
8685951 + 0 duplicates
57298398 + 0 mapped (99.30%:-nan%)
57299516 + 0 paired in sequencing
Well, I do not have the experience to judge whether these numbers are good or bad, so I will have to look into that. Do they give any insight into whether the BAM files should work with the other tools (FreeBayes, etc.)?
The data here are from an intact BAM file and represent well-mapped (likely filtered) results for paired-end sequencing.
The problem is with the database/build assignment or possibly a sorting issue. See comment below for details.
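For anyone who wants to sanity-check the flagstat numbers posted above, here is a minimal Python sketch that parses the reported lines and derives a few ratios. The `parse_flagstat` helper is a name made up for this illustration, not part of samtools or Galaxy, and only the six lines quoted in this thread are used.

```python
import re

# Flagstat text copied from the post above (only the lines that were reported).
FLAGSTAT = """\
57702389 + 0 in total (QC-passed reads + QC-failed reads)
402873 + 0 secondary
0 + 0 supplementary
8685951 + 0 duplicates
57298398 + 0 mapped (99.30%:-nan%)
57299516 + 0 paired in sequencing
"""


def parse_flagstat(text):
    """Return {category: QC-passed count} for each flagstat line (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"(\d+) \+ (\d+) (.+)", line.strip())
        if not m:
            continue
        # Drop a trailing "(...)" annotation to get a short category name.
        name = re.sub(r"\s*\(.*\)$", "", m.group(3))
        counts[name] = int(m.group(1))
    return counts


counts = parse_flagstat(FLAGSTAT)
mapped_pct = 100 * counts["mapped"] / counts["in total"]
duplicate_pct = 100 * counts["duplicates"] / counts["in total"]
# Records left after removing secondary and supplementary alignments.
primary = counts["in total"] - counts["secondary"] - counts["supplementary"]

print(f"mapped: {mapped_pct:.2f}%")
print(f"duplicates: {duplicate_pct:.2f}%")
print(f"primary records: {primary}")
print(f"paired in sequencing: {counts['paired in sequencing']}")
```

The mapped fraction reproduces the 99.30% that flagstat printed, and the number of primary records equals the "paired in sequencing" count. None of this says anything about the genome build, which is why the header has to be checked separately.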
What build was used for the BAM files? Depending on the tools I try to use, I sometimes run into a problem if I am using hg19 but hg18 usually works.
Thank you for your answer. Unfortunately, I do not know which build was used for the BAM files, because the sequencing was done by a company, but I will ask them. I tried both hg19 and hg18, and neither worked.
So the right build should be hg19.
Hi Nadja, see my comment below for how to confirm that hg19 is represented in the BAM dataset exactly as released by UCSC (otherwise this could be a genome build mismatch problem). Jen, Galaxy team
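One way to approach the build check outside Galaxy is to look at the BAM header (for example the text printed by `samtools view -H`) and compare its `@SQ` lines with the chromosome names and lengths of UCSC hg19. The Python sketch below does this for a handful of chromosomes. The header in it is a hypothetical example, not taken from Nadja's files, and `parse_header` and `compare_to_hg19` are names invented for this illustration. It also reads the sort order from the `@HD` line, since sorting was mentioned as a possible cause.

```python
# Lengths of a few UCSC hg19 chromosomes; a full check would cover all of them.
HG19_LENGTHS = {
    "chr1": 249250621,
    "chr2": 243199373,
    "chrX": 155270560,
    "chrM": 16571,
}

# HYPOTHETICAL header for illustration only (names without the "chr" prefix).
EXAMPLE_HEADER = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:1\tLN:249250621\n"
    "@SQ\tSN:2\tLN:243199373\n"
    "@SQ\tSN:X\tLN:155270560\n"
    "@SQ\tSN:MT\tLN:16569\n"
)


def parse_header(text):
    """Return (sort_order, {sequence name: length}) from SAM header text."""
    sort_order = None
    sequences = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # Header fields after the record type look like TAG:value.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if fields[0] == "@HD":
            sort_order = tags.get("SO")
        elif fields[0] == "@SQ":
            sequences[tags["SN"]] = int(tags["LN"])
    return sort_order, sequences


def compare_to_hg19(sequences):
    """Classify each listed hg19 chromosome as ok, renamed, or not found."""
    report = {}
    for name, length in HG19_LENGTHS.items():
        if sequences.get(name) == length:
            report[name] = "ok"
        elif length in sequences.values():
            report[name] = "same length under a different name"
        else:
            report[name] = "not found"
    return report


sort_order, sequences = parse_header(EXAMPLE_HEADER)
report = compare_to_hg19(sequences)
print("sort order:", sort_order)
for name, status in report.items():
    print(name, "->", status)
```

In this made-up header the autosomes and chrX have hg19 lengths but are named without the `chr` prefix, and the mitochondrial sequence has a different length, so the dataset would not match UCSC hg19 exactly even though the assembly is closely related. Whether that is what happened with the company's files can only be told from their actual header.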