Dear all, after running BWA-MEM on a FASTQ dataset, I identified some "supplementary alignment" reads with flag 2048, corresponding to chimeric reads. I am desperately trying to find a way to extract those reads from my dataset into a new FASTQ file for further analysis. Thanks for your help (I am not an expert in Galaxy).
Hello,
First, convert the BWA-MEM output with BAM-to-SAM, without outputting the SAM header. Next, use the Filter tool on the second column (the SAM flag) with the value 2048, for example with the condition c2==2048. Check this output to confirm that only the target sequences remain, and make sure that the database metadata is assigned (click on the pencil icon to assign it if needed). Then, as the last step, extract the sequences with SamToFastq.
Best, Jen, Galaxy team
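For anyone who wants to check the filtering step outside Galaxy, here is a minimal Python sketch of the same idea. It is not what the Galaxy Filter tool runs internally; the function names and the toy SAM lines are made up for illustration. One difference from an exact match on 2048: the sketch tests the supplementary bit (0x800), so it also keeps flags such as 2064 (supplementary plus reverse strand), which an exact comparison would miss.

```python
SUPPLEMENTARY = 0x800  # 2048: the "supplementary alignment" bit of the SAM flag


def is_supplementary(sam_line):
    """Return True if a SAM alignment line has the supplementary bit set."""
    fields = sam_line.rstrip("\n").split("\t")
    # Column 2 of a SAM record is the bitwise FLAG.
    return len(fields) > 1 and (int(fields[1]) & SUPPLEMENTARY) != 0


def filter_supplementary(lines):
    """Yield supplementary alignment lines, skipping '@' header lines."""
    for line in lines:
        if line.startswith("@"):
            continue
        if is_supplementary(line):
            yield line


# Toy records (hypothetical data, not from a real run).
example = [
    "@HD\tVN:1.6\n",
    "read1\t0\tchr1\t100\t60\t4M4S\t*\t0\t0\tACGTACGT\tIIIIIIII\n",
    "read1\t2048\tchr2\t500\t60\t4H4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t2064\tchr3\t900\t60\t4M4H\t*\t0\t0\tTTGA\tIIII\n",
]

for line in filter_supplementary(example):
    name, flag = line.split("\t")[:2]
    print(name, flag)
```

In this toy input, only the two records carrying the 0x800 bit are printed; the header line and the primary alignment of read1 are skipped.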
Thanks for your help. Everything worked, and I do get FASTQ files of the filtered "2048" reads in the (UNPAIRED READS) output of SamToFastq. The problem is that each sequence is only the aligned part of the read, with the non-aligned part (the other half of the chimeric read) trimmed off. What I would like to recover is the initial full-length read that was subsequently tagged "2048", not the trimmed sequence. I'm not sure I'm being clear enough on this. Best, JP
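The trimming described here is consistent with BWA-MEM's default behaviour of hard-clipping supplementary alignments, so the SAM record itself no longer holds the full sequence. One possible workaround, sketched below in Python, is to use the supplementary records only to collect read names and then pull the matching full-length records from the original FASTQ file. This is a rough sketch under assumptions, not a tested Galaxy workflow: the function names are hypothetical, the FASTQ is assumed to use exactly four lines per record, and read names are assumed to match the SAM names once any trailing /1 or /2 is removed.

```python
def supplementary_names(sam_lines):
    """Collect names of reads that have at least one supplementary alignment."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.split("\t")
        if len(fields) > 1 and (int(fields[1]) & 0x800):
            names.add(fields[0])
    return names


def select_full_reads(fastq_lines, names):
    """Yield 4-line FASTQ records whose read name is in `names`."""
    it = iter(fastq_lines)
    # Consume the input four lines at a time (assumes unwrapped FASTQ).
    for header, seq, plus, qual in zip(it, it, it, it):
        read_id = header[1:].split()[0]
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]  # aligners usually drop the mate suffix
        if read_id in names:
            yield header, seq, plus, qual


# Toy inputs (hypothetical data).
sam = [
    "read1\t0\tchr1\t100\t60\t4M4S\t*\t0\t0\tACGTACGT\tIIIIIIII\n",
    "read1\t2048\tchr2\t500\t60\t4H4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t0\tchr3\t900\t60\t8M\t*\t0\t0\tTTGATTGA\tIIIIIIII\n",
]
fastq = [
    "@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
    "@read2\n", "TTGATTGA\n", "+\n", "IIIIIIII\n",
]

names = supplementary_names(sam)
for record in select_full_reads(fastq, names):
    print("".join(record), end="")
```

Here only read1 is written out, with its full 8-base sequence rather than the 4 bases left in the hard-clipped supplementary record.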