I am using RNA-seq data from two species and their hybrids. I want to find all positions with fixed differences between the two parent species, so that I can then assess differences in expression of the two alleles in the hybrids (allelic imbalance), which can be used to infer regulatory evolution between the parental species.
I'm currently trying to use the Naïve Variant Caller to do this, and I have a few questions about how it works.
1. When I run it, there is nothing in the filter field indicating whether the reads pass the quality filter (even though I set a minimum quality of 20). There’s just a period in the filter field for every position. Is it actually filtering my data or not?
2. I set the minimum number of reads to consider REF/ALT to 10. However, it seems to output any position where there is at least one base matching the reference, but doesn’t report alternate alleles at all if there aren’t at least 10. What I really want is for it to report only positions with 10+ supporting bases overall, regardless of how many do or don’t match the reference.
3. My samples are derived from Drosophila, so they’re diploid, but each sample contains tissues from 30-40 individuals. So I set the ploidy to 2, but theoretically there could be more than two bases present at a given position. What should I do to accommodate this? Set the ploidy to 4, since there are only 4 possible bases?
4. My samples have strand information, so I set it to report counts by strand. In every case, I get the same base reported for the + and - strands. Shouldn’t they be complementary, not identical? I get the same thing when running through mpileup.
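To make question 2 concrete, here is a minimal Python sketch of the post-hoc filter I have in mind, in case the tool cannot do this itself. It assumes the per-sample count field is a comma-separated string such as `+A=7,-A=2,+G=1,` (strand prefix optional); that layout is my reading of the output and may not match every version. The function names `parse_counts` and `passes_depth` are hypothetical helpers, not part of the tool.

```python
def parse_counts(count_field):
    """Parse a string like '+A=7,-A=2,+G=1,' into {'A': 9, 'G': 1}.

    Assumed format: comma-separated ALLELE=COUNT items, each optionally
    prefixed with '+' or '-' for strand. Strands are summed per allele.
    """
    counts = {}
    for item in count_field.split(","):
        if not item:  # skip the empty piece left by a trailing comma
            continue
        allele, _, n = item.partition("=")
        allele = allele.lstrip("+-")  # drop the strand prefix, if present
        counts[allele] = counts.get(allele, 0) + int(n)
    return counts


def passes_depth(count_field, min_total=10):
    """Keep a position if all bases together reach min_total,
    regardless of how many match the reference."""
    return sum(parse_counts(count_field).values()) >= min_total
```

With this, a position with 9 A's and 1 G would be kept (10 bases in total), while a position with only 9 reference bases would be dropped.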