Hello,
Looking closely at my alignment data, I found something interesting: some of my reads align to the Y chromosome even though the sample is from an ovarian cancer cell line, i.e. a female donor.
All of these reads align to repeated regions, and for each Y-chromosome gene with aligned reads I can find a paralogue on the X chromosome.
Although these reads are few, I still fear they may distort the calculation of gene / transcript expression levels, since genes on autosomes are not affected by this duplication.
I wonder if there is any way to turn off the Y chromosome when using tophat2 (I'm aware of the simple method of temporarily removing the Y chromosome), or to merge the read counts before doing downstream analysis.
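To make the count-merging idea concrete, here is a minimal Python sketch. It assumes raw per-gene read counts are already available as a dictionary, and that a Y-to-X paralogue mapping has been curated by hand. The function name `merge_paralogue_counts`, the mapping, and the counts are all hypothetical and only illustrate the approach; this is not a TopHat2 or Cufflinks feature.

```python
from collections import Counter
from typing import Dict, Mapping


def merge_paralogue_counts(
    counts: Mapping[str, int],
    y_to_x: Mapping[str, str],
) -> Dict[str, int]:
    """Add the counts of each Y-linked gene to its X-linked paralogue.

    Genes that do not appear in y_to_x are passed through unchanged.
    """
    merged: Counter = Counter()
    for gene, n_reads in counts.items():
        # Redirect a Y gene to its X paralogue; keep every other gene as-is.
        merged[y_to_x.get(gene, gene)] += n_reads
    return dict(merged)


# Illustrative input only: the mapping must be supplied by the user,
# and these counts are made up for the example.
y_to_x = {"ZFY": "ZFX", "DDX3Y": "DDX3X"}
raw_counts = {"ZFX": 100, "ZFY": 4, "DDX3X": 250, "DDX3Y": 6, "GAPDH": 500}
merged_counts = merge_paralogue_counts(raw_counts, y_to_x)
```

This only makes sense for raw counts. Length-normalized values such as FPKM cannot simply be summed this way.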
Thanks,
Hi,
This helps. If you are working on the command line, then obtaining our version of the hg19female variant, along with assorted useful indexes (including Tophat2 ... the <dbkey>.*.bt2 files), is another option. They are all available on our rsync server in the hg19 top-level directory. Link with instructions: http://wiki.galaxyproject.org/Admin/UseGalaxyRsync
Should you decide to try this, note that the .loc files in the /location directory are formatted so that results are redirected back to the full hg19 assembly. This is very useful for visualization at UCSC, for use with other tools and reference files (later in the Cuff* tools), etc.
Good luck with your project, Jen, Galaxy team
Hi,
Thanks for your answer. I'm actually working on my local server via the command line.
Judging by the presence of *random.fa files in the genome, I think I'm using the full hg19 version. I think I'll just remove the Y chromosome and other unwanted .fa files next time.
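For the case where the reference is a single multi-FASTA file rather than one .fa file per chromosome, the removal could be sketched in Python as below. The exclusion rules (`chrY` and any name ending in `_random`) and the function names `keep_sequence` and `filter_fasta` are assumptions for illustration, so adjust them to the contigs actually present in the build.

```python
from typing import Iterable, Iterator

# Hypothetical exclusion rules; adjust to the contigs present in your build.
EXCLUDED_NAMES = {"chrY"}
EXCLUDED_SUFFIXES = ("_random",)


def keep_sequence(name: str) -> bool:
    """Return True if a sequence with this name should stay in the reference."""
    if name in EXCLUDED_NAMES:
        return False
    return not name.endswith(EXCLUDED_SUFFIXES)


def filter_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines of FASTA records whose name passes keep_sequence."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token
            # of the header line.
            fields = line[1:].split()
            keep = bool(fields) and keep_sequence(fields[0])
        if keep:
            yield line
```

With open file handles this could be used as `dst.writelines(filter_fasta(src))`. The Bowtie2 index that tophat2 uses would then need to be rebuilt from the filtered FASTA.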
Best,
Hi,
Thank you very much for the link and your effort!
Best,
Please accept the answer to help others find it. Thanks.