Hello,
What is the easiest way to find and count reads for exogenous transcripts (Cre, GFP, etc.) in RNA-Seq bam files using Galaxy? These transcripts are not in normal genomic databases, but I can make Fasta files for them. Thank you,
Valeri
Hello,
The tool htseq_count could be used.
This would require a reference annotation file for these exogenous transcripts in GTF format as one input. The other input is your reads, mapped to the exact same reference genome that the GTF file is based on.
There are other methods. Please let us know if you do not have a GTF file and we can go from there - please note the target genome and if the reads are RNA or DNA (assuming RNA, but please confirm).
Thanks, Jen, Galaxy team
Thanks Jen. I think what we want to do is something that will be useful for almost all researchers working with RNA-Seq. Our reads are from mouse cells and mouse tissues. We mapped them against the mm10 genome and generated GTF and BAM files. How would we generate a GTF file for Cre and GFP from their corresponding cDNA sequences?
Blast can be used to map the longer sequences to the genome. One of the output format options is tabular. This tabular data can be simply rearranged to create a GFF/GTF file for all fields but the last one (attributes) which will take more formatting.
Attributes are important, especially the gene_id and transcript_id values. Both can be the same value for certain datatypes. Create and format these from the cDNA sequence name itself.
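The rearrangement described above could look roughly like the sketch below. It assumes the default 12-column BLAST+ tabular output (-outfmt 6: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and only makes sense when the cDNA actually has a hit in the target genome. The function name, the "blast" source label, and the choice of "exon" as the feature type are illustrative assumptions, not part of any Galaxy tool.

```python
def blast6_to_gtf(lines, source="blast", feature="exon"):
    """Turn BLAST+ tabular (-outfmt 6) hit lines into GTF lines.

    Assumes the default 12 columns. The query (cDNA) name is reused
    for both gene_id and transcript_id.
    """
    gtf = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        bitscore = fields[11].strip()
        # BLAST reports minus-strand hits with sstart > send;
        # GTF always wants start <= end plus an explicit strand.
        start, end = min(sstart, send), max(sstart, send)
        strand = "+" if sstart <= send else "-"
        attributes = f'gene_id "{qseqid}"; transcript_id "{qseqid}";'
        gtf.append("\t".join([
            sseqid, source, feature, str(start), str(end),
            bitscore, strand, ".", attributes,
        ]))
    return gtf
```

Each hit becomes one GTF line, so a cDNA that aligns in several pieces would give several exon lines sharing the same gene_id and transcript_id.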
GFF/GTF specifications are available in a few places on the web; this is one with links: https://wiki.galaxyproject.org/Learn/Datatypes#GFF
Blast+ is available in the Tool Shed for use with a local or cloud Galaxy.
Dear Jennifer, Thank you for taking your time to answer my questions. I think I am missing something important. Are you explaining how to make a GTF file from a given sequence? GFP and Cre sequences are not in any genomes. What do I blast them against? Wow, it seems that this is such an easy and useful task, to find out whether a given exogenous non-genomic sequence is present in RNA-Seq reads. I am surprised that there is no easy way of doing that. Thanks, Valeri
If the sequences do not have coordinates on a reference genome, then you could map the reads to the GFP and Cre sequences directly instead. Put the target sequences into a single fasta file and use it as a Custom reference genome. Any fasta file can be used as a "reference genome"; the term is used broadly. Whole or partial transcriptomes, groups of mRNA or DNA sequences: all are acceptable as long as the data is in fasta format. https://wiki.galaxyproject.org/Support#Custom_reference_genome
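As a rough sketch of that step, the Python below merges several fasta texts into one custom reference. It also writes a matching GTF in which each sequence is a single exon spanning its full length. The GTF part is an assumption added here in case the counts are later taken with htseq_count; it is not something the custom reference genome feature requires. The function names and the "custom" source label are made up for illustration.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) tuples from fasta-formatted text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # keep only the first word of the header as the sequence name
            name, chunks = line[1:].split()[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def build_custom_reference(fasta_texts, width=60):
    """Merge fasta texts into one reference and build a matching GTF.

    Each sequence is annotated as one plus-strand exon covering
    positions 1..length, with gene_id and transcript_id set to its name.
    """
    fasta_lines, gtf_lines = [], []
    for text in fasta_texts:
        for name, seq in parse_fasta(text):
            fasta_lines.append(f">{name}")
            fasta_lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
            attributes = f'gene_id "{name}"; transcript_id "{name}";'
            gtf_lines.append("\t".join([
                name, "custom", "exon", "1", str(len(seq)),
                ".", "+", ".", attributes,
            ]))
    return "\n".join(fasta_lines) + "\n", "\n".join(gtf_lines) + "\n"
```

The two returned strings can be saved as files and uploaded to the history, the fasta as the custom reference for mapping and the GTF as the annotation for counting.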
Yes, the reads are from RNA.