GFF3 file (source: BRAD) incompatibilities


14 months ago by

Hi there,

I've been trying to analyze Brassica napus transcriptomic data to study isoform expression and the incidence of splicing events. This led me to use the Brassica Database (BRAD) GFF3 and FASTA files for my index generation (STAR).

After a few errors I managed to get my STAR run working, but subsequent software (e.g. rMATS) requires GTF files, and the BRAD GFF3 doesn't seem to be compatible with any GFF3-to-GTF conversion software.

(I've used gffread and genometools so far).

Has anyone had similar problems with the formatting of these BRAD annotation files?

Example formatting:

chrC03 GazeA2 mRNA 28541218 28543845 572.4227 + . ID=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001

chrC03 GazeA2 UTR 28543523 28543845 6.0158 + . Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001

chrC03 GazeA2 CDS 28543454 28543522 29.9339 + 0 Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001

chrC03 GazeA2 CDS 28543158 28543369 27.5481 + 1 Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001

chrC03 GazeA2 CDS 28542958 28543060 27.3743 + 0 Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001

Columns 1-8 are mostly consistent with sample GFF3 files, but I've noticed a large space in the mRNA row between the score and strand columns. The attribute column also differs from the samples, and I don't know whether this is an acceptable departure from the norm.
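One way to check observations like these across the whole file is a small diagnostic script. The following is only a sketch in Python: the function name `summarize_gff3` is made up for illustration, and it assumes the real file is tab-delimited (the example rows above are shown with spaces, which may just be how the forum renders them). It reports lines that do not split into nine tab-separated columns, which feature types occur, and which feature types carry a `Parent` attribute.

```python
from collections import Counter


def summarize_gff3(lines):
    """Summarize a GFF3-like file: bad column counts, feature types, Parent usage.

    Hypothetical helper for illustration; not part of any GFF3 tool.
    """
    bad_lines = []            # (line number, number of tab-separated fields)
    feature_types = Counter()  # how often each column-3 value occurs
    has_parent = Counter()     # feature types whose attributes include Parent=
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        fields = line.split("\t")
        if len(fields) != 9:
            bad_lines.append((number, len(fields)))
            continue
        feature_types[fields[2]] += 1
        # Column 9 is a ';'-separated list of key=value pairs.
        attributes = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        if "Parent" in attributes:
            has_parent[fields[2]] += 1
    return bad_lines, feature_types, has_parent
```

It could be called as `summarize_gff3(open("annotation.gff3"))`, where the file name is a placeholder. A non-empty `bad_lines` list would point at delimiter problems such as the spacing noted above, and the feature-type counts show at a glance whether the file contains the `gene` or `exon` features that converters may be looking for.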

I managed to get around this problem in STAR with the following command:

STAR --runMode genomeGenerate --genomeDir $1 --genomeFastaFiles $genfas --sjdbOverhang 99 --sjdbGTFfile $gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbGTFfeatureExon CDS

This seems to be correct, and the subsequent mapping job was successful.
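The same idea (treating each CDS row as an exon and linking it to its transcript through `Parent`) could in principle be applied to produce a GTF file for downstream tools. Below is a rough Python sketch, not a tested converter: the function names are hypothetical, the input is assumed to be tab-delimited, and because the excerpt shows no gene-level features, the mRNA ID is reused as both `gene_id` and `transcript_id`. Whether rMATS accepts the result has not been verified.

```python
def parse_attributes(column):
    """Turn a GFF3 attribute column ('key=value;key=value') into a dict."""
    return dict(item.split("=", 1) for item in column.split(";") if "=" in item)


def gff3_to_gtf(lines):
    """Convert mRNA rows to 'transcript' and CDS rows to 'exon' GTF lines.

    Hypothetical sketch: UTR rows and malformed rows are dropped, and the
    mRNA ID doubles as gene_id (an assumption, since no gene rows are shown).
    """
    gtf = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # skip rows without nine tab-separated columns
        attrs = parse_attributes(fields[8])
        if fields[2] == "mRNA" and "ID" in attrs:
            feature, transcript = "transcript", attrs["ID"]
        elif fields[2] == "CDS" and "Parent" in attrs:
            feature, transcript = "exon", attrs["Parent"]
        else:
            continue  # e.g. UTR rows are ignored, as in the STAR workaround
        tags = f'gene_id "{transcript}"; transcript_id "{transcript}";'
        # GTF columns: seqname, source, feature, start, end, score, strand, frame, attributes
        gtf.append("\t".join(fields[:2] + [feature] + fields[3:7] + [".", tags]))
    return gtf
```

Two limitations follow directly from these assumptions: exon boundaries exclude UTRs, so the first and last exons are truncated to their coding parts, and every transcript becomes its own gene, so isoforms of one gene are not grouped together unless a real gene identifier is substituted.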

Does anyone have any ideas as to what could be causing this problem, and/or any potential solutions?

Thanks in advance, I've been really wracking my brain.

rna-seq star gtf formatting error gff3 • 368 views


modified 14 months ago by Jennifer Hillman Jackson ♦ 25k • written 14 months ago by dejong.grant • 0

14 months ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

The GFF3 data is out of specification. Details are in the same question posted here: https://www.biostars.org/p/273355/

You might want to contact the data source to ask if they offer alternate versions of the data or what their recommendations are for using this data with other tools that are not web-based at their site. (I don't personally know and couldn't find different data with a quick browse). http://brassicadb.org/brad/contact.php

Sorry we couldn't help more, Jen, Galaxy team

