Hi there,
I've been analyzing Brassica napus transcriptomic data to look at isoform expression and the incidence of splicing events, so I used the Brassica Database (BRAD) GFF3 and FASTA files to generate my STAR index.
After a few errors I got my STAR run working, but downstream software (e.g. rMATS) requires GTF files, and the BRAD GFF3 doesn't seem to be compatible with any GFF3-to-GTF converter (I've tried gffread and GenomeTools so far).
Has anyone had similar problems with the formatting of these BRAD annotation files?
Example formatting:
chrC03 GazeA2 mRNA 28541218 28543845 572.4227 + . ID=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001
chrC03 GazeA2 UTR 28543523 28543845 6.0158 + . Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001
chrC03 GazeA2 CDS 28543454 28543522 29.9339 + 0 Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001
chrC03 GazeA2 CDS 28543158 28543369 27.5481 + 1 Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001
chrC03 GazeA2 CDS 28542958 28543060 27.3743 + 0 Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001
Columns 1-8 are mostly consistent with sample GFF3 files, but I've noticed a large gap in the mRNA row between the score and strand columns. The attribute column also differs from the examples I've seen, and I don't know whether that is an acceptable departure from the norm.
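GFF3 expects nine tab-separated columns per record, so one quick way to tell whether the odd spacing matters is to count the fields on each line. A minimal sketch in Python (the function name `find_malformed_gff3_lines` is made up for illustration, and it only checks the field count, nothing else about validity):

```python
def find_malformed_gff3_lines(lines):
    """Return (line_number, field_count) for records lacking 9 tab-separated fields."""
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        # Skip blank lines and comment/directive lines such as "##gff-version 3".
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split("\t")
        if len(fields) != 9:
            problems.append((number, len(fields)))
    return problems
```

Calling it on `open(path)` (with `path` pointing at the GFF3 file) would list any records where spaces or extra tabs have shifted the columns.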
I managed to get around this problem in STAR with:
STAR --runMode genomeGenerate --genomeDir $1 --genomeFastaFiles $genfas --sjdbOverhang 99 --sjdbGTFfile $gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbGTFfeatureExon CDS
This seems to be correct, and the subsequent mapping job was successful.
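The same idea as the STAR workaround (treat CDS rows as exons, use `Parent` as the transcript ID) could in principle be applied to write a GTF by hand. The sketch below is only an illustration under several assumptions: the file is tab-separated, there are no `exon` or `gene` rows (as in the excerpt above), touching CDS/UTR pieces belong to one exon, and the `Parent` value is reused as `gene_id` purely as a placeholder. The function names are hypothetical.

```python
from collections import defaultdict


def parse_attributes(column):
    """Turn a GFF3 attribute string 'key=value;key=value' into a dict."""
    pairs = (item.split("=", 1) for item in column.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def gff3_to_gtf_exons(lines, exon_like=("CDS", "UTR")):
    """Build GTF exon records by merging CDS/UTR intervals that share a Parent."""
    intervals = defaultdict(list)
    meta = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] not in exon_like:
            continue
        parent = parse_attributes(fields[8]).get("Parent")
        if parent is None:
            continue
        intervals[parent].append((int(fields[3]), int(fields[4])))
        meta[parent] = (fields[0], fields[1], fields[6])  # seqid, source, strand

    records = []
    for parent, spans in intervals.items():
        seqid, source, strand = meta[parent]
        merged = []
        for start, end in sorted(spans):
            # Join pieces that overlap or touch, e.g. a UTR abutting a CDS.
            if merged and start <= merged[-1][1] + 1:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        for start, end in merged:
            # Assumption: Parent doubles as gene_id because no gene rows exist.
            attrs = f'gene_id "{parent}"; transcript_id "{parent}";'
            records.append("\t".join(
                [seqid, source, "exon", str(start), str(end), ".", strand, ".", attrs]))
    return records
```

Reusing the transcript ID as the gene ID would make every isoform look like its own gene, so for splicing analysis a real gene identifier would have to be supplied from somewhere else.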
Does anyone have any ideas as to what could be causing the conversion problem, and/or any potential solutions?
Thanks in advance, I've been really wracking my brain.