Dear all, I was trying to generate a protein database using the Galaxy tool NCBI BLAST+ makeblastdb, with a FASTA file of UniRef50 (downloaded here: http://www.uniprot.org/downloads). I want to use it to run BLASTP on the TransDecoder output from my de novo transcriptome assembly. I received this error message: "30757757 sequences Fatal error: Exit code 1 () BLAST Database creation error: Error: Duplicate seq_ids are found: GNL|BL_ORD_ID:2599973" Does anybody know how I can fix this in Galaxy? What should I do? Thanks in advance.
Hello,
Duplicated FASTA IDs seem unlikely from this source.
I used the tool NormalizeFasta on the downloaded FASTA file to strip off the extra annotation on the title line (which can cause problems with tools). The options should be set to wrap the sequences at 80 bases and to remove title line content (">" lines) after the first whitespace. This results in just the FASTA IDs being retained.
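For anyone who wants to see what this normalization amounts to outside Galaxy, here is a minimal Python sketch. It is not the NormalizeFasta tool's own code; the function name `normalize_fasta` and its `width` parameter are made up for illustration. It only does the two things described above: cut each title line at the first whitespace and re-wrap the sequence lines at a fixed width.

```python
def normalize_fasta(lines, width=80):
    """Yield FASTA lines with headers cut at the first whitespace and
    sequences re-wrapped to `width` characters (hypothetical helper)."""
    seq_parts = []  # sequence chunks collected for the current record

    def flush():
        # Emit the buffered sequence in fixed-width slices, then reset.
        seq = "".join(seq_parts)
        seq_parts.clear()
        for start in range(0, len(seq), width):
            yield seq[start:start + width]

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            yield from flush()  # finish the previous record first
            fields = line[1:].split(None, 1)
            # Keep only the ID, i.e. the text before the first whitespace.
            yield ">" + (fields[0] if fields else "")
        else:
            seq_parts.append(line)
    yield from flush()  # emit the last record's sequence
```

To apply it to a file, iterate over an open file handle and write each yielded line followed by a newline to the output file. The input and output file names are up to you.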
I'm testing the makeblastdb tool on that output to see what happens next. More feedback once it completes. The data is large, so it will take some time to process. Meanwhile, you could try the same thing (normalize first, then run the tool).
Thanks and I'll follow up soon, Jen, Galaxy team
Thank you, Jen, for your reply. I will try it as well. Did you mean to set the option "Truncate sequence names at first whitespace" to Yes?
Correct, use that option.
This applies to most FASTA datasets used in Galaxy (and frankly, also on the command line - some tools are pickier about format than others). This FAQ is for custom genomes, but has good general FASTA formatting advice: https://galaxyproject.org/learn/custom-genomes/
- Galaxy FAQs: https://galaxyproject.org/support/
- Galaxy Tutorials: https://galaxyproject.org/learn/
I tried the NormalizeFasta tool (it cut the UniRef annotation down to just the ID code) and then ran makeblastdb, but I got the same error message again: "30757757 sequences Fatal error: Exit code 1 () BLAST Database creation error: Error: Duplicate seq_ids are found: GNL|BL_ORD_ID:2599973" Do you have any other ideas for what I should try? Thank you again.
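One possible next check, offered only as a sketch, is to confirm whether the normalized file really contains repeated IDs. The Python helper below is hypothetical (the name `duplicate_ids` does not come from Galaxy or BLAST+). It counts the first whitespace-delimited token of every ">" line and reports any that appear more than once. It does not explain the GNL|BL_ORD_ID message itself; it only rules duplicated title-line IDs in or out.

```python
from collections import Counter


def duplicate_ids(lines):
    """Return {sequence_id: count} for IDs seen more than once.

    The ID is taken to be the text between ">" and the first whitespace
    (an assumption matching the truncation option discussed above).
    """
    counts = Counter(
        line[1:].split(None, 1)[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    )
    return {seq_id: n for seq_id, n in counts.items() if n > 1}
```

Passing an open file handle for the FASTA file as `lines` is enough. An empty result means no ID is repeated. For a file with about 30 million records the counter will need a fair amount of memory.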