Question: CD-HIT-EST-2D job stops with no error output
    
peri.tobias wrote:
I am using an HPC cluster at my facility to cluster eight de novo transcriptomes for downstream analysis with CD-HIT-EST-2D. Here is my script, based on the manual at:
http://weizhong-lab.ucsd.edu/cd-hit/wiki/doku.php?id=cd-hit_user_guide:
cd-hit-est-2d -i BU_Trinity.fasta -i2 BS_Trinity.fasta -o SYZ1.fasta -c 0.95 -n 10 -d 0 -M 16000 - T 8
There is no error output, so I can't work out the problem, but the job stops with the following log:
Job Name: SYZ1_CD-HIT Execution terminated Exit_status=1 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.ncpus=20 resources_used.vmem=0kb resources_used.walltime=00:00:02
I would be grateful if anyone can spot anything wrong with my script. Many thanks in advance,
Peri
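A hedged note on the command above: `- T 8` contains a space, so cd-hit-est-2d receives `-` and `T` as two separate arguments instead of the `-T` (threads) option. I have not confirmed that this alone produces the silent exit_status=1, so treat it as a likely suspect rather than a diagnosis. `-M 16000` (in MB) exceeding the memory requested from the scheduler is another possibility. The sketch below defines two hypothetical shell helpers, `check_args` and `run_logged` (they are not part of CD-HIT or PBS). They flag a bare `-` argument and write the command's stdout, stderr and exit status to a log file, so a failing job leaves something to read.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of CD-HIT): return non-zero if any argument
# is a bare "-", which is what a stray space such as "- T 8" produces.
check_args() {
  local arg
  for arg in "$@"; do
    if [ "$arg" = "-" ]; then
      echo "suspicious bare '-' argument in: $*" >&2
      return 1
    fi
  done
  return 0
}

# Hypothetical helper: run a command, append its stdout and stderr to a log
# file, then append the exit status and pass it back to the caller.
run_logged() {
  local log="$1"
  shift
  "$@" >>"$log" 2>&1
  local status=$?
  echo "exit status: $status" >>"$log"
  return "$status"
}

# Example use inside a job script (file names from the post above):
#   args=(-i BU_Trinity.fasta -i2 BS_Trinity.fasta -o SYZ1.fasta
#         -c 0.95 -n 10 -d 0 -M 16000 -T 8)
#   check_args "${args[@]}" && run_logged cdhit.log cd-hit-est-2d "${args[@]}"
```

Checking the arguments before launching costs nothing and catches the stray-space typo even when the tool itself exits without a clear message.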

I had success after changing my parameters slightly (see below), so I am answering my own question.
cd-hit-est-2d -i BU_Trinity.fasta -i2 BS_Trinity.fasta -o SYZ1.fasta -c 0.95 -n 10 -d 0 -M 0 -T 0
I got two output files: the new clustered FASTA file and a list of the clustered contigs. Interestingly, the input FASTA files were 167M and 141M in size, while the new FASTA is 91M. I hope I am not losing sequences from closely related genes, as these are the ones I want to examine in my analysis. Any comments or advice are appreciated.
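On the output size: as I read the CD-HIT user guide, cd-hit-est-2d does not merge the two inputs. Its output FASTA holds the sequences from the second file (`-i2`, here BS_Trinity.fasta) that are not similar to any sequence in the first file (`-i`) at the `-c` threshold. The `.clstr` file lists the `-i2` sequences that did match. If that holds, the 91M file is a subset of BS_Trinity.fasta and contains no BU_Trinity.fasta sequences at all. A rough way to check is to compare record counts. The sketch below uses hypothetical helpers `count_fasta` and `count_clstr_members`. It assumes that member lines in `.clstr` end with a percentage (e.g. `at +/95.00%`) and representative lines end with `*`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count records in a FASTA file (one '>' header per record).
count_fasta() {
  grep -c '^>' "$1"
}

# Hypothetical helper: count member (non-representative) entries in a CD-HIT
# .clstr file. Assumes member lines end with a percentage such as
# "at +/95.00%" and representative lines end with "*".
count_clstr_members() {
  grep -c '%$' "$1"
}

# Example use with the file names from the post (cd-hit writes the cluster
# list next to the output as <output>.clstr):
#   echo "BU_Trinity.fasta (-i):  $(count_fasta BU_Trinity.fasta)"
#   echo "BS_Trinity.fasta (-i2): $(count_fasta BS_Trinity.fasta)"
#   echo "kept in SYZ1.fasta:     $(count_fasta SYZ1.fasta)"
#   echo "matched -i2 sequences:  $(count_clstr_members SYZ1.fasta.clstr)"
```

If the reading above is right, the `-i2` count should roughly equal the kept count plus the matched count. A large gap would be worth investigating before relying on the output.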