Question: Indexing Files Every Time - Performance Issue
Praveen Raj Somarajan • 100 wrote:
All,
I have noticed that Galaxy/GATK indexes the reference FASTA and dbSNP
files every time it runs. Re-indexing takes time (~10 min), so it
adds to the overall run time when the tool is used multiple times. This
could be avoided by reusing the existing indexes. Here is a snapshot
of the log:
INFO 11:43:57,365 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.4-21-g30b937d, Compiled 2012/02/01 19:01:14
INFO 11:43:57,365 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 11:43:57,365 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO 11:43:57,366 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO 11:43:57,367 HelpFormatter - ---------------------------------------------------------------------------------
INFO 11:43:57,429 GenomeAnalysisEngine - Strictness is STRICT
INFO 11:43:57,432 ReferenceDataSource - Index file /tmp/tmp-gatk-6jlUfH/gatk_input.fasta.fai does not exist. Trying to create it now.
PROGRESS UPDATE: file is 15 percent complete
PROGRESS UPDATE: file is 28 percent complete
PROGRESS UPDATE: file is 91 percent complete
INFO 11:45:32,231 ReferenceDataSource - Dict file /tmp/tmp-gatk-6jlUfH/gatk_input.dict does not exist. Trying to create it now.
INFO 11:45:54,262 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 11:45:54,280 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 11:45:54,304 RMDTrackBuilder - Creating Tribble index in memory for file /tmp/tmp-gatk-6jlUfH/input_dbsnp_0.vcf
INFO 11:48:05,910 RMDTrackBuilder - Writing Tribble index to disk for file /tmp/tmp-gatk-6jlUfH/input_dbsnp_0.vcf.idx
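To put rough numbers on each indexing step, the timestamps in the log can be subtracted from one another. The sketch below is only an illustration: it assumes each step ends where the next log line begins, drops the milliseconds, and treats the last step as a lower bound because the log does not show when the Tribble index finished writing. The step labels are my own, not GATK's.

```python
from datetime import datetime

# Timestamps copied from the log excerpt (HH:MM:SS, milliseconds dropped).
# Assumption: a step ends when the next log line is printed.
EVENTS = [
    ("fasta index (.fai)", "11:43:57", "11:45:32"),
    ("sequence dictionary (.dict)", "11:45:32", "11:45:54"),
    ("dbSNP Tribble index (.idx), in-memory build only", "11:45:54", "11:48:05"),
]


def seconds_between(start, end):
    """Return whole seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


durations = {name: seconds_between(start, end) for name, start, end in EVENTS}
total = sum(durations.values())

for name, secs in durations.items():
    print(f"{name}: {secs} s")
print(f"total visible in the log: {total} s")
```

The part of the log shown here accounts for a bit over four minutes of indexing; the time spent writing the .idx file to disk is not visible in the excerpt.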
Is there an option or alternative in Galaxy to avoid this re-indexing in
/tmp? I have already built the indexes for the reference and dbSNP files.
I look forward to any suggestions.
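For reference, the log shows which index file names GATK looks for next to each input: `<fasta>.fai`, the FASTA name with its extension replaced by `.dict`, and `<vcf>.idx`. A small sketch like the following can check whether those files are present beside a given reference and dbSNP file. The function names are hypothetical, and whether Galaxy will copy or link prebuilt indexes into its temporary directory is not something this check establishes.

```python
from pathlib import Path


def expected_index_files(fasta, vcf):
    """Map each index type to the path implied by the naming seen in the log."""
    fasta = Path(fasta)
    vcf = Path(vcf)
    return {
        # gatk_input.fasta -> gatk_input.fasta.fai
        "fai": Path(str(fasta) + ".fai"),
        # gatk_input.fasta -> gatk_input.dict
        "dict": fasta.with_suffix(".dict"),
        # input_dbsnp_0.vcf -> input_dbsnp_0.vcf.idx
        "idx": Path(str(vcf) + ".idx"),
    }


def missing_index_files(fasta, vcf):
    """Return the expected index paths that do not exist on disk."""
    return [p for p in expected_index_files(fasta, vcf).values() if not p.exists()]


if __name__ == "__main__":
    # Paths taken from the log; on another machine they will simply be reported missing.
    for path in missing_index_files(
        "/tmp/tmp-gatk-6jlUfH/gatk_input.fasta",
        "/tmp/tmp-gatk-6jlUfH/input_dbsnp_0.vcf",
    ):
        print(f"missing: {path}")
```

If all three files are reported present beside the inputs GATK actually reads, the "does not exist. Trying to create it now." messages should not be triggered for them.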
Thanks,
Raj
modified 6.4 years ago by Jennifer Hillman Jackson ♦ 25k • written 6.4 years ago by Praveen Raj Somarajan • 100