Hi all,
I'm having an issue with the Picard tools installed from the Tool Shed. When trying to use the CollectInsertSizeMetrics tool from the Picard suite, I'm asked to provide a reference genome as input. This should be no problem, since the tool should read the available references from the all_fasta.loc file.
However, I noticed that the tool apparently picks up its references from some other location as well. This results in two entries for the hg19 genome, which are passed to the tool as a comma-separated list of paths. The tool then enters an error state, since that isn't a valid argument.
Does anyone have an idea where Galaxy reads its reference files from, apart from the .loc files in tool-data? It seems to look for data in the folder defined in the data table, but it also seems to include the ~/galaxy-dist/tool-data/hg19/seq/hg19.fa path for some reason.
This also occurs in all other Picard-related tools that require a reference FASTA input.
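To make the symptom concrete, here is a small Python illustration. It is not Galaxy's actual code, just a sketch of what I think is happening: if two data table rows share the value hg19, collecting the path column for the selected value gives two paths, and joining them with a comma produces the kind of REFERENCE_SEQUENCE argument that picard rejects. The helper name resolve_paths and the display name of the second row are made up for the example.

```python
def resolve_paths(rows, selected_value):
    """Join the path column of every row whose value matches the selection.

    Hypothetical helper that only mimics the observed behaviour.
    """
    matches = [path for value, _dbkey, _name, path in rows
               if value == selected_value]
    return ",".join(matches)


# Columns: value, dbkey, name, path (the second row is the unexpected extra entry).
rows = [
    ("hg19", "hg19", "Human (Homo sapiens) (b37): hg19",
     "/Shared/references/hg19/seq/hg19.fa"),
    ("hg19", "hg19", "hg19 (assumed name)",
     "~/galaxy-dist/tool-data/hg19/seq/hg19.fa"),
]

reference = resolve_paths(rows, "hg19")
if "," in reference:
    # More than one row matched, so the tool would get an invalid argument.
    print("Ambiguous reference: " + reference)
```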
Thanks! M
Extra info:
- tracking Galaxy release branch 16.01
- picard tools revision 11:efc56ee1ade4 (https://toolshed.g2.bx.psu.edu/repository?repository_id=c45d6c51a4fcfc6c)
- Ubuntu 14.04 / Python 2.7
~/galaxy-dist/tool-data/all_fasta.loc:
mm10 mm10 Mouse (Mus Musculus): mm10 /Shared/references/mm10/seq/mm10.fa
danRer7 danRer7 Zebrafish (Danio rerio): danRer7 /Shared/references/danRer7/seq/danRer7.fa
hg19 hg19 Human (Homo sapiens) (b37): hg19 /Shared/references/hg19/seq/hg19.fa
hg_g1k_v37 hg_g1k_v37 Human (Homo sapiens) (b37): hg_g1k_v37 /Shared/references/hg_g1k_v37/seq/hg_g1k_v37.fa
hg38 hg38 Human (Homo sapiens) (b38): hg38 /Shared/references/hg38/seq/hg38.fa
equCab2 equCab2 Horse (Equus caballus): equCab2 /Shared/references/equCab2/seq/equCab2.fa
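One way to check for a duplicate would be to scan every all_fasta.loc under tool-data for rows that share the same first column. Below is a minimal Python sketch, assuming the usual tab-separated layout (value, dbkey, name, path). The function name find_duplicate_values is hypothetical, and the scan only covers .loc files on disk, not entries Galaxy may load through other data table configuration.

```python
import os
from collections import defaultdict


def find_duplicate_values(tool_data_dir, loc_name="all_fasta.loc"):
    """Return {value: [(loc_file, path), ...]} for values defined more than once."""
    seen = defaultdict(list)
    for root, _dirs, files in os.walk(tool_data_dir):
        for fname in files:
            if fname != loc_name:
                continue
            loc_path = os.path.join(root, fname)
            with open(loc_path) as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    # Skip blank lines and comments.
                    if not line.strip() or line.startswith("#"):
                        continue
                    fields = line.split("\t")
                    # Assumed layout: value, dbkey, name, path.
                    if len(fields) < 4:
                        continue
                    seen[fields[0]].append((loc_path, fields[3]))
    return {value: entries for value, entries in seen.items()
            if len(entries) > 1}
```

Calling find_duplicate_values with the path to the tool-data directory should return an entry for hg19 if a second loc file defines it, together with the file each definition came from.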
Excerpt from the offending tool XML:
<command>
@java_options@
##set up input files
#set $reference_fasta_filename = "localref.fa"
#if str( $reference_source.reference_source_selector ) == "history":
ln -s "${reference_source.ref_file}" "${reference_fasta_filename}" &&
#else:
#set $reference_fasta_filename = str( $reference_source.ref_file.fields.path )
#end if
java -jar \$JAVA_JAR_PATH/picard.jar
CollectInsertSizeMetrics
INPUT="${inputFile}"
OUTPUT="${outFile}"
HISTOGRAM_FILE="${histFile}"
DEVIATIONS="${deviations}"
#if str( $hist_width ):
HISTOGRAM_WIDTH="${hist_width}"
#end if
MINIMUM_PCT="${min_pct}"
REFERENCE_SEQUENCE="${reference_fasta_filename}"
ASSUME_SORTED="${assume_sorted}"
METRIC_ACCUMULATION_LEVEL="${metric_accumulation_level}"
VALIDATION_STRINGENCY="${validation_stringency}"
QUIET=true
VERBOSITY=ERROR
</command>
<inputs>
<param format="sam,bam" name="inputFile" type="data" label="Select SAM/BAM dataset or dataset collection" help="If empty, upload or import a SAM/BAM dataset."/>
<conditional name="reference_source">
<param name="reference_source_selector" type="select" label="Load reference genome from">
<option value="cached">Local cache</option>
<option value="history">History</option>
</param>
<when value="cached">
<param name="ref_file" type="select" label="Using reference genome" help="REFERENCE_SEQUENCE">
<options from_data_table="all_fasta">
</options>
<validator type="no_options" message="A built-in reference genome is not available for the build associated with the selected input file"/>
</param>
</when>
<when value="history">
<param name="ref_file" type="data" format="fasta" label="Use the folloing dataset as the reference sequence" help="REFERENCE_SEQUENCE; You can upload a FASTA sequence to the history and use it as reference" />
</when>
</conditional>