What does "Contaminant list" mean? For example, when I set up the FastQC: Read QC tool, there is a drop-down box that lists my reference genome. Should I select it?
Hello,
The FastQC manual is linked from the tool form and is the best source for usage details.
However, I can tell you what I have used it for: screening known artifacts out of the analysis. For example, you might use it when an earlier FastQC run on a public dataset reported an overrepresented sequence that was identified as a likely adaptor, or when the description of the data names an adaptor. Use a tabular file: column 1 an identifier, column 2 a nucleotide string. The underlying tool may also accept a FASTA file, but as far as I know the Galaxy-wrapped version does not.
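If you want to build such a file yourself before uploading it to Galaxy as a tabular dataset, a minimal sketch could look like the one below. It assumes only the two-column layout described above (identifier, tab, nucleotide string); the function name `format_contaminant_list` and the restriction to the letters A, C, G, T and N are illustrative choices, not part of FastQC or Galaxy. The example entry is the one quoted in the tool's own help text.

```python
# Characters accepted in the sequence column. IUPAC ambiguity codes are
# deliberately left out in this simple sketch (an assumption).
NUCLEOTIDES = set("ACGTN")


def format_contaminant_list(entries):
    """Return tab-delimited text: column 1 an identifier, column 2 a nucleotide string.

    `entries` is an iterable of (name, sequence) pairs. Hypothetical helper,
    not part of FastQC or Galaxy.
    """
    lines = []
    for name, seq in entries:
        seq = seq.strip().upper()
        # A tab inside the name would create an extra column.
        if "\t" in name:
            raise ValueError(f"name must not contain a tab: {name!r}")
        # Reject empty sequences and anything that is not a plain nucleotide string.
        if not seq or set(seq) - NUCLEOTIDES:
            raise ValueError(f"not a nucleotide string: {seq!r}")
        lines.append(f"{name}\t{seq}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the file contents; save them and upload with datatype "tabular".
    text = format_contaminant_list([
        ("Illumina Small RNA RT Primer", "CAAGCAGAAGACGGCATACGA"),
    ])
    print(text, end="")
```

Keeping one entry per line with a single tab separator matches the "tab delimited file with 2 columns" wording on the tool form.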
Hopefully this helps, Jen, Galaxy team
The help text on the tool form is about all you'll find anywhere, but there is an example with some explanation here: https://github.com/csf-ngs/fastqc/blob/master/Contaminants/contaminant_list.txt
The choices you see in the FastQC tool are the tabular datasets in your history, as defined by the tool XML:
<param name="contaminants" type="data" format="tabular" optional="true" label="Contaminant list"
help="tab delimited file with 2 columns: name and sequence. For example: Illumina Small RNA RT Primer CAAGCAGAAGACGGCATACGA"/>
</inputs>
Choosing the reference genome as the contaminant sequences list would probably be a very bad idea :)
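Because the drop-down offers every tabular dataset in the history, nothing stops you from picking the wrong one. A quick pre-check like the sketch below can flag files that do not look like a contaminant list. It is an illustration under assumptions, not something FastQC or Galaxy provides: `check_contaminant_list` is a hypothetical helper, lines starting with `#` are assumed to be comments, runs of tabs are treated as one separator, and the `MAX_SEQ_LEN` cut-off is an arbitrary value chosen because adaptor and primer entries are short.

```python
import re

NUCLEOTIDES = set("ACGTN")
# Hypothetical cut-off: adaptor/primer entries are short, so a very long
# "sequence" suggests the wrong dataset (for example a genome) was chosen.
MAX_SEQ_LEN = 1000


def check_contaminant_list(text):
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines and lines assumed to be comments.
        if not line.strip() or line.startswith("#"):
            continue
        # A FASTA header is a strong sign a sequence file was picked instead.
        if line.startswith(">"):
            problems.append(f"line {lineno}: FASTA header, not a 2-column table")
            continue
        # Treat one or more consecutive tabs as a single column separator.
        fields = [f for f in re.split(r"\t+", line.strip()) if f]
        if len(fields) != 2:
            problems.append(f"line {lineno}: expected 2 columns, found {len(fields)}")
            continue
        seq = fields[1].upper()
        if set(seq) - NUCLEOTIDES:
            problems.append(f"line {lineno}: column 2 is not a nucleotide string")
        elif len(seq) > MAX_SEQ_LEN:
            problems.append(f"line {lineno}: sequence longer than {MAX_SEQ_LEN} bases")
    return problems


if __name__ == "__main__":
    genome_like = ">chr1\nACGTACGTACGT\n"
    for problem in check_contaminant_list(genome_like):
        print(problem)
```

Run on a genome-like FASTA, every line is flagged, while a file containing only the `Illumina Small RNA RT Primer` entry from the help text passes with no problems.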