Dear Galaxy Community,
I have installed Galaxy on our cluster, and I am now trying to design a workflow to process our data. However, I am facing a "technical" issue and would like your suggestions on how to solve it.
I have around 1,200 FASTQ BS-seq datasets, which I want to align to a modified reference genome. Each dataset comes from a different individual, for which I have SNP information in VCF format. I would like my workflow to substitute each individual's SNPs into the reference genome (this can be done easily with bcftools consensus or the VCFtools vcf-consensus script), index this substituted genome, and then align the FASTQ reads to the substituted, indexed genome (with Bismark).
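As a rough illustration of the three steps described above, here is a minimal Python sketch that only builds the command lines for one individual; it does not run them and is not tied to Galaxy. The function name `build_commands`, the directory layout, and the file names are hypothetical. It assumes single-end reads, a single-sample VCF that is bgzip-compressed and indexed (which `bcftools consensus` needs), and that `bcftools`, `bismark_genome_preparation` and `bismark` are on the PATH.

```python
from pathlib import Path


def build_commands(sample_id, reference_fasta, vcf_gz, fastq, work_dir):
    """Return the three per-individual command lines as argument lists.

    Hypothetical layout: <work_dir>/<sample_id>/genome holds the
    substituted genome, <work_dir>/<sample_id>/alignment holds the output.
    """
    genome_dir = Path(work_dir) / sample_id / "genome"
    consensus_fa = genome_dir / f"{sample_id}.fa"
    out_dir = Path(work_dir) / sample_id / "alignment"
    return [
        # 1. Substitute the individual's SNPs into the reference genome.
        ["bcftools", "consensus", "-f", str(reference_fasta),
         "-o", str(consensus_fa), str(vcf_gz)],
        # 2. Build the Bismark index for the substituted genome folder.
        ["bismark_genome_preparation", str(genome_dir)],
        # 3. Align the individual's reads against that genome folder.
        ["bismark", "--genome", str(genome_dir),
         "-o", str(out_dir), str(fastq)],
    ]


if __name__ == "__main__":
    for cmd in build_commands("ind001", "ref.fa", "ind001.vcf.gz",
                              "ind001.fastq", "work"):
        print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the genome directory exists; inside Galaxy the same three steps would instead be three tools chained in a workflow.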
At first, this seemed pretty straightforward to me. However, to run this pipeline, I need two clicks for each individual: one to select the Fastq file and one to select the VCF file (and a third click to press "execute", of course!). As I have 1,200 individuals (and will have more in the future), this is very laborious and error-prone.
What I would like is to be able to somehow "link" together the corresponding VCF and Fastq files for each individual, and then run the pipeline on several individuals at the same time using something like the "multiple datasets" option normally available with any tool.
Is there a way to do that? I initially thought this could possibly be done using the "dataset collection" functionality, but from what I have read it only works with 2 files of the same type. Also, as the VCF and Fastq files are not used during the same step (and not with the same tool) of the workflow, it is problematic.
For information, my Fastq and VCF files are (at the moment) stored in data libraries in Galaxy.
I am open to any suggestions, and I thank you in advance for your help!
Sincerely
David
This is an interesting use case that does not have a solution through the UI yet. It could possibly be solved by writing a script that makes use of the API (do you have programming resources?). We are discussing this and will give more feedback soon, most likely as a ticket for a future UI enhancement. Thanks! Jen, Galaxy team
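Whatever form such a script takes, its first job would be to "link" each Fastq file to the VCF file of the same individual. Below is a minimal sketch of that pairing step only, assuming (this is not stated in the thread) that both files share the individual's ID as the file name before the extension, e.g. `ind001.fastq` and `ind001.vcf`. The names `sample_id` and `pair_by_sample` are hypothetical, and no Galaxy API call is made; submitting each pair to a workflow would be a separate step, for example through the BioBlend Python library.

```python
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")
VCF_SUFFIXES = (".vcf.gz", ".vcf")


def sample_id(name, suffixes):
    """Return the file name without its suffix, or None if no suffix matches."""
    for suffix in suffixes:
        if name.endswith(suffix):
            return name[:-len(suffix)]
    return None


def pair_by_sample(fastq_names, vcf_names):
    """Pair Fastq and VCF file names that share the same individual ID.

    Returns (paired, unpaired): a dict mapping ID -> (fastq, vcf), and a
    sorted list of IDs that have only one of the two files.
    """
    fastqs = {sample_id(n, FASTQ_SUFFIXES): n for n in fastq_names
              if sample_id(n, FASTQ_SUFFIXES) is not None}
    vcfs = {sample_id(n, VCF_SUFFIXES): n for n in vcf_names
            if sample_id(n, VCF_SUFFIXES) is not None}
    paired = {s: (fastqs[s], vcfs[s])
              for s in sorted(fastqs.keys() & vcfs.keys())}
    unpaired = sorted(fastqs.keys() ^ vcfs.keys())
    return paired, unpaired


if __name__ == "__main__":
    paired, unpaired = pair_by_sample(
        ["ind001.fastq", "ind002.fq.gz", "ind003.fastq"],
        ["ind001.vcf", "ind002.vcf.gz", "ind004.vcf"],
    )
    print(paired)
    print(unpaired)
```

Reporting the unpaired IDs separately is deliberate: with more than a thousand individuals, a missing or misnamed file is easier to catch before any job is launched.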
Hi Jen and Galaxy Team,
Thank you for your very quick answer. I am a biologist with some IT skills, but I am not really skilled at scripting. That is one reason why I went for Galaxy, as the pipeline scripts left by our previous bioinformatician were not really flexible. I'll ask a friend to see if he can help me with a script, but I would definitely be interested if there is a ticket for a future enhancement. I will post the script here if I manage to write it.
Sincerely
David