I'm running my own Galaxy server, and I'm looking to import 20-40 GB NGS FASTQ files for processing... I know that's unusual... :)
Currently I'm manually uploading to the FTP directory and importing using "Choose FTP". The FTP transfer itself completes quickly, and the upload job is successfully started in the Galaxy queue.
However, the upload.py script is taking hours to complete for each file. Is there any way to speed up the process, either by linking the files directly, by skipping some of the sanity checks in the script, or by some other means?
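To illustrate what I mean by "linking directly": ideally something like an admin data-library import that links to the file on disk instead of copying it. A rough sketch of that idea using BioBlend is below; the server URL, API key, library name, path, and datatype are all placeholders, and as I understand it the server-side option allowing filesystem-path imports has to be enabled for this to work at all.

    # Rough sketch only -- placeholders throughout, not a tested recipe.
    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url='http://localhost:8080', key='ADMIN_API_KEY')

    # Create (or reuse) a data library to hold the linked datasets.
    lib = gi.libraries.create_library('large_fastq_imports')

    # Import by filesystem path, linking rather than copying the file.
    gi.libraries.upload_from_galaxy_filesystem(
        lib['id'],
        '/data/ftp/mat@example.org/sample_01.fastq',  # placeholder path
        file_type='fastqsanger',
        link_data_only='link_to_files',
    )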
Regards,
Mat
The majority of the time you spend waiting when importing large datasets into Galaxy from a local filesystem is most probably in the 'detecting metadata' step, when Galaxy is trying to reason about the data (counting sequences, etc.). This would go faster if you can run the job on a faster machine.
Besides that, I do not think there is much you can do, since Galaxy needs metadata for every dataset.
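To give a rough idea of what that step costs: even just counting reads means streaming the whole file once, something like the sketch below (the path is a placeholder, and this is not Galaxy's actual metadata code). On a 20-40 GB FASTQ that single pass on one core already takes a while, and the real metadata code does more than this.

    # Illustrative sketch only -- not Galaxy's metadata implementation.
    def count_fastq_reads(path):
        """A FASTQ record is 4 lines, so reads = total lines / 4."""
        lines = 0
        with open(path, 'rb') as handle:
            for _ in handle:
                lines += 1
        return lines // 4

    print(count_fastq_reads('/data/ftp/mat@example.org/sample_01.fastq'))  # placeholder path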
Thanks Martin. And yes, I can pretty much confirm this is the case: one core, 100% utilized for 4+ hours on the upload.py script. Maybe it's time for me to have a closer look at that script :)
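If I do dig in, I'll probably start by re-running the job under the standard-library profiler to see where the hours actually go, roughly like this (the script path and argument list are placeholders; the real command line can be copied from ps while the job is running):

    # Sketch: profile a manual re-run of upload.py with cProfile.
    import cProfile
    import pstats
    import runpy
    import sys

    # Placeholder -- replace with the real arguments of the running upload job.
    sys.argv = ['upload.py']

    cProfile.run(
        "runpy.run_path('/path/to/galaxy/tools/data_source/upload.py', run_name='__main__')",
        'upload.prof',
    )
    pstats.Stats('upload.prof').sort_stats('cumulative').print_stats(25)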