Hi, I am using a local instance of Galaxy. I have uploaded my datasets, which are in fastq.gz format. The first step I need to perform is running "FASTQ Groomer" to get my files into the correct format for TopHat. When using the web-based version of Galaxy, I believe it automatically decompresses the files in the background, so the uploaded files can be used readily in "FASTQ Groomer". However, that is not the case in my local instance of Galaxy. Is there a separate tool that needs to be installed for a local instance of Galaxy to decompress the .gz files? Or do the files need to be decompressed before uploading? Thanks in advance for your time.
Hello,
By default, uploaded files are uncompressed when added to a history. Was there some administrative change to leave them compressed (it is an option)? Or is the file just still named ".gz" while the dataset is actually uncompressed? The file names are not changed upon upload.
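One quick way to check that second possibility locally is to look at the first two bytes of the file on disk: gzip files always begin with the magic number 0x1f 0x8b. A minimal Python sketch (the file name is just a placeholder):

```python
def is_gzipped(path):
    # gzip files always begin with the two-byte magic number 0x1f 0x8b
    with open(path, "rb") as fh:
        return fh.read(2) == b"\x1f\x8b"

# A dataset that kept its ".gz" name but was uncompressed on upload
# will return False here ("sample.fastq.gz" is a placeholder path).
print(is_gzipped("sample.fastq.gz"))
```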
If uncompressed, maybe the tool is not picking them up because of a missing datatype assignment. The datatype should be "fastq" or one of its subtypes. https://wiki.galaxyproject.org/Support#Tool_doesn.27t_recognize_dataset
And finally, you might not need to groom at all. See this wiki page about how to tell whether the datatype "fastqsanger" can simply be assigned directly. https://wiki.galaxyproject.org/Support#FASTQ_Datatype_QA
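If you want to check locally whether the reads are already Sanger-encoded before deciding to groom, a rough heuristic is to scan the quality lines: characters below ASCII 59 (';') only occur in Phred+33 (fastqsanger), while a minimum of ASCII 64 ('@') or above suggests Phred+64. A sketch of that heuristic (the function, file name, and read limit are placeholders, not a Galaxy tool):

```python
import gzip

# Hypothetical local check: scan quality lines to guess the FASTQ encoding.
def guess_fastq_encoding(path, max_reads=10000):
    opener = gzip.open if path.endswith(".gz") else open
    lo = 255
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i >= max_reads * 4:
                break
            if i % 4 == 3:  # every 4th line of a record is the quality string
                qual = line.rstrip("\n")
                if qual:
                    lo = min(lo, min(ord(c) for c in qual))
    if lo < 59:
        return "phred33: safe to assign fastqsanger directly"
    if lo >= 64:
        return "phred64: groom/convert before assigning fastqsanger"
    return "ambiguous: inspect more reads"

print(guess_fastq_encoding("sample.fastq.gz"))  # placeholder file name
```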
Let us know if this does not address the issue. Jen, Galaxy team
Thank you Jennifer! I did not make any changes to the administrative settings to leave them compressed. Instead of uploading all the files, I linked them to a folder in the directory, as I am dealing with a large dataset. Could that have been the issue? If so, is there a way to address it?
If these were added to a Data Library by linking, without loading (aka "copying") the data into Galaxy, then there is no way to uncompress them. Instead, you have two choices:
1) Use the Data Library upload method that actually copies the data into Galaxy. This will uncompress them.
2) Uncompress the files in the linked directory and re-add them to Galaxy in that format (see the sketch after the reference below).
Reference: https://wiki.galaxyproject.org/Admin/DataLibraries/UploadingLibraryFiles
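For option 2, something like the following Python sketch would uncompress a directory of fastq.gz files in place (the directory path is a placeholder; it streams each file, so large datasets don't need to fit in memory):

```python
import gzip
import shutil
from pathlib import Path

# Placeholder path: point this at the linked Data Library directory.
linked_dir = Path("/data/shared/fastq")

for gz_path in sorted(linked_dir.glob("*.fastq.gz")):
    out_path = gz_path.with_suffix("")  # "x.fastq.gz" -> "x.fastq"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks rather than reading whole files
    print(f"uncompressed {gz_path.name} -> {out_path.name}")
```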
When I try to copy the data into Galaxy, I run out of space due to the size of my dataset. Is there a way to move the files directly into Galaxy's file store so that I don't have two copies of them?
Did you upload them, or link them in as "shared data"? If you do anything except link them in, they'll be uncompressed.
I added them to the data libraries using a shared data directory. Is there any way to address this without having to decompress all the files before adding them?
Some of the Galaxy wrappers aren't written in a way that handles gzipped files. Perhaps FASTQ Groomer is one of those (in any case, its output will be uncompressed, which I agree is a design flaw, since uncompressed FASTQ files should never exist).