Identifying Tags - Galaxy Question

Hello,

I need to perform an action (or series of actions) on a 454 dataset using Galaxy, and have not been able to figure out the necessary steps, even after looking through the toolbar expressions and using custom search. My file is a fasta and has the standard format:

CTGAGTCAGGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC ATGTTA
CTGAGTCAGGTCAACAATCATAAGACATCGGCTCTCTATATTTAATATTGGT

Each of the 100,000 sequences within this file contains a specific tag, which is the first 8 nucleotides. There are 19 tags total. I would like to identify these tags and add an identifier of the tag to the sequence name. Therefore, if I am looking for the first tag (CTGAGTCA), the output would look like:

*CTGAGTCA*GGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC ATGTTA

Is it possible to achieve this using Galaxy? If so, could you kindly suggest tools to use?

Thank you in advance,
Dominique Cowart

galaxy • 680 views

modified 5.2 years ago by Jennifer Hillman Jackson ♦ 25k • written 5.2 years ago by D. A. Cowart • 30

5.2 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello Dominique,

Yes, this can be done. Here is the process:

Start by splitting up the data with the "NGS: QC and manipulation -> Barcode Splitter" tool. The result files will be available as links. These links can be copied and pasted, in batch, into the text box of the "Get Data -> Upload File" tool, and each will be loaded as a dataset. Copying them into a simple text file and then pasting them into the Upload tool all at once is a quick way to do this, or you can add them one by one.

Once you have the individual files as datasets, you will probably want to rename them to keep better track of which barcode/tag each one represents. Click on the pencil icon in the upper right corner of each dataset to do this on the Edit Attributes form.

Next, the idea is to convert the fasta dataset to tabular, add a column with the "_Tag1" information, merge the original identifier column with the new tag column, cut the columns to rearrange them (you want just the new merged identifier and the original fasta sequence, leaving behind the two columns with the original identifier and the tag), then convert back from tabular to fasta format. Use the tools in "Text Manipulation" and "FASTA manipulation" for these operations.

I would normally suggest creating a workflow at this point, but as the tags will all be different, and the "Add column" step is in the middle of the processing, this is probably not worth it.

Hopefully this helps!

Jen
Galaxy team

--
Jennifer Hillman-Jackson
http://galaxyproject.org
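For readers who want to see the logic of the tabular steps outside Galaxy, here is a minimal Python sketch of the same idea: convert FASTA to (identifier, sequence) rows, append a tag label to the identifier, and write FASTA back out. It is not how the Galaxy tools are implemented. The function names, the `_NoTag` fallback label, and the read names in the usage note are hypothetical; only the 8-nucleotide tag length, the CTGAGTCA tag, and the "_Tag1" label come from the thread.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (identifier, sequence)


def fasta_to_rows(text: str) -> List[Row]:
    """Turn FASTA text into (identifier, sequence) rows, joining wrapped lines."""
    rows: List[Row] = []
    name = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                rows.append((name, "".join(chunks)))
            name = line[1:]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        rows.append((name, "".join(chunks)))
    return rows


def tag_rows(rows: List[Row], tags: Dict[str, str], tag_len: int = 8) -> List[Row]:
    """Append the label of the leading tag to each identifier.

    Sequences whose first tag_len bases match no known tag get the
    hypothetical label "_NoTag".
    """
    tagged: List[Row] = []
    for name, seq in rows:
        label = tags.get(seq[:tag_len].upper(), "_NoTag")
        tagged.append((name + label, seq))
    return tagged


def rows_to_fasta(rows: List[Row]) -> str:
    """Write rows back out as FASTA, one sequence line per record."""
    return "".join(f">{name}\n{seq}\n" for name, seq in rows)
```

As a usage example, `rows_to_fasta(tag_rows(fasta_to_rows(text), {"CTGAGTCA": "_Tag1"}))` would rename a hypothetical record `read1` whose sequence starts with CTGAGTCA to `read1_Tag1`. Because the lookup handles all tags in one pass, this sketch does not need the per-barcode split that the Galaxy route uses.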

written 5.2 years ago by Jennifer Hillman Jackson ♦ 25k

Jennifer Hillman Jackson ♦ 25k wrote:

Hi Dominique,

Glad that helped. And yes, you can merge many text-based file types with the "Text Manipulation -> Concatenate datasets" tool. Sometimes you will need to convert to tabular format first, and then back to the desired format (fasta, gtf, etc.) afterwards.

Take care,

Jen
Galaxy team

--
Jennifer Hillman-Jackson
http://galaxyproject.org
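Conceptually, concatenating text-based datasets means joining them end to end. The short Python sketch below illustrates that idea for FASTA text held in memory; the function name is hypothetical and this is not the Galaxy tool's code. It only guards against one common pitfall, a part that lacks a trailing newline, which would otherwise glue the next header onto the previous sequence line.

```python
from typing import Iterable


def concatenate_texts(parts: Iterable[str]) -> str:
    """Join text datasets end to end, making sure each part ends with a newline."""
    out = []
    for part in parts:
        if not part:
            continue  # skip empty datasets
        out.append(part if part.endswith("\n") else part + "\n")
    return "".join(out)
```

For example, passing the text of two tagged FASTA files in order returns one FASTA text with the records of the first file followed by those of the second.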

written 5.2 years ago by Jennifer Hillman Jackson ♦ 25k
