I'm an old-school molecular biologist who studies gene expression but am quite new to bio-computing. I am feeling my way around some ENCODE siRNA RNA-seq data through web interfaces in GenomeSpace (like Galaxy, thanks!). I managed to run a differential gene expression analysis between 2 control and 2 siRNA replicates (Cuffdiff). I got about 50K genes with UCSC gene IDs, but couldn't manage to convert these to the gene symbols/names I am more familiar with using the tools a Google search pointed me to. I re-ran the analysis with ENSEMBL genes and then RefSeq genes to see what would change, and to see whether this would help me retrieve gene symbols. I got more genes from ENSEMBL (huh?), and otherwise all I got was bupkis: Cuffdiff worked and the most meaningful changes showed up in all three runs, but there were still no gene symbol entries in any of the outputs.
When I cut and paste any of these gene IDs/ENSEMBL IDs into a Google/PubMed search, the associated gene is easily located, but I want to convert the entire list to gene symbols, not go through them one by one. A different Google search pointed me to some tools that seem designed for that purpose (BioMart, UCSC Table Browser, DAVID), but after fumbling around I surmised that these tools don't/can't convert 50K genes at once, and, to complicate the task, a good proportion of those IDs have no proper gene symbol. With a much smaller list (100-500 gene IDs) I was able to get some conversions, but the output didn't correspond to the list I entered: the results were not in the order I entered them, and there were fewer/more entries than I entered, making merging them with my original gene list problematic/impossible without manually matching them all up (exactly what I am trying to avoid).
I NEED ADVICE: Am I going about this all wrong? Is converting large lists of gene IDs to symbols not possible, or naive? If it is naive, what do people in my position normally do? If it is possible, how do I get the gene symbols for a gene expression analysis when the original output doesn't include them? If I am using the right tools, how do I put in a list of genes and get back a one-to-one correspondence of gene symbols in the order I entered them, with a blank wherever a particular gene ID has no corresponding gene symbol?
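To make the output I am after concrete, here is a minimal Python sketch of the last step. It is only an illustration under assumptions: it supposes the conversion tool has given me a two-column, tab-separated table (gene ID, gene symbol) with no header row, in any order and possibly with duplicate or missing IDs. The function names `load_mapping` and `annotate` and the IDs `ID_A`, `SYM1`, etc. are made-up placeholders, not real identifiers or part of any tool.

```python
import csv
import io


def load_mapping(handle):
    """Read a two-column, tab-separated table (gene ID, symbol) into a dict.

    Assumption: no header row. If an ID appears more than once,
    the first non-empty symbol seen is kept.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        gene_id, symbol = row[0].strip(), row[1].strip()
        if symbol and gene_id not in mapping:
            mapping[gene_id] = symbol
    return mapping


def annotate(gene_ids, mapping, missing=""):
    """Return (gene ID, symbol) pairs in the same order as gene_ids.

    IDs without a symbol get the `missing` placeholder, so the output
    always has exactly one row per input ID.
    """
    return [(gene_id, mapping.get(gene_id, missing)) for gene_id in gene_ids]


# Placeholder data standing in for a conversion tool's output:
# out of order, one duplicated ID, one ID I never asked for.
mapping_table = io.StringIO(
    "ID_B\tSYM2\n"
    "ID_A\tSYM1\n"
    "ID_A\tSYM1_ALT\n"
    "ID_X\tSYMX\n"
)

# My original list, in the order from the expression analysis.
my_ids = ["ID_A", "ID_C", "ID_B"]

result = annotate(my_ids, load_mapping(mapping_table))
for gene_id, symbol in result:
    print(gene_id, symbol, sep="\t")
```

In other words: one row out for every ID in, same order, with an empty symbol column for `ID_C`, which has no entry in the table. With real files, `io.StringIO(...)` would be replaced by `open("some_mapping.tsv", newline="")` (file name hypothetical).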
Thanks.