Hi, I have followed a pipeline: grooming the SRA data, followed by TopHat and Cufflinks. The Cuffdiff job is taking too long. I want to know whether there is some error in my input to Cuffdiff. What should I check in the Cufflinks result? How can I tell whether the TopHat result is satisfactory or not?
Hi,
To check whether TopHat worked (i.e. the reads aligned correctly to the reference genome), you can try:
Visualizing regions of interest in the genome. This can be done using Trackster or browsers such as Integrated Genome Browser (IGB) or Integrative Genomics Viewer (IGV): expand your TopHat "accepted_hits" dataset and click either "display with IGV" or "display in IGB View".
You can also check your mapping statistics by accessing the "align_summary" file.
Yena
This means that TopHat did not work. Only 1226 of 19118751 reads were aligned to the reference genome, which rounds to 0.0% of input.
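To put a number on it, here is a rough Python sketch of the arithmetic. It assumes the align_summary text has the same "Input : N" and "Mapped : N" fields as the single-end summary posted in this thread. The function names are only illustrative and are not part of TopHat or Galaxy.

```python
import re

def parse_align_summary(text):
    """Pull the input and mapped read counts out of align_summary text."""
    # Assumption: single-end layout with one "Input : N" and one "Mapped : N" field.
    # For paired-end summaries this only picks up the first (left) block.
    input_reads = int(re.search(r"Input\s*:\s*(\d+)", text).group(1))
    mapped_reads = int(re.search(r"Mapped\s*:\s*(\d+)", text).group(1))
    return input_reads, mapped_reads

def mapping_rate(input_reads, mapped_reads):
    """Percentage of input reads that mapped."""
    if input_reads == 0:
        return 0.0
    return 100.0 * mapped_reads / input_reads

# The summary text as posted in this thread.
summary = (
    "Reads: Input : 19118751 Mapped : 1226 ( 0.0% of input) "
    "of these: 518 (42.3%) have multiple alignments (427 have >20) "
    "0.0% overall read mapping rate."
)

input_reads, mapped_reads = parse_align_summary(summary)
rate = mapping_rate(input_reads, mapped_reads)
print(f"{mapped_reads} of {input_reads} reads mapped ({rate:.4f}%)")
```

The rate comes out at well under a hundredth of a percent, which is why the summary rounds it to 0.0%.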
Double check that you provided the correct reference genome. Did you run a quality check on your reads (e.g. FastQC)? This will show you whether the reads are of good quality, so you can decide whether you need to trim the reads or filter out poor-quality ones. Every part of the FastQC results is described in the link provided:
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Module
Yena
Hi, thanks a lot. My result says:
Reads:
    Input : 19118751
    Mapped : 1226 ( 0.0% of input)
    of these: 518 (42.3%) have multiple alignments (427 have >20)
0.0% overall read mapping rate.
What does it mean? Thanks.
Something is probably wrong with the input for it to map this poorly. Also double check:
Data represents spliced reads
Fastq inputs are true pairs and are entered on the tool form in forward/reverse read order
Target reference genome mapped against is the right one. If it is a custom genome, double check the formatting: https://wiki.galaxyproject.org/Support#Custom_reference_genome
Fastq format has quality scores scaled correctly as fastqsanger. Here is how to check: https://wiki.galaxyproject.org/Support#FASTQ_Datatype_QA
QA was not overly zealous, resulting in lost reads/sequence content. If you clipped, try mapping the original reads and compare the results, then adjust.
"Minimum length of read segments" (full parameters) is one half the length of the shortest sequence mapped (or that is expected to map).
One of these reasons is behind most poor read mapping results (from a usage perspective). Content/sequencing errors are upstream. Run FastQC to get a bead on overall read quality. QA might fix this or you may need to check in with the lab that did the sequencing.
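If you want a quick check outside Galaxy for two of the points above (quality score scaling and the shortest read length), here is a minimal Python sketch. It assumes plain 4-line FASTQ records with no wrapped sequence lines. The encoding guess is only a heuristic based on the ASCII range of the quality characters, and the function names are made up for this example.

```python
import io

def scan_fastq(handle, max_reads=100000):
    """Return (min_qual_ord, max_qual_ord, shortest_read_len) for 4-line FASTQ records."""
    min_q = max_q = shortest = None
    for i, line in enumerate(handle):
        if i // 4 >= max_reads:
            break
        line = line.rstrip("\r\n")
        if i % 4 == 1:  # sequence line
            if shortest is None or len(line) < shortest:
                shortest = len(line)
        elif i % 4 == 3 and line:  # quality line
            codes = [ord(c) for c in line]
            lo, hi = min(codes), max(codes)
            min_q = lo if min_q is None else min(min_q, lo)
            max_q = hi if max_q is None else max(max_q, hi)
    return min_q, max_q, shortest

def guess_encoding(min_q, max_q):
    """Heuristic only: fastqsanger is Phred+33; older Illumina data is Phred+64."""
    if min_q is None:
        return "no reads seen"
    if min_q < 59:
        return "Phred+33 (consistent with fastqsanger)"
    if max_q > 74:
        return "probably Phred+64 (needs grooming to fastqsanger)"
    return "ambiguous"

# Tiny in-memory example; the reads are made up for illustration.
sample = io.StringIO(
    "@read1\nACGTACGTAC\n+\nIIIIIHHHGG\n"
    "@read2\nACGTACGT\n+\n##IIIIII\n"
)
min_q, max_q, shortest = scan_fastq(sample)
print(guess_encoding(min_q, max_q))
print("shortest read:", shortest, "-> half length:", shortest // 2)
```

For a real dataset, pass an open file handle instead of the in-memory sample, e.g. `with open("reads.fastq") as fh: scan_fastq(fh)`, where "reads.fastq" is a placeholder name. The half-length value is only a starting point for the "Minimum length of read segments" setting.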
Good luck! Jen, Galaxy team
Ma'am, 1. The link you provided is not working. 2. The FASTQ input is single-end reads. 3. The reference genome is human genome hg38.
Thanks