I am trying to understand the results from the mpileup tool, but I am unsure what the "<*>" in the ALT column is. It also appears together with another base. What does it mean?!? ='(
Hello,
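The * allele indicates that the allele is missing at that position because of an upstream deletion. The angle-bracketed <*> is a different thing: it is a symbolic allele ID, which mpileup uses to represent any allele other than those observed.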
From the VCF Specification: https://samtools.github.io/hts-specs/VCFv4.2.pdf
ALT - alternate base(s): Comma separated list of alternate non-reference alleles. These alleles do not have to be called in any of the samples. Options are base Strings made up of the bases A,C,G,T,N,*, (case insensitive) or an angle-bracketed ID String (“<id>”) or a breakend replacement string as described in the section on breakends. The * allele is reserved to indicate that the allele is missing due to a upstream deletion. If there are no alternative alleles, then the missing value should be used. Tools processing VCF files are not required to preserve case in the allele String, except for IDs, which are case sensitive. (String; no whitespace, commas, or angle-brackets are permitted in the ID String itself)
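To make the distinction in the quoted paragraph concrete, here is a minimal Python sketch that sorts each entry of an ALT field into the kinds the specification lists. The function names and category labels are made up for illustration and are not part of any VCF library; the breakend check is a deliberately crude test for square brackets.

```python
import re
from typing import List

# Base strings: A, C, G, T, N or * in any case (per the quoted spec text).
BASES = re.compile(r"^[ACGTN*]+$", re.IGNORECASE)
# Angle-bracketed ID: no whitespace, commas or angle brackets inside.
SYMBOLIC = re.compile(r"^<[^<>,\s]+>$")


def classify_alt(allele: str) -> str:
    """Return a hypothetical label for one ALT allele."""
    if allele == ".":
        return "missing"            # no alternate allele at all
    if allele == "*":
        return "upstream_deletion"  # allele missing due to an upstream deletion
    if SYMBOLIC.match(allele):
        return "symbolic"           # e.g. <*>, <DEL>, <NON_REF>
    if "[" in allele or "]" in allele:
        return "breakend"           # crude check for breakend notation
    if BASES.match(allele):
        return "bases"              # an ordinary base string such as T or AC
    return "unknown"


def classify_alt_field(alt_field: str) -> List[str]:
    """Classify every comma-separated allele in an ALT column value."""
    return [classify_alt(a) for a in alt_field.split(",")]


if __name__ == "__main__":
    for alt in ("<*>", "T,<*>", "*,A"):
        print(alt, classify_alt_field(alt))
```

With this sketch, "<*>" is reported as a symbolic allele and a bare "*" as an upstream deletion, which is why the two should not be read as the same thing.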
Thanks! Jen, Galaxy team
Hi there,
I was using Galaxy online to align my miRNA-seq data. I have a question: is it appropriate to use the hg19 reference genome for miRNA-seq data in the Bowtie step? I am getting a VCF file of 40 to 60 MB in which most of the alterations are the <*> symbol. Could anyone kindly explain what it means? I am getting it at almost all positions except a few.
It is not clear what your steps are. Is the output a VCF dataset or a BAM dataset? Both formats are described here: https://galaxyproject.org/learn/datatypes/
If the data is RNA-seq, and human, then mapping against the hg19 human genome can be a valid use case. Tool choices and options matter too, as does data QA upstream from mapping. HISAT2 can map spliced reads (RNA). Bowtie is an unspliced mapper only (DNA).
Please see the Galaxy tutorials here for examples of proper tool usage grouped by analysis goals: https://galaxyproject.org/learn/
Related: https://biostar.usegalaxy.org/p/27105/
Thank you for your reply. I have got a VCF file with <*> symbols as the alternate sequence. Why is that so? Is it because I used Bowtie as my mapping tool? My steps for miRNA sequence analysis are:
FASTQ Groomer > FastQC > Trimmomatic > FastQC > Bowtie2 > Sort > MarkDuplicates > rmdup > mpileup
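For a pipeline like the one above, it may help to see how many records actually carry an observed alternate allele. In mpileup output a record whose ALT is only <*> is, as far as I understand it, a position where no alternate base was seen, which would explain why it shows up at almost every position regardless of the mapper. The sketch below is a rough Python illustration on made-up example lines, with hypothetical function names; in practice a variant-calling step run after mpileup (for example bcftools call) is the usual way to reduce the output to variant sites.

```python
from typing import Iterable, Iterator


def has_observed_alt(vcf_line: str) -> bool:
    """True if the ALT column holds anything besides <*> or the missing value."""
    fields = vcf_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")  # ALT is the fifth tab-separated column
    return any(a not in ("<*>", ".") for a in alts)


def keep_candidate_sites(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines plus records with at least one observed ALT allele."""
    for line in lines:
        if line.startswith("#") or has_observed_alt(line):
            yield line


# Made-up records for illustration only; not taken from a real dataset.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\t<*>\t0\t.\tDP=12\n",
    "chr1\t101\t.\tC\tT,<*>\t0\t.\tDP=15\n",
]

if __name__ == "__main__":
    for line in keep_candidate_sites(example):
        print(line, end="")
```

Here the record at position 100 is dropped because its only ALT entry is <*>, while the record at position 101 is kept because T was observed alongside it.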