Question: getting the number of SNPs and insertions/deletions from a VCF with VCFfilter
2.7 years ago by
lorenasfe wrote:


I am trying to extract from a VCF file the number of single nucleotide variants (SNPs), insertion/deletion variants, multi-nucleotide variants, and variants with multiple alternate alleles. I am trying to use VCFfilter in Galaxy, but I keep getting an error whenever I try to filter. What do I have to write in 'Specify filtering expression'? Is there another way to get this information?

23 months ago by
chen.randy wrote:

Use VCFfilter with a filtering expression such as -f "TYPE = del" to keep only the variant type you need.

This helped me! Thanks, chen.randy.
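The -f "TYPE = del" expression only works if the VCF actually has an INFO key called TYPE. As far as I know, freebayes writes this key (with values such as snp, mnp, ins, del and complex), but not every caller does, so it is worth checking the header for a ##INFO=<ID=TYPE line first. If you only need counts rather than a filtered file, here is a minimal Python sketch that tallies the TYPE values directly. The function name count_info_type is hypothetical, it assumes a plain-text (not gzipped) VCF, and it is not how VCFfilter itself evaluates expressions:

```python
from collections import Counter
from typing import Iterable


def count_info_type(vcf_lines: Iterable[str]) -> Counter:
    """Tally the values of the INFO/TYPE key across VCF data lines.

    Assumes the variant caller wrote a TYPE key (e.g. TYPE=del); records
    without it are skipped. A multiallelic record may list one value per
    ALT allele (e.g. TYPE=snp,del), and each value is counted separately.
    """
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information lines, the header and blanks
        columns = line.rstrip("\n").split("\t")
        info = columns[7]  # INFO is the 8th fixed VCF column
        for item in info.split(";"):
            key, _, value = item.partition("=")  # flags have no "=" part
            if key == "TYPE":
                for allele_type in value.split(","):
                    if allele_type:
                        counts[allele_type] += 1
    return counts
```

For example, with open("calls.vcf") as fh: print(count_info_type(fh)) (the file name is only illustrative) prints one count per TYPE value found.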

2.6 years ago by
Guy Reeves wrote:

Hi, I can help you write an expression for VCFfilter, but first could you check whether NGS: GATK Tools (beta) > Select Variants from VCF files lets you do what you want? It is easier, at least I think so.

Set 'Basic or Advanced Analysis options' to Advanced, then scroll down and tick whichever boxes you want under 'Select only a certain type of variants from the input file': INDEL, SNP, MIXED, MNP, SYMBOLIC, NO_VARIATION.

You may also want to look at the 'Select only variants of a particular allelicity' option. This lets you count what you want from the output .vcf files. This should all work on … Tell me if it works. Guy
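If you would rather count everything in one pass outside Galaxy, the sketch below classifies each record from its REF and ALT alleles and also counts records with more than one ALT allele. The classes loosely follow the type options listed above, but the rules are my own simplification, not GATK's actual logic. The function names classify_allele and summarize_vcf are hypothetical, and the sketch assumes a plain-text VCF whose variants are already normalized (no extra padding bases), since otherwise an SNV written as REF=AT, ALT=GT would be counted as an MNV:

```python
from collections import Counter
from typing import Iterable


def classify_allele(ref: str, alt: str) -> str:
    """Rough class of one REF/ALT pair (assumes normalized records)."""
    if alt.startswith("<") or alt == "*":
        return "SYMBOLIC"  # e.g. <DEL> or the spanning-deletion allele
    if alt == ".":
        return "NO_VARIATION"
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    return "INDEL"  # REF and ALT differ in length


def summarize_vcf(vcf_lines: Iterable[str]) -> Counter:
    """Count records per variant class, plus how many are multiallelic."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information lines, the header and blanks
        columns = line.rstrip("\n").split("\t")
        ref, alts = columns[3], columns[4].split(",")
        if len(alts) > 1:
            counts["MULTIALLELIC"] += 1  # overlaps with the classes below
        classes = {classify_allele(ref, alt) for alt in alts}
        # A site whose ALT alleles fall into different classes counts as MIXED.
        counts[classes.pop() if len(classes) == 1 else "MIXED"] += 1
    return counts
```

Each record is counted once under a single class, so the class counts add up to the number of records; the MULTIALLELIC count is a separate tally on top of that.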

Thanks so much for your help! In the end I managed to count them with VCFfilter, but next time I will try GATK as you suggested; it seems much easier.

23 months ago by
ron wrote:

Hi! I know you both wrote this a few months ago, but I am now in exactly the same situation as lorenasfe. Could either of you help me use VCFfilter?

@lorenasfe: how did you do it?

@Guy Reeves: I tried the method you described, but I got quite lost in the process, and I cannot find a way to choose a reference genome. When I select "from the history", I am asked for a FASTA file, which I do not have and have not used at any point in the whole process. When I select "locally cached", it simply says "No options available" and does not let me change anything.

For these reasons, I would ask you both, or anyone else reading this post, for any tips. I am working with hg19. As a last resort I even tried filtering in Excel, but it did not get me anywhere. I still believe VCFfilter should be the better tool, although I am open to new ideas.


Since "locally cached" does not give you the hg19 option you want, I guess you are not working on … I suggest you register for an account and move your data there (or at least part of it). Then you can see what works, and afterwards work on installing reference genomes on your local Galaxy instance.

