My raw data were seven metagenomic assemblies from different species of the fern Azolla. The aim was to identify the bacteria in the leaf ecosystem of the different Azolla species. Our hypothesis was that if similar bacteria recur across the Azolla species, they will cluster together when their genomes are plotted in a dendrogram or tree. Plant DNA was filtered out by backmapping to the host reference genome: because our target was microbial DNA, only the DNA that did not map to the host plant was used for further analysis.
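For readers unfamiliar with the host-filtering step: after mapping to the host reference, "did not map" corresponds to bit 0x4 ("segment unmapped") of the FLAG column in SAM output, which is what `samtools view -f 4` selects on a BAM file. The pure-Python sketch below only illustrates that bit test on SAM text lines; the function name and the toy records are hypothetical, and for real data the samtools command is the practical route.

```python
"""Keep only records that did not map to the host reference (SAM FLAG bit 0x4)."""

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def unmapped_records(sam_lines):
    """Yield SAM alignment lines whose FLAG has the 'unmapped' bit set.

    Header lines (starting with '@') and blank lines are skipped;
    column 2 (index 1) of an alignment line is the FLAG.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & UNMAPPED:
            yield line


if __name__ == "__main__":
    # Hypothetical toy SAM records: read1 mapped to the host (flag 0),
    # read2 unmapped (flag 4) and therefore kept as candidate microbial DNA.
    sam = [
        "@SQ\tSN:host_chr1\tLN:1000",
        "read1\t0\thost_chr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCATTGCA\tIIIIIIIIII",
    ]
    print([rec.split("\t")[0] for rec in unmapped_records(sam)])
```

With paired-end reads, requiring both mates to be unmapped means testing bits 0x4 and 0x8 together (`-f 12` in samtools).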
BWA was used for backmapping, samtools for sorting, MetaBAT for binning and CheckM to assess the completeness and contamination of the bins.
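Before comparing bins across assemblies, it may help to keep only bins that pass a quality cut-off. A minimal sketch, assuming the CheckM completeness and contamination percentages have already been read into Python; the `BinQuality` class, bin names and values are hypothetical, and the default thresholds follow the commonly used MIMAG "medium-quality" criteria (at least 50% complete, under 10% contaminated).

```python
"""Filter MetaBAT bins by CheckM completeness and contamination."""

from dataclasses import dataclass


@dataclass
class BinQuality:
    name: str
    completeness: float   # percent, as reported by CheckM
    contamination: float  # percent, as reported by CheckM


def passing_bins(bins, min_completeness=50.0, max_contamination=10.0):
    """Return the names of bins meeting both thresholds.

    Completeness must be >= min_completeness and contamination must be
    strictly below max_contamination.
    """
    return [
        b.name
        for b in bins
        if b.completeness >= min_completeness and b.contamination < max_contamination
    ]


if __name__ == "__main__":
    # Hypothetical values, not real CheckM output.
    bins = [
        BinQuality("azolla1.bin.1", 92.3, 1.2),
        BinQuality("azolla1.bin.2", 41.0, 0.5),
        BinQuality("azolla2.bin.1", 78.6, 14.9),
    ]
    print(passing_bins(bins))  # ['azolla1.bin.1']
```

Tightening the thresholds (for example `min_completeness=90.0, max_contamination=5.0`) trades the number of bins for confidence that each bin represents a single genome.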
Prokka was then used to annotate the genomes, UniProt IDs were obtained, and a table of the UniProt IDs of all bins was made. This table was converted to a binary (presence/absence) table, which was then used to create a dendrogram in R.
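The presence/absence step can be sketched as follows, assuming each bin's annotation has been reduced to a set of UniProt IDs; the bin names and IDs below are hypothetical placeholders. The distance shown is the Jaccard distance, which ignores IDs absent from both bins; this matters because most IDs are absent from most bins. In R, `dist(x, method = "binary")` on a 0/1 matrix computes the same quantity.

```python
"""Binary UniProt-ID presence/absence table and pairwise Jaccard distances."""


def presence_absence(bin_ids):
    """bin_ids: dict mapping bin name -> set of UniProt IDs.

    Returns (sorted list of all IDs, dict bin name -> list of 0/1 per ID).
    """
    columns = sorted(set().union(*bin_ids.values()))
    table = {
        name: [1 if uid in ids else 0 for uid in columns]
        for name, ids in bin_ids.items()
    }
    return columns, table


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of IDs (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(bin_ids):
    """Return (sorted bin names, square matrix of pairwise Jaccard distances)."""
    names = sorted(bin_ids)
    matrix = [[jaccard_distance(bin_ids[x], bin_ids[y]) for y in names] for x in names]
    return names, matrix


if __name__ == "__main__":
    # Hypothetical bins and IDs, for illustration only.
    bins = {
        "bin1": {"idA", "idB", "idC"},
        "bin2": {"idB", "idC", "idD"},
        "bin3": {"idX"},
    }
    columns, table = presence_absence(bins)
    print(columns)
    print(table["bin1"])
    print(distance_matrix(bins))
```

Shared annotated functions are a coarse signal, so a tree built from them can group bins by what they do rather than by lineage.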
The dendrogram was then loaded into FigTree. In the tree I observed that the bacteria cluster by metagenomic sample (i.e. by host plant) rather than by taxonomy: for example, Rhizobiales clusters with Burkholderiales from the same metagenomic assembly, but not with Rhizobiales from another host plant's assembly.
What other ways are there to compare the genomes of taxonomically similar bacteria, besides using lists of UniProt IDs? One approach I found here on Biostars was to compute Mash distances between genomes and then draw a tree from the pairwise Mash distances among all genomes. Are there other ways to compare taxonomically similar genomes and cluster them together in a tree? I'm at a dead end: how should I interpret these results, and what can I deduce from them? Are there ways to improve my approach? Can I directly compare taxonomically similar bins from different metagenomic assemblies? Any suggestions would be highly valuable.
Kind regards,
Manpreet, Utrecht University student
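To make the Mash-based suggestion concrete: Mash estimates the Jaccard index j between two genomes' k-mer sets using MinHash sketches and converts it to a distance D = -(1/k) ln(2j / (1 + j)). The sketch below is only an illustration under stated assumptions: it uses exact canonical k-mer sets instead of sketches (fine for toy data, memory-hungry for whole genomes), treats j = 0 as the maximum distance 1.0 (an assumption made here), and builds the tree with UPGMA, a simple method swapped in for illustration (neighbor-joining is a common alternative). The output is a Newick string, a format FigTree reads. For real bins, the `mash sketch` and `mash dist` commands (default k = 21) are the practical route; all function names and sequences here are hypothetical.

```python
"""Toy Mash-style distances from exact k-mer sets, then a UPGMA tree in Newick."""

import math
import random
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Set of canonical k-mers (lexicographic min of k-mer and its reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            rc = kmer.translate(COMPLEMENT)[::-1]
            kmers.add(min(kmer, rc))
    return kmers


def mash_distance(a, b, k):
    """D = -(1/k) * ln(2j / (1 + j)) with j the exact Jaccard index of two k-mer sets."""
    union = a | b
    if not union:
        return 0.0
    j = len(a & b) / len(union)
    if j == 0:
        return 1.0  # assumption: no shared k-mers -> maximum distance
    if j == 1:
        return 0.0
    return min(1.0, -math.log(2 * j / (1 + j)) / k)


def upgma_newick(names, dist):
    """UPGMA clustering of a symmetric distance matrix into a Newick string."""
    clusters = {i: (names[i], 1, 0.0) for i in range(len(names))}  # (newick, size, height)
    d = {(i, j): dist[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])  # closest pair of clusters
        ni, si, hi = clusters.pop(i)
        nj, sj, hj = clusters.pop(j)
        h = dij / 2  # height of the new node
        node = f"({ni}:{h - hi:.4f},{nj}:{h - hj:.4f})"
        for m in clusters:  # size-weighted average distance to the merged cluster
            dim = d.pop((min(i, m), max(i, m)))
            djm = d.pop((min(j, m), max(j, m)))
            d[(m, next_id)] = (si * dim + sj * djm) / (si + sj)
        del d[(i, j)]
        clusters[next_id] = (node, si + sj, h)
        next_id += 1
    ((newick, _, _),) = clusters.values()
    return newick + ";"


if __name__ == "__main__":
    random.seed(1)
    base = "".join(random.choice("ACGT") for _ in range(5000))
    # Hypothetical "bins": g2 is g1 with 50 substitutions, g3 is unrelated.
    g2 = list(base)
    for pos in random.sample(range(len(base)), 50):
        g2[pos] = "A" if g2[pos] != "A" else "C"
    genomes = {"g1": base, "g2": "".join(g2),
               "g3": "".join(random.choice("ACGT") for _ in range(5000))}
    k = 15
    names = sorted(genomes)
    kmers = {n: canonical_kmers(genomes[n], k) for n in names}
    dist = [[mash_distance(kmers[x], kmers[y], k) for y in names] for x in names]
    print(upgma_newick(names, dist))  # g1 and g2 are expected to be joined first
```

Unlike a table of UniProt IDs, k-mer distances compare the sequences themselves, so bins of the same lineage from different assemblies can, in principle, be placed next to each other regardless of which host they came from.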