Stockholm University
Browse
1/1
5 files

Metagenomic NrdJm5 sequences placed in full phylogeny

dataset
posted on 2019-01-29, 17:00 authored by Daniel LundinDaniel Lundin, Christoph Loderer, Karin Holmfeldt

To search for sequences from metagenomics projects, we downloaded all TARA Ocean ORFs (Eren 2017, https://doi.org/10.6084/m9.figshare.4902917.v1 ; Delmont 2018, https://doi.org/10.1038/s41564-018-0176-9), all ORFs from the Human Microbiome Project (2019-01-09; HMP 2012a, https://doi.org/10.1038/nature11234 ; HMP 2012b, https://doi.org/10.1038/nature11209 ) the majority of bacterial MAGs and SAGs from IMG/MER (4910 MAGs, 2230 SAGs) plus 53 aquatic and soil metagenomes, in particular those with project names containing “virus”, “phage”, “therm” or “hot“ (see img_sags.tsv, img_metag_samples.tsv and img_mags.tsv) (Markowitz 2008, https://doi.org/10.1093/nar/gkm869). Together, we downloaded a total of 250,881,638 ORFs. We used hmm profiles designed for each clan in the phylogeny to search the sequences. We found 181 sequences with a best match to the profile designed from the TV clan. These were aligned to the original alignment using Clustal Omega in profile mode (all.NrdJm5.co.profile.wa.masked.alnfaa ; Sievers 2014, https://doi.org/10.1038/msb.2011.75) and phylogenetically placed in the phylogeny from https://doi.org/10.17045/sthlmuni.7117430.v2 with RAxML (Stamatakis 2014, https://doi.org/10.1093/bioinformatics/btu033). The resulting tree can be viewed with Dendroscope (Huson et al. 2007, https://doi.org/10.1186/1471-2105-8-460); placed sequences have "QUERY" prepended to their names.



Funding

Fonds der Chemischen Industrie (FCI) (SK201/03)

Zukunftskonzept of TU Dresden (Federal and State Excellence Initiative)

VR 2016-01920

History