Dataset posted on 01.07.2019 by Daniel Lundin
Maximum likelihood phylogeny of NrdBk and NrdBi class I ribonucleotide reductase radical generating subunits. Sequences from NCBI's RefSeq database (Haft et al. 2018; https://doi.org/10.1093/nar/gkx1068), downloaded July 2018, were searched with subclass-specific HMMER (Eddy 2011; https://doi.org/10.1371/journal.pcbi.1002195) profiles from RNRdb (http://rnrdb.pfitmap.org) representing the NrdBk and NrdBi subclasses plus an outgroup consisting of NrdBe and NrdBn. The outgroup was chosen based on the full NrdB phylogeny presented in Grinberg et al. 2018 (https://doi.org/10.7554/eLife.31529). The resulting sequences were clustered at 70% identity with UCLUST (Edgar 2010; https://doi.org/10.1093/bioinformatics/btq461) to create a representative set of sequences. After manual inspection of sequences, 144 out of 7725 original sequences remained. The sequences were aligned with ProbCons (Do et al. 2005; https://doi.org/10.1101/gr.2821705), and 158 reliably aligned positions were selected with BMGE (Criscuolo & Gribaldo 2010; https://doi.org/10.1186/1471-2148-10-210) using the BLOSUM30 matrix. The alignment file is NrdBik_with_NrdBen.uc0.70.c.pb.BLOSUM30.bmge.alnfaa. A maximum likelihood phylogeny was estimated using RAxML v. 8.2.4 (Stamatakis 2014; https://doi.org/10.1093/bioinformatics/btu033) with the PROTGAMMAAUTO model using the rapid bootstopping algorithm and subsequent maximum likelihood search. The phylogeny, in Dendroscope (Huson et al. 2007; https://doi.org/10.1186/1471-2105-8-460) nexml format, is NrdBik_with_NrdBen.uc0.70.c.pb.BLOSUM30.bmge.PROTGAMMAAUTO.raxml.bipartitions.nexml.
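
The alignment can be checked against the numbers given above (144 sequences, 158 selected positions) with a few lines of Python. The following is a minimal sketch, not part of the original workflow. It assumes the .alnfaa file is a plain aligned FASTA file, which the description does not state explicitly; the helper names read_fasta and alignment_shape are hypothetical and chosen only for illustration.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text from an open handle into {sequence id: sequence}.

    Assumption: the id is the first whitespace-delimited token of each
    header line. If an id occurs twice, the later record replaces the
    earlier one.
    """
    chunks = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


def alignment_shape(records):
    """Return (number of sequences, number of columns) of an alignment.

    Raises ValueError if there are no sequences or if they differ in
    length, since that would not be a valid alignment.
    """
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("expected one shared sequence length, got %s"
                         % sorted(lengths))
    return len(records), lengths.pop()


# Toy input standing in for the real alignment file.
toy = StringIO(">seqA example\nMK-LV\n>seqB\nMKAL-\n")
n_seqs, n_cols = alignment_shape(read_fasta(toy))
print(n_seqs, n_cols)
```

To run the same check on the deposited file, open NrdBik_with_NrdBen.uc0.70.c.pb.BLOSUM30.bmge.alnfaa and pass the handle to read_fasta; if the file matches the description, alignment_shape should report 144 sequences and 158 columns.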