Bayesian NrdA phylogeny
Dataset posted on 10.01.2020 by Daniel Lundin
Bayesian phylogeny of NrdA, the catalytic component of class I ribonucleotide reductase (RNR). Sequences from NCBI's RefSeq and GenBank databases (Haft et al. 2018; https://doi.org/10.1093/nar/gkx1068), downloaded in March 2019, were searched with subclass-specific HMMER (Eddy 2011; https://doi.org/10.1371/journal.pcbi.1002195) profiles (Lundin et al., in preparation) for NrdA and for NrdJ, a class II RNR that serves as the outgroup. The resulting sequences were clustered at 60% identity with UCLUST (Edgar 2010; https://doi.org/10.1093/bioinformatics/btq461) to create a representative set of sequences. After manual inspection, 342 of the 27,821 original NrdA sequences remained, plus 26 NrdJ sequences selected because they aligned well to NrdA. The sequences were aligned with ProbCons (Do et al. 2005; https://doi.org/10.1101/gr.2821705), and 283 reliably aligned positions were selected with BMGE (Criscuolo & Gribaldo 2010; https://doi.org/10.1186/1471-2148-10-210) using the BLOSUM30 matrix.
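The 60% identity clustering step can be illustrated with a small sketch of greedy centroid clustering, the general strategy UCLUST follows: sequences are taken longest first, and each one joins the first existing centroid it matches at or above the identity threshold, otherwise it starts a new cluster. This is not UCLUST itself. The function names `global_identity` and `greedy_cluster` are hypothetical, identity is assumed here to be the number of identical positions in a unit-score Needleman-Wunsch alignment divided by the length of the shorter sequence, and UCLUST's k-mer based ordering of candidate centroids and its own identity definition are not reproduced.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Fraction of identical residues in a simple Needleman-Wunsch alignment of a and b.

    Assumption: identity = identical aligned positions / length of the shorter sequence.
    """
    if not a or not b:
        return 0.0
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, counting identical aligned residues.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return identical / min(n, m)


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.60) -> dict[str, list[str]]:
    """Greedy centroid clustering: map each centroid name to its member names."""
    clusters: dict[str, list[str]] = {}
    # Longest first, as input to UCLUST is commonly sorted by decreasing length.
    for name in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for centroid in clusters:
            if global_identity(seqs[name], seqs[centroid]) >= threshold:
                clusters[centroid].append(name)
                break
        else:
            # No centroid was similar enough: this sequence founds a new cluster.
            clusters[name] = [name]
    return clusters


# Toy, hypothetical sequences (not from this dataset).
demo = {"seq1": "MKTAYIAKQR", "seq2": "MKTAYIAKQK", "seq3": "GGGGWWWPPP"}
print(greedy_cluster(demo))  # prints {'seq1': ['seq1', 'seq2'], 'seq3': ['seq3']}
```

In this sketch each key of the returned dictionary stands for one representative (centroid) sequence. A full dynamic-programming alignment against every centroid is expensive, so the sketch only suits toy inputs; UCLUST's heuristics are what make clustering tens of thousands of sequences practical.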
The alignment file is NrdA_uc0.60.NrdJ_uc0.30_outgroup.intr.correct.nolb.co.profile.BLOSUM30.bmge.mb.nxs.
A Bayesian phylogeny was estimated with MrBayes v. 3.2.6 (Ronquist & Huelsenbeck 2003; https://doi.org/10.1093/bioinformatics/btg180; https://github.com/NBISweden/MrBayes) using a gamma distribution to model rate variation across sites and reversible-jump MCMC (rjMCMC) to jump between amino acid models. MrBayes was run as five independent runs of four chains each until the average standard deviation of split frequencies reached 0.015. (See NrdA_uc0.60.NrdJ_uc0.30_outgroup.intr.correct.nolb.co.profile.BLOSUM30.bmge.mb.)
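The stopping criterion can be sketched as follows: for every split (bipartition) sampled in the runs, the standard deviation of its frequency across the independent runs is computed, and these standard deviations are averaged. This is a minimal sketch under assumptions, not MrBayes's code. The input format, the function names `asdsf` and `converged`, the use of the sample (n - 1) standard deviation, and the rule of ignoring splits whose frequency stays below 0.10 in every run (modelled on MrBayes's `minpartfreq` setting) are all assumptions. In MrBayes 3.2 this kind of early stop is requested with the `stoprule` and `stopval` options of the `mcmc` command; the settings actually used are in the `.mb` file referenced above.

```python
from statistics import stdev


def asdsf(split_freqs: dict[str, list[float]], min_freq: float = 0.10) -> float:
    """Average, over splits, of the across-run standard deviation of split frequency.

    split_freqs maps a split label to its frequency in each independent run.
    """
    sds = [
        stdev(freqs)  # sample standard deviation across the runs
        for freqs in split_freqs.values()
        if max(freqs) >= min_freq  # keep splits reaching min_freq in at least one run
    ]
    return sum(sds) / len(sds) if sds else 0.0


def converged(split_freqs: dict[str, list[float]], stopval: float = 0.015) -> bool:
    """True once the ASDSF falls below stopval (0.015 was the value used for this dataset)."""
    return asdsf(split_freqs) < stopval


# Hypothetical frequencies for three splits in five runs (not data from this analysis).
example = {
    "AB|CDE": [0.50, 0.52, 0.48, 0.50, 0.50],
    "ABC|DE": [0.90, 0.90, 0.90, 0.90, 0.90],
    "AD|BCE": [0.01, 0.02, 0.00, 0.01, 0.01],  # rare in every run, so ignored
}
print(round(asdsf(example), 4), converged(example))  # prints 0.0071 True
```

Running several independent runs is what makes this diagnostic possible: if the runs sample the same posterior, each split should occur at similar frequencies in all of them, so the average spread shrinks toward zero as sampling proceeds.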
The phylogeny, in Dendroscope (Huson et al. 2007; https://doi.org/10.1186/1471-2105-8-460) NeXML format, is NrdA_uc0.60.NrdJ_uc0.30_outgroup.intr.correct.nolb.co.profile.BLOSUM30.bmge.mb.con.fullname.nexml.
Funding: Cancerfonden (CAN 2018/820)