ATP-cone sequence clusters
Dataset posted on 05.04.2019 by Daniel Lundin
The NCBI RefSeq database (2019-03-19; Haft et al. 2018 https://doi.org/10.1093/nar/gkx1068) was searched with Pfam's ATP-cone profile (accno: PF03477; Finn et al. 2010 https://doi.org/10.1093/nar/gkp985), returning 44367 NCBI accessions. Ribonucleotide reductase proteins were identified using HMMER (Eddy 2011 https://doi.org/10.1371/journal.pcbi.1002195) profiles from the RNRdb database (http://rnrdb.pfitmap.org). Subsequently, sequences were clustered with UCLUST (Edgar 2010 https://doi.org/10.1093/bioinformatics/btq461) at identity to remove sequence duplicates (24477 sequences remaining).
All sequences were pairwise aligned to each other using LAST (Kiełbasa et al. 2011 https://doi.org/10.1101/gr.113985.110) and a bitscore matrix was constructed. The bitscore matrix was clustered with MCL (Enright, Dongen & Ouzounis 2002 https://doi.org/10.1093/nar/30.7.1575) using the Cluster Maker 2 (Morris et al. 2011 https://doi.org/10.1186/1471-2105-12-436) Cytoscape (Shannon et al. 2003 https://doi.org/10.1101/gr.1239303) app. Bitscores < 200 were not included in the initial network and an inflation parameter of 2.5 was used.
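The two steps described above (dropping bitscores below 200, then MCL with inflation 2.5) can be illustrated with a minimal, toy-scale sketch. This is not the clusterMaker2 implementation: the function names `build_network` and `mcl` are hypothetical, and using raw bitscores as edge weights, keeping the best score per pair, and adding self-loops are assumptions made for the illustration.

```python
def build_network(hits, min_bitscore=200.0):
    """Keep the best bitscore per unordered pair; drop scores below the cutoff."""
    edges = {}
    for query, subject, bitscore in hits:
        if query == subject or bitscore < min_bitscore:
            continue
        pair = tuple(sorted((query, subject)))
        edges[pair] = max(bitscore, edges.get(pair, 0.0))
    return edges


def _normalize_columns(matrix):
    """Rescale every column so that it sums to 1 (column-stochastic matrix)."""
    size = len(matrix)
    sums = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    return [[matrix[i][j] / sums[j] for j in range(size)] for i in range(size)]


def mcl(edges, inflation=2.5, max_iterations=100, tolerance=1e-9):
    """Dense, pure-Python Markov clustering; only suitable for tiny networks."""
    if not edges:
        return []
    nodes = sorted({node for pair in edges for node in pair})
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0.0] * size for _ in range(size)]
    for (a, b), weight in edges.items():
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = float(weight)
    # Assumption: each node gets a self-loop equal to its strongest edge.
    for i in range(size):
        matrix[i][i] = max(matrix[i])
    matrix = _normalize_columns(matrix)
    for _ in range(max_iterations):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(matrix[i][k] * matrix[k][j] for k in range(size))
                     for j in range(size)] for i in range(size)]
        # Inflation: raise entries to a power, then renormalize the columns.
        inflated = _normalize_columns(
            [[value ** inflation for value in row] for row in expanded])
        change = max(abs(inflated[i][j] - matrix[i][j])
                     for i in range(size) for j in range(size))
        matrix = inflated
        if change < tolerance:
            break
    # Each row with non-zero entries is an attractor; its columns are members.
    clusters = {frozenset(nodes[j] for j in range(size) if matrix[i][j] > 1e-6)
                for i in range(size)}
    clusters.discard(frozenset())
    return sorted(sorted(cluster) for cluster in clusters)


# Made-up hits: (query, subject, bitscore). The c-d hit falls below the cutoff.
hits = [("a", "b", 300), ("b", "c", 300), ("a", "c", 300),
        ("d", "e", 400), ("c", "d", 150)]
edges = build_network(hits)
clusters = mcl(edges)
```

Sequences without any hit at or above the cutoff never enter the network in this sketch, so they are not reported as singleton clusters. For a network of the size described here, a dedicated MCL implementation would be needed; the dense matrices above are for illustration only.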
The file "atp-cone_mcl_clustering.tsv" contains all information necessary to recreate the clustering, as well as the cluster number assigned to each sequence.
Column names: SUID: Cytoscape's id; accno: NCBI's accession number; mcl2.0ewc200-mcl3.0: cluster assignments, named by inflation parameter and bitscore cutoff (ewc; when used); name: sequence identifier composed of accno plus cone number; outer_inner: "inner", "middle" or "outer" when more than one cone is present in the full sequence; pclass and psubclass: RNR class and subclass; ptype: protein type; taxon; tdomain: taxonomic domain; title: NCBI's description of the sequence.
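As a rough sketch of how the table might be summarised, the snippet below counts sequences per cluster for one clustering column. The helper name `cluster_sizes` and the column name `mcl_example` are hypothetical, and the sample rows are made up; check the header of the real file for the exact names of the clustering columns.

```python
import csv
from collections import Counter


def cluster_sizes(tsv_lines, cluster_column):
    """Count rows per cluster id in one clustering column of a TSV table."""
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    if cluster_column not in (reader.fieldnames or []):
        raise KeyError(
            f"column {cluster_column!r} not found; available: {reader.fieldnames}")
    # Rows with an empty value in the column are treated as unassigned.
    return Counter(row[cluster_column] for row in reader if row[cluster_column])


# Made-up rows standing in for the real file; "mcl_example" is a placeholder.
sample = [
    "accno\tmcl_example\n",
    "A1\t1\n",
    "A2\t1\n",
    "A3\t2\n",
    "A4\t\n",
]
sizes = cluster_sizes(sample, "mcl_example")
```

With the real file, the same function could be called on an open file handle, for example `open("atp-cone_mcl_clustering.tsv", newline="")`, together with one of the actual clustering column names.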
The "precluster_assignments.tsv" file contains the results of the preclustering with USEARCH, i.e. which USEARCH cluster (first column) each accession number (second column) belongs to.
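The two-column layout can be read into a mapping from precluster to member accessions, as in the minimal sketch below. The helper name `read_preclusters` is hypothetical, the sample accessions are made up, and whether the real file starts with a header row is an assumption to verify (hence the `has_header` switch).

```python
import csv
from collections import defaultdict


def read_preclusters(tsv_lines, has_header=False):
    """Map each cluster id (column 1) to its accession numbers (column 2)."""
    reader = csv.reader(tsv_lines, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    members = defaultdict(list)
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        cluster_id, accno = row[0], row[1]
        members[cluster_id].append(accno)
    return dict(members)


# Made-up lines standing in for the real file.
sample = ["0\tACC_1\n", "0\tACC_2\n", "1\tACC_3\n", "\n"]
preclusters = read_preclusters(sample)
```

Joining this mapping with the accno column of "atp-cone_mcl_clustering.tsv" would make it possible to look up which deduplicated accessions sit behind each clustered sequence, assuming the accession numbers are written identically in both files.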
Exploring ribonucleotide reductase as a target to combat bacterial infections
Swedish Research Council