The sizes of the black internal node circles are proportional to the posterior node support. Conservatively, we combined the three BFRs >2 kb identified above into non-recombining region 1 (NRR1). Of the countries that have contributed SARS-CoV-2 data, 30% had genomes of this lineage. N. China corresponds to Jilin, Shanxi, Hebei and Henan provinces, and the N. China clade also includes one sequence sampled in Hubei Province in 2004. Specifically, we used a combination of six methods implemented in v.5.5 of RDP5 (ref. In the presence of time-dependent rate variation, a widely observed phenomenon for viruses (refs. 43,44,52), slower prior rates appear more appropriate for sarbecoviruses that currently encompass a sampling time range of about 18 years. We extracted a total of 2,189 full-length SARS-CoV-2 viral genomes from various states of India from the EpiCoV repository of the GISAID initiative on 12 June 2020. This leaves the insertion of polybasic. 2, bottom) show that SARS-CoV-2 is unlikely to have acquired the variable loop from an ancestor of Pangolin-2019 because these two sequences are approximately 10–15% divergent throughout the entire S protein (excluding the N-terminal domain). USA 113, 3048–3053 (2016). To gauge the length of time this lineage has circulated in bats, we estimate the time to the most recent common ancestor (TMRCA) of SARS-CoV-2 and RaTG13. Five example sequences with incongruent phylogenetic positions in the two trees are indicated by dashed lines. In December 2019, a cluster of pneumonia cases epidemiologically linked to an open-air live animal market in the city of Wuhan (Hubei Province), China (refs. 1,2) led local health officials to issue an epidemiological alert to the Chinese Center for Disease Control and Prevention and the World Health Organization's (WHO) China Country Office. Wong, A. C. P., Li, X., Lau, S. K. P. & Woo, P. C. Y.
The S1 protein of Pangolin-CoV is much more closely related to SARS-CoV-2 than to RaTG13. Consistent with this, we estimate a concomitantly decreasing non-synonymous-to-synonymous substitution rate ratio over longer evolutionary timescales: 1.41 (1.20, 1.68), 0.35 (0.30, 0.41) and 0.133 (0.129, 0.136) for SARS, MERS-CoV and HCoV-OC43, respectively. Bioinformatics 22, 2688–2690 (2006). The unsampled diversity descended from the SARS-CoV-2/RaTG13 common ancestor forms a clade of bat sarbecoviruses with generalist properties, with respect to their ability to infect a range of mammalian cells, that facilitated its jump to humans and may do so again. Yuan, J. et al. SARS-CoV-2 and RaTG13 are also exceptions because they were sampled from Hubei and Yunnan, respectively. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Posada, D., Crandall, K. A. For the current pandemic, the novel pathogen identification component of outbreak response delivered on its promise, with viral identification and rapid genomic analysis providing a genome sequence and confirmation, within weeks, that the December 2019 outbreak first detected in Wuhan, China was caused by a coronavirus (ref. 3). The Sichuan (SC2018) virus appears to be a recombinant of northern/central and southern viruses, while the two Zhejiang viruses (CoVZXC21 and CoVZC45) appear to carry a recombinant region from southern or central China. It compares the new genome against the large, diverse population of sequenced strains using a Virological.org http://virological.org/t/ncovs-relationship-to-bat-coronaviruses-recombination-signals-no-snakes-no-evidence-the-2019-ncov-lineage-is-recombinant/331 (2020). NTD, N-terminal domain; CTD, C-terminal domain. Nature 503, 535–538 (2013).
Viruses 11, 979 (2019). We thank A. Chan and A. Irving for helpful comments on the manuscript. Transparent bands of interquartile range width and with the same colours are superimposed to highlight the overlap between estimates. In outbreaks of zoonotic pathogens, identification of the infection source is crucial because this may allow health authorities to separate human populations from the wildlife or domestic animal reservoirs posing the zoonotic risk (refs. 9,10). Menachery, V. D. et al. Chernomor, O. et al. A third approach attempted to minimize the number of regions removed while also minimizing signals of mosaicism and homoplasy. The key to successful surveillance is knowing which viruses to look for and prioritizing those that can readily infect humans (ref. 47). Its genome is closest to that of severe acute respiratory syndrome-related coronaviruses from horseshoe bats, and its receptor-binding domain is closest to that of pangolin viruses. Dudas, G., Carvalho, L. M., Rambaut, A. & Holmes, E. C. Recombination in evolutionary genomics. Zhou, P. et al. This new approach classifies the newly sequenced genome against all the diverse lineages present instead of against a representative selection of sequences. While such models have recently been made available, we lack the information to calibrate the rate decline over time (for example, through internal node calibrations; ref. 44). The plots are based on maximum likelihood tree reconstructions with a root position that maximises the residual mean squared for the regression of root-to-tip divergence and sampling time. Li, X. et al.
The ongoing pandemic spread of a new human coronavirus, SARS-CoV-2, which is associated with severe pneumonia/disease (COVID-19), has resulted in the generation of tens of thousands of virus . In Extended Data Fig. Despite the SARS-CoV-2 lineage's acquisition of residues in its Spike (S) protein's receptor-binding domain (RBD) permitting the use of human ACE2 (ref. Viral metagenomics revealed Sendai virus and coronavirus infection of Malayan pangolins (Manis javanica). Forni, D., Cagliani, R., Clerici, M. & Sironi, M. Molecular evolution of human coronavirus genomes. 95% credible interval bars are shown for all internal node ages. This statement informs us of the possibility that a virus has spilled over from a very rare and shy reptile-looking mammal. Phylogenetic Assignment of Named Global Outbreak LINeages. The pangolin web app is maintained by the Centre for Genomic Pathogen Surveillance. 62,63), the GTR+Γ model and 100 bootstrap replicates, was inferred for each BFR >500 nt. The difficulty in inferring reliable evolutionary histories for coronaviruses is that their high recombination rate (refs. 48,49) violates the assumption of standard phylogenetic approaches because different parts of the genome have different histories. Stegeman, A. et al. We say that this approach is conservative because sequences and subregions generating recombination signals have been removed, and BFRs were concatenated only when no PI signals could be detected between them. Further information on research design is available in the Nature Research Reporting Summary linked to this article. These shy, quirky but cute mammals are one of the most heavily trafficked yet least understood animals in the world.
All four of these breakpoints were also identified with the tree-based recombination detection method GARD (ref. 35). Viruses 11, 174 (2019). Bayesian evolutionary rate and divergence date estimates were shown to be consistent for these three approaches and for two different prior specifications of evolutionary rates based on HCoV-OC43 and MERS-CoV. If stopping an outbreak in its early stages is not possible, as was the case for the COVID-19 epidemic in Hubei, identification of origins and point sources is nevertheless important for containment purposes in other provinces and prevention of future outbreaks. Of the nine breakpoints defining these ten BFRs, four showed phylogenetic incongruence (PI) signals with bootstrap support >80%, adopting previously published criteria on using a combination of mosaic and PI signals to show evidence of past recombination events (ref. 19). The command line tool is open source software available under the GNU General Public License v3.0. EPI_ISL_410721) and Beijing Institute of Microbiology and Epidemiology (W.-C. Cao, T.T.-Y.L., N. Jia, Y.-W. Zhang, J.-F. Jiang and B.-G. Jiang, nos. While there is involvement of other mammalian species, specifically pangolins for SARS-CoV-2, as a plausible conduit for transmission to humans, there is no evidence that pangolins are facilitating adaptation to humans. The PANGOLIN lineage database (15, 16) was used to analyze the frequency of lineages among countries. Two exceptions can be seen in the relatively close relationship of Hong Kong viruses to those from Zhejiang Province (with two of the latter, CoVZC45 and CoVZXC21, identified as recombinants) and a recombinant virus from Sichuan for which part of the genome (region B of SC2018 in Fig. PLoS ONE 5, e10434 (2010). The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2.
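The frequency-among-countries figure mentioned above (a share of contributing countries in which a lineage was observed) can be illustrated with a minimal sketch. It assumes the input is a list of (country, lineage) pairs, one per genome, and counts exact lineage matches only; the function name and the toy records are hypothetical and are not taken from the PANGOLIN lineage database.

```python
from collections import defaultdict


def share_of_countries_with_lineage(records, lineage):
    """Percentage of contributing countries with at least one genome of `lineage`.

    `records` is an iterable of (country, assigned_lineage) pairs, one per genome.
    Only exact lineage names are matched; descendant lineages are not included.
    """
    lineages_by_country = defaultdict(set)
    for country, assigned in records:
        lineages_by_country[country].add(assigned)
    if not lineages_by_country:
        raise ValueError("no records supplied")
    hits = sum(1 for seen in lineages_by_country.values() if lineage in seen)
    return 100.0 * hits / len(lineages_by_country)


# Toy data for illustration only (not real surveillance counts).
toy_records = [
    ("India", "B.1"),
    ("India", "B.6"),
    ("Malta", "B.1"),
    ("Brazil", "P.1"),
    ("Chile", "B.1.1"),
]
print(share_of_countries_with_lineage(toy_records, "B.1"))
```

Counting countries rather than genomes keeps heavily sequenced countries from dominating the figure, which seems to be the sense in which the text reports a percentage of countries.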
Because there is no single accepted method of inferring breakpoints and identifying clean subregions with high certainty, we implemented several approaches to identifying three classic statistical signals of recombination: mosaicism, phylogenetic incongruence and excessive homoplasy (ref. 51). Patino-Galindo, J. a–c, Root-to-tip (RtT) divergence as a function of sampling time for the three coronavirus evolutionary histories unfolding over different timescales (HCoV-OC43 (n = 37; a), MERS (n = 35; b) and SARS (n = 69; c)). The genetic distances between SARS-CoV-2 and Pangolin Guangdong 2019 are consistent across all regions except the N-terminal domain, implying that a recombination event between these two sequences in this region is unlikely. Nucleotide positions for phylogenetic inference are 147–695, 962–1,686 (first tree), 3,625–9,150 (second tree, also BFR B), 9,261–11,795 (third tree, also BFR C), 12,443–19,638 (fourth tree) and 23,631–24,633, 24,795–25,847, 27,702–28,843 and 29,574–30,650 (fifth tree). Sequences are colour-coded by province according to the map. In other words, a true breakpoint is less likely to be called as such (this is breakpoint-conservative), and thus the construction of a non-recombining region may contain true recombination breakpoints (with insufficient evidence to call them as such). A second breakpoint-conservative approach was conservative with respect to breakpoint identification, but this means that it is accepting of false-negative outcomes in breakpoint inference, resulting in less certainty that a putative NRR truly contains no breakpoints. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Preprint at https://doi.org/10.1101/2020.05.28.122366 (2020). Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. Nature 583, 282–285 (2020). Nature 538, 193–200 (2016).
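The root-to-tip analysis described in the figure legend above regresses each tip's divergence from the root on its sampling time. The following is a minimal sketch of that regression only, assuming the root-to-tip distances have already been read off a rooted tree; it does not search over root positions, and the function name and toy numbers are hypothetical rather than values from the study.

```python
from statistics import mean


def root_to_tip_regression(dates, divergences):
    """Least-squares fit of root-to-tip divergence (subs/site) on sampling date (years)."""
    if len(dates) != len(divergences) or len(dates) < 3:
        raise ValueError("need at least three paired observations")
    x_bar, y_bar = mean(dates), mean(divergences)
    sxx = sum((x - x_bar) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dates, divergences))
    syy = sum((y - y_bar) ** 2 for y in divergences)
    slope = sxy / sxx                      # estimated rate, subs/site/year
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(dates, divergences))
    return {
        "rate": slope,
        # Date at which fitted divergence is zero: a rough root-age estimate.
        "x_intercept": -intercept / slope if slope != 0 else None,
        "residual_mean_squared": ss_res / (len(dates) - 2),
        "r_squared": 1 - ss_res / syy if syy > 0 else 0.0,
    }


# Toy, perfectly clock-like data for illustration only.
fit = root_to_tip_regression([2000, 2005, 2010, 2015],
                             [0.010, 0.015, 0.020, 0.025])
print(fit["rate"], fit["x_intercept"])
```

A slope near zero or a low r-squared in such a fit would correspond to the absence of temporal signal that exploratory tools report for some datasets.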
Pangolin was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. The assumption of long-term purifying selection would imply that coronaviruses are in endemic equilibrium with their natural host species, horseshoe bats, to which they are presumably well adapted. Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins. Across a large region of the virus genome, corresponding approximately to ORF1b, it did not cluster with any of the known bat coronaviruses, indicating that recombination probably played a role in the evolutionary history of these viruses (refs. 5,7). Anderson, K. G. nCoV-2019 codon usage and reservoir (not snakes v2). Host ecology determines the dispersal patterns of a plant virus. A distinct name is needed for the new coronavirus. This dataset comprises an updated version of that used in Hon et al. (ref. 15) and includes a cluster of genomes sampled in late 2003 and early 2004, but the evolutionary rate estimate without this cluster (0.00175 substitutions per site per year (0.00117, 0.00229)) is consistent with the complete dataset (0.00169 substitutions per site per year (0.00131, 0.00205)). Grey tips correspond to bat viruses, green to pangolin, blue to SARS-CoV and red to SARS-CoV-2. Boni, M. F., Posada, D. & Feldman, M. W. An exact nonparametric method for inferring mosaic structure in sequence triplets. This provides compelling support for the SARS-CoV-2 lineage being the consequence of a direct or nearly-direct zoonotic jump from bats, because the key ACE2-binding residues were present in viruses circulating in bats. Combining regions A, B and C and removing the five named sequences gives us putative NRR1, as an alignment of 63 sequences. Nature 579, 270–273 (2020). Sequence similarity. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10.
Green boxplots show the TMRCA estimate for the RaTG13/SARS-CoV-2 lineage and its most closely related pangolin lineage (Guangdong 2019), with the light and dark coloured versions based on the HCoV-OC43 and MERS-CoV centred priors, respectively. One geographic clade includes viruses from provinces in southern China (Guangxi, Yunnan, Guizhou and Guangdong), with its major sister clade consisting of viruses from provinces in northern China (Shanxi, Henan, Hebei and Jilin) as well as Hubei Province in central China and Shaanxi Province in northwestern China. Graham, R. L. & Baric, R. S. Recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission. Webster, R. G., Bean, W. J., Gorman, O. T., Chambers, T. M. & Kawaoka, Y. Evolution and ecology of influenza A viruses. Trova, S. et al. Lemey, P., Minin, V. N., Bielejec, F., Pond, S. L. K. & Suchard, M. A. Since the release of Version 2.0 in July 2020, however, it has used the 'pangoLEARN' machine-learning-based assignment algorithm to assign lineages to new SARS-CoV-2 genomes. & Andersen, K. G. The evolution of Ebola virus: insights from the 2013–2016 epidemic. Because these subclades had different phylogenetic relationships in region D (Supplementary Fig.
(2020) with additional (and higher quality) snake coding sequence data and several miscellaneous eukaryotes with low genomic GC content failed to find any meaningful clustering of the SARS-CoV-2 with snake genomes (a). While pangolins could be acting as intermediate hosts for bat viruses to get into humans (they develop severe respiratory disease (ref. 38) and commonly come into contact with people through trafficking), there is no evidence that pangolin infection is a requirement for bat viruses to cross into humans. Instead, similarity in codon usage metrics between the SARS-CoV-2 and eukaryotes analyzed was correlated with coding sequence GC content of the eukaryote, with more similar codon usage being identified in eukaryotes with low GC content similar to that of the coronavirus (b). Genetics 172, 2665–2681 (2006). A counting renaissance: combining stochastic mapping and empirical Bayes to quickly detect amino acid sites under positive selection. performed S recombination analysis. The inset represents divergence time estimates based on NRR1, NRR2 and NRA3. Wang, L. et al. Divergence dates between SARS-CoV-2 and the bat sarbecovirus reservoir were estimated as 1948 (95% highest posterior density (HPD): 1879–1999), 1969 (95% HPD: 1930–2000) and 1982 (95% HPD: 1948–2009), indicating that the lineage giving rise to SARS-CoV-2 has been circulating unnoticed in bats for decades. In March, when covid cases began spiking around India, Bani Jolly went hunting for answers in the virus's genetic code. Its origin and direct ancestral viruses have not been . Visual exploration using TempEst (ref. 39) indicates that there is no evidence for temporal signal in these datasets (Extended Data Fig. Note that six of these sequences fall under the terms of use of the GISAID platform. 36) gives a putative recombination-free alignment that we call non-recombinant alignment 3 (NRA3) (see Methods).
53), this is inferred to have occurred before the divergence of RaTG13 and SARS-CoV-2 and thus should not influence our inferences. We call this approach breakpoint-conservative, but note that this has the opposite effect to the construction of NRR1 in that this approach is the most likely to allow breakpoints to remain inside putative non-recombining regions. To evaluate the performance procedure, we confirmed that the recombination masking resulted in (1) a markedly different outcome of the PHI test (ref. 64), (2) removal of well-supported (bootstrap value >95%) incompatible splits in Neighbor-Net (ref. 65) and (3) a near-complete reduction of mosaic signal as identified by 3SEQ. We find that the sarbecoviruses, the viral subgenus containing SARS-CoV and SARS-CoV-2, undergo frequent recombination and exhibit spatially structured genetic diversity on a regional scale in China. Bioinformatics 30, 1312–1313 (2014). The coronavirus genome that these researchers had assembled, from pangolin lung-tissue samples, contained some gene regions that were ninety-nine per cent similar to equivalent parts of the SARS . Conducting analogous analyses of codon usage bias as Ji et al. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. A., Lytras, S., Singer, J. As of December 2, 2021, SJdRP, a medium-sized city in the Northwest region of São Paulo state, Brazil (Fig. Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively, at the whole-genome level. Regions A–C had similar phylogenetic relationships among the southern China bat viruses (Yunnan, Guangxi and Guizhou provinces), the Hong Kong viruses, northern Chinese viruses (Jilin, Shanxi, Hebei and Henan provinces, including Shaanxi), pangolin viruses and the SARS-CoV-2 lineage. Posterior means with 95% HPDs are shown in Supplementary Information Table 2.
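Whole-genome identity percentages such as those quoted above for Pangolin-CoV are typically computed from a pairwise alignment. The sketch below shows one common convention, assuming two already-aligned sequences of equal length and skipping columns that contain a gap or an ambiguous N. The text does not state which convention produced its figures, so this is an illustration rather than a reproduction, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap_chars="-"):
    """Percent identity over aligned columns with no gap and no ambiguous 'N'."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        # Skip columns that cannot be scored as match or mismatch.
        if a in gap_chars or b in gap_chars or a == "N" or b == "N":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned fragments for illustration only.
print(percent_identity("ACGTACGT", "ACGTACGA"))
```

Other conventions count gap columns as mismatches or divide by the length of the shorter sequence, which can shift the reported percentage, so the convention should be stated alongside any figure.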
This is not surprising for diverse viral populations with relatively deep evolutionary histories. In addition, sequences NC_014470 (Bulgaria 2008), CoVZXC21, CoVZC45 and DQ412042 (Hubei-Yichang) needed to be removed to maintain a clean non-recombinant signal in A. Identification of diverse alphacoronaviruses and genomic characterization of a novel severe acute respiratory syndrome-like coronavirus from bats in China. obtained the genome sequences of 10 SARS-CoV-2 virus strains through nanopore sequencing of nasopharyngeal swabs in Malta and analyzed the assembled genome with pangolin software, and the results showed that these virus strains were assigned to B.1 lineage, indicating that SARS-CoV-2 was widely spread in Europe (Biazzo et al., 2021). According to GISAID . Avian influenza A virus (H7N7) epidemic in The Netherlands in 2003: course of the epidemic and effectiveness of control measures. Wan, Y., Shang, J., Graham, R., Baric, R. & Li, F. Receptor recognition by the novel Coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus. & Muhire, B. RDP4: Detection and analysis of recombination patterns in virus genomes. Even before the COVID-19 pandemic, pangolins had been making headlines. Sequencing from Malayan pangolins collected during anti-smuggling operations in southern China detected coronavirus lineages related to SARS-CoV-2. Because 3SEQ is the most statistically powerful of the mosaic methods (ref. 61), we used it to identify the best-supported breakpoint history for each potential child (recombinant) sequence in the dataset. However, for several reasons, nucleotide sequences may be generated that cover only the spike gene of SARS-CoV-2. To estimate non-synonymous over synonymous rate ratios for the concatenated coding genes, we used the empirical Bayes 'Renaissance counting' procedure (ref. 67).
Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage - Nature Microbiology volume 5, pages 1408–1417 (2020). Epidemiology, genetic recombination, and pathogenesis of coronaviruses. 3) clusters with viruses from provinces in the centre, east and northeast of China. 3) to examine the sensitivity of date estimates to this prior specification. Holmes, E. C. The Evolution and Emergence of RNA Viruses (Oxford Univ. S. China corresponds to Guangxi, Yunnan, Guizhou and Guangdong provinces. Of importance for future spillover events is the appreciation that SARS-CoV-2 has emerged from the same horseshoe bat subgenus that harbours SARS-like coronaviruses. Another similarity between SARS-CoV and SARS-CoV-2 is their divergence time (40–70 years ago) from currently known extant bat virus lineages (Fig. Posterior means (horizontal bars) of patristic distances between SARS-CoV-2 and its closest bat and pangolin sequences, for the spike protein's variable loop region and CTD region excluding the variable loop.