JVI Accepted Manuscript Posted Online 11 January 2017
J. Virol. doi:10.1128/JVI.01953-16
Copyright © 2017 American Society for Microbiology. All Rights Reserved.

Title:
Surveillance of bat coronaviruses in Kenya identifies relatives of human coronaviruses NL63 and 229E and their recombination history

Running title:
Bat origin of human coronaviruses

Authors:
Ying Tao1#, Mang Shi2#, Christina Chommanard1, Krista Queen1, Jing Zhang1, Wanda Markotter3, Ivan V. Kuzmin4†, Edward C. Holmes2, Suxiang Tong1*

Affiliations:
1 Division of Viral Diseases, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA; 2 Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, Australia; 3 Centre for Viral Zoonoses, Department of Medical Virology, Faculty of Health Sciences, University of Pretoria, Pretoria, South Africa; 4 Division of High Consequence Pathogens and Pathology, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA.
# Y.T. and M.S. contributed equally to this work
† Present address: Department of Pathology, University of Texas Medical Branch, Galveston, TX 77555, USA.
* Correspondence to: Dr. Suxiang Tong, 1600 Clifton Rd, mail stop G18, CDC, Atlanta, GA 30333; Tel: 4046391372; Email: [email protected].
The findings and conclusions in this report are those of the author(s) and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

Type of Publication: ‘Full length’ paper
Word count: Abstract (155); Importance (106); Text body (4618)

ABSTRACT
Bats harbor a large diversity of coronaviruses (CoVs), several of which are related to zoonotic pathogens that cause severe disease in humans.
Our screening of bat samples collected in Kenya during 2007-2010 not only detected RNA from several novel CoVs but, more significantly, identified sequences that were closely related to human CoVs NL63 and 229E, suggesting that these two human viruses originate from bats. We also demonstrated that human CoV NL63 is a recombinant between NL63-like viruses circulating in Triaenops bats and 229E-like viruses circulating in Hipposideros bats, with the breakpoints located near the 5’ and 3’ ends of the spike (S) protein gene. In addition, two further inter-species recombination events involving the S gene were identified, suggesting that this region may represent a recombination “hotspot” in CoV genomes. Finally, using a combination of phylogenetic and distance-based approaches, we showed that the genetic diversity of bat CoVs is primarily structured by host species and subsequently by geographic distances.

IMPORTANCE
Understanding the driving forces of cross-species virus transmission is central to understanding the nature of disease emergence. Previous studies have demonstrated that bats are the ultimate reservoir hosts for a number of coronaviruses (CoVs), including ancestors of SARS-CoV, MERS-CoV, and HCoV-229E. However, the evolutionary pathways of bat CoVs remain elusive. We provide evidence for natural recombination between distantly related African bat coronaviruses associated with Triaenops afer and Hipposideros sp. bats that resulted in an NL63-like virus, an ancestor of the human pathogen HCoV-NL63. These results suggest that inter-species recombination may play an important role in CoV evolution and the emergence of novel CoVs with zoonotic potential.

INTRODUCTION
Coronaviruses (CoVs) (subfamily Coronavirinae, family Coronaviridae, order Nidovirales) are common infectious agents that infect a wide range of hosts including humans, causing respiratory, gastrointestinal, liver, and neurologic diseases, and that possess the largest genomes of any RNA viruses described to date (1). The subfamily Coronavirinae is currently classified into four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus (2). The alphacoronaviruses (alpha-CoV) and betacoronaviruses (beta-CoV) are exclusively found in mammals while the gammacoronaviruses (gamma-CoV) and deltacoronaviruses (delta-CoV) are mainly associated with birds. Presently, the greatest diversity of alpha- and beta-CoVs has been documented in bats, which in part reflects the more intensive surveillance of these animals since Rhinolophus spp. bats were implicated as the reservoir hosts for SARS-related CoVs (3, 4). This surveillance resulted in the discovery of a potential reservoir host (bat) species for another two human CoVs: Human CoV 229E (HCoV-229E), a relative of which is present in Hipposideros bats (5, 6), and Middle East respiratory syndrome coronavirus (MERS-CoV), for which related viruses are present in Pipistrellus, Tylonycteris, and Neoromicia bats (7-10), although the most likely reservoir host of human MERS-CoV identified to date is the dromedary camel (11). Most recently, HCoV-229E-like CoVs were also identified in camels, although their role in human infection is unknown (12).

Africa is a major hotspot of zoonotic emerging diseases. With its rich biodiversity, Africa is inhabited by many bats of different species including those that serve as reservoirs of important zoonotic diseases such as Marburg hemorrhagic fever and rabies (13).
Our initial screening demonstrated the presence of diverse CoVs in African bats, including those collected in the southern parts of Kenya during 2006 (14, 15), and in other countries including South Africa, Nigeria, and Ghana (16). Furthermore, recent studies have provided strong evidence that HCoV-229E originated from bat viruses circulating in Africa (5), underscoring the zoonotic potential of bat-borne CoVs from this continent.

One human coronavirus, HCoV-NL63, was first isolated in 2004 from the aspirate of an 8-month-old boy suffering from pneumonia in the Netherlands (17). While the clinical significance of this virus is debated, it has a worldwide distribution and is known to infect both the upper and lower respiratory tract (18). Based on a phylogeny of the RNA-dependent RNA polymerase (RdRp), HCoV-NL63 is related to another human virus, HCoV-229E, and had no close relatives identified in bats (16). Although Huynh et al. (19) suggested that a virus (ARCoV.2/2010/USA) isolated from the American tricolored bat (Perimyotis subflavus) may share common ancestry with HCoV-NL63, the genetic distance between the two viruses is large, and their close relationship has not been corroborated in other phylogenetic analyses (16, 20). Nevertheless, the successful passage of HCoV-NL63 in an immortalized bat cell line suggests its potential association with bats (19).

As is well appreciated, recombination leads to rapid changes of genetic diversity in RNA viruses (21). CoVs represent a classic example of viruses with high frequencies of homologous recombination through discontinuous RNA synthesis (22). Indeed, under experimental conditions, the recombination frequency can be as high as 25% for the entire CoV genome (23).
Recombination in CoVs is also frequently reported under natural conditions, including in some emerging human pathogens such as SARS-CoV (24, 25), MERS-CoV (11), HCoV-OC43 (26), and HCoV-NL63 (27), although most reports involve closely related viruses.

The Global Disease Detection Program (GDD) of the Centers for Disease Control and Prevention (CDC, Atlanta, GA) is focused on the detection of emerging infectious agents worldwide. One of the GDD projects was directed toward the detection of such potential zoonotic pathogens in African bats. Since the initial study performed during 2006 in Kenya (14, 15), an expanded surveillance of bat CoVs has been performed in the same and other countries including Kenya, Nigeria, Republic of Georgia, Democratic Republic of Congo, Guatemala, and Peru. The project included more bat species and geographic locations, allowing a more thorough investigation of the genetic diversity and ecological dynamics of CoV circulation in bats. In this study, we performed an ecological and evolutionary characterization of CoVs circulating in Kenya and identified distinct CoVs from Triaenops afer and Hipposideros sp. bats that are phylogenetically related to HCoV-NL63 in different parts of the genome. Based on these data, we propose a scenario for the origin and evolutionary history of HCoV-NL63 and related viruses.

MATERIALS AND METHODS
Sample collection. Between 2007 and 2010 a total of 2050 bat specimens were collected from 30 different locations in Kenya (Table S1) in collaboration with the CDC GDD regional country office in Kenya and the National Museums of Kenya. The bats were captured using mist nets, hand nets, or manually. The protocol (2096FRAMULX-A3) was approved by the CDC IACUC and by Kenya Wildlife Services.
Upon capture, each bat was measured, sexed, and identified to species by a trained field biologist. Subsequently, fecal and oral swabs (if possible) were collected in compliance with the field protocol and were then transported on dry ice from the field to -80°C storage before further processing.

CoV RNA detection. Each fecal and oral swab was suspended in 200 µL of phosphate-buffered saline. Viral total nucleic acids (TNA) were extracted using the QIAamp Mini Viral Spin kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s instructions, followed by semi-nested RT-PCR (SuperScript III One-Step RT-PCR kit and Platinum Taq kit, Invitrogen, San Diego, CA, USA) using primer sets designed to target the conserved genome region of alpha-, beta-, gamma-, and delta-CoVs, respectively (15). PCR products of the expected size (~400 nucleotides) were purified by gel extraction using the QIAquick Gel Extraction kit (Qiagen, Valencia, CA, USA) and sequenced in both directions on an ABI Prism 3130 automated sequencer (Applied Biosystems, Foster City, CA, USA). As validation, the RT-PCR procedure was repeated for each of the CoV-positive specimens.

Bat mitochondrial gene sequencing. Bat species were further confirmed by sequencing the host mitochondrial cytochrome b (cytB) gene in each of the CoV-positive specimens. A final 1104 bp fragment of the cytB gene was amplified and sequenced using the method and primers described previously (14, 15).

Phylogenetic analyses. This study generated a total of 240 CoV RdRp sequences (402 bp) from Kenyan bats. These sequences were first aligned in MAFFT v7.013 (28), using amino acid sequences as a guide for the nucleotide sequence alignment.
Phylogenetic trees were then inferred using the maximum likelihood (ML) method available in PhyML version 3.0 (29), assuming a general time-reversible (GTR) model with discrete gamma-distributed rate variation among sites (Γ4) and the SPR branch-swapping algorithm. To produce a more condensed data set, we clustered the highly similar sequences from the same geographic location and host species, and randomly chose one or two to represent each cluster. This condensed data set was subsequently combined with 121 reference sequences, taken from GenBank, that are representative of the genetic diversity of alpha- and beta-CoVs on a global scale. ML phylogenetic trees of these final alignments were inferred using the same procedure and substitution models as described above.

Comparisons of viral genetic, geographic, and host genetic distance matrices. To determine the relationship between viral genetic, geographic, and host genetic distances, we compiled a data set containing the Kenyan CoV samples generated in this study. The genetic distance matrices were produced from pairwise comparisons either in the form of uncorrected percentage differences or calculated from the phylogenetic trees (patristic distance) using the Patristic v1.0 program (30). The geographic distances (Euclidean distance) were calculated using the formula “distance = (acos((sin(latitude1) * sin(latitude2)) + (cos(latitude1) * cos(latitude2) * cos(longitude2 - longitude1)))) * 6371”, with spatial coordinates of the samples derived from the geographic location information.

We used Mantel correlation analyses to test the extent of the correlation between these matrices (31). Both simple and partial Mantel tests were performed, and the correlation was evaluated with 10000 permutations.
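
The distance formula and the simple Mantel test described above can be illustrated with a minimal Python sketch. This is not the authors' code: the analyses in this study were run with the Ecodist package in R, and the function names below (great_circle_km, mantel) are hypothetical. The sketch assumes coordinates in decimal degrees, a Pearson correlation between the upper triangles of two symmetric distance matrices, and a one-sided permutation p-value; the quoted formula is the spherical law of cosines, which returns a distance in km along the Earth's surface for a radius of 6371 km.

```python
import math
import random


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Distance in km from the spherical law of cosines (the quoted formula).

    Inputs are assumed to be decimal degrees and are converted to radians.
    """
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    # Clamp so floating-point rounding cannot push acos outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, c))) * radius_km


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(m, order=None):
    """Upper-triangle entries of a square matrix, optionally after
    relabelling rows and columns together according to `order`."""
    n = len(m)
    idx = list(range(n)) if order is None else order
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(a, b, permutations=10000, seed=0):
    """Simple Mantel test: returns (r, one-sided permutation p-value).

    Rows and columns of `b` are permuted jointly, which keeps each
    permuted matrix a valid distance matrix.
    """
    rng = random.Random(seed)
    x = upper_triangle(a)
    observed = pearson(x, upper_triangle(b))
    n = len(b)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        if pearson(x, upper_triangle(b, order)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (hits + 1) / (permutations + 1)
```

In use, one would build a geographic matrix by calling great_circle_km on every pair of sampling sites and pass it to mantel together with the viral genetic distance matrix. A partial Mantel test, which controls for a third matrix, is not covered by this sketch.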
To assess which of the two factors (geographic or host genetic distance) best explained total variation in the virus genetic distance matrices, we performed multiple linear regression on these distance matrices (32). The statistical significance of each regression was evaluated by performing 10000 permutations. To examine whether the degree of virus genetic relatedness corresponded to the scale of geographic distance or host relatedness, we generated Mantel correlograms. In each correlogram, 10-12 distance classes were assigned following an equal-frequency criterion: each class had a similar number of pairwise comparisons. All statistical analyses were performed using the Ecodist package implemented in R 3.0.2 (33), and all statistical results were considered significant at the P = 0.05 level.

Full genome sequencing and sequence analyses. Five viruses representative of the full diversity of the CoVs newly described here were selected for full genome sequencing: BtKY229E-1, BtKY229E-8, BtKYNL63-9a, BtKYNL63-9b, and BtKYNL63-15. We first sequenced a number of conserved regions throughout the genome using several semi-nested or nested consensus degenerate RT-PCR amplicons. These regions were then bridged using sequence-specific RT-PCR followed by Sanger sequencing (< 2 kb) or using the PacBio platform (> 2 kb). The assembled consensus genome sequences from PacBio sequencing were later confirmed by sequence-specific RT-PCR and Sanger sequencing (GenBank accession numbers KY073744-KY073748). The 5’ and 3’ genome termini were not determined due to the limited RNA remaining, and were derived with PCR primers based on the conserved genome regions in alpha-CoVs.
For each complete genome sequence, potential ORFs were predicted based on the conserved core sequence, 5′-CUAAAC-3′, with a minimum length of 66 amino acids. Ribosomal frameshifts were identified based on the presence of the conserved slippery sequence, “UUUAAAC”. For phylogenetic analyses, the data set was first separated into six ORFs, namely the ORF1a, ORF1b, Spike (S), Envelope (E), Membrane (M), and Nucleoprotein (N) genes. The data set for each gene was translated into amino acid sequences and aligned using MAFFT v7.013. Phylogenetic trees were then inferred using PhyML as described above. Recombination events were first identified from the occurrence of incongruent topologies in these initial phylogenies, and were then confirmed and characterized using Simplot v3.5.1 (34). In the Simplot analysis, seven sequences were analyzed, including the potential recombinant, the parental viruses, and an outgroup. The similarity comparisons of the recombinant and the other sequences were plotted using a sliding window with a size of 1000 bp and a step size of 10 bp.

RESULTS
Prevalence of CoV in Kenyan bats. We examined bats from at least 27 species (17 genera) collected over a four-year period (2007-2010) from 30 locations across the southern part of Kenya (Figure 1). A total of 2,050 bat samples were screened for CoV RNA using a pan-coronavirus RT-PCR assay. We found an overall prevalence of 11.7% (240/2,050 bats) (Table S1). This overall prevalence is in line with recent reports of CoVs in bats from numerous locations including South Africa, Mexico, Philippines, Kenya, United Kingdom, Japan, Italy, and Ghana (6, 14, 15, 35-40).

Bats of several species tested (Chaerephon pumilus, Coleura afra, Lissonycteris angolensis, Miniopterus africanus, Neoromicia tenuipinnis, Neoromicia sp., Nycteris sp., Pipistrellus sp., and Scotoecus sp.) did not yield CoV-positive samples, although the sample numbers were limited and might not reflect the true prevalence (Table S1). Conversely, in bats of several other species the CoV prevalence was high (Cardioderma cor, 25%; Eidolon helvum, 21%; Epomophorus labiatus, 28.6%; Hipposideros sp., 27.6%; Miniopterus minor, 22.6%; Otomops martiensseni, 28.6%; Rhinolophus hildebrandtii, 31.3%; Rhinolophus sp., 28.9%; Triaenops afer, 26.7%). Most species (21/27) were sampled at more than one location. Of note, we detected CoVs in 21% of E. helvum bats tested in Kenya, whereas a previous study in Ghana failed to detect any CoVs in a similar number of bats from this species (6).

Phylogenetic diversity of Kenyan bat CoVs. The viral sequences identified in Kenyan bats showed a remarkable diversity within both alpha- and beta-CoVs (Figure 2). Based on our phylogenetic analysis, the CoVs newly identified here can be grouped into 20 phylogenetic lineages (Figure 2). Many of the sampled bat genera are associated with more than one viral lineage. Furthermore, in some cases, the divergence of the CoVs within the same host genus may also be associated with possible differences in sample types. For example, we found two lineages of CoV in Rousettus aegyptiacus bats, one of which was present in oral swabs (Figure 2: L7 Rousettus) while the other was identified in fecal swabs (L17 Rousettus). The default tissue tropism for bat CoVs is believed to be intestinal, and the samples of choice are fecal swabs.
In agreement with this, only four viruses were identified from oral swab samples (L7 Rousettus), as indicated in the phylogeny (Figure 2).

Our phylogenetic analyses also revealed a number of cross-species transmission events at the genus level, many of which appeared to be transient spill-overs with no evidence of onward transmission. This pattern was observed as CoV sequences recovered from bats of one genus that fell as tree tips within the phylogenetic diversity mainly associated with a different bat genus. From our Kenyan data set, there were seven such cross-species transmission events in total, each represented by a single sequence (dotted red in Figure 2), suggesting that these are most likely viruses with limited transmission within new hosts, although this hypothesis requires confirmation on a larger set of samples.