TRANSPOSABLE ELEMENTS AND GENOME EVOLUTION

Georgia Genetics Review 1
VOLUME 1

Edited by
JOHN F. McDONALD

Reprinted from Genetica, Volume 107(1-3), 1999

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.

A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN 978-94-010-5812-4
ISBN 978-94-011-4156-7 (eBook)
DOI 10.1007/978-94-011-4156-7

Printed on acid-free paper

All Rights Reserved
© 2000 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 2000
Softcover reprint of the hardcover 1st edition 2000

No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.

CONTENTS

Introduction
    J.F. McDonald

I. Mechanisms and dynamics of transposable element evolution

Comparative genomics and evolutionary dynamics of Saccharomyces cerevisiae Ty elements
    I.K. Jordan, J.F. McDonald
Is the evolution of transposable elements modular?
    E. Lerat, F. Brunet, C. Bazin, P. Capy
Molecular paleontology of transposable elements from Arabidopsis thaliana
    V.V. Kapitonov, J. Jurka
Human L1 retrotransposition: insights and peculiarities learned from a cultured cell retrotransposition assay
    J.V. Moran
Structure, functionality, and evolution of the BARE-1 retrotransposon of barley
    C.M. Vicient, R. Kalendar, K. Anamthawat-Jonsson, A. Suoniemi, A.H. Schulman
Retrolyc1-1, a member of the Tnt1 retrotransposon super-family in the Lycopersicon peruvianum genome
    A.P. Pimentel Costa, K.C. Scortecci, R.Y. Hashimoto, P.G. Araujo, M.-A. Grandbastien, M.-A. Van Sluys
Retrotransposon 1731 in Drosophila melanogaster changes retrovirus-like expression strategy in host genome
    A. Kalmykova, C. Maisonhaute, V. Gvozdev
Regulatory potential of nonautonomous mariner elements and subfamily crosstalk
    D. De Aguiar, D.L. Hartl
Phylogenetic evidence for Ty1-copia-like endogenous retroviruses in plant genomes
    H.M. Laten
Evidence for genomic regulation of the telomeric activity in Drosophila melanogaster
    D. Fortunati, N. Junakovic
How valuable are model organisms for transposable element studies?
    M.G. Kidwell, M.B. Evgen'ev
Transposable elements and genome evolution: the case of Drosophila simulans
    C. Biemont, C. Vieira, N. Borie, D. Lepetit
Horizontal transfer of non-LTR retrotransposons in vertebrates
    D. Kordis, F. Gubensek
Sure facts, speculations, and open questions about the evolution of transposable element copy number
    S.V. Nuzhdin
Transposon dynamics and the breeding system
    S.I. Wright, D.J. Schoen
Recently integrated human Alu repeats: finding needles in the haystack
    A.M. Roy, M.L. Carroll, D.H. Kass, S.V. Nguyen, A.-H. Salem, M.A. Batzer, P.L. Deininger
Phylogenetic signals from point mutations and polymorphic Alu insertions
    D.S. York, V. Blum, J.A. Low, D.J. Rowold, V. Puzyrev, V. Saliukov, O. Odinokova, R.J. Herrera

II. The impact of transposable elements on host genome evolution

Transposable elements as the key to a 21st century view of evolution
    J.A. Shapiro
Transposable elements as activators of cryptic genes in E. coli
    B.G. Hall
Drosophila telomeres: two transposable elements with important roles in chromosomes
    M.-L. Pardue, P.G. DeBaryshe
Molecular domestication - more than a sporadic episode in evolution?
    W.J. Miller, J.F. McDonald, D. Nouaud, D. Anxolabehere
Genomes were forged by massive bombardments with retroelements and retrosequences
    J. Brosius
Sectorial mutagenesis by transposable elements
    J. Jurka, V.V. Kapitonov
Cell-surface area codes: mobile-element related gene switches generate precise and heritable cell surface displays of address molecules that are used for constructing embryos
    W.J. Dreyer, J.
    Roman-Dreyer
Transposable DNA elements and life history traits. II. Transposition of P DNA elements in somatic cells reduces fitness, mating activity, and locomotion of Drosophila melanogaster
    R.C. Woodruff, J.N. Thompson, Jr., J.S.F. Barker, H. Huai
Host defenses to parasitic sequences and the evolution of epigenetic control mechanisms
    M.A. Matzke, M.F. Mette, W. Aufsatz, J. Jakowitsch, A.J.M. Matzke
Sex brings transposons and genomes into conflict
    T.H. Bestor

Key word Index
Author Index

Genetica 107: 1-2, 1999.

Introduction

Recent discoveries on the molecular structure and function of eukaryotic genomes are of major evolutionary significance. Although inherited changes or mutations have long been recognized as the ultimate source of evolutionary change, the mechanisms underlying these genetic changes were initially assumed to be rather simplistic random events which provided the raw material but not directionality to evolutionary change. In light of recent discoveries, both of these assumptions are today called into question. For example, we now know that inherited changes are the result of much more than simple enzymatic errors made during the process of DNA replication. Indeed, many inherited changes that have significant phenotypic effects are known to be due to the insertion of viral-like transposable elements that comprise a major fraction of all genomes. We now know that the movement of transposable elements, and hence the rate of transposable element-mediated insertional mutation, is not a constant. Rather transposable element movement is a highly regulated process and in many instances may be induced by environmental and other forms of genomic stress. In short, the molecular processes which have resulted in inherited changes over evolutionary time are a good deal more complex than envisioned even a decade ago and, in large measure, are associated with the movement and insertional consequences of transposable elements.

The papers presented in this volume are based upon the proceedings of the first annual Georgia Genetics Symposium, held October 8-10, 1999, on the campus of the University of Georgia in Athens. This meeting brought together over 60 scientists from 11 different countries to discuss the evolutionary significance of transposable elements. Evolutionary interests in transposable elements have traditionally focused on two areas: the factors underlying the dynamics of the elements themselves and the impact these elements have had on the evolution of the genomes in which they reside. While not mutually exclusive, these two areas of focus provide convenient criteria by which to group the papers that follow.

Section I consists of papers dealing generally with the topic of transposable element evolution. In several instances a genomics approach has been taken to analyze patterns and processes underlying TE evolution (Jordan & McDonald; Lerat et al.; Kapitonov & Jurka). A number of papers focus on the evolution of various molecular features and characteristics of particular elements or families of elements (Moran; Vicient et al.; Costa et al.; Kalmykova, Maisonhaute & Gvozdev; De Aguiar & Hartl; Laten; Fortunati & Junakovic) while others discuss the evolutionary significance of more general patterns based upon comparative and/or theoretical analyses of related groups of transposable elements (e.g., Kidwell & Evgen'ev; Biemont et al.; Kordis & Gubensek; Nuzhdin; Wright & Schoen). The final two papers in this section deal with the evolution of Alu elements in humans (Roy et al.) and how these elements can be applied to the analysis of patterns of recent human evolution (Roy et al.; York et al.).

The papers in Section II of this volume generally focus on the impact transposable elements have had on the evolution of the genomes in which they reside. The first paper in this section is a provocative review by James Shapiro arguing for a central role of transposable elements in 21st century views of evolution. Striking examples of the apparent direct intervention of TEs in the evolution of particular host genes and functions, as well as their significance to various life history traits, are documented in a number of papers in this section (Hall; Pardue & DeBaryshe; Miller et al.; Brosius; Jurka & Kapitonov; Dreyer & Roman-Dreyer; Woodruff et al.). Recent evidence suggests that TEs may have not only contributed to host genome evolution directly by creating a unique functional class of genetic variants but indirectly as well by eliciting the evolution of various defense mechanisms that have been subsequently co-opted for host cellular functions unrelated to TE control. This interesting hypothesis is addressed from two different perspectives in the papers by Marjorie Matzke and Timothy Bestor.

As the papers in this volume demonstrate, our understanding and appreciation of the evolutionary significance of 'selfish DNA' have come a long way since the publication of the two seminal papers by Doolittle and Sapienza (Nature 284: 601-603) and Orgel and Crick (Nature 284: 604-607) nearly two decades ago. While there is still much to be learned, it is now clear that TEs are much more than mere excess genomic baggage. Indeed, it appears that the evolution of TEs and their host genomes are intimately related processes that have combined to catalyze the emergence of complex organismic diversity over evolutionary time.

JOHN F. McDONALD
University of Georgia
Athens, Georgia, USA

Genetica 107: 3-13, 1999.
© 2000 Kluwer Academic Publishers.

Comparative genomics and evolutionary dynamics of Saccharomyces cerevisiae Ty elements

I. King Jordan¹ & John F. McDonald²
¹National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38A, Bethesda, MD 20894, USA (E-mail: [email protected]); ²Department of Genetics, University of Georgia

Accepted 11 February 2000

Key words: genomics, molecular evolution, retrotransposons, selection, Ty elements

Abstract

The availability of the complete genome sequence of Saccharomyces cerevisiae provides the unique opportunity to study an entire genomic complement of retrotransposons from an evolutionary perspective. There are five families of yeast retrotransposons, Ty1-Ty5. We have conducted a series of comparative sequence analyses within and among S. cerevisiae Ty families in an effort to document the evolutionary forces that have shaped element variation. Our results indicate that within families Ty elements vary little in terms of both size and sequence. Furthermore, intra-element 5'-3' long terminal repeat (LTR) sequence comparisons indicate that almost all Ty elements in the genome have recently transposed. For each family, solo LTR sequences generated by intra-element recombination far outnumber full length insertions. Taken together, these results suggest a rapid genomic turnover of S. cerevisiae Ty elements. The closely related Ty1 and Ty2 are the most numerous elements in the genome. Phylogenetic analysis of full length insertions reveals that reverse transcriptase mediated recombination between Ty1 and Ty2 elements has generated a number of hybrid Ty1/2 elements. These hybrid Ty1/2 elements have similar genomic structures with chimeric LTRs and chimeric TYB (pol) genes. Analysis of the levels of nonsynonymous (Ka) and synonymous (Ks) nucleotide variation indicates that Ty1 and Ty2 coding regions have been subject to strong negative (purifying) selection. Distribution of Ka and Ks on Ty1, Ty2 and Ty1/2 phylogenies reveals evidence of negative selection on both internal and external branches.
This pattern of variation suggests that the majority of full length Ty1, Ty2 and Ty1/2 insertions represent active or recently active element lineages and is consistent with a high level of genomic turnover. The evolutionary dynamics of S. cerevisiae Ty elements uncovered by our analyses are discussed with respect to selection among elements and the interaction between the elements and their host genome.

Transposable elements and evolution in the age of genomics

The current era of biology is characterized by massive accumulation of sequence data due in large part to numerous genome sequencing projects (e.g. Abbott et al., 1998; Blattner et al., 1997; Fleischmann et al., 1995; Goffeau et al., 1996). Comparative analyses of these genomic data have the potential to provide unprecedented insight into genome organization and evolution (e.g. Koonin et al., 1997; Rivera et al., 1998; Tatusov, Koonin & Lipman, 1997). One striking fact confirmed by sequencing projects is the extent to which genomes (particularly eukaryotic genomes) are made up of transposable element (TE) sequences. Thus, it is becoming increasingly clear that any attempt to fully understand genome organization and evolution will be incomplete without a concerted effort to comprehend the evolutionary dynamics of TEs.

The first complete eukaryotic genome sequence to be reported was that of Saccharomyces cerevisiae (Goffeau et al., 1996). From a TE centric view this meant that, for the first time, a full genomic complement of retrotransposons (Ty elements) was available for analysis. Comparative sequence analysis of these elements has the potential to provide increased power and resolution for addressing fundamental questions concerning TE evolution. Towards this end we have conducted a series of sequence analyses within and among Ty element families in the S. cerevisiae genome. Here and elsewhere (Jordan & McDonald, 1998, 1999a,b,c; Promislow, Jordan & McDonald, 1999) we describe results pertaining to a number of specific questions that we were able to address using this wealth of sequence data. These questions concern but are not limited to (1) the role of recombination in Ty element evolution, (2) the role of inter-element selection in Ty element evolution, (3) active versus inactive Ty elements, and (4) recent versus ancient Ty element insertions.

Saccharomyces cerevisiae Ty elements

The S. cerevisiae genome (strain αS288C) contains five families of long terminal repeat (LTR) containing retrotransposons, Ty1-Ty5 (Kim et al., 1998). These five families have similar genomic structures (Figure 1) characterized by LTRs (direct repeats) that flank the open reading frames (ORFs) TYA (gag) and TYB (pol). TYA encodes primarily structural proteins and TYB encodes enzymatic proteins involved in reverse transcription. Interestingly, LTR retrotransposons are the only class of TEs present in the S. cerevisiae genome (Sandmeyer, 1998). An initial survey of the S. cerevisiae genome revealed that Ty insertions make up a relatively paltry (as far as eukaryotes go) 3.1% of the total genomic DNA (Kim et al., 1998). The majority of Ty element insertions are solo LTRs (Table 1) that are remnants of intra-element LTR-LTR recombination. The same survey revealed that Ty1 and Ty2 are the most populous families in the genome while the Ty3, Ty4 and Ty5 families are represented by far fewer members (full length insertions, Table 1). Phylogenetic analysis based on TYB amino acid sequences shows that the five Ty element families belong to two groups (copia-like and gypsy-like) of LTR retroelements. Ty1, Ty2, Ty4 and Ty5 are all copia-like LTR retroelements (Figure 2). As such their TYB polypro-

Table 1. Number of Ty element insertions in the S. cerevisiae genome (a)

Family          Full length (b)    Solo LTRs (c)
Ty1 & Ty1/2          32                185
Ty2                  13                 21
Ty3                   2                 39
Ty4                   3                 29
Ty5                                      6

(a) Data from Kim et al., 1998.
(b) Full length insertions are defined as Ty elements that possess 5' and 3' LTRs that flank ORFs, although in a few cases the ORF regions may be partially deleted (e.g. Ty5).
(c) Solo LTRs are insertions of single Ty LTRs not associated with ORF sequences.

Figure 1. Schematic examples of the genomic structures of the five S. cerevisiae Ty element families. Ty elements are LTR retrotransposons that have direct repeats (LTRs) that flank one or two ORFs. ORF designations are shown above the genomic structures. Abbreviations of proteins encoded by the ORFs are shown within the genomic structures. LTR - long terminal repeat; NA - nucleic acid binding protein; PR - protease; IN - integrase; RT - reverse transcriptase; RH - RNase H.

Figure 2. Phylogeny of representative LTR retroelements based on an amino acid alignment of RT sequences. LTR retroelements can be divided into three monophyletic groups (retroviruses, gypsy-like and copia-like). The phylogenetic relationships of the five Ty families are shown with respect to other LTR retroelements and the three LTR retroelement groups. LTR retrotransposon names are standard. Retrovirus abbreviations are: MULV - murine leukemia virus, WDSV - walleye dermal sarcoma virus, HFV - human foamy virus, MMTV - mouse mammary tumor virus, HERV-K - human endogenous retrovirus, HIV - human immunodeficiency virus. Phylogeny provided by Nathan Bowen.
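
The statement that solo LTRs make up the majority of Ty insertions can be checked directly against the counts in Table 1. The short Python sketch below is an illustration added for that purpose and is not part of the authors' analysis. The names TABLE_1, solo_fraction and summarize are hypothetical, and Ty5 is left out on the assumption that only one of its two counts can be read from Table 1.

```python
# Illustrative sketch only: counts are copied from Table 1 (Kim et al., 1998).
# Assumption: Ty5 is omitted because only one of its two counts is legible.
# All names below (TABLE_1, solo_fraction, summarize) are hypothetical.

TABLE_1 = {
    # family: (full length insertions, solo LTRs)
    "Ty1 & Ty1/2": (32, 185),
    "Ty2": (13, 21),
    "Ty3": (2, 39),
    "Ty4": (3, 29),
}


def solo_fraction(full_length, solo_ltrs):
    """Return the fraction of a family's insertions that are solo LTRs."""
    total = full_length + solo_ltrs
    if total == 0:
        raise ValueError("family has no insertions")
    return solo_ltrs / total


def summarize(table):
    """Map each family, plus a pooled 'all listed' row, to its solo LTR fraction."""
    fractions = {
        family: solo_fraction(full, solo) for family, (full, solo) in table.items()
    }
    pooled_full = sum(full for full, _ in table.values())
    pooled_solo = sum(solo for _, solo in table.values())
    fractions["all listed"] = solo_fraction(pooled_full, pooled_solo)
    return fractions


if __name__ == "__main__":
    for family, fraction in summarize(TABLE_1).items():
        print(f"{family:<12} {fraction:.1%} solo LTRs")
```

For the four families included, solo LTRs account for more than half of the insertions in every case and for roughly 85% of the pooled total, with Ty2 showing the smallest excess of solo LTRs over full length insertions.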