
Statistical Genomics: Linkage, Mapping, and QTL Analysis

642 pages · 1998


Statistical Genomics
Linkage, Mapping, and QTL Analysis

Ben Hui Liu

CRC Press
Boca Raton London New York Washington, D.C.

Library of Congress Cataloging-in-Publication Data

Liu, Ben-Hui.
Statistical genomics : linkage, mapping, and QTL analysis / Ben-Hui Liu.
p. cm.
Includes bibliographical references and index.
ISBN 0-8493-3166-8
1. Genetics-Statistical methods. 2. Gene mapping. I. Title.
QH438.4.S73L55 1997
572.8'633'0727-dc21 97-1863 CIP

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 1998 by CRC Press LLC

No claim to original U.S. Government works
International Standard Book Number 0-8493-3166-8
Library of Congress Card Number 97-1863

To QIANG, MING AND ANNIE

FOREWORD

Statistical Genomics is a new book about an old subject, viewed in a new way.
The major focus of the book is genetic mapping, a technique that has been an indispensable part of genetic analysis since Alfred Sturtevant developed the first genetic map in Drosophila more than 85 years ago. This book will be a valuable resource to students learning about genomic science and to geneticists and statisticians using genomic mapping for research and development. This book should be useful for genetic analyses across many disciplines, including plant and animal breeding, the genetic analysis of natural populations and human genetic analysis. There is wide utility for the book both as a text and as a reference. For many students, it will provide their first access to fundamental statistical concepts essential for genomic mapping. For the more experienced scientist, it provides useful explanations of some complex statistical concepts and provides approaches to solving many common problems. In addition, the book provides a considerable amount of new theory and techniques, not yet in the primary literature, related to mating populations and estimation of genetic parameters.

Genetic mapping has become an important component of both fundamental research and practical application in a great many studies of plants, animals and microorganisms. The ability to correlate genetic function and chromosome location has led to the identification of the biological roles of a great many important genes. New technology has created the potential to generate functional genetic maps for entire genomes of virtually any plant or animal, limited only by the degree of genetic diversity and the ability to obtain a modest number of progeny. This technology has resulted in an expanded and integrated view of genomes and led to the expectation that we may be able to understand genomes at both a molecular and a phenotypic level. The acquisition and application of this knowledge is one part of what has come to be called genomic science.
Our current view of genomic science has its origin in the human genome project, an experiment of major importance for our century. Within a period of only 50 years since Watson and Crick proposed that genetic information was contained in a DNA sequence, it is likely that we will have a catalog of most of the expressed human genes and a large fraction of DNA sequence for the entire human genome. This information will lead to the determination of functions for many genes and to the first estimates of the number of genes that are necessary for the known functions of the human organism. The human genome project will profoundly affect our understanding of human biology and disease, as well as leading to new methods of diagnosis and new therapeutic products. Similarly, the extension of the theory and technology to other organisms will have profound effects on agriculture, forestry, bioprocessing, material science and environmental science as well.

Since the application of DNA-based molecular markers to genomic mapping, formidable advances have been made in human genetics and in plant and animal breeding. Molecular marker analysis has made possible the integration of many aspects of quantitative and molecular genetics through the genetic dissection of quantitative traits. It is now possible to define what fraction of quantitative variation may be assigned to major qualitative factors and quantitative factors in an individual or in a pedigree. These developments have made marker-aided selection a practical tool with widening applications and provided new approaches to breeding based on molecular technology. The methods have been applied to many nontraditional genetic systems because the technology allows development of intensive genetic analysis, even in organisms not previously studied. Detailed genetic analysis, including genetic mapping of forest trees from natural populations, has become possible through these advances in technology.
Genomic mapping has allowed greater understanding of many types of complex traits, including traits with incomplete penetrance, heterosis, genetic load, segregation distortion and epistasis, through the correlation of molecular markers and components of a complex phenotype. Genomic mapping is an essential tool for the discovery of new genes and the definition of their function. Location of a qualitative trait locus may define a new gene. Location of a gene known only by sequence may distinguish between members of a multigene family related directly by descent (orthologs) or through duplication and divergence of function (paralogs). Used in this way, mapping of quantitative effects becomes a tool for molecular genetics. The primary focus of this book is on new statistical methods needed to use genetic mapping in the context of this new genomic research paradigm.

Genomic science represents a new integration of many aspects of genetics related to the determination of the function, location and evolution of all genes in all genomes. This concept brings together molecular genetics, quantitative genetics, cytogenetics, evolutionary biology, statistics and the technology of computers and automated systems for structural and quantitative analysis of nucleic acid sequences. Continued integration of these diverse disciplines requires new statistical methods such as those found in this book.

Ronald Sederoff
Edwin F. Conger Professor of Forestry and Director of the Forest Biotechnology Group
North Carolina State University

PREFACE

GENOMICS

Genomics is a new science that studies whole genomes by integrating the traditional disciplines of cytology, Mendelian genetics, quantitative genetics, population genetics and molecular genetics with new technology from informatics and automated robotic systems. The purpose of genomic research is to learn about the structure, function and evolution of all genomes, past and present.
As a branch of biological science, genomics has made great progress in the last decade and has helped to fill the gap between the laws of chemistry, physics and biology. Many fundamental biological questions are close to being answered as a result of research in genomics, such as:

• What aspects of genome structure are physically and chemically necessary?
• Do genes have to be located at certain sites in a genome to perform their functions?
• What DNA sequences and structures are needed for genes to perform their specific functions?
• How many functional genes are necessary and sufficient in a biological system, such as an animal or a plant species?
• How many different functional genes are present in the whole biosphere?
• How many genes are essentially the same in terms of their DNA sequences and structures across different species?

What Does Genomics Include?

Classical genomics started in the early 20th century when the discovery of gene linkage (genes on the same chromosome) led to the idea of making a linkage map. This classical discipline was rejuvenated following the spectacular development of molecular marker technology beginning in 1980, especially when the Human Genome Project and the Plant Genome Project were initiated. Classical genomics, based mainly on cytogenetics and genetic mapping, is close to traditional genetics. In a way, classical genomics is a bridge between genetics and the other branches of genomics, as genomics as a whole is a bridge among different branches of genetics. Genomic informatics was created to manage the massive amount of new information which arose from the advances of genomics. Physical genomics emphasizes the physical composition of genomes, such as nucleotide sequences and DNA clones and includes subjects such as DNA sequencing, DNA library construction, physical map assembly and DNA sequence analysis.

Where Is Genomics Heading?

Genomics has been driven by technology development since its inception.
This impetus will continue into the future. Large throughput DNA sequencing equipment, DNA chip technology for quick and automated genotyping, high capacity robotic stations for sample preparation and biochemical assays, sophisticated mathematical algorithms and statistical procedures for information analyses and high power computer software packages for handling data may dominate the development of genomics.

Genomics is driven also by practical applications. It is widely recognized that genomic research has a great potential to benefit the biomedical industry, agriculture and forestry and forensic science. Genome programs (human, plant and animal) will greatly advance the technology for DNA analysis and generate vast amounts of information that will ultimately lead to a deeper understanding of biology. The technology and information from genome research can revolutionize the health-care system by advancing the diagnosis of human diseases and for forensic purposes by enabling more precise identification of human individuals. For agriculture and forestry, the technology and information from genomics can assist plant and animal breeding, disease management, genetic diversity assessment and variety protection. These applications will help to ensure environmentally friendly and sustainable agriculture and forestry.

THIS BOOK

Topics

Statistics plays an essential role in genomics because random sampling and experimental error are greatly involved in genomic research.
This book covers the following topics in the area of statistical genomics:

• introduction to genomics (Chapter 1)
• brief biological background for genomics, including basic genetics, mating design and genetic markers (Chapters 2 - 3)
• brief statistical theory for genomics, including probability distribution, hypothesis testing and point and confidence interval estimation (Chapter 4)
• statistical principles and methods for screening genetic markers, linkage analysis, linkage grouping, gene ordering, multi-locus models, linkage map construction, linkage map merging and searching for genes using population disequilibrium (Chapters 5 - 11)
• theory and methods for identifying genes controlling complex traits, commonly called Quantitative Trait Loci (QTL) mapping, including single-marker analysis, interval mapping, composite interval mapping, QTL mapping using natural populations and some future considerations (Chapters 12 - 17)
• computer tools for genomic map construction and QTL analysis (Chapter 18)
• resampling techniques and computer simulations in genomics, including jackknife and bootstrap methods, permutation analysis and mapping population simulations (Chapter 19)

Intended Audience

This book can be used as a handbook for biologists interested in statistical issues related to genomics, as a textbook for upper-level undergraduate and graduate students majoring in genetics or in genomic science and as an introduction to genomics for statisticians. For biologists, information is outlined on how statistical analyses of genomic data are implemented and how results are interpreted and presented. Students will find information on the statistical principles and on new developments in the field of statistical genomics. Information that ties statistics and biology together is outlined for statisticians. The book is intended to reach many scientists in plant, animal and human genome communities.
Important topics, such as linkage analysis and QTL mapping, are divided into different parts for experimental populations (controlled crosses) and natural populations (uncontrolled crosses). Experimental populations cover most mapping populations for plants and animals. The natural populations include some of the plant (for example, forest trees) and animal populations and the mapping populations for the humans.

How to Read This Book

As a handbook: This book can be a handbook for genomic data analysis for readers with either a good background in statistics (statisticians) or a good background in biology (biologists). The considerable information on the biological aspects of genomics will help the statisticians in their interpretations of the statistical treatments of genomic data. For the biologists, the detailed discussion on the statistical procedures and derivations will help them to gain a better understanding of the principles behind computer software packages for genomic analysis. I have included 222 tables and 170 figures to explain both the underlying statistics and the biology. This certainly will help the biologists to follow the statistical procedures and the statisticians to understand the kinds of biological data with which they work.

As a textbook: This book can be used as a textbook for students majoring in genetics or genomic science. For undergraduates with basic biology, a one-semester introduction to genomics can be covered by:

• Chapters 1, 2, 3, 6, 9, 12, 13 and 18.

For upper level undergraduates or graduates with basic genetics and statistics, a one-semester presentation of statistical genomics can include the following chapters:

• track one for biologists: Chapters 5 to 18 and 4; or
• track two for statisticians: Chapters 2, 3, 5 to 19.
I presume that students in track one already have basic knowledge of genomics, such as basic genetics and DNA structure and that those in track two have a relatively good background in theoretical statistics.

About Exercises

A set of exercises is given at the end of each chapter. Some of the exercises emphasizing statistical derivation usually have defined answers and are relatively easy for readers with a strong background in statistics. Those exercises should be very helpful to biologists in building a firm background in statistical genomics. Some exercises emphasize a comprehensive understanding in biology and these exercises may not have defined answers. I included some of my thoughts on the perspectives of genomics into some of the exercises rather than into the text and I encourage readers to try those exercises in the context of group discussions.
