Current Topics in Computational Molecular Biology PDF

556 Pages·2002·11.081 MB·English

Preview Current Topics in Computational Molecular Biology

Current Topics in Computational Molecular Biology

Computational Molecular Biology
Sorin Istrail, Pavel Pevzner, and Michael Waterman, editors

Computational Methods for Modeling Biochemical Networks, James M. Bower and Hamid Bolouri, editors, 2000
Computational Molecular Biology: An Algorithmic Approach, Pavel A. Pevzner, 2000
Current Topics in Computational Molecular Biology, Tao Jiang, Ying Xu, and Michael Q. Zhang, editors, 2002

Current Topics in Computational Molecular Biology
edited by Tao Jiang, Ying Xu, and Michael Q. Zhang

A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England

© 2002 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

Published in association with Tsinghua University Press, Beijing, China, as part of TUP’s Frontiers of Science and Technology for the 21st Century Series.

This book was set in Times New Roman on 3B2 by Asco Typesetters, Hong Kong and was printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data
Current topics in computational molecular biology / edited by Tao Jiang, Ying Xu, Michael Zhang.
p. cm. -- (Computer molecular biology)
Includes bibliographical references.
ISBN 0-262-10092-4 (hc.: alk. paper)
1. Molecular biology--Mathematics. 2. Molecular biology--Data processing. I. Jiang, Tao, 1963- II. Xu, Ying. III. Zhang, Michael. IV. Series.
QH506.C88 2002
572.8001051--dc21
2001044430

Contents

Preface

I INTRODUCTION
1 The Challenges Facing Genomic Informatics
Temple F. Smith

II COMPARATIVE SEQUENCE AND GENOME ANALYSIS
2 Bayesian Modeling and Computation in Bioinformatics Research
Jun S. Liu
3 Bio-Sequence Comparison and Applications
Xiaoqiu Huang
4 Algorithmic Methods for Multiple Sequence Alignment
Tao Jiang and Lusheng Wang
5 Phylogenetics and the Quartet Method
Paul Kearney
6 Genome Rearrangement
David Sankoff and Nadia El-Mabrouk
7 Compressing DNA Sequences
Ming Li

III DATA MINING AND PATTERN DISCOVERY
8 Linkage Analysis of Quantitative Traits
Shizhong Xu
9 Finding Genes by Computer: Probabilistic and Discriminative Approaches
Victor V. Solovyev
10 Computational Methods for Promoter Recognition
Michael Q. Zhang
11 Algorithmic Approaches to Clustering Gene Expression Data
Ron Shamir and Roded Sharan
12 KEGG for Computational Genomics
Minoru Kanehisa and Susumu Goto
13 Datamining: Discovering Information from Bio-Data
Limsoon Wong

IV COMPUTATIONAL STRUCTURAL BIOLOGY
14 RNA Secondary Structure Prediction
Zhuozhi Wang and Kaizhong Zhang
15 Properties and Prediction of Protein Secondary Structure
Victor V. Solovyev and Ilya N. Shindyalov
16 Computational Methods for Protein Folding: Scaling a Hierarchy of Complexities
Hue Sun Chan, Hüseyin Kaya, and Seishi Shimizu
17 Protein Structure Prediction by Comparison: Homology-Based Modeling
Manuel C. Peitsch, Torsten Schwede, Alexander Diemand, and Nicolas Guex
18 Protein Structure Prediction by Protein Threading and Partial Experimental Data
Ying Xu and Dong Xu
19 Computational Methods for Docking and Applications to Drug Design: Functional Epitopes and Combinatorial Libraries
Ruth Nussinov, Buyong Ma, and Haim J. Wolfson

Contributors
Index

Preface

Science is advanced by new observations and technologies. The Human Genome Project has led to a massive outpouring of genomic data, which has in turn fueled the rapid developments of high-throughput biotechnologies.
We are witnessing a revolution driven by the high-throughput biotechnologies and data, a revolution that is transforming the entire biomedical research field into a new systems level of genomics, transcriptomics, and proteomics, fundamentally changing how biological science and medical research are done. This revolution would not have been possible if there had not been a parallel emergence of the new field of computational molecular biology, or bioinformatics, as many people would call it. Computational molecular biology/bioinformatics is interdisciplinary by nature and calls upon expertise in many different disciplines (biology, mathematics, statistics, physics, chemistry, computer science, and engineering), and is ubiquitous at the heart of all large-scale and high-throughput biotechnologies. Though, like many emerging interdisciplinary fields, it has not yet found its own natural home department within traditional university settings, it has been identified as one of the top strategic growing areas throughout academic as well as industrial institutions because of its vital role in genomics and proteomics, and its profound impact on health and medicine.

At the eve of the completion of the human genome sequencing and annotation, we believe it would be very useful and timely to bring out this up-to-date survey of current topics in computational molecular biology. Because this is a rapidly developing field and covers a very wide range of topics, it is extremely difficult for any individual to write a comprehensive book. We are fortunate to be able to pull together a team of renowned experts who have been actively working at the forefront of each major area of the field. This book covers most of the important topics in computational molecular biology, ranging from traditional ones such as protein structure modeling and sequence alignment, to the recently emerged ones such as expression data analysis and comparative genomics. It also contains a general introduction to the field, as well as a chapter on general statistical modeling and computational techniques in molecular biology.
Although there are already several books on computational molecular biology/bioinformatics, we believe that this book is unique as it covers a wide spectrum of topics (including a number of new ones not covered in existing books, such as gene expression analysis and pathway databases) and it combines algorithmic, statistical, database, and AI-based methods for biological problems.

Although we have tried to organize the chapters in a logical order, each chapter is a self-contained review of a specific subject. It typically starts with a brief overview of a particular subject, then describes in detail the computational techniques used and the computational results generated, and ends with open challenges. Hence the reader need not read the chapters sequentially. We have selected the topics carefully so that the book would be useful to a broad readership, including students, nonprofessionals, and bioinformatic experts who want to brush up topics related to their own research areas.

The 19 chapters are grouped into four sections. The introductory section is a chapter by Temple Smith, who attempts to set bioinformatics into a useful historical context. For over half a century, mathematics and even computer-based analyses have played a fundamental role in bringing our biological understanding to its current level. To a very large extent, what is new is the type and sheer volume of new data. The birth of bioinformatics was a direct result of this new data explosion. As this interdisciplinary area matures, it is providing the data and computational support for functional genomics, which is defined as the research domain focused on linking the behavior of cells, organisms, and populations to the information encoded in the genomes.

The second of the four sections consists of six chapters on computational methods for comparative sequence and genome analyses.

Liu’s chapter presents a systematic development of the basic Bayesian methods alongside contrasting classical statistics procedures, emphasizing the conceptual importance of statistical modeling and the coherent nature of the Bayesian methodology.
The missing data formulation is singled out as a constructive framework to help one build comprehensive Bayesian models and design efficient computational strategies. Liu describes the powerful computational techniques needed in Bayesian analysis, including the expectation-maximization algorithm for finding the marginal mode, Markov chain Monte Carlo algorithms for simulating from complex posterior distributions, and dynamic programming-like recursive procedures for marginalizing out uninteresting parameters or missing data. Liu shows that the popular motif sampler used for finding gene regulatory binding motifs and for aligning subtle protein motifs can be derived easily from a Bayesian missing data formulation.

Huang’s chapter focuses on methods for comparing two sequences and their applications in the analysis of DNA and protein sequences. He presents a global alignment algorithm for comparing two sequences that are entirely similar. He also describes a local alignment algorithm for comparing sequences that contain locally similar regions. The chapter gives efficient computational techniques for comparing two long sequences and comparing two sets of sequences, and it provides real applications to illustrate the usefulness of sequence alignment programs in the analysis of DNA and protein sequences.

The chapter by Jiang and Wang provides a survey on computational methods for multiple sequence alignment, which is a fundamental and challenging problem in computational molecular biology. Algorithms for multiple sequence alignment are routinely used to find conserved regions in biomolecular sequences, to construct family and superfamily representations of sequences, and to reveal evolutionary histories of species (or genes). The authors discuss some of the most popular mathematical models for multiple sequence alignment and efficient approximation algorithms for computing optimal multiple alignment under these models.
The main focus of the chapter is on recent advances in combinatorial (as opposed to stochastic) algorithms.

Kearney’s chapter illustrates the basic concepts in phylogenetics, the design and development of computational tools for evolutionary analyses, using the quartet method as an example. Quartet methods have recently received much attention in the research community. This chapter begins by examining the mathematical, computational, and biological foundations of the quartet method. A survey of the major contributions to the method reveals an excess of diverse and interesting concepts indicative of a ripening research topic. These contributions are examined critically with strengths, weakness, and open problems.

Sankoff and El-Mabrouk’s chapter describes the basic concepts of genome rearrangement and applications. Genome structure evolves through a number of nonlocal rearrangement processes that may involve an arbitrarily large proportion of a chromosome. The formal analysis of rearrangements differs greatly from DNA and protein comparison algorithms. In this chapter, the authors formalize the notion of a genome in terms of a set of chromosomes, each consisting of an ordered set of genes. The chapter surveys genomic distance problems, including the Hannenhalli-Pevzner theory for reversals and translocations, and covers the progress to date on phylogenetic extensions of rearrangement analysis. Recent work focuses on problems of gene and genome duplication and their implications for genomic distance and genome-based phylogeny.

The chapter by Li describes the author’s work on compressing DNA sequences and applications. The chapter concentrates on two programs the author has developed: a lossless compression algorithm, GenCompress, which achieves the best compression ratios for benchmark sequences; and an entropy estimation program, GTAC, which achieves the lowest entropy estimation for benchmark DNA sequences.
The author then discusses a new information-based distance measure between two sequences and shows how to use the compression programs as heuristics to realize such distance measures. Some experiments are described to demonstrate how such a theory can be used to compare genomes.

The third section covers computational methods for mining biological data and discovering patterns hidden in the data.

The chapter by Xu presents an overview of the major statistical techniques for quantitative trait analysis. Quantitative traits are defined as traits that have a con-
