
Bioinformatics Challenges at the Interface of Biology and Computer Science: Mind the Gap

424 Pages·2016·5.341 MB·English

Teresa K. Attwood, Stephen R. Pettifer and David Thorne
The University of Manchester, Manchester, UK

This edition first published 2016
© 2016 by John Wiley & Sons Ltd.

Registered office: John Wiley & Sons, Ltd., The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial offices:
9600 Garsington Road, Oxford, OX4 2DQ, UK
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
111 River Street, Hoboken, NJ 07030-5774, USA

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley-blackwell.

The right of the author to be identified as the author of this work has been asserted in accordance with the UK Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author(s) have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data
Names: Attwood, Teresa K., author. | Pettifer, Stephen R. (Stephen Robert), 1970- author. | Thorne, David, 1981-, author.
Title: Bioinformatics challenges at the interface of biology and computer science : mind the gap / Teresa K. Attwood, Stephen R. Pettifer and David Thorne.
Description: Oxford : John Wiley & Sons Ltd., 2016. | Includes bibliographical references and index.
Identifiers: LCCN 2016015332 | ISBN 9780470035504 (cloth) | ISBN 9780470035481 (pbk.)
Subjects: LCSH: Bioinformatics.
Classification: LCC QH324.2 .A87 2016 | DDC 570.285–dc23
LC record available at https://lccn.loc.gov/2016015332

A catalogue record for this book is available from the British Library.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Cover images: Courtesy of the authors, apart from the DNA strand image: Getty/doguhakan.
Set in 10/12pt Sabon LT Std by Aptara Inc., New Delhi, India

Table of Contents

Preface
Acknowledgements
About the companion website

PART 1

1 Introduction
  1.1 Overview
  1.2 Bioinformatics
    1.2.1 What is bioinformatics?
    1.2.2 The provenance of bioinformatics
    1.2.3 The seeds of bioinformatics
  1.3 Computer Science
    1.3.1 Origins of computer science
    1.3.2 Computer science meets bioinformatics
  1.4 What did we want to do with bioinformatics?
  1.5 Summary
  1.6 References
  1.7 Quiz
  1.8 Problems

2 The biological context
  2.1 Overview
  2.2 Biological data‐types and concepts
    2.2.1 Diversity of biological data‐types
    2.2.2 The central dogma
    2.2.3 Fundamental building‐blocks and alphabets
    2.2.4 The protein structure hierarchy
    2.2.5 RNA processing in prokaryotes and eukaryotes
    2.2.6 The genetic code
    2.2.7 Conceptual translation and gene finding
  2.3 Access to whole genomes
  2.4 Summary
  2.5 References
  2.6 Quiz
  2.7 Problems

3 Biological databases
  3.1 Overview
  3.2 What kinds of database are there?
  3.3 The Protein Data Bank (PDB)
  3.4 The EMBL nucleotide sequence data library
  3.5 GenBank
  3.6 The PIR‐PSD
  3.7 Swiss‐Prot
  3.8 PROSITE
  3.9 TrEMBL
  3.10 InterPro
  3.11 UniProt
  3.12 The European Nucleotide Archive (ENA)
  3.13 Summary
  3.14 References
  3.15 Quiz
  3.16 Problems

4 Biological sequence analysis
  4.1 Overview
  4.2 Adding meaning to raw sequence data
    4.2.1 Annotating raw sequence data
    4.2.2 Database and sequence formats
    4.2.3 Making tools and databases interoperate
  4.3 Tools for deriving sequence annotations
    4.3.1 Methods for comparing two sequences
    4.3.2 The PAM and BLOSUM matrices
    4.3.3 Tools for global and local alignment
    4.3.4 Tools for comparing multiple sequences
    4.3.5 Alignment‐based analysis methods
  4.4 Summary
  4.5 References
  4.6 Quiz
  4.7 Problems

5 The gap
  5.1 Overview
  5.2 Bioinformatics in the 21st century
  5.3 Problems with genes
  5.4 Problems with names
  5.5 Problems with sequences
  5.6 Problems with database entries
    5.6.1 Problems with database entry formats
  5.7 Problems with structures
  5.8 Problems with alignments
    5.8.1 Different methods, different results
    5.8.2 What properties do my sequences share?
    5.8.3 How similar are my sequences?
    5.8.4 How good is my alignment?
  5.9 Problems with families
  5.10 Problems with functions
  5.11 Functions of domains, modules and their parent proteins
  5.12 Defining and describing functions
  5.13 Summary
  5.14 References
  5.15 Quiz
  5.16 Problems

PART 2

6 Algorithms and complexity
  6.1 Overview
  6.2 Introduction to algorithms
    6.2.1 Mathematical computability
  6.3 Working with computers
    6.3.1 Discretisation of solutions
    6.3.2 When computers go bad
  6.4 Evaluating algorithms
    6.4.1 An example: a sorting algorithm
    6.4.2 Resource scarcity: complexity of algorithms
    6.4.3 Choices, choices
  6.5 Data structures
    6.5.1 Structural consequences
    6.5.2 Marrying form and function
  6.6 Implementing algorithms
    6.6.1 Programming paradigm
    6.6.2 Choice of language
    6.6.3 Mechanical optimisation
    6.6.4 Parallelisation
  6.7 Summary
  6.8 References
  6.9 Quiz
  6.10 Problems

7 Representation and meaning
  7.1 Overview
  7.2 Introduction
  7.3 Identification
    7.3.1 Namespaces
    7.3.2 Meaningless identifiers are a good thing
    7.3.3 Identifying things on the Web
    7.3.4 Cool URIs don’t change
    7.3.5 Versioning and provenance
    7.3.6 Case studies
  7.4 Representing data
    7.4.1 Design for change
    7.4.2 Contemporary data‐representation paradigms
  7.5 Giving meaning to data
    7.5.1 Bio ontologies in practice
    7.5.2 First invent the universe
  7.6 Web services
    7.6.1 The architecture of the Web
    7.6.2 Statelessness
  7.7 Action at a distance
    7.7.1 SOAP and WSDL
    7.7.2 HTTP as an API
    7.7.3 Linked Data
  7.8 Summary
  7.9 References
  7.10 Quiz
  7.11 Problems

8 Linking data and scientific literature
  8.1 Overview
  8.2 Introduction
  8.3 The lost steps of curators
  8.4 A historical perspective on scientific literature
  8.5 The gulf between human and machine comprehension
  8.6 Research objects
  8.7 Data publishing
  8.8 Separating scientific wheat from chaff – towards semantic searches
  8.9 Semantic publication
    8.9.1 Making articles ‘semantic’
  8.10 Linking articles with their cognate data
    8.10.1 What Utopia Documents does
    8.10.2 A case study
