
Bioinformatics: Principles and Applications

229 Pages·2005·22.242 MB·English


Principles and Applications

Author’s Profile

Harshawardhan P. Bal is currently a management and strategy consultant at Booz Allen Hamilton Inc., Rockville, MD. He has more than a decade of experience in biotechnology and bioinformatics, both in academia and in industry. Through his research, he has made direct contributions to the annotation of the important food crop rice, and to mining the human genome to identify novel target proteins for new drug development.

At Millennium Pharmaceuticals Inc., Cambridge, MA, Harshawardhan acquired considerable experience in the design and deployment of enterprise-wide knowledge management systems in the pharma industry.

Harshawardhan has a Master’s degree in pharmaceutical sciences and was a formulation scientist in the pharma industry in Mumbai. He has a PhD in molecular biology from the National Institute of Immunology, New Delhi. He pursued research on HIV/AIDS and gene therapy at the University of Rochester Medical Center, Rochester, NY, and moved on to Cold Spring Harbor Laboratory, Cold Spring Harbor, NY. At Cold Spring Harbor, he worked on whole genome sequencing projects and received training from experts such as Prof. W. Richard McCombie, Dr. Andy Baxevanis, Dr. William Pearson, Dr. Randall Smith, and Dr. Stephen Altschul.

Harshawardhan is the author of several peer-reviewed publications in scientific journals and a book entitled Perl Programming for Bioinformatics.

Principles and Applications
Harshawardhan P. Bal
Management and Strategy Consultant
Booz Allen Hamilton Inc., Rockville, MD

Tata McGraw-Hill Publishing Company Limited
NEW DELHI

McGraw-Hill Offices
New Delhi, New York, St Louis, San Francisco, Auckland, Bogotá, Caracas, Kuala Lumpur, Lisbon, London, Madrid, Mexico City, Milan, Montreal, San Juan, Santiago, Singapore, Sydney, Tokyo, Toronto

Copyright © 2005, by Tata McGraw-Hill Publishing Company Limited.
No part of this publication may be reproduced or distributed in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, or stored in a database or retrieval system without the prior written permission of the publishers. The program listings (if any) may be entered, stored and executed in a computer system, but they may not be reproduced for publication.

This edition can be exported from India only by the publishers, Tata McGraw-Hill Publishing Company Limited.

ISBN 0-07-058320-X

Published by the Tata McGraw-Hill Publishing Company Limited, 7 West Patel Nagar, New Delhi 110 008, typeset in PalmSprings at The Composers, 260, C.A. Apt., Pashchim Vihar, New Delhi 110 063 and printed at SDR Printers, A-28, West Jyoti Nagar, Shahdara, Loni Road, Delhi 110 094

To my parents, wife, and son

Preface

Modern science has been transformed in recent times. Our thinking, our methods of analysis, our tools, our experimental systems, and certainly our ability to probe living systems have changed fundamentally in ways that we never imagined. Bioinformatics is one field of science that has admirably demonstrated what integration and knowledge sharing across different disciplines can achieve in advancing our understanding of complex living systems.

This book is about the fundamental tools and devices that spearheaded rapid changes, revolutionized biomedical research, and enabled us to perform biology in silico. Today, as a result of these tools (and despite their limitations), discovering novel coding regions, genes, and gene products in a haystack of unknown sequences, or searching for remote homologies between sequences, are routine tasks that biologists with little or no background in computer science can perform effortlessly at the click of a button. The volume of data, and its unstructured or heterogeneous nature, are no longer bottlenecks to scientific research.
Instead, scientists can now focus on the more important and fundamental questions of the molecular basis of disease, and on finding new cures for hitherto untreatable ailments.

Part I of the book focuses on a core set of tools that have become indispensable to scientific discovery. Part II focuses on how these tools can be integrated programmatically with BioPerl modules, so that they can be used in an enhanced ("bioinformatics on steroids") manner.

The first book in the series, Perl Programming for Bioinformatics, introduced Unix and Perl programming for bioinformatics analysis. The intent of this book is to supplement it with knowledge of bioinformatics tools and BioPerl. Both books have been written with a grassroots approach based on real-life experience from high-throughput genome sequencing centers and the pharma industry. It is hoped that the two books will ease the transition a biologist needs to make into the intriguing and fast-paced world of bioinformatics.

Thank you and happy reading!

HARSHAWARDHAN P. BAL

Acknowledgements

My first words of appreciation are for those clairvoyant thought leaders who brought together modern biology, medicine, mathematics, and information technology, and laid the foundations for the new sciences of genomics and bioinformatics.

My transition from molecular biology to bioinformatics was an exciting and intellectually rewarding experience and, indeed, gave me new ways to apply my basic research skills to understanding genome research, complex disease pathology, and drug discovery. I would like to thank the many teachers who made this possible. Among them are my mentors, Neilay Dedhia and W. Richard McCombie, and my colleagues at the Lita Annenberg Hazen Genome Sequencing Center at Cold Spring Harbor Laboratory, New York, who helped me make this transition.
I also thank my mentor Brian Osborne at OSI Pharmaceuticals, Melville, New York, who first gave me the opportunity to apply my combined knowledge of molecular biology and bioinformatics to new target and drug discovery. No experience in bioinformatics can be complete without an understanding of modern software design and development techniques, and I thank my supervisor, Jeffrey Moore, for providing this at Millennium Pharmaceuticals, Inc., Cambridge, MA. It was also at Millennium that I applied knowledge management to the large-scale integration of heterogeneous data sets emanating from diverse sources in a typical pharma environment, such as high-throughput genome sequencing, genome annotation, target validation, transcriptional profiling, pathway analysis, and proteomics.

I also want to thank Wayne Marasco at the Division of Cancer Immunology and AIDS at the Dana-Farber Cancer Institute, an NCI-designated Comprehensive Cancer Center and Harvard Medical School teaching affiliate, Boston, Massachusetts, for enabling me to come full circle and lead a full-scale development effort in discovery research on Adult T-cell Leukemia and HIV/AIDS.

Finally, I would like to thank the readers of my first book for encouraging me with their enthusiasm and their faith in me. I hope this second book proves as enjoyable and useful as the first. Of course, nothing would have been possible without the dedicated efforts of the Tata McGraw-Hill team, who guided me through the entire publication process and kept me motivated to keep turning the pages until the book was complete.

HARSHAWARDHAN P. BAL

Contents

Preface
Acknowledgements

PART ONE: PRINCIPLES

1. Web-based Sequence Analysis: BLAST I
1.1 Basic Local Alignment Search Tool (BLAST)
1.2 The Purpose of BLAST
1.3 Terminology
1.4 BLAST Analysis
1.5 BLAST 2
1.6 Automated Alignments with Perl
References

2. Web-based Sequence Analysis: BLAST II
2.1 Basic Local Alignment Search Tool (BLAST)
2.2 Scoring Matrices
2.3 PAM or Per cent Accepted Mutation Matrices
2.4 BLOSUM (Blocks Substitution Matrices)
2.5 The Relationship between BLOSUM and PAM Substitution Matrices
2.6 Working of the BLAST Algorithm
2.7 A Practical BLASTN Exercise
2.8 Explanation of the BLAST Output
2.9 Advanced BLASTN
2.10 Biological Analysis of BLASTN: Cystic Fibrosis
2.11 Automating BLAST Analyses with Perl
