SYSTEMS AND COMPUTATIONAL BIOLOGY – BIOINFORMATICS AND COMPUTATIONAL MODELING
Edited by Ning-Sun Yang

Published by InTech
Janeza Trdine 9, 51000 Rijeka, Croatia

Copyright © 2011 InTech
All chapters are Open Access articles distributed under the Creative Commons Non Commercial Share Alike Attribution 3.0 license, which permits users to copy, distribute, transmit, and adapt the work in any medium, so long as the original work is properly cited. After this work has been published by InTech, authors have the right to republish it, in whole or part, in any publication of which they are the author, and to make other personal use of the work. Any republication, referencing or personal use of the work must explicitly identify the original source.

Statements and opinions expressed in the chapters are those of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published articles. The publisher assumes no responsibility for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained in the book.

Publishing Process Manager Davor Vidic
Technical Editor Teodora Smiljanic
Cover Designer Jan Hyrat
Image Copyright Reincarnation, 2011. Used under license from Shutterstock.com

First published August 2011
Printed in Croatia

A free online edition of this book is available at www.intechopen.com
Additional hard copies can be obtained from [email protected]

Systems and Computational Biology – Bioinformatics and Computational Modeling, Edited by Ning-Sun Yang
p. cm.
ISBN 978-953-307-875-5

Free online editions of InTech Books and Journals can be found at www.intechopen.com

Contents

Preface

Part 1 Tools and Design for Bioinformatics Studies

Chapter 1 Parallel Processing of Complex Biomolecular Information: Combining Experimental and Computational Approaches
Jestin Jean-Luc and Lafaye Pierre

Chapter 2 Bioinformatics Applied to Proteomics
Simone Cristoni and Silvia Mazzuca

Chapter 3 Evolutionary Bioinformatics with a Scientific Computing Environment
James J. Cai

Part 2 Computational Design and Combinational Approaches for Systems Biology

Chapter 4 Strengths and Weaknesses of Selected Modeling Methods Used in Systems Biology
Pascal Kahlem, Alessandro DiCara, Maxime Durot, John M. Hancock, Edda Klipp, Vincent Schächter, Eran Segal, Ioannis Xenarios, Ewan Birney and Luis Mendoza

Chapter 5 Unifying Bioinformatics and Chemoinformatics for Drug Design
J.B. Brown and Yasushi Okuno

Chapter 6 Estimating Similarities in DNA Strings Using the Efficacious Rank Distance Approach
Liviu P. Dinu and Andrea Sgarro

Chapter 7 The Information Systems for DNA Barcode Data
Di Liu and Juncai Ma

Chapter 8 Parallel Processing of Multiple Pattern Matching Algorithms for Biological Sequences: Methods and Performance Results
Charalampos S. Kouzinopoulos, Panagiotis D. Michailidis and Konstantinos G. Margaritis

Part 3 Techniques for Analysis of Protein Families and Small Molecules

Chapter 9 Understanding Tools and Techniques in Protein Structure Prediction
Geraldine Sandana Mala John, Chellan Rose and Satoru Takeuchi

Chapter 10 Protein Progressive MSA Using 2-Opt Method
Gamil Abdel-Azim, Aboubekeur Hamdi-Cherif, Mohamed Ben Othman and Z.A. Aboeleneen

Chapter 11 Clustering Libraries of Compounds into Families: Asymmetry-Based Similarity Measure to Categorize Small Molecules
Wieczorek Samuel, Aci Samia, Bisson Gilles, Gordon Mirta, Lafanechère Laurence, Maréchal Eric and Roy Sylvaine

Chapter 12 Systematic and Phylogenetic Analysis of the Ole e 1 Pollen Protein Family Members in Plants
José Carlos Jiménez-López, María Isabel Rodríguez-García and Juan de Dios Alché

Chapter 13 Biological Data Modelling and Scripting in R
Srinivasan Ramachandran, Rupanjali Chaudhuri, Srikant Prasad Verma, Ab Rauf Shah, Chaitali Paul, Shreya Chakraborty, Bhanwar Lal Puniya and Rahul Shubhra Mandal

Chapter 14 Improving Bio-technology Processes Using Computational Techniques
Avinash Shankaranarayanan and Christine Amaldas

Chapter 15 Signal Processing Methods for Capillary Electrophoresis
Robert Stewart, Iftah Gideoni and Yonggang Zhu

Preface

Immediately after the first drafts of the human genome sequence were reported almost a decade ago, the importance of genomics and functional genomics studies became well recognized across the broad disciplines of biological sciences research. The initiatives of Leroy Hood and other pioneers in developing systems biology approaches for evaluating and addressing global and integrated biological activities, mechanisms, and network systems have motivated many of us, as bioscientists, to re-examine or revisit a whole spectrum of our previous experimental findings and observations in a much broader, link-seeking, cross-talk context. Soon thereafter, these lines of research generated interesting, fancy and sometimes misleading new names for the now well-accepted “omics” research areas, including functional genomics, (functional) proteomics, metabolomics, transcriptomics, glycomics, lipidomics, and cellomics.
It may be interesting to relate these “omics” approaches to one of the oldest “omics” studies, one with which we are all quite familiar: “economics”. Like economics, all of the “omics” fields seem intended to address mechanisms, activities, and constituents in a global, interconnected, and regulated manner. The advances in technological methodologies and assay systems for the various omics studies have been astonishing. They include next-generation DNA sequencing platforms, whole-transcriptome microarrays, microRNA arrays, various protein chips, polysaccharide or glycomics arrays, advanced LC-MS/MS, GC-MS/MS, MALDI-TOF, 2D-NMR, FT-IR, and other systems for proteome and metabolome research and for investigating related molecular signaling and networking bioactivities. Even more exciting and encouraging, many outstanding researchers previously trained as mathematicians or as information or computer scientists have courageously re-educated themselves and become a new generation of bioinformatics scientists. The collective achievements and breakthroughs of these colleagues have created a number of wonderful database systems that are now used routinely and extensively by young and “old” researchers alike. It is hard to miss the overwhelming excitement of this new era in systems biology and computational biology research. It is now estimated, with good supporting evidence from omics data, that there are approximately 25,000 genes in the human genome, about 45,000 proteins in the human proteome, and around 3000 species of primary metabolites and between 3000 and 6000 species of secondary metabolites in the human body fluid/tissue metabolome. These numbers, and their proportions relative to one another, are now helping us construct a more comprehensive and realistic view of human biological systems.
Likewise, though perhaps to a lesser extent, baseline omics databases are being built for mouse, fruit fly, Arabidopsis, yeast, and E. coli, the model systems for molecular, cellular, and systems biology studies; these efforts are expected to yield very interesting and important research findings in the coming years. Good findings in a new research area do not necessarily translate quickly into high-impact benefits for socio-economic needs, as many of us are now witnessing in omics research and development. To some of us, the new genes, novel protein functions, unique metabolite profiles or PCA clusters, and associated signaling systems revealed so far seem to have yielded less than we expected only some 5 to 10 years ago, in terms of new targets or strategies for drug and therapeutic development in the medical sciences or for crop improvement in agricultural science. Nonetheless, some useful new tools for diagnosis and personalized medicine have been developed as a result of genomics research. Recent reviews on this subject have helped us address such issues more realistically, yet still optimistically, as a socially responsible academic exercise. Thus, although some “microarray” or “bioinformatics” scientists among us may have been criticized for doing “cataloging research”, most of us believe that we are sincerely exploring new scientific and technological systems to benefit human health, human food and animal feed production, and environmental protection. Indeed, we are humbled by the complexity, extent, and beauty of cross-talk in biological systems; at the same time, we are becoming better educated and are beginning to address, honestly and skillfully, the important issues concerning translational medicine, global agriculture, and the environment.
I am very honored to serve as the editor of these two volumes on Systems and Computational Biology: (I) Molecular and Cellular Experimental Systems, and (II) Bioinformatics and Computational Modeling. I believe that we have collectively contributed a timely series of high-quality research and review articles to this emerging field of our scientific community. I sincerely hope that colleagues and readers worldwide will help us in similar future efforts by providing feedback on our book chapters in the form of critical comments, interdisciplinary ideas, and innovative suggestions, as a way of paying our high respect to the biological genomes on planet Earth.

Dr. Ning-Sun Yang
Agricultural Biotechnology Research Center, Academia Sinica
Taiwan, R.O.C.