Lecture Notes in Computer Science 4447
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Moshe Y. Vardi, Rice University, Houston, TX, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany

Elena Marchiori, Jason H. Moore, Jagath C. Rajapakse (Eds.)

Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
5th European Conference, EvoBIO 2007
Valencia, Spain, April 11-13, 2007
Proceedings

Volume Editors

Elena Marchiori
VU University of Amsterdam, IBIVU
Department of Computer Science
de Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
E-mail: [email protected]

Jason H. Moore
Dartmouth-Hitchcock Medical Center
Computational Genetics Laboratory
706 Rubin Building, HB7937, One Medical Center Dr., Lebanon, NH 03756, USA
E-mail: [email protected]

Jagath C. Rajapakse
Nanyang Technological University
School of Computer Engineering
Blk N4-2a05, 50 Nanyang Avenue, Singapore 639798
E-mail: [email protected]

Cover illustration: Morphogenesis series #12 by Jon McCormack, 2006
Library of Congress Control Number: 2007923724
CR Subject Classification (1998): D.1, F.1-2, J.3, I.5, I.2
LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues
ISSN 0302-9743
ISBN-10 3-540-71782-X Springer Berlin Heidelberg New York
ISBN-13 978-3-540-71782-9 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2007
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 12044597 06/3180 543210

Preface

The field of bioinformatics has two main objectives: the creation and maintenance of biological databases, and the discovery of knowledge from life sciences data in order to unravel the mysteries of biological function, leading to new drugs and therapies for human disease. Life sciences data come in the form of biological sequences, structures, pathways, or literature. One major aspect of discovering biological knowledge is to search, predict, or model specific patterns present in a given dataset and then to interpret those patterns. Computer science methods such as evolutionary computation, machine learning, and data mining all have a great deal to offer the field of bioinformatics. The goal of the Fifth European Conference on Evolutionary Computation, Machine Learning, and Data Mining in Bioinformatics (EvoBIO 2007) was to bring experts in computer science together with experts in bioinformatics and the biological sciences to explore new and novel methods for solving complex biological problems.

The fifth EvoBIO conference was held in Valencia, Spain during April 11-13, 2007 at the Universidad Politécnica de Valencia.
EvoBIO 2007 was held jointly with the Tenth European Conference on Genetic Programming (EuroGP 2007), the Seventh European Conference on Evolutionary Computation in Combinatorial Optimisation (EvoCOP 2007), and the EvoWorkshops. Collectively, the conferences and workshops are organized under the name Evo* (www.evostar.org).

EvoBIO, held annually as a workshop since 2003, became a conference in 2007 and it is now the premier European event for those interested in the interface between evolutionary computation, machine learning, data mining, bioinformatics, and computational biology. All papers in this book were presented at EvoBIO 2007 and responded to a call for papers that included topics of interest such as biomarker discovery, cell simulation and modeling, ecological modeling, fluxomics, gene networks, biotechnology, metabolomics, microarray analysis, phylogenetics, protein interactions, proteomics, sequence analysis and alignment, and systems biology. A total of 60 papers were submitted to the conference for double-blind peer-review. Of those, 28 (46.7%) were accepted.

We would first and foremost like to thank all authors who spent time and effort to make important contributions to this book. We would like to thank the members of the Program Committee for their expert evaluation of the submitted papers. Moreover, we would like to thank Jennifer Willies for her tremendous administrative help and coordination, Anna Isabel Esparcia-Alcázar for serving as the Local Chair, Leonardo Vanneschi for serving as Evo* Publicity Chair, and Marc Schoenauer and the MyReview team (http://myreview.lri.fr/) for the conference management system.

We would also like to acknowledge the following organizations. The Universidad Politécnica de Valencia, Spain for their institutional and financial support, and for providing premises and administrative assistance; the Instituto
Tecnológico de Informática in Valencia, for cooperation and help with local arrangements; the Spanish Ministerio de Educación y Ciencia, for their financial support; and the Centre for Emergent Computing at Napier University in Edinburgh, Scotland for administrative support and event coordination.

Finally, we hope that you will consider contributing to EvoBIO 2008.

February 2007
Elena Marchiori
Jason H. Moore
Jagath C. Rajapakse

Organization

EvoBIO 2007 was organized by Evo* (www.evostar.org).

Program Chairs
Elena Marchiori (IBIVU, VU University Amsterdam, The Netherlands)
Jason H. Moore (Dartmouth Medical School in Lebanon, NH, USA)
Jagath C. Rajapakse (Nanyang Technological University, Singapore)

General Chairs
David W. Corne (Heriot-Watt University, Edinburgh, UK)
Elena Marchiori (IBIVU, VU University Amsterdam, The Netherlands)

Steering Committee
David W. Corne (Heriot-Watt University, Edinburgh, UK)
Elena Marchiori (IBIVU, VU University Amsterdam, The Netherlands)
Carlos Cotta (University of Malaga, Spain)
Jason H. Moore (Dartmouth Medical School in Lebanon, NH, USA)
Jagath C. Rajapakse (Nanyang Technological University, Singapore)

Program Committee
Jesus S. Aguilar-Ruiz (Spain), Elena Marchiori (The Netherlands), Francisco J. Azuaje (UK), Andrew Martin (UK), Wolfgang Banzhaf (Canada), Jason Moore (USA), Jacek Blazewicz (Poland), Pablo Moscato (Australia), Marius Codrea (The Netherlands), Jagath Rajapakse (Singapore), Dave Corne (UK), Menaka Rajapakse (Singapore), Carlos Cotta (Spain), Michael Raymer (USA), Alex Freitas (UK), Vic J.
Rayward-Smith (UK), Gary Fogel (USA), Jem Rowland (UK), James Foster (USA), Marylyn Ritchie (USA), Rosalba Giugno (Italy), Ugur Sezerman (Turkey), Raul Giraldez (Spain), El-Ghazali Talbi (France), Jin-Kao Hao (France), Andrea Tettamanzi (Italy), Antoine van Kampen (The Netherlands), Janet Wiles (Australia), Andreas Zell (Germany), Natalio Krasnogor (UK), Eckart Zitzler (Switzerland), Ying Liu (USA)

Table of Contents

Identifying Regulatory Sites Using Neighborhood Species
Claudia Angelini, Luisa Cutillo, Italia De Feis, Richard van der Wath, and Pietro Lio’

Genetic Programming and Other Machine Learning Approaches to Predict Median Oral Lethal Dose (LD50) and Plasma Protein Binding Levels (%PPB) of Drugs
Francesco Archetti, Stefano Lanzeni, Enza Messina, and Leonardo Vanneschi

Hypothesis Testing with Classifier Systems for Rule-Based Risk Prediction
Flavio Baronti and Antonina Starita

Robust Peak Detection and Alignment of nanoLC-FT Mass Spectrometry Data
Marius C. Codrea, Connie R. Jiménez, Sander Piersma, Jaap Heringa, and Elena Marchiori

One-Versus-One and One-Versus-All Multiclass SVM-RFE for Gene Selection in Cancer Classification
Kai-Bo Duan, Jagath C. Rajapakse, and Minh N. Nguyen

Understanding Signal Sequences with Machine Learning
Jean-Luc Falcone, Renée Kreuter, Dominique Belin, and Bastien Chopard

Targeting Differentially Co-regulated Genes by Multiobjective and Multimodal Optimization
Oscar Harari, Cristina Rubio-Escudero, and Igor Zwir

Modeling Genetic Networks: Comparison of Static and Dynamic Models
Cristina Rubio-Escudero, Oscar Harari, Oscar Cordón, and Igor Zwir

A Genetic Embedded Approach for Gene Selection and Classification of Microarray Data
Jose Crispin Hernandez Hernandez, Béatrice Duval, and Jin-Kao Hao

Modeling the Shoot Apical Meristem in A. thaliana: Parameter Estimation for Spatial Pattern Formation
Tim Hohm and Eckart Zitzler

Evolutionary Search for Improved Path Diagrams
Kim Laurio, Thomas Svensson, Mats Jirstrand, Patric Nilsson, Jonas Gamalielsson, and Björn Olsson

Simplifying Amino Acid Alphabets Using a Genetic Algorithm and Sequence Alignment
Jacek Lenckowski and Krzysztof Walczak

Towards Evolutionary Network Reconstruction Tools for Systems Biology
Thorsten Lenser, Thomas Hinze, Bashar Ibrahim, and Peter Dittrich

A Gaussian Evolutionary Method for Predicting Protein-Protein Interaction Sites
Kang-Ping Liu and Jinn-Moon Yang

Bio-mimetic Evolutionary Reverse Engineering of Genetic Regulatory Networks
Daniel Marbach, Claudio Mattiussi, and Dario Floreano

Tuning ReliefF for Genome-Wide Genetic Analysis
Jason H. Moore and Bill C. White

Dinucleotide Step Parameterization of Pre-miRNAs Using Multi-objective Evolutionary Algorithms
Jin-Wu Nam, In-Hee Lee, Kyu-Baek Hwang, Seong-Bae Park, and Byoung-Tak Zhang

Amino Acid Features for Prediction of Protein-Protein Interface Residues with Support Vector Machines
Minh N. Nguyen, Jagath C.
Rajapakse, and Kai-Bo Duan

Predicting HIV Protease-Cleavable Peptides by Discrete Support Vector Machines
Carlotta Orsenigo and Carlo Vercellis

Inverse Protein Folding on 2D Off-Lattice Model: Initial Results and Perspectives
David Pelta and Alberto Carrascal

Virtual Error: A New Measure for Evolutionary Biclustering
Beatriz Pontes, Federico Divina, Raúl Giráldez, and Jesús S. Aguilar-Ruiz

Characterising DNA/RNA Signals with Crisp Hypermotifs: A Case Study on Core Promoters
Carey Pridgeon and David Corne

Evaluating Evolutionary Algorithms and Differential Evolution for the Online Optimization of Fermentation Processes
Miguel Rocha, José P. Pinto, Isabel Rocha, and Eugénio C. Ferreira

The Role of a Priori Information in the Minimization of Contact Potentials by Means of Estimation of Distribution Algorithms
Roberto Santana, Pedro Larrañaga, and Jose A. Lozano

Classification of Cell Fates with Support Vector Machine Learning
Ofer M. Shir, Vered Raz, Roeland W. Dirks, and Thomas Bäck

Reconstructing Linear Gene Regulatory Networks
Jochen Supper, Christian Spieth, and Andreas Zell

Individual-Based Modeling of Bacterial Foraging with Quorum Sensing in a Time-Varying Environment
W.J. Tang, Q.H. Wu, and J.R. Saunders

Substitution Matrix Optimisation for Peptide Classification
David C. Trudgian and Zheng Rong Yang

Author Index
Identifying Regulatory Sites Using Neighborhood Species

Claudia Angelini¹, Luisa Cutillo¹, Italia De Feis¹, Richard van der Wath², and Pietro Lio’²,*

¹ Istituto per le Applicazioni del Calcolo "Mauro Picone", CNR, Napoli, Italy
[email protected], [email protected], [email protected]
² Computer Laboratory, University of Cambridge, Cambridge, UK
[email protected], [email protected]

Abstract. The annotation of transcription factor binding sites in newly sequenced genomes is an important and challenging problem. We have previously shown how a regression model that linearly relates gene expression levels to the matching scores of nucleotide patterns allows us to identify DNA-binding sites from a collection of co-regulated genes and their nearby non-coding DNA sequences. Our methodology uses Bayesian models and stochastic search techniques to select transcription factor binding site candidates. Here we show that this methodology allows us to identify binding sites in nearby species. We present examples of annotation crossing from Schizosaccharomyces pombe to Schizosaccharomyces japonicus. We found that the eng1 motif also regulates a set of 9 genes in S. japonicus. Our framework may be of practical use in conveying information during the annotation of a new species. Finally, we discuss a number of statistical and biological issues related to the identification of binding sites through covariates of gene expression and sequences.

1 Introduction

The identification of the repertoire of regulatory elements in a genome is one of the major challenges in modern biology. Gene transcription is determined by the interaction between transcription factors and their binding sites, called motifs or cis-regulatory elements. In eukaryotes the regulation of gene expression is highly complex and often occurs through the coordinated action of multiple transcription factors.
This combinatorial regulation has several advantages; it controls gene expression in response to a variety of signals from the environment and allows the use of a limited number of transcription factors to create many combinations of regulators. Identification of the regulatory elements is necessary for understanding mechanisms of cellular processes. In eukaryotes these sites comprise short DNA stretches often found within non-coding upstream regions. DNA microarrays provide a simple and natural vehicle for exploring the regulation of thousands of genes and their interactions. Genes with similar expression

* Corresponding author.
E. Marchiori, J.H. Moore, and J.C. Rajapakse (Eds.): EvoBIO 2007, LNCS 4447, pp. 1–10, 2007.
© Springer-Verlag Berlin Heidelberg 2007
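
The abstract describes a regression model that linearly relates gene expression levels to the matching scores of nucleotide patterns. The toy sketch below illustrates only that linear relation and is not the authors' method: it swaps the Bayesian models and stochastic search for a plain ordinary least squares fit with a single covariate, and it assumes that a matching score is simply the number of exact occurrences of a pattern in an upstream sequence. The function names, the sequences, and the expression values are all hypothetical and made up for illustration.

```python
# Toy sketch: relate expression to motif matching scores with a straight line.
# All names, sequences, and expression values here are hypothetical.


def count_matches(sequence, motif):
    """Count exact, possibly overlapping, occurrences of motif in sequence."""
    width = len(motif)
    return sum(
        1
        for start in range(len(sequence) - width + 1)
        if sequence[start:start + width] == motif
    )


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Assumes the values in x are not all identical (otherwise sxx is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up upstream sequences and made-up expression levels, illustration only.
upstream = ["ACGTACGTAA", "TTTTACGTTT", "GGGGGGGGGG", "ACGTACGTACGT"]
expression = [2.1, 1.0, 0.1, 3.2]

# One matching score per gene for the candidate pattern "ACGT".
scores = [count_matches(seq, "ACGT") for seq in upstream]
intercept, slope = fit_line(scores, expression)
print(scores, round(intercept, 2), round(slope, 2))
```

In this simplified picture, a clearly nonzero slope would suggest that the pattern is associated with expression. Selecting among many candidate patterns at once would require a variable-selection step that this sketch leaves out.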