Advances in Biochemical Engineering/Biotechnology 160
Series Editor: T. Scheper

Intawat Nookaew Editor
Network Biology

Series editor
T. Scheper, Hannover, Germany

Editorial Board
S. Belkin, Jerusalem, Israel
T. Bley, Dresden, Germany
J. Bohlmann, Vancouver, Canada
M.B. Gu, Seoul, Republic of Korea
W.-S. Hu, Minneapolis, MN, USA
B. Mattiasson, Lund, Sweden
J. Nielsen, Gothenburg, Sweden
H. Seitz, Potsdam, Germany
R. Ulber, Kaiserslautern, Germany
A.-P. Zeng, Hamburg, Germany
J.-J. Zhong, Shanghai, China
W. Zhou, Shanghai, China

Aims and Scope

This book series reviews current trends in modern biotechnology and biochemical engineering. Its aim is to cover all aspects of these interdisciplinary disciplines, where knowledge, methods and expertise are required from chemistry, biochemistry, microbiology, molecular biology, chemical engineering and computer science.

Volumes are organized topically and provide a comprehensive discussion of developments in the field over the past 3–5 years. The series also discusses new discoveries and applications. Special volumes are dedicated to selected topics which focus on new biotechnological products and new processes for their synthesis and purification.

In general, volumes are edited by well-known guest editors. The series editor and publisher will, however, always be pleased to receive suggestions and supplementary information. Manuscripts are accepted in English.

In references, Advances in Biochemical Engineering/Biotechnology is abbreviated as Adv. Biochem. Engin./Biotechnol. and cited as a journal.

More information about this series at http://www.springer.com/series/10

With contributions by
P. Ajawatanawong · Y. Akiyama · S. Dahal · G. Hu · D. Jacobson · S. Kalapanulak · A. Klanchui · K. Kusonmano · Y. Li · Y. Li · Y. Matsuzaki · A. Meechai · M. Ohue · G. Pavesi · S. Poudel · P. Prommeenate · N. Raethong · T. Saithong · C. Thammarongtham · R.A. Thompson · N. Uchikoga · W. Vongsangnak · D.A. Weighill · F. Xiao

Editor
Intawat Nookaew
Department of Biomedical Informatics
College of Medicine
University of Arkansas for Medical Science
Little Rock, Arkansas, USA

Computational Biomolecular Modeling and Bioinformatics Group
Computer Science and Mathematics Division
Oak Ridge National Laboratory
Oak Ridge, Tennessee, USA

ISSN 0724-6145
ISSN 1616-8542 (electronic)
Advances in Biochemical Engineering/Biotechnology
ISBN 978-3-319-56459-3
ISBN 978-3-319-56460-9 (eBook)
DOI 10.1007/978-3-319-56460-9
Library of Congress Control Number: 2017938643

© Springer International Publishing AG 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Biological systems are extremely complex and contain millions of molecules within the system. The rapid development of high throughput technologies enables us to capture the molecular interplays of molecules in the system as so-called ‘OMICS’ data. This leads to the need for systematic cataloguing and organization of the enormous amount of data generated and shared within the scientific community. Linking these molecules and evaluating their interactions following “Network Biology” approaches enable the insightful understanding of cellular functions from the emerging properties of the network. This special volume focuses on the state of the art, current status, and applications of Network Biology.

The volume covers broad topics on network biology such as gene networks, transcription networks, regulatory networks, protein–protein interaction networks, metabolic networks, and phylogenetic networks. I am very grateful to the authors who have contributed to this special volume by sharing their experience and expertise in the different chapters. These diverse topics should be very useful for readers to gain an overview of Network Biology.

Oak Ridge, TN, USA
Intawat Nookaew

Contents

ChIP-Seq Data Analysis to Define Transcriptional Regulatory Networks
Giulio Pavesi

Gene Expression Analysis Through Network Biology: Bioinformatics Approaches
Kanthida Kusonmano

Rigid-Docking Approaches to Explore Protein–Protein Interaction Space
Yuri Matsuzaki, Nobuyuki Uchikoga, Masahito Ohue, and Yutaka Akiyama

Protein–Protein Interface and Disease: Perspective from Biomolecular Networks
Guang Hu, Fei Xiao, Yuqian Li, Yuan Li, and Wanwipa Vongsangnak

Cyanobacterial Biofuels: Strategies and Developments on Network and Modeling
Amornpan Klanchui, Nachon Raethong, Peerada Prommeenate, Wanwipa Vongsangnak, and Asawin Meechai

Genome-Scale Modeling of Thermophilic Microorganisms
Sanjeev Dahal, Suresh Poudel, and R. Adam Thompson

Networking Omic Data to Envisage Systems Biological Regulation
Saowalak Kalapanulak, Treenut Saithong, and Chinae Thammarongtham

Network Metamodeling: Effect of Correlation Metric Choice on Phylogenomic and Transcriptomic Network Topology
Deborah A. Weighill and Daniel Jacobson

Molecular Phylogenetics: Concepts for a Newcomer
Pravech Ajawatanawong

Index

Adv Biochem Eng Biotechnol (2017) 160: 1–14
DOI: 10.1007/10_2016_43
© Springer International Publishing Switzerland 2016
Published online: 10 January 2017

ChIP-Seq Data Analysis to Define Transcriptional Regulatory Networks

Giulio Pavesi

Abstract The first step in the definition of transcriptional regulatory networks is to establish correct relationships between transcription factors (TFs) and their target genes, together with the effect of their regulatory activity (activator or repressor). Fundamental advances in this direction have been made possible by the introduction of experimental techniques such as Chromatin Immunoprecipitation, which, coupled with next-generation sequencing technologies (ChIP-Seq), permit the genome-wide identification of TF binding sites.
This chapter provides a survey on how data of this kind are to be processed and integrated with expression and other types of data to infer transcriptional regulatory rules and codes.

Keywords ChIP-Seq, RNA-Seq, Transcription factors, Transcription regulation

Contents
1 Introduction: Chromatin Immunoprecipitation and Next-Generation Sequencing
2 Finding Transcription Factor Binding Sites
3 Associating Binding Sites with Target Genes
4 Assessing TF Activity from Expression Data
5 Mining Available Data
6 Conclusions
References

G. Pavesi (*)
Department of Biosciences, University of Milan, Via Celoria 26, 20133 Milan, Italy
e-mail: [email protected]

1 Introduction: Chromatin Immunoprecipitation and Next-Generation Sequencing

The introduction of next-generation sequencing (NGS) technologies has opened up new avenues for every type of genetic and genomic research [1, 2]. One of the fields in which the impact of NGS has been most relevant is perhaps the study of gene regulation at the transcriptional level, and the subsequent analysis steps such as the construction of regulatory networks.

It is essential for the definition of transcription regulatory networks to establish correct relationships between regulators such as transcription factors (TFs) and the genes they regulate [3], together with the effect of the activity of the TFs (activator or repressor) [4]. A fundamental step forward in this direction has been made possible by lab techniques enabling the large-scale identification of TF-DNA binding sites on the genome, with experiments simply impossible to perform just a few years ago.
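As a rough illustration of what such a network has to record, the sketch below stores TF-to-gene relationships as a directed graph whose edges carry the effect of the TF (activator or repressor). It is only a minimal sketch: the class name `RegulatoryNetwork`, its methods, and the identifiers `TF_A`, `TF_B`, `gene1` and `gene2` are hypothetical and are not taken from the chapter or from any particular library.

```python
from collections import defaultdict

ACTIVATOR = +1   # the TF increases transcription of the target gene
REPRESSOR = -1   # the TF decreases transcription of the target gene


class RegulatoryNetwork:
    """Directed, signed graph: each edge goes from a TF to a gene it regulates."""

    def __init__(self):
        self._edges = defaultdict(dict)  # tf -> {gene: effect}

    def add_regulation(self, tf, gene, effect):
        """Record that `tf` regulates `gene` with the given effect."""
        if effect not in (ACTIVATOR, REPRESSOR):
            raise ValueError("effect must be ACTIVATOR or REPRESSOR")
        self._edges[tf][gene] = effect

    def targets(self, tf, effect=None):
        """Sorted genes regulated by `tf`, optionally restricted to one effect."""
        edges = self._edges.get(tf, {})
        return sorted(g for g, e in edges.items() if effect is None or e == effect)

    def regulators(self, gene):
        """Map each TF that regulates `gene` to its effect on that gene."""
        return {tf: edges[gene] for tf, edges in self._edges.items() if gene in edges}


# Hypothetical example relationships (placeholders, not real biology).
network = RegulatoryNetwork()
network.add_regulation("TF_A", "gene1", ACTIVATOR)
network.add_regulation("TF_A", "gene2", REPRESSOR)
network.add_regulation("TF_B", "gene2", ACTIVATOR)

print(network.targets("TF_A"))             # all targets of TF_A
print(network.targets("TF_A", REPRESSOR))  # only the repressed targets
print(network.regulators("gene2"))         # every TF acting on gene2
```

Keeping the effect on the edge rather than on the TF allows the same factor to activate one gene and repress another, which is the situation the text describes when it asks for both the relationship and its effect.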
Chromatin is a complex of DNA and proteins that forms chromosomes within the nucleus of eukaryotic cells. Chromatin Immunoprecipitation (ChIP) [5] is a technique enabling the extraction from the cell nucleus of a specific protein-DNA chromatin complex, including DNA binding proteins such as TFs. The different steps of a ChIP experiment are summarized in Fig. 1. First of all, the DNA-bound proteins are cross-linked, that is, fixed to the DNA. The cross-linked chromatin is usually sheared by sonication, providing fragments of 300–1,000 base pairs (bps) in length. Then a specific antibody that recognizes only the protein (TF) of interest is employed, and the antibody, bound to the TF which in turn is bound to the DNA, permits the selective extraction and isolation of the chromatin complex. At this point, DNA is released from the TF by reverse-crosslinking and purified, and the result is a DNA sample enriched in regions corresponding to the genomic locations of the sites that were bound in vivo by the TF (or, in general, the DNA-binding protein) studied. The experiment is performed on thousands of cells at the same time so as to have a quantity of DNA suitable for further analysis and to have enough “enrichment” in the sample, that is, enough copies of each of the DNA regions bound by the TF, to discriminate them from experimental noise.

The next phase is quite logically the identification of the DNA regions themselves, and of their corresponding location in the genome. The introduction of “tiling arrays” had permitted for the first time the analysis of the DNA extracted on a whole-genome scale (ChIP on Chip [4, 6]) by using probes designed to cover the sequence of a whole genome, or a subset of genomic regions of interest (such as with promoter arrays). The introduction of NGS technologies has enabled this type of experiment to move one step further by providing at reasonable cost perhaps the simplest solution: to identify the DNA extracted from the cell by immunoprecipitation, sequence the DNA itself (ChIP Sequencing, or ChIP-Seq [5, 7]).
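The notion of “enrichment” against experimental noise can be made concrete with a toy calculation. The sketch below is not the method of any specific tool: it assumes that fragment start positions on a single chromosome are already known, and it assumes the availability of a control sample (an assumption introduced here for illustration, not stated in the text above). The function names, the bin size, the fold threshold, the pseudocount and all positions are hypothetical.

```python
from collections import Counter


def bin_counts(fragment_starts, bin_size):
    """Count how many fragments start in each fixed-size genomic bin."""
    return Counter(start // bin_size for start in fragment_starts)


def enriched_bins(chip_starts, control_starts, bin_size=500,
                  min_fold=4.0, pseudocount=1.0):
    """Return sorted bin indices whose ChIP/control ratio reaches min_fold.

    The pseudocount avoids division by zero in bins without control
    fragments; it is an illustrative choice, not a recommended value.
    """
    chip = bin_counts(chip_starts, bin_size)
    control = bin_counts(control_starts, bin_size)
    hits = []
    for bin_index, n_chip in chip.items():
        fold = (n_chip + pseudocount) / (control.get(bin_index, 0) + pseudocount)
        if fold >= min_fold:
            hits.append(bin_index)
    return sorted(hits)


# Toy fragment start positions (in bps) on one hypothetical chromosome.
chip_starts = [120, 300, 1010, 1050, 1100, 1150, 1200, 1300, 1420, 2600, 4100]
control_starts = [90, 1105, 2500, 3900]

print(enriched_bins(chip_starts, control_starts))  # → [2]
```

With these toy numbers only the bin covering positions 1,000–1,499 accumulates enough fragments relative to the control to pass the threshold; the scattered fragments elsewhere behave like the experimental noise mentioned above.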
Without delving into technical details, given a double-stranded DNA fragment derived as just described, sequencing determines the nucleotide sequence on either strand, moving from the 5′ to 3′ direction, or both strands simultaneously (paired-