Knowledge discovery in bioinformatics: techniques, methods, and applications PDF

404 Pages·2007·8.849 MB·English

Preview Knowledge discovery in bioinformatics: techniques, methods, and applications

KNOWLEDGE DISCOVERY IN BIOINFORMATICS
Techniques, Methods, and Applications

Edited by
XIAOHUA HU
Drexel University, Philadelphia, Pennsylvania
YI PAN
Georgia State University, Atlanta, Georgia

Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Wiley Bicentennial Logo: Richard J. Pacifico

Library of Congress Cataloging-in-Publication Data:
Knowledge discovery in bioinformatics : techniques, methods, and applications / edited by Xiaohua Hu, Yi Pan.
p. cm.
ISBN 978-0-471-77796-0
1. Bioinformatics. 2. Computational biology. I. Hu, Xiaohua (Xiaohua Tony) II. Pan, Yi, 1960–
[DNLM: 1. Computational Biology–methods. 2. Medical Informatics–methods. QU 26.5 K73 2007]
QH506.K5564 2007
570'.285–dc22 2006032495

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

CONTENTS

Contributors
Preface

1 Current Methods for Protein Secondary-Structure Prediction Based on Support Vector Machines
Hae-Jin Hu, Robert W. Harrison, Phang C. Tai, and Yi Pan
  1.1 Traditional Methods
    1.1.1 Statistical Approaches
    1.1.2 Machine Learning Approaches
  1.2 Support Vector Machine Method
    1.2.1 Introduction to SVM
    1.2.2 Encoding Profile
    1.2.3 Kernel Functions
    1.2.4 Tertiary Classifier Design
    1.2.5 Accuracy Measure of SVM
  1.3 Performance Comparison of SVM Methods
  1.4 Discussion and Conclusions
  References

2 Comparison of Seven Methods for Mining Hidden Links
Xiaohua Hu, Xiaodan Zhang, and Xiaohua Zhou
  2.1 Analysis of the Literature on Raynaud's Disease
  2.2 Related Work
  2.3 Methods
    2.3.1 Information Measures
    2.3.2 Ranking Methods
    2.3.3 Seven Methods
  2.4 Experiment Results and Analysis
    2.4.1 Data Set
    2.4.2 Chi-Square, Chi-Square Association Rule, and Mutual Information Link ABC Methods Compared
    2.4.3 Chi-Square ABC Method: Semantic Check for Mining Implicit Connections
    2.4.4 Chi-Square and Mutual Information Link ABC Methods
  2.5 Discussion and Conclusions
  Acknowledgments
  References

3 Voting Scheme–Based Evolutionary Kernel Machines for Drug Activity Comparisons
Bo Jin and Yan-Qing Zhang
  3.1 Granular Kernel and Kernel Tree Design
    3.1.1 Definitions
    3.1.2 Granular Kernel Properties
  3.2 GKTSESs
  3.3 Evolutionary Voting Kernel Machines
  3.4 Simulations
    3.4.1 Data Set and Experimental Setup
    3.4.2 Experimental Results and Comparisons
  3.5 Conclusions and Future Work
  Acknowledgments
  References

4 Bioinformatics Analyses of Arabidopsis thaliana Tiling Array Expression Data
Trupti Joshi, Jinrong Wan, Curtis J. Palm, Kara Juneau, Ron Davis, Audrey Southwick, Katrina M. Ramonell, Gary Stacey, and Dong Xu
  4.1 Tiling Array Design and Data Description
    4.1.1 Data
    4.1.2 Tiling Array Expression Patterns
    4.1.3 Tiling Array Data Analysis
  4.2 Ontology Analyses
  4.3 Antisense Regulation Identification
    4.3.1 Antisense Silencing
    4.3.2 Antisense Regulation Identification
  4.4 Correlated Expression Between Two DNA Strands
  4.5 Identification of Nonprotein Coding mRNA
  4.6 Summary
  Acknowledgments
  References

5 Identification of Marker Genes from High-Dimensional Microarray Data for Cancer Classification
Jiexun Li, Hua Su, and Hsinchun Chen
  5.1 Feature Selection
    5.1.1 Taxonomy of Feature Selection
    5.1.2 Evaluation Criterion
    5.1.3 Generation Procedure
  5.2 Gene Selection
    5.2.1 Individual Gene Ranking
    5.2.2 Gene Subset Selection
    5.2.3 Summary of Gene Selection
  5.3 Comparative Study of Gene Selection Methods
    5.3.1 Microarray Data Descriptions
    5.3.2 Gene Selection Approaches
    5.3.3 Experimental Results
  5.4 Conclusions and Discussion
  Acknowledgments
  References

6 Patient Survival Prediction from Gene Expression Data
Huiqing Liu, Limsoon Wong, and Ying Xu
  6.1 General Methods
    6.1.1 Kaplan–Meier Survival Analysis
    6.1.2 Cox Proportional-Hazards Regression
  6.2 Applications
    6.2.1 Diffuse Large-B-Cell Lymphoma
    6.2.2 Lung Adenocarcinoma
    6.2.3 Remarks
  6.3 Incorporating Data Mining Techniques to Survival Prediction
    6.3.1 Gene Selection by Statistical Properties
    6.3.2 Cancer Subtype Identification via Survival Information
  6.4 Selection of Extreme Patient Samples
    6.4.1 Short- and Long-Term Survivors
    6.4.2 SVM-Based Risk Scoring Function
    6.4.3 Results
  6.5 Summary and Concluding Remarks
  Acknowledgments
  References

7 RNA Interference and microRNA
Shibin Qiu and Terran Lane
  7.1 Mechanisms and Applications of RNA Interference
    7.1.1 Mechanism of RNA Interference
    7.1.2 Applications of RNAi
    7.1.3 RNAi Computational and Modeling Issues
  7.2 Specificity of RNA Interference
    7.2.1 Computational Representation of RNAi
    7.2.2 Definition of Off-Target Error Rates
    7.2.3 Feature Maps of Mismatch, Bulge, and Wobble
    7.2.4 Positional Effect
    7.2.5 Results for RNAi Specificity
    7.2.6 Silencing Multiple Genes
  7.3 Computational Methods for microRNAs
    7.3.1 Prediction of microRNA Genes
    7.3.2 Prediction of miRNA Targets
  7.4 siRNA Silencing Efficacy
    7.4.1 siRNA Design Rules
    7.4.2 Efficacy Prediction with Support Vector Regression
  7.5 Summary and Open Questions
    7.5.1 siRNA Efficacy and Target mRNA Secondary Structures
    7.5.2 Dynamics of Target mRNA and siRNA
    7.5.3 Integration of RNAi into Network Models
  Appendix: Glossary
  References

8 Protein Structure Prediction Using String Kernels
Huzefa Rangwala, Kevin DeRonne, and George Karypis
  8.1 Protein Structure: Granularities
    8.1.1 Secondary-Structure Prediction
    8.1.2 Protein Tertiary Structure
  8.2 Learning from Data
    8.2.1 Kernel Methods
  8.3 Structure Prediction: Capturing the Right Signals
  8.4 Secondary-Structure Prediction
    8.4.1 YASSPP Overview
    8.4.2 Input Sequence Coding
    8.4.3 Profile-Based Kernel Functions
    8.4.4 Performance Evaluation
  8.5 Remote Homology and Fold Prediction
    8.5.1 Profile-Based Kernel Functions
    8.5.2 Performance Evaluation
  8.6 Concluding Remarks
  References

9 Public Genomic Databases: Data Representation, Storage, and Access
Andrew Robinson, Wenny Rahayu, and David Taniar
  9.1 Data Representation
    9.1.1 FASTA Format
    9.1.2 Genbank Format
    9.1.3 Swiss-Prot Format
    9.1.4 XML Format
  9.2 Data Storage
    9.2.1 Multidatabase Repositories
  9.3 Data Access
    9.3.1 Single-Database Access Point
    9.3.2 Cross-Reference Databases
    9.3.3 Multiple-Database Access Points
    9.3.4 Tool-Based Interfaces
  9.4 Discussion
  9.5 Conclusions
  References

10 Automatic Query Expansion with Keyphrases and POS Phrase Categorization for Effective Biomedical Text Mining
Min Song and Il-Yeol Song
  10.1 Keyphrase Extraction-Based Pseudo-Relevance Feedback
    10.1.1 Keyphrase Extraction Procedures
    10.1.2 Keyphrase Ranking
    10.1.3 Query Translation into DNF
  10.2 Query Expansion with WordNet
  10.3 Experiments on Medline Data Sets
  10.4 Conclusions
  References

11 Evolutionary Dynamics of Protein–Protein Interactions
L. S. Swapna, B. Offmann, and N. Srinivasan
  11.1 Class I Glutamine Amidotransferase–Like Superfamily
    11.1.1 DJ-1/PfpI Family
    11.1.2 Comparison of Quaternary Structures of DJ-1 Family Members
  11.2 Drifts in Interfaces of Close Homologs
    11.2.1 Comparison of Quaternary Structures of Intracellular Protease and Hypothetical Protein YhbO
    11.2.2 Comparison of Quaternary Structures of Intracellular Protease and DJ-1
