MANAGING AND MINING GRAPH DATA MANAGING AND MINING GRAPH DATA Editedby CHARUC.AGGARWAL IBMT.J.WatsonResearchCenter,YorktownHeights,NY10598,USA HAIXUNWANG MicrosoftResearchAsia,Beijing,China100190 KluwerAcademicPublishers Boston/Dordrecht/London Contents ListofFigures xv ListofTables xxi Preface xxiii 1 AnIntroductiontoGraphData 1 CharuC.AggarwalandHaixunWang 1. Introduction 1 2. GraphManagementandMiningApplications 3 3. Summary 8 References 9 2 GraphDataManagementandMining:ASurveyofAlgorithmsandApplications 13 CharuC.AggarwalandHaixunWang 1. Introduction 13 2. GraphDataManagementAlgorithms 16 2.1 IndexingandQueryProcessingTechniques 16 2.2 ReachabilityQueries 19 2.3 GraphMatching 21 2.4 KeywordSearch 24 2.5 SynopsisConstructionofMassiveGraphs 27 3. GraphMiningAlgorithms 29 3.1 PatternMininginGraphs 29 3.2 ClusteringAlgorithmsforGraphData 32 3.3 ClassificationAlgorithmsforGraphData 37 3.4 TheDynamicsofTime-EvolvingGraphs 40 4. GraphApplications 43 4.1 ChemicalandBiologicalApplications 43 4.2 WebApplications 45 4.3 SoftwareBugLocalization 51 5. ConclusionsandFutureResearch 55 References 55 3 GraphMining: LawsandGenerators 69 DeepayanChakrabarti,ChristosFaloutsosandMaryMcGlohon 1. Introduction 70 2. GraphPatterns 71 vi MANAGINGANDMININGGRAPHDATA 2.1 PowerLawsandHeavy-TailedDistributions 72 2.2 SmallDiameters 77 2.3 OtherStaticGraphPatterns 79 2.4 PatternsinEvolvingGraphs 82 2.5 TheStructureofSpecificGraphs 84 3. GraphGenerators 86 3.1 RandomGraphModels 88 3.2 PreferentialAttachmentandVariants 92 3.3 Optimization-basedgenerators 101 3.4 Tensor-based 108 3.5 Generatorsforspecificgraphs 113 3.6 GraphGenerators: Asummary 115 4. Conclusions 115 References 117 4 QueryLanguageandAccessMethodsforGraphDatabases 125 HuahaiHeandAmbujK.Singh 1. Introduction 126 1.1 Graphs-at-a-timeQueries 126 1.2 GraphSpecificOptimizations 127 1.3 GraphQL 128 2. OperationsonGraphStructures 129 2.1 Concatenation 130 2.2 Disjunction 131 2.3 Repetition 131 3. GraphQueryLanguage 132 3.1 DataModel 132 3.2 GraphPatterns 133 3.3 GraphAlgebra 134 3.4 FLWRExpressions 137 3.5 ExpressivePower 138 4. ImplementationoftheSelectionOperator 140 4.1 GraphPatternMatching 140 4.2 LocalPruningandRetrievalofFeasibleMates 142 4.3 JointReductionofSearchSpace 144 4.4 OptimizationofSearchOrder 146 5. ExperimentalStudy 148 5.1 BiologicalNetwork 148 5.2 SyntheticGraphs 150 6. RelatedWork 152 6.1 GraphQueryLanguages 152 6.2 GraphIndexing 155 7. FutureResearchDirections 155 8. Conclusion 156 Appendix: QuerySyntaxofGraphQL 156 References 157 5 GraphIndexing 161 XifengYanandJiaweiHan 1. Introduction 161 Contents vii 2. Feature-BasedGraphIndex 162 2.1 Paths 163 2.2 FrequentStructures 164 2.3 DiscriminativeStructures 166 2.4 ClosedFrequentStructures 167 2.5 Trees 167 2.6 HierarchicalIndexing 168 3. StructureSimilaritySearch 169 3.1 Feature-BasedStructuralFiltering 170 3.2 FeatureMissEstimation 171 3.3 FrequencyDifference 172 3.4 FeatureSetSelection 173 3.5 StructureswithGaps 174 4. ReverseSubstructureSearch 175 5. Conclusions 177 References 178 6 GraphReachabilityQueries: ASurvey 181 JeffreyXuYuandJiefengCheng 1. Introduction 181 2. TraversalApproaches 186 2.1 Tree+SSPI 187 2.2 GRIPP 187 3. Dual-Labeling 188 4. TreeCover 190 5. ChainCover 191 5.1 ComputingtheOptimalChainCover 193 6. Path-TreeCover 194 7. 2-HOPCover 196 7.1 AHeuristicRanking 197 7.2 AGeometrical-BasedApproach 198 7.3 GraphPartitioningApproaches 199 7.4 2-HopCoverMaintenance 202 8. 3-HopCover 204 9. Distance-Aware2-HopCover 205 10. GraphPatternMatching 207 10.1 ASpecialCase: A֒ D 208 10.2 TheGeneralCases→ 211 11. ConclusionsandSummary 212 References 212 7 ExactandInexactGraphMatching: MethodologyandApplications 217 KasparRiesen,XiaoyiJiangandHorstBunke 1. Introduction 218 2. BasicNotations 219 3. ExactGraphMatching 221 4. InexactGraphMatching 226 4.1 GraphEditDistance 227 4.2 OtherInexactGraphMatchingTechniques 229 5. GraphMatchingforDataMiningandInformationRetrieval 231 viii MANAGINGANDMININGGRAPHDATA 6. VectorSpaceEmbeddingsofGraphsviaGraphMatching 235 7. Conclusions 239 References 240 8 ASurveyofAlgorithmsforKeywordSearchonGraphData 249 HaixunWangandCharuC.Aggarwal 1. Introduction 250 2. KeywordSearchonXMLData 252 2.1 QuerySemantics 253 2.2 AnswerRanking 254 2.3 AlgorithmsforLCA-basedKeywordSearch 258 3. KeywordSearchonRelationalData 260 3.1 QuerySemantics 260 3.2 DBXplorerandDISCOVER 261 4. KeywordSearchonSchema-FreeGraphs 263 4.1 QuerySemanticsandAnswerRanking 263 4.2 GraphExplorationbyBackwardSearch 265 4.3 GraphExplorationbyBidirectionalSearch 266 4.4 Index-basedGraphExploration–theBLINKSAlgorithm 267 4.5 TheObjectRankAlgorithm 269 5. ConclusionsandFutureResearch 271 References 271 9 ASurveyofClusteringAlgorithmsforGraphData 275 CharuC.AggarwalandHaixunWang 1. Introduction 275 2. NodeClusteringAlgorithms 277 2.1 TheMinimumCutProblem 277 2.2 Multi-wayGraphPartitioning 281 2.3 ConventionalGeneralizationsandNetworkStructureIndices 282 2.4 TheGirvan-NewmanAlgorithm 284 2.5 TheSpectralClusteringMethod 285 2.6 DeterminingQuasi-Cliques 288 2.7 TheCaseofMassiveGraphs 289 3. ClusteringGraphsasObjects 291 3.1 ExtendingClassicalAlgorithmstoStructuralData 291 3.2 TheXProjApproach 293 4. ApplicationsofGraphClusteringAlgorithms 295 4.1 CommunityDetectioninWebApplicationsandSocialNet- works 296 4.2 TelecommunicationNetworks 297 4.3 EmailAnalysis 297 5. ConclusionsandFutureResearch 297 References 299 10 ASurveyofAlgorithmsforDenseSubgraphDiscovery 303 VictorE.Lee,NingRuan,RuomingJinandCharuAggarwal 1. Introduction 304 Contents ix 2. TypesofDenseComponents 305 2.1 Absolutevs. RelativeDensity 305 2.2 GraphTerminology 306 2.3 DefinitionsofDenseComponents 307 2.4 DenseComponentSelection 308 2.5 RelationshipbetweenClustersandDenseComponents 309 3. AlgorithmsforDetectingDenseComponentsinaSingleGraph 311 3.1 ExactEnumerationApproach 311 3.2 HeuristicApproach 314 3.3 ExactandApproximationAlgorithmsforDiscoveringDens- estComponents 322 4. FrequentDenseComponents 327 4.1 FrequentPatternswithDensityConstraints 327 4.2 DenseComponentswithFrequencyConstraint 328 4.3 EnumeratingCross-GraphQuasi-Cliques 328 5. ApplicationsofDenseComponentAnalysis 329 6. ConclusionsandFutureResearch 331 References 333 11 GraphClassification 337 KojiTsudaandHirotoSaigo 1. Introduction 337 2. GraphKernels 340 2.1 RandomWalksonGraphs 341 2.2 LabelSequenceKernel 342 2.3 EfficientComputationofLabelSequenceKernels 343 2.4 Extensions 349 3. GraphBoosting 349 3.1 FormulationofGraphBoosting 351 3.2 OptimalPatternSearch 353 3.3 ComputationalExperiments 354 3.4 RelatedWork 355 4. ApplicationsofGraphClassification 358 5. LabelPropagation 358 6. ConcludingRemarks 359 References 359 12 MiningGraphPatterns 365 HongCheng,XifengYanandJiaweiHan 1. Introduction 366 2. FrequentSubgraphMining 366 2.1 ProblemDefinition 366 2.2 Apriori-basedApproach 367 2.3 Pattern-GrowthApproach 368 2.4 ClosedandMaximalSubgraphs 369 2.5 MiningSubgraphsinaSingleGraph 370 2.6 TheComputationalBottleneck 371 3. MiningSignificantGraphPatterns 372 3.1 ProblemDefinition 372 3.2 gboost: ABranch-and-BoundApproach 373 x MANAGINGANDMININGGRAPHDATA 3.3 gPLS:APartialLeastSquaresRegressionApproach 375 3.4 LEAP:AStructuralLeapSearchApproach 378 3.5 GraphSig: AFeatureRepresentationApproach 382 4. MiningRepresentativeOrthogonalGraphs 385 4.1 ProblemDefinition 386 4.2 RandomizedMaximalSubgraphMining 387 4.3 OrthogonalRepresentativeSetGeneration 389 5. Conclusions 389 References 389 13 ASurveyonStreamingAlgorithmsforMassiveGraphs 393 JianZhang 1. Introduction 393 2. StreamingModelforMassiveGraphs 395 3. StatisticsandCountingTriangles 397 4. GraphMatching 400 4.1 UnweightedMatching 400 4.2 WeightedMatching 403 5. GraphDistance 405 5.1 DistanceApproximationusingMultiplePasses 406 5.2 DistanceApproximationinOnePass 411 6. RandomWalksonGraphs 412 7. Conclusions 416 References 417 14 ASurveyofPrivacy-PreservationofGraphsandSocialNetworks 421 XintaoWu,XiaoweiYing,KunLiuandLeiChen 1. Introduction 422 1.1 PrivacyinPublishingSocialNetworks 422 1.2 BackgroundKnowledge 423 1.3 UtilityPreservation 424 1.4 AnonymizationApproaches 424 1.5 Notations 425 2. PrivacyAttacksonNaiveAnonymizedNetworks 426 2.1 ActiveAttacksandPassiveAttacks 426 2.2 StructuralQueries 427 2.3 OtherAttacks 428 3. K-AnonymityPrivacyPreservationviaEdgeModification 428 3.1 K-DegreeGeneralization 429 3.2 K-NeighborhoodAnonymity 430 3.3 K-AutomorphismAnonymity 431 4. PrivacyPreservationviaRandomization 433 4.1 ResiliencetoStructuralAttacks 434 4.2 LinkDisclosureAnalysis 435 4.3 Reconstruction 437 4.4 FeaturePreservingRandomization 438 5. PrivacyPreservationviaGeneralization 440 6. AnonymizingRichGraphs 441 Contents xi 6.1 LinkProtectioninRichGraphs 442 6.2 AnonymizingBipartiteGraphs 443 6.3 AnonymizingRichInteractionGraphs 444 6.4 AnonymizingEdge-WeightedGraphs 445 7. OtherPrivacyIssuesinOnlineSocialNetworks 446 7.1 DerivingLinkStructureoftheEntireNetwork 446 7.2 DerivingPersonalIdentifyingInformationfromSocialNet- workingSites 448 8. ConclusionandFutureWork 448 Acknowledgments 449 References 449 15 ASurveyofGraphMiningforWebApplications 455 DeboraDonatoandAristidesGionis 1. Introduction 456 2. Preliminaries 457 2.1 LinkAnalysisRankingAlgorithms 459 3. MiningHigh-QualityItems 461 3.1 PredictionofSuccessfulItemsinaCo-citationNetwork 463 3.2 Finding High-Quality Content in Question-Answering Por- tals 465 4. MiningQueryLogs 469 4.1 DescriptionofQueryLogs 470 4.2 QueryLogGraphs 470 4.3 QueryRecommendations 477 5. Conclusions 480 References 481 16 GraphMiningApplicationstoSocialNetworkAnalysis 487 LeiTangandHuanLiu 1. Introduction 487 2. GraphPatternsinLarge-ScaleNetworks 489 2.1 Scale-FreeNetworks 489 2.2 Small-WorldEffect 491 2.3 CommunityStructures 492 2.4 GraphGenerators 494 3. CommunityDetection 494 3.1 Node-CentricCommunityDetection 495 3.2 Group-CentricCommunityDetection 498 3.3 Network-CentricCommunityDetection 499 3.4 Hierarchy-CentricCommunityDetection 504 4. CommunityStructureEvaluation 505 5. ResearchIssues 507 References 508 17 Software-BugLocalizationwithGraphMining 515 FrankEichingerandKlemensBo-hm 1. Introduction 516 2. BasicsofCallGraphBasedBugLocalization 517
Description: