
Linked Open Data Alignment & Querying

140 Pages·2012·2.77 MB·English


Linked Open Data Alignment & Querying

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

By PRATEEK JAIN
B. Tech., DA-IICT, India, 2006

2012
Wright State University

COPYRIGHT BY Prateek Jain 2012

WRIGHT STATE UNIVERSITY
SCHOOL OF GRADUATE STUDIES

August 21, 2012

I HEREBY RECOMMEND THAT THE DISSERTATION PREPARED UNDER MY SUPERVISION BY Prateek Jain ENTITLED Linked Open Data Alignment & Querying BE ACCEPTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy.

Amit P. Sheth, Ph.D., Dissertation Director
Arthur A. Goshtasby, Ph.D., Director, Computer Science Ph.D. Program
Andrew T. Hsu, Ph.D., Dean, School of Graduate Studies

Committee on Final Examination
Amit P. Sheth, Ph.D.
Pascal Hitzler, Ph.D.
Krishnaprasad Thirunarayan, Ph.D.
Peter Z. Yeh, Ph.D.
Kunal Verma, Ph.D.

ABSTRACT

Jain, Prateek. Ph.D., Department of Computer Science & Engineering, Wright State University, 2012. Linked Open Data Alignment & Querying.

The recent emergence of the Linked Data approach for publishing data represents a major step forward in realizing the original vision of a web that can "understand and satisfy the requests of people and machines to use the web content" i.e. the Semantic Web. This new approach has resulted in the Linked Open Data (LOD) Cloud, which includes more than 295 large datasets contributed by experts belonging to diverse communities such as geography, entertainment, and life sciences. However, the current interlinks between datasets in the LOD Cloud, as we will illustrate, are too shallow to realize much of the benefits promised. If this limitation is left unaddressed, then the LOD Cloud will merely be more data that suffers from the same kinds of problems, which plague the Web of Documents, and hence the vision of the Semantic Web will fall short. This thesis presents a comprehensive solution to address the issue of alignment and relationship identification using a bootstrapping based approach. By alignment we mean the process of determining correspondences between classes and properties of ontologies. We identify subsumption, equivalence and part-of relationship between classes. The work identifies part-of relationship between instances. Between properties we will establish subsumption and equivalence relationship. By bootstrapping we mean the process of being able to utilize the information which is contained within the datasets for improving the data within them. The work showcases use of bootstrapping based methods to identify and create richer relationships between LOD datasets. The BLOOMS project (http://wiki.knoesis.org/index.php/BLOOMS) and the PLATO project, both built as part of this research, have provided evidence to the feasibility and the applicability of the solution.

Contents

1 Introduction
1.1 Goals of this Dissertation
1.2 Contributions
1.2.1 Conceptual Contributions
1.2.2 Artifacts
1.3 Chapter Overview
2 Semantic Web and State of the Art
2.1 Introduction
2.2 Ontology
2.2.1 Domain Specific Ontology
2.2.2 Upper Level Ontology
2.2.3 Basic Relationships present in Ontologies
2.3 RDF
2.4 SPARQL
2.4.1 SPARQL Query Types
3 Linked Data
3.1 Technology
3.2 Linked Open Data
3.3 Applications
3.4 Challenges
3.4.1 Absence of Schema Level Links
3.4.2 Lack of Conceptual Description of Datasets
3.4.3 Lack of expressivity
3.4.4 Difficulties with respect to querying
4 Ontology Alignment for Concepts on Linked Open Data
4.1 Ontology Matching
4.1.1 History
4.1.2 Techniques
4.1.2.1 Name Matching
4.1.2.2 Description Matching
4.1.2.3 Constraint-based Matching
4.1.2.4 Instance based Matching
4.1.3 Tools
4.2 BLOOMS Approach
4.3 Evaluation
4.3.1 Evaluation: Ontology Alignment Evaluation Initiative Oriented Track
4.3.2 Evaluation: Ontology Alignment Evaluation Initiative Benchmark Track
4.4 Related Work
5 Contextual Ontology Alignment of LOD
5.1 Introduction
5.2 Knowledge Requirements
5.3 Approach
5.3.1 Construct BLOOMS+ Forest
5.3.2 Compute Class Similarity
5.3.3 Compute Contextual Similarity
5.3.4 Compute Overall Similarity
5.4 Evaluation
5.4.1 Data Set
5.4.2 Experimental Setup
5.4.3 Results and Discussion
5.5 Related Work
5.6 Conclusion
6 Partonomical Relationship Identification on Linked Open Data
6.1 Introduction
6.2 Winston's Approach to Part-of Relationships - Ontologized
6.3 Approach
6.3.1 Candidate Generation
6.3.2 Hypothesis Generation
6.3.3 Hypothesis Testing
6.4 Evaluation
6.4.1 Intra-Dataset Instance-Level Partonomy Discovery
6.4.2 Inter-Dataset Instance-Level Partonomy Discovery
6.4.3 Assertion of schema level links
6.5 Related Work
7 Querying Partonomical Relationship on LOD cloud
7.1 Introduction
7.2 Background
7.3 Challenges
7.4 PARQ Approach
7.4.1 System Architecture
7.4.1.1 Mapping Repository
7.4.1.2 Transformation Rule Generator
7.4.1.3 Query Re-writer
7.4.2 Meta-level Transformation Rules
7.4.3 Algorithm
7.4.3.1 Explanation
7.5 Evaluation
7.5.1 Geonames Results and Discussion
7.5.2 Administrative Geography Ontology Results and Discussion
7.5.3 Summary of Results and Limitations
7.6 Related Work
7.7 Conclusion
8 LOQUS: Linked Open Data SPARQL Querying System
8.1 Introduction
8.2 Motivation
8.3 Our Approach
8.4 Evaluation
8.5 Related Work
8.6 Conclusion and Future Work
9 Conclusion
9.1 Summary
9.2 Further Work
9.2.1 Richer Relationship Identification on LOD
9.2.2 Yellow Pages for LOD
9.2.3 Flexible Question Answering using LOD
9.2.4 Property Matching on LOD
9.2.5 LOD Integration and Enhancement
9.3 Final Remarks
Bibliography

List of Figures

2.1 Example of an ontology. Source: http://knoesis.org/research/semweb/projects/stt/
2.2 Example of an RDF Graph. Source: http://www.w3.org/TR/rdf-primer/
3.1 RDF Interlinking between different datasets using
3.2 Datasets available as part of LOD in May 2007
3.3 Datasets available as part of LOD in 2009
3.4 Possible LOD integration with SUMO
4.1 BLOOMS trees for Jazz Festival with sense Jazz Festival and for Event with sense Event. To save space, some categories are not expanded to level 4.
5.1 BLOOMS+ trees for Record Label and Music Company
6.1 PLATO System Architecture
7.1 PARQ system flowchart
7.2 PARQ Results on Geonames
7.3 Comparison PSPARQL and PARQ on Geonames for respondent 4
7.4 Comparison for Ordnance Survey Dataset for Respondent 4
8.1 LOQUS Architecture

List of Tables

3.1 Some Datasets that are Part of LOD Cloud
4.1 Results on the oriented matching track. Results for RiMOM and AROMA have been taken from the OAEI 2009 website. Legends: Prec=Precision, Rec=Recall, A-API=Alignment API, OMV=OMViaUO, NaN=division by zero, likely due to empty alignment.
4.2 Comparison of various systems on the benchmark track. Results for RiMOM and AROMA have been reused from the OAEI 2009 website. Legends: Prec=Precision, Rec=Recall.
5.1 Common nodes between the two trees in Figure 5.3.2, and their depth. The first column gives the common nodes between the two trees rooted at Record Label and Music Industry. The second column gives the depth (the distance from root) of these nodes in the BLOOMS+ tree rooted at Record Label, i.e. the source tree.
5.2 Sample mappings of LOD ontologies to PROTON.
5.3 Results for various solutions on the task of aligning LOD schemas to PROTON. Legend: S-Match-M=Result of S-Match Minimal Set, S-Match-C=Result of S-Match Complete Set, Prec=Precision, Rec=Recall, F=F-Measure, PRO=PROTON Ontology, FB=Freebase Ontology, DB=DBpedia Ontology, GEO=Geonames Ontology
5.4 Sample of correct mappings from LOD ontologies to PROTON generated by BLOOMS+.
5.5 Sample of incorrect mappings from LOD ontologies to PROTON generated by BLOOMS+.
6.1 Six types of partonomic relation with relational elements
6.2 Precision of the six different relation types between DBpedia entities
6.3 This table shows PLATO's performance on precision and recall for the Dish-Ingredient task, and PLATO's performance on precision for the Anatomy-Organ task. Recall was not reported for the second task because of time and resource limitations.
6.4 Precision as measured on Schema Level Links Between DBpedia entities
7.1 Important Properties in Geonames
7.2 Important Properties in Administrative Geography Ontology
8.1 Result execution of queries over geonames
8.2 Result execution of queries over dbpedia
8.3 Result execution of queries over linkedmdb
8.4 Result of user submitted query
8.5 Result execution of queries using LOQUS
8.6 Comparison LOD SPARQL Query Processing Systems
Acknowledgement

I would like to thank my advisor Amit P. Sheth for all his advice, guidance and support. I have been lucky enough to work with a brilliant professor. Professor Sheth has taught me how to have a long term vision and choose important problems to solve. He has taught me the importance of grounding my research. The common factor between the two of us is the desire to pursue research that will be valuable both in the short term and the long term. Over the years, I have learnt a lot from him and hope to maintain this relationship for a long time.

I want to express my gratitude to the members of my dissertation committee - Pascal Hitzler, Kunal Verma, Peter Z. Yeh and Krishnaprasad Thirunarayan. I truly enjoyed my interaction with them and appreciate the extremely valuable feedback they have provided to me over this time. I would like to thank Pascal, Peter and Kunal specifically. I have been very fortunate to have worked with them on several wonderful projects that have contributed immensely towards this dissertation. They are extremely talented and great researchers and wonderful collaborators as well. They are also true and wonderful friends. They taught me the art of identifying the correct problem and presenting solutions. Kunal and Peter took personal interest in my research and its application in industry. I am sincerely thankful to them for giving me the opportunity to be an intern under their guidance at Accenture Technology Labs.

I want to thank Cory Henson for a wonderful time in the graduate school. I have thoroughly enjoyed our discussions related to research, football, baseball and life in general. The many helpful comments and feedback I received from him have also been extremely valuable. He is a wonderful friend and I wish him my best for the career ahead.

I would like to express my appreciation to members (both past and present) of the Kno.e.sis Center. In particular I would like to thank Pavan Kapanipathi, Sarasi Lalithsena and Ajith Ranabahu for their help and co-operation during my studies. It was my pleasure to have known and interacted with these wonderful folks.

I want to acknowledge Tonya Davis, Valerie Smith, Jennifer Limoli, Paula Price and Wendy Chetcuti and
