BRIEFINGS IN BIOINFORMATICS. VOL 16. NO 1. 89–103
doi:10.1093/bib/bbt079
Advance Access published on 6 November 2013

The semantic web in translational medicine: current applications and future directions

Catia M. Machado*, Dietrich Rebholz-Schuhmann, Ana T. Freitas and Francisco M. Couto

Submitted: 22nd July 2013; Received (in revised form): 8th October 2013

Abstract
Semantic web technologies offer an approach to data integration and sharing, even for resources developed independently or broadly distributed across the web. This approach is particularly suitable for scientific domains that profit from large amounts of data that reside in the public domain and that have to be exploited in combination. Translational medicine is such a domain, which in addition has to integrate private data from the clinical domain with proprietary data from the pharmaceutical domain. In this survey, we present the results of our analysis of translational medicine solutions that follow a semantic web approach. We assessed these solutions in terms of their target medical use case; the resources covered to achieve their objectives; and their use of existing semantic web resources for the purposes of data sharing, data interoperability and knowledge discovery. The semantic web technologies seem to fulfill their role in facilitating the integration and exploration of data from disparate sources, but it is also clear that simply using them is not enough. It is fundamental to reuse resources, to define mappings between resources, to share data and knowledge. All these aspects allow the instantiation of translational medicine at the semantic web-scale, thus resulting in a network of solutions that can share resources for a faster transfer of new scientific results into the clinical practice. The envisioned network of translational medicine solutions is on its way, but it still requires resolving the challenges of sharing protected data and of integrating semantic-driven technologies into the clinical practice.
Keywords: semantic web; translational medicine; data integration; data sharing; data interoperability; knowledge discovery

*Corresponding author. Catia M. Machado, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Portugal and Instituto de Engenharia de Sistemas e Computadores - Investigação e Desenvolvimento, Universidade de Lisboa, Portugal. E-mail: [email protected]

Catia M. Machado is a PhD student at the Department of Informatics of the Faculty of Sciences and at INESC-ID, of the University of Lisbon. Her research interests are data representation and integration, in particular with semantic web technologies, knowledge discovery and translational medicine.

Dietrich Rebholz-Schuhmann (PhD) is 'Oberassistent' (similar to Associate Professor) with the University of Zürich, Department of Computational Linguistics. His research interests are biomedical literature and data analysis, data integration and knowledge discovery.

Ana Teresa Freitas (PhD) is an Associate Professor with the Technical University of Lisbon, Department of Computer Science and Engineering and the head of the group Knowledge Discovery and Bioinformatics at INESC-ID. Her research interests are in the areas of Computational Biology, Human genetics, Algorithms and Data Mining.

Francisco M. Couto (PhD) is an Assistant Professor at the Department of Informatics of Faculty of Sciences (University of Lisbon). He is a Senior Researcher of LASIGE where he coordinates the Biomedical Informatics research line. His research interests are in the areas of Text and Data Mining, Information Retrieval and Extraction, Ontologies and Bioinformatics.

© The Author 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, [email protected]

INTRODUCTION
Biomedical research has evolved into a data-intensive science, where prodigious amounts of data can be collected from disparate resources at any time [1]. However, the value of data can only be leveraged through its analysis, which ultimately results in the acquisition of knowledge. In domains such as translational medicine, where multiple types of data are involved, often from different sources and in different formats, data integration and interoperability are key requirements for an efficient data analysis.

Translational medicine focuses on the improvement of human health by bridging the gap between basic science research and clinical practice [2–4]. This bridging is done at two distinct levels: at the level of basic science research, translating it into new devices or treatments ('from the bench to the bedside'); and at the level of clinical practice, transferring the new treatments into the daily routine (Figure 1) [4,5]. Additionally, knowledge in translational medicine can also flow in the contrary direction, resulting in the initiation of new basic research based on the clinical observations of a disease development. Included in the first bridging level is genomic medicine, which consists in exploring the molecular genetics knowledge of diseases and translating it into personalized treatments with more beneficial treatment responses and with reduced undesired effects [6]. For example, a clinician may analyze a patient's mutations to explain observed drug side effects or may retrieve the list of biomarkers and their functions that have been associated with a specific cancer type.

It is unquestionable that translational medicine is a multidisciplinary research domain that relies both on public and protected data. Public data include resources such as medical guidelines, scientific literature and biomedical databases, whereas protected data are composed of private patient data and proprietary data from pharmaceutical and publishing companies. Translational medicine thus requires appropriate technologies for the interpretation of distributed and disparate data resources, and it is easy to conceive that such a large scale endeavor will require a versatile infrastructure that preserves data semantics at all integration levels.

Figure 1: Knowledge workflow in translational medicine. Translational medicine improves the knowledge on human diseases by translating basic science research results into new exams, devices and treatments, which are then incorporated into the clinical practice. It also explores the knowledge collected during patient care to identify new research topics and topics that need further research.

Using the semantic web for data integration
The need for data integration and data interoperability has a long-standing history. The Committee on Models for Biomedical Research proposed in 1985 a structured and integrated view of biology to cope with the available data [7]. Ten years later, in 1995, Davidson et al. questioned the feasibility of data integration, since the resulting data structure has to follow changes in the data itself and individual research groups fail to comply with the integration structure [8]. In 2007, the challenges identified for data integration in genomic medicine were the lack of clinical data sources; the privacy issues linked to clinical data; the inherent complexity of medical records; and finally, the lack of data representation standards in the clinical domain [6]. These selected examples clearly show that data integration remains an open research area and that its complexity escalates with the increase in number of heterogeneous domains to be integrated.

The World Wide Web is the key information channel for the communication of public data, particularly for the scientific community, since it allows the fast publication of methods, results and opinions, and it is easily reached by virtually anyone anywhere. This information channel fulfills the requirements for efficient data exchange between scientific communities and data repositories, and thus should also be explored in translational medicine for optimal progress. However, its usefulness in this context is counterbalanced by the lack of data standards across domains, of explicit data representations, and of interoperability of the data resources, which hinder the sharing of data between the biomedical and the clinical domains [9].

Tim Berners-Lee et al. proposed the vision of the semantic web, where the web of documents is replaced by the web of data, thus allowing the manipulation of data over disparate domains and solving most of the problems previously stated for data integration [10]. The manipulation of data is achieved by substituting the links connecting web pages (i.e. the documents) with links connecting the data elements themselves and adding semantics to them all. The data elements in the web thus represent real-world entities and the links between data elements embody the logical relations between those entities. When independent applications share this representation of the reality, interoperability and effective data integration across knowledge domains are achieved. The semantic web thus becomes the framework for data integration at the web-scale, independent from the knowledge domain and focused on the semantics and context of the data. The result is a network of linked data that can be exploited by computers: by following the links between data elements, jumping from data set to data set; by querying the whole network, and thus providing an answer based on otherwise independent data sets; and by reasoning over the data, based on its formal representation, thus identifying new implicit connections between data elements [11].

The semantic web reaches beyond data integration toward data sharing across institutions, and makes data integration and interoperability a standard feature instead of a requirement. If built on this infrastructure, many of the technical challenges faced by translational medicine are thus prevented. However, it is important to bear in mind that, as happens with other technologies, the semantic web is inherently constrained by the complexity of the domain of knowledge.

In this work, we analyzed how the semantic web and its technologies have been used in the translational medicine domain. In particular, we analyzed which technologies are more often exploited and how they are used. For that purpose, we analyzed 11 noncommercial systems integrating genetic and medical data, developed from 2007 to 2013. These systems are presented in terms of the medical context in which they were developed, the resources that were embedded, their compliance with the semantic web principles and, finally, the extent to which the new knowledge can reach the everyday clinical practice.

SEMANTIC WEB RESOURCES FOR TRANSLATIONAL MEDICINE
Combining resources from public and private repositories, either in an open infrastructure or in a clinical environment, requires data representation standards, semantic normalization and ultimately data sharing (with appropriate access control policies). The infrastructure of the World Wide Web can be exploited to this end, but it has to be focused on the semantic representation of data and on the interoperability of data, or, in other words, it has to become a semantic web.

Technological standards in the semantic web
Over the past decade, the semantic web community, and in particular the World Wide Web Consortium (W3C), has been developing a set of core technologies to realize the vision of the semantic web. Some of these technologies have since become de facto standards, and have brought the semantic web to life [12,13].

The Resource Description Framework (RDF) is a standard language for data representation and interchange on the Web [14]. It uses the Uniform Resource Identifier (URI) to identify each data element represented [15]. The basic structure of RDF is the triple, a statement composed of a subject connected with an object through a predicate, similar to narrative statements in English (e.g. 'HomoSapiens isA mammal.', 'Dopamin treats ParkinsonSyndrome.'). Since either of these elements can be part of different statements, data in RDF are best visualized through a directed graph, where the nodes represent the subjects and objects, and the arcs represent the predicates (or relations).

The RDB to RDF Mapping Language (R2RML, in which RDB stands for relational database) is a language that expresses customized mappings from relational databases to RDF data sets [16]. As such, it assists in the integration of data from relational databases by exporting it in RDF.

Owing to its basic and simple format, RDF restricts the representation of data to low levels of expressiveness (e.g. it does not allow the union of concepts, the definition of hierarchic relations between concepts or the definition of cardinality in nonhierarchical relations). To overcome this limitation, two other technologies have been proposed: the RDF Schema (RDFS), a specification language for data properties based on RDF; and the Web Ontology Language (OWL), a language to formally define semantics, which also enables reasoning based on Description Logics [17–19]. Both formal languages extend RDF and enable the inference of new knowledge. As a result, knowledge can be shared and at the same time assessed for formal semantic consistency.

SPARQL, a self-referencing acronym for SPARQL Protocol and RDF Query Language, is a query language to access RDF data [20]. Since RDF data may be distributed over disparate data sources (including data stores exporting RDF from non-RDF relational databases), SPARQL has to retrieve data from all these resources. Due to the graph structure of RDF, SPARQL queries are transformed into graph pattern searches that rely only on the knowledge about the relations between concepts but not on a particular data model. SPARQL is also able to query RDFS and OWL provided that the graph pattern matching of the SPARQL query is defined with semantic entailment relations instead of the explicit graph structures [21]. Although other query languages exist for RDF (e.g. RDQL [22]), the availability of a SPARQL end point (i.e. an interface that provides access to a data set through SPARQL queries) guarantees the independence from software and implementation specifications.

Although necessary, these standards are not sufficient for the implementation of the web of data. This can be achieved with the representation of domain knowledge with ontologies and with the semantic characterization of links between resources.

Domain knowledge representation
Semantic interoperability is a key requirement in the realization of the semantic web and it is mainly achieved through the generation of resources that reliably represent the abstraction of real-world objects and their interactions. These representations exist in the form of ontologies and controlled vocabularies in general. An ontology is 'an explicit specification of a conceptualization' that provides a means to formally describe domain knowledge in a structured manner [23]. If an ontology is accepted as a reference by the community (e.g. the Gene Ontology and the SNOMED-CT), its representation of the reality becomes a standard, and data integration is facilitated [24,25]. This is true even if different abstraction levels are provided from unrelated data sets, since the hierarchical structure of ontologies supports the identification of a common ancestor for any two related concepts, by traversing the ontology graph [26].

The representation of ontologies in RDFS or OWL provides additional advantages, namely, novel interpretations of the existing data against the ontological knowledge enabled by the mapping of data elements in RDF representation ('instances') to the ontological concepts ('classes' or 'types'); and more detailed semantic comparisons of concepts that exploit the expressiveness of these formats [27].

The Open Biomedical Ontologies (OBO) format also exists for ontology representation, although it is not a standard semantic web technology [28]. Due to its popularity in the health care and life sciences domains, extensive work has been done in the conversion of ontologies in this format to OWL [29–31].

Linking data
Mappings between resources are another key element in the semantic web, enabling interlinked structured data according to the principles defined by Tim Berners-Lee: (i) use Uniform Resource Identifier(s) (URIs) as names for things; (ii) use resolvable URIs (e.g. based on the HTTP protocol) so that those names can be looked up (either by people or machines); (iii) provide useful information for
lookup through the URI, using the standards (e.g. RDF, SPARQL); and (iv) include links to other URIs, so that they can discover more things [32–34]. The URI can then be used to define any real-world entity (or 'thing'), be it an object or an abstract concept [35].

Examples of real-world entities in the biomedical domain are diseases, drugs, facts related to genes and protein functions, patient symptoms, biological measurements and family history. Ideally, each individual entity should have only one URI, so that every application points to the same source, regardless of its domain. This means that if the entity is altered in the original source, all applications pointing to it will be automatically updated. Additionally, the correct definition of URIs ensures that mappings between resources do not lead to semantic inconsistencies.

The links established between resources can be defined both at instance-level (i.e. between data instances) and at schema-level (i.e. between concepts or properties defined in different vocabularies). Heath and Bizer state the existence of three important types of instance-level links: 'Vocabulary Links' that map an instance to the definition of the vocabulary concept used to represent it; 'Identity Links', used to indicate when two instances with different URIs refer to the same real-world entity (defined in OWL through the property 'sameAs'); and 'Relationship Links' that map instances in a data set to related things in other data sets (e.g. people to places) [36]. There are also three types of links that can be defined at schema-level: 'Equivalence Links' (similar to the identity links at instance-level) used to indicate when two concepts are equivalent and therefore have the same set of instances (owl:equivalentClass) or when two properties represent the same relationship (owl:equivalentProperty); 'Hierarchical Links' that define a hierarchical relation between concepts (defined in RDFS as subClassOf) or between properties (rdfs:subPropertyOf); and 'Relationship Links', which can be used to relate concepts from different data sets through any definable relation (e.g. a concept 'Gene' in one vocabulary can be related through the property associatedTo to a concept 'Disease' in another vocabulary).

Exploring linked data
If data providers follow the principles of publishing and interlinking structured data on the web as indicated above, including the definition of mappings, data will be integrated as in a large-scale database, forming a Linked Open Data Cloud. The integrated resources can then be explored by crawling or on-the-fly exploration, through query federation or a virtual knowledge broker [37,38]. Crawling the web means traversing the links between resources in advance, to reduce the response time of queries during run-time. However, it may lead to the retrieval of outdated data. On-the-fly exploration means accessing the data only during run-time, which ensures the data are always up-to-date, but may lead to longer waiting periods. Query federation consists in sending queries, or portions of complex queries, to a fixed set of resources (e.g. FeDeRate [37]). Although this is the most advantageous approach due to the flexibility of query formulations, it presents the same limitations as data federations, namely, the low performance of complex queries when considering a large number of data sources. Finally, the virtual knowledge broker exploits distributed data resources and makes use of the semantic data representation to deliver a coherent view to the end users, with the possibility of being instantiated in different locations [38].

The Linking Open Data project, under the tutelage of the W3C Semantic Web Education and Outreach Interest Group, is one key distribution channel for the publishing of data sets in the web using the semantic web standard language RDF and the definition of links connecting them [36,39]. Currently (as of August 2013), 337 data sets are available from disparate domains such as geography, governance and life sciences [40,41]. The latter includes examples such as the Gene Ontology, PubMed and UniProt [24,42,43].

The notion of open data is based on the free usage and redistribution of data. The arguments supporting the openness of data are based on the fact that government and scientific data are financed by public taxes and therefore should be publicly available. In the particular case of the translational medicine domain, the notions of linked data and linked open data are markedly distinct and present a fundamental limitation in achieving data integration.

SOLUTIONS FOR TRANSLATIONAL MEDICINE
According to our analysis, 11 systems have been reported in the scientific literature that present translational medicine solutions dealing with medical conditions as disparate as cardiovascular diseases, cancer and diabetes.

Three systems focused on the cardiovascular system: one on the identification and prioritization of candidate genes for cardiovascular diseases; another one on genetic association studies for hypercholesterolemia; and the third one also addressing association studies but for cerebrovascular diseases [44–46].

Two systems targeted cancer and its causes: one exploring genetic association studies for cervical cancer (Association Studies aSsisted by Inference and Semantic Technologies (ASSIST)); and the other one identifying personalized treatments for colon cancer patients (MATCH) [47,48].

Two other systems targeted type 2 diabetes mellitus: one focused on the understanding of its causes to discover novel treatment hypotheses (Semantic Enrichment of the Scientific Literature (SESL)); and the other one on genetic association studies [49,50]. The latter covered hypothyroidism in addition to type 2 diabetes.

Each of the remaining four solutions tackled different biomedical tasks: neuroscience research (Receptor Explorer); the repurposing of drugs; Traditional Chinese Medicine (TCM); and congenital muscular dystrophy [37,51–53].

Seven of the 11 translational medicine systems surveyed integrate public resources (see Figure 2), whereas the remaining four consider only private data. Figure 3 shows the distribution of public resources integrated in each system.

Exploitation of semantic web resources
As previously pointed out, exploiting the semantic web to its full potential requires four key constructs: (i) structured (and ideally shared) knowledge representations; (ii) mappings between resources; (iii) data sharing; and (iv) use of semantic web technology standards in the previous three constructs.

To evaluate how the translational medicine systems exploited the semantic web and its technologies, we assessed them in view of three fundamental parameters for data integration: (i) degree of data sharing, (ii) data interoperability and (iii) knowledge discovery.
p.c o m /b ib /a rtic le -a b s tra c t/1 6 /1 /8 9 /2 4 0 2 3 6 b y g u e s t o n 0 6 A p ril 2 0 1 9 Figure 2: The type ofdatausedby the11translationalmedicine systems surveyed.Four systemsusesolelypublic data,threeintegratebothpublicandprivatedataandfouruseonlyprivatedata.ReceptorExplorer,thecerebrovas- cular diseases system and SESL (representedwith dashedborders) are the only systems thatprovide open access to theirintegratedresources. Thesemanticwebintranslationalmedicine 95 Figure 3: Publicresourcesintegratedbythetranslationalmedicinesystemssurveyed.Theresourcesshownonthe left are thoseintegratedby the three systems targeting the cardiovascular system, whereas theresources shown D ontherightsidearethoseintegratedbytheremainingfoursystems.Theresourcesintegratedinthecardiovascular ow n systemsubdomainthatwerealsoconsideredinatleastoneof theother subdomainsareunderlined. lo a MRDçMental Retardation Database; MPOçMammalian Phenotype Ontology; GeneçNCBI Gene Database; d e d OMIMçOnline Mendelian Inheritance in Man; GOçGene Ontology; KEGGçKyoto Enclycopedia of Genes and fro Genomes; SNPçSNP Database; LSDçLocus-Specific Databases; GXAçGene Expression Atlas;UMLSçUnified m h Medical Language System; GOAçGene Ontology Annotation; OBOçOpen Biomedical Ontologies; LODDç ttp LinkedOpenDrugData;BAMSçBrainArchitectureManagementSystem;MeSHçMedicalSubjectHeadings. s://a c a d e m Data integration requires primarily data sharing, resources to achieve data integration (Figure 2). ic .o u which in translational medicine can be achieved However, out of the seven systems that integrate p .c with public resources (e.g. gene and protein data) public data, only three shared their data after inte- om and/or private repositories (e.g. patient data). 
gration: Receptor Explorer (neuroscience context), /bib Csoounrcveesrscealyn,atlshoeleinadtegtoratdioatna sohfardinatgaiffrtohme rdeisfofeurrecnets SceErSeLbro(vtyapsceul2ar ddiiasebaesteesssymsteelmlitu(sallctohnrteeexts)ystaenmdstahree /article are then made available to a wider audience. It is represented with dashed borders in Figure 2). -ab s important to note that sharing data means that the Receptor Explorer integrates public resources, tra c dataareaccessiblebythird-partymembershavingthe some of which are maintained in their original loca- t/16 appropriate access rights, but not necessarily access- /1 tion(e.g.DBpedia),andexposesthembothaslinked /8 ible by the general public. data and through a SPARQL end point. SESL, on 9/2 4 Data interoperability is achieved with the support 0 the other hand, integrates both public and propri- 2 3 ofsemanticwebtechnologiesandresources,through 6 etary resources in a local triple store, exposing them b the use of the technological standards (e.g. RDF, y through the links established with Wikipedia and a g URIs, RDFS and OWL), the linking of data and ue SPARQL end point. However, it requires specific s the representation of domain knowledge with con- t o accessrights for accessingparts ofthescientific litera- n trolled vocabularies. 0 6 Finally, data integration can lead to knowledge ture. The SESL portal functions as a virtual know- A p discoverybyenablingtheexplorationofapotentially ledge broker [38]. The cerebrovascular diseases ril 2 system works as a bridge (or share point) for re- 0 unlimited set of resources covering different know- 1 9 sources from different institutions, but does not dis- ledge domains, from which new associations can be close the data to the general public. discovered and previously hypothesized associations The four systems that integrate exclusively private can be validated. 
In the semantic web context, data(seeFigure2)functionasnonpublicsharepoints knowledge discovery is founded on the use of the inthesamewayasthecerebrovasculardiseasessystem. standards (e.g. RDFS and OWL) and the explor- The remaining four systems do not explicitly ation of available linked data resources at a web- scale [11]. state that the integrated data or resultant knowledge is shared in any manner, and thus are assumed to Datasharing instantiate a local and closed translational medicine All of the translational medicine systems surveyed solution available only to the directly involved took advantage of shared data from public or private parties. 96 Machado et al. Datainteroperability systemdefinedlinksbetweendataresourcesandcon- All systems incorporate semantic web technologies trolled vocabularies, and SESL defined both types of enabling data interoperability, which include the links. representationofdomainknowledgewithcontrolled Among the standard technologies, RDF, vocabularies, links between resources and the use of OWL and SPARQL are the most common (see the semantic web standards (see Figure 4). Figure 4), with only three systems not using RDF: Seven of the 11 systems used controlled vocabul- cerebrovascular diseases, cervical and colon cancer. ariestorepresenttheirdomainknowledge:theTCM RDFS is only adopted by the TCM system and system adopting the RDFS language, and the other R2RML by the diabetes/hypothyroidism system. six systems adopting OWL. From these seven sys- Only three systems (diabetes/hypothyroidism, tems, three reused existing vocabularies, whereas the pharmacology and Receptor Explorer) use URIs, other four developed their own. SESL reused exist- eventhoughtheiradvantageswerepraisedbyseveral D ing controlled vocabularies only for data annotation. of the authors of the remaining systems. 
Regarding the implementation of links, Receptor Explorer and the muscular dystrophy system defined links between data resources, the cerebrovascular diseases system and the diabetes/hypothyroidism [...] These three systems use locally defined URIs to represent the integrated data elements, but Receptor Explorer provides open access to the resources, thus making their URIs tractable and exploitable by third parties.

Figure 4: Technical description of the translational medicine systems surveyed. This figure shows the use of controlled vocabularies for knowledge representation, as well as their reuse for knowledge representation and data annotation (marked with *). Furthermore, it shows the definition of mappings between resources, the consideration of URIs, and lists the use of three semantic web standard technologies: RDF, OWL and SPARQL. All this information is indicated for all the translational medicine systems discussed.

Knowledge discovery

Exploring a set of integrated resources by following the existing mappings is a straightforward form of knowledge discovery. A more complex form involves inference mechanisms that uncover knowledge that does not have a previous explicit representation. Both approaches contribute either to formulating new hypotheses or to refining and validating existing ones, which can lead to new research ideas and eventually to new treatments for individual patients.

All surveyed translational medicine systems perform knowledge discovery, with seven following the exploratory approach and the remaining four the inference approach (see Figure 5). Among the seven systems that follow the exploratory approach, four use RDFS/OWL ontologies for knowledge representation, which means that they do not exploit the reasoning potential of those languages. Of the systems exploring inference, the muscular dystrophy system defined custom rules over RDF instead of using either RDFS or OWL, whereas the pharmacology system defined custom rules over RDF despite using OWL, owing to the fact that their chosen triple store did not support inference over OWL.

Figure 5: The knowledge discovery approaches followed by the translational medicine systems surveyed. All of the systems performed knowledge discovery over their integrated resources: some exploiting inference, and the others following an exploratory approach.

Receptor Explorer is an example of a system implementing the exploratory approach to knowledge discovery. In this system, a knowledge base was created that aggregates the Neurocommons knowledge base and the data sets generated by the W3C's Linking Open Drug Data task force [37,54,55]. The Neurocommons contains biomedical databases and ontologies such as the OBO and parts of the SenseLab Neurobiology databases, while the Linking Open Drug Data sets include data concerning clinical trials and disease–gene associations [56,57]. In addition to these locally stored data sets, Receptor Explorer integrates data from resources maintained at their original location, namely DBpedia, Bio2RDF and the Linked Clinical Trials project [58–60]. Through this pipeline of resources, it is possible to select a neural receptor, obtain its description, the genes involved in it, as well as publications and clinical trials involving the receptor.

The pharmacology system, on the other hand, is an example of a system implementing knowledge discovery through inference. The resources it integrates include DrugBank, Unified Medical Language System, Kyoto Encyclopedia of Genes and Genomes, National Center for Biotechnology Information's Entrez Gene database (from which Gene Ontology annotations were extracted for human genes) and Online Mendelian Inheritance in Man [61–65]. The authors present a good example of how knowledge discovery through inference enables the identification of a connection (until then undefined) between a drug approved for the treatment of hypertension and a connective tissue disorder. The identification of this connection was only possible owing to the use of the data inferred from the Gene Ontology.

IS SEMANTIC WEB TECHNOLOGY ENABLING TRANSLATIONAL MEDICINE?

Delivering solutions from the 'bench to the bedside' and incorporating them into the health care practice requires that the data flow from research in molecular biology, genetics and pharmacology into the clinical domain and in reverse. Within this flow of data and knowledge, research on the molecular mechanisms of diseases and drugs can be translated more quickly into novel treatment approaches, and conversely, observations about patients can lead to novel hypotheses and experimental conditions. The setup for this exchange requires not only the integration of data between the intervening research communities, but also the adaptation of data for safe and potentially unrestricted use by both communities.

Given the number of intervening parties in the translational medicine setting, data integration is fundamental for the evolution of this domain of knowledge. As we have shown, the semantic web has the potential to assist in many of the difficulties posed by the integration of data from disparate sources, as four of its underlying principles accelerate data integration and its exploration:

(1) Represent data and knowledge with technologies that serve as a standard across the entire community.
(2) Define mappings between resources.
(3) Provide access to the resources so they can be integrated.
(4) Share the effort of resource integration among data providers and data users.

The analysis of the translational medicine systems presented in the previous sections provides an overview of how the semantic web resources are being exploited in this domain of knowledge. It shows that most translational medicine systems adhere in earnest to the first principle described above, with RDF for data structuring, formal semantics and exploratory knowledge discovery among the features most commonly used. However, many systems neglect or ignore the remaining three principles.

By itself, the use of standard semantic web technologies does not fulfill the semantic web vision. For example, of the seven surveyed systems that use controlled vocabularies developed in OWL or RDFS for knowledge representation, only three reuse existing controlled vocabularies. The other three systems have created their own vocabularies, opting for a representation of the domain knowledge not shared by other researchers. Despite using standard semantic web technologies, these systems do not promote interoperability between applications and thus fall short of the semantic web vision.

The definition of mappings between resources is also critical in this context, as mappings facilitate the access to the resources that have them, increase the interoperability between applications that use these resources and increase the impact of these resources in the knowledge discovery process. Despite the clear advantages of using mappings, only five of the surveyed systems exploit them.
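
To make the role of mappings concrete, the following is a minimal sketch of exploratory knowledge discovery across two independently maintained resources. It is not taken from any of the surveyed systems: the identifiers (`drugdb:`, `genedb:`), the two toy data sets and the `explore` helper are all hypothetical, and triples are modelled as plain Python tuples rather than with an RDF library. Only `owl:sameAs` refers to a real OWL property, and here it is used purely as a label for the mapping.

```python
# Hypothetical toy data: every identifier below is invented for illustration.
SAME_AS = "owl:sameAs"  # label standing in for an equivalence mapping

# Resource 1: a drug-centred data set that also publishes one mapping.
drug_db = {
    ("drugdb:D1", "drugdb:targets", "drugdb:GeneA"),
    ("drugdb:GeneA", SAME_AS, "genedb:G42"),
}

# Resource 2: a gene-centred data set maintained separately.
gene_db = {
    ("genedb:G42", "genedb:associatedWith", "genedb:DiseaseX"),
}


def explore(start, triples):
    """Return all facts whose subject is `start` or an entity mapped to it.

    Mappings are followed in both directions and transitively.
    """
    seen = {start}
    frontier = [start]
    facts = set()
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if p == SAME_AS and node in (s, o):
                # Follow the mapping to the equivalent entity.
                other = o if s == node else s
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
            elif s == node:
                facts.add((s, p, o))
    return facts


# With the mapping, facts from the second resource become reachable.
with_mapping = explore("drugdb:GeneA", drug_db | gene_db)

# Without the mapping, the two resources remain disconnected.
mapping = ("drugdb:GeneA", SAME_AS, "genedb:G42")
without_mapping = explore("drugdb:GeneA", (drug_db - {mapping}) | gene_db)
```

In this sketch, the disease association stored in the second resource is only discovered when the mapping triple is present, which is the sense in which mappings increase the impact of a resource in the knowledge discovery process.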
