
BioPAX – Biological Pathways Exchange Language Level 3, Release PDF

165 Pages·2011·4.65 MB·English


BioPAX – Biological Pathways Exchange Language
Level 3, Release Version 1 Documentation
BioPAX Release, July 2010

The BioPAX data exchange format is the joint work of the BioPAX workgroup, and Level 3 builds on the work of Level 2 and Level 1.

BioPAX Level 3 input from: Mirit Aladjem, Ozgun Babur, Gary D. Bader, Michael Blinov, Burk Braun, Michelle Carrillo, Michael P. Cary, Kei-Hoi Cheung, Julio Collado-Vides, Dan Corwin, Emek Demir, Peter D'Eustachio, Ken Fukuda, Marc Gillespie, Li Gong, Gopal Gopinathrao, Nan Guo, Peter Hornbeck, Michael Hucka, Olivier Hubaut, Geeta Joshi-Tope, Peter Karp, Shiva Krupa, Christian Lemer, Joanne Luciano, Irma Martinez-Flores, Zheng Li, David Merberg, Huaiyu Mi, Ion Moraru, Nicolas Le Novere, Elgar Pichler, Suzanne Paley, Monica Penaloza-Spinola, Victoria Petri, Alex Pico, Harsha Rajasimha, Ranjani Ramakrishnan, Dean Ravenscroft, Jonathan Rees, Liya Ren, Oliver Ruebenacker, Alan Ruttenberg, Matthias Samwald, Chris Sander, Frank Schacherer, Carl Schaefer, James Schaff, Nigam Shah, Andrea Splendiani, Paul Thomas, Imre Vastrik, Ryan Whaley, Edgar Wingender, Guanming Wu, Jeremy Zucker

BioPAX Level 2 input from: Mirit Aladjem, Gary D. Bader, Ewan Birney, Michael P. Cary, Dan Corwin, Kam Dahlquist, Emek Demir, Peter D'Eustachio, Ken Fukuda, Frank Gibbons, Marc Gillespie, Michael Hucka, Geeta Joshi-Tope, David Kane, Peter Karp, Christian Lemer, Joanne Luciano, Elgar Pichler, Eric Neumann, Suzanne Paley, Harsha Rajasimha, Jonathan Rees, Alan Ruttenberg, Andrey Rzhetsky, Chris Sander, Frank Schacherer, Andrea Splendiani, Lincoln Stein, Imre Vastrik, Edgar Wingender, Guanming Wu, Jeremy Zucker

BioPAX Level 1 input from: Gary D. Bader, Eric Brauner, Michael P. Cary, Emek Demir, Andrew Finney, Ken Fukuda, Robert Goldberg, Susumu Goto, Chris Hogue, Michael Hucka, Peter Karp, Minoru Kanehisa, Stan Letovsky, Joanne Luciano, Debbie Marks, Natalia Maltsev, Elizabeth Marland, Peter Murray-Rust, Eric Neumann, Suzanne Paley, John Pick, Aviv Regev, Andrey Rzhetsky, Chris Sander, Vincent Schachter, Imran Shah, Mustafa Syed, Jeremy Zucker

Thanks to the many additional people who contributed to discussions on the various BioPAX mailing lists and at BioPAX meetings.

This document was edited by Nadia Anwar, Gary Bader, Emek Demir, Sylva Donaldson and Igor Rodchenkov. Also edited for Level 1 and Level 2 by Michael P. Cary.

Copyright © 2009 BioPAX Workgroup. Some rights reserved under the Creative Commons License (http://creativecommons.org/licenses/by/2.0/).

Abstract

There are over 300 Internet-accessible databases that store biological pathway data. Biologists often need to use information from many of these to support their research, but since each has its own representation conventions and data access methods, integrating data from multiple databases is very difficult. A widely adopted biological pathway data exchange format will help make data collection and integration easier.

BioPAX (Biological Pathway Exchange, http://www.biopax.org) enables the integration of diverse pathway resources by defining an open file format specification for the exchange of biological pathway data. By using the BioPAX format, the problem of data integration reduces to a semantic mapping between the data model of each resource and the data model defined by BioPAX. Widespread adoption of BioPAX for data exchange will increase access to, and uniformity of, pathway data from varied sources, thus increasing the efficiency of computational pathway research.

This document describes BioPAX Level 3, which expands the scope of BioPAX to include states of physical entities, generic physical entities, gene regulation and genetic interactions.
BioPAX Level 3 supports the representation of the bulk of pathway data in publicly available databases.

Scope of this document

This BioPAX documentation is targeted at computational biologists with an interest in biological pathway data. For an overview of BioPAX, read the introduction (section 1). Readers are expected to be familiar with one or more pathway databases and to have a basic understanding of both bioinformatics and molecular and cellular biology. This background information is available in a number of textbooks.

This document provides an overview of the BioPAX Level 3 ontology, including descriptions of the BioPAX ontology classes, sample use cases and best-practice recommendations. It does not provide a full definition of the BioPAX Level 3 ontology, which is given by the BioPAX Level 3 OWL file, located at: http://www.biopax.org/release/biopax-level3.owl

New Features in BioPAX Level 3

The major change in BioPAX Level 3 is that the representation of physical entities (e.g. proteins) has been redesigned to support physical entities in diverse states, as well as generic physical entities. This required removing some utility classes and adding some new ones. Supporting the new features required backward-incompatible changes relative to the BioPAX Level 1 and 2 formats; however, the majority of the classes and properties are unchanged.

Better support for physical entities in diverse states

A protein, as recorded in a sequence database like UniProt, is now represented as a ProteinReference, which stores the protein sequence, name, external references and potential sequence features (this is similar in meaning to the class ‘protein’ in BioPAX Level 1 and 2).
The actual protein species that participates in an interaction, whether post-translationally modified, bound in a complex or present in a specific cellular compartment, is now represented by the class Protein. (This is similar in meaning to the class physicalEntityParticipant in BioPAX Level 1 and 2, except that in Level 3 stoichiometry is part of Conversion, and proteins no longer need to be duplicated as they were with physicalEntityParticipants in Level 1 and 2.) This new design makes it easier to create different forms of a protein without duplicating information common to all forms (e.g. the protein sequence), and it explicitly links all forms of a protein together through their shared ProteinReference. The representation of sequence features and stoichiometry has also changed significantly.

The other physical entities (DNA, RNA and small molecules) have been redesigned in the same way. Only complex (now Complex) is unchanged, since it is composed of other physical entities that have been redesigned. The physicalEntityParticipant class has been removed, as it is no longer needed with the new design. This makes BioPAX easier to use: interactions now reference their participants directly, not through an intermediate physicalEntityParticipant class.

Support for generic physical entities

Generic physical entities are often used in pathway databases, e.g. alcohols, nucleotides (dNTPs) and the Wnt protein family (some genomes contain many different Wnt genes and proteins). Different types of physical entity groupings can be used, such as homology groups or groups of small molecules that share the same chemical functional group. These can now be represented using the EntityReference class, an instance of which can contain multiple member EntityReferences of the same type (via the memberEntityReference property).
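To make the split between a protein's state-independent description and its state-specific forms more concrete, here is a minimal Python sketch using plain dataclasses. It is not the BioPAX OWL format and not the API of any BioPAX library: the names ProteinReference, Protein, entity_reference, member_entity_reference and forms_of are hypothetical in-memory stand-ins that mirror the Level 3 classes and properties described above (the Protein-to-ProteinReference link corresponds to the Level 3 entityReference property). Features are simplified to strings instead of EntityFeature objects, and "Protein X" is a made-up example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProteinReference:
    """Hypothetical stand-in for the Level 3 ProteinReference class: the
    state-independent description of a protein (name, sequence, ...)."""
    display_name: str
    sequence: Optional[str] = None
    # A generic reference (e.g. a protein family) lists its specific members,
    # mirroring the memberEntityReference property.
    member_entity_reference: List["ProteinReference"] = field(default_factory=list)

    def is_generic(self) -> bool:
        return len(self.member_entity_reference) > 0


@dataclass
class Protein:
    """Hypothetical stand-in for the Level 3 Protein class: a pool of the
    protein in one particular state."""
    display_name: str
    # Corresponds to the Level 3 entityReference property.
    entity_reference: ProteinReference
    # Simplification: features are plain strings here, not EntityFeature objects.
    features: List[str] = field(default_factory=list)
    cellular_location: Optional[str] = None


def forms_of(reference: ProteinReference, pool: List[Protein]) -> List[Protein]:
    """Return every Protein state in `pool` that shares the same `reference` object."""
    return [p for p in pool if p.entity_reference is reference]


# Two states of one (hypothetical) protein share a single reference, so the
# sequence and other common data are stored only once.
ref_x = ProteinReference("Protein X")
x_inactive = Protein("Protein X", ref_x, cellular_location="cytoplasm")
x_active = Protein("phospho-Protein X", ref_x,
                   features=["phosphorylation"], cellular_location="nucleus")

# A generic entity: the Wnt family as a reference with member references.
wnt_family = ProteinReference(
    "Wnt",
    member_entity_reference=[ProteinReference("WNT1"), ProteinReference("WNT3A")],
)

print([p.display_name for p in forms_of(ref_x, [x_inactive, x_active])])
# prints ['Protein X', 'phospho-Protein X']
print(wnt_family.is_generic())
# prints True
```

Sharing one reference object between both states is what lets forms_of recover every form of the protein; this is the in-memory analogue of linking all Protein instances through a shared ProteinReference.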
Generic features, such as binding sites or post-translational modifications across molecules, are also supported using the EntityFeature class and its memberFeature property.

Support for gene regulation networks

Gene regulation networks, involving regulators of gene expression (e.g. transcription factors, microRNAs) and their targets, can now be represented. The new TemplateReaction class captures the polymerization of macromolecules from a DNA or RNA template. It stores the template, the product and the regulatory region common to all types of template reaction being described (e.g. the promoter for transcription, the 3’UTR for translation). A new control class, TemplateReactionRegulation, represents control of a TemplateReaction by an expression regulator physical entity (e.g. a transcription factor).

Support for genetic interactions

Genetic interactions, such as epistasis or synthetic lethality, are important for mapping pathways in organisms like yeast, worm, fly and mouse, and this information is increasingly available in pathway and interaction databases. To capture these interactions, there is now a GeneticInteraction class, which contains a set of genes and a phenotype (expressed using PATO or another phenotype controlled vocabulary). Controlled vocabulary terms supporting genetic interactions have also been added to the PSI-MI controlled vocabulary, and the Gene class has been added to support genetic interactions.

Support for degradation

Degradation of physical entities, such as proteins, is important in many regulatory pathways. A new Degradation class, a subclass of Conversion, has been added to capture this event. The left side of the interaction contains the degradation substrate and the right side is empty, signifying that the degradation products are not tracked within BioPAX and return to an unspecified molecule pool in the cell.

Major changes from BioPAX Level 2
Warning! The semantics of the physicalEntity classes have changed, but their names have not (except for the first letter, which is now upper case). For example, Protein now refers to a protein (as a pool of molecules) in a particular state, whereas it used to refer to the base definition of the protein, as would be found in a protein sequence database. This base definition is now represented by EntityReference, a subclass of UtilityClass.

PathwayStep instances are now referenced from a separate property of Pathway, which makes pathways easier to create (you only need to create PathwayStep instances if you want to order parts of the pathway). There is also a new BiochemicalPathwayStep class, a subclass of PathwayStep, which makes it easier to order biochemical processes.

The Level 2 physicalInteraction class, which stores molecular interactions from e.g. proteomics experiments, has been moved to be a child of the Interaction class and renamed MolecularInteraction. This recognizes that it is a different type of interaction from control and conversion, which were previously children of physicalInteraction.

Each kind of controlled vocabulary reference now has its own class; e.g. BioSource references TissueVocabulary. This makes external controlled vocabularies easier to use. The openControlledVocabulary class has also been renamed ControlledVocabulary. The confidence class has been renamed Score to make it more general and suitable for describing genetic interactions.

Cardinality restrictions that documented required and optional properties are now specified. Documentation has been added that states which functional properties are required vs. optional.

By popular demand, all class names have been changed to standard CamelCase and all property names to mixedCase.

Pathway Representation Abstractions Supported in BioPAX Level 3

Different pathway representation abstractions are in common use for different types of pathway information.
Each abstraction is tailored to make representation of a specific type of pathway data easier. BioPAX Level 3 supports multiple representation abstractions, and understanding each abstraction and the classes it uses is the best way to understand how to use BioPAX.

Metabolic pathways

Metabolic pathways mostly involve biochemical reactions in which protein enzymes convert small-molecule reactants to small-molecule products. While there are many exceptions to this general statement, it covers the majority of metabolic pathway data in databases. BioPAX Level 1 introduced support for this pathway data type.

Molecular interactions

Molecular interactions, typically present in proteomics and functional genomics databases, mainly involve pairwise (e.g. from yeast two-hybrid) and set (e.g. from affinity purifications) interactions between proteins (protein-protein interactions), DNA (protein-DNA interactions) and, sometimes, other molecules. Description of experimental details, such as the experiment type, is important for this pathway data type. Molecular interactions are typically known at a low level of detail, i.e. the interacting molecules are known, but often not the binding sites or other details. BioPAX Level 2 introduced support for this pathway data type, adapted from PSI-MI (ref).

Signaling Pathways

Signaling pathways mostly involve cascades of chemical modifications of proteins and other molecules that transfer information across the cell. An important difference between these pathways and metabolic or proteomics data is the central role of molecular states, such as protein post-translational modifications, and of generic entities, such as the class of Wnt genes. Improved support for this pathway data type was introduced in BioPAX Level 3.

Gene Regulatory Networks

Gene regulatory networks are composed of regulator-target relationships involved in the regulation of gene expression, such as relationships between transcription factors and the genes they regulate.
Support for this pathway data type was introduced in BioPAX Level 3.

Genetic Interactions

A genetic interaction takes place when the action of one gene is modified by one or more genes that assort independently. Genetic interactions are used extensively to map pathways in biology. Support for this pathway data type was introduced in BioPAX Level 3.

Key definitions

BioPAX ontology: The abstract representation of biological pathway concepts and their relationships, developed by the BioPAX workgroup. This is also called the object model.

BioPAX format: The file format implementation of the BioPAX ontology, which defines the syntax used to represent data. The BioPAX format is currently implemented only in OWL, but other implementations, such as XML Schema, may be developed in the future.

OWL: Web Ontology Language. OWL is an XML-based language defined by the World Wide Web Consortium (see http://www.w3.org/TR/owl-guide/). OWL can be used both to define an ontology and to store instance data that adheres to that ontology. The BioPAX ontology is intended to be used to validate that a set of instances provided by a user follows all BioPAX-defined syntax and semantic rules. It is recommended that the BioPAX ontology be imported from its location on the biopax.org website, although it may also be defined directly within an instance data document.

BioPAX workgroup: The community group designing the BioPAX ontology and format.

Status of this document

This document is the final BioPAX Level 3 documentation. Comments may be sent to http://groups.google.com/group/biopax-discuss, which is a new mailing list as of 2010. Archives of the old mailing list are available at http://www.biopax.org/mailman/private/biopax-discuss/ (N.B. a subscription to the old list is still required to see these archives). Some topics are also discussed on the BioPAX wiki at http://biopaxwiki.org. This document and the BioPAX Level 3 OWL file will be updated over time, based on community input.
The documentation for the latest version of BioPAX Level 3 can always be found at: http://www.biopax.org/release/biopax-level3-documentation.pdf

BioPAX Namespace

The following URI is defined to be the BioPAX Level 3 namespace:

http://www.biopax.org/release/biopax-level3.owl#

This namespace (URI) will always be used to refer to the most recently released version of BioPAX Level 3; different URIs will be used for other major versions (Levels) of BioPAX. In OWL documents this namespace is usually bound to the prefix bp.

Document Conventions

In general, BioPAX property names start with a lowercase character in the OWL file, and in this document a property is italicized when it appears as a single name in the text. In the property lists, object properties are italicized and data type properties are not. In general, class names start with an uppercase character in an OWL file (this is now always true for BioPAX Level 3 classes), and they are highlighted in bold in the text of this document. Throughout this document, the structure "object:ClassName" refers to the object properties of a class.

Object property diagrams display classes in boxes, and the blue arcs between classes are labeled with the object property term, for example:

[Figure: example object property diagram]

Class diagrams show each class with all its properties in a box, linked by OWL properties and object properties to other classes, for example:

[Figure: example class diagram]
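As an illustration of how the namespace and naming conventions look in an instance file, the following minimal sketch uses only Python's standard xml.etree.ElementTree module to write a small RDF/XML document. It binds the prefix bp to the Level 3 namespace, imports the Level 3 OWL file from biopax.org, and declares a ProteinReference, a Protein pointing to it, and a Degradation whose left side holds the Protein and whose right side is left empty. The property names displayName, entityReference and left come from the Level 3 OWL file rather than from this section; the base URI http://example.org/, the resource names (Protein_1 and so on) and the helper functions q and build_document are hypothetical.

```python
import xml.etree.ElementTree as ET

# The BioPAX Level 3 namespace, bound to the conventional prefix "bp".
BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
XSD = "http://www.w3.org/2001/XMLSchema#"

ET.register_namespace("bp", BP)
ET.register_namespace("rdf", RDF)
ET.register_namespace("owl", OWL)


def q(ns: str, local: str) -> str:
    """Return an ElementTree qualified name of the form '{namespace}local'."""
    return "{%s}%s" % (ns, local)


def build_document(base: str) -> str:
    """Build a tiny BioPAX Level 3 instance document; `base` is a hypothetical URI prefix."""
    root = ET.Element(q(RDF, "RDF"))

    # Ontology header importing the Level 3 OWL file from biopax.org.
    onto = ET.SubElement(root, q(OWL, "Ontology"), {q(RDF, "about"): ""})
    ET.SubElement(onto, q(OWL, "imports"),
                  {q(RDF, "resource"): "http://www.biopax.org/release/biopax-level3.owl"})

    # Classes are CamelCase (ProteinReference); properties are mixedCase (displayName).
    ref_uri = base + "ProteinReference_1"
    ref = ET.SubElement(root, q(BP, "ProteinReference"), {q(RDF, "about"): ref_uri})
    name = ET.SubElement(ref, q(BP, "displayName"), {q(RDF, "datatype"): XSD + "string"})
    name.text = "Protein X"

    # The Protein (one state of the molecule) points at its shared reference.
    prot_uri = base + "Protein_1"
    prot = ET.SubElement(root, q(BP, "Protein"), {q(RDF, "about"): prot_uri})
    ET.SubElement(prot, q(BP, "entityReference"), {q(RDF, "resource"): ref_uri})

    # Degradation: the substrate is on the left; no right side is written.
    deg = ET.SubElement(root, q(BP, "Degradation"), {q(RDF, "about"): base + "Degradation_1"})
    ET.SubElement(deg, q(BP, "left"), {q(RDF, "resource"): prot_uri})

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_document("http://example.org/"))
```

Registering the bp prefix before serializing is what makes ElementTree emit bp:Protein rather than an auto-generated prefix such as ns0:Protein; the output is namespace-equivalent either way, but the bp form matches the usual convention.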
