Introduction to the Gene Ontology
Nic Weber, LIS 590 Ontology Development in Natural Sciences, 9/24/2010
All works are referenced at first use; all images are CC except where noted.

Gene Ontology: Why
“The main opportunity lies in the possibility of automated transfer of biological annotations from the experimentally tractable model organisms to the less tractable organisms based on gene and protein sequence similarity.” Ashburner et al. p. 25
* Breakthroughs in sequencing show that a large fraction of the genes specifying core biological functions are shared by all eukaryotes (commonalities at the cellular level)
* Knowledge of the role of a shared protein in one organism can often be transferred to other organisms (less duplication of work, money saved)
* Sequencing takes place at large scale and new discoveries are constant (need for documenting change in a controlled way)
* Traditional indexing efforts proved “unwieldy” in fruit fly and mouse sequencing
Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics, 25(1), 25-9. doi: 10.1038/75556.

Gene Ontology Goals
* Produce a dynamic, controlled vocabulary that can be applied to eukaryotes.
* Provide formal structure to document and adopt change.
* Facilitate the annotation of genes and gene products, and the dissemination of those annotations.
Because of problems with hierarchical models (EC), indexing, and biological terminology like “functions”, three ontologies were developed:
1. Biological Process
2. Molecular Function
3. Cellular Component

Biological Process
The biological objective to which the gene or gene product contributes. A process is accomplished via one or more ordered assemblies of molecular functions.
* (This is an ordered process in that something goes in and something different comes out.)

Molecular Function
The biochemical activity (including binding) of a gene product. Also applies to the capability that a gene product carries as a potential. Describes only what is done, not when or where.

Cellular Component
The place in the cell where a gene product is active. These terms reflect our understanding of eukaryotic cell structure (e.g. ‘ribosome’ or ‘nuclear membrane’).

Dependent vs. Independent Entities
1. Biological Process: Dependent (“occurrents that require support from some substance in order to allow them to occur.” Smith et al. p. 4)
2. Molecular Function: Dependent (“which means entities which have a necessary reference to the substances in which they inhere.” ibid.)
3. Cellular Component: Independent

GO “Terms”
Each “Ontology” defines terms representing gene product properties. Each GO term within the ontology contains the following:
1. a unique alphanumeric identifier
2. a term name (which may be a word or string of words)
3. a definition with cited sources
4. a namespace indicating the domain to which it belongs
Terms may also have:
* synonyms, which are classed as being exactly equivalent to the term name, broader, narrower, or related
* references to equivalent concepts in other databases
* comments on term meaning or usage

Example GO Term
[Term]
id: GO:0000010
name: trans-hexaprenyltranstransferase activity
namespace: molecular_function
def: "Catalysis of the reaction: all-trans-hexaprenyl diphosphate + isopentenyl diphosphate = diphosphate + all-trans-heptaprenyl diphosphate." [EC:2.5.1.30]
subset: gosubset_prok
synonym: "all-trans-heptaprenyl-diphosphate synthase activity" EXACT [EC:2.5.1.30]
synonym: "all-trans-hexaprenyl-diphosphate:isopentenyl-diphosphate hexaprenyltranstransferase activity" EXACT [EC:2.5.1.30]
synonym: "heptaprenyl diphosphate synthase activity" EXACT [EC:2.5.1.30]
synonym: "heptaprenyl pyrophosphate synthase activity" EXACT [EC:2.5.1.30]
synonym: "heptaprenyl pyrophosphate synthetase activity" EXACT [EC:2.5.1.30]
xref: EC:2.5.1.30
xref: MetaCyc:TRANS-HEXAPRENYLTRANSTRANSFERASE-RXN
is_a: GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups

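A stanza like the one above is simple enough to read with a few lines of code. The following is a minimal sketch, not a full OBO parser: the function name parse_obo_stanza is made up for illustration, it assumes one "tag: value" pair per line, and it treats " ! " as the start of a trailing comment (which would misfire on a quoted definition containing that sequence).

```python
from collections import defaultdict


def parse_obo_stanza(text):
    """Parse one [Term] stanza into a dict mapping each tag to a list of values.

    Hypothetical helper for illustration; real OBO files need a proper parser.
    """
    term = defaultdict(list)
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(": ")
        # Drop the trailing "! human-readable name" comment, if present.
        value = value.split(" ! ")[0].strip()
        term[tag].append(value)
    return dict(term)


# A shortened copy of the stanza from the slide.
STANZA = """
[Term]
id: GO:0000010
name: trans-hexaprenyltranstransferase activity
namespace: molecular_function
xref: EC:2.5.1.30
xref: MetaCyc:TRANS-HEXAPRENYLTRANSTRANSFERASE-RXN
is_a: GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups
"""

term = parse_obo_stanza(STANZA)
print(term["id"][0], "is_a", term["is_a"][0])
```

Every tag maps to a list because tags such as synonym and xref can repeat within one term.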
How Do GO Terms Work?
GO terms are the nodes of a network: the connections between each term and its parents and children are known, and together they form what is technically described as a directed acyclic graph (DAG). In a GO DAG, terms are nodes and the relationships among them are edges.

What the F*@% is a Directed Acyclic Graph?
Directed graph: a set V whose elements are called nodes or vertices, and a set E of directed arcs or edges connecting them, so that G = (V, E).
Directed acyclic graph: a directed graph with no directed cycles.
* Formed by a collection of vertices and directed edges
* Each edge connects one vertex to another, such that there is no way to start at some vertex A and follow a sequence of edges that eventually loops back to A
* Important note: DAGs are distinct from hierarchies, in that each term in a DAG may have more than one parent term; in GO these terms are generally connected by ‘is_a’ and ‘part_of’ relations.
Images via: commons.wikimedia.org

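The two properties on this slide (no directed cycles, more than one parent allowed) can be checked mechanically. Below is a small sketch using a depth-first search; the graph is a toy example with made-up node names, stored as a dict from each node to the list of its parents, and is_acyclic is a hypothetical helper name rather than anything GO provides.

```python
def is_acyclic(parents):
    """Return True if the directed graph {child: [parents]} has no directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}

    def visit(node):
        state[node] = GREY
        for parent in parents.get(node, []):
            seen = state.get(parent, WHITE)
            if seen == GREY:
                return False  # we walked back onto our own path: a cycle
            if seen == WHITE and not visit(parent):
                return False
        state[node] = BLACK
        return True

    return all(visit(n) for n in list(parents) if state.get(n, WHITE) == WHITE)


# Toy graph (names are placeholders): D has two parents, so this is a DAG
# but not a strict hierarchy.
toy_dag = {"D": ["B", "C"], "B": ["A"], "C": ["A"], "A": []}
toy_cycle = {"A": ["B"], "B": ["A"]}

multi_parent = [n for n, ps in toy_dag.items() if len(ps) > 1]
print(is_acyclic(toy_dag), is_acyclic(toy_cycle), multi_parent)
```

The recursive search is fine for a toy graph; a graph the size of GO would call for an iterative traversal to avoid hitting Python's recursion limit.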
GO Directed Acyclic Graph Image via: commons.wikimedia.org
“Relationships”  Each term has a defined “relationship” to another term in the same ontology or a related ontology (in GO.) is_a: GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups
Relationship types: is_a … part_of
Originally there were only two relationship types: is_a = subsumption; part_of = partonomic inclusion.
New types: in the last year, regulates, positively_regulates, and negatively_regulates have been added to distinguish gene products that play a regulatory role from those that play a direct role in a biological process.

Problems… is_a
Meant to facilitate “instance of”. In practice it is often used to model “is a kind of” relationships between universals. The is_a relation in its intended meaning indicates a necessary relationship. That is, when we say “eukaryotic cell is_a cell”, we mean that every eukaryotic cell is a cell. In practice, there are cases of non-necessary subsumption (e.g. transport, or cell growth).

Problems… part_of
Explained usage = “can be a part of, not is always a part of”. In GO, part_of is used transitively (e.g. where A part_of B and B part_of C, then also A part_of C). It cannot adequately represent an occurrent, meaning the notion of time is not accurately represented in these relations.

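Transitive use of part_of means that a reasoner may add every pair implied by chaining direct links. The sketch below shows that inference on the A, B, C example from the slide; transitive_closure is an illustrative name, and the naive fixed-point loop is chosen for readability rather than speed.

```python
def transitive_closure(edges):
    """Given direct (part, whole) pairs, return all pairs implied by transitivity."""
    closure = set(edges)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# A part_of B and B part_of C, as on the slide.
direct = {("A", "B"), ("B", "C")}
inferred = transitive_closure(direct) - direct
print(inferred)
```

Note that nothing in the pairs records when A is part of B, which is the time problem the slide raises.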
Part – Whole … has_part
Also introduced: has_part
“…In GO, the relationship A has_part B means that A necessarily (always) has B as a part; i.e., if A exists then B also exists as a part of A. If A does not exist, B may or may not exist. Example ‘cell envelope’ has_part ‘plasma membrane’”
From: Gene Ontology Consortium (2010). The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Research, 38(Database issue), D331-5. doi: 10.1093/nar/gkp1018.

has_part modeled
Annotations (applied terms)
Annotations capture data about a gene or gene product; GO provides the terms to do so. These annotations allow genomic information to be uploaded and shared. When a gene is annotated to a term, associations between the gene and that term’s parents are implicitly inferred. Annotations are either generated by a curator or automatically through predictive methods (Rhee et al. p. 509).

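The implicit inference described here amounts to walking up the graph from each annotated term and collecting every ancestor. A minimal sketch follows; the only real edge is GO:0000010 is_a GO:0016765 from the earlier example term, while "TOY:root" and "geneX" are placeholders invented for illustration, as are the function names.

```python
def ancestors(term, parents):
    """Collect every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations, parents):
    """Map each gene to its directly annotated terms plus all of their ancestors."""
    result = {}
    for gene, terms in annotations.items():
        implied = set(terms)
        for t in terms:
            implied |= ancestors(t, parents)
        result[gene] = implied
    return result


# child -> parents; "TOY:root" is a placeholder, not a real GO identifier.
parents = {"GO:0000010": ["GO:0016765"], "GO:0016765": ["TOY:root"]}
annotations = {"geneX": ["GO:0000010"]}  # hypothetical gene

implied = propagate(annotations, parents)
print(sorted(implied["geneX"]))
```

This sketch follows every parent link regardless of relationship type; whether an annotation should propagate across a given relation (is_a, part_of, regulates) is a separate question that it does not address.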
Annotation Structure
GO annotations have the following data:
* Gene product identifier
* Relevant GO term
* Reference for the annotation (e.g. a journal article)
* Evidence code denoting the type of evidence upon which the annotation is based
* Date of annotation
* Creator of annotation

Evidence Codes
Evidence codes are of four types:
* Experimental
* Computational
* Indirectly derived from experimental or computational evidence
* Unknown
95% of annotations are computational. This is problematic: computational annotations increase coverage but are also more likely to be false positives.

Annotation Qualifiers
* colocalizes_with
* contributes_to
* NOT (most vital): indicates a lack of properties.

Annotation in EMBL-EBI
http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0006915#term=info (In case the link fails, this is a quick view from GO)
Gene product: Actin, alpha cardiac muscle 1, UniProtKB:P68032
GO term: heart contraction; GO:0060047 (biological process)
Evidence code: Inferred from Mutant Phenotype (IMP)
Reference: PMID:17611253
Assigned by: UniProtKB, June 06, 2008

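The fields of this annotation map naturally onto a small record type. The sketch below is one possible in-memory layout, not the format any GO tool uses: the class name, field names, and ISO date string are illustrative choices, and EXPERIMENTAL_CODES lists the experimental evidence codes as I understand them from the GO documentation, so check it against the current list before relying on it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """One GO annotation; field names are illustrative, not a standard schema."""
    gene_product: str
    go_term: str
    evidence_code: str
    reference: str
    assigned_by: str
    date: str
    qualifier: Optional[str] = None  # e.g. "NOT", "contributes_to", "colocalizes_with"


# Assumed set of experimental evidence codes; verify against the GO documentation.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def is_experimental(annotation):
    """True if the annotation rests on experimental rather than computational evidence."""
    return annotation.evidence_code in EXPERIMENTAL_CODES


# Values copied from the slide above.
example = Annotation(
    gene_product="UniProtKB:P68032",
    go_term="GO:0060047",
    evidence_code="IMP",
    reference="PMID:17611253",
    assigned_by="UniProtKB",
    date="2008-06-06",
)
print(is_experimental(example))
```

A filter like is_experimental is one way to set the computational annotations aside when false positives are a concern.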
Universals and Particulars
Universal: species E. coli; function: boost insulin
Particulars: the E. coli in this petri dish; function: boost insulin in subject X’s pancreas
“GO terms correspond, in philosophical terminology, to universals…and each universal corresponding to the term Cell is instantiated by every actual cell.” Smith et al. p. 3

Continuants vs. Occurrents
Continuants: entities that continue to exist through time (cells, organisms, chromosomes). They preserve their identity while undergoing a variety of changes.
Occurrents (events, processes): unfold through time.

But…
“Biological process, molecular function and cellular components are all attributes of genes, gene products or gene-product groups.” Ashburner et al. p. 27
Do we usually model attributes as ontologies? Are genes, gene products, or gene-product groups “backbone” ontologies, or super classes? If these aren’t top-level ontologies, what are they?

Smith et al.; Yu’s “other” example
* Recall Yu’s Fourth Definition of Ontologies
“The Gene Ontology, in spite of its name, is not an ontology as the latter term is commonly used either by information scientists or by philosophers. It is, as the GO Consortium puts it, a ‘controlled vocabulary’…. their efforts have been directed toward providing a practically useful framework for keeping track of the biological annotations that are applied to gene products.” Smith et al. p. 1

Problems and Potential Solutions Each new term requires understanding of the whole. Therefore curators must be subject experts in order to perform meaningful enhancement.   Solution: make explicit the criteria used for discriminating subclassifications by introducing a decision-tree methodology into the construction of each hierarchy. ( Is this a good solution?)
Drawbacks to GO
* It is unclear what kinds of reasoning are permissible on the basis of GO’s hierarchies.
* The rationale of GO’s subclassifications is unclear. The reasoning that went into current choices has not been preserved and thus cannot be explained to or re-examined by a third party.
* No procedures are offered by which GO can be validated.
* There are insufficient rules for determining how to recognize whether a given concept is or is not present in GO. The use of a mere string search presupposes that all concepts already have a single standardized representation, which is not the case.
Smith et al. p. 6
