Ora Lassila and Amit Sheth, "Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Interoperability", Invited Talk at ONC-HHS Invitational Workshop on Next Generation Interoperability for Health, Washington DC, January 19-20, 2011.
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Interoperability
1.
2. Ora Lassila
• Principal Architect (Nokia Mobile Solutions); also an advisor to Nokia’s top mgmt
• Elected member of W3C’s Advisory Board since 1998
• Earlier: Research Fellow (Nokia Research), W3C Fellow (MIT), Project Manager (CMU), entrepreneur, etc.
• Ph.D. from Helsinki University of Technology (CS)
• http://www.lassila.org/
Amit Sheth
• LexisNexis Ohio Eminent Scholar, Director, Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University
• Educator, researcher, entrepreneur – 2 companies, products, deployed apps, W3C and biomedical community standards
• Earlier: UGA, Telcordia, Unisys, Honeywell
• http://knoesis.org/amit
3. • Semantic Web (Ora)
o some background
o Semantic Web in use
• Examples of applications, from traditional clinical care to translational medicine (Amit)
• Challenges (and promise)
o what makes this difficult (Ora, technical)
o why do we want to pursue it anyway (Amit, health)
4.
6. • Often characterized as the “next
generation of the World Wide Web”
• Web content amenable to automation
• (current content intended for humans…)
• In reality, the Semantic Web is a vision of the
future of (personal) computing
• machines working on behalf of their human users
• more autonomy, handling of unanticipated situations
• Heavy reliance on knowledge representation &
reasoning
• also multi-agent systems, other AI-based technologies
7. • At the core, the Semantic Web is about
• describing things (objects, concepts, services, …)
• querying the descriptions
• reasoning about the descriptions
• As such, it is knowledge representation
• for the Web
• (or KR using standardized Web technologies)
• (in comparison, the “old Web” was really about
documents and finding them…)
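The three activities on this slide (describing, querying, reasoning) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `ex:` identifiers and the facts are hypothetical placeholders, and a real deployment would use RDF, SPARQL and an OWL or RDFS reasoner rather than hand-written loops.

```python
# Descriptions as (subject, predicate, object) triples.
# All identifiers and facts here are hypothetical placeholders.
triples = {
    ("ex:aspirin", "rdf:type", "ex:Analgesic"),
    ("ex:Analgesic", "rdfs:subClassOf", "ex:Drug"),
    ("ex:aspirin", "ex:treats", "ex:headache"),
}


def query(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def infer_types(store):
    """Reasoning step: if x is a C and C is a subclass of D, then x is a D."""
    inferred = set(store)
    changed = True
    while changed:
        changed = False
        for (x, p1, c) in list(inferred):
            if p1 != "rdf:type":
                continue
            for (c2, p2, d) in list(inferred):
                if p2 == "rdfs:subClassOf" and c2 == c:
                    new = (x, "rdf:type", d)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


# No triple states explicitly that aspirin is a Drug ...
explicit = query(triples, p="rdf:type", o="ex:Drug")
# ... but the same query over the inferred closure finds it.
derived = query(infer_types(triples), p="rdf:type", o="ex:Drug")
```

The point of the sketch is that the query function never changes; reasoning simply adds statements that were implicit in the descriptions.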
8. • Motivated by the need for automation
• automation requires interoperability (via standards)
• heavy process, high up-front investment
• (alternative: hand-crafted but “brittle” programs…)
• Interoperability achieved by exposing meaning
• accessible semantics
• note: interoperability of any two systems can be
achieved via engineering, but this does not scale
• Automation → autonomy
• prevailing paradigm: agent-based systems
• implies reasoning, planning, interoperable
representations of knowledge
9. • Contrary to “Web 2.0”, Semantic Web aims at
achieving many things “ad hoc”
• e.g., ad hoc mash-ups by non-computer savvy people
• Shared (and accessible) semantics is the key to
interoperability
• Semantic Web introduces a fundamentally
different approach to standardization
• standardize how to say things and not what to say
• ontological techniques allow “delayed semantic
commitment”
10. • Semantic Web is built in a layered manner
• Not everybody needs all the layers
[Figure: the Semantic Web layer stack, listed from top to bottom]
…
Queries: SPARQL, Rules: RIF
Rich ontologies: OWL
Simple data models & taxonomies: RDF Schema
Uniform metamodel: RDF + URI
Encoding structure: XML
Encoding characters: Unicode
11. • Achieve for data what Web did to documents
• Relationship with the original Semantic Web
vision: no AI, no agents, no autonomy
• Interoperability is still very important
• interoperability of formats
• interoperability of semantics
• Enables interchange of large data sets
• (thus very useful in, say, collaborative research)
• Semantic Web vision is largely predicated on
the availability of data
• Linked Data is a movement that gets us there
12. [Figure: evolution of the Web, listed here from the earliest to the latest layer; side labels give the era (Web 1.0, Web 2.0, Web 3.0; 1997 to 2007) and the growing level of semantics (Keywords, Patterns, Objects, Situations, Events)]
• Web of pages: text, manually created links, extensive navigation (Web 1.0)
• Web of databases: dynamically generated pages, web query interfaces
• Web of resources: data, service, data, mashups (Web 2.0)
• Web of people: social networks, user-created casual content
• Web of Sensors, Devices/IoT: 40 billion sensors, 5 billion mobile connections (Web 3.0)
• Tech assimilated in life
13.
14. [Figure: Medical Informatics and Bioinformatics need a connection; Biomedical Informatics spans both]
• Medical Informatics: Etiology, Pathogenesis, Clinical findings, Diagnosis, Prognosis, Treatment
• Bioinformatics: Genome, Transcriptome, Proteome, Metabolome, Physiome, ...ome
• Resources: Genbank, Pubmed, Uniprot, ClinicalTrials.gov
• Hypothesis Validation, Experiment design, Predictions, Personalized medicine
• More advanced capabilities for search, integration, analysis, linking to new insights and discoveries!
15. [Figure: kinds of health and life-science content]
• Scientific Literature: PubMed (300 documents published online each day)
• User-contributed Content (Informal): Experts (GeneRifs, WikiGene); Consumer (Blogs, Social Networks)
• Health Information Services: Elsevier iConsult
• NCBI Public Datasets: Genome, Protein DBs (new sequences daily)
• Laboratory Data: Lab tests, RTPCR, Mass spec
• Clinical Data: Personal health history
Search, browsing, complex query, integration, workflow,
analysis, hypothesis validation, decision support.
16. • W3C Semantic Web Health Care & Life
Sciences Interest Group:
http://www.w3.org/2001/sw/hcls/
• Clinical Observations Interoperability: EMR +
Clinical Trials:
http://esw.w3.org/HCLS/ClinicalObservationsInteroperability
• National Center for Biomedical Ontology:
http://bioportal.bioontology.org/
17. • Status: In use continuously since 01/2006
• Where: Athens Heart Center & its partners and
labs
• What: Use of semantic Web technologies for
clinical decision support
18. Examples demonstrating use of Semantic Web for Health Care
and Life Sciences research projects and operational clinical or
research applications
22. • Status: Completed research
• Where: NIH
• What: queries across integrated data sources
• Enriching data with ontologies for integration, querying,
and automation
• Ontologies beyond vocabularies: the power of
relationships
23. [Figure: linked resources connecting Glycosyltransferase to Congenital muscular dystrophy; node labels: Gene name, gene, Interactions, Sequence, GO, PubMed, OMIM]
Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
http://knoesis.org/library/resource.php?id=00014
25. SELECT DISTINCT ?t ?g ?d {
  ?t is_a GO:0016757 .
  ?g has_molecular_function ?t .
  ?g has_associated_phenotype ?b2 .
  ?b2 has_textual_description ?d .
  FILTER regex(?d, "muscular dystrophy", "i") .
  FILTER regex(?d, "congenital", "i") }
[Figure: the graph path matched by the query]
• GO:0016757 (glycosyltransferase) is linked by is_a to GO:0008194 and GO:0016758, and through them to GO:0008375 (acetylglucosaminyltransferase)
• LARGE (EG:9215) has_molecular_function GO:0008375 (acetylglucosaminyltransferase)
• LARGE (EG:9215) has_associated_phenotype MIM:608840 (Muscular dystrophy, congenital, type 1D)
From medinfo paper.
Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
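To make the query on this slide concrete, here is a rough Python sketch that evaluates the same pattern over the handful of facts visible in the slide's diagram. It is not the authors' implementation. The exact is_a edges are an assumption read off the diagram, and the sketch treats is_a as transitive so that `?t` can reach GO:0008375, which the literal query pattern alone would not do.

```python
import re

# Facts as read from the slide's diagram; the exact is_a edges are an assumption.
IS_A = {
    "GO:0008194": {"GO:0016757"},
    "GO:0016758": {"GO:0016757"},
    "GO:0008375": {"GO:0008194", "GO:0016758"},
}
HAS_MOLECULAR_FUNCTION = {"EG:9215": {"GO:0008375"}}
HAS_ASSOCIATED_PHENOTYPE = {"EG:9215": {"MIM:608840"}}
HAS_TEXTUAL_DESCRIPTION = {"MIM:608840": "Muscular dystrophy, congenital, type 1D"}


def descendants(root):
    """All terms that reach root by following is_a edges (transitively)."""
    found = set()
    frontier = [root]
    while frontier:
        parent = frontier.pop()
        for child, parents in IS_A.items():
            if parent in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def run_query(root="GO:0016757"):
    """Return (?t, ?g, ?d) rows, mirroring the SELECT on the slide."""
    terms = descendants(root)
    rows = set()
    for gene, functions in HAS_MOLECULAR_FUNCTION.items():
        for term in functions & terms:
            for phenotype in HAS_ASSOCIATED_PHENOTYPE.get(gene, ()):
                text = HAS_TEXTUAL_DESCRIPTION.get(phenotype, "")
                # The two FILTER clauses: case-insensitive regex matches.
                if (re.search("muscular dystrophy", text, re.I)
                        and re.search("congenital", text, re.I)):
                    rows.add((term, gene, text))
    return sorted(rows)


rows = run_query()
```

With only these facts loaded, the single row returned links the glycosyltransferase subtree to the gene LARGE (EG:9215) and its congenital muscular dystrophy phenotype.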
26. • Status: Completed research
• Where: NIH
• What: Understanding the genetic basis of
nicotine dependence. Integrate gene and
pathway information and show how three complex
biological queries can be answered by the
integrated knowledge base.
• How: Semantic Web technologies (especially RDF,
OWL, and SPARQL) support information
integration and make it easy to create semantic
mashups (semantically integrated resources).
27. • NIDA study on nicotine dependency
• List of candidate genes in humans
• Analysis objectives include:
o Find interactions between genes
o Identification of active genes – maximum number of
pathways
o Identification of genes based on anatomical locations
• Requires integration of genome and biological
pathway information
28. Genome and pathway information integration
[Figure: pathway sources KEGG, Reactome and HumanCyc (each contributing pathway, protein, pmid) linked to Entrez Gene (GO ID, HomoloGene ID), which links to GeneOntology and HomoloGene]
http://knoesis.org/library/resource.php?id=00221
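A minimal sketch of the mashup idea, assuming each source has already been converted to triples that share identifiers. Every gene, protein and pathway identifier below is a made-up placeholder, and the predicate names are hypothetical; none of this is real KEGG, Reactome, HumanCyc or Entrez Gene data. It shows why merging is cheap once identifiers are shared, and how one analysis objective (genes taking part in the largest number of pathways) becomes a simple traversal.

```python
from collections import defaultdict

# Each source is a list of (subject, predicate, object) triples.
# All identifiers are made-up placeholders, not real records.
gene_source = [
    ("gene:G1", "has_product", "protein:P1"),
    ("gene:G2", "has_product", "protein:P2"),
]
pathway_source_a = [
    ("pathway:A1", "has_participant", "protein:P1"),
    ("pathway:A2", "has_participant", "protein:P1"),
]
pathway_source_b = [
    ("pathway:B1", "has_participant", "protein:P1"),
    ("pathway:B1", "has_participant", "protein:P2"),
]


def merge(*sources):
    """Merging triple sets is a set union; shared identifiers do the joining."""
    merged = set()
    for source in sources:
        merged.update(source)
    return merged


def pathways_per_gene(graph):
    """Map each gene to the pathways its protein products participate in."""
    genes_of = defaultdict(set)  # protein -> genes that produce it
    for s, p, o in graph:
        if p == "has_product":
            genes_of[o].add(s)
    result = defaultdict(set)
    for s, p, o in graph:
        if p == "has_participant":
            for gene in genes_of.get(o, ()):
                result[gene].add(s)
    return dict(result)


graph = merge(gene_source, pathway_source_a, pathway_source_b)
counts = {g: len(ps) for g, ps in pathways_per_gene(graph).items()}
most_connected = max(counts, key=counts.get)
```

In this toy data the gene with the most pathways is `gene:G1`; with real sources the hard part is agreeing on identifiers, which is what the ontology-driven integration addresses.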
32. • Status: Research prototype – in regular lab use
• Where: Center for Tropical and Emerging
Global Diseases (CTEGD), UGA
• What: Semantics and Services Enabled
Problem Solving Environment for Trypanosoma
cruzi
• Who: Kno.e.sis, UGA, NCBO
33. Ohio Center of Excellence in Knowledge-enabled Computing
(Kno.e.sis), Wright State University
Tarleton Research Group, Center for Tropical and Emerging
Global Diseases (CTEGD), University of Georgia
Large Scale Distributed Information Systems (LSDIS),
University of Georgia
National Center for Biomedical Ontology (NCBO),
Stanford University
The Wellcome Trust Sanger Institute, Cambridge, UK
The Oswaldo Cruz Institute (Fiocruz), Brazil
34. • T. cruzi is a protozoan parasite that causes Chagas disease, or American trypanosomiasis
• Chagas disease is the leading cause of death in Latin America, where around 18 million people are infected with this parasite
• Related parasites include Trypanosoma brucei and Leishmania major, which cause African trypanosomiasis and leishmaniasis, respectively
[Image: T. brucei surrounded by red blood cells in a smear of infected blood. Copyright: Jürgen Berger and Dr. Peter Overath, Max Planck Institute for Developmental Biology, Tübingen]
35. Trykipedia - a Wiki-based platform for collaboration of Parasite Research Community
36. • Data Resources
o Internal lab data (from Tarleton Research Group): Gene Knockout, Strain Creation, Microarray, and Proteome
o External databases (TriTrypDB, ProtozoaDB, Drug Bank, etc.)
• Ontologies
o Parasite Lifecycle Ontology (PLO)
o Parasite Experiment Ontology (PEO)
• PKR supports complex biological queries related to T. cruzi drugs, vaccination, or gene knockout targets; for example:
o Find all genes with proteomic expression in mammalian lifecycle stage with GPI anchor or signal peptide predictions.
o Find genes annotated as potential vaccine candidates.
o Find all genes with proteomic expression evidence in the mammalian host lifecycle stages for T. cruzi
37. Gene Knockout and Strain Creation*
[Figure: experiment workflow: Gene Name → Sequence Extraction → 3' & 5' Region → Plasmid Construction (with Drug Resistant Plasmid) → Knockout Construct Plasmid → Transfection (with T. cruzi sample) → Transfected Sample → Drug Selection → Selected Sample → Cell Cloning → Cloned Sample]
Related Queries from Biologists
• List all groups in the lab that used a Target Region Plasmid?
• Which researcher created a new strain of the parasite (with ID = 66)?
• An experiment was not successful – has this experiment been conducted earlier? What were the results?
*T.cruzi Semantic Problem Solving Environment Project, Courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia
38. Complex queries can also include:
- on-the-fly Web services execution to retrieve additional data
- inference rules to make implicit knowledge explicit
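As a small illustration of the second point, the sketch below applies one inference rule (transitivity of a hypothetical `derived_from` predicate) until no new facts appear. The sample and plasmid identifiers are invented for the example, loosely following the knockout workflow; the project's actual rules and vocabulary are not shown in the slides.

```python
# Explicit facts only record each direct derivation step.
# Identifiers and the derived_from predicate are hypothetical.
facts = {
    ("sample:transfected", "derived_from", "plasmid:knockout_construct"),
    ("sample:selected", "derived_from", "sample:transfected"),
    ("sample:cloned", "derived_from", "sample:selected"),
}


def apply_transitivity(facts, predicate):
    """Rule: (a p b) and (b p c) imply (a p c); repeat until a fixed point."""
    closed = set(facts)
    while True:
        new = {
            (a, predicate, d)
            for (a, p1, b) in closed if p1 == predicate
            for (c, p2, d) in closed if p2 == predicate and c == b
        } - closed
        if not new:
            return closed
        closed |= new


closed = apply_transitivity(facts, "derived_from")
# Everything the cloned sample ultimately derives from, direct or indirect.
origins = sorted(o for (s, p, o) in closed if s == "sample:cloned")
```

After the rule has run, a question such as "which plasmid does this cloned sample trace back to" is answered by a plain lookup, because the implicit links have been made explicit.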
39. 1. Describe drug users’ knowledge, attitudes, and
behaviors related to illicit use of OxyContin®
2. Describe temporal patterns of non-medical use of
OxyContin® tablets as discussed on Web-based
forums
3. Collaboration between Kno.e.sis and CITAR (Center
for Interventions, Treatment and Addictions Research)
at Wright State Univ.
40.
41. • Volatile nature of execution environments
• May have an impact on multiple activities/tasks in the workflow
• HF Pathway
• New information about diseases and drugs becomes available
• Affects treatment plans, drug-drug interactions
• Need to incorporate the new knowledge into execution
• capture the constraints and relationships between different tasks/activities
42. • New knowledge about treatment found during the execution of the pathway
• New knowledge about drugs, drug-drug interactions
43.
44. Diabetes mellitus adversely affects the outcomes in patients with myocardial infarction (MI), due in part to the exacerbation of left
ventricular (LV) remodeling. Although angiotensin II type 1 receptor blocker (ARB) has been demonstrated to be effective in the
treatment of heart failure, information about the potential benefits of ARB on advanced LV failure associated with diabetes is lacking.
To induce diabetes, male mice were injected intraperitoneally with streptozotocin (200 mg/kg). At 2 weeks, anterior MI was created by
ligating the left coronary artery. These animals received treatment with olmesartan (0.1 mg/kg/day; n = 50) or vehicle (n = 51) for 4
weeks. Diabetes worsened the survival and exaggerated echocardiographic LV dilatation and dysfunction in MI. Treatment of diabetic
MI mice with olmesartan significantly improved the survival rate (42% versus 27%, P < 0.05) without affecting blood glucose, arterial
blood pressure, or infarct size. It also attenuated LV dysfunction in diabetic MI. Likewise, olmesartan attenuated myocyte hypertrophy,
interstitial fibrosis, and the number of apoptotic cells in the noninfarcted LV from diabetic MI. Post-MI LV remodeling and failure in
diabetes were ameliorated by ARB, providing further evidence that angiotensin II plays a pivotal role in the exacerbated heart failure
after diabetic MI.
[Figure: relationship extracted from the abstract: ARB → possibly plays role in → heart failure]
Angiotensin II type 1 receptor blocker attenuates exacerbated left ventricular remodeling and failure in diabetes-associated myocardial infarction, Matsusaka H, et al.
45. [Figure: ontology-level relationship: Angiotensin Receptor Blocker (ARB) → possibly plays role in → Disease]
Ontology: A Framework for Schema-Driven Relationship Discovery from Unstructured Text, Ramakrishnan, et al., ISWC 2006, LNCS 4273, pp. 583-596
46. • Matching medical requirements with availability of
medical resources (Mumbai, India)
• Project HERO Helpline for Emergency Response Operations
• For patients seeking immediate medical help
• Medical awareness in rural India
• mMitra, info. service during pregnancy and childhood
emergency
[Figure: diagram with the labels Medical Information, Medical Emergency, Resources and bridge]
47.
48. • Any specific problem (typically) has a specific
solution that does not require Semantic Web
technologies
• Q: Why then is the Semantic Web attractive?
A: For future-proofing
Semantic Web can be a solution to
those problems and situations that
we are yet to define
49. • Cultural resistance (“this smacks of AI…”)
• Unfamiliar technology (e.g., reasoning)
• Often implies complex representational models
• procedural programs vs. declarative data
• Unclear business models
• Also, actual technical challenges
• scalability of query processing
• complexity (and thus scalability) of reasoning
• scalability of access control
• …
50. • (merely an observation of what you may
encounter…)
Source: Mindlab, U of Maryland
• What makes Semantic
Web attractive and worth pursuing is…
51. Serendipity (definition shown on slide; source: Oxford American Dictionary)
• Serendipity in interoperability
• can we interoperate with systems, devices and/or
services we knew nothing about at design time?
• Serendipity in information reuse
• with accessible semantics, this becomes easier…
• Serendipity in information integration
• can information from independent sources be combined?
• even simple forms of reasoning can help
52. • Semantic Web was designed to
• accommodate different points of view
• be flexible about what it can express (not preferential
towards any particular domain or application)
• Combining information in new ways
• we cannot anticipate all the possible ways in which
information is used, combined
⇒ there is value to merely making information (data)
available
• using Semantic Web technologies lowers the threshold
for “serendipitous reuse”
53. [Figure: components of 360-degree health: Insurance, Financial Aspects; Clinical Care; Follow up, Lifestyle; Genetic Tests… Profiles; Social Media; Clinical Trials]
54. [Figure: stakeholders: NIH (Research), FDA, CDC, Universities, AMCs, Pharmaceutical Companies, CROs, Hospitals, Doctors, Payors, Patients, Public; one label reads "From FDA, CDC"]
Translation 1: Genomic Research and Clinical Practice
Translation 2: Clinical Research and Clinical Practice
Slide by: Vipul Kashyap
55. • For each component in 360-degree health care,
we have data, processes, knowledge and
experience. Interoperability solutions need to
encompass all these!
• Possibly the largest growth in data will be in sensors (e.g.,
Body Area Networks, Biosensors) and social content.
Extensive use of mobile phones.
Credit: ece.virginia.edu
56. • Semantic Web is an “interoperability
technology”
• Linked Data is a step in the right direction
• Many examples of viable usage of Semantic
Web technologies
• Words of warning about deployment
• For health, Semantic Web provides the needed
interoperability, and can accommodate all
necessary “points of view”
• Significant research challenges remain as
Health presents the most complex domain
57. • Researchers: Satya Sahoo, Dr. Priti Parikh,
Pablo Mendes, Cartic Ramakrishnan, and
Kno.e.sis team
• Collaborators: Athens Heart Center (Dr.
Agrawal), NLM (Olivier Bodenreider), CCRC-
UGA (Will York), UGA (Tarleton),
Bioinformatics-WSU (Raymer)
• Funding: NIH/NCRR, NIH/NHLBI (R01), NSF
http://knoesis.org
58. 1. A. Sheth, S. Agrawal, J. Lathem, N. Oldham, H. Wingate, P. Yadav, and K. Gallagher, "Active Semantic Electronic Medical Record", Intl Semantic Web Conference, 2006.
2. Satya Sahoo, Olivier Bodenreider, Kelly Zeng, and Amit Sheth, "An Experiment in Integrating Large Biomedical Knowledge Resources with RDF: Application to Associating Genotype and Phenotype Information", WWW2007 HCLS Workshop, May 2007.
3. Satya S. Sahoo, Kelly Zeng, Olivier Bodenreider, and Amit Sheth, "From Glycosyltransferase to Congenital Muscular Dystrophy: Integrating Knowledge from NCBI Entrez Gene and the Gene Ontology", Amsterdam: IOS, August 2007, PMID: 17911917, pp. 1260-4.
4. Satya S. Sahoo, Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner, and Amit P. Sheth, "An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence", Journal of Biomedical Informatics, 2008.
5. Cartic Ramakrishnan, Krzysztof J. Kochut, and Amit Sheth, "A Framework for Schema-Driven Relationship Discovery from Unstructured Text", Intl Semantic Web Conference, 2006, pp. 583-596.
6. Satya S. Sahoo, Christopher Thomas, Amit Sheth, William S. York, and Samir Tartir, "Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies", 15th International World Wide Web Conference (WWW2006), Edinburgh, Scotland, May 23-26, 2006.
7. Satya S. Sahoo, Olivier Bodenreider, Pascal Hitzler, Amit Sheth, and Krishnaprasad Thirunarayan, "Provenance Context Entity (PaCE): Scalable provenance tracking for scientific RDF data", SSDBM, Heidelberg, Germany, 2010.
• Papers: http://knoesis.org/library
• Demos at: http://knoesis.wright.edu/library/demos/