Marco Roos: Newton's ideas and methods are preserved forever: how about yours?
1. Newton's ideas and methods are
preserved forever:
how about yours?
Marco Roos, Kristina Hettne, Jun Zhao, Mark Thompson
Cloud and Workflows for Reproducible
Bioinformatics
Shenzhen, December 19, 2012
3. Reproduced workflows
[Workflow diagram; labels: Mass, Power & Mass, Force, Web Service, Acceleration]
Wednesday, December 19, 2012 Towards preserving bioinformatics experiments 3
4. Case study
Bioinformatics analysis of Metabolic Syndrome
Kristina Hettne, Harish Dharuri
Genome-Wide Association Studies
What is the genetic basis for
the diseases associated with
Metabolic Syndrome?
6. Reproducible Science?
What is the digital
equivalent?
Is it equally good?
Can we do better?
- or worse?
GroundHog DB
Reproduced from Jelier et al., Schuemie et al., Hettne et al., Haagen et al.,
http://biosemantics.org , myExperiment.org/workflows/2197
7. Reproducible Science
What is our incentive?
Nobility: Good Reproducible Science
Greater Good: Serve the public
8. Reproducible Science
What is our incentive?
I’ll be the first
in Nature
Fame and Glory
Getting on with it...
9. CHALLENGE
Stimulate preservation and
reproducibility while speeding up
the research process
10. Enhance the research cycle
What slows us down?
Research Question → Find Methods and Data, and their Owners → Get Methods and Data → Understand Methods and Data → Format (Align) Data → Design the Analysis → Compute → Interpret the Results → Publish
11. Bottlenecks
• Losing track of what you did
• Messy storage
• Preparing material for a publication
• Understanding the computational procedure
• Communication with (non-technical) colleagues
• Keeping tools working
• Getting credit for digital results outside of
traditional publications
12. Getting on with workflows
13. Monolithic Tool →
Web Services → Workflows → (Web) Tool
Example: Anni 2.0 → Anni workflows
AnniWF
http://workflow.biosemantics.org/t2web/workflow/2725
14. Digital Repository
myExperiment.org
The recipes store
• Find workflows
• Share workflows & files
• Find people
• Build communities
• Publish packages
• Tag workflows
• Score, rate, comment
15. Instructions for workflow authors
10 Best Practices for creating workflows
1. Make a sketch workflow
2. Use modules
3. Think about the output
4. Provide example inputs and outputs
5. Annotate
6. Test execution from outside local environment
7. Choose services carefully
8. Reuse existing workflows
9. Advertise
10. Maintain
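As a rough illustration of practices 2, 4 and 6 only, here is a minimal Python sketch (not a Taverna workflow): a "workflow" composed of two small modules and shipped with an example input and its expected output, so that anyone can re-run the check outside the author's environment. The module names and the gene-symbol clean-up task are hypothetical.

```python
"""Hypothetical workflow built from small, reusable modules (practice 2),
shipped with an example input and expected output (practice 4) so it can be
test-executed outside the author's local environment (practice 6)."""


def normalise_gene_symbols(symbols):
    """Module 1: trim whitespace, upper-case, and drop empty entries."""
    return [s.strip().upper() for s in symbols if s.strip()]


def deduplicate(symbols):
    """Module 2: remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(symbols))


def workflow(symbols):
    """Compose the modules; each step can be reused or replaced on its own."""
    return deduplicate(normalise_gene_symbols(symbols))


# Example data travels with the workflow, as practice 4 recommends.
EXAMPLE_INPUT = [" hdac1", "PARVB", "hdac1 ", ""]
EXPECTED_OUTPUT = ["HDAC1", "PARVB"]

if __name__ == "__main__":
    result = workflow(EXAMPLE_INPUT)
    assert result == EXPECTED_OUTPUT, result
    print("example run OK:", result)
```

Keeping the example input and expected output next to the code is what makes "test execution from outside local environment" a one-line check rather than a manual comparison.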
16. Reproducible Science
Is a workflow sufficient?
Useful Preservation = Understandable Objects
Reproduce, Reuse, Repurpose, Repair,
...
What is this
doing?
Reproduced from Jelier et al., Schuemie et al., Hettne et al., Haagen et al.,
http://biosemantics.org , myExperiment.org/workflows/2197
17. Useful preservation 1
myExperiment Packs
18. Useful preservation
Research Object Model
Aggregation and Annotation Model for Digital Methods
http://wf4ever.github.com/ro/
19. Research Object (RO) Model
RO = ORE + AO + vocabularies
Object Re-use and Exchange (OAI-ORE)
Describes aggregations of resources:
data, metadata, papers, etc.
Annotation Ontology (AO)
Associates RDF metadata descriptions with resources
Generic and domain-specific vocabularies
Used in annotation bodies to provide information about
resources (types, dependencies, descriptions, etc.)
Builds on RDF, leading to RDF as a natural implementation choice
Model specification: http://wf4ever.github.com/ro/
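To make "RO = ORE + AO + vocabularies" concrete, the sketch below builds the triples for a tiny Research Object with only the Python standard library and prints them as Turtle: ORE supplies the aggregation, AO links each annotation to the resource it describes, and the annotation body is a separate RDF document that can use domain vocabularies. The `ore:`, `ao:` and `ro:` namespace URIs and the `ao:annotatesResource`/`ao:body` properties follow the wf4ever RO model as we understand it (an assumption; check the model specification), and all example.org URIs are hypothetical.

```python
# Minimal sketch of "RO = ORE + AO + vocabularies" using only the standard
# library. Namespaces reflect our reading of the wf4ever RO model (assumption);
# every example.org URI is hypothetical.
PREFIXES = {
    "ore": "http://www.openarchives.org/ore/terms/",
    "ao": "http://purl.org/ao/",
    "ro": "http://purl.org/wf4ever/ro#",
}


def term(t):
    """Write full URIs as <...>; leave prefixed names and the keyword 'a' as-is."""
    return f"<{t}>" if t.startswith(("http://", "https://")) else t


def research_object(ro_uri, resources, annotations):
    """Build (subject, predicate, object) triples for a Research Object.

    resources:   URIs of aggregated data, workflows, papers, ...
    annotations: (annotation_uri, target_uri, body_uri) tuples; each body is
                 an RDF document that may use domain-specific vocabularies.
    """
    triples = [(ro_uri, "a", "ro:ResearchObject"), (ro_uri, "a", "ore:Aggregation")]
    for res in resources:
        triples.append((ro_uri, "ore:aggregates", res))  # ORE: the aggregation
    for ann, target, body in annotations:
        triples += [
            (ro_uri, "ore:aggregates", ann),  # annotations are aggregated too
            (ann, "a", "ao:Annotation"),      # AO: attach metadata to a resource
            (ann, "ao:annotatesResource", target),
            (ann, "ao:body", body),
        ]
    return triples


def to_turtle(triples):
    """Serialise the triples as a simple Turtle document."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines += [f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples]
    return "\n".join(lines)


if __name__ == "__main__":
    base = "http://example.org/ro/mets/"
    ro = research_object(
        base,
        resources=[base + "workflow.t2flow", base + "gwas_hits.csv"],
        annotations=[(base + "ann1", base + "workflow.t2flow", base + "ann1-body.ttl")],
    )
    print(to_turtle(ro))
```

Plain tuples keep the example dependency-free; in practice an RDF library would handle serialisation, which fits the slide's point that RDF is a natural, but not the only, implementation choice.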
21. Research Object: “Hello World”
https://github.com/wf4ever/ro-catalogue/tree/master/v0.1/HelloWorld
22. Help organize the materials and
methods of computational analysis
Research Object Portal
Materials & Methods of
Metabolic Syndrome
Analysis
Kristina Hettne
Harish Dharuri
23. Expected on myExperiment
Research Objects inside!
• Packs more prominent
• Start a pack when you
upload a workflow
• Upload wizards, pack
management, export
• Checklists, automated
star ratings
• Add workflow runs and
example data
• Sticky annotations
(RO-enabled myExperiment mockup)
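The slide mentions checklists and automated star ratings for packs. The following is a purely hypothetical sketch of how such a rating could work. The checklist items, the dict-based pack description and the one-star-per-item rule are our own assumptions and not myExperiment's actual implementation.

```python
# Hypothetical checklist: the items and the 0-5 star mapping are illustrative
# assumptions, not myExperiment's actual rules.
CHECKLIST = {
    "workflow": "Pack contains at least one workflow",
    "example_data": "Example inputs and outputs are included",
    "runs": "At least one workflow run is attached",
    "description": "The pack has a human-readable description",
    "license": "A license is stated",
}


def evaluate(pack):
    """Return (stars, missing_items) for a pack described as a plain dict."""
    passed = {
        "workflow": bool(pack.get("workflows")),
        "example_data": bool(pack.get("example_data")),
        "runs": bool(pack.get("runs")),
        "description": bool((pack.get("description") or "").strip()),
        "license": bool(pack.get("license")),
    }
    missing = [CHECKLIST[item] for item, ok in passed.items() if not ok]
    # With five items, each satisfied item is worth exactly one star.
    stars = round(5 * sum(passed.values()) / len(passed))
    return stars, missing


if __name__ == "__main__":
    stars, missing = evaluate({
        "workflows": ["mets.t2flow"],
        "example_data": ["in.txt"],
        "description": "MetS GWAS analysis",
    })
    print(stars, "stars; still missing:", missing)
```

A pack with a workflow, example data and a description but no run or license scores 3 stars here, and the list of missing items tells the author exactly what to add.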
24. Fame and Glory
"It was me, me, me!"
What I found: HDAC1 interacts with Parvb
Discovered by: me
Published by: me
How I found it: Research Object
25. Nanopublication Model
Getting credit for digital results
Nanopublication ID with Integrity Key
Parts: Assertion, Provenance, Supporting, Attribution
[Diagram: in the Assertion, association is a sio:statistical-association and sio:has-measurement-value Association_1_p_value, which is a sio:probability-value with sio:has-value 6.56e-5^^xsd:float. Other labels: sio:refers-to, opm:wasDerivedFrom, opm:wasGeneratedBy, pav:authoredBy, dcterms:created, "this nanopub", DOI, …]
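A nanopublication is a small set of RDF named graphs. The hedged sketch below writes the slide's Assertion, Provenance and Attribution parts as TriG, plus a head graph that ties them to one nanopublication ID, and computes a SHA-256 digest as a stand-in for the integrity key. Only `pav:`, `dcterms:` and `xsd:` are real vocabularies here. The `ex:` linking predicates are invented, and the `sio:`/`opm:` names are the slide's readable labels bound to placeholder namespaces, not verified IRIs. The slide's Supporting part is left out because its contents are not recoverable from the slide.

```python
import hashlib

# Hypothetical sketch: ex: linking predicates are invented for illustration;
# sio:/opm: names are the slide's readable labels bound to placeholder
# namespaces, not verified vocabulary IRIs. pav:, dcterms: and xsd: are real.
PREFIXES = {
    "ex": "http://example.org/np-schema#",
    "sio": "http://example.org/sio-labels#",
    "opm": "http://example.org/opm-labels#",
    "pav": "http://purl.org/pav/",
    "dcterms": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def term(t):
    """Wrap full URIs in <...>; keep prefixed names, 'a' and literals as-is."""
    return f"<{t}>" if t.startswith(("http://", "https://")) else t


def nanopub_trig(np_uri, p_value, author, created, source):
    """Return (TriG text, integrity key) for a one-association nanopublication."""
    graphs = {
        "assertion": [  # the single, creditable claim
            (np_uri + "#association", "a", "sio:statistical-association"),
            (np_uri + "#association", "sio:has-measurement-value", np_uri + "#p_value"),
            (np_uri + "#p_value", "a", "sio:probability-value"),
            (np_uri + "#p_value", "sio:has-value", f'"{p_value}"^^xsd:float'),
        ],
        "provenance": [  # where the assertion came from
            (np_uri + "#assertion", "opm:wasDerivedFrom", source),
        ],
        "attribution": [  # who gets the credit, and when
            (np_uri, "pav:authoredBy", author),
            (np_uri, "dcterms:created", f'"{created}"^^xsd:date'),
        ],
    }
    # The head graph ties the parts together under one nanopublication ID.
    head = [(np_uri, "a", "ex:Nanopublication")]
    head += [(np_uri, "ex:has" + name.capitalize(), f"{np_uri}#{name}") for name in graphs]

    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    for name, triples in [("head", head), *graphs.items()]:
        lines.append(f"<{np_uri}#{name}> {{")
        lines += [f"  {term(s)} {term(p)} {term(o)} ." for s, p, o in triples]
        lines.append("}")
    trig = "\n".join(lines)
    # A plain SHA-256 digest of the serialisation stands in for the integrity
    # key; it is not a standardised nanopublication hashing scheme.
    return trig, hashlib.sha256(trig.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    trig, key = nanopub_trig(
        "http://example.org/nanopub/1", "6.56e-5",
        author="http://example.org/people/me", created="2012-12-19",
        source="http://example.org/source-publication",
    )
    print(trig)
    print("integrity key:", key)
```

Keeping the assertion in its own graph is what lets credit attach to a single statement: the attribution graph names the author of this nanopublication, independently of any traditional paper.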
31. Nanopublications of Genetic Variations
visualized on the genome
Zuotian Tatum, Jesse van Dam
[Diagram: a Nanopublication Store connected to Other Sources and Other Tools]
32. Fame and Glory
"It was me, me, me!"
What I found: Nanopublication <CS7183> <associatedWith> <MetS> (http://purl.org/nanopub/123)
Discovered by: me
Published by: me
How I found it: Research Object (http://purl.org/ResObj/345)
33. Summary (1/2)
• Preservation under the hood of digital research
tools
• Research Object Model: annotated aggregates
• Nanopublication: fine-grained digital credit
Check Nanopub.org to stay updated
34. Summary (2/2)
• Semantic Web for exchange and interoperability
• In progress: RO-enabling myExperiment
Watch myExperiment.org in 2013!
• Plans to RO-enable
Taverna, Galaxy, GenomeSpace
36. Thank you for your attention
http://biosemantics.org
37. Reproducible Science
Preserved materials
and methods for the
‘wet laboratory’
scientist
From Van Roon-Mom et al., BMC Molecular Biology 2008
doi: 10.1186/1471-2199-9-84.
39. Reproducible Science
What is the digital
equivalent?
Is it equally good?
Can we do better?
– or worse?
Can you tell what this is doing?
Reproduced from Jelier et al., Schuemie et al., Hettne et al., Haagen et al.,
http://biosemantics.org , myExperiment.org/workflows/2197
42. Our aim
‘Useful’ preservation
Support reproducibility
in tools and by guidelines that
speed up your research
and get you acknowledgement
43. Preservation
What? Research Results
How? Nanopublication (Assertion, Provenance, Attribution, Supporting)
44. Preservation
[Diagram: What? Research Results; How? Nanopublication (Assertion, Provenance, Attribution, Supporting); additional labels: "Digital Value", "Valuable for scientists", "Deemed of scientific value by scientists"]
45. Acknowledgements http://biosemantics.org/
■ Erik Schultes
■ Andrew Gibson
■ Reinout van Schouwen
■ Kostas Karasavvas
■ Kristina Hettne
■ Harish Dharuri
■ Eleni Mina
■ Jesse van Dam
■ Herman van Haagen
■ Zuotian Tatum
■ Johan den Dunnen
■ Peter-Bram ‘t Hoen
■ Barend Mons
■ Gert-Jan van Ommen
■ Paul Groth
■ Frank van Harmelen
■ Erik van Mulligen
■ Bharat Singh
■ Jan Kors
■ Christine Chichester
■ Kees Burger - NBIC
■ Spyros Kotoulas - VU
■ Antonis Loizou - VU
■ Valery Tkachenko - RSC
■ Andra Waagmeester - Maastricht
■ Sune Askjaer - Lundbeck
■ Steve Pettifer - Manchester
■ Lee Harland - Pfizer/CD
■ Carina Haupt - Fraunhofer
■ Colin Batchelor - RSC
■ Miguel Vazquez - CNIO
■ José María Fernández - CNIO
■ Jahn Saito - Maastricht
■ Andrew Gibson (Outside Expert) - Amsterdam
■ Louis Wich - DTU
Melton Foundation
Editor's Notes
In wet-lab biology and other experimental sciences, we have addressed these questions in what we disseminate and how. The system is not perfect: it is flawed for real reproducibility, but it does give insight into how results were obtained. That is sufficient for us to make up our own minds on whether to use the results for our own hypotheses, or to build on the methods. => Do we have a good digital equivalent?
Workflows could be seen as the equivalent of wet-lab protocols. Are they as good as Materials and Methods, better or worse? => Perhaps worse?
And then: what is our incentive to make it as good or better? Is it nobility, or serving the greater good? => Getting on with it: publish
Or is it helping me get my next Nature paper?
Some see workflows as a good way to help us get on with it, not just for preservation purposes. This is a discussion in itself and not the focus here.
The Research Object model used to pull together information about an experiment is based substantially on existing technologies, notably Object Re-use and Exchange (ORE) and Annotation Ontology (AO). Domain- or application-specific vocabularies and ontologies are added into this mix to provide supporting information as needed and available. The structure has been built with RDF in mind, making RDF a natural choice for representing RO structures, but the RO Model is an abstraction that can be implemented with different tools. The main irreducible underpinning is the use of URIs for linking resources and concepts.
A Research Object aggregates resources. It also aggregates annotations, which are associated with resources. The annotation bodies are RDF documents that use additional, possibly domain-specific, vocabularies.
Attribution is part of the RO model and myExperiment, but we are also developing something specifically to address this aspect of digital preservation and publishing: nanopublications.
For instance, can we all tell what this workflow is doing? Do we miss things? => Incentive to do good
Therefore we like to speak of ‘Useful Preservation’.