Newton's ideas and methods are
     preserved forever:
      how about yours?
 Marco Roos, Kristina Hettne, Jun Zhao, Mark Thompson

    Cloud and Workflows for Reproducible
               Bioinformatics
         Shenzhen, December 19, 2012
Wednesday, December 19, 2012   Digital preservation for the modern scientist   2
Reproduced workflows




                  Mass
                                                Power & Mass
                                                                                   Force
                                                 Web Service
       Acceleration




Wednesday, December 19, 2012   Towards preserving bioinformatics experiments   3
Case study
Bioinformatics analysis of Metabolic Syndrome
Kristina Hettne, Harish Dharuri




      Genome Wide Association
             Studies

      What is the genetic basis for
      the diseases associated with
          Metabolic Syndrome?
Reproducible Science




Preservation for the
  wet laboratory
  scientist             From Van Roon-Mom et al., BMC Molecular Biology 2008
                                   doi: 10.1186/1471-2199-9-84.
Reproducible Science?




What is the digital
 equivalent?
Is it equally good?
Can we do better?
  - or worse?

                                                                                GroundHog
                                                                                   DB

                         Reproduced from Jelier et al., Schuemie et al., Hettne et al., Haagen et al.,
                               http://biosemantics.org , myExperiment.org/workflows/2197
Reproducible Science
                      What is our incentive?

                      Nobility                                                  Greater Good
 Good Reproducible Science                                                     Serve the public




Reproducible Science
                      What is our incentive?


                                              I’ll be the first
                                                  in Nature



       Fame and Glory
  Getting on with it...




CHALLENGE
    Stimulate preservation and
    reproducibility while speeding up
    the research process

Enhance the research cycle
         What slows us down?

 Research Question
   → Find Methods and Data, + their Owners
   → Get Methods and Data
   → Understand Methods and Data
   → Format (Align) Data
   → Design the Analysis
   → Compute
   → Interpret Results
   → Publish


Bottlenecks


 • Losing track of what you did
 • Messy storage
 • Preparing material for a publication
 • Understanding the computational procedure
 • Communication with (non-technical) colleagues
 • Keeping tools working
 • Getting credit for digital results outside of
   traditional publications

Getting on with workflows




Monolithic Tool →
              Web Services → Workflows → (Web) Tool
              Example: Anni 2.0 → Anni workflows




                              AnniWF




http://workflow.biosemantics.org/t2web/workflow/2725
Digital Repository
          myExperiment.org


The recipes store
•   Find workflows
•   Share workflows & files
•   Find people
•   Build communities
•   Publish packages
•   Tag workflows
•   Score, rate, comment
Instructions for workflow authors
                      10 Best Practices for creating workflows



        1.         Make a sketch workflow
        2.         Use modules
        3.         Think about the output
        4.         Provide example inputs and outputs
        5.         Annotate
        6.         Test execution from outside local environment
        7.         Choose services carefully
        8.         Reuse existing workflows
        9.         Advertise
        10.        Maintain
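
Practices 4 and 5 in the list above lend themselves to an automated check. What follows is a minimal sketch, assuming a hypothetical `WorkflowRecord` structure invented for this illustration; it is not the myExperiment data model, and the field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkflowRecord:
    """Hypothetical description of a shared workflow (illustration only)."""
    title: str
    description: str = ""
    example_inputs: List[str] = field(default_factory=list)
    example_outputs: List[str] = field(default_factory=list)


def missing_practices(wf: WorkflowRecord) -> List[str]:
    """Return the best practices that this record does not yet satisfy."""
    missing = []
    # Practice 4: both example inputs and example outputs should be present.
    if not (wf.example_inputs and wf.example_outputs):
        missing.append("4. Provide example inputs and outputs")
    # Practice 5: a non-empty description counts as a minimal annotation here.
    if not wf.description.strip():
        missing.append("5. Annotate")
    return missing


draft = WorkflowRecord(title="Concept profile matching")
complete = WorkflowRecord(
    title="Concept profile matching",
    description="Matches concept profiles to rank candidate associations.",
    example_inputs=["input_concepts.txt"],
    example_outputs=["ranked_concepts.txt"],
)
print(missing_practices(draft))
print(missing_practices(complete))
```

The other practices (sketching, choosing services, maintaining) depend on human judgement and are deliberately left out of this sketch.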

Reproducible Science
            Is a workflow sufficient?




  Useful Preservation
           =
Understandable Objects


Reproduce, Reuse, Repurpose, Repair,
                ...




                   What is this
                     doing?
                                       Reproduced from Jelier et al., Schuemie et al., Hettne et al., Haagen et al.,
                                             http://biosemantics.org , myExperiment.org/workflows/2197
Useful preservation 1
                      myExperiment Packs




Useful preservation
                      Research Object Model

                             Research Object Model
                Aggregation and Annotation Model for Digital Methods




                                                                               http://wf4ever.github.com/ro/

Research Object (RO) Model

RO = ORE + AO + vocabularies
Object Re-use and Exchange (OAI-ORE)
   Describes aggregations of resources:
   data, metadata, papers, etc.
Annotation Ontology (AO)
  Associates RDF metadata descriptions with resources
Generic and domain-specific vocabularies
   Used in annotation bodies to provide information about
     resources (types, dependencies, descriptions, etc.)
Builds on RDF, leading to RDF as a natural implementation choice
Model specification: http://wf4ever.github.com/ro/
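
To make "RO = ORE + AO + vocabularies" concrete, here is a minimal sketch that builds the triples for a small Research Object using only the Python standard library. The namespace URIs and the terms `ro:ResearchObject`, `ro:AggregatedAnnotation`, `ore:aggregates`, `ao:annotatesResource` and `ao:body` are written from memory and should be checked against the model specification; the `example.org` URIs and file names are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Namespaces assumed for this sketch; verify against the RO model specification.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE = "http://www.openarchives.org/ore/terms/"
RO = "http://purl.org/wf4ever/ro#"
AO = "http://purl.org/ao/"


def aggregate(ro_uri: str, resources: List[str]) -> List[Triple]:
    """ORE part: the Research Object aggregates its resources."""
    triples = [(ro_uri, RDF_TYPE, RO + "ResearchObject")]
    triples += [(ro_uri, ORE + "aggregates", res) for res in resources]
    return triples


def annotate(ro_uri: str, annotation_uri: str, target: str, body: str) -> List[Triple]:
    """AO part: an annotation associates an RDF body with a target resource."""
    return [
        (annotation_uri, RDF_TYPE, RO + "AggregatedAnnotation"),
        (annotation_uri, AO + "annotatesResource", target),
        (annotation_uri, AO + "body", body),
        (ro_uri, ORE + "aggregates", annotation_uri),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical Research Object with a workflow, an input file and one annotation.
base = "http://example.org/ro/metabolic-syndrome/"
triples = aggregate(base, [base + "workflow.t2flow", base + "input.txt"])
triples += annotate(
    base,
    base + ".ro/ann1",
    base + "workflow.t2flow",
    base + ".ro/ann1-body.rdf",
)
print(to_ntriples(triples))
```

The annotation body is itself a resource holding RDF, which is where the generic and domain-specific vocabularies would be used.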
Research Object Model
Research Object: “Hello World”




https://github.com/wf4ever/ro-catalogue/tree/master/v0.1/HelloWorld
Help organize the materials and
methods of computational analysis
Research Object Portal
                                    Materials & Methods of
                                     Metabolic Syndrome
                                           Analysis
                                       Kristina Hettne
                                       Harish Dharuri




Expected on myExperiment

  Research Objects inside!
  • Packs more prominent
  • Start a pack when you
    upload a workflow
  • Upload wizards, pack
    management, export
  • Checklists, automated
    star ratings
  • Add workflow runs and
    example data
  • Sticky annotations                                            RO-enabled myExperiment mockup

Fame and Glory
  It was
  me, me,
                      What    HDAC1 interacts with Parvb
   me!                  I     Discovered by: me
                      found   Published by: me




Research Object
                  How I
                  found
                    it




Nanopublication Model
                         Getting credit for digital results


                Nanopublication ID                         Integrity Key

                Assertion
                    association            is                        sio:statisticalAssociation
                    association            sio:has-measurementValue  Association_1_p_value
                    association            sio:refers-to             …
                    Association_1_p_value  is                        sio:probability-value
                    Association_1_p_value  sio:has-value             6.56e-5^^xsd:float

                Provenance
                    Supporting
                        assertion          opm:wasDerivedFrom        …
                        assertion          opm:wasGeneratedBy        …
                    Attribution
                        this nanopub       dcterms:created           …
                        this nanopub       pav:authoredBy            …
                        this nanopub       dcterms:DOI               …
                        …
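
The structure on this slide can be sketched as a set of quads, where the fourth element names the graph (assertion, supporting, attribution) a statement belongs to. This is an illustration under stated assumptions, not the nanopublication schema itself: the `example.org` URIs, the function names and the use of `rdf:type` for the slide's "is" arrows are assumptions, and prefixed names are kept as plain strings rather than expanded URIs.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str


NP = "http://example.org/nanopub/1#"  # hypothetical nanopublication base URI


def build_nanopub(p_value: float, author: str, source: str) -> List[Quad]:
    """Build assertion, supporting and attribution graphs for one association."""
    assertion, supporting, attribution = NP + "assertion", NP + "supporting", NP + "attribution"
    assoc, pval = NP + "association", NP + "Association_1_p_value"
    return [
        # Assertion: what was found (predicates as read from the slide).
        Quad(assoc, "rdf:type", "sio:statisticalAssociation", assertion),
        Quad(assoc, "sio:has-measurementValue", pval, assertion),
        Quad(pval, "rdf:type", "sio:probability-value", assertion),
        Quad(pval, "sio:has-value", f'"{p_value}"^^xsd:float', assertion),
        # Supporting provenance: where the assertion came from.
        Quad(assertion, "opm:wasDerivedFrom", source, supporting),
        # Attribution: who gets the credit.
        Quad(NP + "nanopub", "pav:authoredBy", author, attribution),
    ]


def by_graph(quads: List[Quad]) -> Dict[str, List[Quad]]:
    """Group quads by the named graph they belong to."""
    grouped: Dict[str, List[Quad]] = defaultdict(list)
    for quad in quads:
        grouped[quad.graph].append(quad)
    return dict(grouped)


quads = build_nanopub(
    6.56e-5,
    author="http://example.org/people/me",      # hypothetical author URI
    source="http://example.org/datasets/gwas",  # hypothetical source URI
)
for graph, members in by_graph(quads).items():
    print(graph, len(members))
```

Keeping attribution in its own graph is what lets credit be attached to a single assertion rather than to a whole paper.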




Nanopub.org




Examples




Examples in RDF format




Validator




Example: LOVD
Nanopublications of Genetic Variations
          visualized on the genome
                                                   Zuotian Tatum, Jesse van Dam




 Other
                                                              Other
Sources
                                                              Tools
                      Nanopublication
                          Store

Fame and Glory
  It was                        Nanopublication
  me, me,
                      What       <CS7183> <associatedWith> <MetS>
   me!
                        I        Discovered by: me
                      found      Published by: me




Research Object
                  How I
                  found       http://purl.org/nanopub/123
                               http://purl.org/ResObj/345
                    it




Summary (1/2)




• Preservation under the hood of digital research
  tools
• Research Object Model: annotated aggregates
• Nanopublication: fine-grained digital credit
  Check Nanopub.org to stay updated




Summary (2/2)




• Semantic Web for exchange and interoperability
• In progress: RO-enabling myExperiment
  Watch myExperiment.org in 2013!
• Plans to RO-enable
  Taverna, Galaxy, GenomeSpace




Acknowledgements
EU Wf4Ever project (270129)
funded under EU FP7 (ICT-2009.4.1).
(http://www.wf4ever-project.org)
Thank you for your attention





http://biosemantics.org
Reproducible Science




Preserved materials
  and methods for the
  ‘wet laboratory’
  scientist



                        From Van Roon-Mom et al., BMC Molecular Biology 2008
                                   doi: 10.1186/1471-2199-9-84.
Reproducible Science




What is the digital
 equivalent?
Is it equally good?
Can we do better?
  – or worse?

               Can you tell
               what this is
                 doing?       Reproduced from Jelier et al., Schuemie et al., Hettne et al., Haagen et al.,
                                    http://biosemantics.org , myExperiment.org/workflows/2197
Our aim



                                 ‘Useful’ preservation
                          Support reproducibility
                      in tools and by guidelines that
                                speed up your research
                               get you acknowledgement



Preservation




         Research Results
             What?    Nanopublication
                          Assertion
                          Provenance
                              Attribution
                              Supporting
             How?

Preservation




         Research Results
             Deemed of scientific value by scientists
             What?    Nanopublication
                          Assertion
                          Provenance
                              Attribution
                              Supporting
             How?




Acknowledgements                                  http://biosemantics.org/




■   Erik Schultes
■   Andrew Gibson
■   Reinout van Schouwen
■   Kostas Karasavvas
■   Kristina Hettne
■   Harish Dharuri
■   Eleni Mina
■   Jesse van Dam
■   Herman van Haagen
■   Zuotian Tatum
■   Johan den Dunnen
■   Peter-Bram ’t Hoen
■   Barend Mons
■   Gert-Jan van Ommen

■   Paul Groth
■   Frank van Harmelen
■   Erik van Mulligen
■   Bharat Singh
■   Jan Kors

■   Christine Chichester
■   Kees Burger - NBIC
■   Spyros Kotoulas - VU
■   Antonis Loizou - VU
■   Valery Tkachenko - RSC
■   Andra Waagmeester - Maastricht
■   Sune Askjaer - Lundbeck
■   Steve Pettifer - Manchester
■   Lee Harland - Pfizer/CD
■   Carina Haupt - Fraunhofer
■   Colin Batchelor - RSC
■   Miguel Vazquez - CNIO
■   José María Fernández - CNIO
■   Jahn Saito - Maastricht
■   Andrew Gibson (Outside Expert) - Amsterdam
■   Louis Wich - DTU


                                                                 Melton
                                                                 Foundation

Anil Thanki at #ICG13: Aequatus: An open-source homology browser
 
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
 
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant scienceVenice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
 
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
 
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
 
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...
 
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
 

Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

  • 1. Newton's ideas and methods are preserved forever: how about yours? Marco Roos, Kristina Hettne, Jun Zhao, Mark Thompson Cloud and Workflows for Reproducible Bioinformatics Shenzhen, December 19, 2012
  • 2. Wednesday, December 19, 2012 Digital preservation for the modern scientist 2
  • 3. Reproduced workflows Mass Power & Mass Force Web Service Acceleration Wednesday, December 19, 2012 Towards preserving bioinformatics experiments 3
  • 4. Case study Bioinformatics analysis of Metabolic Syndrome Kristina Hettne, Harish Dharuri Genome Wide Association Studies What is the genetic basis for the diseases associated with Metabolic Syndrome?
  • 5. Reproducible Science Preservation for the wet laboratory scientist From Van Roon-Mom et al., BMC Molecular Biology 2008 doi: 10.1186/1471-2199-9-84.
  • 6. Reproducible Science? What is the digital equivalent? Is it equally good? Can we do better? - or worse? GroundHog DB Reproduced from Jelier et al., Schuemie et al., Hettne et al., Haagen et al., http://biosemantics.org , myExperiment.org/workflows/2197
  • 7. Reproducible Science What is our incentive? Nobility Greater Good Good Reproducible Science Serve the public
  • 8. Reproducible Science What is our incentive? I’ll be the first in Nature Fame and Glory Getting on with it...
  • 9. CHALLENGE Stimulate preservation and reproducibility while speeding up the research process
  • 10. Enhance the research cycle What slows us down? Research Question → Find Methods, Data, + their Owners → Get Methods and Data → Understand Methods and Data → Format and (Align) Data → Design the Analysis → Compute → Interpret Results → Publish
  • 11. Bottlenecks • Losing track of what you did • Messy storage • Preparing material for a publication • Understanding the computational procedure • Communication with (non-technical) colleagues • Keeping tools working • Getting credit for digital results outside of traditional publications
  • 12. Getting on with workflows
  • 13. Monolithic Tool → Web Services → Workflows → (Web) Tool Example: Anni 2.0 → Anni workflows AnniWF http://workflow.biosemantics.org/t2web/workflow/2725
  • 14. Digital Repository myExperiment.org The recipes store • Find workflows • Share workflows & files • Find people • Build communities • Publish packages • Tag workflows • Score, rate, comment
  • 15. Instructions for workflow authors 10 Best Practices for creating workflows 1. Make a sketch workflow 2. Use modules 3. Think about the output 4. Provide example inputs and outputs 5. Annotate 6. Test execution from outside local environment 7. Choose services carefully 8. Reuse existing workflows 9. Advertise 10. Maintain
  • 16. Reproducible Science Is a workflow sufficient? Useful Preservation = Understandable Objects Reproduce, Reuse, Repurpose, Repair, ... What is this doing? Reproduced from Jelier et al., Schuemie et al., Hettne et al., Haagen et al., http://biosemantics.org , myExperiment.org/workflows/2197
  • 17. Useful preservation 1 myExperiment Packs
  • 18. Useful preservation Research Object Model Aggregation and Annotation Model for Digital Methods http://wf4ever.github.com/ro/
  • 19. Research Object (RO) Model RO = ORE + AO + vocabularies Object Re-use and Exchange (OAI-ORE) Describes aggregations of resources: data, metadata, papers, etc. Annotation Ontology (AO) Associates RDF metadata descriptions with resources Generic and domain-specific vocabularies Used in annotation bodies to provide information about resources (types, dependencies, descriptions, etc.) Builds on RDF, leading to RDF as a natural implementation choice Model specification: http://wf4ever.github.com/ro/
  • 21. Research Object: “Hello World” https://github.com/wf4ever/ro-catalogue/tree/master/v0.1/HelloWorld
  • 22. Help organize the materials and methods of computational analysis Research Object Portal Materials & Methods of Metabolic Syndrome Analysis Kristina Hettne Harish Dharuri
  • 23. Expected on myExperiment Research Objects inside! • Packs more prominent • Start a pack when you upload a workflow • Upload wizards, pack management, export • Checklists, automated star ratings • Add workflow runs and example data • Sticky annotations RO-enabled myExperiment mockup
  • 24. Fame and Glory “It was me, me, me! I found it” What: HDAC1 interacts with Parvb Discovered by: me Published by: me How I found it: Research Object
  • 25. Nanopublication Model Getting credit for digital results [Diagram: a Nanopublication with an ID and Integrity Key, made up of an Assertion and Provenance (Supporting, Attribution). Labels recoverable from the figure: association is sio:statisticalAssociation; sio:has-measurementValue Association_1_p_value; sio:has-value 6.56e-5^^xsd:float; is sio:probability-value; sio:refers-to; this nanopub; assertion; opm:wasDerivedFrom; opm:wasGeneratedBy; pav:authoredBy; dcterms:created; dcterms:DOI]
  • 26. Nanopub.org
  • 27. Examples
  • 28. Examples in RDF format
  • 29. Validator
  • 31. Nanopublications of Genetic Variations visualized on the genome Zuotian Tatum, Jesse van Dam [Diagram labels: Other Sources, Other Tools, Nanopublication Store]
  • 32. Fame and Glory “It was me, me, me! I found it” Nanopublication: What: <CS7183> <associatedWith> <MetS> Discovered by: me Published by: me http://purl.org/nanopub/123 Research Object: How I found it http://purl.org/ResObj/345
  • 33. Summary (1/2) • Preservation under the hood of digital research tools • Research Object Model: annotated aggregates • Nanopublication: fine-grained digital credit Check Nanopub.org to stay updated
  • 34. Summary (2/2) • Semantic Web for exchange and interoperability • In progress: RO-enabling myExperiment Watch myExperiment.org in 2013! • Plans to RO-enable Taverna, Galaxy, GenomeSpace
  • 35. Acknowledgements EU Wf4Ever project (270129) funded under EU FP7 (ICT-2009.4.1). (http://www.wf4ever-project.org)
  • 36. Thank you for your attention http://biosemantics.org
  • 37. Reproducible Science Preserved materials and methods for the ‘wet laboratory’ scientist From Van Roon-Mom et al., BMC Molecular Biology 2008 doi: 10.1186/1471-2199-9-84.
  • 38. Reproducible Science? What is the digital equivalent? Is it equally good? Can we do better? - or worse? Reproduced from Jelier et al., Schuemie et al., Hettne et al., Haagen et al., http://biosemantics.org , myExperiment.org/workflows/2197
  • 39. Reproducible Science What is the digital equivalent? Is it equally good? Can we do better? – or worse? Can you tell what this is doing? Reproduced from Jelier et al., Schuemie et al., Hettne et al., Haagen et al., http://biosemantics.org , myExperiment.org/workflows/2197
  • 40. Reproducible Science What is our incentive? Nobility Greater Good Good Reproducible Science Serve the public
  • 41. Reproducible Science What is our incentive? I’ll be the first in Nature Fame and Glory Getting on with it...
  • 42. Our aim ‘Useful’ preservation Support reproducibility in tools and by guidelines that speed up your research get you acknowledgement
  • 43. Preservation What? How? Nanopublication Assertion Research Results Provenance Attribution Supporting
  • 44. Preservation of Digital Value What? Deemed of scientific value by scientists; Valuable for scientists How? Nanopublication Assertion Research Results Provenance Attribution Supporting
  • 45. Acknowledgements http://biosemantics.org/ ■ Erik Schultes ■ Paul Groth ■ Christine Chichester ■ Andrew Gibson ■ Frank van Harmelen ■ Kees Burger - NBIC ■ Reinout van Schouwen ■ Spyros Kotoulas - VU ■ Kostas Karasavvas ■ Antonis Loizou - VU ■ Kristina Hettne ■ Valery Tkachenko - RSC ■ Harish Dharuri ■ Andra Waagmeester - Maastricht ■ Eleni Mina ■ Jesse van Dam ■ Erik van Mulligen ■ Sune Askjaer - Lundbeck ■ Herman van Haagen ■ Bharat Singh ■ Steve Pettifer - Manchester ■ Zuotian Tatum ■ Jan Kors ■ Lee Harland - Pfizer/CD ■ Johan den Dunnen ■ Carina Haupt - Fraunhofer ■ Peter-Bram ‘t Hoen ■ Colin Batchelor - RSC ■ Barend Mons ■ Miguel Vazquez - CNIO ■ Gert-Jan van Ommen ■ José María Fernández - CNIO ■ Jahn Saito - Maastricht ■ Andrew Gibson (Outside Expert) - Amsterdam ■ Louis Wich - DTU ■ Melton Foundation
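Slides 25, 32 and 43 describe a nanopublication as an assertion plus provenance, where the provenance splits into attribution and supporting information. A minimal Python sketch of that shape could look as follows. The class name, field names and `credit_line` helper are hypothetical and are not the nanopub.org schema; the example values are the ones shown on slide 32, and linking the supporting information to the Research Object URL is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Nanopublication:
    """Hypothetical in-memory stand-in for the parts named on the slides."""
    uri: str
    assertion: Triple  # the claim itself
    attribution: Dict[str, str] = field(default_factory=dict)  # who gets credit
    supporting: Dict[str, str] = field(default_factory=dict)   # how it was found

    def credit_line(self) -> str:
        """Render the claim together with its author and a citable URI."""
        s, p, o = self.assertion
        author = self.attribution.get("pav:authoredBy", "unknown")
        return f"{s} {p} {o} (authored by {author}; cite {self.uri})"


# Values taken from slide 32; the wiring between them is assumed.
np_example = Nanopublication(
    uri="http://purl.org/nanopub/123",
    assertion=("<CS7183>", "<associatedWith>", "<MetS>"),
    attribution={"pav:authoredBy": "me"},
    supporting={"opm:wasDerivedFrom": "http://purl.org/ResObj/345"},
)
print(np_example.credit_line())
```

Keeping attribution separate from the assertion is what lets the same claim be cited and credited independently of the workflow that produced it.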

Editor's Notes

  1. In wet-lab biology and other experimental sciences, we have addressed these questions in what we disseminate and how. The system is not perfect. It is flawed for real reproducibility, but it does give insight into how results were obtained. That is sufficient to make up our own minds on whether to use the results for our own hypotheses, or build on the methods. => Do we have a good digital equivalent?
  2. Workflows could be seen as an equivalent of wet lab protocols. Are they as good as Materials and Methods, better or worse? => Perhaps worse?
  3. And then: what is our incentive to make it as good or better? Is it nobility, or serving the greater good? => Getting on with it: publish
  4. Or is it helping me to my next Nature paper?
  5. Some see workflows as a good way to help us get on with it, not just for preservation purposes. This is a discussion by itself, not the focus here.
  6. The research model used to pull together information about an experiment is based substantially on existing technologies, notably Object Re-use and Exchange (ORE) and Annotation Ontology (AO). Domain- or application-specific vocabularies and ontologies are added into this mix to provide supporting information as needed and available. The structure has been built with RDF in mind, making RDF a natural choice for representing RO structures, but the RO Model is an abstraction which can be implemented with different tools. The main irreducible underpinning is the use of URIs for linking resources and concepts.
  7. A Research Object aggregates resources. It also aggregates annotations, which are associated with resources. The annotation bodies are RDF documents that use additional, possibly domain-specific vocabularies.
  8. A Research Object aggregates resources. It also aggregates annotations, which are associated with resources. The annotation bodies are RDF documents that use additional, possibly domain-specific vocabularies.
  9. Attribution is part of the RO model and myExperiment, but we are also developing something specifically to address this aspect of digital preservation and publishing… Nanopublications
  10. In wet-lab biology and other experimental sciences, we have addressed these questions in what we disseminate and how. The system is not perfect. It is flawed for real reproducibility, but it does give insight into how results were obtained. That is sufficient to make up our own minds on whether to use the results for our own hypotheses, or build on the methods. => Do we have a good digital equivalent?
  11. Workflows could be seen as an equivalent of wet lab protocols. Are they as good as Materials and Methods, better or worse? => Perhaps worse?
  12. For instance, can we all tell what this workflow is doing? - Do we miss things? => Incentive to do good
  13. And then: what is our incentive to make it as good or better? Is it nobility, or serving the greater good? => Getting on with it: publish
  14. Or is it helping me to my next Nature paper?
  15. Therefore we like to speak of ‘Useful Preservation’
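The notes on the Research Object model say that an RO aggregates resources, that annotations are associated with those resources, and that URIs are the one irreducible underpinning. A rough Python sketch of that idea is given below. The `ResearchObject` class, its method names and all `example.org` URIs are hypothetical; only the `ore:aggregates` property URI comes from the OAI-ORE vocabulary, and the sketch deliberately leaves out the AO terms a real implementation would use for annotations.

```python
from typing import Dict, List

# Property from the OAI-ORE terms vocabulary.
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"


class ResearchObject:
    """Hypothetical sketch: a URI-identified aggregation with annotations."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self.resources: List[str] = []
        self.annotations: List[Dict[str, str]] = []

    def aggregate(self, resource_uri: str) -> None:
        """Add a resource (workflow, data file, paper) to the aggregation."""
        if resource_uri not in self.resources:
            self.resources.append(resource_uri)

    def annotate(self, target_uri: str, body_uri: str) -> None:
        """Attach an annotation body (an RDF document) to an aggregated resource."""
        if target_uri not in self.resources:
            raise ValueError(f"{target_uri} is not aggregated by {self.uri}")
        self.annotations.append({"target": target_uri, "body": body_uri})

    def aggregation_ntriples(self) -> List[str]:
        """Serialize only the aggregation statements as N-Triples lines."""
        return [f"<{self.uri}> <{ORE_AGGREGATES}> <{r}> ." for r in self.resources]


# Illustrative URIs only; they do not point at a real Research Object.
ro = ResearchObject("http://example.org/ro/hello/")
ro.aggregate("http://example.org/ro/hello/workflow.t2flow")
ro.annotate(
    "http://example.org/ro/hello/workflow.t2flow",
    "http://example.org/ro/hello/annotations/description.rdf",
)
for line in ro.aggregation_ntriples():
    print(line)
```

Refusing to annotate a resource that is not aggregated is a design choice of this sketch, made so that every annotation stays attached to something the Research Object actually contains.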