A PROV encoding for provenance analysis using deductive rules (Datalog)

Paolo Missier, Newcastle University, UK
Khalid Belhajjame, University of Manchester, UK

IPAW’12
Santa Barbara, CA, June 2012


Wednesday, June 20, 2012
The W3C PROV effort
                           • A set of specifications -- to be finalized by end of 2012
   IPAW 2012 - P.Missier




A PROVenance graph
[Figure: PROV graph of a document-editing scenario, laid out from remote past (left) to recent past (right), with the editing phase highlighted.
 Entities: paper1, paper2, paper3, draft v1, draft comments, draft v2 (drafts carry attributes such as distribution=internal, status=draft, version=0.1).
 Activities: drafting, commenting, editing, reading.
 Agents: Bob, Bob-1, Bob-2, Alice (type=person; roles author, jr_editor, main_editor, editor).
 Edge labels: used, wasGeneratedBy, wasDerivedFrom, wasAssociatedWith, wasAttributedTo, actedOnBehalfOf, specializationOf.]




PROV Notation:

entity(draftV1, ["distribution"="internal", "status"="draft", "version"="0.1"])
entity(draftComments)
activity(commenting, comment_start, comment_end)
used(u1; commenting, draftV1, comm_d1_use)
wasGeneratedBy(g1; draftComments, commenting, comm_dc_gen)

PROV-N and Datalog encoding
PROV follows a relational data model:

entity(draftV1, ["distribution"="internal", "status"="draft", "version"="0.1"])
entity(draftComments)
activity(commenting, comment_start, comment_end)
used(u1; commenting, draftV1, comm_d1_use)
wasGeneratedBy(g1; draftComments, commenting, comm_dc_gen)

The corresponding Datalog EDB is straightforward:

entity(draftV1, draftV1Attrs).
attrList(draftV1Attrs, "distribution", "internal").
attrList(draftV1Attrs, "status", "draft").
attrList(draftV1Attrs, "version", "0.1").
entity(draftComments, nil).
activity(commenting, comment_start, comment_end, nil).
used(commenting, draftV1, comm_d1_use, nil).
wasGeneratedBy(draftComments, commenting, comm_dc_gen, nil).

Parser implementation in the ProvToolbox (GitHub)
(Thanks to Luc Moreau for the master PROV-N parser code)
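
The mapping from a PROV-N entity to EDB facts can be illustrated with a small Python sketch. This is not the ProvToolbox parser: the function name `entity_to_facts` and the `<id>Attrs` naming of attribute-list identifiers are assumptions made for illustration, following the pattern of the facts above.

```python
def entity_to_facts(entity_id, attrs=None):
    """Render one PROV-N entity as Datalog EDB facts (illustrative only).

    An entity without attributes gets the constant nil; otherwise each
    attribute becomes an attrList/3 fact keyed by a list identifier.
    """
    if not attrs:
        return [f"entity({entity_id}, nil)."]
    list_id = f"{entity_id}Attrs"  # assumed naming convention
    facts = [f"entity({entity_id}, {list_id})."]
    for name, value in attrs.items():
        facts.append(f'attrList({list_id}, "{name}", "{value}").')
    return facts


facts = entity_to_facts(
    "draftV1",
    {"distribution": "internal", "status": "draft", "version": "0.1"},
)
facts += entity_to_facts("draftComments")
print("\n".join(facts))
```

Moving attributes into a separate relation keeps every fact flat, which is what a Datalog engine expects; the price is one extra join whenever a rule needs an attribute value.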
PROV Constraints
• PROV-N provides a syntax
• PROV comes with a set of rules for the semantics of the model
  1. deductive rules
  2. constraints: they effectively define consistent provenance

Note: these constraints are still in flux at the time of this presentation

PROV constraints as Datalog rules
Goal of this work:
• to encode most PROV constraints as Datalog rules
  – (with some exceptions)

• Benefits:
  – A declarative specification with a deductive inference model
  – Therefore, a validator for PROV graphs
  – With a well-understood query model
  – Useful for rapid prototyping of graph traversal algorithms for provenance analysis

Note: the implementation uses the DLV system: http://www.dlvsystem.com

Example of deductive rules: traceability




[1,2] tracedTo(E2, E1)  :- wasDerivedFrom(E2,E1,_,_).
[3]   tracedTo(E, Ag)   :- wasAttributedTo(E,Ag,_,_).
[4]   tracedTo(E2, Ag1) :- wasGeneratedBy(E2,A,_,_),
                           wasAttributedTo(E2,Ag2,_,_),
                           actedOnBehalfOf(Ag2,Ag1,A,_).
[5]   tracedTo(E2, E1)  :- wasStartedBy(A,E1,_),
                           wasGeneratedBy(E2,A,_,_).
[6]   tracedTo(E3, E1)  :- tracedTo(E3, E2), tracedTo(E2,E1).
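
To make the inference concrete, the rules can be mimicked by a naive bottom-up evaluation in Python. This is only a sketch of the fixed-point idea, not how DLV evaluates programs; the relations are reduced to the arguments the rules actually join on, and the sample facts are hypothetical.

```python
def traced_to(derived, attributed, generated, delegated, started):
    """Naive bottom-up evaluation of the tracedTo rules.

    derived:    {(e2, e1)}       wasDerivedFrom
    attributed: {(e, ag)}        wasAttributedTo
    generated:  {(e, a)}         wasGeneratedBy
    delegated:  {(ag2, ag1, a)}  actedOnBehalfOf
    started:    {(a, e)}         wasStartedBy
    """
    traced = set(derived) | set(attributed)            # rules [1,2] and [3]
    traced |= {(e, ag1)                                # rule [4]
               for (e, a) in generated
               for (e_, ag2) in attributed if e_ == e
               for (ag2_, ag1, a_) in delegated
               if ag2_ == ag2 and a_ == a}
    traced |= {(e2, e1)                                # rule [5]
               for (a, e1) in started
               for (e2, a_) in generated if a_ == a}
    while True:                                        # rule [6], to fixed point
        new = {(x, z) for (x, y) in traced for (y_, z) in traced if y_ == y}
        if new <= traced:
            return traced
        traced |= new


# Hypothetical facts, loosely modelled on the drafting/editing example.
result = traced_to(
    derived={("draftV2", "draftV1")},
    attributed={("draftV1", "bob_1")},
    generated={("draftV1", "drafting"), ("draftV2", "editing")},
    delegated={("bob_1", "alice", "drafting")},
    started=set(),
)
for pair in sorted(result):
    print("tracedTo%s" % (pair,))
```

The loop stops as soon as an iteration adds no new pairs, which is guaranteed to happen because the set of possible pairs is finite.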


Computing the induced graph
query:
tracedTo(E, E1) ?

model:
tracedTo(draftV1, alice)      [3] attribution, delegation
tracedTo(draftV1, bob_1)      [2] (attribution)
tracedTo(draftV2, draftV1)    [1] (derivation)
tracedTo(draftV2, alice)      [5] transitivity
tracedTo(draftV2, bob_1)      [5] transitivity
tracedTo(bob_2, bob_1)        [1] (derivation)

[Figure: the example graph (drafting, editing, draft v1, draft v2, Bob-1, Alice, Bob-2) with its wasGeneratedBy, wasAssociatedWith, wasAttributedTo, actedOnBehalfOf and wasDerivedFrom edges, overlaid with the inferred wasTracedTo edges.]

Constraints: example




% anti-symmetry of specialization
false :- specializationOf(E3,E2), specializationOf(E2,E3), E2 != E3.

Interpretation:
a Datalog program whose facts satisfy the body of the rule has no model

Constraints vs. inference

“IF wasGeneratedBy(-;e, -, t1) and wasGeneratedBy(-;e, -, t2) hold, THEN t1=t2.”
⇓
false :- wasGeneratedBy(E,_,T1,_), wasGeneratedBy(E,_,T2,_), T1 != T2.
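
Read as a validator, the generation constraint amounts to looking for entities with two different generation times. A minimal Python sketch of that check follows; the tuple layout `(entity, activity, time)`, the function name and the sample facts are assumptions for illustration, and a non-empty result plays the role of deriving `false`.

```python
from collections import defaultdict


def generation_time_violations(was_generated_by):
    """Return the entities that have more than one distinct generation time.

    was_generated_by: iterable of (entity, activity, time) tuples.
    """
    times = defaultdict(set)
    for entity, _activity, time in was_generated_by:
        times[entity].add(time)
    return {e: sorted(ts) for e, ts in times.items() if len(ts) > 1}


# Hypothetical facts: draftV1 is reported as generated at two different times.
facts = [
    ("draftV1", "drafting", 5),
    ("draftComments", "commenting", 8),
    ("draftV1", "editing", 9),
]
violations = generation_time_violations(facts)
print(violations)
```

Unlike the Datalog constraint, which only signals that no model exists, this version also reports which entity caused the inconsistency.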


Limitations of constraints mapping



[Figure: an agent ag, an activity a and an entity e, connected by wasAttributedTo, wasAssociatedWith and wasGeneratedBy edges.]

- The rule above generates a set of relations
- Existential quantification on a
- Also, attributes from relations in the body are not merged into new attributes for the head:

    % derivation-use
    used(A,E1, nil, T) :- wasDerivedFrom(E2, E1,_, Attrs1),
                          wasGeneratedBy( E2, A, Attrs2, T).


Ad hoc provenance analysis -- examples
Find all pairs of agents, along with the length of each of the paths between them
- an embryonic form of “distance” between agents, expressing how strongly they are related

wasInformedBy(A2, A1, nil) :- wasGeneratedBy(E, A1, _, _),
                              used(A2, E, _, _).

relatedAgents0(Ag2, Ag1) :- wasInformedBy(A2, A1, _),
                            wasAssociatedWith(A2, Ag2, _, _),
                            wasAssociatedWith(A1, Ag1, _, _).

relatedAgents(Ag2, Ag1, 1) :- relatedAgents0(Ag2, Ag1).
relatedAgents(Ag3, Ag1, N) :- relatedAgents0(Ag3, Ag2),
                              relatedAgents(Ag2, Ag1, M),
                              #succ(M,N).

- note: this simple version assumes no cycles
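
The path-length computation can be sketched in Python as a level-by-level expansion. The relations are reduced to the arguments the rules join on, the sample facts are hypothetical, and the `max_len` bound is an addition of this sketch (not part of the rules above) so that it terminates even if the input does contain a cycle.

```python
def related_agents(generated, used, associated, max_len=10):
    """generated: {(e, a)}, used: {(a, e)}, associated: {(a, ag)}."""
    # wasInformedBy(A2, A1): A2 used an entity generated by A1.
    informed = {(a2, a1)
                for (e, a1) in generated
                for (a2, e_) in used if e_ == e}
    # relatedAgents0: agents associated with the two communicating activities.
    base = {(ag2, ag1)
            for (a2, a1) in informed
            for (act2, ag2) in associated if act2 == a2
            for (act1, ag1) in associated if act1 == a1}
    related = {(ag2, ag1, 1) for (ag2, ag1) in base}
    frontier = set(related)
    for n in range(2, max_len + 1):
        # Extend every path of length n-1 by one relatedAgents0 step.
        frontier = {(ag3, ag1, n)
                    for (ag3, ag2) in base
                    for (mid, ag1, _m) in frontier if mid == ag2}
        if not frontier:
            break
        related |= frontier
    return related


# Hypothetical facts for the drafting -> commenting -> editing chain.
pairs = related_agents(
    generated={("draftV1", "drafting"), ("draftComments", "commenting")},
    used={("commenting", "draftV1"), ("editing", "draftComments")},
    associated={("drafting", "bob_1"), ("commenting", "alice"),
                ("editing", "bob_2")},
)
for triple in sorted(pairs, key=lambda t: (t[2], t[0], t[1])):
    print(triple)
```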



Related agents

Query:
relatedAgents(Ag2, Ag1, N) ?

Result:
alice, bob_1, 1
bob_2, alice, 1
bob_2, bob_1, 2
...

[Figure: the example graph (drafting, commenting, editing; draft v1, draft comments, draft v2; Bob-1, Bob, Bob-2, Alice) overlaid with the inferred relatedAgents edges, labelled [1] and [2] by path length.]

Chain of responsibility
responsible(Ag, Act)  :- wasAssociatedWith(Act,Ag,_,_).

responsible(Ag1, Act) :- actedOnBehalfOf(Ag,Ag1,_,_),
                         responsible(Ag, Act).

Query:
responsible(Ag, Act) ?

Result:
alice, drafting
alice, commenting
alice, editing
bob, drafting
bob, editing
bob_1, drafting
bob_2, editing

[Figure: the example graph (Bob-1, Bob, Bob-2, Alice; drafting, commenting, editing) with its specializationOf, actedOnBehalfOf and wasAssociatedWith edges, overlaid with the inferred responsible edges.]
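
The two responsibility rules propagate responsibility up the delegation chain. A minimal Python sketch of that propagation is given below; the delegation facts are hypothetical, only the two rules shown on this slide are applied, and the pair `(ag, ag1)` follows the argument order of `actedOnBehalfOf(Ag,Ag1,_,_)`.

```python
def responsible(associated, delegated):
    """associated: {(activity, agent)}; delegated: {(ag, ag1)}.

    Mirrors the two rules: direct association, then propagation from an
    agent ag to the agent ag1 paired with it in actedOnBehalfOf.
    """
    resp = {(ag, act) for (act, ag) in associated}
    while True:
        new = {(ag1, act)
               for (ag, ag1) in delegated
               for (ag_, act) in resp if ag_ == ag}
        if new <= resp:
            return resp
        resp |= new


# Hypothetical facts: both versions of Bob act on behalf of Alice.
chain = responsible(
    associated={("drafting", "bob_1"), ("commenting", "alice"),
                ("editing", "bob_2")},
    delegated={("bob_1", "alice"), ("bob_2", "alice")},
)
for agent, activity in sorted(chain):
    print(agent, activity)
```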
Encoding temporal constraints




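[Figure: activity a1 with interval [a1Start, a1End] and activity a2 with interval [a2Start, a2End]; entity e1 is linked to a1 by wasGeneratedBy at time t_gen and to a2 by used at time t_u; a wasStartedBy edge links the two activities.]
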

false :- precedes(T1,T2), precedes(T2,T1), T1 != T2.   % anti-symmetry
precedes(T1,T3) :- precedes(T1,T2), precedes(T2,T3).   % transitivity

% Generation-precedes-usage
precedes(T2,T1) :- used(_, E, _, T1), wasGeneratedBy(E, _, _, T2), T1 != nil, T2 != nil.

% A usage falls between the start and the end of the using activity
precedes(T1, UT) :- activity(A, T1, _, _), used(A, _, _, UT), T1 != nil, UT != nil.
precedes(UT, T2) :- activity(A, _, T2, _), used(A, _, _, UT), T2 != nil, UT != nil.

% The start of the starting activity precedes the start of the started activity
precedes(ST1, ST2) :- wasStartedBy(A2,A1,_), activity(A1, ST1,_,_), activity(A2, ST2, _, _).
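
As a rough illustration of how these rules act as a consistency check, the Python sketch below derives `precedes` pairs over symbolic time labels, closes them transitively and reports pairs that violate anti-symmetry. It is a simplification under stated assumptions: the `wasStartedBy` rule is left out for brevity, `None` stands in for `nil`, and the facts are hypothetical.

```python
def temporal_check(activities, used, generated):
    """activities: {a: (start, end)}; used: {(a, e, t)}; generated: {(e, a, t)}.

    Returns (precedes, conflicts), where conflicts holds the pairs (x, y)
    with x != y for which both orders were derived.
    """
    prec = set()
    for (a, e, t_use) in used:
        if t_use is None:
            continue
        # Generation-precedes-usage.
        for (e_, _a, t_gen) in generated:
            if e_ == e and t_gen is not None:
                prec.add((t_gen, t_use))
        # Usage lies within the interval of the using activity.
        start, end = activities.get(a, (None, None))
        if start is not None:
            prec.add((start, t_use))
        if end is not None:
            prec.add((t_use, end))
    while True:  # transitivity, to fixed point
        new = {(x, z) for (x, y) in prec for (y_, z) in prec if y_ == y}
        if new <= prec:
            break
        prec |= new
    conflicts = {(x, y) for (x, y) in prec if x != y and (y, x) in prec}
    return prec, conflicts


acts = {"a1": ("a1Start", "a1End"), "a2": ("a2Start", "a2End")}
ok_prec, ok_conflicts = temporal_check(
    acts, used={("a2", "e1", "t_u")}, generated={("e1", "a1", "t_gen")})

# Hypothetical inconsistent record: e2 is generated at t_u but used at t_gen.
_, bad_conflicts = temporal_check(
    acts,
    used={("a2", "e1", "t_u"), ("a1", "e2", "t_gen")},
    generated={("e1", "a1", "t_gen"), ("e2", "a2", "t_u")})
print(sorted(ok_prec))
print(sorted(bad_conflicts))
```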




Query: precedes(A,B) ?

Detecting illegal cycles
                           Derivation cycles are not allowed:

                           derivable(E2, E1) :-     wasDerivedFrom(E2, E1,_,_).
                           derivable(E2, E1) :-     derivable(E2, E0), derivable(E0, E1).

                           % no-cycles constraint
                           false :- derivable(E2, E1), derivable(E1, E2), E1 != E2.
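
The same check can be sketched procedurally: compute the transitive closure of wasDerivedFrom and look for two distinct entities that are derivable from each other. The Python below is an illustrative sketch with hypothetical entity names, not the DLV evaluation itself.

```python
def derivation_cycles(was_derived_from):
    """Return pairs (e2, e1), e1 != e2, that are mutually derivable.

    was_derived_from: set of (e2, e1) pairs. An empty result means the
    no-cycles constraint is satisfied.
    """
    derivable = set(was_derived_from)
    while True:  # transitive closure, to fixed point
        new = {(x, z) for (x, y) in derivable
               for (y_, z) in derivable if y_ == y}
        if new <= derivable:
            break
        derivable |= new
    return {(x, y) for (x, y) in derivable if x != y and (y, x) in derivable}


acyclic = derivation_cycles({("v2", "v1"), ("v3", "v2")})
cyclic = derivation_cycles({("v2", "v1"), ("v3", "v2"), ("v1", "v3")})
print(acyclic)
print(sorted(cyclic))
```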




Query: derivable(A,B) ?
returns no model when the graph contains a derivation cycle

Summary
• A Datalog encoding for PROVenance graphs
  – PROV-N mapped to a database of facts
  – PROV constraints mapped to Datalog rules
  – implemented using DLV, a former research system that now has a startup home

• Why this is appealing:
  – straightforward mapping from PROV-N
  – most constraints easily encoded
  – well-understood declarative style, well-established computational model
  – rapid prototyping of graph traversal rules and queries for provenance analysis

• Still only a proof of concept
  – constraints will evolve; a W3C Note is to be issued with a final encoding
  – small-scale examples only; relies on DLV optimizations, which are untested

• Potential for a stronger implementation
  – DLV can be embedded into Java
  – comes with a variety of front-end reasoners, e.g. constraint solvers

ReComp: optimising the re-execution of analytics pipelines in response to cha...Paolo Missier
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff UniversityPaolo Missier
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Paolo Missier
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...Paolo Missier
 


Ipaw12 datalog paper talk

  • 1. A PROV encoding for provenance analysis using deductive rules (Datalog). Paolo Missier (Newcastle University, UK), Khalid Belhajjame (University of Manchester, UK). IPAW’12, Santa Barbara, CA, June 2012. Wednesday, June 20, 2012
  • 2. The W3C PROV effort • A set of specifications, to be finalized by end of 2012. IPAW 2012 - P.Missier
  • 4. A PROVenance graph
      [Figure: provenance graph of an editing phase, spanning remote past to recent past. Node labels: paper1, paper2, paper3, draft v1, draft comments, draft v2, drafting, commenting, editing, reading, Alice, Bob, Bob-1, Bob-2. Edge labels: used, wasGeneratedBy, wasDerivedFrom, wasAssociatedWith, wasAttributedTo, actedOnBehalfOf, specializationOf. Attribute labels: type=person, role=author, role=main_editor, role=jr_editor, role=editor, distribution=internal, status=draft, version=0.1.]
      PROV Notation:
      entity(draftV1, ["distribution"="internal", "status"="draft", "version"="0.1"])
      entity(draftComments)
      activity(commenting, comment_start, comment_end)
      used(u1; commenting, draftV1, comm_d1_use)
      wasGeneratedBy(g1; draftComments, commenting, comm_dc_gen)
  • 5. PROV-N and Datalog encoding
      PROV follows a relational data model:
      entity(draftV1, ["distribution"="internal", "status"="draft", "version"="0.1"])
      entity(draftComments)
      activity(commenting, comment_start, comment_end)
      used(u1; commenting, draftV1, comm_d1_use)
      wasGeneratedBy(g1; draftComments, commenting, comm_dc_gen)
      The corresponding Datalog EDB is straightforward:
      entity(draftV1, draftV1Attrs).
      attrList(draftV1Attrs, "distribution", "internal").
      attrList(draftV1Attrs, "status", "draft").
      attrList(draftV1Attrs, "version", "0.1").
      entity(draftComments, nil).
      activity(commenting, comment_start, comment_end, nil).
      used(commenting, draftV1, comm_d1_use, nil).
      wasGeneratedBy(draftComments, commenting, comm_dc_gen, nil).
      Parser implementation in the ProvToolbox (GitHub). (Thanks to Luc Moreau for the master PROV-N parser code.)
  • 6. PROV Constraints
  • 7. PROV Constraints
      • PROV-N provides a syntax
      • PROV comes with a set of rules for the semantics of the model:
        1. deductive rules
        2. constraints: they effectively define consistent provenance
      Note: these constraints are still in flux at the time of this presentation.
  • 8. PROV constraints as Datalog rules
      Goal of this work:
      • to encode most PROV constraints as Datalog rules (with some exceptions)
      • Benefits:
        – A declarative specification with a deductive inference model
        – Therefore, a validator for PROV graphs
        – With a well-understood query model
        – Useful for rapid prototyping of graph traversal algorithms for provenance analysis
      Note: the implementation uses DLV: http://www.dlvsystem.com
  • 9. PROV constraints as Datalog rules (same text as slide 8, with a figure added)
      [Figure: nodes a1 [a1Start, a1End], e1 and a2 [a2Start, a2End], with edges wasStartedBy, wasGeneratedBy [t_gen] and used [t_u].]
  • 10. Example of deductive rules: traceability
      [1,2] tracedTo(E2, E1) :- wasDerivedFrom(E2,E1,_,_).
      [3] tracedTo(E, Ag) :- wasAttributedTo(E,Ag,_,_).
      [4] tracedTo(E2, Ag1) :- wasGeneratedBy(E2,A,_,_), wasAttributedTo(E2,Ag2,_,_), actedOnBehalfOf(Ag2,Ag1,A,_).
      [5] tracedTo(E2, E1) :- wasStartedBy(A,E1,_), wasGeneratedBy(E2,A,_,_).
      [6] tracedTo(E3, E1) :- tracedTo(E3, E2), tracedTo(E2,E1).
  • 11. Computing the induced graph
      Query: tracedTo(E, E1) ?
      Model:
      tracedTo(draftV1, alice)    [3] attribution, delegation
      tracedTo(draftV1, bob_1)    [2] (attribution)
      tracedTo(draftV2, draftV1)  [1] (derivation)
      tracedTo(draftV2, alice)    [5] transitivity
      tracedTo(draftV2, bob_1)    [5] transitivity
      tracedTo(bob_2, bob_1)      [1] (derivation)
      [Figure: the induced graph. Node labels: Alice, Bob-1, Bob-2, draft v1, draft v2, drafting, editing. Edge labels: wasDerivedFrom, actedOnBehalfOf, wasAttributedTo, wasAssociatedWith, wasGeneratedBy, wasTracedTo.]
  • 12. Constraints: example
      % anti-symmetry of specialization
      false :- specializationOf(E3,E2), specializationOf(E2,E3), E2 != E3.
      Interpretation: a Datalog program that satisfies the body of the rule has no model.
      Constraints vs. inference:
      “IF wasGeneratedBy(-;e, -, t1) and wasGeneratedBy(-;e, -, t2) hold, THEN t1=t2.”
      ⇓
      false :- wasGeneratedBy(E,_,T1,_), wasGeneratedBy(E,_,T2,_), T1 != T2.
  • 13. Limitations of constraints mapping
      [Figure: nodes ag, a and e, with edges wasAttributedTo, wasAssociatedWith and wasGeneratedBy.]
      - The rule above generates a set of relations
      - Existential quantification on a
      - Also, attributes from relations in the body are not merged into new attributes for the head:
      % derivation-use
      used(A,E1, nil, T) :- wasDerivedFrom(E2, E1,_, Attrs1), wasGeneratedBy( E2, A, Attrs2, T).
  • 14. Ad hoc provenance analysis: examples
      Find all pairs of agents, along with the length of each of the paths among them: an embryonic form of “distance” among agents, expressing how strongly they are related.
      wasInformedBy(A2, A1,nil) :- wasGeneratedBy( E, A1, _, _), used( A2, E, _, _).
      relatedAgents0(Ag2, Ag1) :- wasInformedBy(A2, A1,_), wasAssociatedWith(A2,Ag2,_,_), wasAssociatedWith(A1,Ag1,_,_).
      relatedAgents(Ag2, Ag1, 1) :- relatedAgents0(Ag2, Ag1).
      relatedAgents(Ag3, Ag1, N) :- relatedAgents0(Ag3, Ag2), relatedAgents(Ag2, Ag1, M), #succ(M,N).
      Note: this simple version assumes no cycles.
  • 15. Related agents
      Query: relatedAgents(Ag2, Ag1, N) ?
      Results:
      alice, bob_1, 1
      bob_2, alice, 1
      bob_2, bob_1, 2
      ...
      [Figure: the example graph annotated with relatedAgents [1] and relatedAgents [2] edges. Node labels: Alice, Bob, Bob-1, Bob-2, draft v1, draft comments, draft v2, drafting, commenting, editing. Other edge labels: specializationOf, actedOnBehalfOf, wasAssociatedWith, wasGeneratedBy, used.]
  • 16. Chain of responsibility
      responsible(Ag, Act) :- wasAssociatedWith(Act,Ag,_,_).
      responsible(Ag1, Act) :- actedOnBehalfOf(Ag,Ag1,_,_), responsible(Ag, Act).
      Query: responsible(Ag, Act)?
      Results:
      alice, drafting
      alice, commenting
      alice, editing
      bob, drafting
      bob, editing
      bob_1, drafting
      bob_2, editing
      [Figure: the example graph annotated with responsible edges. Node labels: Alice, Bob, Bob-1, Bob-2, drafting, commenting, editing. Other edge labels: specializationOf, actedOnBehalfOf, wasAssociatedWith.]
  • 17. Encoding temporal constraints
      [Figure: nodes a1 [a1Start, a1End], e1 and a2 [a2Start, a2End], with edges wasStartedBy, wasGeneratedBy [t_gen] and used [t_u].]
  • 18. Encoding temporal constraints
      false :- precedes(T1,T2), precedes(T2,T1), T1 != T2.      % anti-symmetry
      precedes(T1,T3) :- precedes(T1,T2), precedes(T2,T3).      % transitivity
      % Generation-precedes-usage
      precedes(T2,T1) :- used( _, E, _,T1), wasGeneratedBy(E, _, _, T2), T1 != nil, T2 != nil.
      precedes(T1, UT) :- activity(A, T1, _, _), used(A,_, _,UT), T1 != nil, UT != nil.
      precedes(UT, T2) :- activity(A, _, T2, _), used(A,_, _,UT), T2 != nil, UT != nil.
      precedes(ST1, ST2) :- wasStartedBy(A2,A1,_), activity(A1, ST1,_,_), activity(A2, ST2, _, _).
      [Figure: nodes a1 [a1Start, a1End], e1 and a2 [a2Start, a2End], with edges wasStartedBy, wasGeneratedBy [t_gen] and used [t_u].]
  • 19. Encoding temporal constraints (same rules as slide 18)
      Query: precedes(A,B) ?
  • 20. Detecting illegal cycles
      Derivation cycles are not allowed:
      derivable(E2, E1) :- wasDerivedFrom(E2, E1,_,_).
      derivable(E2, E1) :- derivable(E2, E0), derivable(E0, E1).
      % no-cycles constraint
      false :- derivable(E2, E1), derivable(E1, E2), E1 != E2.
      Query: derivable(A,B) ? returns no model
  • 21. Summary
      • A Datalog encoding for PROVenance graphs
        – PROV-N mapped to a database of facts
        – PROV constraints mapped to Datalog rules
        – implemented using DLV, a former research system with a startup home
      • Why this is appealing:
        – straightforward mapping from PROV-N
        – most constraints easily encoded
        – well-understood declarative style, well-established computational model
        – rapid prototyping of graph traversal rules and queries for provenance analysis
      • Still only a proof of concept
        – constraints will evolve, W3C Note to be issued with a final encoding
        – small scale examples. Relies on DLV optimizations, which are untested
      • Potential for a stronger implementation
        – DLV can be embedded into Java
        – comes with a variety of front-end reasoners, e.g. constraint solvers
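
As a rough illustration of the deductive model behind the tracedTo and derivable rules in the slides, the Python sketch below evaluates them by naive fixpoint iteration. It is not the DLV implementation used in the talk. The function names and the sample facts are hypothetical, relations are reduced to plain pairs (identifiers, times and attribute lists are dropped), and only rules [1,2], [3], [6] and the no-cycles constraint are covered.

```python
def transitive_closure(edges):
    """Naive fixpoint: apply the transitivity rule until nothing new is derived."""
    closure = set(edges)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def traced_to(was_derived_from, was_attributed_to):
    """Rules [1,2] and [3] seed tracedTo; rule [6] closes it under transitivity."""
    return transitive_closure(set(was_derived_from) | set(was_attributed_to))


def derivation_cycles(was_derived_from):
    """Pairs that would violate the no-cycles constraint:
    derivable(E2, E1), derivable(E1, E2), E1 != E2."""
    derivable = transitive_closure(was_derived_from)
    return {(a, b) for (a, b) in derivable if a != b and (b, a) in derivable}


# Hypothetical facts, loosely modelled on the draft example in the slides.
was_derived_from = {("draftV2", "draftV1")}
was_attributed_to = {("draftV1", "bob_1")}

if __name__ == "__main__":
    for pair in sorted(traced_to(was_derived_from, was_attributed_to)):
        print("tracedTo", pair)
    # Adding a back edge creates a derivation cycle, which the check reports.
    print(derivation_cycles(was_derived_from | {("draftV1", "draftV2")}))
```

A Datalog engine returns "no model" when a constraint body is satisfied. This sketch instead returns the offending pairs, which can be easier to inspect when prototyping.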