1. ResourceSync - An Introduction
Todd Carpenter
Executive Director, NISO
ALCTS Continuing Resources Standards Forum
Sunday, June 24, 2012
With thanks to Herbert Van de Sompel and Robert Sanderson (LANL)
2. @TAC_NISO Twitter Highlights
• Presenting this morning on the ResourceSync project at ALCTS Continuing Resources Standards
Forum #ALCTSCRS #ala12
• I’m pre-tweeting my slides during my #rsync presentation. Slides here: _________ #ala12
• NISO mission is to develop and maintain technical standards related to information, documentation,
discovery and distribution of content #ala12
• Standards are all around us, even if we don't notice them, especially in books. Things like page
numbers, paper, binding, even spelling is standardized. #NISO #ala12
• Machines don’t talk like people do. Then again some people don’t talk like other people do,
particularly teenagers #ala12
• So where did the ResourceSync project start? #NISO approached OAI about updating the PMH
protocol. #ala12
• The #NISO / OAI ResourceSync project was possible through the generous support of the Alfred P.
Sloan Foundation. Thank you! #ala12
• What is RSync trying to solve: Source Server has resources that change. Destination servers want to
leverage some or all of Source on regular ongoing basis in near-real-time & at web scale. #ala12
• Synchronization can be good enough or perfect, and it can be fast or fast enough. #ala12
• RSync is studying a number of existing protocols to determine which protocol (or combination of protocols)
can best meet needs. We have a bias against developing a new spec from scratch. #ala12
• There are several models for synchronizing content: pull, push, conditional pull, mediated feed and pull,
and a mix of feed/push/pull/service models. #ala12
• The goal of ResourceSync is to find the model that most efficiently distributes the content, while
limiting the tax on the source system. #ala12
• This is very early days in the process of standards development. We’re still in the incubation stage.
Consensus and adoption phases will come in 2013 and beyond. #ala12
• We hope to have a beta specification of ResourceSync available by the end of 2012 #ala12
3. About NISO
Non-profit industry trade association accredited by ANSI
Mission of developing and maintaining technical standards related to information, documentation, discovery and distribution of published materials and media
Volunteer-driven organization: 400+ volunteers spread across the world
4. Standards are familiar, even if you don’t notice
June 23, 2012 · ALCTS CRS Standards IG - ALA Annual 2012
5. Machines don’t talk like people do
6. Machines talk like this
7. How did we get here?
• OAI-PMH Protocol
– Developed in 200X
– Developed by Herbert van de Sompel, Carl Lagoze and the OAI team
– Fairly wide adoption in scholarly community
• In spring 2011, NISO approached OAI to discuss updating PMH Protocol
• Response was “Let’s try something else more in line with more modern technology”
8. A partnership is born
• Agreement to launch RSync as a NISO standards process
• Partnership on grant application
• OAI team comprised the core technology team
9. Special thanks are due to...
10. What are we trying to solve?
Consideration:
Source (server) A has resources that change over time: they get created, modified, deleted, moved, …
Destination (servers) X, Y, and Z leverage (some) resources of Source A.
Problem:
Destinations want to keep in step with the resource changes at Source A: resource synchronization.
Task of ResourceSync effort:
Design an approach for resource synchronization aligned with the Web Architecture that has a fair chance of adoption by different communities.
The approach must scale better than recurrent HTTP HEAD/GET on resources.
11. Use cases differ
How good is the synchronization? Perfect or good enough
How fast is the synchronization? Fast or fast enough
12. 3 distinct needs regarding resource synchronization
Baseline matching: An approach to allow a Destination that wants to start synchronizing with a Source to perform an initial catch up – Dump.
Incremental resource synchronization: An approach to allow a Destination to remain up-to-date regarding changes at the Source.
Audit: An approach to allow checking whether a Destination is in sync with a Source – Inventory.
=> All 3 are considered in scope for the effort
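The Audit need amounts to comparing two inventories. The following is a minimal sketch, assuming (hypothetically) that an inventory is just a mapping from resource URI to a content digest; the names `digest`, `audit` and `in_sync` and the use of SHA-256 are illustrative choices, not part of any ResourceSync specification.

```python
import hashlib


def digest(content: bytes) -> str:
    """Digest used to decide whether two copies of a resource match."""
    return hashlib.sha256(content).hexdigest()


def audit(source_inventory: dict[str, str],
          dest_inventory: dict[str, str]) -> dict[str, list[str]]:
    """Compare a Source inventory with a Destination inventory.

    Both inventories map resource URI -> digest (a hypothetical format).
    Returns the URIs the Destination lacks ("missing"), holds in an
    outdated form ("stale"), or still holds after the Source dropped
    them ("extra").
    """
    missing = sorted(u for u in source_inventory if u not in dest_inventory)
    stale = sorted(
        u for u, d in source_inventory.items()
        if u in dest_inventory and dest_inventory[u] != d
    )
    extra = sorted(u for u in dest_inventory if u not in source_inventory)
    return {"missing": missing, "stale": stale, "extra": extra}


def in_sync(report: dict[str, list[str]]) -> bool:
    """The Destination is in sync when the audit finds no differences."""
    return not any(report.values())
```

A Destination starting from nothing would see every Source URI under "missing"; that is the situation Baseline matching (a Dump) addresses, while Incremental synchronization aims to keep the report empty between audits.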
13. ResourceSync Working Group
Herbert Van de Sompel (Chair), Los Alamos National Laboratory
Todd Carpenter (Co-Chair), National Information Standards Organization (NISO)
Nettie Lagace, National Information Standards Organization (NISO)
Manuel Bernhardt, Delving B.V.
Kevin Ford, Library of Congress
Bernhard Haslhofer, Cornell University
Richard Jones, Joint Information Systems Committee (JISC)
Martin Klein, Los Alamos National Laboratory
Graham Klyne, Joint Information Systems Committee (JISC)
Carl Lagoze, Cornell University
Stuart Lewis, Joint Information Systems Committee (JISC)
Peter Murray, Lyrasis
Michael Nelson, Old Dominion University
David Rosenthal, Stanford University
Christian Sadilek, Red Hat
Shlomo Sanders, Ex Libris, Inc.
Robert Sanderson, Los Alamos National Laboratory
Sjoerd Siebinga, Delving B.V.
Ed Summers, Library of Congress
Simeon Warner, Cornell University
Jeff Young, OCLC Online Computer Library Center
15. Change Notification - Protocols
• Atom PubSubHubbub (PuSH)
• XMPP
– PubSub extension
– BoSH (XMPP over HTTP)
• Comet / HTTP Streaming
– Open an HTTP connection and keep reading from it
– Bayeux Protocol
• Long Polling
– Keep HTTP connection open until a message, then reopen
– BoSH, Bayeux option
• WebSockets
– NullMQ / ZeroMQ
– XMPP over WebSockets?
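To make the Long Polling entry concrete: the client issues a request that the server holds open until a message is available (or a timeout expires), handles the result, and immediately reopens. The sketch below hides the HTTP request behind a hypothetical `fetch` callable so the loop itself can be shown without a server, and uses an in-memory queue to play the server in `demo`. It illustrates the pattern only; it is not the Bayeux or BoSH wire protocol.

```python
import queue
from typing import Callable, Optional


def long_poll(fetch: Callable[[float], Optional[str]],
              handle: Callable[[str], None],
              timeout: float = 30.0,
              max_rounds: Optional[int] = None) -> int:
    """Run a long-polling loop; return the number of messages handled.

    fetch(timeout) stands in for an HTTP request that the server holds
    open until a message arrives (returns it) or the timeout expires
    (returns None). In both cases the client reopens right away.
    """
    handled = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        message = fetch(timeout)  # blocks while the "connection" is open
        if message is not None:
            handle(message)
            handled += 1
        # No sleep before reopening: that is what separates long polling
        # from periodic short polling.
    return handled


def demo() -> list[str]:
    """Drive long_poll with an in-memory queue playing the server."""
    events: "queue.Queue[str]" = queue.Queue()
    for event in ["create /a", "update /a"]:
        events.put(event)

    def fake_fetch(timeout: float) -> Optional[str]:
        try:
            # Hold the request briefly, as a server would.
            return events.get(timeout=min(timeout, 0.05))
        except queue.Empty:
            return None  # timed out with no message; caller reopens

    seen: list[str] = []
    long_poll(fake_fetch, seen.append, max_rounds=3)
    return seen
```

The third round in `demo` times out with no message, which is the normal idle case for long polling: the client simply reopens.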
16. Incremental Synchronization
Change Notification (CN): Alert that something happened (create, update, delete)
Content Transfer (CT): Transfer of just the change or the full resource
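The split between Change Notification and Content Transfer can be illustrated with a small data structure. In this sketch, which assumes a hypothetical `ChangeNotification` record and an in-memory Destination store (neither taken from a ResourceSync specification), a notification says what happened to which URI, and the Destination obtains content through a separate transfer step, modeled as a `transfer` callable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class ChangeType(Enum):
    CREATE = "create"
    UPDATE = "update"
    DELETE = "delete"


@dataclass(frozen=True)
class ChangeNotification:
    """CN: says that something happened to a resource; carries no content."""
    uri: str
    change: ChangeType


def apply_notification(store: dict[str, bytes],
                       cn: ChangeNotification,
                       transfer: Callable[[str], bytes]) -> None:
    """Bring a Destination store in step with one Source change.

    For create/update the Destination performs a Content Transfer (CT),
    here of the full resource; a delete needs no transfer at all.
    """
    if cn.change is ChangeType.DELETE:
        store.pop(cn.uri, None)
    else:
        store[cn.uri] = transfer(cn.uri)
```

For instance, with `source = {"http://example.org/a": b"v1"}`, calling `apply_notification(dest, ChangeNotification("http://example.org/a", ChangeType.CREATE), source.__getitem__)` copies the resource into `dest`. Transferring just the change would replace `transfer` with a step that fetches and applies a diff; that variant is not shown.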
17. Trivial versus Optimal Approaches
• Trivial Approach - Retrieve & Compare
• Optimal Approach - Push only the change to only the destinations monitoring the resource
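The contrast between the two approaches can be sketched in a few lines. Both pieces below are hypothetical illustrations: `trivial_sync` re-retrieves every resource and compares it with the local copy, while `OptimalSource` keeps a per-resource list of monitoring destinations and pushes the changed resource only to them (a real system might push a diff rather than the new content).

```python
from collections import defaultdict
from typing import Callable

Push = Callable[[str, bytes], None]  # receives (uri, new content)


def trivial_sync(source: dict[str, bytes], local: dict[str, bytes]) -> int:
    """Trivial approach: retrieve & compare every resource.

    Returns the number of retrievals, which is always len(source),
    however few resources actually changed.
    """
    retrievals = 0
    for uri, content in source.items():  # one GET per resource
        retrievals += 1
        if local.get(uri) != content:
            local[uri] = content
    for uri in [u for u in local if u not in source]:
        del local[uri]  # deletions are only noticed by comparison
    return retrievals


class OptimalSource:
    """Optimal approach: push only the change, only to monitoring destinations."""

    def __init__(self) -> None:
        self.resources: dict[str, bytes] = {}
        self.monitors: defaultdict[str, list[Push]] = defaultdict(list)

    def monitor(self, uri: str, push: Push) -> None:
        """A Destination registers interest in one resource."""
        self.monitors[uri].append(push)

    def update(self, uri: str, content: bytes) -> int:
        """Record a change and push it; return how many pushes were sent."""
        self.resources[uri] = content
        targets = self.monitors.get(uri, [])
        for push in targets:
            push(uri, content)
        return len(targets)
```

The cost of the trivial approach grows with the size of the Source, while the optimal approach's cost grows only with the number of changes times the number of interested destinations.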
18. More advanced options
• Trivial Approach plus Conditional GET:
– Retrieve each resource only if it has changed
– Essentially this is a Change Notification Pull
– Not scalable: strain on Source systems, and no way to know of new resources
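A conditional GET lets the Source answer "not modified" without resending content, although the Destination must still ask about every resource it already knows. The self-contained sketch below uses only the Python standard library: it starts a throwaway local HTTP server that sets an `ETag` header, and a client that sends `If-None-Match`, expecting `304 Not Modified` while the resource is unchanged. The resource path, the `RESOURCES` table and the use of a SHA-256 digest as the ETag are assumptions for illustration.

```python
import hashlib
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical Source content, keyed by request path.
RESOURCES = {"/record/1": b"<record>v1</record>"}


def etag_for(body: bytes) -> str:
    """Strong ETag derived from the content (an illustrative choice)."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


class SourceHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = RESOURCES.get(self.path)
        if body is None:
            self.send_error(404)
            return
        etag = etag_for(body)
        if self.headers.get("If-None-Match") == etag:
            self.send_response(304)  # unchanged: headers only, no body
            self.send_header("ETag", etag)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("ETag", etag)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging for the demo


def start_source() -> tuple[HTTPServer, int]:
    """Serve RESOURCES on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), SourceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def conditional_get(port: int, path: str, etag: Optional[str]):
    """GET path, sending If-None-Match when an ETag is known.

    Returns (status, etag, body); body is None on 304 Not Modified.
    """
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    headers = {"If-None-Match": etag} if etag else {}
    conn.request("GET", path, headers=headers)
    resp = conn.getresponse()
    body = resp.read()
    conn.close()
    return resp.status, resp.getheader("ETag"), body if resp.status == 200 else None
```

After `server, port = start_source()`, a first `conditional_get(port, "/record/1", None)` should return 200 with the body and its ETag; repeating the call with that ETag should return 304 until `RESOURCES` changes. Note what the sketch cannot do, matching the slide: a resource added under a new path is invisible unless the Destination already knows to ask for it.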
19. More advanced options
Simplest Workable Model: Introduce a Feed of change notifications for all resources
Atom, RSS, OAI-PMH, SiteMaps, etc.
=> Still not efficient: no way to know when to pull
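Sitemaps are one of the feed formats listed. As a sketch of the simplest workable model, the function below parses a Sitemap document (namespace `http://www.sitemaps.org/schemas/sitemap/0.9`, with optional `<lastmod>` values) and returns the URLs changed since the Destination's last visit. The `changed_since` name and the `SAMPLE` document are hypothetical. The sketch also exhibits the limitation on the slide: the Destination learns of changes only when it chooses to re-read the feed.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def changed_since(sitemap_xml: str, last_sync: datetime) -> list[str]:
    """Return <loc> values whose <lastmod> is later than last_sync.

    last_sync must be timezone-aware. Entries without <lastmod> are
    returned too, since the Destination cannot tell whether they changed.
    """
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if lastmod is None:
            changed.append(loc)
            continue
        modified = datetime.fromisoformat(lastmod)
        if modified.tzinfo is None:
            modified = modified.replace(tzinfo=timezone.utc)  # assume UTC
        if modified > last_sync:
            changed.append(loc)
    return changed


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.org/a</loc><lastmod>2012-06-01T09:00:00+00:00</lastmod></url>
  <url><loc>http://example.org/b</loc><lastmod>2012-06-22T15:30:00+00:00</lastmod></url>
  <url><loc>http://example.org/c</loc></url>
</urlset>"""
```

With a cutoff of 15 June 2012 (UTC), only `http://example.org/b` (modified later) and `http://example.org/c` (no `<lastmod>`) are reported.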
20. More advanced options
Feed Extension Solution: Continue the Feed paradigm, but introduce an aggregating service and a ping notification to re-pull (simulated push)
Only advantageous if the Source already supports a Feed
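The feed extension idea can be pictured as follows: the Source keeps publishing its Feed but pings an aggregating hub when the Feed changes, and the hub tells subscribed Destinations to re-pull. The in-memory sketch below is hypothetical throughout (`Hub`, `subscribe`, `ping`, `Destination` and `re_pull` are illustrative names, not the PubSubHubbub API). It shows why the model only pays off when a Feed already exists: the ping carries no content, so the Feed remains what Destinations actually read.

```python
from typing import Callable

Feed = list[str]  # hypothetical: a feed is an append-only list of change entries


class Hub:
    """Aggregating service: relays 'feed changed' pings to subscribers."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Callable[[], None]]] = {}

    def subscribe(self, feed_url: str, on_ping: Callable[[], None]) -> None:
        self.subscribers.setdefault(feed_url, []).append(on_ping)

    def ping(self, feed_url: str) -> None:
        # Simulated push: the ping only says "re-pull now".
        for on_ping in self.subscribers.get(feed_url, []):
            on_ping()


class Destination:
    def __init__(self, feeds: dict[str, Feed], feed_url: str) -> None:
        self.feeds = feeds  # stands in for HTTP access to the Source's Feed
        self.feed_url = feed_url
        self.seen = 0  # number of feed entries already processed
        self.applied: Feed = []

    def re_pull(self) -> None:
        """Fetch the Feed and apply only the entries not seen before."""
        feed = self.feeds[self.feed_url]
        self.applied.extend(feed[self.seen:])
        self.seen = len(feed)
```

A Destination registers with `hub.subscribe(feed_url, dest.re_pull)`; each time the Source appends to its Feed and calls `hub.ping(feed_url)`, the Destination pulls just the new entries instead of polling on a schedule.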
21. The lifecycle of standards
[Figure: standards lifecycle diagram, marked “You are here”]
22. Ongoing Research
• Change Notification - XMPP & XMPP PubSub & bleeps
– LANL
– Ongoing experiment with Live DBpedia
• Change Notification - Comet / HTTP Streaming & bleeps
– ODU
– Bayeux Protocol via Faye implementation
• Change Notification - Change Simulator
– Cornell U
– Generate configurable change notifications
– Use as standardized input to different systems for testing
• Baseline Matching & Audit
– Cornell U
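The change simulator is described only as generating configurable change notifications for use as standardized test input. A seeded random generator captures the "standardized" part: the same configuration always yields the same event stream, so different systems can be fed identical input. Everything below (the `simulate_changes` name, its parameters, the create/update/delete weighting and the URI pattern) is a hypothetical sketch, not the Cornell implementation.

```python
import random
from collections.abc import Iterator

Event = tuple[int, str, str]  # (sequence number, change type, URI)


def simulate_changes(n_events: int,
                     seed: int = 0,
                     initial_resources: int = 10,
                     weights: tuple[float, float, float] = (0.2, 0.7, 0.1),
                     base_uri: str = "http://example.org/res/") -> Iterator[Event]:
    """Yield a reproducible stream of create/update/delete notifications.

    weights gives the relative odds of create, update and delete.
    Updates and deletes only ever target resources that currently exist.
    """
    rng = random.Random(seed)  # private RNG: same seed -> same stream
    live = [f"{base_uri}{i}" for i in range(initial_resources)]
    next_id = initial_resources
    for seq in range(n_events):
        change = rng.choices(["create", "update", "delete"], weights=weights)[0]
        if change != "create" and not live:
            change = "create"  # nothing left to update or delete
        if change == "create":
            uri = f"{base_uri}{next_id}"  # new URIs are never reused
            next_id += 1
            live.append(uri)
        elif change == "update":
            uri = rng.choice(live)
        else:
            uri = live.pop(rng.randrange(len(live)))
        yield (seq, change, uri)
```

Feeding `list(simulate_changes(1000, seed=42))` to two candidate systems gives both exactly the same sequence of changes, which is what makes their results comparable.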
23. Timeline
• Project Launch = November 2011
• Approved work item = December 2011
• Working Group formed = February 2012
• Webinar on project = March 2012
• JCDL meeting, Washington DC = June 2012
• Alpha = ?? September 2012
• Beta/Draft for trial use = ?? December 2012
• Comment period = ?? December 2012 - March 2013
• Training = ?? May - July 2013
• Approval = ?? December 2013
24. Thank you!
Todd Carpenter, Executive Director
tcarpenter@niso.org
National Information Standards Organization (NISO)
One North Charles Street, Suite 1905
Baltimore, MD 21201 USA
+1 (301) 654-2512
NOTE: NISO IS MOVING IN JULY 2012
www.niso.org