From Crawling to Walking:
Improving Access to Web Archives
SAA 2014
Session 703
1. Jane Zhang
2. Michael Paulmeno
3. Meg Tuomala
4. Benn Joseph
5. Polina Ilieva
6. Jennifer Wright
7. John Bence
8. Olga Virakhovskaya
9. Anna Perricci
10. Rick Fitzgerald
11. Rosalie Lack
Jane Zhang
Catholic University of
America
Web Records,
Web Archived Files, and
Web Archives Access Models
Jane Zhang, Catholic University of America
Session 703 - From Crawling to Walking:
Improving Access to Web Archives
SAA 2014, Washington DC
Saturday, August 16
Introduction
• Web as records
• The Web ARChive files as recordkeeping formats
• Web archives access models
Web Archiving Initiatives
• A survey on web archiving initiatives
–Daniel Gomes et al., Foundation for
National Scientific Computing, Portuguese
Web Archive Team
–International Conference on Theory and
Practice of Digital Libraries 2011, 25-29
September 2011
• Wikipedia: List of Web archiving
initiatives
Web Archiving Initiatives
• A survey on web archiving initiatives (2011)
– 42 web archiving initiatives worldwide
– 9 initiatives from the United States
• List of Web Archiving Initiatives (July 2014)
– 70 web archiving initiatives worldwide
– 17 initiatives from the United States
Web File Formats
• 2011 Worldwide Survey
– The ARC and WARC formats are dominant, being used by 54% of the initiatives.
• 2014 List – USA
– 10 out of 17 initiatives identified as using the ARC and/or WARC formats
– 59% of the US Web archiving initiatives
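As a rough illustration of why WARC can serve as a recordkeeping format: each WARC record opens with a version line followed by named header fields that document what was captured and when. The sketch below reads such a header with the Python standard library only. The sample header is invented and abbreviated (real records carry further fields, such as a record ID), and parse_warc_header is a hypothetical helper, not part of any library or a full WARC reader.

```python
# Illustrative only: SAMPLE_HEADER is an invented, abbreviated header,
# and parse_warc_header is a hypothetical helper, not a library function.
SAMPLE_HEADER = (
    "WARC/1.0\r\n"
    "WARC-Type: response\r\n"
    "WARC-Target-URI: http://example.org/\r\n"
    "WARC-Date: 2014-08-16T12:00:00Z\r\n"
    "Content-Length: 0\r\n"
)


def parse_warc_header(text):
    """Return (version, fields) for one WARC record header block."""
    lines = text.strip().splitlines()
    version = lines[0]
    fields = {}
    for line in lines[1:]:
        # Split on the first colon only, so URLs in the value stay intact.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return version, fields


version, fields = parse_warc_header(SAMPLE_HEADER)
print(version, fields["WARC-Target-URI"], fields["WARC-Date"])
```

The target URI and capture date are the fields that a URL-history access model would typically rely on.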
Web Archives Access Models
• 2011 Worldwide Survey
– 89% support access to URL history
– 79% enable searching metadata
– 67% provide full-text search over archived content
• 2014 List – USA
– URL history: 12 out of 17 – 71%
– Metadata: 13 out of 17 – 76%
– Full-text: 12 out of 17 – 71%
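The US percentages are simple shares of the 17 initiatives on the July 2014 list. A minimal sketch for recomputing them from the counts on the slides follows; rounding to the nearest whole percent is an assumption about how the figures should be reported, and the share helper is named here only for illustration.

```python
def share(part, total):
    """Percentage of `part` in `total`, rounded to the nearest whole number."""
    return round(100 * part / total)


US_TOTAL = 17  # US initiatives on the July 2014 list

# Counts taken from the slides; labels are shortened for display.
counts = {
    "ARC/WARC formats": 10,
    "URL history": 12,
    "Metadata search": 13,
    "Full-text search": 12,
}

for label, n in counts.items():
    print(f"{label}: {n} out of {US_TOTAL} = {share(n, US_TOTAL)}%")
```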
Metadata:
Theme-based Collections
• Collection overview, name, title, subject, abstract, language, year captured
• Site title, subject, place, language
• Collection description, keyword, filter by site title and/or file type, topic group
• Catalog records (collection or website)
Metadata:
Provenance-based Collections
• Site owner, business activity, topic, sub-topic, region, country, language, year created, date archived
• Collection/series description, site title
• Keyword search, browse by agency
• Collection description, title keyword, browse by agency name, government branch, or agency expiration date
• Browse by region, then site owner
Jane Zhang @ Catholic University of America
zhangj@cua.edu
Thank You
Michael Paulmeno
Delta State University
Accessing Web Archives Through
the Library Catalog
By
Michael Paulmeno
Overview
• Many challenges to making web archives
accessible
• Archival description not fully compatible with
library catalogs
• Problem not unique to web archives
• Differing metadata and content standards lead
to separation between libraries and archives
(i.e. silos)
• Researchers who access archives through
library systems tend to use them longer 1
1 Noah Huffman, “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface at Duke, UNC-Chapel
Hill, and NCSU”, 14
The Current State of Affairs
• Collections accessed through multiple access
points
• Subject headings2
• Many organizations create two descriptions
and link via MARC 856 field; this can cause
confusion3
• Yet significant discovery occurs through search
engines4
2 Michelle Mascaro, “Controlled Access Headings in EAD Finding Aids: Current Practices in Number of and Types of Headings
Assigned,” 223.
3 Noah Huffman, “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface at Duke, UNC-Chapel
Hill, and NCSU,” 3–5.
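To make the linking practice concrete, here is a minimal sketch of a MARC 856 field that points from a catalog record to an online finding aid. It only renders a display string and does not use a MARC library. The indicator values "42" (access via HTTP, related resource) and the subfields $3, $u and $z follow MARC 21 conventions; the URL, the note text and the marc_856 helper are hypothetical.

```python
def marc_856(url, materials=None, note=None):
    """Render a MARC 856 field as a display string.

    Indicators "42": first indicator 4 = HTTP, second indicator 2 =
    related resource (the finding aid describes the cataloged collection).
    """
    parts = ["856", "42"]
    if materials:
        parts.append(f"$3 {materials}")  # $3: materials specified
    parts.append(f"$u {url}")            # $u: the link itself
    if note:
        parts.append(f"$z {note}")       # $z: public note
    return " ".join(parts)


# Hypothetical finding aid URL, for illustration only.
field = marc_856(
    "http://example.org/findingaids/12345",
    materials="Finding aid",
    note="View the full finding aid online",
)
print(field)
```

Because the link is the only connection between the two descriptions, a reader who follows it lands in a different system with a different layout, which is one plausible source of the confusion noted above.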
Challenges to Integration
• MARC records lack detail5 6
• Archivists uncertain about readiness to adopt
new standards 7
• Many different systems (ArchivesSpace, EBSCO
Discovery, Blacklight, various Integrated Library
Systems) and metadata standards
• Other challenges specific to web archives
• Ex. How to represent a continuously
accessioned resource?
5 Peter Caprini and Kelcy Shepherd, “The MARC Standard and Encoded Archival Description,” 19.
6 Karen F. Gracy and Frank Lambert, “Who’s Ready to Surf the Next Wave? A Study of Perceived Challenges to Implementing
New and Revised Standards for Archival Description,” 102.
7 Ibid., 117.
Towards the Future
• Increasing efforts to integrate archival
description and library catalogs
– University of Denver Penrose Library 8
– Triangle Research Libraries Network 9
– Library of Congress
– UNC Chapel Hill
• Adaptability key to future collaboration
• What affects archives, affects web archives
as well
8 Gregory C. Colati, Katherine M. Crowe, and Elizabeth S. Meagher, “Better, Faster, Stronger: Integrating Archives Processing
and Technical Services.”
9 Noah Huffman, “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface at Duke, UNC-Chapel
Hill, and NCSU.”
Works Cited
• Caprini, Peter, and Kelcy Shepherd. “The MARC Standard and Encoded Archival Description.” Library
Hi Tech 22, no. 1 (2004): 18–27. doi:10.1108/07378830410524468.
• Colati, Gregory C., Katherine M. Crowe, and Elizabeth S. Meagher. “Better, Faster, Stronger:
Integrating Archives Processing and Technical Services.” Library Resources and Technical Services 53,
no. 4 (October 2009): 261–270.
• Gracy, Karen F., and Frank Lambert. “Who’s Ready to Surf the Next Wave? A Study of Perceived
Challenges to Implementing New and Revised Standards for Archival Description.” The American
Archivist 77, no. 1 (Spring/Summer 2014): 96–132.
• Mascaro, Michelle. “Controlled Access Headings in EAD Finding Aids: Current Practices in Number of
and Types of Headings Assigned.” Journal of Archival Organization 9, no. 3–4 (January 2011): 208–225.
doi:10.1080/15332748.2011.643690.
• Huffman, Noah. “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface
at Duke, UNC-Chapel Hill, and NCSU.” Journal for the Society of North Carolina Archivists 8, no. 2
(April 2011): 2–17.
Meg Tuomala
Gates Archive
Different strokes for
different folks / Meeting the
descriptive & access needs of multiple web
archive collections / With minimal workflow and
process change
Meg Tuomala
Assistant archivist, Gates Archive
Formerly e-records archivist at UNC-Chapel Hill
Web archiving at UNC: context
● Started in 2013; using Archive-It
● 6 web archive collections
● Extension of / supplement to existing
collections
● Special collections at UNC consolidated;
archival & biblio tech services are one
dept
Different folks: the collections
Biblio
● North Carolina Collection
● Rare Book Collection
● Digital Artists’ File
Different strokes: cataloging/
description & access
Biblio
● Bibliographic cataloging
at the item level
○ Catalog record in
Library catalog
catalog record in library catalog
finding aids
Benn Joseph
Northwestern University
Library
WASsup?: Describing Web
Archives Using Archon
SAA Washington, D.C.
August 16, 2014
Benn Joseph
Manuscript Librarian
Northwestern University Library
b-joseph@northwestern.edu
Image of WAS public interface
Item record for crawled site in WAS
NU version of Archon:
• Only used for collection
management
• Separate Blacklight/Solr public
interface that searches and displays
the finding aids
• Finding aids all live in a Fedora
repository
• “Ingest EAD” button added to
Archon puts the XML into Fedora, to then
be served via the finding aids portal
Pic of entering in archon—container
list
Entering WAS site URL as digital object in Archon
NUWA finding aid
Finding aids exported as MODS and ingested by Primo
THE END!
Polina Ilieva
University of California,
San Francisco
August 16, 2014
Polina Ilieva, UCSF Archives & Special Collections
Science Online:
Evaluating usage, impact and
appraisal
Why collect?
• Since they are so easily accessible, lab websites are used as reference tools by lab members
• Sharing datasets
• Channels for scholarly communications
• After funding ends, a website can be the only place where the data is preserved and available
Impact
• Not just preserved for future use; scientists need instant access
• Websites become an integral part of scientific scholarly output
Curation and Appraisal
• How to select from hundreds of labs?
• Web Archive pilot project in collaboration with the library’s Research Informationist: Research @UCSF collection
• Will use UCSF Profiles: Research Networking and Expertise Mining Tool
• Collect and analyze info about faculty and researchers who lead labs: length of service/title, # of scholarly publications, availability of websites, grants and awards
What to collect?
• Protocols
• Data
• Images
• Lectures (a/v)
• Publications
• List of lab members
Access
• Need to know how data and collections are used to find an optimal way to provide access
Access
Thank you!
Polina E. Ilieva, CA
Head of Archives and
Special Collections
University of California,
San Francisco
polina.ilieva@ucsf.edu
Jennifer Wright
Smithsonian Institution
Archives
Square Peg in a Round Hole:
Integrating Web Archives into
Existing Descriptive Practices
Jennifer Wright
Archives and Information Management Team
Leader
SAA 2014
Session 703
wrightjm@si.edu
siarchives.si.edu
Accession-based Collections
Management
• Each transfer is a separate accession
• Each accession is cataloged separately in the CMS
• Each accession has its own finding aid
Solution for websites:
Crawls with similar dates and the same creator are
combined into one accession
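The grouping rule can be sketched as follows. This is only an illustration of the idea, not the Smithsonian's actual procedure: the slides do not say how close "similar dates" must be, so the 30-day window is an assumption, and the Crawl class, the group_into_accessions function and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import groupby


@dataclass
class Crawl:
    creator: str
    site: str
    crawled: date


def group_into_accessions(crawls, window=timedelta(days=30)):
    """Group crawls by creator, starting a new accession whenever the gap
    between consecutive crawl dates exceeds `window` (an assumed threshold)."""
    ordered = sorted(crawls, key=lambda c: (c.creator, c.crawled))
    accessions = []
    for _, same_creator in groupby(ordered, key=lambda c: c.creator):
        current = []
        for crawl in same_creator:
            if current and crawl.crawled - current[-1].crawled > window:
                accessions.append(current)
                current = []
            current.append(crawl)
        accessions.append(current)
    return accessions


# Made-up sample crawls for two creators.
crawls = [
    Crawl("Office A", "blog", date(2014, 1, 10)),
    Crawl("Office A", "website", date(2014, 1, 20)),
    Crawl("Office A", "website", date(2014, 7, 1)),
    Crawl("Office B", "website", date(2014, 1, 15)),
]
accessions = group_into_accessions(crawls)
for number, group in enumerate(accessions, start=1):
    print(number, [(c.creator, c.site, c.crawled.isoformat()) for c in group])
```

With this sample, the two January crawls from Office A share an accession, the July crawl gets its own, and Office B's crawl is kept separate because the creator differs.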
Description and Cataloging
• Describes each
website/blog in
accession
• Notes technical and
other issues
• Includes crawl date(s)
• Indexes subjects,
website/blog/
exhibition titles, and
other creators
EAD Finding Aid
• Includes descriptive
data from CMS
• Lists each
website/blog
included in
accession
• Uses DAO tag to
link to crawl on
Archive-It
Search on “Website Records” at
http://siarchives.si.edu/search/sia_search_findingaids
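For readers unfamiliar with the DAO tag, the sketch below builds a minimal EAD <dao> element with the Python standard library. It assumes the schema flavor of EAD 2002, where the link is carried in xlink attributes, and it omits the EAD namespace on the elements for brevity. The slides do not show the actual markup; the make_dao helper and the title text are hypothetical, and the Smithsonian's Archive-It organization page stands in for the address of a specific crawl.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)


def make_dao(url, title):
    """Build a <dao> element that links a finding aid entry to a crawl."""
    dao = ET.Element("dao")
    dao.set(f"{{{XLINK}}}href", url)
    dao.set(f"{{{XLINK}}}title", title)
    # <daodesc> holds a short human-readable description of the link.
    daodesc = ET.SubElement(dao, "daodesc")
    ET.SubElement(daodesc, "p").text = title
    return dao


# Stand-in URL: a real finding aid would point at the specific crawl.
dao = make_dao(
    "https://archive-it.org/organizations/660",
    "Archived website on Archive-It",
)
print(ET.tostring(dao, encoding="unicode"))
```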
Archive-It
• Browse URLs
• Search across all
Smithsonian
crawls
• Search by
keyword or
limiting options
• Plan to take
better advantage
of metadata
Smithsonian on Archive-It:
https://archive-it.org/organizations/660
John Bence
Emory University
WAS GOING ON AT
EMORY?
Integration of WAS-CDL web archives with
MARBL online finding aids and web presence
John Bence
jbence@emory.edu
@jdbence
“Topics” for browsing
sites by creator or by
institutional
hierarchies (Laney
Graduate School;
‘Administration’)
Supplied URL from WAS given
an ID and persistent URL. The
URL is then linked in <dao>
element
“Digital Materials
Available” banner
indicates existence
of <dao> element
Choosing “Series 3:
Web Archives”
provides link to WAS
site for relevant
content
Website migration in
summer 2013 allowed
for integration of WAS
search interface as a
page on MARBL
website
• Next steps
• UX testing on finding aids integration vs. local
search page
• Gather (read: develop) additional use analytics
• For more go to:
• http://marbl.library.emory.edu/collections/archives/web.html
• http://findingaids.library.emory.edu/
Google Analytics for
search interface from
Feb 2013 to June
2014. Page went live
in June 2013.
• #1 referral:
Redirected URL
of single web
archive
• #2 referral:
MARBL website
search interface
• #3 referral:
finding aids
database
Thanks!
Olga Virakhovskaya
Bentley Historical Library,
University of Michigan
Describing <archived> web content
from single sites to web archives
Olga Virakhovskaya
volga@umich.edu
http://bentley.umich.edu/
Local subject heading (MARC fields 690)
LC subject headings (MARC fields 6xx)
MARC field 260/264
MARC fields 1xx/7xx
MARC fields 520 &
545 / History & Scope
and Content notes
MARC field 245
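The field labels above can be read as a mapping from descriptive elements to MARC tags. A small sketch of that mapping follows, using plain Python rather than a MARC library. The tags come from the slide; the specific tags chosen within the 1xx and 6xx ranges (110, 650), the render helper and every value are placeholders made up for illustration.

```python
# Tags follow the slide; all values are made-up placeholders.
RECORD = [
    ("245", "Title", "Example Department website"),
    ("110", "Creator (1xx/7xx)", "Example University. Example Department"),
    ("264", "Publication (260/264)", "2014"),
    ("545", "History note", "Placeholder administrative history."),
    ("520", "Scope and content note", "Placeholder summary of the archived site."),
    ("650", "LC subject heading (6xx)", "Placeholder subject"),
    ("690", "Local subject heading", "Placeholder local subject"),
]


def render(record):
    """Render fields in tag order, one per line, as a simple text display."""
    ordered = sorted(record, key=lambda field: field[0])
    return "\n".join(f"{tag}  {label}: {value}" for tag, label, value in ordered)


print(render(RECORD))
```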
– Think BIG
– Automate
– Follow standards
– Be consistently clear
– Communicate
– Be human
…because machines don’t know everything
Anna Perricci
Columbia University
Libraries
MARC records for the Contemporary
Composers Web Archive
Anna Perricci
Columbia University Libraries
SAA Lightning Talk (August 16, 2014)
Web Archiving at Columbia
We’ve only got 5 minutes!
• Columbia University
Libraries web archiving
program precedents
• Current Mellon grant
• Collaborative web archiving
Contemporary Composers Web Archive
Selectors
• Borrow Direct Music Librarians Group: music librarians at Brown,
Columbia, Cornell, Dartmouth, Harvard, Johns Hopkins, Princeton,
and Yale universities, MIT, and the universities of Chicago and
Pennsylvania
Cataloging expertise
• Russell Merritt (cataloger specializing in music resources)
• Kate Harcourt (Director of Original and Special Materials Cataloging)
• Alex Thurman (Web Resources Collection Coordinator)
CCWA in Archive-It
Creating MARC records for web archives
• Creating MARC records for
archived websites is
standard practice at CUL
– MARC records make web
archives discoverable in
CLIO (Columbia Libraries
Information Online)
• Collection level and seed
level records
• Will use Archive-It interface
to make Dublin Core records
Patron view of record in CLIO
Cataloger’s view of record in CLIO
Anticipating wider use of MARC records
• Records have been released
to WorldCat
• Collaborators on cataloging
were attentive to which
fields will ordinarily be
stripped out when a MARC
record is imported to
another institution’s OPAC
Conclusions
• So far sample of 10 records
has taught us…
• Positive feedback from
music librarians
• Next we will add another 44
records for the archived
sites in CCWA
Thanks!
Anna Perricci
alp2198@columbia.edu
@AnnaPerricci
Columbia University Libraries
Rick Fitzgerald
Library of Congress
Access in Transition:
Rethinking Descriptive Practices for
the LC Web Archives
Migration effort
• Began in 2013, ongoing
• Move web archives from stand-alone web
application at http://loc.gov/lcwa to library-
wide discovery system at
http://loc.gov/websites/
• Metadata and content migration
• Cross-functional team effort
Interface - before and after
New Possibilities
• Web archives discoverable alongside other LC
collections for first time
• Web archives searchable from LC main page
for first time – greater visibility
• Consistent navigation, look and feel mirrors LC
website
Integrated into search
New Challenges
• Thousands of MODS records already created
for access, how to repurpose?
• Different interfaces, different needs
• Enable new ideas (combined records)
• Keeping useful elements, old and new
Thanks!
Rick Fitzgerald (rfit@loc.gov)
Rosalie Lack
California Digital Library
Web Archiving Service (WAS)
From Crawling to Walking:
Improving Access to Web Archives
SAA 2014
Rosalie Lack
rosalie.lack@ucop.edu
SAA Web Archiving Roundtable
Follow the blog!
• http://webarchivingrt.wordpress.com/
Learn more!
• http://www2.archivists.org/groups/web-archiving-roundtable
Tearing Down Silos
What We’re Doing
• Creating finding aids for each web archive
• Adding links to existing finding aids for the
relevant archived sites
• Providing a web archive collection search page
• Uploading records into library catalogs
• Sending records to OCLC
• Building collaborative collections and providing
unified access
• Integrating access with other formats in our
discovery systems
What Else Should We Be Doing?
Open Discussion
Image credits
Title: The razing of silos on the former Roy Ranch, San Geronimo,
California, May, 1964 [photograph]
Creator/Contributor: unknown
Date: May, 1964
Contributing Institution: Marin County Free Library
http://content.cdlib.org/ark:/13030/kt3489r96r/?order=1
http://content.cdlib.org/ark:/13030/kt067nf0kk/?order=1
http://content.cdlib.org/ark:/13030/kt467nf1dq/?order=1
Thank you!

 
cse-csp batch4 review-1.1.pptx cyber security
cse-csp batch4 review-1.1.pptx cyber securitycse-csp batch4 review-1.1.pptx cyber security
cse-csp batch4 review-1.1.pptx cyber securitysandeepnani2260
 
A Guide to Choosing the Ideal Air Cooler
A Guide to Choosing the Ideal Air CoolerA Guide to Choosing the Ideal Air Cooler
A Guide to Choosing the Ideal Air Coolerenquirieskenstar
 
05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx
05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx
05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptxerickamwana1
 
Quality by design.. ppt for RA (1ST SEM
Quality by design.. ppt for  RA (1ST SEMQuality by design.. ppt for  RA (1ST SEM
Quality by design.. ppt for RA (1ST SEMCharmi13
 
Internship Presentation | PPT | CSE | SE
Internship Presentation | PPT | CSE | SEInternship Presentation | PPT | CSE | SE
Internship Presentation | PPT | CSE | SESaleh Ibne Omar
 
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATIONRACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATIONRachelAnnTenibroAmaz
 
GESCO SE Press and Analyst Conference on Financial Results 2024
GESCO SE Press and Analyst Conference on Financial Results 2024GESCO SE Press and Analyst Conference on Financial Results 2024
GESCO SE Press and Analyst Conference on Financial Results 2024GESCO SE
 
INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRRINDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRRsarwankumar4524
 
Don't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunity
Don't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunityDon't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunity
Don't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunityApp Ethena
 
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...Sebastiano Panichella
 
General Elections Final Press Noteas per M
General Elections Final Press Noteas per MGeneral Elections Final Press Noteas per M
General Elections Final Press Noteas per MVidyaAdsule1
 

Recently uploaded (17)

Chizaram's Women Tech Makers Deck. .pptx
Chizaram's Women Tech Makers Deck.  .pptxChizaram's Women Tech Makers Deck.  .pptx
Chizaram's Women Tech Makers Deck. .pptx
 
Application of GIS in Landslide Disaster Response.pptx
Application of GIS in Landslide Disaster Response.pptxApplication of GIS in Landslide Disaster Response.pptx
Application of GIS in Landslide Disaster Response.pptx
 
Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective ...
Testing with Fewer Resources:  Toward Adaptive Approaches for Cost-effective ...Testing with Fewer Resources:  Toward Adaptive Approaches for Cost-effective ...
Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective ...
 
proposal kumeneger edited.docx A kumeeger
proposal kumeneger edited.docx A kumeegerproposal kumeneger edited.docx A kumeeger
proposal kumeneger edited.docx A kumeeger
 
Engaging Eid Ul Fitr Presentation for Kindergartners.pptx
Engaging Eid Ul Fitr Presentation for Kindergartners.pptxEngaging Eid Ul Fitr Presentation for Kindergartners.pptx
Engaging Eid Ul Fitr Presentation for Kindergartners.pptx
 
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
 
cse-csp batch4 review-1.1.pptx cyber security
cse-csp batch4 review-1.1.pptx cyber securitycse-csp batch4 review-1.1.pptx cyber security
cse-csp batch4 review-1.1.pptx cyber security
 
A Guide to Choosing the Ideal Air Cooler
A Guide to Choosing the Ideal Air CoolerA Guide to Choosing the Ideal Air Cooler
A Guide to Choosing the Ideal Air Cooler
 
05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx
05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx
05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx
 
Quality by design.. ppt for RA (1ST SEM
Quality by design.. ppt for  RA (1ST SEMQuality by design.. ppt for  RA (1ST SEM
Quality by design.. ppt for RA (1ST SEM
 
Internship Presentation | PPT | CSE | SE
Internship Presentation | PPT | CSE | SEInternship Presentation | PPT | CSE | SE
Internship Presentation | PPT | CSE | SE
 
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATIONRACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
 
GESCO SE Press and Analyst Conference on Financial Results 2024
GESCO SE Press and Analyst Conference on Financial Results 2024GESCO SE Press and Analyst Conference on Financial Results 2024
GESCO SE Press and Analyst Conference on Financial Results 2024
 
INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRRINDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRR
 
Don't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunity
Don't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunityDon't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunity
Don't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunity
 
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
 
General Elections Final Press Noteas per M
General Elections Final Press Noteas per MGeneral Elections Final Press Noteas per M
General Elections Final Press Noteas per M
 

Improving Access to Web Archives: Models, Metadata, and Integration

  • 1. From Crawling to Walking: Improving Access to Web Archives SAA 2014 Session 703
  • 2. From Crawling to Walking: Improving Access to Web Archives 1. Jane Zhang 2. Michael Paulmeno 3. Meg Tuomala 4. Benn Joseph 5. Polina Ilieva 6. Jennifer Wright 7. John Bence 8. Olga Virakhovskaya 9. Anna Perricci 10. Rick Fitzgerald 11. Rosalie Lack
  • 4. Web Records, Web Archived Files, and Web Archives Access Models Jane Zhang, Catholic University of America Session 703 - From Crawling to Walking: Improving Access to Web Archives SAA 2014, Washington DC Saturday, August 16
  • 5. Introduction • Web as records • The Web ARChive files as recordkeeping formats • Web archives access models
  • 6. Web Archiving Initiatives • A survey on web archiving initiatives –Daniel Gomes et al., Foundation for National Scientific Computing, Portuguese Web Archive Team –International Conference on Theory and Practice of Digital Libraries 2011, 25-29 September 2011 • Wikipedia: List of Web archiving initiatives
  • 7. Web Archiving Initiatives • A survey on web archiving initiatives (2011) • 42 web archiving initiatives worldwide • 9 initiatives from the United States • List of Web Archiving Initiatives (July 2014) • 70 web archiving initiatives worldwide • 17 initiatives from the United States
  • 8. Web File Formats • 2011 Worldwide Survey • The ARC and WARC formats are dominant, being used by 54% of the initiatives. • 2014 List – USA • 10 out of 17 initiatives identified as using the ARC and/or WARC formats • 58% of the US Web archiving initiatives
  • 9. Web Archives Access Models • 2011 Worldwide Survey • 89% support access to URL history • 79% enable searching metadata • 67% provide full-text search over archived content • 2014 List – USA • URL history: 12 out of 17 – 70% • Metadata: 13 out of 17 – 76% • Full-text: 12 out of 17 – 70%
  • 10. Metadata: Theme-based Collections • Collection overview, name, title, subject, abstract, language, year captured • Site title, subject, place, language • Collection description, keyword, filter by site title, and/or file type, topic group • Catalog records (collection or website)
  • 11. Metadata: Provenance-based Collections • Site owner, business activity, topic, sub-topic, region, country, language, year created, date archived • Collection/series description, site title • Keyword search, browse by agency • Collection description, title keyword, browse by agency name, government branch, or agency expiration date • Browse by region, then site owner
  • 12. Archival Jane Zhang @ Catholic University of America zhangj@cua.edu Thank You
  • 14. Accessing Web Archives Through the Library Catalog By Michael Paulmeno
  • 15. Overview • Many challenges to making web archives accessible • Archival description not fully compatible with library catalogs • Problem not unique to web archives • Differing metadata and content standards lead to separation between libraries and archives (i.e. silos) • Researchers who access archives through library systems tend to use them longer 1 1 Noah Huffman, “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface at Duke, UNC-Chapel Hill, and NCSU”, 14
  • 16. The Current State of Affairs • Collections accessed through multiple access points • Subject headings2 • Many organizations create two descriptions and link via MARC 856 field; this can cause confusion3 • Yet significant discovery occurs through search engines4 2 Michelle Mascaro, “Controlled Access Headings in EAD Finding Aids: Current Practices in Number of and Types of Headings Assigned,” 223. 3 Noah Huffman, “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface at Duke, UNC-Chapel Hill, and NCSU,” 3–5.
  • 17. Challenges to Integration • MARC records lack detail5 6 • Archivists uncertain about readiness to adopt new standards 7 • Many different systems (ArchivesSpace, Ebsco Discovery, Blacklight, various Integrated Library Systems) and metadata standards • Other challenges specific to web archives • Ex. How to represent a continuously accessioned resource? 5 Caprini and Kelcy Shepherd, “The MARC Standard and Encoded Archival Description,” 19. 6 Karen F. Gracy and Frank Lambert, “Who’s Ready to Surf the Next Wave? A Study of Perceived Challenges to Implementing New and Revised Standards for Archival Description,” 102. 7 Ibid, 117
  • 18. Towards the Future • Increasing efforts to integrate archival description and library catalogs – University of Denver Penrose Library 8 – Triangle Research Libraries Network 9 – Library of Congress – UNC Chapel Hill • Adaptability key to future collaboration • What affects archives affects web archives as well 8 Gregory C. Colati, Katherine M. Crowe, and Elizabeth S. Meagher, “Better, Faster, Stronger: Integrating Archives Processing and Technical Services.” 9 Noah Huffman, “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface at Duke, UNC-Chapel Hill, and NCSU.”
  • 19. Works Cited • Caprini, Peter, and Kelcy Shepherd. “The MARC Standard and Encoded Archival Description.” Library Hi-Tech 22, no. 1 (2004): 18 –27. doi:10.1108/07378830410524468. • Gregory C. Colati, Katherine M. Crowe, and Elizabeth S. Meagher. “Better, Faster, Stronger: Integrating Archives Processing and Technical Services.” Library Resources and Technical Services 53, no. 4 (October 2009): 261 – 270. • Karen F. Gracy, and Frank Lambert. “Who’s Ready to Surf the Next Wave? A Study of Perceived Challenges to Implementing New and Revised Standards for Archival Description.” The American Archivist 77, no. 1 (Spring/Summer 2014): 96–132. • Michelle Mascaro. “Controlled Access Headings in EAD Finding Aids: Current Practices in Number of and Types of Headings Assigned.” Journal of Archival Organization 9, no. 3–4 (January 2011): 208 – 225. doi:10.1080/15332748.2011.643690. • Noah Huffman. “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface at Duke, UNC-Chapel Hill, and NCSU.” Journal for the Society of North Carolina Archivists 8, no. 2 (April 2011): 2 – 17.
  • 21. Different strokes for different folks / Meeting the descriptive & access needs of multiple web archive collections / With minimal workflow and process change Meg Tuomala Assistant archivist, Gates Archive Formerly e-records archivist at UNC-Chapel Hill
  • 22. Web archiving at UNC: context ● Started in 2013; using Archive-It ● 6 web archive collections ● Extension of / supplement to existing collections ● Special collections at UNC consolidated; archival & biblio tech services are one dept
  • 23. Different folks: the collections Biblio ● North Carolina Collection ● Rare Book Collection ● Digital Artists’ File
  • 24. Different strokes: cataloging/ description & access Biblio ● Bibliographic cataloging at the item level ○ Catalog record in Library catalog
  • 25. catalog record in library catalog finding aids
  • 27. WASsup?: Describing Web Archives Using Archon SAA Washington, D.C. August 16, 2014 Benn Joseph Manuscript Librarian Northwestern University Library b-joseph@northwestern.edu
  • 28. Image of WAS public interface
  • 29. Item record for crawled site in WAS
  • 30. NU version of Archon: • Only used for collection management • Separate Blacklight/Solr public interface that searches and displays the finding aids • Finding aids all live in a Fedora repository • “Ingest EAD” button added to Archon, which puts the XML into Fedora to then be served via the finding aids portal
  • 31. Pic of entering in archon—container list
  • 32. Entering WAS site URL as digital object in Archon
  • 35. Finding aids exported as MODS and ingested by Primo
  • 36. Benn Joseph Manuscript Librarian Northwestern University Library b-joseph@northwestern.edu THE END!
  • 37. Polina Ilieva University of California, San Francisco
  • 38. August 16, 2014 Polina Ilieva, UCSF Archives & Special Collections Science Online: Evaluating usage, impact and appraisal
  • 39. Why collect? • Since they are so easily accessible, lab websites are used as reference tools by lab members • Sharing datasets • Channels for scholarly communications • After funding ends, the website can be the only place where the data is preserved and available
  • 40. Impact • Not just preserved for future use; scientists need instant access • Websites become an integral part of scientific scholarly output
  • 41. Curation and Appraisal • How to select from hundreds of labs? • Web Archive pilot project in collaboration with the library’s Research Informationist: Research @UCSF collection • Will use UCSF Profiles: Research Networking and Expertise Mining Tool • Collect and analyze info about faculty and researchers who lead labs: the length of service/title, # of scholarly publications, availability of websites, grants and awards.
  • 42. What to collect? • Protocols • Data • Images • Lectures (a/v) • Publications • List of lab members
  • 44. Access • Need to know how data and collections are used to find an optimal way to provide access
  • 45. Thank you! Polina E. Ilieva, CA Head of Archives and Special Collections University of California, San Francisco polina.ilieva@ucsf.edu
  • 47. Square Peg in a Round Hole: Integrating Web Archives into Existing Descriptive Practices Jennifer Wright Archives and Information Management Team Leader SAA 2014 Session 703 wrightjm@si.edu siarchives.si.edu
  • 48. Accession-based Collections Management • Each transfer is separate accession • Each accession cataloged separately in CMS • Each accession has own finding aid Solution for websites: Crawls with similar dates and the same creator are combined into one accession
  • 49. Description and Cataloging • Describes each website/blog in accession • Notes technical and other issues • Includes crawl date(s) • Indexes subjects, website/blog/ exhibition titles, and other creators
  • 50. EAD Finding Aid • Includes descriptive data from CMS • Lists each website/blog included in accession • Uses DAO tag to link to crawl on Archive-It Search on “Website Records” at http://siarchives.si.edu/search/sia_search_findingaids
  • 51. Archive-It • Browse URLs • Search across all Smithsonian crawls • Search by keyword or limiting options • Plan to take better advantage of metadata Smithsonian on Archive-It: https://archive-it.org/organizations/660
  • 53. WAS GOING ON AT EMORY? Integration of WAS-CDL web archives with MARBL online finding aids and web presence John Bence jbence@emory.edu @jdbence
  • 54. “Topics” for browsing sites by creator or by institutional hierarchies (Laney Graduate School; ‘Administration’)
  • 55. Supplied URL from WAS given an ID and persistent URL. The URL is then linked in <dao> element
  • 56. “Digital Materials Available” banner indicates existence of <dao> element Choosing “Series 3: Web Archives” provides link to WAS site for relevant content
  • 57. Website migration in summer 2013 allowed for integration of WAS search interface as a page on MARBL website
  • 58. • Next steps • UX testing on finding aids integration vs. local search page • Gather (read: develop) additional use analytics • For more go to: • http://marbl.library.emory.edu/collections/archives/web.html • http://findingaids.library.emory.edu/ Google Analytics for search interface from Feb 2013 to June 2014. Page went live in June 2013. • #1 referral: Redirected URL of single web archive • #2 referral: MARBL website search interface • #3 referral: finding aids database Thanks!
  • 59. Olga Virakhovskaya Bentley Historical Library, University of Michigan
  • 60. Describing <archived> web content from single sites to web archives Olga Virakhovskaya volga@umich.edu http://bentley.umich.edu/
  • 61. Local subject heading (MARC fields 690) LC subject headings (MARC fields 6xx) MARC field 260/264 MARC fields 1xx/7xx MARC fields 520 & 545 / History & Scope and Content notes MARC field 245
  • 62. – Think BIG – Automate – Follow standards – Be consistently clear – Communicate e hU a …because machines don’t know everything
  • 64. MARC records for the Contemporary Composers Web Archive Anna Perricci Columbia University Libraries SAA Lightning Talk (August 16, 2014)
  • 65. Web Archiving at Columbia We’ve only got 5 minutes! • Columbia University Libraries web archiving program precedents • Current Mellon grant • Collaborative web archiving
  • 66. Contemporary Composers Web Archive Selectors • Borrow Direct Music Librarians Group: music librarians at Brown, Columbia, Cornell, Dartmouth, Harvard, Johns Hopkins, Princeton, and Yale universities, MIT, and the universities of Chicago and Pennsylvania Cataloging expertise • Russell Merritt (cataloger specializing in music resources) • Kate Harcourt (Director of Original and Special Materials Cataloging) • Alex Thurman (Web Resources Collection Coordinator)
  • 68. Creating MARC records for web archives • Creating MARC records for archived websites is standard practice at CUL – MARC records make web archives discoverable in CLIO (Columbia Libraries Information Online) • Collection level and seed level records • Will use Archive-It interface to make Dublin Core records
  • 69. Patron view of record in CLIO
  • 70. Cataloger’s view of record in CLIO
  • 71. Anticipating wider use of MARC records • Records have been released to WorldCat • Collaborators on cataloging were attentive to which fields will ordinarily be stripped out when a MARC record is imported to another institution’s OPAC
  • 72. Conclusions • So far sample of 10 records has taught us… • Positive feedback from music librarians • Next, we will add another 44 records for the archived sites in CCWA
  • 75. Access in Transition: Rethinking Descriptive Practices for the LC Web Archives
  • 76. Migration effort • Began in 2013, ongoing • Move web archives from stand-alone web application at http://loc.gov/lcwa to library-wide discovery system at http://loc.gov/websites/ • Metadata and content migration • Cross-functional team effort
  • 77. Interface - before and after
  • 78. New Possibilities • Web archives discoverable alongside other LC collections for first time • Web archives searchable from LC main page for first time – greater visibility • Consistent navigation, look and feel mirrors LC website
  • 80. New Challenges • Thousands of MODS records already created for access, how to repurpose? • Different interfaces, different needs • Enable new ideas (combined records) • Keeping useful elements, old and new
  • 82. Rosalie Lack California Digital Library Web Archiving Service (WAS)
  • 83. From Crawling to Walking: Improving Access to Web Archives SAA 2014 Rosalie Lack rosalie.lack@ucop.edu
  • 84. SAA Web Archiving Roundtable Follow the blog! • http://webarchivingrt.wordpress.com/ Learn more! • http://www2.archivists.org/groups/web-archiving-roundtable
  • 86. What We’re Doing • Creating finding aids for each web archive • Adding links to existing finding aids for the relevant archived sites • Providing a web archive collection search page • Uploading records into library catalogs • Sending records to OCLC • Building collaborative collections and providing unified access • Integrating access with other formats in our discovery systems
  • 87. What Else Should We Be Doing? Open Discussion
  • 88. Image credits Title: The razing of silos on the former Roy Ranch, San Geronimo, California, May, 1964 [photograph] Creator/Contributor: unknown Date: May, 1964 Contributing Institution: Marin County Free Library http://content.cdlib.org/ark:/13030/kt3489r96r/?order=1 http://content.cdlib.org/ark:/13030/kt067nf0kk/?order=1 http://content.cdlib.org/ark:/13030/kt467nf1dq/?order=1