Gareth Millward: Interrogating the archived UK web

Interrogating the
archived UK web
“RNIB”
Gareth Millward – gareth.millward@lshtm.ac.uk – Centre for History in Public Health
Improving health worldwide
http://history.lshtm.ac.uk

“The best-laid schemes
o’ mice an’ men…
• Original plan to investigate
the presence of information
for disabled people on the
UK web
• Also to look at the
accessibility of that info
through Web Accessibility
Standard 1.0 (1998)
• Search for major
organisations and key
disability words
• Run sample through
validation tools
Pieter Bruegel the Elder - The Tower of Babel (Vienna) - Google Art
Project – edited : from Wikipedia

… Gang aft
agley.”
• Far too much stuff!
• Search terms such as “RADAR”,
“SCOPE” and “MIND”
obviously… problematic…
• No discernible pattern from
code validation
• “Experience” of using screen
readers impossible (for now)*
• Defining “information” or
“reach” not a simple task
• Still major problems with
assessing “importance” and
“relevance”
* At least within the design scope of this project…!
Macintosh Performa 5200, a mid-90s Apple
computer. From Wikipedia.

“RNIB”
• A simple four-letter string
• Played a key role in promoting
web standards in Britain
• Just over half a million “hits” –
significant number compared
to other disability
organisations.
RNIB logo © RNIB – RNIB.org.uk

Large number of instances
relative to peers…
Search term                                            Instances
RNIB                                                   516,165
MENCAP                                                 218,439
RNID                                                   217,963
"disability alliance"                                  22,421
royal association for disability and rehabilitation    16,072
BCODP                                                  12,501
UKDPC                                                  2,348
"spinal injuries association"                          45,477
"centre for independent living"                        23,185
"disability benefits consortium"                       2,205
disability                                             12,909,868
*.* (all)                                              2,023,288,655
[Chart: Instances of search terms relative to *.*, 1996 - 2010. Y-axis: instances p.a. as percentage of whole p.a. (0.00% to 0.05%). X-axis: years 1996 to 2010. Series: RNIB, MENCAP, RNID.]
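
The "relative to *.*" figures are a plain ratio of a term's instances to all instances. A minimal sketch using the whole-dataset totals from the table; the chart itself plots the same ratio year by year, and those yearly counts are not reproduced on the slide. The function name `share_of_whole` is illustrative only.

```python
# Instance counts copied from the table (whole dataset, not per year).
INSTANCES = {
    "RNIB": 516_165,
    "MENCAP": 218_439,
    "RNID": 217_963,
    "disability": 12_909_868,
}
TOTAL_ALL = 2_023_288_655  # the "*.* (all)" row


def share_of_whole(count: int, total: int = TOTAL_ALL) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


if __name__ == "__main__":
    # Print terms from most to least frequent with their share of all instances.
    for term, count in sorted(INSTANCES.items(), key=lambda kv: -kv[1]):
        print(f"{term:<12}{count:>12,}  {share_of_whole(count):.4f}%")
```

Applied per year, the same calculation would need that year's count for the term and for *.* in place of the overall totals.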

… and not all self-
referential
[Chart: Instances per domain as percentage of total for "RNIB". Y-axis: 0.00% to 30.00%.]

Predominance of .org.uk
[Chart: Domains of instances as percentage of total of "RNIB". Y-axis: 0.00% to 60.00%. Categories: .org.uk, .co.uk, .gov.uk, .ac.uk, .nhs.uk, .parliament.uk.]

The trouble
begins - links
Links to                Instances
-> rnib.org.uk          259,421
-> w3.org               71,798
-> mla.gov.uk           34,435
-> openharmonise.org    32,071
-> facebook.com         31,098
• Disaggregated statistics are
basically meaningless
• Second most common link is
to W3.org – had virtually
nothing to do with the actual
activities of RNIB
• openharmonise.org – the CMS
for mla.gov.uk. Reflects
references on MLA site, not
the activity of RNIB
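
A "links to" tally of this kind can be sketched as a count of link targets grouped by host. This is an illustration only, not the tooling used for the project: the function names and the sample URLs are made up, and stripping a leading "www." is an assumption about how hosts were grouped.

```python
from collections import Counter
from urllib.parse import urlparse


def target_domain(url: str) -> str:
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def count_link_targets(urls):
    """Tally how many links point at each host."""
    return Counter(target_domain(u) for u in urls)


# Made-up example links, not data from the archive.
sample = [
    "http://www.rnib.org.uk/",
    "http://www.w3.org/WAI/",
    "http://www.rnib.org.uk/xpedio/",
    "http://www.mla.gov.uk/home",
]
counts = count_link_targets(sample)
```

A raw tally like this says nothing about why a page links where it does, which is the problem the bullets above describe.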

Commensurability goes
out the window…
• Once you start filtering out the
areas that aren’t “really” part
of your search, it becomes
impossible to compare one
search term with another.
• You will lose “useful”
information and keep
“useless” stuff
• Can begin to build a “human
readable” corpus – but what
the heck do I actually have,
here? Certainly not what I
originally intended to look at…
xkcd: Thesis Defence

Whittling down
• REMOVED LINKS TO W3.org (usually just a mention of WAI)
• REMOVED RNIB.org.uk (I can browse the main site – more interested
in external material)
• REMOVED 2009 & 2010 (made the sample smaller, and these use a
different crawling system)
• REMOVED RNIB.co.uk
• REMOVED big-print.co.uk
• REMOVED MLA.gov.uk (mentions RNIB a lot, but becomes noise)
• The result of all this? The corpus is down to 71,112
• (Actually, by reducing the date range further and adding a couple of
extra tweaks, now down to 39,270)
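
The removals above amount to a chain of exclusion rules. A minimal sketch under stated assumptions: the `Hit` record and its fields are hypothetical, and the site removals (RNIB.org.uk, RNIB.co.uk, big-print.co.uk, MLA.gov.uk) are read here as dropping pages hosted on those sites, which the slide does not spell out.

```python
from dataclasses import dataclass

# Exclusions taken from the list above.
EXCLUDED_HOSTS = {"rnib.org.uk", "rnib.co.uk", "big-print.co.uk", "mla.gov.uk"}
EXCLUDED_LINK_TARGETS = {"w3.org"}
EXCLUDED_YEARS = {2009, 2010}


@dataclass
class Hit:
    """Hypothetical record for one search result."""
    host: str         # site the page was crawled from
    year: int         # crawl year
    link_target: str  # host the page links to, "" if none


def matches(host: str, excluded: set) -> bool:
    """True if host equals, or is a subdomain of, an excluded domain."""
    return any(host == d or host.endswith("." + d) for d in excluded)


def keep(hit: Hit) -> bool:
    """Apply every exclusion rule; a hit survives only if none applies."""
    return not (
        hit.year in EXCLUDED_YEARS
        or matches(hit.host, EXCLUDED_HOSTS)
        or matches(hit.link_target, EXCLUDED_LINK_TARGETS)
    )


def whittle(hits):
    """Return the hits that pass all exclusion rules, in original order."""
    return [h for h in hits if keep(h)]
```

Each rule is easy to state, but every one of them changes what the remaining corpus represents.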

What did we learn
today?
• Visible effects of the impact of
RNIB on UK web standards
• Sheer presence suggests RNIB
was better than its peers at
establishing itself on the
internet
• Google has made us (me) lazy
• An archive without an archivist
or a catalogue is highly
problematic for researchers
The British Library – from Wikicommons
