A talk given in January 2012 at a wonderful conference organized in Zakopane, Poland, by colleagues from the erstwhile GridLab project. I talked about how increasing data volumes demand radically new approaches to delivering research computing. Lively discussion ensued.
4. Exploding data volumes in climate science
2004: 36 TB
2012: 2,300 TB
Climate model intercomparison project (CMIP) of the IPCC
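To put the two figures above in perspective, the growth can be expressed as a doubling time. The sketch below is only illustrative arithmetic: it assumes steady exponential growth between 2004 and 2012, which the slide does not claim, and the function name is made up for this example.

```python
import math

def doubling_time_years(start_tb, end_tb, years):
    """Years for the volume to double, assuming steady exponential growth."""
    growth_factor = end_tb / start_tb
    return years * math.log(2) / math.log(growth_factor)

# Figures from the slide: 36 TB in 2004, 2,300 TB in 2012.
growth_factor = 2300 / 36
doubling = doubling_time_years(36, 2300, 2012 - 2004)

print(f"growth factor: about {growth_factor:.0f}x")
print(f"doubling time: about {doubling:.2f} years")
```

Under that assumption the archive grows roughly 64-fold in eight years, that is, it doubles about every 16 months.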
www.ci.anl.gov
4
www.ci.uchicago.edu
5. The challenge of staying competitive
"Well, in our country," said Alice …
"you'd generally get to somewhere
else — if you run very fast for a
long time, as we've been doing."
"A slow sort of country!" said the
Queen. "Now, here, you see, it
takes all the running you can do, to
keep in the same place. If you want
to get somewhere else, you must run
at least twice as fast as that!"
6. Ways of running faster (1)
Civilization advances by
extending the number of
important operations
which we can perform without
thinking about them
Alfred North Whitehead, 1911
Enhance human capabilities
7. Ways of running faster (2)
Utility computing
“[t]he computing utility could become the basis for a new and important industry” – McCarthy, 1960
Grid computing
“provide access to computing on demand” – The Grid: Blueprint for a New Computing Infrastructure, 1999
Cloud computing
“delivery of computing as a service rather than a product” [Wikipedia, 2012]
Outsource automatable tasks
Enhance human capabilities
8. Ways of running faster (3)
Collaboratories, P2P, crowdsourcing
Virtual organizations
“flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources”, The Anatomy of the Grid, 2001
Join forces with others
Outsource automatable tasks
Enhance human capabilities
9. Big science has been keeping up
OSG: 1.4M CPU-hours/day, >90 sites, >3000 users, >260 pubs in 2010
LIGO: 1 PB data in last science run, distributed worldwide
ESG: 1.2 PB climate data delivered to 23,000 users; 600+ pubs
Robust production solutions
Substantial teams and expense
Sustained, multi-year effort
Application-specific solutions, built on common technology
10. But small science is struggling
More data, more complex data
Ad-hoc solutions
Inadequate software, hardware
Data plan mandates
11. Medium science struggles too
• Dark Energy Survey receives 100,000 files each night in Illinois
• They transmit files to Texas for analysis … then move results back to Illinois
• Process must be reliable, routine, and efficient
• The IT team is not large
[Image: Blanco 4m on Cerro Tololo. Image credit: Roger Smith/NOAO/AURA/NSF]
12. Science IT crisis demands new approaches
• We have exceptional infrastructure for the 1%
(e.g., supercomputers, LHC, …)
• But not for the 99% (e.g., the vast majority of
the 1.8M publicly funded researchers in the EU)
We need new approaches to providing
science IT, that:
— Reduce barriers to entry
— Are cheaper
— Are sustainable
13. You can run a company from a coffee shop
14. Because businesses outsource their IT
Software as a Service (SaaS):
Web presence
Email (hosted Exchange)
Calendar
Telephony (hosted VOIP)
Human resources and payroll
Accounting
Customer relationship mgmt
15. And often their large-scale computing too
Software as a Service (SaaS):
Web presence
Email (hosted Exchange)
Calendar
Telephony (hosted VOIP)
Human resources and payroll
Accounting
Customer relationship mgmt
Infrastructure as a Service (IaaS):
Data analytics
Content distribution
17. Let’s rethink how we provide research IT
Accelerate discovery and innovation worldwide
by providing research IT as a service
Leverage software-as-a-service to
• provide millions of researchers with
unprecedented access to powerful tools;
• enable a massive shortening of cycle times in
time-consuming research processes; and
• reduce research IT costs dramatically via
economies of scale—and address sustainability?
18. Also address administrative costs?
42% of the time spent by an average PI
on a federally funded research project was
reported to be expended on administrative
tasks related to that project rather than on
research
— Federal Demonstration Partnership faculty burden survey, 2007
19. Time-consuming tasks in science
• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports
• …
21. Scientific data delivery, 2012 1980
• “[A] majority of users at BES facilities … physically transport data to a home institution using portable media … data volumes are going to increase significantly in the next few years (to 70 TB/day or more) – data must be transferred over the network”
• “the effectiveness of data transfer middleware [is] not just on the transfer speed, but also the time and interruption to other work required to supervise and check on the success of large data transfers”
• “It took two weeks and email traffic between network specialists at NERSC and ORNL, sys-admins at NERSC, … and combustion staff at ORNL and SNL to move 10 TB from NERSC to ORNL”
Major usability, productivity, performance problems
[ESnet Network Requirements Workshops, 2007-2010]
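The two data volumes quoted above can be converted into sustained network rates to show the gap between what is needed and what was achieved. This is back-of-the-envelope arithmetic only; it assumes decimal terabytes (10^12 bytes) and treats the "two weeks" as 14 full days of elapsed time, neither of which is stated in the workshop reports.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_TB = 10**12  # assumption: decimal terabytes

def sustained_rate_gbps(terabytes, days):
    """Average rate in gigabits per second needed to move the data in the given time."""
    bits = terabytes * BYTES_PER_TB * 8
    return bits / (days * SECONDS_PER_DAY) / 1e9

# Projected facility output: 70 TB every day.
facility_rate = sustained_rate_gbps(70, 1)

# The NERSC-to-ORNL anecdote: 10 TB in two weeks of elapsed time.
nersc_to_ornl_rate = sustained_rate_gbps(10, 14)

print(f"70 TB/day needs about {facility_rate:.1f} Gb/s sustained")
print(f"10 TB in two weeks is about {nersc_to_ornl_rate * 1000:.0f} Mb/s effective")
```

Under these assumptions the projected need is roughly 6.5 Gb/s sustained, while the anecdote works out to about 66 Mb/s end to end, around a hundred times slower, with most of the time presumably lost to human coordination rather than to the network itself.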
22. The challenge: Moving big data easily
What should be trivial …
“I need my data over there – at my _____” (supercomputing center, campus server, etc.)
[Diagram: Data Source to Data Destination]
… can be painfully tedious and time-consuming
“GAAAH!%&@#&”
[Diagram: Data Source to Data Destination, blocked by:]
! Config issues
! Firewall issues
! Unexpected failure = manual retry
24. GO-Transfer: Data transfer as SaaS
• Reliable file transfer.
– Easy “fire-and-forget” transfers
– Automatic fault recovery
– High performance
– Across multiple security domains
• No IT required.
– Software as a Service (SaaS)
• No client software installation
• New features automatically available
– Consolidated support & troubleshooting
– Works with existing GridFTP servers
– Globus Connect solves “last mile problem”
GO-Transfer is the initial offering of the US National
Science Foundation’s XSEDE User Access Services (XUAS)
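The idea behind "fire-and-forget" with automatic fault recovery can be sketched as a retry loop that remembers which files already arrived and only retries the rest. This is not the GO-Transfer implementation or API: `transfer_with_retry` and the `copy_one` callback are hypothetical names, and a real service would also handle checksums, credentials, and partial-file restarts.

```python
import time

def transfer_with_retry(files, copy_one, max_attempts=5, base_delay=0.0):
    """Copy each file with copy_one(name), retrying only the files that failed.

    Returns (done, failed). copy_one is a hypothetical callback that raises
    OSError on a transient failure.
    """
    pending = list(files)
    done = []
    for attempt in range(1, max_attempts + 1):
        still_failing = []
        for name in pending:
            try:
                copy_one(name)
                done.append(name)
            except OSError:
                still_failing.append(name)
        pending = still_failing
        if not pending:
            break
        if attempt < max_attempts:
            # Exponential backoff before the next pass over the failed files.
            time.sleep(base_delay * 2 ** (attempt - 1))
    return done, pending
```

The user submits the list once and checks the result later; the "manual retry" step from the previous slide moves into the service.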
25. Statistics and user feedback
• Launched November 2010
>3500 users registered
>2500 TB user data moved
>130 million user files moved
>300 endpoints registered
• Widely used on TeraGrid/XSEDE; other centers & facilities; internationally
• >20x faster than SCP
• Comparable to hand-tuned
“Last time I needed to fetch 100,000 files from NERSC, a graduate student babysat the process for a month.”
“I expected to spend four weeks writing code to manage my data transfers; with Globus Online, I was up and running in five minutes.”
“Transferred my data in 20 minutes instead of 61 hours. Makes these global climate simulations manageable.”
26. Common research data management steps
• Dark Energy Survey
• Galaxy genomics
• LIGO observatory
• SBGrid structural biology consortium
• NCAR climate data applications
• Land use change; economics
27. Towards “research IT as a service”
Scientific data management as a service
GO-Store GO-Collaborate GO-Galaxy GO-Transfer
GO-Compute GO-Catalog GO-Team GO-User
28. Research data management as a service
• GO-User [Today]
– Credentials and other profile information
• GO-Transfer
– Data movement
• GO-Team [Beta]
– Group membership
• GO-Collaborate
– Connect to collaborative tools: Jira, Confluence, …
• GO-Store [Prototype]
– Access to campus, cloud, XSEDE storage
• GO-Catalog
– On-demand metadata catalogs
• GO-Compute
– Access to computers
• GO-Galaxy
– Share, create, run workflows
34. SaaS economics: A quick tutorial
• Lower per-user cost (x10 or more?) via aggregation onto common infrastructure
• Initial “cost trough” due to fixed costs
• Per-user revenue permits positive return to scale
• Further reduce per-user cost over time
Lower per-user costs suggest new approaches to sustainability
[Figure: $ versus time]
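The fixed-cost argument above can be made concrete with a very simple cost model. This is a sketch under stated assumptions, not a model from the talk: costs are split into one fixed amount plus a constant marginal cost per user, and every dollar figure below is invented for illustration.

```python
def per_user_cost(fixed_cost, marginal_cost, users):
    """Average cost per user when a fixed cost is shared by all users."""
    return fixed_cost / users + marginal_cost

def break_even_users(fixed_cost, marginal_cost, revenue_per_user):
    """Number of users at which per-user revenue covers all costs."""
    margin = revenue_per_user - marginal_cost
    if margin <= 0:
        raise ValueError("revenue per user must exceed marginal cost per user")
    return fixed_cost / margin

# Hypothetical numbers, chosen only to show the shape of the curve.
FIXED = 100_000    # fixed cost per period
MARGINAL = 5       # added cost per extra user
REVENUE = 25       # revenue per user

for users in (100, 1_000, 10_000):
    print(users, "users:", per_user_cost(FIXED, MARGINAL, users), "per user")
print("break-even at", break_even_users(FIXED, MARGINAL, REVENUE), "users")
```

Below the break-even point the service sits in the "cost trough"; beyond it, every additional user lowers the average cost further, which is the positive return to scale the slide refers to.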
35. A 21st C science IT infrastructure strategy
• To provide more capability for more people at less cost …
• Create infrastructure
– Robust and universal
– Economies of scale
– Positive returns to scale
• Via the creative use of
– Aggregation (“cloud”)
– Federation (“grid”)
[Figure: small and medium laboratories and projects (L, P) built on shared SaaS layers: Research data management; Collaboration, computation; Research administration]
36. Acknowledgments
• Colleagues at UChicago and Argonne
– Steve Tuecke, Ravi Madduri, Kyle Chard, Tanu Malik, Rachana Ananthakrishnan, Raj Kettimuthu, and others listed at www.globusonline.org/about/goteam/
• Carl Kesselman and other colleagues at other
institutions
• Participants in the recent ICiS workshop on
“Human-Computer Symbiosis: 50 Years On”
• NSF OCI and MPS; DOE ASCR; NIH for support
37. For more information
• www.globusonline.org; Twitter: @globusonline
• Foster, I. Globus Online: Accelerating and democratizing science through cloud-based services. IEEE Internet Computing (May/June):70-73, 2011.
• Allen, B., Bresnahan, J., Childers, L., Foster, I., Kandaswamy, G., Kettimuthu, R., Kordas, J., Link, M., Martin, S., Pickett, K. and Tuecke, S. Software as a Service for Data Scientists. Communications of the ACM, Feb, 2012.
As in other outsourcing: benefits from specialization, economies of scale, reduced cost of meeting peak demand, flexibility.
Livny: “I’ve been doing cloud computing since before it was called grid computing”
A particular strength of Grid has been in recognizing the need for infrastructure to support collaborative teaming
The concepts work. The technology works. But groups still end up assembling vertically integrated solutions.
PI and a handful of students and staff
The answer cannot simply be more money. We lack both $$ and the people to spend $$ on.
Key points: intuitive interfaces, no local software, positive returns to scale.
We live in a strange time technologically. In our homes, we have enormously sophisticated digital media management technology: intuitive, automated, high-performance discovery and streaming, with Netflix and iTunes as examples.
Not (particularly) computing as a service, but the IT functions that researchers need to function. Include collaboration as a service.
Note that large-scale computing is an important part of the picture for many. But the MOST important issues are often more mundane: keeping track of data, sharing data with others, finding relevant software, …
But when we get to work, we go back in time 20 years