Rethinking how we provide science IT
in an era of massive data but modest budgets


Ian Foster
                                www.ci.anl.gov
                                www.ci.uchicago.edu
Exploding data volumes in biology




                        x10^7 in 14 years




Exploding data volumes in astronomy


MACHO et al.: 1 TB
Palomar: 3 TB
2MASS: 10 TB
GALEX: 30 TB
Sloan: 40 TB
Pan-STARRS: 40,000 TB
100,000 TB

Exploding data volumes in climate science
                     2004: 36 TB
                     2012: 2,300 TB




Climate model intercomparison project (CMIP) of the IPCC
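The two CMIP archive sizes on this slide imply a compound growth rate. The following is a minimal sketch of that arithmetic, assuming steady exponential growth between 2004 and 2012; the function names are illustrative and not part of the talk.

```python
import math


def annual_growth(start_tb: float, end_tb: float, years: float) -> float:
    """Compound annual growth factor between two archive sizes."""
    return (end_tb / start_tb) ** (1.0 / years)


def doubling_time(start_tb: float, end_tb: float, years: float) -> float:
    """Doubling time in years implied by the same two data points."""
    return years * math.log(2) / math.log(end_tb / start_tb)


# Figures from the slide: 36 TB in 2004 and 2,300 TB in 2012.
factor = annual_growth(36, 2300, 2012 - 2004)
doubling = doubling_time(36, 2300, 2012 - 2004)
print(f"growth per year: x{factor:.2f}, doubling every {doubling:.1f} years")
```

Under that assumption the archive grows by roughly two thirds each year, which means it doubles about every 16 months.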
The challenge of staying competitive
"Well, in our country," said Alice …
 "you'd generally get to somewhere
 else — if you run very fast for a
 long time, as we've been doing."

"A slow sort of country!" said the
 Queen. "Now, here, you see, it
 takes all the running you can do, to
 keep in the same place. If you want
 to get somewhere else, you must run
 at least twice as fast as that!"
Ways of running faster (1)

                        Civilization advances by
                        extending the number of
                        important operations
                        which we can perform without
                        thinking about them
                          Alfred North Whitehead, 1911




          Enhance human capabilities

Ways of running faster (2)
Outsource automatable tasks
    Utility computing
    “[t]he computing utility could become the basis for a new and important industry” – McCarthy, 1960

    Grid computing
    “provide access to computing on demand” – The Grid: Blueprint for a New Computing Infrastructure, 1999

    Cloud computing
    “delivery of computing as a service rather than a product” [Wikipedia, 2012]

Enhance human capabilities

Ways of running faster (3)
Join forces with others
    Collaboratories, P2P, crowdsourcing

    Virtual organizations
    “flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources”, Anatomy of Grid, 2001

Outsource automatable tasks

Enhance human capabilities

Big science has been keeping up


OSG: 1.4M CPU-hours/day, >90 sites, >3000 users, >260 pubs in 2010
LIGO: 1 PB data in last science run, distributed worldwide
ESG: 1.2 PB climate data delivered to 23,000 users; 600+ pubs

 Robust production solutions
 Substantial teams and expense
 Sustained, multi-year effort
 Application-specific solutions, built on common technology
But small science is struggling




More data, more complex data
Ad-hoc solutions
Inadequate software, hardware
Data plan mandates
Medium science struggles too
•    Dark Energy Survey receives 100,000 files each night in Illinois
•    They transmit files to Texas for analysis … then move results back to Illinois
•    Process must be reliable, routine, and efficient
•    The IT team is not large
Image: Blanco 4m on Cerro Tololo. Image credit: Roger Smith/NOAO/AURA/NSF


Science IT crisis demands new approaches
•    We have exceptional infrastructure for the 1%
     (e.g., supercomputers, LHC, …)
•    But not for the 99% (e.g., the vast majority of
     the 1.8M publicly funded researchers in the EU)

     We need new approaches to providing
     science IT, that:
     — Reduce barriers to entry
     — Are cheaper
     — Are sustainable
You can run a company from a coffee shop




Because businesses outsource their IT
Software as a Service (SaaS)
     Web presence
     Email (hosted Exchange)
     Calendar
     Telephony (hosted VOIP)
     Human resources and payroll
     Accounting
     Customer relationship mgmt



And often their large-scale computing too
Software as a Service (SaaS)
     Web presence
     Email (hosted Exchange)
     Calendar
     Telephony (hosted VOIP)
     Human resources and payroll
     Accounting
     Customer relationship mgmt
Infrastructure as a Service (IaaS)
     Data analytics
     Content distribution
Consumers also outsource much of their IT
Let’s rethink how we provide research IT

Accelerate discovery and innovation worldwide
by providing research IT as a service
Leverage software-as-a-service to
• provide millions of researchers with
   unprecedented access to powerful tools;
• enable a massive shortening of cycle times in
   time-consuming research processes; and
• reduce research IT costs dramatically via
   economies of scale—and address sustainability?
Also address administrative costs?



42% of the time spent by an average PI
on a federally funded research project was
reported to be expended on administrative
tasks related to that project rather than on
research
     — Federal Demonstration Partnership faculty burden survey, 2007




Time-consuming tasks in science
•    Run experiments
•    Collect data
•    Manage data
•    Move data
•    Acquire computers
•    Analyze data
•    Run simulations
•    Compare experiment with simulation
•    Search the literature
•    Communicate with colleagues
•    Publish papers
•    Find, configure, install relevant software
•    Find, access, analyze relevant data
•    Order supplies
•    Write proposals
•    Write reports
•    …
Scientific data delivery, 2012 1980
•    “[A] majority of users at BES facilities … physically transport data
     to a home institution using portable media … data volumes are
     going to increase significantly in the next few years (to 70 TB/day
     or more) – data must be transferred over the network”
•    “the effectiveness of data transfer middleware [is] not just on the
     transfer speed, but also the time and interruption to other work
     required to supervise and check on the success of large data
     transfers”
•    “It took two weeks and email traffic between network specialists
     at NERSC and ORNL, sys-admins at NERSC, … and combustion staff
     at ORNL and SNL to move 10 TB from NERSC to ORNL”
     Major usability, productivity, performance problems
                                [ESNet Network Requirements Workshops, 2007-2010]
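The figures quoted on this slide can be turned into network rates. The sketch below is a rough calculation, assuming decimal terabytes (10^12 bytes) and perfectly even transfer over the whole period; neither assumption comes from the workshop reports, and the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # decimal terabyte in bytes (an assumption; the slide does not say)


def required_gbps(tb_per_day: float) -> float:
    """Sustained rate in Gbit/s needed to move tb_per_day every day."""
    return tb_per_day * TB * 8 / SECONDS_PER_DAY / 1e9


def achieved_mbps(tb_moved: float, days: float) -> float:
    """Average rate in Mbit/s when tb_moved takes the given number of days."""
    return tb_moved * TB * 8 / (days * SECONDS_PER_DAY) / 1e6


need = required_gbps(70)      # 70 TB/day forecast for BES facilities
got = achieved_mbps(10, 14)   # 10 TB from NERSC to ORNL in two weeks
print(f"needed: {need:.2f} Gbit/s sustained; achieved: {got:.0f} Mbit/s average")
```

On these assumptions the forecast volume needs about 6.5 Gbit/s around the clock, while the two-week transfer averaged about 66 Mbit/s, a gap of roughly two orders of magnitude.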
The challenge: Moving big data easily
What should be trivial …
    “I need my data over there – at my _____” (supercomputing center, campus server, etc.)
    Data Source -> Data Destination

… can be painfully tedious and time-consuming
    “GAAAH!%&@#&”
    Data Source -> Data Destination
    ! Config issues
    ! Firewall issues
    ! Unexpected failure = manual retry
GO-Transfer: Data transfer as SaaS
• Reliable file transfer.
      –   Easy “fire-and-forget” transfers
      –   Automatic fault recovery
      –   High performance
      –   Across multiple security domains
• No IT required.
      – Software as a Service (SaaS)
            • No client software installation
            • New features automatically available
      – Consolidated support & troubleshooting
      – Works with existing GridFTP servers
      – Globus Connect solves “last mile problem”

GO-Transfer is the initial offering of the US National
Science Foundation’s XSEDE User Access Services (XUAS)
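The “fire-and-forget” and “automatic fault recovery” ideas can be illustrated with a generic retry loop. This is a minimal sketch only, not the GO-Transfer implementation or API: `transfer_with_retry`, `copy_one`, and `flaky_copy` are hypothetical names, and exponential backoff is one common choice among several.

```python
import time
from typing import Callable, Dict, Iterable, List


def transfer_with_retry(
    files: Iterable[str],
    copy_one: Callable[[str], None],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Copy each file, retrying failures with exponential backoff.

    Returns the names that still failed after max_attempts tries.
    """
    failed = []
    for name in files:
        for attempt in range(max_attempts):
            try:
                copy_one(name)
                break  # this file succeeded; move on to the next one
            except OSError:
                if attempt + 1 < max_attempts:
                    sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x, ...
        else:
            failed.append(name)  # every attempt raised an error
    return failed


# Hypothetical copy function that fails twice per file, then succeeds.
attempts: Dict[str, int] = {}


def flaky_copy(name: str) -> None:
    attempts[name] = attempts.get(name, 0) + 1
    if attempts[name] < 3:
        raise OSError("simulated network failure")


leftover = transfer_with_retry(["a.fits", "b.fits"], flaky_copy, sleep=lambda s: None)
print(leftover, attempts)
```

The caller hands over a file list and gets back only the files that could not be moved, which is the part a person would otherwise have to watch for by hand.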
Statistics and user feedback
•    Launched November 2010
     >3500 users registered
     >2500 TB user data moved
     >130 million user files moved
     >300 endpoints registered
•    Widely used on TeraGrid/XSEDE; other centers & facilities; internationally
•    >20x faster than SCP
•    Comparable to hand-tuned

“Last time I needed to fetch 100,000 files from NERSC, a graduate student babysat the process for a month.”

“I expected to spend four weeks writing code to manage my data transfers; with Globus Online, I was up and running in five minutes.”

“Transferred my data in 20 minutes instead of 61 hours. Makes these global climate simulations manageable.”
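Two derived numbers follow from the statistics and the last quote on this slide. The sketch below treats the “>” lower bounds as exact values and assumes decimal terabytes, so the results are only indicative; the function names are illustrative.

```python
TB = 10**12  # decimal terabyte in bytes (an assumption)


def average_file_mb(total_tb: float, file_count: int) -> float:
    """Mean file size in MB for a given volume and file count."""
    return total_tb * TB / file_count / 1e6


def speedup(before_hours: float, after_minutes: float) -> float:
    """Ratio of the old transfer time to the new one."""
    return before_hours * 60 / after_minutes


avg_mb = average_file_mb(2500, 130_000_000)  # >2500 TB in >130 million files
ratio = speedup(61, 20)                      # 61 hours versus 20 minutes
print(f"average file: about {avg_mb:.0f} MB; quoted speedup: {ratio:.0f}x")
```

An average of roughly 19 MB per file suggests workloads made up of very many modest files, where per-file overhead and unattended retries matter as much as raw bandwidth.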
Common research data management steps
     •   Dark Energy Survey   •   SBGrid structural biology consortium
     •   Galaxy genomics      •   NCAR climate data applications
     •   LIGO observatory     •   Land use change; economics




Towards “research IT as a service”
           Scientific data management as a service
     GO-Store      GO-Collaborate       GO-Galaxy     GO-Transfer

          GO-Compute       GO-Catalog       GO-Team      GO-User




Research data management as a service
Today
•    GO-User
     – Credentials and other profile information
•    GO-Transfer
     – Data movement
Beta
•    GO-Team
     – Group membership
•    GO-Collaborate
     – Connect to collaborative tools: Jira, Confluence, …
Prototype
•    GO-Store
     – Access to campus, cloud, XSEDE storage
•    GO-Catalog
     – On-demand metadata catalogs
•    GO-Compute
     – Access to computers
•    GO-Galaxy
     – Share, create, run workflows

Collaboration Management




Other innovative science SaaS projects




SaaS economics: A quick tutorial
•    Lower per-user cost (x10 or more?) via aggregation onto common infrastructure
•    Initial “cost trough” due to fixed costs
•    Per-user revenue permits positive return to scale
•    Further reduce per-user cost over time

Lower per-user costs suggest new approaches to sustainability

[Figure: $ (vertical axis, crossing 0) against Time]
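The “cost trough” and falling per-user cost can be shown with a toy model. This is a sketch under invented assumptions: a constant fixed cost per period, a constant marginal cost and price per user, and linear user growth. None of the numbers come from the talk.

```python
from typing import List


def per_user_cost(fixed: float, marginal: float, users: int) -> float:
    """Cost per user when a fixed cost is shared across all users."""
    return fixed / users + marginal


def cumulative_net(
    fixed: float, marginal: float, price: float, users_by_period: List[int]
) -> List[float]:
    """Running total of (revenue - cost) after each period."""
    total = 0.0
    history = []
    for users in users_by_period:
        total += users * (price - marginal) - fixed
        history.append(total)
    return history


# Hypothetical service: 45,000 fixed cost per period, 20 marginal cost and
# 120 revenue per user, growing by 100 users per period for 12 periods.
users = [100 * (t + 1) for t in range(12)]
net = cumulative_net(45_000, 20, 120, users)
trough = min(range(len(net)), key=net.__getitem__)

print(f"per-user cost: {per_user_cost(45_000, 20, users[0]):.1f} "
      f"-> {per_user_cost(45_000, 20, users[-1]):.1f}")
print(f"deepest point: {net[trough]:.0f} in period {trough + 1}; "
      f"final position: {net[-1]:.0f}")
```

In this toy run the per-user cost falls about eightfold as users are aggregated onto the same infrastructure, and the cumulative position dips during the early periods before per-user revenue carries it back above zero.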
A 21st C science IT infrastructure strategy
•    To provide more capability for more people at less cost …
•    Create infrastructure
      – Robust and universal
      – Economies of scale
      – Positive returns to scale
•    Via the creative use of
      – Aggregation (“cloud”)
      – Federation (“grid”)

[Figure: “Small and medium laboratories and projects” (a grid of L and P boxes) above three layers labeled “Research data management a”, “Collaboration, computation a”, “Research administration S”]

Acknowledgments
•    Colleagues at UChicago and Argonne
     –   Steve Tuecke, Ravi Madduri, Kyle Chard, Tanu Malik,
         Rachana Ananthakrishnan,
         Raj Kettimuthu, and others listed at
         www.globusonline.org/about/goteam/
•    Carl Kesselman and other colleagues at other
     institutions
•    Participants in the recent ICiS workshop on
     “Human-Computer Symbiosis: 50 Years On”
•    NSF OCI and MPS; DOE ASCR; NIH for support
For more information
•    www.globusonline.org; Twitter: @globusonline
•    Foster, I. Globus Online: Accelerating and
     democratizing science through cloud-based
     services. IEEE Internet Computing
     (May/June):70-73, 2011.
•    Allen, B., Bresnahan, J., Childers, L., Foster, I.,
     Kandaswamy, G., Kettimuthu, R., Kordas, J., Link,
     M., Martin, S., Pickett, K. and Tuecke, S.
     Software as a Service for Data Scientists.
     Communications of the ACM, Feb, 2012.
Thank you!
foster@uchicago.edu

www.globusonline.org
Twitter: @globusonline, @ianfoster



More Related Content

What's hot

Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Dan Taylor
 
NSF Software @ ApacheConNA
NSF Software @ ApacheConNANSF Software @ ApacheConNA
NSF Software @ ApacheConNADaniel S. Katz
 
Virtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchVirtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchUniversity of Washington
 
The Pacific Research Platform:a Science-Driven Big-Data Freeway System
The Pacific Research Platform:a Science-Driven Big-Data Freeway SystemThe Pacific Research Platform:a Science-Driven Big-Data Freeway System
The Pacific Research Platform:a Science-Driven Big-Data Freeway SystemLarry Smarr
 
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...Amit Sheth
 
Visual Data Analytics in the Cloud for Exploratory Science
Visual Data Analytics in the Cloud for Exploratory ScienceVisual Data Analytics in the Cloud for Exploratory Science
Visual Data Analytics in the Cloud for Exploratory ScienceUniversity of Washington
 
The Future(s) of the World Wide Web
The Future(s) of the World Wide WebThe Future(s) of the World Wide Web
The Future(s) of the World Wide WebJames Hendler
 
SemWeb 4 Gov – opportunities and challenges
SemWeb 4 Gov – opportunities and challengesSemWeb 4 Gov – opportunities and challenges
SemWeb 4 Gov – opportunities and challengesAndrew Woolf
 
So Long Computer Overlords
So Long Computer OverlordsSo Long Computer Overlords
So Long Computer OverlordsIan Foster
 
Learn to speak open
Learn to speak openLearn to speak open
Learn to speak openLilian Juma
 
Web Observatories and e-Research
Web Observatories and e-ResearchWeb Observatories and e-Research
Web Observatories and e-ResearchDavid De Roure
 
Rpi talk foster september 2011
Rpi talk foster september 2011Rpi talk foster september 2011
Rpi talk foster september 2011Ian Foster
 
Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)James Hendler
 
Social Machines of Scholarly Collaboration
Social Machines of Scholarly CollaborationSocial Machines of Scholarly Collaboration
Social Machines of Scholarly CollaborationDavid De Roure
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesTheContentMine
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forumChris Dwan
 
RDSI Project History
RDSI Project HistoryRDSI Project History
RDSI Project HistoryAsher Vennell
 
RDSI Project History
RDSI Project HistoryRDSI Project History
RDSI Project HistoryRDSI
 

What's hot (20)

Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2
 
NSF Software @ ApacheConNA
NSF Software @ ApacheConNANSF Software @ ApacheConNA
NSF Software @ ApacheConNA
 
Virtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchVirtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible Research
 
The Pacific Research Platform:a Science-Driven Big-Data Freeway System
The Pacific Research Platform:a Science-Driven Big-Data Freeway SystemThe Pacific Research Platform:a Science-Driven Big-Data Freeway System
The Pacific Research Platform:a Science-Driven Big-Data Freeway System
 
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
 
End-to-End eScience
End-to-End eScienceEnd-to-End eScience
End-to-End eScience
 
Visual Data Analytics in the Cloud for Exploratory Science
Visual Data Analytics in the Cloud for Exploratory ScienceVisual Data Analytics in the Cloud for Exploratory Science
Visual Data Analytics in the Cloud for Exploratory Science
 
The Future(s) of the World Wide Web
The Future(s) of the World Wide WebThe Future(s) of the World Wide Web
The Future(s) of the World Wide Web
 
SemWeb 4 Gov – opportunities and challenges
SemWeb 4 Gov – opportunities and challengesSemWeb 4 Gov – opportunities and challenges
SemWeb 4 Gov – opportunities and challenges
 
So Long Computer Overlords
So Long Computer OverlordsSo Long Computer Overlords
So Long Computer Overlords
 
Learn to speak open
Learn to speak openLearn to speak open
Learn to speak open
 
Web Observatories and e-Research
Web Observatories and e-ResearchWeb Observatories and e-Research
Web Observatories and e-Research
 
Rpi talk foster september 2011
Rpi talk foster september 2011Rpi talk foster september 2011
Rpi talk foster september 2011
 
Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)
 
Social Machines of Scholarly Collaboration
Social Machines of Scholarly CollaborationSocial Machines of Scholarly Collaboration
Social Machines of Scholarly Collaboration
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social Machines
 
Dh presentation 2018
Dh presentation 2018Dh presentation 2018
Dh presentation 2018
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forum
 
RDSI Project History
RDSI Project HistoryRDSI Project History
RDSI Project History
 
RDSI Project History
RDSI Project HistoryRDSI Project History
RDSI Project History
 

Viewers also liked

Agents In An Exponential World Foster
Agents In An Exponential World FosterAgents In An Exponential World Foster
Agents In An Exponential World FosterIan Foster
 
Recruiting in a Networked World - Workshop Series
Recruiting in a Networked World - Workshop SeriesRecruiting in a Networked World - Workshop Series
Recruiting in a Networked World - Workshop Serieshholmes75
 
Grid And Healthcare For IOM July 2009
Grid And Healthcare For IOM July 2009Grid And Healthcare For IOM July 2009
Grid And Healthcare For IOM July 2009Ian Foster
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009Ian Foster
 
Campus Bridging with Globus Services
Campus Bridging with Globus ServicesCampus Bridging with Globus Services
Campus Bridging with Globus ServicesIan Foster
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
Globus status and publication plans
Globus status and publication plansGlobus status and publication plans
Globus status and publication plansIan Foster
 
Recruitment and Selection
Recruitment and SelectionRecruitment and Selection
Recruitment and Selectionr m
 
Globus Auth: A Research Identity and Access Management Platform
Globus Auth: A Research Identity and Access Management PlatformGlobus Auth: A Research Identity and Access Management Platform
Globus Auth: A Research Identity and Access Management PlatformIan Foster
 
Streamlined data sharing and analysis to accelerate cancer research
Streamlined data sharing and analysis to accelerate cancer researchStreamlined data sharing and analysis to accelerate cancer research
Streamlined data sharing and analysis to accelerate cancer researchIan Foster
 
Science for the Future: Strategies for Moving and Sharing Data
Science for the Future: Strategies for Moving and Sharing DataScience for the Future: Strategies for Moving and Sharing Data
Science for the Future: Strategies for Moving and Sharing DataIan Foster
 
Globus publication demo screenshots
Globus publication demo screenshotsGlobus publication demo screenshots
Globus publication demo screenshotsIan Foster
 
8085 paper-presentation
8085 paper-presentation8085 paper-presentation
8085 paper-presentationJiMs ChAcko
 

Viewers also liked (13)

Agents In An Exponential World Foster
Agents In An Exponential World FosterAgents In An Exponential World Foster
Agents In An Exponential World Foster
 
Recruiting in a Networked World - Workshop Series
Recruiting in a Networked World - Workshop SeriesRecruiting in a Networked World - Workshop Series
Recruiting in a Networked World - Workshop Series
 
Grid And Healthcare For IOM July 2009
Grid And Healthcare For IOM July 2009Grid And Healthcare For IOM July 2009
Grid And Healthcare For IOM July 2009
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009
 
Campus Bridging with Globus Services
Campus Bridging with Globus ServicesCampus Bridging with Globus Services
Campus Bridging with Globus Services
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Globus status and publication plans
Globus status and publication plansGlobus status and publication plans
Globus status and publication plans
 
Recruitment and Selection
Recruitment and SelectionRecruitment and Selection
Recruitment and Selection
 
Globus Auth: A Research Identity and Access Management Platform
Globus Auth: A Research Identity and Access Management PlatformGlobus Auth: A Research Identity and Access Management Platform
Globus Auth: A Research Identity and Access Management Platform
 
Streamlined data sharing and analysis to accelerate cancer research
Streamlined data sharing and analysis to accelerate cancer researchStreamlined data sharing and analysis to accelerate cancer research
Streamlined data sharing and analysis to accelerate cancer research
 
Science for the Future: Strategies for Moving and Sharing Data
Science for the Future: Strategies for Moving and Sharing DataScience for the Future: Strategies for Moving and Sharing Data
Science for the Future: Strategies for Moving and Sharing Data
 
Globus publication demo screenshots
Globus publication demo screenshotsGlobus publication demo screenshots
Globus publication demo screenshots
 
8085 paper-presentation
8085 paper-presentation8085 paper-presentation
8085 paper-presentation
 

Similar to Rethinking how we provide science IT in an era of massive data but modest budgets

SOLE: Linking Research Papers with Science Objects
SOLE: Linking Research Papers with Science ObjectsSOLE: Linking Research Papers with Science Objects
SOLE: Linking Research Papers with Science ObjectsTanu Malik
 
ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012
ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012
ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012Lee Dirks
 
Understanding the Big Picture of e-Science
Understanding the Big Picture of e-ScienceUnderstanding the Big Picture of e-Science
Understanding the Big Picture of e-ScienceAndrew Sallans
 
Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11Robert Grossman
 
GENI Engineering Conference -- Ian Foster
Rethinking how we provide science IT in an era of massive data but modest budgets

  • 1. Rethinking how we provide science IT in an era of massive data but modest budgets Ian Foster www.ci.anl.gov www.ci.uchicago.edu
  • 2. Exploding data volumes in biology: x10^7 in 14 years
  • 3. Exploding data volumes in astronomy. MACHO et al.: 1 TB; Palomar: 3 TB; 2MASS: 10 TB; GALEX: 30 TB; 100,000 TB; Sloan: 40 TB; Pan-STARRS: 40,000 TB
  • 4. Exploding data volumes in climate science. 2004: 36 TB; 2012: 2,300 TB. Climate model intercomparison project (CMIP) of the IPCC
  • 5. The challenge of staying competitive. "Well, in our country," said Alice … "you'd generally get to somewhere else – if you run very fast for a long time, as we've been doing." "A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"
  • 6. Ways of running faster (1): Enhance human capabilities. “Civilization advances by extending the number of important operations which we can perform without thinking about them” – Alfred North Whitehead, 1911
  • 7. Ways of running faster (2): Outsource automatable tasks. Utility computing: “[t]he computing utility could become the basis for a new and important industry” – McCarthy, 1960. Grid computing: “provide access to computing on demand” – The Grid: Blueprint for a New Computing Inf., 1999. Cloud computing: “delivery of computing as a service rather than a product” [Wikipedia, 2012]. Enhance human capabilities
  • 8. Ways of running faster (3): Join forces with others. Collaboratories, P2P, crowdsourcing. Virtual organizations: “flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources”, Anatomy of Grid, 2001. Outsource automatable tasks. Enhance human capabilities
  • 9. Big science has been keeping up. OSG: 1.4M CPU-hours/day, >90 sites, >3000 users, >260 pubs in 2010. LIGO: 1 PB data in last science run, distributed worldwide. ESG: 1.2 PB climate data delivered to 23,000 users; 600+ pubs. Robust production solutions. Substantial teams and expense. Sustained, multi-year effort. Application-specific solutions, built on common technology
  • 10. But small science is struggling. More data, more complex data. Ad-hoc solutions. Inadequate software, hardware. Data plan mandates
  • 11. Medium science struggles too • Dark Energy Survey receives 100,000 files each night in Illinois • They transmit files to Texas for analysis … then move results back to Illinois • Process must be reliable, routine, and efficient • The IT team is not large. Image: Blanco 4m on Cerro Tololo. Image credit: Roger Smith/NOAO/AURA/NSF
  • 12. Science IT crisis demands new approaches • We have exceptional infrastructure for the 1% (e.g., supercomputers, LHC, …) • But not for the 99% (e.g., the vast majority of the 1.8M publicly funded researchers in the EU). We need new approaches to providing science IT that: – Reduce barriers to entry – Are cheaper – Are sustainable
  • 13. You can run a company from a coffee shop
  • 14. Because businesses outsource their IT. Software as a Service (SaaS): Web presence; Email (hosted Exchange); Calendar; Telephony (hosted VOIP); Human resources and payroll; Accounting; Customer relationship mgmt
  • 15. And often their large-scale computing too. Software as a Service (SaaS): Web presence; Email (hosted Exchange); Calendar; Telephony (hosted VOIP); Human resources and payroll; Accounting; Customer relationship mgmt. Infrastructure as a Service (IaaS): Data analytics; Content distribution
  • 16. Consumers also outsource much of their IT
  • 17. Let’s rethink how we provide research IT. Accelerate discovery and innovation worldwide by providing research IT as a service. Leverage software-as-a-service to • provide millions of researchers with unprecedented access to powerful tools; • enable a massive shortening of cycle times in time-consuming research processes; and • reduce research IT costs dramatically via economies of scale, and address sustainability?
  • 18. Also address administrative costs? “42% of the time spent by an average PI on a federally funded research project was reported to be expended on administrative tasks related to that project rather than on research” – Federal Demonstration Partnership faculty burden survey, 2007
  • 19. Time-consuming tasks in science • Run experiments • Collect data • Manage data • Move data • Acquire computers • Analyze data • Run simulations • Compare experiment with simulation • Search the literature • Communicate with colleagues • Publish papers • Find, configure, install relevant software • Find, access, analyze relevant data • Order supplies • Write proposals • Write reports • …
  • 20. Time-consuming tasks in science • Run experiments • Collect data • Manage data • Move data • Acquire computers • Analyze data • Run simulations • Compare experiment with simulation • Search the literature • Communicate with colleagues • Publish papers • Find, configure, install relevant software • Find, access, analyze relevant data • Order supplies • Write proposals • Write reports • …
  • 21. Scientific data delivery, 2012 1980 • “[A] majority of users at BES facilities … physically transport data to a home institution using portable media … data volumes are going to increase significantly in the next few years (to 70 TB/day or more) – data must be transferred over the network” • “the effectiveness of data transfer middleware [is] not just on the transfer speed, but also the time and interruption to other work required to supervise and check on the success of large data transfers” • “It took two weeks and email traffic between network specialists at NERSC and ORNL, sys-admins at NERSC, … and combustion staff at ORNL and SNL to move 10 TB from NERSC to ORNL” Major usability, productivity, performance problems [ESNet Network Requirements Workshops, 2007-2010]
  • 22. The challenge: Moving big data easily. What should be trivial … “I need my data over there – at my _____” (supercomputing center, campus server, etc.) [Data Source → Data Destination] … can be painfully tedious and time-consuming: “GAAAH! %&@#&” Config issues; firewall issues; unexpected failure = manual retry
  • 24. GO-Transfer: Data transfer as SaaS • Reliable file transfer – Easy “fire-and-forget” transfers – Automatic fault recovery – High performance – Across multiple security domains • No IT required – Software as a Service (SaaS): no client software installation; new features automatically available – Consolidated support & troubleshooting – Works with existing GridFTP servers – Globus Connect solves “last mile problem”. GO-Transfer is the initial offering of the US National Science Foundation’s XSEDE User Access Services (XUAS)
  • 25. Statistics and user feedback • Launched November 2010: >3500 users registered; >2500 TB user data moved; >130 million user files moved; >300 endpoints registered • Widely used on TeraGrid/XSEDE; other centers & facilities; internationally • >20x faster than SCP • Comparable to hand-tuned. User feedback: “Last time I needed to fetch 100,000 files from NERSC, a graduate student babysat the process for a month.” “I expected to spend four weeks writing code to manage my data transfers; with Globus Online, I was up and running in five minutes.” “Transferred my data in 20 minutes instead of 61 hours. Makes these global climate simulations manageable.”
  • 26. Common research data management steps • Dark Energy Survey • SBGrid structural biology consortium • Galaxy genomics • NCAR climate data applications • LIGO observatory • Land use change; economics
  • 27. Towards “research IT as a service”. Scientific data management as a service: GO-Store, GO-Collaborate, GO-Galaxy, GO-Transfer, GO-Compute, GO-Catalog, GO-Team, GO-User
  • 28. Research data management as a service • GO-User (Today) – Credentials and other profile information • GO-Transfer – Data movement • GO-Team (Beta) – Group membership • GO-Collaborate – Connect to collaborative tools: Jira, Confluence, … • GO-Store (Prototype) – Access to campus, cloud, XSEDE storage • GO-Catalog – On-demand metadata catalogs • GO-Compute – Access to computers • GO-Galaxy – Share, create, run workflows
  • 29. Collaboration Management
  • 30. Other innovative science SaaS projects
  • 31. Other innovative science SaaS projects
  • 32. Other innovative science SaaS projects
  • 33. Other innovative science SaaS projects
  • 34. SaaS economics: A quick tutorial • Lower per-user cost (x10 or more?) via aggregation onto common infrastructure • Initial “cost trough” due to fixed costs • Per-user revenue permits positive return to scale • Further reduce per-user cost over time. Lower per-user costs suggest new approaches to sustainability. [Chart: $ versus time]
  • 35. A 21st C science IT infrastructure strategy • To provide more capability for more people at less cost … • Create infrastructure – Robust and universal – Economies of scale – Positive returns to scale • Via the creative use of – Aggregation (“cloud”) – Federation (“grid”). [Diagram: small and medium laboratories (L) and projects (P) served by SaaS for research data management; collaboration, computation; research administration]
  • 36. Acknowledgments • Colleagues at UChicago and Argonne – Steve Tuecke, Ravi Madduri, Kyle Chard, Tanu Malik, Rachana Ananthakrishnan, Raj Kettimuthu, and others listed at www.globusonline.org/about/goteam/ • Carl Kesselman and other colleagues at other institutions • Participants in the recent ICiS workshop on “Human-Computer Symbiosis: 50 Years On” • NSF OCI and MPS; DOE ASCR; NIH for support
  • 37. For more information • www.globusonline.org; Twitter: @globusonline • Foster, I. Globus Online: Accelerating and democratizing science through cloud-based services. IEEE Internet Computing (May/June):70-73, 2011. • Allen, B., Bresnahan, J., Childers, L., Foster, I., Kandaswamy, G., Kettimuthu, R., Kordas, J., Link, M., Martin, S., Pickett, K. and Tuecke, S. Software as a Service for Data Scientists. Communications of the ACM, Feb, 2012.
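
The slides contrast "unexpected failure = manual retry" with the "fire-and-forget" transfers and "automatic fault recovery" offered by GO-Transfer. The Python sketch below illustrates that general idea only. It is not the Globus Online implementation or API: `transfer_with_retry`, `copy_once`, and the exponential back-off policy are hypothetical names and choices made for illustration.

```python
import time
from typing import Callable


def transfer_with_retry(copy_once: Callable[[], None],
                        max_attempts: int = 5,
                        base_delay_s: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Call copy_once until it succeeds; return the number of attempts used.

    copy_once is a hypothetical caller-supplied function that performs one
    transfer attempt and raises OSError on failure (e.g. a dropped connection).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            copy_once()
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise  # out of retries: surface the last error to the caller
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

With a wrapper like this, the person who requested the transfer does not have to watch it: a transient failure triggers another attempt after a short wait, and only a persistent failure is reported back.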

Editor's Notes

  1. As in other outsourcing: benefits from specialization, economies of scale, reduced cost of meeting peak demand, flexibility. Livny: “I’ve been doing cloud computing since before it was called grid computing”
  2. A particular strength of Grid has been in recognizing the need for infrastructure to support collaborative teaming
  3. The concepts work. The technology works. But groups still end up assembling vertically integrated solutions.
  4. PI and a handful of students and staff
  5. The answer cannot simply be more money. We lack both $$ and the people to spend $$ on.
  6. Key points: intuitive interfaces, no local software, positive returns to scale. We live in a strange time technologically. In our homes, we have enormously sophisticated digital media management technology. Intuitive, automated, high-performance discovery and streaming: Netflix and iTunes, for example.
  7. Not (particularly) computing as a service, but the IT functions that researchers need to function. Include collaboration as a service.
  8. Note that large-scale computing is an important part of the picture for many. But the MOST important issues are often more mundane: keeping track of data, sharing data with others, finding relevant software, …
  9. But when we get to work, we go back in time 20 years