Hadoop as the Platform for the
Smartgrid at TVA
August 26, 2010
Topics

•   Introduction
•   Retrospective on the openPDC project
•   What Is Hadoop?
•   Current Smartgrid Obstacles
•   Cloudera Enterprise as The New Smartgrid Platform
•   Summary




                  Copyright 2010 Cloudera Inc. All rights reserved   2
Today’s speaker – Josh Patterson

 • josh@cloudera.com
 • Master’s Thesis: self-organizing mesh networks
    • Published in IAAI-09: TinyTermite: A Secure Routing Algorithm
 • Conceived, built, and led Hadoop integration for the
   openPDC project at TVA
    • Led a small team that designed timeseries classification
      techniques for MapReduce
    • Open source work at http://openpdc.codeplex.com
 • Now: Solutions Architect at Cloudera



What is the openPDC?

• The openPDC is a complete set of applications for
  processing streaming time-series data in real-time
   • Measured data is gathered with GPS timestamps from multiple input
     sources, time-sorted, provided to user-defined actions, and
     dispersed to custom output destinations for archival
• NERC funded
• Started at the Tennessee Valley Authority (TVA)
• Now in use by many government-controlled power
  companies around the world
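
The gather and time-sort step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not openPDC code: the `Measurement` record, the source names, the integer frame numbers used as timestamps, and the `concentrate` function are all hypothetical, and each input stream is assumed to arrive already ordered by time.

```python
import heapq
from collections import namedtuple
from itertools import groupby

# Hypothetical record: timestamp (frame number at 30 frames/s), source, value.
Measurement = namedtuple("Measurement", ["timestamp", "source", "value"])

def concentrate(*streams):
    """Merge per-source streams (each already in time order) into one
    time-sorted stream, then group measurements that share a timestamp."""
    merged = heapq.merge(*streams, key=lambda m: m.timestamp)
    for ts, group in groupby(merged, key=lambda m: m.timestamp):
        yield ts, list(group)

# Two made-up sources; pmu_b is missing frame 1.
pmu_a = [Measurement(0, "pmu_a", 59.98),
         Measurement(1, "pmu_a", 59.99),
         Measurement(2, "pmu_a", 60.00)]
pmu_b = [Measurement(0, "pmu_b", 60.01),
         Measurement(2, "pmu_b", 60.02)]

frames = list(concentrate(pmu_a, pmu_b))
for ts, frame in frames:
    # Each frame could now be handed to a user-defined action or an archiver.
    print(ts, [(m.source, m.value) for m in frame])
```

Merging lazily with `heapq.merge` keeps memory use flat regardless of stream length, which matters when the inputs are continuous feeds rather than short lists.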


openPDC Topology




openPDC: Why?

Northeast Blackout of 2003
• Significant failure of US power grid in 2003 due to cascading
  effects
• SCADA provided, at best, a limited view of what happened
• NERC mandated that companies collect high resolution data
  and store for later analysis
• The power grid in the US is aging rapidly, and the cost of the
  needed overhaul is significant




How “Big Data” Challenged the openPDC Project

 “We Need More Power, Scotty”



 • Data was sampled 30 times a second
 • Number of sensors (Phasor Measurement Units / PMUs) was
   increasing rapidly (was 120, heading towards 1000 over the next 2
   years); currently taking in 4.2 billion samples per day
 • Cost of SAN storage became excessive
 • Little analysis possible on SAN due to poor read rates on large
   amounts (TBs) of data
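
The figures on this slide can be related with some quick arithmetic. The sketch below uses only the numbers quoted above (30 samples per second, 120 PMUs, 4.2 billion samples per day); the "implied streams" figure and the projection to 1000 PMUs are inferences that assume every stream is sampled at 30 Hz and that volume scales linearly with PMU count. They are not numbers stated in the deck.

```python
SAMPLES_PER_SECOND = 30          # from the slide
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
CURRENT_PMUS = 120               # from the slide
TARGET_PMUS = 1000               # from the slide
SAMPLES_PER_DAY_TOTAL = 4.2e9    # from the slide

# One measured value sampled at 30 Hz for a full day.
samples_per_stream_per_day = SAMPLES_PER_SECOND * SECONDS_PER_DAY

# Inference: how many 30 Hz streams add up to 4.2 billion samples per day.
implied_streams = SAMPLES_PER_DAY_TOTAL / samples_per_stream_per_day
streams_per_pmu = implied_streams / CURRENT_PMUS

# Assumption: daily volume grows linearly with the number of PMUs.
projected_samples_per_day = SAMPLES_PER_DAY_TOTAL * TARGET_PMUS / CURRENT_PMUS

print(samples_per_stream_per_day)          # 2592000
print(round(implied_streams))              # 1620
print(round(streams_per_pmu, 1))           # 13.5
print(f"{projected_samples_per_day:.2e}")  # 3.50e+10
```

Under these assumptions each PMU would be reporting on the order of a dozen values per sample, and the 1000-PMU target would mean tens of billions of samples per day.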

Major Themes for Storage and Processing Needs

•   Scale Out, not Up
•   Linear scalability in cost and processing power
•   Robust in the face of hardware failure
•   No vendor lock in




Storage Needs: The Data Deluge

 • At 1000 PMU sensors we were looking at needing to store 500TB of data
 • The Data Deluge
     • “Eighteen months ago, Li & Fung, a firm that manages supply chains for retailers,
       saw 100 gigabytes of information flow through its network each day. Now the
       amount has increased tenfold.”
     •   http://www.economist.com/opinion/displaystory.cfm?story_id=15579717

 • Internet of Things
     • HP's Peter Hartwell: "one trillion nanoscale sensors and actuators will need the
       equivalent of 1000 internets: the next huge demand for computing!"

Processing Needs: Needle in a Haystack

• The “haystack” problem in PMU data typically involved
  scanning through TBs of data to find the one particular
  event we were interested in
• RDBMSs simply do not work with high-resolution
  timeseries data at this scale
• Need for Ad-Hoc processing on data to explore network
  effects and look at how events cascade across the grid




The Solution: Hadoop

• A scalable fault-tolerant distributed system for data storage
  and processing (open source under the Apache license)

• Two primary components
   • Hadoop Distributed File System (HDFS): self-healing high-bandwidth
     clustered storage
   • MapReduce: fault-tolerant distributed processing

• Key benefits
   •   Flexible -> store data without a schema and add it later as needed
   •   Affordable -> cost / TB at a fraction of traditional options
   •   Broadly adopted -> a large and active ecosystem
   •   Proven at scale -> dozens of petabyte-plus implementations in
       production today
HDFS As Cheap and Scalable Storage

• HDFS is robust in the face of machine failure
• Cost was a major factor – we could grow our cluster linearly
  as needed by just adding new machines
• Ran on commodity hardware – we didn’t have to buy
  expensive (and relatively slow), proprietary SAN setups




MapReduce Provides a Powerful Parallel Processing
Framework
• We found MapReduce to be the perfect framework to
  quickly process large amounts of PMU (timeseries) data
• Created a machine learning algorithm in MapReduce
  that detected “unbounded oscillations” in grid data
• A MapReduce-based oscillation scan of a few TBs takes
  minutes
• A scan of comparable data from a SAN would take days
  or weeks
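
The deck does not show the oscillation detector itself, so the following is only a minimal sketch of how a map / shuffle / reduce pass over timeseries data is shaped. It runs in a single Python process rather than on Hadoop, and it swaps in a simple rule (peak-to-peak amplitude that grows in every successive window) in place of the machine learning classifier mentioned above. The window size, function names, and synthetic data are all assumptions made for illustration.

```python
from collections import defaultdict

WINDOW = 4  # samples per window; an arbitrary illustrative choice

def map_phase(records):
    """Map: emit ((sensor, window_index), value) for each (sensor, t, value)."""
    for sensor, t, value in records:
        yield (sensor, t // WINDOW), value

def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: peak-to-peak amplitude of each (sensor, window) group."""
    return {key: max(vals) - min(vals) for key, vals in groups.items()}

def growing_sensors(amplitudes):
    """Flag sensors whose amplitude strictly increases window over window."""
    per_sensor = defaultdict(list)
    for (sensor, _window), amp in sorted(amplitudes.items()):
        per_sensor[sensor].append(amp)
    return sorted(
        s for s, amps in per_sensor.items()
        if len(amps) > 1 and all(a < b for a, b in zip(amps, amps[1:]))
    )

# Synthetic data: "pmu_1" swings wider each window, "pmu_2" stays bounded.
records = []
for t in range(12):
    sign = 1 if t % 2 == 0 else -1
    records.append(("pmu_1", t, sign * (1 + t // WINDOW)))
    records.append(("pmu_2", t, sign * 1))

amplitudes = reduce_phase(shuffle(map_phase(records)))
flagged = growing_sensors(amplitudes)
print(flagged)
```

Because each (sensor, window) group is reduced independently, the same logic can be spread across many machines, which is what makes a scan over TBs of data parallelizable.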



What is common across Hadoop-able problems?

 Nature of the data
 • Complex data
 • Multiple data sources
 • Lots of it

 Nature of the analysis
 • Batch processing
 • Parallel execution
 • Spread data over a cluster of servers
   and take the computation to the data

What Analysis is Possible With Hadoop?


 • Text mining
 • Index building
 • Graph creation and analysis
 • Pattern recognition
 • Collaborative filtering
 • Prediction models
 • Sentiment analysis
 • Risk assessment




Benefits of Analyzing With Hadoop

 • Previously impossible/impractical to do this analysis

 • Analysis conducted at lower cost

 • Analysis conducted in less time

 • Greater flexibility




The Storm of the Data Deluge is Brewing

• Challenges of the openPDC project were just the first
  wave
• Storage requirements are accelerating
• Disk speeds are relatively constant
• Signs of the data deluge are already visible: GE is now using
  open-sourced Hadoop-based timeseries classifiers developed in
  the openPDC project




Coming Power Grid Stressors

• Larger fluctuations in power demands
   • Ex: Millions of new electric cars all charging in the evenings
• An aging power grid that requires more capital infusion
  than most companies have allocated for these purposes
   • Grid infrastructure is older than most realize
   • Maintenance policies generally only look at age of equipment




The Power Grid Domain is Slow to Evolve

• Power companies are slow to adopt technology
   • They generally have poor maps of their overall infrastructure
• Coming pressures are going to force power companies
  to analyze TBs and PBs of data
• Ad-Hoc analysis will be needed to explore the complex
  relationships in this data




Broader Emerging Smartgrid Themes

• Simply adding lots of sensors is only a very small part of
  the solution
• Collection, storage, and processing are in themselves all
  difficult problems
• In order to build a more effective Smartgrid, platforms
  are needed that handle these things well
• Smartgrid sensor collection is a subset of the larger
  undercurrent of emerging massive sensor based
  systems


Even Broader Theme: Internet of Things

• We’re collecting sensor data everywhere, not just the
  Smartgrid
• Many of the techniques described above can be easily
  applied with Hadoop
   • The open source generalized collector system is called “Flume”
• Examples:
   • Weather sensors
   • Mesh networks – battlefield UAVs
   • Cell Phones – Google Android as a collector


Next Generation Sensor Platform: Hadoop and
Related Projects




The Companies That Provide Real Results for
Sensor Platforms Will Win
• Much of today’s Smartgrid talk is just hype
• Few “solutions” actually fix anything; they only put sensors
  on things
• Analysis is where the true value lies
   • But you need a complete platform to be in position to analyze
     the data




Harnessing Hadoop Has Its Challenges
•   Ease of use – command line interface only; data import and access requires development skills
•   Complexity – more than 12 different components, with different versions, dependencies and patch requirements
•   Manageability – Hadoop is challenging to configure, upgrade, monitor and administer
•   Interoperability – limited support for popular databases and analytical tools

Cloudera’s Distribution for Hadoop, version 3
The industry’s leading Hadoop distribution


[Diagram: CDH3 components – Hue, Hue SDK, Oozie, Pig, Hive, Flume, Sqoop, HBase, ZooKeeper]



•   Open source – 100% Apache licensed
•   Simplified – Component versions & dependencies managed for you
•   Integrated – All components & functions interoperate through standard APIs
•   Reliable – Patched with fixes from future releases to improve stability
•   Supported – Employs project founders and committers for >70% of components
Who is Cloudera?

• Enterprise software & services company providing the industry’s
  leading Hadoop-based data management platform
   • Founding team came from large Web companies



• Products: Cloudera Enterprise & Cloudera’s Distribution for Hadoop
   • All necessary packages, matched, tested and supported
   • Tools to support production use of Hadoop
   • The leading distribution for the enterprise


• Contributors and committers
   • Fixing, patching and adding features

Hear More Examples @ Hadoop World 2010
http://www.cloudera.com/company/press-center/hadoop-world-nyc/


 • 2nd annual event focused on practical
   applications of Hadoop

 • Date: October 12th 2010

 • Location: Hilton New York

 • Keynote from Tim O’Reilly – founder of
   O’Reilly Media

 • Pre and post conference training
   available for Hadoop and related projects

 • 36 business and technical focused sessions


Questions?





More Related Content

What's hot

Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSWJason Hubbard
 
Wrangling Customer Usage Data with Hadoop
Wrangling Customer Usage Data with HadoopWrangling Customer Usage Data with Hadoop
Wrangling Customer Usage Data with HadoopDataWorks Summit
 
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera
Cloudera, Inc.
 
快速数据快速分析引擎-Kudu
快速数据快速分析引擎-Kudu快速数据快速分析引擎-Kudu
快速数据快速分析引擎-KuduJianwei Li
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...Cloudera, Inc.
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduCloudera, Inc.
 
The Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data CentricThe Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data CentricDataWorks Summit
 
Moving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache KuduMoving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache KuduCloudera, Inc.
 
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldPart 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldCloudera, Inc.
 
Realizing the Promise of Big Data with Hadoop - Cloudera Summer Webinar Serie...
Realizing the Promise of Big Data with Hadoop - Cloudera Summer Webinar Serie...Realizing the Promise of Big Data with Hadoop - Cloudera Summer Webinar Serie...
Realizing the Promise of Big Data with Hadoop - Cloudera Summer Webinar Serie...Cloudera, Inc.
 
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera, Inc.
 
Realtime Analytics in Hadoop
Realtime Analytics in HadoopRealtime Analytics in Hadoop
Realtime Analytics in HadoopRommel Garcia
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopPOSSCON
 
EMC Big Data Solutions Overview
EMC Big Data Solutions OverviewEMC Big Data Solutions Overview
EMC Big Data Solutions Overviewwalshe1
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndCloudera, Inc.
 
A Community Approach to Fighting Cyber Threats
A Community Approach to Fighting Cyber ThreatsA Community Approach to Fighting Cyber Threats
A Community Approach to Fighting Cyber ThreatsCloudera, Inc.
 
Multi-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTMulti-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTCloudera, Inc.
 

What's hot (20)

Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
 
Wrangling Customer Usage Data with Hadoop
Wrangling Customer Usage Data with HadoopWrangling Customer Usage Data with Hadoop
Wrangling Customer Usage Data with Hadoop
 
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera

 
快速数据快速分析引擎-Kudu
快速数据快速分析引擎-Kudu快速数据快速分析引擎-Kudu
快速数据快速分析引擎-Kudu
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
The Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data CentricThe Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data Centric
 
Moving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache KuduMoving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache Kudu
 
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldPart 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
 
Realizing the Promise of Big Data with Hadoop - Cloudera Summer Webinar Serie...
Realizing the Promise of Big Data with Hadoop - Cloudera Summer Webinar Serie...Realizing the Promise of Big Data with Hadoop - Cloudera Summer Webinar Serie...
Realizing the Promise of Big Data with Hadoop - Cloudera Summer Webinar Serie...
 
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made Easy
 
Realtime Analytics in Hadoop
Realtime Analytics in HadoopRealtime Analytics in Hadoop
Realtime Analytics in Hadoop
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in Hadoop
 
Facial recognition
Facial recognitionFacial recognition
Facial recognition
 
A Mayo Clinic Big Data Implementation
A Mayo Clinic Big Data ImplementationA Mayo Clinic Big Data Implementation
A Mayo Clinic Big Data Implementation
 
EMC Big Data Solutions Overview
EMC Big Data Solutions OverviewEMC Big Data Solutions Overview
EMC Big Data Solutions Overview
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to End
 
A Community Approach to Fighting Cyber Threats
A Community Approach to Fighting Cyber ThreatsA Community Approach to Fighting Cyber Threats
A Community Approach to Fighting Cyber Threats
 
Multi-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTMulti-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BT
 

Similar to Hadoop As The Platform For The Smartgrid At TVA

Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesCloudera, Inc.
 
20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinarCloudera, Inc.
 
10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems Webinar10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems WebinarCloudera, Inc.
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopJosh Patterson
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Cloudera, Inc.
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online TrainingLearntek1
 
Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5Cloudera, Inc.
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureRiccardo Romani
 
IoT Connected Brewery
IoT Connected BreweryIoT Connected Brewery
IoT Connected BreweryJason Hubbard
 
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Eric Baldeschwieler
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Altair Leveraging Disruptive Cloud Technologies
Altair Leveraging Disruptive Cloud TechnologiesAltair Leveraging Disruptive Cloud Technologies
Altair Leveraging Disruptive Cloud TechnologiesAltair
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAlluxio, Inc.
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
 
Alluxio Use Cases and Future Directions
Alluxio Use Cases and Future DirectionsAlluxio Use Cases and Future Directions
Alluxio Use Cases and Future DirectionsAlluxio, Inc.
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Stefan Lipp
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020Adam Doyle
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR Technologies
 

Similar to Hadoop As The Platform For The Smartgrid At TVA (20)

Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar
 
10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems Webinar10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems Webinar
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 
Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and Architecture
 
IoT Connected Brewery
IoT Connected BreweryIoT Connected Brewery
IoT Connected Brewery
 
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Altair Leveraging Disruptive Cloud Technologies
Altair Leveraging Disruptive Cloud TechnologiesAltair Leveraging Disruptive Cloud Technologies
Altair Leveraging Disruptive Cloud Technologies
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Alluxio Use Cases and Future Directions
Alluxio Use Cases and Future DirectionsAlluxio Use Cases and Future Directions
Alluxio Use Cases and Future Directions
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document Database
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Hadoop As The Platform For The Smartgrid At TVA

  • 1. Hadoop as the Platform for the Smartgrid at TVA August 26, 2010
  • 2. Topics • Introduction • Retrospective on the openPDC project • What Is Hadoop? • Current Smartgrid Obstacles • Cloudera Enterprise as The New Smartgrid Platform • Summary Copyright 2010 Cloudera Inc. All rights reserved 2
  • 3. Today’s speaker – Josh Patterson • josh@cloudera.com • Master’s Thesis: self-organizing mesh networks • Published in IAAI-09: TinyTermite: A Secure Routing Algorithm • Conceived, built, and led Hadoop integration for the openPDC project at TVA • Led a small team that designed classification techniques for timeseries and MapReduce • Open source work at http://openpdc.codeplex.com • Now: Solutions Architect at Cloudera
  • 4. What is the openPDC? • The openPDC is a complete set of applications for processing streaming time-series data in real time • Measured data is gathered with GPS time from multiple input sources, time-sorted, provided to user-defined actions, and dispersed to custom output destinations for archival • NERC funded • Started at the Tennessee Valley Authority (TVA) • Now in use by many government-controlled power companies around the world
  • 5. openPDC Topology
  • 6. openPDC: Why? Northeast Blackout of 2003 • Significant failure of the US power grid in 2003 due to cascading effects • SCADA provided, at best, a limited view of what happened • NERC mandated that companies collect high-resolution data and store it for later analysis • The US power grid is aging rapidly, and the cost of the needed overhaul is significant
  • 7. How “Big Data” Challenged the openPDC Project “We Need More Power, Scotty” • Data was sampled 30 times a second • The number of sensors (Phasor Measurement Units, or PMUs) was increasing rapidly (was 120, heading towards 1000 over the next 2 years; currently taking in 4.2 billion samples per day) • Cost of SAN storage became excessive • Little analysis was possible on the SAN due to poor read rates on large amounts (TBs) of data
  • 8. Major Themes for Storage and Processing Needs • Scale out, not up • Linear scalability in cost and processing power • Robust in the face of hardware failure • No vendor lock-in
  • 9. Storage Needs: The Data Deluge • At 1000 PMU sensors, we expected to need to store 500 TB of data • The Data Deluge • “Eighteen months ago, Li & Fung, a firm that manages supply chains for retailers, saw 100 gigabytes of information flow through its network each day. Now the amount has increased tenfold.” • http://www.economist.com/opinion/displaystory.cfm?story_id=15579717 • Internet of Things • HP's Peter Hartwell: "one trillion nanoscale sensors and actuators will need the equivalent of 1000 internets: the next huge demand for computing!"
  • 10. Processing Needs: Needle in a Haystack • Working with the “haystack” of PMU data typically involved scanning through TBs of information to find the one particular event we were interested in • RDBMSs simply do not work with high-resolution timeseries data • Need for ad hoc processing on data to explore network effects and look at how events cascade across the grid
  • 11. The Solution: Hadoop • A scalable fault-tolerant distributed system for data storage and processing (open source under the Apache license) • Two primary components • Hadoop Distributed File System (HDFS): self-healing high-bandwidth clustered storage • MapReduce: fault-tolerant distributed processing • Key value • Flexible -> store data without a schema and add it later as needed • Affordable -> cost / TB at a fraction of traditional options • Broadly adopted -> a large and active ecosystem • Proven at scale -> dozens of petabyte-plus implementations in production today
  • 12. HDFS As Cheap and Scalable Storage • HDFS is robust in the face of machine failure • A major factor was cost – we could linearly grow our cluster as needed by just adding new machines • Ran on commodity hardware – we didn’t have to buy expensive (and relatively slow) proprietary SAN setups
  • 13. MapReduce Provides a Powerful Parallel Processing Framework • We found MapReduce to be the perfect framework to quickly process large amounts of PMU (timeseries) data • Created a machine learning algorithm in MapReduce which detected “unbounded oscillations” in grid data • A MapReduce-based oscillation scan of a few TBs takes minutes • A scan of comparable data from a SAN would take days or weeks
  • 14. What is common across Hadoop-able problems? Nature of the data • Complex data • Multiple data sources • Lots of it Nature of the analysis • Batch processing • Parallel execution • Spread data over a cluster of servers and take the computation to the data
  • 15. What Analysis is Possible With Hadoop? • Text mining • Collaborative filtering • Index building • Prediction models • Graph creation and analysis • Sentiment analysis • Pattern recognition • Risk assessment
  • 16. Benefits of Analyzing With Hadoop • Previously impossible/impractical to do this analysis • Analysis conducted at lower cost • Analysis conducted in less time • Greater flexibility
  • 17. The Storm of the Data Deluge is Brewing • Challenges of the openPDC project were just the first wave • Storage requirements are accelerating • Disk speeds are relatively constant • Signs of the data deluge are already visible: GE is now using the open source, Hadoop-based timeseries classifiers developed in the openPDC project
  • 18. Coming Power Grid Stressors • Larger fluctuations in power demands • Example: millions of new electric cars all charging in the evenings • An aging power grid that requires more capital infusion than most companies have allocated for these purposes • Grid infrastructure is older than most realize • Maintenance policies generally only look at age of equipment
  • 19. The Power Grid Domain is Slow to Evolve • Power companies are slow to adopt technology • They generally have poor maps of their overall infrastructure • Coming pressures will force power companies to analyze TBs and PBs of data • Ad hoc analysis will be needed to explore the complex relationships in this data
  • 20. Broader Emerging Smartgrid Themes • Simply adding lots of sensors is only a very small part of the solution • Collection, storage, and processing are in themselves all difficult problems • In order to build a more effective Smartgrid, platforms are needed that handle these things well • Smartgrid sensor collection is a subset of the larger undercurrent of emerging massive sensor-based systems
  • 21. Even Broader Theme: Internet of Things • We’re collecting sensor data everywhere, not just the Smartgrid • Many of the techniques described above can be easily done with Hadoop • An open source, generalized collector system called “Flume” is available • Examples: • Weather sensors • Mesh networks – battlefield UAVs • Cell Phones – Google Android as a collector
  • 22. Next Generation Sensor Platform: Hadoop and Related Projects
  • 23. The Companies That Provide Real Results for Sensor Platforms Will Win • Much of today’s Smartgrid talk is just hype • Few “solutions” actually fix anything; most only put sensors on things • Analysis is where the true value lies • But you need a complete platform to be in position to analyze the data
  • 24. Harnessing Hadoop Has Its Challenges • Ease of use – command line interface only; data import and access requires development skills • Complexity – more than 12 different components, different versions, dependencies and patch requirements • Manageability – Hadoop is challenging to configure, upgrade, monitor and administer • Interoperability – limited support for popular databases and analytical tools
  • 25. Cloudera’s Distribution for Hadoop, version 3 The industry’s leading Hadoop distribution • Components shown: Hue, Hue SDK, Oozie, Pig, Hive, Flume, Sqoop, HBase, Zookeeper • Open source – 100% Apache licensed • Simplified – Component versions & dependencies managed for you • Integrated – All components & functions interoperate through standard APIs • Reliable – Patched with fixes from future releases to improve stability • Supported – Employs project founders and committers for >70% of components
  • 26. Who is Cloudera? • Enterprise software & services company providing the industry’s leading Hadoop-based data management platform • Founding team came from large Web companies • Products: Cloudera Enterprise & Cloudera’s Distribution for Hadoop • All necessary packages, matched, tested and supported • Tools to support production use of Hadoop • The leading distribution for the enterprise • Contributors and committers • Fixing, patching and adding features
  • 27. Hear More Examples @ Hadoop World 2010 http://www.cloudera.com/company/press-center/hadoop-world-nyc/ • 2nd annual event focused on practical applications of Hadoop • Date: October 12th 2010 • Location: Hilton New York Confirmed speakers from • Keynote from Tim O’Reilly – founder O’Reilly Media • Pre- and post-conference training available for Hadoop and related projects • 36 business- and technical-focused sessions
  • 28. Questions?
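
The sampling figures on slide 7 can be sanity-checked with a little arithmetic. The Python sketch below takes the stated rate of 30 samples per second and the stated 4.2 billion samples per day, and works backward to the number of sampled signals those two figures imply. The per-PMU number is only an inference: the slides do not say how many signals each PMU reports, nor whether the 4.2 billion figure corresponds to exactly 120 PMUs, so `signals_per_pmu` is a derived estimate rather than a number from the talk.

```python
# Figures quoted on slide 7 of the deck.
SAMPLES_PER_SECOND = 30
SAMPLES_PER_DAY = 4.2e9
PMU_COUNT = 120  # assumption: the daily total refers to roughly this many PMUs

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# One signal sampled at 30 Hz produces this many samples in a day.
samples_per_signal_per_day = SAMPLES_PER_SECOND * SECONDS_PER_DAY

# Number of 30 Hz signals needed to reach the quoted daily total.
implied_signals = SAMPLES_PER_DAY / samples_per_signal_per_day

# Derived estimate only; the slides do not state signals per PMU.
signals_per_pmu = implied_signals / PMU_COUNT

print(f"samples per signal per day: {samples_per_signal_per_day:,}")
print(f"implied signals at 30 Hz:   {implied_signals:,.0f}")
print(f"implied signals per PMU:    {signals_per_pmu:.1f}")
```

Under these assumptions, the quoted daily volume corresponds to well over a thousand individually sampled signals, which is consistent with each PMU reporting several measurements rather than one.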
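
Slide 13 mentions a MapReduce scan for “unbounded oscillations” but does not show how such a job is structured. As a rough illustration of the map, shuffle, and reduce steps only, the Python sketch below groups samples by sensor and flags a sensor whose peak-to-peak swing grows from one window to the next. Everything in it is hypothetical: the record layout `(sensor_id, timestamp, value)`, the function names, the window size, and the growth heuristic are assumptions made for illustration, not the openPDC classifier, and the shuffle is simulated in-process rather than run on Hadoop.

```python
from collections import defaultdict


def map_sample(record):
    """Map step: key each sample by sensor so one reducer sees a whole series."""
    sensor_id, timestamp, value = record
    return sensor_id, (timestamp, value)


def reduce_sensor(sensor_id, samples, window=4):
    """Reduce step: flag the sensor if peak-to-peak swing grows in every window."""
    values = [v for _, v in sorted(samples)]  # restore time order
    swings = [
        max(values[i:i + window]) - min(values[i:i + window])
        for i in range(0, len(values) - window + 1, window)
    ]
    is_growing = len(swings) >= 2 and all(b > a for a, b in zip(swings, swings[1:]))
    return sensor_id, is_growing


def run_job(records):
    """Simulate shuffle/sort in-process: group mapped pairs by key, then reduce."""
    groups = defaultdict(list)
    for key, pair in map(map_sample, records):
        groups[key].append(pair)
    return dict(reduce_sensor(k, v) for k, v in groups.items())


# Hypothetical input: one steady signal, one whose swings keep widening.
steady = [("pmu-a", t, v) for t, v in enumerate([1, -1, 1, -1] * 3)]
widening = [
    ("pmu-b", t, v)
    for t, v in enumerate([1, -1, 1, -1, 2, -2, 2, -2, 4, -4, 4, -4])
]
result = run_job(steady + widening)
print(result)
```

Keying by sensor means each series can be checked independently, which is what lets this kind of scan spread across a cluster; a real job would also need to handle series too large for a single reducer.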