How Apache Hadoop is Revolutionizing
Business Intelligence and Data Analytics

Strata Conference, Sept 22nd 2011, New York, NY

Dr. Amr Awadallah, Founder, CTO, VP of Engineering
aaa@cloudera.com, twitter: @awadallah
Business Intelligence Before Adopting Apache Hadoop

Data flow (bottom to top):
•   Instrumentation
•   Collection
•   Storage Only Grid (original raw data), Mostly Append
•   ETL Compute Grid
•   RDBMS (processed data)
•   BI Reports + Interactive Apps

Problems:
•   Can’t Explore Original High Fidelity Raw Data
•   Moving Data To Compute Doesn’t Scale
•   Archiving = Premature Data Death

                    Copyright © 2011, Cloudera, Inc. All Rights Reserved.             2
Business Intelligence After Adopting Apache Hadoop
Data flow (bottom to top):
•   Instrumentation
•   Collection
•   Hadoop: Storage + Compute Grid (Mostly Append, Keep Data Alive For Ever)
•   ETL and Aggregations, feeding the RDBMS and then BI Reports + Interactive Apps
•   Complex Data Processing, feeding Data Exploration & Advanced Analytics

So What is Apache Hadoop?
• A scalable fault-tolerant distributed system for data storage and
  processing (open source under the Apache license)

• Core Hadoop has two main components:
    • Hadoop Distributed File System: self-healing high-bandwidth clustered storage
    • MapReduce: fault-tolerant distributed processing


• Key business values:
    •   Flexible – Store any data, Run any analysis (Mine First, Govern Later)
    •   Scalable – Start at 1TB/3-nodes then grow to petabytes/thousands of nodes
    •   Affordable – Cost per TB at a fraction of traditional options
    •   Open Source – No Lock-In, Rich Ecosystem, Large developer community
    •   Broadly adopted – A large and active ecosystem, Proven to run at scale
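
To make the MapReduce component above more concrete, here is a minimal in-process sketch of the programming model (map, group by key, reduce) using a word count. It is not Hadoop's API: the function names (map_phase, shuffle_phase, reduce_phase, run_job) are hypothetical, and everything runs in one Python process with no distribution or fault tolerance.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to each input record and collect (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the list of values for that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def count_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


def run_job(records):
    """Hypothetical driver: chain the three phases for a word count."""
    pairs = map_phase(records, word_mapper)
    return reduce_phase(shuffle_phase(pairs), count_reducer)


if __name__ == "__main__":
    print(run_job(["Hadoop stores data", "Hadoop processes data"]))
```

In a real cluster the map and reduce functions would run on many nodes next to the data; only the mapper and reducer logic would carry over from this sketch.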

The Main Benefit: Agility/Flexibility

Schema-on-Write (RDBMS):
•   Schema must be created before data is loaded
•   Explicit load operation has to take place that transforms data to database internal structure
•   New columns must be added explicitly before data for such columns can be loaded into the database
•   Read is Fast
•   Benefits: Standards/Governance

Schema-on-Read (Hadoop):
•   Data is simply copied to the file store, no special transformation is needed
•   A SerDe (Serializer/Deserializer) is applied during read time to extract the required columns
•   New data can start flowing anytime and will appear retroactively once the SerDe is updated to parse them
•   Load is Fast
•   Benefits: Flexibility/Agility
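
The schema-on-read idea can be sketched in a few lines, assuming raw events are kept as JSON text. This is an illustration only, not Hive's SerDe interface: make_serde, read_table, RAW_STORE and the field names are all hypothetical.

```python
import json

# Raw events are stored exactly as they arrived: no schema, no transformation.
# The second event already carries a field ("referrer") nobody planned for.
RAW_STORE = [
    '{"user": "u1", "url": "/home", "ms": 120}',
    '{"user": "u2", "url": "/cart", "ms": 340, "referrer": "ad-17"}',
]


def make_serde(columns):
    """Return a read-time parser that extracts only the requested columns."""
    def deserialize(line):
        record = json.loads(line)
        # Fields missing from a record come back as None.
        return tuple(record.get(col) for col in columns)
    return deserialize


def read_table(store, serde):
    """Apply the parser at read time; the stored data is never rewritten."""
    return [serde(line) for line in store]


# First version of the hypothetical SerDe knows two columns.
v1 = make_serde(["user", "url"])
# Updated version also parses "referrer"; old data shows it retroactively.
v2 = make_serde(["user", "url", "referrer"])

if __name__ == "__main__":
    print(read_table(RAW_STORE, v1))
    print(read_table(RAW_STORE, v2))
```

Note that nothing in RAW_STORE changes between the two reads; only the parser does, which is why the load is fast and the read pays the parsing cost.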

What is Complex Data Processing?
1. Java MapReduce: Gives the most flexibility and performance,
   but potentially long development cycle (the “assembly
   language” of Hadoop).
2. Streaming MapReduce (also Pipes): Allows you to develop in
   any programming language of your choice, but slightly lower
   performance and less flexibility.
3. Pig: A high-level language out of Yahoo, suitable for batch data
   flow workloads.
4. Hive: A SQL interpreter out of Facebook, also includes a meta-
   store mapping files to their schemas and associated SerDe.
5. Oozie: A PDL XML workflow server engine that enables creating
   a workflow of jobs composed of any of the above.
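
As a rough illustration of option 2 above, a streaming-style job exchanges plain text lines: the mapper emits key<TAB>value lines and the reducer receives them sorted by key. The sketch below assumes a hypothetical log format of "user url" per line and simulates the sort step locally; the names mapper, reducer and simulate are illustrative, and in a real streaming job each function would read standard input and write standard output.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'url<TAB>1' for every log line of the assumed form 'user url'."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield f"{fields[1]}\t1"


def reducer(sorted_lines):
    """Sum the counts per key; the input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


def simulate(lines):
    """Local stand-in for the framework: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    sample = ["u1 /home", "u2 /cart", "u3 /home"]
    for out in simulate(sample):
        print(out)
```

Because the contract is just lines of text, the same two functions could be written in any language, which is the flexibility the slide refers to.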

What This Means For You: Agility

Up Front Design                                                Just in Time




What This Means For You: Innovation

   Data Committee                                              Data Scientist




What This Means For You: Consolidation

        Silos                                                           Sharing




What This Means For You: Extract Value from Latent Data

  Archive to Tape                                         Keep Data Alive




What This Means For You: Ability to Grow Fluidly
Benefit #2: Scalability




What This Means For You: Data Beats Algorithm

  Smarter Algos                                            More Data




Where Does Hadoop Fit in the Enterprise Data Stack?
[Diagram: the enterprise data stack around Hadoop]
•   People: Data Architects, System Operators, Data Scientists, Analysts, Business Users, Customers
•   Tools: ETL Tools, Cloudera Mgmt Suite, Development Tools (IDEs), Business Intelligence Tools (BI, Analytics; Enterprise Reporting)
•   Connected systems: Enterprise Data Warehouse, Low-Latency Serving Systems, Web Application
•   Data sources: Logs, Files, Web Data, Relational Databases

Use The Right Tool For The Right Job

    Relational Databases:                             Hadoop:




Use when:                                              Use when:
•   Interactive OLAP Analytics (<1sec)                 •   Structured or Not (Agility)
•   Multistep ACID Transactions                        •   Scalability of Storage/Compute
•   100% SQL Compliance                                •   Complex Data Processing
Two Core Use Cases Common Across Many Industries

Advanced Analytics (Application)    Industry          Data Processing (Application)
Social Network Analysis             Web               Clickstream Sessionization
Content Optimization                Media             Clickstream Sessionization
Network Analytics                   Telco             Mediation
Loyalty & Promotions                Retail            Data Factory
Fraud Analysis                      Financial         Trade Reconciliation
Entity Analysis                     Federal           SIGINT
Sequencing Analysis                 Bioinformatics    Genome Mapping
Product Quality                     Manufacturing     Mfg Process Tracking



CDH: Cloudera’s Distribution Including Apache Hadoop
CDH components by function:
•   UI Framework: HUE
•   SDK: HUE SDK
•   Workflow: OOZIE
•   Scheduling: OOZIE
•   Metadata: HIVE
•   Languages / Compilers: PIG, HIVE
•   Data Integration: FLUME, SQOOP, ODBC
•   Fast Read/Write Access: HBASE
•   Coordination: ZOOKEEPER




•   Open Source – 100% Apache licensed, 100% Open Source, 100% Free.
•   Enterprise Ready – Predictable releases, Documentation, Hotfix Patches, Intensive QA
•   Integrated – All required component versions & dependencies are managed for you
•   Industry Standard – Existing RDBMS, ETL and BI systems work best with it
•   Many Form Factors – Public Cloud, Private Cloud, Ubuntu, RHEL, 32/64bit, etc

SCM Express: Simplifies Installation and Configuration

    Service & Configuration Manager
    (SCM) Express takes the complexity out of
    deploying and configuring CDH.

•   Provision a complete Hadoop stack in minutes
•   Centrally manage system services through a user-friendly interface
•   Manages services for up to 50 nodes
•   FREE to download

KEY FEATURES
1. Automated, wizard-based installation of the complete Hadoop stack
2. Central, real-time dashboard for configuration management
3. Ability to configure the cluster while it’s running
4. Incorporates comprehensive validation and error checking
5. Automates the expansion of services to new nodes when they come online
What is Cloudera Enterprise?

Cloudera Enterprise makes open source Apache Hadoop enterprise-easy
•   Simplify and Accelerate Hadoop Deployment
•   Reduce Adoption Costs and Risks
•   Lower the Cost of Administration
•   Increase the Transparency & Control of Hadoop
•   Leverage the Experience of Our Experts

CLOUDERA ENTERPRISE COMPONENTS
•   Cloudera Management Suite: Comprehensive Toolset for Hadoop Administration
•   Production-Level Support: Our Team of Experts On-Call to Help You Meet Your SLAs

3 of the top 5 telecommunications, mobile services, defense & intelligence,
banking, media and retail organizations depend on Cloudera Enterprise

•   EFFECTIVENESS: Ensuring Repeatable Value from Apache Hadoop Deployments
•   EFFICIENCY: Enabling Apache Hadoop to be Affordably Run in Production



Hadoop World 2011

    The largest gathering of Hadoop practitioners, developers,
    business executives, industry luminaries and innovative
    companies in the Hadoop ecosystem.

November 8-9, Sheraton New York Hotel & Towers, NYC

•    1400 attendees, 25+ sponsors
•    60 sessions across 5 tracks for:
      – Business Decision Makers
      – Enterprise Architects
      – IT Operators
      – Data Scientists
      – Developers
•    Cloudera Training and Certification (November 7, 10, 11)

Learn more and register at www.hadoopworld.com
$50 discount for Strata attendees



What I Would Like You To Remember:
• The Key Benefits of the Apache Hadoop Data Platform:
   • Agility/Flexibility (Enables Innovation/Exploration).
   • Complex Data Processing (Any Language, Any Problem).
   • Scalability of Storage/Compute (Freedom to Grow).
   • Economical Active Archive (Keep All Your Data Alive).

• Cloudera Enterprise helps you:
   •   Lower the Cost of Management and Administration.
   •   Simplify and Accelerate Hadoop Deployment.
   •   Increase the Transparency & Control of Hadoop.
   •   Get Firm SLAs on Issue Resolution.
Contact Information:



          Amr Awadallah
        aaa@cloudera.com
           650-644-3921
   http://twitter.com/awadallah




Appendix



Hadoop Timeline

•   2002: Doug Cutting & Mike Cafarella started working on Nutch
•   2003-2004: Google publishes GFS & MapReduce papers
•   2004-2005: Doug Cutting adds DFS & MapReduce support to Nutch
•   2006: Yahoo! hires Cutting, Hadoop spins out of Nutch
•   2007: NY Times converts 4TB of image archives over 100 EC2s
•   2008: Fastest sort of a TB, 3.5 mins over 910 nodes
•   2008: Facebook launches Hive: SQL Support for Hadoop
•   2008: Cloudera Founded
•   2009: Fastest sort of a TB, 62 secs over 1,460 nodes
•   2009: Sorted a PB in 16.25 hours over 3,658 nodes
•   2009: Doug Cutting joins Cloudera
•   2009: Hadoop Summit 2009, 750 attendees


Cloudera’s Track Record
• Customers: Multiple customers with >1,000 Hadoop nodes under management
• Supporting dozens of diverse production use cases, including revenue-critical ones
  with tight SLAs

• Community: years of demonstrated leadership in the Apache Hadoop ecosystem.
  Cloudera employees are:
    • The largest contributor to the Hadoop ecosystem in patches
    • Founders of 70% of the projects in the Apache Hadoop ecosystem including Apache
      Hadoop itself
    • The first to build & integrate what is now the reference Hadoop stack

• Industry: Multiple years of experience providing Hadoop solutions across industries:
    • 2 of the top 5 payments companies run Cloudera
    • 3 of the top 5 commercial banks run Cloudera
    • 2 of the top 4 online travel companies run Cloudera


Cloudera Enterprise Management Suite

Activity Monitor
•   It Helps You: Consolidate all user activities into a real-time view; Diagnose user performance; Track activity metrics
•   So You Can: Improve performance; Improve conformance to SLAs; Improve QOS
•   It’s Like: MySQL Enterprise Monitor; Quest Foglight for Oracle / SQL Server

Service & Configuration Manager
•   It Helps You: Manage system services; Automate changes; Validate settings; 1-click security
•   So You Can: Lower cost of administration; Improve uptime
•   It’s Like: Red Hat Satellite Server; Microsoft System Center; Oracle Enterprise Manager

Resource Manager
•   It Helps You: Report on the usage of scarce resources; Plan for capacity expansion
•   So You Can: Improve quality of service; Extend the life of the cluster
•   It’s Like: VMware vCenter

Authorization Manager
•   It Helps You: Centralize management of all users, groups and privileges; Manage permissions via delegated administration
•   So You Can: Lower the costs of administration; Improve compliance
•   It’s Like: Teradata security administration




CDH Integrates with Existing IT Infrastructure

   BI/Analytics   ETL                   Databases                 Cloud/OS      Hardware





More Related Content

What's hot

Data warehouse con azure synapse analytics
Data warehouse con azure synapse analyticsData warehouse con azure synapse analytics
Data warehouse con azure synapse analyticsEduardo Castro
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
 
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-BaltagiModern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-BaltagiSlim Baltagi
 
IBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveIBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveTorsten Steinbach
 
Quick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsQuick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsRavindra kumar
 
Piranha vs. mammoth predator appliances that chew up big data
Piranha vs. mammoth   predator appliances that chew up big dataPiranha vs. mammoth   predator appliances that chew up big data
Piranha vs. mammoth predator appliances that chew up big dataJack (Yaakov) Bezalel
 
Modernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data PipelinesModernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data PipelinesCarole Gunst
 
McGraw-Hill Optimizes Analytics Workloads with Databricks
 McGraw-Hill Optimizes Analytics Workloads with Databricks McGraw-Hill Optimizes Analytics Workloads with Databricks
McGraw-Hill Optimizes Analytics Workloads with DatabricksAmazon Web Services
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
Chug building a data lake in azure with spark and databricks
Chug   building a data lake in azure with spark and databricksChug   building a data lake in azure with spark and databricks
Chug building a data lake in azure with spark and databricksBrandon Berlinrut
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudJames Serra
 
How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?Slim Baltagi
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsJames Serra
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Scaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMsScaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMsMatei Zaharia
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationDatabricks
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringDurga Gadiraju
 

What's hot (20)

Data warehouse con azure synapse analytics
Data warehouse con azure synapse analyticsData warehouse con azure synapse analytics
Data warehouse con azure synapse analytics
 
Data lake
Data lakeData lake
Data lake
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-BaltagiModern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
 
IBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveIBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep Dive
 
Quick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsQuick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skills
 
Piranha vs. mammoth predator appliances that chew up big data
Piranha vs. mammoth   predator appliances that chew up big dataPiranha vs. mammoth   predator appliances that chew up big data
Piranha vs. mammoth predator appliances that chew up big data
 
Modernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data PipelinesModernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data Pipelines
 
McGraw-Hill Optimizes Analytics Workloads with Databricks
 McGraw-Hill Optimizes Analytics Workloads with Databricks McGraw-Hill Optimizes Analytics Workloads with Databricks
McGraw-Hill Optimizes Analytics Workloads with Databricks
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Chug building a data lake in azure with spark and databricks
Chug   building a data lake in azure with spark and databricksChug   building a data lake in azure with spark and databricks
Chug building a data lake in azure with spark and databricks
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Scaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMsScaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMs
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
2022 02 Integration Bootcamp
2022 02 Integration Bootcamp2022 02 Integration Bootcamp
2022 02 Integration Bootcamp
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 

Viewers also liked

Cloudera/Stanford EE203 (Entrepreneurial Engineer)
Cloudera/Stanford EE203 (Entrepreneurial Engineer)Cloudera/Stanford EE203 (Entrepreneurial Engineer)
Cloudera/Stanford EE203 (Entrepreneurial Engineer)Amr Awadallah
 
Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteAmr Awadallah
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookAmr Awadallah
 
Integrating Hadoop in Your Existing DW and BI Environment
Integrating Hadoop in Your Existing DW and BI EnvironmentIntegrating Hadoop in Your Existing DW and BI Environment
Integrating Hadoop in Your Existing DW and BI EnvironmentCloudera, Inc.
 
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Cloudera, Inc.
 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraSomnath Mazumdar
 
ElasticES-Hadoop: Bridging the world of Hadoop and Elasticsearch
ElasticES-Hadoop: Bridging the world of Hadoop and ElasticsearchElasticES-Hadoop: Bridging the world of Hadoop and Elasticsearch
ElasticES-Hadoop: Bridging the world of Hadoop and ElasticsearchMapR Technologies
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics - Strata Conf - Sept 2011

  • 1. How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics
      Strata Conference, Sept 22nd 2011, New York, NY
      Dr. Amr Awadallah, Founder, CTO, VP of Engineering
      aaa@cloudera.com, twitter: @awadallah
  • 2. Business Intelligence Before Adopting Apache Hadoop
      [Diagram] Layers, top to bottom:
      • BI Reports + Interactive Apps
      • RDBMS (processed data)
      • ETL Compute Grid
      • Storage Only Grid (original raw data), mostly append
      • Collection
      • Instrumentation
      Problems called out on the diagram:
      • Can't explore the original high-fidelity raw data
      • Moving data to compute doesn't scale
      • Archiving = premature data death
      Copyright © 2011, Cloudera, Inc. All Rights Reserved.
  • 3. Business Intelligence After Adopting Apache Hadoop
      [Diagram] Layers, top to bottom:
      • BI Reports + Interactive Apps
      • RDBMS
      • Hadoop: Storage + Compute Grid, mostly append
      • Collection
      • Instrumentation
      Capabilities called out on the diagram:
      • Data exploration and advanced analytics
      • ETL and aggregations
      • Complex data processing
      • Keep data alive forever
  • 4. So What is Apache Hadoop?
      • A scalable, fault-tolerant distributed system for data storage and processing (open source under the Apache license)
      • Core Hadoop has two main components:
          • Hadoop Distributed File System: self-healing, high-bandwidth clustered storage
          • MapReduce: fault-tolerant distributed processing
      • Key business values:
          • Flexible: store any data, run any analysis (Mine First, Govern Later)
          • Scalable: start at 1TB/3 nodes, then grow to petabytes/thousands of nodes
          • Affordable: cost per TB at a fraction of traditional options
          • Open Source: no lock-in, rich ecosystem, large developer community
          • Broadly adopted: a large and active ecosystem, proven to run at scale
  • 5. The Main Benefit: Agility/Flexibility
      Schema-on-Write (RDBMS):
      • Schema must be created before data is loaded
      • An explicit load operation has to take place, which transforms the data into the database's internal structure
      • New columns must be added explicitly before data for such columns can be loaded into the database
      • Benefits: Read is Fast; Standards/Governance
      Schema-on-Read (Hadoop):
      • Data is simply copied to the file store; no special transformation is needed
      • A SerDe (Serializer/Deserializer) is applied at read time to extract the required columns
      • New data can start flowing anytime and will appear retroactively once the SerDe is updated to parse it
      • Benefits: Load is Fast; Flexibility/Agility
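      The schema-on-read idea on this slide can be illustrated with a small sketch. This is not Hive's SerDe API; the function `read_with_schema`, the JSON record layout, and the sample data are all hypothetical, chosen only to show raw lines being parsed and projected at read time.

```python
import json

# Raw events are stored exactly as they arrived (hypothetical sample data).
RAW_LINES = [
    '{"user": "u1", "url": "/home", "ms": 120}',
    '{"user": "u2", "url": "/cart", "ms": 340, "referrer": "ad-17"}',
    '{"user": "u1", "url": "/checkout", "ms": 95, "referrer": "email"}',
]


def read_with_schema(lines, columns):
    """Apply a schema at read time: parse each raw line and project columns.

    Fields absent from a record come back as None instead of failing a load.
    """
    for line in lines:
        record = json.loads(line)
        yield tuple(record.get(col) for col in columns)


# Version 1 of the reader knows only two columns.
v1_rows = list(read_with_schema(RAW_LINES, ["user", "url"]))

# Version 2 adds "referrer". Nothing is reloaded, and values already present
# in the stored raw lines show up retroactively.
v2_rows = list(read_with_schema(RAW_LINES, ["user", "url", "referrer"]))

print(v1_rows)
print(v2_rows)
```

      The raw lines never change between the two reads; only the reader's column list does, which is the property the slide credits for fast loads and flexibility.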
  • 6. What is Complex Data Processing?
      1. Java MapReduce: Gives the most flexibility and performance, but potentially a long development cycle (the "assembly language" of Hadoop).
      2. Streaming MapReduce (also Pipes): Allows you to develop in any programming language of your choice, but with slightly lower performance and less flexibility.
      3. Pig: A high-level language out of Yahoo, suitable for batch data flow workloads.
      4. Hive: A SQL interpreter out of Facebook; it also includes a metastore mapping files to their schemas and associated SerDe.
      5. Oozie: A PDL XML workflow server engine that enables creating a workflow of jobs composed of any of the above.
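      As a rough illustration of item 2, a Streaming-style job exchanges tab-separated key/value lines: the mapper emits them, the framework sorts them by key, and the reducer consumes the sorted stream. The sketch below is a word count under those assumptions. The names `mapper`, `reducer`, and `run_local` are hypothetical, and `run_local` only imitates the sort step in memory so the example runs without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum counts per key; the input must already be sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Stand-in for the framework: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))


result = run_local(["Hadoop stores data", "Hadoop processes data"])
print(result)
```

      In a real streaming job the mapper and reducer would typically be separate scripts reading `sys.stdin` and writing to standard output; keeping them as generator functions here makes them easy to test locally.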
  • 7. What This Means For You: Agility (Up Front Design vs. Just in Time)
  • 8. What This Means For You: Innovation (Data Committee vs. Data Scientist)
  • 9. What This Means For You: Consolidation (Silos vs. Sharing)
  • 10. What This Means For You: Extract Value from Latent Data (Archive to Tape vs. Keep Data Alive)
  • 11. What This Means For You: Ability to Grow Fluidly (Benefit #2: Scalability)
  • 12. What This Means For You: Data Beats Algorithm (Smarter Algos vs. More Data)
  • 13. Where Does Hadoop Fit in the Enterprise Data Stack?
      [Diagram] Labels recoverable from the figure:
      • Users: Data Scientists, Analysts, Business Users, System Operators, Data Architects, Customers
      • Tools: Enterprise IDEs (Development Tools), BI and Analytics (Business Intelligence Tools), Enterprise Reporting, Cloudera Mgmt Suite, ETL Tools
      • Systems: Enterprise Data Warehouse, Low-Latency Serving Systems, Web Application
      • Data sources: Logs, Files, Web Data, Relational Databases
  • 14. Use The Right Tool For The Right Job
      Relational Databases. Use when:
      • Interactive OLAP Analytics (<1sec)
      • Multistep ACID Transactions
      • 100% SQL Compliance
      Hadoop. Use when:
      • Structured or Not (Agility)
      • Scalability of Storage/Compute
      • Complex Data Processing
  • 15. Two Core Use Cases Common Across Many Industries
      Each industry is listed with its ADVANCED ANALYTICS use case, then its DATA PROCESSING use case:
      • Web: Social Network Analysis; Clickstream Sessionization
      • Media: Content Optimization; Clickstream Sessionization
      • Telco: Network Analytics; Mediation
      • Retail: Loyalty & Promotions; Data Factory
      • Financial: Fraud Analysis; Trade Reconciliation
      • Federal: Entity Analysis; SIGINT
      • Bioinformatics: Sequencing Analysis; Genome Mapping
      • Manufacturing: Product Quality; Mfg Process Tracking
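      Clickstream sessionization, named above as a data processing use case, means splitting each user's click events into sessions separated by periods of inactivity. The sketch below is a single-machine illustration only. The 30-minute gap, the `(user, timestamp)` event shape, and the function name `sessionize` are assumptions, not details from the slides.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold, not from the slides


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (user, timestamp) click events into per-user sessions.

    Returns {user: [[t1, t2, ...], [t7, ...], ...]} with timestamps in seconds.
    """
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - user_sessions[-1][-1] > gap:
                user_sessions.append([ts])  # gap exceeded: start a new session
            else:
                user_sessions[-1].append(ts)
        sessions[user] = user_sessions
    return sessions


clicks = [("u1", 0), ("u2", 50), ("u1", 600), ("u1", 4000), ("u2", 100)]
result = sessionize(clicks)
print(result)
```

      Grouping by user and then scanning in time order has the same shape as a MapReduce job keyed on the user ID, with the per-user scan as the reduce step, which is one reason this workload is commonly cited as a fit for Hadoop.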
  • 16. CDH: Cloudera's Distribution Including Apache Hadoop
      [Diagram] Components by role:
      • UI Framework: HUE
      • SDK: HUE SDK
      • Workflow: OOZIE
      • Scheduling: OOZIE
      • Metadata: HIVE
      • Languages / Compilers: PIG, HIVE
      • Data Integration: FLUME, SQOOP, ODBC
      • Fast Read/Write Access: HBASE
      • Coordination: ZOOKEEPER
      Properties:
      • Open Source: 100% Apache licensed, 100% Open Source, 100% Free.
      • Enterprise Ready: Predictable releases, Documentation, Hotfix Patches, Intensive QA
      • Integrated: All required component versions & dependencies are managed for you
      • Industry Standard: Existing RDBMS, ETL and BI systems work best with it
      • Many Form Factors: Public Cloud, Private Cloud, Ubuntu, RHEL, 32/64bit, etc.
  • 17. SCM Express: Simplifies Installation and Configuration
      Service & Configuration Manager (SCM) Express takes the complexity out of deploying and configuring CDH.
      • Provision a complete Hadoop stack in minutes
      • Centrally manage system services through a user-friendly interface
      • Manages services for up to 50 nodes
      • FREE to download
      KEY FEATURES
      1. Automated, wizard-based installation of the complete Hadoop stack
      2. Central, real-time dashboard for cluster configuration and management
      3. Ability to configure the cluster while it's running
      4. Incorporates comprehensive validation and error checking
      5. Automates the expansion of services to new nodes when they come online
  • 18. What is Cloudera Enterprise?
      Cloudera Enterprise makes open source Apache Hadoop enterprise-easy:
      • Simplify and Accelerate Hadoop Deployment
      • Reduce Adoption Costs and Risks
      • Lower the Cost of Administration
      • Increase the Transparency & Control of Hadoop
      • Leverage the Experience of Our Experts
      CLOUDERA ENTERPRISE COMPONENTS
      • Cloudera Management Suite: Comprehensive Toolset for Hadoop Administration
      • Production-Level Support: Our Team of Experts On-Call to Help You Meet Your SLAs
      3 of the top 5 telecommunications, mobile services, defense & intelligence, banking, media and retail organizations depend on Cloudera Enterprise.
      • EFFECTIVENESS: Ensuring Repeatable Value from Apache Hadoop Deployments
      • EFFICIENCY: Enabling Apache Hadoop to be Affordably Run in Production
  • 19. Hadoop World 2011
      The largest gathering of Hadoop practitioners, developers, business executives, industry luminaries and innovative companies in the Hadoop ecosystem.
      November 8-9, Sheraton New York Hotel & Towers, NYC
      • 1400 attendees, 25+ sponsors
      • 60 sessions across 5 tracks for:
          • Business Decision Makers
          • Enterprise Architects
          • IT Operators
          • Data Scientists
          • Developers
      • Cloudera Training and Certification (November 7, 10, 11)
      Learn more and register at www.hadoopworld.com ($50 discount for Strata attendees)
  • 20. What I Would Like You To Remember:
      • The Key Benefits of the Apache Hadoop Data Platform:
          • Agility/Flexibility (Enables Innovation/Exploration).
          • Complex Data Processing (Any Language, Any Problem).
          • Scalability of Storage/Compute (Freedom to Grow).
          • Economical Active Archive (Keep All Your Data Alive).
      • Cloudera Enterprise enables:
          • Lower the Cost of Management and Administration.
          • Simplify and Accelerate Hadoop Deployment.
          • Increase the Transparency & Control of Hadoop.
          • Firm SLAs on Issue Resolution.
  • 21. Contact Information: Amr Awadallah, aaa@cloudera.com, 650-644-3921, http://twitter.com/awadallah
  • 23. Appendix
  • 24. Hadoop Timeline (axis spans 2002 to 2009; events listed in approximate chronological order)
      • Doug Cutting & Mike Cafarella started working on Nutch
      • Google publishes GFS & MapReduce papers
      • Doug Cutting adds DFS & MapReduce support to Nutch
      • Yahoo! hires Cutting, Hadoop spins out of Nutch
      • NY Times converts 4TB of image archives over 100 EC2 instances
      • Fastest sort of a TB, 3.5 mins over 910 nodes
      • Cloudera Founded
      • Facebook launches Hive: SQL Support for Hadoop
      • Fastest sort of a TB, 62 secs over 1,460 nodes
      • Sorted a PB in 16.25 hours over 3,658 nodes
      • Doug Cutting joins Cloudera
      • Hadoop Summit 2009, 750 attendees
  • 25. Cloudera’s Track Record
      • Customers: Multiple customers with >1,000 Hadoop nodes under management
          • Supporting dozens of diverse production use cases, including ones that are revenue critical with tight SLAs
      • Community: Years of demonstrated leadership in the Apache Hadoop ecosystem. Cloudera employees are:
          • The largest contributor to the Hadoop ecosystem in patches
          • Founders of 70% of the projects in the Apache Hadoop ecosystem, including Apache Hadoop itself
          • The first to build & integrate what is now the reference Hadoop stack
      • Industry: Multiple years of experience providing Hadoop solutions across industries:
          • 2 of the top 5 payments companies run Cloudera
          • 3 of the top 5 commercial banks run Cloudera
          • 2 of the top 4 online travel companies run Cloudera
  • 26. Cloudera Enterprise Management Suite
      • Activity Monitor
          • It Helps You: consolidate all user activities into a real-time view; diagnose user performance; track activity metrics
          • So You Can: improve performance; improve conformance to SLAs; improve QOS
          • It’s Like: MySQL Enterprise Monitor; Quest Foglight for Oracle / SQL Server
      • Service & Configuration Manager
          • It Helps You: manage system services; automate changes; validate settings; 1-click security
          • So You Can: lower cost of administration; improve uptime
          • It’s Like: Red Hat Satellite Server; Microsoft System Center; Oracle Enterprise Manager
      • Resource Manager
          • It Helps You: report on the usage of scarce resources; plan for capacity expansion
          • So You Can: improve quality of service; extend the life of the cluster
          • It’s Like: VMware vCenter
      • Authorization Manager
          • It Helps You: centralize management of all users, groups and privileges; manage permissions via delegated administration
          • So You Can: lower the costs of administration; improve compliance
          • It’s Like: Teradata security administration
  • 27. CDH Integrates with Existing IT Infrastructure: BI/Analytics, ETL, Databases, Cloud/OS, Hardware