Deploying and Managing
 Hadoop Clusters with
 AMBARI
Matt Foley and Hitesh Shah
Hortonworks, Inc.
mfoley@hortonworks.com
hitesh@hortonworks.com


 © Hortonworks Inc. 2012     Page 1
Matt Foley - Background
•  MTS at Hortonworks Inc.
   – Hadoop Core contributor, part of the original ~25 in the spin-out of
     Hortonworks from Yahoo!
   – Currently managing engineering infrastructure for Hortonworks, including
     build and deployment automation
   – My team also volunteers Build Engineering infrastructure services to ASF,
     for Hadoop core and several related projects within Apache
   – Participated in the Hortonworks team working on Ambari implementation
     during transitional phase
   – Formerly, led software development for back end of Yahoo Mail for three
     years – 20,000 servers in hundreds of clusters, with 30 PB of data under
     management, 400M active users


•  Apache Hadoop, ASF
   – Committer and PMC member, Hadoop core
   – Release Manager – Hadoop-1.0

       Architecting the Future of Big Data
                                                                           Page 2
       © Hortonworks Inc. 2012
Hitesh Shah - Background
• MTS at Hortonworks Inc.
• Committer for Apache MapReduce and Ambari
• Earlier, spent 8+ years at Yahoo! building various
  frameworks all the way from data storage platforms to
  high throughput online ad-serving systems.




Overview
• Brief history – evolution of the Ambari project
• Installation
• Monitoring
• Management
• Invitation




All features are available today
• Apologies that screen shots are from HMC
  (Hortonworks Management Console) version of
  Ambari
• Same code as current Ambari, but with Hortonworks
  graphic elements
• You too can “skin” Ambari with your own logotype
  and graphic elements!




History
Of Ambari




Brief History of the Ambari Project
• Deployment, Monitoring, and Management of Hadoop
  and HBase clusters is:
  – HARD, due to massive scale and distributed services; and
  – DIFFERENT from other kinds of compute clusters,
    due to Hadoop’s intrinsic fault-tolerance
• We needed an Apache open source solution
• Started Ambari as an Apache incubator project
  – Originally based in part on what was learned from “Hadoop
    Management System” project out of Yahoo!




History (continued)
• Early work specified a full architecture, including
  many elements that remain today:
  – State-based configuration management, rather than event-based
  – Cluster configuration as a data object, able to be saved and manipulated
  – Reliable deployment, parallelized for scalability
  – Insightful monitoring and alerting, sharing our deep experience with the
    community
  – Take advantage of Puppet to achieve idempotence on installs, and
    reliable start/stop of processes
  – Go beyond Puppet to offer orchestrated start/stop of distributed services
• The team started with a “whole cloth” design and
  build project
• 6 months into it, we figured out we had a 2-year
  project on our hands!
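
To illustrate the "state-based, rather than event-based" idea and the idempotence goal listed above, here is a minimal Python sketch. It is not Ambari or Puppet code: the `converge` function, the state values, and the package names are hypothetical. The desired configuration is a plain data object, and applying it a second time produces no further actions.

```python
def converge(current, desired):
    """Return (new_state, actions) that moves `current` to `desired`.

    Both arguments map a package name to "installed" or "absent".
    """
    new_state = dict(current)
    actions = []
    for name, want in sorted(desired.items()):
        have = new_state.get(name, "absent")
        if have != want:
            # Only act when the observed state differs from the desired state.
            actions.append(("install" if want == "installed" else "remove", name))
            new_state[name] = want
    return new_state, actions


# Hypothetical desired state for one node, saved as a data object.
desired = {"hadoop": "installed", "hbase": "installed", "old-agent": "absent"}
node = {"hadoop": "installed", "old-agent": "installed"}

node, first_run = converge(node, desired)
node, second_run = converge(node, desired)
print(first_run)   # actions needed to reach the desired state
print(second_run)  # empty: the node already matches
```

Because the tool compares states instead of replaying events, a failed or interrupted install can simply be run again.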

Evolution
•  How to get a useful tool out to the community sooner?
•  Make more use of existing tech
   – Ganglia and Nagios for monitoring and alerting
   – Puppet for reliable deployment and process control
•  Commit to incremental delivery
   – First generation won’t have all the breadth and features desirable
   – But will be useful and worth using


•  And the team has completed the first usable version of Ambari
   over the last few weeks!
   – Offers a good, GUI-driven Deploy experience, currently limited to
     RHEL5/CentOS5 and non-secure mode (but just wait a few more weeks!)
   – Quite nice Monitoring, based on our experience managing
     multi-thousand-node Hadoop clusters at Yahoo!
   – A beginning on Management, with several basic post-install operations

Deployment
With Ambari




Deployment and Installation Phases
• Preparation
• Cluster Pre-config
• Hadoop Stack Configuration
• Hadoop Stack Deploy / Install
• Service start-up and smoke test




Deployment and Installation (Preparation)
•  Prepare Ambari and the Ambari Agent (includes Puppet agent)
   –  Can follow instructions at
      http://svn.apache.org/viewvc/incubator/ambari/trunk/README.txt
   –  Or download the HMC from Hortonworks after Summit, and access its
      documentation
•  Prepare access to ‘yum’ Repositories containing Hadoop Stack
   and Ambari dependencies
   –  If your nodes have direct internet access, you can use the provided RPMs to
      “install” the repos on each node
   –  Or, to avoid direct access from each node and minimize WAN traffic, you can
      mirror the yum repositories to an internal server accessible from the nodes
•  Prepare nodes for installation commands
   –  Set up password-less ‘ssh’ for root user (secured via public keys and agent
      forwarding) from Install Master node to all other cluster nodes, so that it can
      run ‘yum install’ and ‘puppet’ commands
   –  Take care of any other issues that may prevent root ssh during the Deployment
      phase, such as iptables or SELinux.
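
Before starting a deployment it can help to confirm that password-less root ssh really works from the Install Master to every node. The following Python sketch is one possible pre-flight check, not part of Ambari; the function names and host names are hypothetical. It relies on the standard OpenSSH options `BatchMode` and `ConnectTimeout`, so a node that would prompt for a password fails quickly instead of hanging.

```python
import subprocess


def ssh_check_command(host, user="root", timeout=5):
    """Build an ssh command that fails fast instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",              # never fall back to a password prompt
        "-o", f"ConnectTimeout={timeout}",  # give up on unreachable nodes
        f"{user}@{host}",
        "true",                             # trivial remote command
    ]


def unreachable_hosts(hosts, run=subprocess.run):
    """Return the hosts on which the check command exits non-zero."""
    failed = []
    for host in hosts:
        result = run(ssh_check_command(host), capture_output=True)
        if result.returncode != 0:
            failed.append(host)
    return failed
```

Calling `unreachable_hosts(["node1.example.com", "node2.example.com"])` on the Install Master returns the nodes that still need attention, for example because of iptables rules or a missing public key.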


Deployment and Installation (Pre-config)

• Start running Ambari
• Provide list of hosts
  – Works with Amazon EC2 IP addresses too
• Ambari does node Validation and Discovery
  – Confirms availability and access capability
  – Scans for node attributes and mount points
• Select desired services and data directory paths
• Automatic role assignments to nodes, with your
  approval
  – Based on node attributes and selected services
  – Currently based primarily on memory size, to be refined in future
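
As a rough illustration of memory-based role assignment, the sketch below proposes master roles for the hosts with the most memory and worker roles for the rest. The slides do not describe Ambari's actual heuristic, so this ranking rule, the `propose_roles` function, and the host data are all assumptions; the result is only a proposal for the operator to approve.

```python
def propose_roles(memory_gb_by_host, master_roles):
    """Give each master role to the next-largest-memory host; the rest are workers."""
    # Sort by memory (largest first), breaking ties by host name for stable output.
    ranked = sorted(memory_gb_by_host, key=lambda h: (-memory_gb_by_host[h], h))
    if not ranked:
        return {}
    proposal = {host: [] for host in ranked}
    for i, role in enumerate(master_roles):
        proposal[ranked[i % len(ranked)]].append(role)
    for host in ranked:
        if not proposal[host]:
            proposal[host] = ["DataNode", "TaskTracker"]
    return proposal


# Hypothetical discovery result: host name -> memory in GB.
hosts = {"node1": 48, "node2": 24, "node3": 24, "node4": 16}
for host, roles in propose_roles(hosts, ["NameNode", "JobTracker"]).items():
    print(host, roles)
```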



Deployment and Installation (Configuration)
•  Currently supported Hadoop Stack components for installation:
   – Hadoop Core (required)
   – HBase
   – Pig
   – Hive
   – HCatalog
   – ZooKeeper (required for HBase, Hive, HCatalog)
   – Sqoop
   – Oozie
   – Ganglia
   – Nagios


•  Modify a subset of about 50 key parameters that most commonly
   need to be adjusted, depending on components selected


Deployment and Installation (Deploy)
•  Final review of Cluster and Stack parameters
•  Puppet agent on each node is invoked (in parallel) to reliably
   deploy needed packages
•  Actual fetch and install is managed with ‘yum’
   (for RHEL/CentOS) or comparable services
•  Success / failure is reported back to Install Master and the
   Ambari application
•  Log messages for failures are provided to assist debugging
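
The parallel deploy-and-report pattern described above can be sketched in Python as follows. This is an illustration only: `deploy_all` and `fake_install` are hypothetical names, and the real work (invoking the Puppet agent, which in turn uses `yum`) is replaced by a stand-in function.

```python
from concurrent.futures import ThreadPoolExecutor


def deploy_all(hosts, install, max_workers=16):
    """Run `install(host)` on every host in parallel and collect the outcomes.

    Returns a dict mapping host -> (succeeded, log_message).
    """
    def attempt(host):
        try:
            return host, (True, install(host))
        except Exception as exc:  # keep the message to assist debugging
            return host, (False, f"{type(exc).__name__}: {exc}")

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(attempt, hosts))


def fake_install(host):
    """Stand-in for invoking the node's agent; fails on one host on purpose."""
    if host == "node3":
        raise RuntimeError("yum repository unreachable")
    return "packages installed"


report = deploy_all(["node1", "node2", "node3"], fake_install)
failed = {h: log for h, (ok, log) in report.items() if not ok}
print(failed)
```

A failure on one node does not stop the others; it is recorded with its log message so the operator can see which nodes need a retry.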




Deployment and Installation (Smoke Test)
After successful install:

•  Ambari provides “orchestration” to start-up distributed services
   in dependency order

•  Puppet “kicks” are used to reliably (mostly) start and stop
   service processes on individual nodes

•  After each distributed service is started, a smoketest is run and
   results reported

•  Each component is smoketested before dependent components


After successful smoketest, you can be confident that your
selected components have been successfully installed and
started, and are running correctly.
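
A small Python sketch of dependency-ordered start-up with a smoke test after each service is shown here. It uses the standard-library `graphlib` module and is not Ambari's orchestration code. The dependency table is an illustrative subset: ZooKeeper being required for HBase and Hive comes from the component list earlier in this deck, while the other edges and the `start` and `smoke_test` callbacks are assumptions.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# service -> services that must already be running (illustrative subset)
DEPENDS_ON = {
    "HDFS": [],
    "ZooKeeper": [],
    "MapReduce": ["HDFS"],
    "HBase": ["HDFS", "ZooKeeper"],
    "Hive": ["MapReduce", "ZooKeeper"],
}


def start_cluster(start, smoke_test):
    """Start services in dependency order, smoke-testing each before moving on."""
    started = []
    for service in TopologicalSorter(DEPENDS_ON).static_order():
        start(service)
        if not smoke_test(service):
            raise RuntimeError(f"smoke test failed for {service}; started: {started}")
        started.append(service)
    return started


# Stand-in callbacks: starting does nothing and every smoke test passes.
order = start_cluster(start=lambda s: None, smoke_test=lambda s: True)
print(order)
```

Stopping at the first failed smoke test means dependent components are never started on top of a service that is not working.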

Going forward
•  Multiple OS support
   – RHEL6/CentOS6
   – Ubuntu and Debian
   – SUSE/SLES
   – Windows
•  Hadoop Security support, including secure install for all
   components
•  HA support
•  Hadoop 2.0 support
•  Improved GUI user interface
•  Integration: Provide CLI commands for invoking Puppet scripts,
   and Web APIs where appropriate
•  Etc.



Monitoring
With Ambari




Monitoring Dashboard




Ambari Monitoring
•  Basic Monitoring capabilities for Hadoop Cluster Services
   –  Up/Down status for installed Hadoop services
   –  Key Alerts configured for health, performance and usage monitoring of
      Hadoop services
   –  Consolidated summary information for Hadoop Services (HDFS, M/R & HBase)
   –  Key service metrics graphs for temporal analysis of service performance, utilization
      and health (plus system metrics: CPU, memory, network, etc.)


•  Efficient collection and visualization of monitoring metrics
   –  Lightweight alert condition checks (mostly over network) for better scalability


•  Leverage Open Source monitoring systems such as Nagios & Ganglia
   –  Nagios - for Alert Monitoring
   –  Ganglia/RRDTool for Hadoop metrics graphs


•  Simple and Intuitive UI to monitor the Hadoop cluster status
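
As an example of a lightweight, network-based alert check, the Python sketch below tests whether a service port accepts TCP connections and reports the result using the Nagios plugin exit-code convention (0 for OK, 2 for CRITICAL). It is not one of the checks shipped with Ambari; the function name and the injectable `connect` parameter are assumptions made to keep the sketch easy to test.

```python
import socket

OK, CRITICAL = 0, 2  # exit codes used by the Nagios plugin convention


def check_tcp(host, port, timeout=3.0, connect=socket.create_connection):
    """Return (status, message) for a single TCP connect attempt."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError as exc:  # refused, timed out, unresolvable host, ...
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"
    conn.close()
    return OK, f"OK - {host}:{port} is accepting connections"
```

A wrapper script would print the message and call `sys.exit(status)`, pointing the check at a service port such as the NameNode's RPC port, whatever that is set to in your configuration. One connect attempt per check keeps the load on monitored nodes small.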


HDFS Service




Map/Reduce Service




HBase Service




Going forward
•  Rapid iterations with Ambari Open Source community to add more
   monitoring capabilities e.g.
   –  More service alerts, summary stats & reporting for the Hadoop services
   –  Queue/Job level monitoring & Diagnostic Reporting for M/R
   –  Improved Visualization of service metrics graphs & reports
   –  Ability to customize dashboard with relevant graphs, alerts and service information


•  RESTful APIs for Hadoop Monitoring
   –  For integration with Enterprise and Cloud Management Systems, and
      with “powered by Ambari” products
   –  CLIs


•  Ability to integrate with third party monitoring tools in place of Nagios &
   Ganglia

•  Best practices, tips and guidelines for using Monitoring dashboard for
   identifying and debugging common cluster problems

Management
With Ambari




Management
• “Management” can include many different
  post-install activities with Hadoop clusters

• Ambari currently supports only a small set:
  – Start / Stop individual services
       – Dependent services will be automatically stopped also

  – Change configuration parameters for a service
       – Cannot currently change data directory paths

  – Add nodes to the Cluster
       – Decommissioning nodes is currently a manual process

  – Uninstall the Cluster
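
The rule that stopping a service also stops its dependent services can be sketched in Python as a walk over the reversed dependency graph. This is an illustration, not Ambari's implementation; `stop_order` and the dependency table are hypothetical, and the table is an assumed subset of the real service relationships.

```python
def stop_order(service, depends_on):
    """Return `service` plus everything that transitively depends on it,
    ordered so that dependents are stopped before what they depend on."""
    # Invert the table: service -> services that depend on it.
    dependents = {}
    for svc, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(svc)

    order, seen = [], set()

    def visit(svc):
        if svc in seen:
            return
        seen.add(svc)
        for child in sorted(dependents.get(svc, [])):
            visit(child)
        order.append(svc)  # appended only after all of its dependents

    visit(service)
    return order


# Illustrative dependency table (service -> services it needs).
depends_on = {
    "HDFS": [],
    "ZooKeeper": [],
    "MapReduce": ["HDFS"],
    "HBase": ["HDFS", "ZooKeeper"],
    "Hive": ["MapReduce"],
}
print(stop_order("HDFS", depends_on))
```

With this table, stopping HDFS also stops HBase, Hive, and MapReduce first, while ZooKeeper is left running because it does not depend on HDFS.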


Going forward
•  Many more management actions to be supported
   – Security and user management
   – HA alerting and recovery
   – Extensions of current functionalities
   – Etc.


•  Integration: RESTful APIs / web services for integration with
   established management tools in the data center

•  Improved GUI user interface




Invitation
• Deployment, Monitoring, and Management – this is
  just the first generation!
• If you are interested in these functionalities and want
  to participate in an Apache open source project,
  please consider becoming a contributor to the
  AMBARI (incubating) project!
• http://incubator.apache.org/ambari/mail-lists.html




Thank you.





More Related Content

What's hot

Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 
Apache zookeeper 101
Apache zookeeper 101Apache zookeeper 101
Apache zookeeper 101Quach Tung
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala InternalsDavid Groozman
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Odinot Stanislas
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.Taras Matyashovsky
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
EC2でkeepalived+LVS(DSR)
EC2でkeepalived+LVS(DSR)EC2でkeepalived+LVS(DSR)
EC2でkeepalived+LVS(DSR)Sugawara Genki
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing GuideJose De La Rosa
 
Oracle Extended Clusters for Oracle RAC
Oracle Extended Clusters for Oracle RACOracle Extended Clusters for Oracle RAC
Oracle Extended Clusters for Oracle RACMarkus Michalewicz
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
Hadoopのシステム設計・運用のポイント
Hadoopのシステム設計・運用のポイントHadoopのシステム設計・運用のポイント
Hadoopのシステム設計・運用のポイントCloudera Japan
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon
 
(2020-01).HPE SimpliVity 如何分享腹內Datastore給現現有的ESXi使用
(2020-01).HPE SimpliVity 如何分享腹內Datastore給現現有的ESXi使用(2020-01).HPE SimpliVity 如何分享腹內Datastore給現現有的ESXi使用
(2020-01).HPE SimpliVity 如何分享腹內Datastore給現現有的ESXi使用裝機安 Angelo
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Mydbops
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance TuningLars Hofhansl
 
Linux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownLinux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownScyllaDB
 
HDFS Namenode High Availability
HDFS Namenode High AvailabilityHDFS Namenode High Availability
HDFS Namenode High AvailabilityHortonworks
 

What's hot (20)

Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
HBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and CompactionHBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and Compaction
 
Apache zookeeper 101
Apache zookeeper 101Apache zookeeper 101
Apache zookeeper 101
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala Internals
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
EC2でkeepalived+LVS(DSR)
EC2でkeepalived+LVS(DSR)EC2でkeepalived+LVS(DSR)
EC2でkeepalived+LVS(DSR)
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 
Oracle Extended Clusters for Oracle RAC
Oracle Extended Clusters for Oracle RACOracle Extended Clusters for Oracle RAC
Oracle Extended Clusters for Oracle RAC
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
Hadoopのシステム設計・運用のポイント
Hadoopのシステム設計・運用のポイントHadoopのシステム設計・運用のポイント
Hadoopのシステム設計・運用のポイント
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
(2020-01).HPE SimpliVity 如何分享腹內Datastore給現現有的ESXi使用
(2020-01).HPE SimpliVity 如何分享腹內Datastore給現現有的ESXi使用(2020-01).HPE SimpliVity 如何分享腹內Datastore給現現有的ESXi使用
(2020-01).HPE SimpliVity 如何分享腹內Datastore給現現有的ESXi使用
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
Linux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownLinux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance Showdown
 
HDFS Namenode High Availability
HDFS Namenode High AvailabilityHDFS Namenode High Availability
HDFS Namenode High Availability
 

Viewers also liked

Managing your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache AmbariManaging your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache AmbariDataWorks Summit
 
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Hortonworks
 
Ambari Meetup: Architecture and Demo
Ambari Meetup: Architecture and DemoAmbari Meetup: Architecture and Demo
Ambari Meetup: Architecture and DemoHortonworks
 
Apache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNApache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNHortonworks
 
Cloumon Product Introduction
Cloumon Product IntroductionCloumon Product Introduction
Cloumon Product IntroductionGruter
 
Managing your Hadoop Clusters with Ambari
Managing your Hadoop Clusters with AmbariManaging your Hadoop Clusters with Ambari
Managing your Hadoop Clusters with AmbariDataWorks Summit
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariDataWorks Summit
 
Ambari: Agent Registration Flow
Ambari: Agent Registration FlowAmbari: Agent Registration Flow
Ambari: Agent Registration FlowHortonworks
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldUwe Printz
 
Cluster management and automation with cloudera manager
Cluster management and automation with cloudera managerCluster management and automation with cloudera manager
Cluster management and automation with cloudera managerChris Westin
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 
Hortonworks Technical Workshop: Apache Ambari
Hortonworks Technical Workshop:   Apache AmbariHortonworks Technical Workshop:   Apache Ambari
Hortonworks Technical Workshop: Apache AmbariHortonworks
 
Apache Ambari: Past, Present, Future
Apache Ambari: Past, Present, FutureApache Ambari: Past, Present, Future
Apache Ambari: Past, Present, FutureHortonworks
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...DataWorks Summit/Hadoop Summit
 
Past, Present and Future of Apache Ambari
Past, Present and Future of Apache AmbariPast, Present and Future of Apache Ambari
Past, Present and Future of Apache AmbariArtem Ervits
 
Παρουσίαση Hadoop, MapReduce και Mahout στο 1o Hadoop UserGroup meetup
Παρουσίαση Hadoop, MapReduce και Mahout στο 1o Hadoop UserGroup meetupΠαρουσίαση Hadoop, MapReduce και Mahout στο 1o Hadoop UserGroup meetup
Παρουσίαση Hadoop, MapReduce και Mahout στο 1o Hadoop UserGroup meetupIoannis Konstantinou
 

Viewers also liked (20)

Managing your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache AmbariManaging your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache Ambari
 
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
 
Ambari Meetup: Architecture and Demo
Ambari Meetup: Architecture and DemoAmbari Meetup: Architecture and Demo
Ambari Meetup: Architecture and Demo
 
Apache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNApache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARN
 
Cloumon Product Introduction
Cloumon Product IntroductionCloumon Product Introduction
Cloumon Product Introduction
 
Managing your Hadoop Clusters with Ambari
Managing your Hadoop Clusters with AmbariManaging your Hadoop Clusters with Ambari
Managing your Hadoop Clusters with Ambari
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with Ambari
 
An Overview of Ambari
An Overview of AmbariAn Overview of Ambari
An Overview of Ambari
 
Ambari: Agent Registration Flow
Ambari: Agent Registration FlowAmbari: Agent Registration Flow
Ambari: Agent Registration Flow
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the field
 
Cluster management and automation with cloudera manager
Cluster management and automation with cloudera managerCluster management and automation with cloudera manager
Cluster management and automation with cloudera manager
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Hortonworks Technical Workshop: Apache Ambari
Hortonworks Technical Workshop:   Apache AmbariHortonworks Technical Workshop:   Apache Ambari
Hortonworks Technical Workshop: Apache Ambari
 
Hadoop 기반 빅데이터 이해
Hadoop 기반 빅데이터 이해Hadoop 기반 빅데이터 이해
Hadoop 기반 빅데이터 이해
 
Apache Ambari: Past, Present, Future
Apache Ambari: Past, Present, FutureApache Ambari: Past, Present, Future
Apache Ambari: Past, Present, Future
 
Hadoop Report
Hadoop ReportHadoop Report
Hadoop Report
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 
Past, Present and Future of Apache Ambari
Past, Present and Future of Apache AmbariPast, Present and Future of Apache Ambari
Past, Present and Future of Apache Ambari
 
Faster Python
Faster PythonFaster Python
Faster Python
 
Παρουσίαση Hadoop, MapReduce και Mahout στο 1o Hadoop UserGroup meetup
Παρουσίαση Hadoop, MapReduce και Mahout στο 1o Hadoop UserGroup meetupΠαρουσίαση Hadoop, MapReduce και Mahout στο 1o Hadoop UserGroup meetup
Παρουσίαση Hadoop, MapReduce και Mahout στο 1o Hadoop UserGroup meetup
 

Similar to Deploying and Managing Hadoop Clusters with AMBARI

Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'Hortonworks
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureDataWorks Summit
 
Mrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataMrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataPatrickCrompton
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Mac Moore
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureDataWorks Summit
 
Hadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and FutureHadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and FutureVinod Kumar Vavilapalli
 
Apache Hadoop YARN 2015: Present and Future
Deploying and Managing Hadoop Clusters with AMBARI

  • 1. Deploying and Managing Hadoop Clusters with AMBARI
      Matt Foley and Hitesh Shah
      Hortonworks, Inc.
      mfoley@hortonworks.com
      hitesh@hortonworks.com
      © Hortonworks Inc. 2012
  • 2. Matt Foley - Background
      • MTS at Hortonworks Inc.
        – Hadoop Core contributor, part of original ~25 in Yahoo! spin-out of Hortonworks
        – Currently managing engineering infrastructure for Hortonworks, including build and deployment automation
        – My team also volunteers Build Engineering infrastructure services to ASF, for Hadoop core and several related projects within Apache
        – Participated in the Hortonworks team working on Ambari implementation during transitional phase
        – Formerly, led software development for back end of Yahoo Mail for three years – 20,000 servers in hundreds of clusters, with 30 PB of data under management, 400M active users
      • Apache Hadoop, ASF
        – Committer and PMC member, Hadoop core
        – Release Manager – Hadoop-1.0
      Architecting the Future of Big Data
  • 3. Hitesh Shah - Background
      • MTS at Hortonworks Inc.
      • Committer for Apache MapReduce and Ambari
      • Earlier, spent 8+ years at Yahoo! building various frameworks all the way from data storage platforms to high throughput online ad-serving systems.
  • 4. Overview
      • Brief history – evolution of the Ambari project
      • Installation
      • Monitoring
      • Management
      • Invitation
  • 5. All features are available today
      • Apologies that screen shots are from HMC (Hortonworks Management Console) version of Ambari
      • Same code as current Ambari, but with Hortonworks graphic elements
      • You too can “skin” Ambari with your own logotype and graphic elements!
  • 6. History Of Ambari
  • 7. Brief History of the Ambari Project
      • Deployment, Monitoring, and Management of Hadoop and HBase clusters is:
        – HARD, due to massive scale and distributed services; and
        – DIFFERENT from other kinds of compute clusters, due to Hadoop’s intrinsic fault-tolerance
      • We needed an Apache open source solution
      • Started Ambari as an Apache incubator project
        – Originally based in part on what was learned from “Hadoop Management System” project out of Yahoo!
  • 8. History (continued)
      • Early work specified a full architecture, including many elements that remain today:
        – State-based configuration management, rather than event-based
        – Cluster configuration as a data object, able to be saved and manipulated
        – Reliable deployment, parallelized for scalability
        – Insightful monitoring and alerting, sharing our deep experience with the community
        – Take advantage of Puppet to achieve idempotence on installs, and reliable start/stop of processes
        – Go beyond Puppet to offer orchestrated start/stop of distributed services
      • The team started with a “whole cloth” design and build project
      • 6 months into it, we figured out we had a 2-year project on our hands!
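To illustrate the state-based approach and the idempotence mentioned on this slide, here is a minimal sketch. It is not Ambari's or Puppet's code: the service names and the functions `plan_actions` and `apply_actions` are hypothetical. The idea is to compare a desired state with the observed state and act only on the differences, so that running the same plan again does nothing.

```python
def plan_actions(desired, observed):
    """Return the actions needed to move `observed` toward `desired`.

    Both arguments map a service name to "running" or "stopped".
    A service missing from `observed` is treated as stopped (an assumption).
    """
    actions = []
    for service, want in sorted(desired.items()):
        have = observed.get(service, "stopped")
        if have != want:
            actions.append(("start" if want == "running" else "stop", service))
    return actions


def apply_actions(observed, actions):
    """Return a new observed state after applying the planned actions."""
    new_state = dict(observed)
    for verb, service in actions:
        new_state[service] = "running" if verb == "start" else "stopped"
    return new_state


# Hypothetical cluster state, for illustration only.
desired = {"namenode": "running", "datanode": "running", "hbase": "stopped"}
observed = {"namenode": "running", "hbase": "running"}

first_plan = plan_actions(desired, observed)
converged = apply_actions(observed, first_plan)
second_plan = plan_actions(desired, converged)  # empty: already converged
```

Because the plan is derived from state rather than from a log of events, a retry after a partial failure only repeats the work that is still outstanding.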
  • 9. Evolution
      • How to get a useful tool out to the community sooner?
      • Make more use of existing tech
        – Ganglia and Nagios for monitoring and alerting
        – Puppet for reliable deployment and process control
      • Commit to incremental delivery
        – First generation won’t have all the breadth and features desirable
        – But will be useful and worth using
      • And the team has completed the first usable version of Ambari over the last few weeks!
        – Offers a good, GUI-driven Deploy experience, currently limited to RHEL5/CentOS5 and non-secure mode (but just wait a few more weeks!)
        – Quite nice Monitoring, based on our experience managing multi-thousand-node Hadoop clusters at Yahoo!
        – A beginning on Management, with several basic post-install operations
  • 10. Deployment With Ambari
  • 11. Deployment and Installation Phases
      • Preparation
      • Cluster Pre-config
      • Hadoop Stack Configuration
      • Hadoop Stack Deploy / Install
      • Service start-up and smoke test
  • 12. Deployment and Installation (Preparation)
      • Prepare Ambari and the Ambari Agent (includes Puppet agent)
        – Can follow instructions at http://svn.apache.org/viewvc/incubator/ambari/trunk/README.txt
        – Or download the HMC from Hortonworks after Summit, and access its documentation
      • Prepare access to ‘yum’ Repositories containing Hadoop Stack and Ambari dependencies
        – If your nodes have direct internet access, can use provided RPMs to “install” the repos on each node
        – Or, to avoid direct access from each node and minimize WAN traffic, can mirror the yum repositories to an internal server accessible from the nodes
      • Prepare nodes for installation commands
        – Set up password-less ‘ssh’ for root user (secured via public keys and agent forwarding) from Install Master node to all other cluster nodes, so can run ‘yum install’ and ‘puppet’ commands
        – Take care of any other issues that may prevent root ssh during the Deployment phase, such as iptables or SELinux.
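The password-less ssh prerequisite can be checked before starting a deployment. The following is a minimal sketch, assuming the OpenSSH `ssh` client is installed on the Install Master. The `BatchMode=yes` option makes ssh fail instead of prompting for a password. The function names and host names are hypothetical and are not part of Ambari.

```python
import subprocess


def ssh_check_command(host, user="root"):
    """Build an ssh command that succeeds only if key-based login works."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never prompt; fail if a password is needed
        "-o", "ConnectTimeout=5",   # do not hang on unreachable hosts
        f"{user}@{host}",
        "true",                     # trivial remote command
    ]


def unreachable_hosts(hosts, run=subprocess.run):
    """Return the hosts where the password-less ssh check fails.

    `run` is injectable so the logic can be exercised without a network.
    """
    failed = []
    for host in hosts:
        result = run(ssh_check_command(host), capture_output=True)
        if result.returncode != 0:
            failed.append(host)
    return failed


# Example (hypothetical host name); building the command has no side effects.
command = ssh_check_command("node1.example.com")
```

Calling `unreachable_hosts(["node1.example.com", "node2.example.com"])` on the Install Master would list the nodes that still need key setup, or firewall and SELinux attention, before deployment.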
  • 13. Deployment and Installation (Pre-config)
      • Start running Ambari
      • Provide list of hosts
        – Works with Amazon EC2 IP addresses too
      • Ambari does node Validation and Discovery
        – Confirms availability and access capability
        – Scans for node attributes and mount points
      • Select desired services and data directory paths
      • Automatic role assignments to nodes, with your approval
        – Based on node attributes and selected services
        – Currently based primarily on memory size, to be refined in future
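The slide says role assignment is currently based primarily on memory size. The sketch below shows one way such a heuristic could look. It is an assumption for illustration, not Ambari's actual algorithm; the function `assign_roles`, the host names, and the memory figures are all hypothetical.

```python
def assign_roles(nodes, master_roles):
    """Suggest a host for each master role, preferring high-memory nodes.

    `nodes` maps hostname -> memory in GB. `master_roles` is an ordered list.
    Hosts not chosen as masters are suggested as workers. If there are more
    master roles than hosts, roles wrap around and share hosts.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    # Largest memory first; hostname breaks ties so the result is stable.
    by_memory = sorted(nodes, key=lambda host: (-nodes[host], host))
    assignment = {}
    for index, role in enumerate(master_roles):
        assignment[role] = by_memory[index % len(by_memory)]
    masters = set(assignment.values())
    workers = [host for host in by_memory if host not in masters]
    return assignment, workers


# Hypothetical discovery results.
nodes = {"node1": 16, "node2": 64, "node3": 32, "node4": 16}
assignment, workers = assign_roles(nodes, ["NameNode", "JobTracker"])
```

The result is only a proposal. As the slide notes, assignments are made with the user's approval, so a real tool would present it for review before acting.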
  • 14. [Screenshot, no slide text]
  • 15. [Screenshot, no slide text]
  • 16. [Screenshot, no slide text]
  • 17. Deployment and Installation (Configuration)
      • Currently supported Hadoop Stack components for installation:
        – Hadoop Core (required)
        – HBase
        – Pig
        – Hive
        – HCatalog
        – Zookeeper (required for HBase, Hive, HCatalog)
        – Sqoop
        – Oozie
        – Ganglia
        – Nagios
      • Modify a subset of about 50 key parameters that most commonly need to be adjusted, depending on components selected
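The dependency notes on this slide (Hadoop Core is always required; Zookeeper is required for HBase, Hive, and HCatalog) can be expressed as a small lookup. This is an illustrative sketch only. The names `REQUIRED`, `DEPENDS_ON`, and `resolve_components` are hypothetical, and only the dependencies stated on the slide are encoded.

```python
# Always installed, per the slide.
REQUIRED = {"Hadoop Core"}

# Dependencies stated on the slide: Zookeeper is needed by these components.
DEPENDS_ON = {
    "HBase": {"Zookeeper"},
    "Hive": {"Zookeeper"},
    "HCatalog": {"Zookeeper"},
}


def resolve_components(selected):
    """Return the selected components plus everything they require.

    One pass is enough here because the stated dependencies are one level
    deep; a deeper dependency graph would need a transitive closure.
    """
    resolved = set(selected) | REQUIRED
    for component in selected:
        resolved |= DEPENDS_ON.get(component, set())
    return sorted(resolved)


resolved = resolve_components(["HBase", "Pig"])
```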
  • 18. [Screenshot, no slide text]
  • 19. [Screenshot, no slide text]
  • 20. [Screenshot, no slide text]
  • 21. Deployment and Installation (Deploy)
      • Final review of Cluster and Stack parameters
      • Puppet agent on each node is invoked (in parallel) to reliably deploy needed packages
      • Actual fetch and install is managed with ‘yum’ (for RHEL/CentOS) or comparable services
      • Success / failure is reported back to Install Master and the Ambari application
      • Log messages for failures are provided to assist debugging
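The deploy step described here (invoke an agent on every node in parallel, then report success or failure with log messages) follows a common fan-out pattern. The sketch below shows that pattern with Python's standard thread pool. It does not call Puppet or yum: `install_on_node` is a hypothetical stand-in that simulates a failure for host names starting with "bad".

```python
from concurrent.futures import ThreadPoolExecutor


def install_on_node(host):
    """Hypothetical stand-in for invoking the agent on one node.

    Returns (ok, log_message). A real implementation would run the
    node's agent and capture its output.
    """
    if host.startswith("bad"):
        return False, f"{host}: package install failed"
    return True, f"{host}: ok"


def deploy(hosts, install=install_on_node, max_workers=8):
    """Run `install` on all hosts in parallel and collect a report."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(install, hosts))  # results keep host order
    report = {"succeeded": [], "failed": {}}
    for host, (ok, log) in zip(hosts, results):
        if ok:
            report["succeeded"].append(host)
        else:
            report["failed"][host] = log  # keep the log to assist debugging
    return report


report = deploy(["node1", "node2", "bad-node3"])
```

Keeping the per-node log alongside the failure mirrors the slide's point that log messages are provided to assist debugging.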
  • 22. [Screenshot, no slide text]
  • 23. [Screenshot, no slide text]
  • 24. [Screenshot, no slide text]
  • 25. Deployment and Installation (Smoke Test)
      After successful install:
      • Ambari provides “orchestration” to start-up distributed services in dependency order
      • Puppet “kicks” are used to reliably (mostly) start and stop service processes on individual nodes
      • After each distributed service is started, a smoketest is run and results reported
      • Each component is smoketested before dependent components
      After successful smoketest, you can be confident that your selected components have been successfully installed and started, and are running correctly.
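Starting services in dependency order and smoke-testing each one before its dependents is a topological-ordering problem. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The dependency map, `start_cluster`, and the callbacks are hypothetical illustrations, not Ambari's implementation or its real dependency data.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: service -> services that must be up first.
DEPENDENCIES = {
    "HDFS": set(),
    "MapReduce": {"HDFS"},
    "Zookeeper": set(),
    "HBase": {"HDFS", "Zookeeper"},
}


def start_cluster(dependencies, start, smoke_test):
    """Start services in dependency order, smoke-testing each one.

    Stops at the first failed smoke test so dependents are never started
    on top of a broken service. Returns (started_services, failed_service),
    where failed_service is None if everything passed.
    """
    started = []
    for service in TopologicalSorter(dependencies).static_order():
        start(service)
        if not smoke_test(service):
            return started, service
        started.append(service)
    return started, None


start_log = []
started, failed = start_cluster(
    DEPENDENCIES,
    start=start_log.append,          # stand-in for the real start action
    smoke_test=lambda service: True  # stand-in: every smoke test passes
)
```

The relative order of independent services (for example HDFS and Zookeeper) is not significant; the guarantee is only that every service starts after the services it depends on.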
  • 26. Going forward
      •  Multiple OS support
         – RHEL6/CentOS6
         – Ubuntu and Debian
         – SUSE/SLES
         – Windows
      •  Hadoop Security support, including secure install for all components
      •  HA support
      •  Hadoop 2.0 support
      •  Improved GUI
      •  Integration: Provide CLI commands for invoking Puppet scripts, and Web APIs where appropriate
      •  Etc.
  • 27. Monitoring With Ambari
  • 28. Monitoring Dashboard
  • 29. Ambari Monitoring
      •  Basic monitoring capabilities for Hadoop cluster services
         – Up/Down status for installed Hadoop services
         – Key alerts configured for health, performance and usage monitoring of Hadoop services
         – Consolidated summary information for Hadoop services (HDFS, M/R & HBase)
         – Key service metrics graphs for temporal analysis of service performance, utilization and health (plus system metrics: CPU, memory, network, etc.)
      •  Efficient collection and visualization of monitoring metrics
         – Lightweight alert condition checks (mostly over the network) for better scalability
      •  Leverage open source monitoring systems such as Nagios & Ganglia
         – Nagios for alert monitoring
         – Ganglia/RRDTool for Hadoop metrics graphs
      •  Simple and intuitive UI to monitor the Hadoop cluster status
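A lightweight, network-only alert condition check can be as small as a TCP connect with a timeout. The sketch below shows the idea in plain Python. It is not how Nagios or Ambari implement their checks; `port_is_up`, `service_status`, and the example host and port are hypothetical.

```python
import socket


def port_is_up(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Nothing runs on the monitored node, which keeps the check cheap
    and lets one monitoring host cover many nodes.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure
        return False


def service_status(endpoints, timeout=2.0):
    """Map each (name, host, port) endpoint to 'UP' or 'DOWN'."""
    return {
        name: "UP" if port_is_up(host, port, timeout) else "DOWN"
        for name, host, port in endpoints
    }
```

For example, `service_status([("namenode", "nn.example.com", 8020)])` would report whether anything is accepting connections on that port. The host name and port here are placeholders, and a successful connect only shows the port is open, not that the service is healthy.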
  • 30. HDFS Service
  • 31. Map/Reduce Service
  • 32. HBase Service
  • 33. Going forward
      •  Rapid iterations with the Ambari open source community to add more monitoring capabilities, e.g.
         – More service alerts, summary stats and reporting for the Hadoop services
         – Queue/job-level monitoring and diagnostic reporting for M/R
         – Improved visualization of service metrics graphs and reports
         – Ability to customize the dashboard with relevant graphs, alerts and service information
      •  RESTful APIs for Hadoop monitoring
         – For integration with enterprise and cloud management systems, and with “powered by Ambari” products
         – CLIs
      •  Ability to integrate with third-party monitoring tools in place of Nagios & Ganglia
      •  Best practices, tips and guidelines for using the monitoring dashboard to identify and debug common cluster problems
  • 34. Management With Ambari
  • 35. Management
      •  “Management” can include many different post-install activities with Hadoop clusters
      •  Ambari currently supports only a small set:
         – Start / Stop individual services
            – Dependent services are automatically stopped as well
         – Change configuration parameters for a service
            – Cannot currently change data directory paths
         – Add nodes to the Cluster
            – Decommissioning nodes is currently a manual process
         – Uninstall the Cluster
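The rule that stopping a service also stops its dependents can be illustrated by walking the dependency graph in reverse. This is a minimal sketch under assumed data: `services_to_stop` is a hypothetical helper and the dependency map is illustrative, not taken from Ambari.

```python
def services_to_stop(target, depends_on):
    """Return the stop order for `target`: all transitive dependents
    first, then the target itself.

    depends_on maps each service to the set of services it requires
    (an illustrative structure, assumed to be acyclic).
    """
    # Invert the edges: for each service, which services require it?
    dependents = {}
    for service, requirements in depends_on.items():
        for requirement in requirements:
            dependents.setdefault(requirement, set()).add(service)

    order, seen = [], set()

    def visit(service):
        if service in seen:
            return
        seen.add(service)
        for child in sorted(dependents.get(service, ())):
            visit(child)
        # Post-order: a service is listed only after everything
        # that depends on it has already been listed.
        order.append(service)

    visit(target)
    return order


# Illustrative dependency map (an assumption for this example).
EXAMPLE_DEPENDS_ON = {
    "HDFS": set(),
    "ZOOKEEPER": set(),
    "MAPREDUCE": {"HDFS"},
    "HBASE": {"HDFS", "ZOOKEEPER"},
}
```

With this map, `services_to_stop("HDFS", EXAMPLE_DEPENDS_ON)` returns `["HBASE", "MAPREDUCE", "HDFS"]`, so nothing is left running on top of a stopped service.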
  • 39. Going forward
      •  Lots more management actions supported
         – Security and user management
         – HA alerting and recovery
         – Extensions of current functionalities
         – Etc.
      •  Integration: RESTful APIs / web services for integration with established management tools in the data center
      •  Improved GUI
  • 40. Invitation
      •  Deployment, Monitoring, and Management – this is just the first generation!
      •  If you are interested in these functionalities and want to participate in an Apache open source project, please consider becoming a contributor to the AMBARI (incubating) project!
      •  http://incubator.apache.org/ambari/mail-lists.html
  • 41. Thank you.