SlideShare a Scribd company logo
1 of 17
Download to read offline
2010/5/20
   OSS
OSS Laboratories Inc.!
                   http://www.ossl.co.jp

             Mail: funai@ossl.co.jp
       Twitter: http://twitter.com/satoruf
LinkedIn: http://jp.linkedin.com/in/satorufunai/ja
                                                                 1
   Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved
•                                     OSS
•    Apache
•    Google
          !                         GFS (Google File System) : HDFS (Hadoop Distributed File System)
          !                        Google MapReduce : Hadoop MapReduce
          !                      Google Chubby : Hadoop Zookeeper
          !    DSL Google Sawzall : Hadoop Pig
          !                 Google BigTable : Hadoop Hbase
          !                            Google ? : Hadoop Hive

• 

• 

     •    Yahoo! Facebook Amazon China Mobile VISA JP Morgan Chase
     •                                               UFJ             NTT
•    ACID Atomic Consistent Isolated Durable                                                BASE Basically
     Available Soft-State Eventual Consistency
• 
                              Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved                    2
Apache Hadoop

                   ETL Tools        BI Reporting      RDBMS

                 Pig (Data Flow)     Hive (SQL)        Sqoop




                                                                    Avro (Serialization)
(Coordination)




                 MapReduce (Job Scheduling/Execution System)
  Zookeepr




                 HBase (key-value store)   (Streaming/Pipes APIs)


                                     HDFS
                        (Hadoop Distributed File System)
HDFS: Hadoop Distributed File System




        HDFS




         64MB
MapReduce: Distributed Processing




       /
                  Map
                 Reduce

1
Hadoop

    Business Intelligence                Interactive Application
               OLAP Data Mart                 OLTP Data Store




         Engineers

                     Hadoop: Storage and Batch Processing




                                  ETL/sqoop
Hadoop
!                                         :
     !    2x Quad Core Nehalems
     !    24GB
     !    12 * 1TB SATA       (JBOD           , RAID     )
     !    1 Gigabit Ethernet
!                            :
!           HDFS         :
     !    ! reserved for temp shuffle space, which leaves 9TB/node
     !    3 way replication leads to 3TB effective HDFS space/node
     !    But assuming 7x compression that becomes ~ 20TB/node
          TB          :2     5    /TB
Yahoo!
•             Hadoop
•    25,000    82PB                                         Hadoop
•                  4,000                  64TB                 16PB                32,000
•    500


•                       SearchAssistTM
              26       20
•    1,500    1TB                                                          62
•    3,700    1PB                                                          16




                     Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved            8
Facebook
     •    200                                                     Hive                                         DWH Hadoop

     •    +12TB/
     •    135TB/
     •                                  1,050                     32TB                            12.5PB
          4,800
     • 



                                                              .*/"&
    !"#$                     %)&*#"$                          ($
    %"&'"&($                 +*,-*"&$
              5*'"$
                                                      %)&*#"657,118$
              &"8/*)731
=,A1)$5*'"657,118$                                    9/2(:"&$
              4$
9/2(:"&$                                                                                                                                            Node
                                                                                                                                                    =

                    0&1,2)314$5*'"657,118$
                                                                               Disks      Disks        Disks      Disks       Disks      Disks      DataNode
      ;&7)/"$                                          .","&7:",$                  Node       Node         Node       Node        Node       Node   +

      <=9$          9/2(:"&$                                                                                                                        Map-Reduce
                                                       +>%?@$                                        1 Gigabit               4 Gigabit



                                        Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved                                                              9
VISA
•          2     Hadoop                                         340TB
     •    Hadoop #1 ~40Tb / 42 node
     •    Hadoop #1 ~300Tb / 28 node
•    Hadoop
                      (                   )
•                     (                                      ) Hadoop
                                                                                         IP


•                                          2                   7 3000                         36TB
•                                                              1
     Hadoop                                                                             13

                          Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved                10
•                                            5
     •                               CMCC
                                        CDR
                                                         1
          5TB~9TB                 2,000
               1          300GB
•    BC-PDM(Big Cloud based Parallel Data Mining)

     •    Hadoop                          HDFS
           Hyper-DFS
                Hadoop
•    16
     •    ETL   12   16
     •                       10      50
     •                                           3   7
•          Hadoop

                      256                            Hadoop




                                            Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved   11
JP
•    Hadoop
•         PC
                                         (RDBMS)
•         RDBMS                    SAN/NAS




                  Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved   12
•                                                            Hadoop
     •    4,000                2,000
                  GB
     •      GB          x
     • 



• 
     • 


     •                       150%
     • 


                  Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved   15
•    COOKPAD:                                                                   3.9
                              816                                                     64
           30                 4                 1

• 


•    Amazon EC2               50                  Hadoop
• 


• 
                            http://business.nikkeibp.co.jp/article/tech/20100416/214016/
                  Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved              16
http://www.cloudera.com/


•    Hadoop
•    Cloudera                Mike Olson Oracle
                          SleepycatSoftware CEO) Christophe Bisciglia Google
                                                                       Dr.Amr
     Awadallah Yahoo!            VivaSmart Jeff Hammerbacher Facebook

•    Cloudera                  Diane Greene VMware CEO Mike Abbott Palm
                CaterinaFake Flickr              Dr. Qi Lu Microsoft
                                    Yahoo!           MartenMickos MySQL
       CEO Jeff Weiner LinkedIn        Yahoo!          Gideon Yu Facebook
     CFO    YouTube CFO
•       Yahoo! Facebook OpenPDC Codeplex
     Hadoop




                          Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved   17
Pentaho + Hadoop
 •    2010/7
 •    Hadoop                                                                 BI




                                     Hive

                                                   Hadoop DFS


               Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved        18
IBM InfoSphere BigInsights
•     Apache Hadoop                                      BigInsights Core
           Web
            BigSheets   2
•                          BigSheets               BigSheets
                        BigInsights Core

•     BigSheet




                              Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved   19

More Related Content

What's hot

データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)Takumi Asai
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drilltshiran
 
DMM.com ラボはなぜSparkを採用したのか? レコメンドエンジン開発の裏側をお話します
DMM.com ラボはなぜSparkを採用したのか? レコメンドエンジン開発の裏側をお話しますDMM.com ラボはなぜSparkを採用したのか? レコメンドエンジン開発の裏側をお話します
DMM.com ラボはなぜSparkを採用したのか? レコメンドエンジン開発の裏側をお話しますWataru Shinohara
 
Pittaro open stackloganalysis_20130416
Pittaro open stackloganalysis_20130416Pittaro open stackloganalysis_20130416
Pittaro open stackloganalysis_20130416OpenStack Foundation
 
Hadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchHadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchMapR Technologies
 
Sparkcamp stratasingapore
Sparkcamp stratasingaporeSparkcamp stratasingapore
Sparkcamp stratasingaporeCheng Feng
 
800万人の"食べたい"をHadoopで分散処理
800万人の"食べたい"をHadoopで分散処理800万人の"食べたい"をHadoopで分散処理
800万人の"食べたい"をHadoopで分散処理Tatsuya Sasaki
 
Big Data Analysis With RHadoop
Big Data Analysis With RHadoopBig Data Analysis With RHadoop
Big Data Analysis With RHadoopDavid Chiu
 
Big Data @ Orange - Dev Day 2013 - part 2
Big Data @ Orange - Dev Day 2013 - part 2Big Data @ Orange - Dev Day 2013 - part 2
Big Data @ Orange - Dev Day 2013 - part 2ovarene
 
Large Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceLarge Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceHortonworks
 
Hdp r-google charttools-webinar-3-5-2013 (2)
Hdp r-google charttools-webinar-3-5-2013 (2)Hdp r-google charttools-webinar-3-5-2013 (2)
Hdp r-google charttools-webinar-3-5-2013 (2)Hortonworks
 
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationHadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationYahoo Developer Network
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceDataWorks Summit
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingMitsuharu Hamba
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 

What's hot (19)

データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
DMM.com ラボはなぜSparkを採用したのか? レコメンドエンジン開発の裏側をお話します
DMM.com ラボはなぜSparkを採用したのか? レコメンドエンジン開発の裏側をお話しますDMM.com ラボはなぜSparkを採用したのか? レコメンドエンジン開発の裏側をお話します
DMM.com ラボはなぜSparkを採用したのか? レコメンドエンジン開発の裏側をお話します
 
Pittaro open stackloganalysis_20130416
Pittaro open stackloganalysis_20130416Pittaro open stackloganalysis_20130416
Pittaro open stackloganalysis_20130416
 
Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0
 
Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815
 
Hadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchHadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 March
 
Sparkcamp stratasingapore
Sparkcamp stratasingaporeSparkcamp stratasingapore
Sparkcamp stratasingapore
 
800万人の"食べたい"をHadoopで分散処理
800万人の"食べたい"をHadoopで分散処理800万人の"食べたい"をHadoopで分散処理
800万人の"食べたい"をHadoopで分散処理
 
Big Data Analysis With RHadoop
Big Data Analysis With RHadoopBig Data Analysis With RHadoop
Big Data Analysis With RHadoop
 
Big Data @ Orange - Dev Day 2013 - part 2
Big Data @ Orange - Dev Day 2013 - part 2Big Data @ Orange - Dev Day 2013 - part 2
Big Data @ Orange - Dev Day 2013 - part 2
 
Large Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceLarge Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduce
 
myHadoop 0.30
myHadoop 0.30myHadoop 0.30
myHadoop 0.30
 
Hdp r-google charttools-webinar-3-5-2013 (2)
Hdp r-google charttools-webinar-3-5-2013 (2)Hdp r-google charttools-webinar-3-5-2013 (2)
Hdp r-google charttools-webinar-3-5-2013 (2)
 
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationHadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter Experience
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
May 2013 HUG: HCatalog/Hive Data Out
May 2013 HUG: HCatalog/Hive Data OutMay 2013 HUG: HCatalog/Hive Data Out
May 2013 HUG: HCatalog/Hive Data Out
 

Viewers also liked

Expanding The Micro Blaze System
Expanding  The Micro Blaze  SystemExpanding  The Micro Blaze  System
Expanding The Micro Blaze Systemiuui
 
Internet of Things
Internet of ThingsInternet of Things
Internet of ThingsDavid Ruiz
 
Alpine IDA-X300 Español
Alpine IDA-X300 EspañolAlpine IDA-X300 Español
Alpine IDA-X300 Españoldegarden
 
Consumer Electronics_B Plan Final
Consumer Electronics_B Plan FinalConsumer Electronics_B Plan Final
Consumer Electronics_B Plan FinalVikram Bothra
 
Quasi - Modelo de datos
Quasi - Modelo de datosQuasi - Modelo de datos
Quasi - Modelo de datosdegarden
 
Cenit 3 Grupo Frial
Cenit 3 Grupo FrialCenit 3 Grupo Frial
Cenit 3 Grupo FrialGrupo Frial
 
TechShanghai2016 - 高可靠性PCB – 从设计到制造
TechShanghai2016 - 高可靠性PCB – 从设计到制造TechShanghai2016 - 高可靠性PCB – 从设计到制造
TechShanghai2016 - 高可靠性PCB – 从设计到制造Hardway Hou
 
About Euroconsult
About EuroconsultAbout Euroconsult
About EuroconsultEuroconsult
 
BECARIOS MAESTRÍA EN CIENCIAS DE LA EDUCACIÓN 2014
BECARIOS MAESTRÍA EN CIENCIAS DE LA EDUCACIÓN 2014BECARIOS MAESTRÍA EN CIENCIAS DE LA EDUCACIÓN 2014
BECARIOS MAESTRÍA EN CIENCIAS DE LA EDUCACIÓN 2014SAUL MIQUIAS VICTORIO HURTADO
 
Estudio cuantitativo del grupo 28
 Estudio cuantitativo del grupo 28 Estudio cuantitativo del grupo 28
Estudio cuantitativo del grupo 28haoqi163
 
igus® : Câbles Chainflex®
igus® : Câbles Chainflex®  igus® : Câbles Chainflex®
igus® : Câbles Chainflex® igus France
 
Made In Norway? Hvordan roboter, 3D-printere og digitalisering gir nye muligh...
Made In Norway? Hvordan roboter, 3D-printere og digitalisering gir nye muligh...Made In Norway? Hvordan roboter, 3D-printere og digitalisering gir nye muligh...
Made In Norway? Hvordan roboter, 3D-printere og digitalisering gir nye muligh...Teknologirådet
 

Viewers also liked (20)

Expanding The Micro Blaze System
Expanding  The Micro Blaze  SystemExpanding  The Micro Blaze  System
Expanding The Micro Blaze System
 
Internet of Things
Internet of ThingsInternet of Things
Internet of Things
 
Alpine IDA-X300 Español
Alpine IDA-X300 EspañolAlpine IDA-X300 Español
Alpine IDA-X300 Español
 
Consumer Electronics_B Plan Final
Consumer Electronics_B Plan FinalConsumer Electronics_B Plan Final
Consumer Electronics_B Plan Final
 
Quasi - Modelo de datos
Quasi - Modelo de datosQuasi - Modelo de datos
Quasi - Modelo de datos
 
APIBF DIrectory
APIBF DIrectoryAPIBF DIrectory
APIBF DIrectory
 
Virtual Trip
Virtual TripVirtual Trip
Virtual Trip
 
Cenit 3 Grupo Frial
Cenit 3 Grupo FrialCenit 3 Grupo Frial
Cenit 3 Grupo Frial
 
TechShanghai2016 - 高可靠性PCB – 从设计到制造
TechShanghai2016 - 高可靠性PCB – 从设计到制造TechShanghai2016 - 高可靠性PCB – 从设计到制造
TechShanghai2016 - 高可靠性PCB – 从设计到制造
 
About Euroconsult
About EuroconsultAbout Euroconsult
About Euroconsult
 
NBN Living Mexico
NBN Living MexicoNBN Living Mexico
NBN Living Mexico
 
France
FranceFrance
France
 
BECARIOS MAESTRÍA EN CIENCIAS DE LA EDUCACIÓN 2014
BECARIOS MAESTRÍA EN CIENCIAS DE LA EDUCACIÓN 2014BECARIOS MAESTRÍA EN CIENCIAS DE LA EDUCACIÓN 2014
BECARIOS MAESTRÍA EN CIENCIAS DE LA EDUCACIÓN 2014
 
2002 i cartografía postal [aranaz]
2002 i cartografía postal [aranaz]2002 i cartografía postal [aranaz]
2002 i cartografía postal [aranaz]
 
Estudio cuantitativo del grupo 28
 Estudio cuantitativo del grupo 28 Estudio cuantitativo del grupo 28
Estudio cuantitativo del grupo 28
 
igus® : Câbles Chainflex®
igus® : Câbles Chainflex®  igus® : Câbles Chainflex®
igus® : Câbles Chainflex®
 
particiones
particionesparticiones
particiones
 
ProNatura
ProNaturaProNatura
ProNatura
 
Made In Norway? Hvordan roboter, 3D-printere og digitalisering gir nye muligh...
Made In Norway? Hvordan roboter, 3D-printere og digitalisering gir nye muligh...Made In Norway? Hvordan roboter, 3D-printere og digitalisering gir nye muligh...
Made In Norway? Hvordan roboter, 3D-printere og digitalisering gir nye muligh...
 
Planifccion
PlanifccionPlanifccion
Planifccion
 

Similar to hadoop事例紹介

Hadoop For Enterprises
Hadoop For EnterprisesHadoop For Enterprises
Hadoop For Enterprisesnvvrajesh
 
EclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionEclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionCloudera, Inc.
 
Big Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsBig Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsRichard McDougall
 
Searching conversations with hadoop
Searching conversations with hadoopSearching conversations with hadoop
Searching conversations with hadoopDataWorks Summit
 
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)npinto
 
Hadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきましたHadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきましたmoai kids
 
Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Gavin Heavyside
 
Introduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data ProcessingIntroduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data ProcessingSam Ng
 
Hadoop Hardware @Twitter: Size does matter!
Hadoop Hardware @Twitter: Size does matter!Hadoop Hardware @Twitter: Size does matter!
Hadoop Hardware @Twitter: Size does matter!DataWorks Summit
 
Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Toshihiro Suzuki
 
Above the cloud: Big Data and BI
Above the cloud: Big Data and BIAbove the cloud: Big Data and BI
Above the cloud: Big Data and BIDenny Lee
 
Hadoop Hardware @Twitter: Size does matter.
Hadoop Hardware @Twitter: Size does matter.Hadoop Hardware @Twitter: Size does matter.
Hadoop Hardware @Twitter: Size does matter.Michael Zhang
 
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...Cloudera, Inc.
 
Architecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric BaldeschwielerArchitecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric Baldeschwielerlucenerevolution
 
Kerry osborne hadoop meets exadata
Kerry osborne hadoop meets exadataKerry osborne hadoop meets exadata
Kerry osborne hadoop meets exadataEnkitec
 
GOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x HadoopGOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x Hadoopfvanvollenhoven
 

Similar to hadoop事例紹介 (20)

Hadoop For Enterprises
Hadoop For EnterprisesHadoop For Enterprises
Hadoop For Enterprises
 
EclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionEclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An Introduction
 
Big Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsBig Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure Considerations
 
Searching conversations with hadoop
Searching conversations with hadoopSearching conversations with hadoop
Searching conversations with hadoop
 
20100128ebay
20100128ebay20100128ebay
20100128ebay
 
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
 
Hadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきましたHadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきました
 
Hadoop Family and Ecosystem
Hadoop Family and EcosystemHadoop Family and Ecosystem
Hadoop Family and Ecosystem
 
Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010
 
Introduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data ProcessingIntroduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data Processing
 
Hadoop Hardware @Twitter: Size does matter!
Hadoop Hardware @Twitter: Size does matter!Hadoop Hardware @Twitter: Size does matter!
Hadoop Hardware @Twitter: Size does matter!
 
Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤
 
Above the cloud: Big Data and BI
Above the cloud: Big Data and BIAbove the cloud: Big Data and BI
Above the cloud: Big Data and BI
 
Hadoop Hardware @Twitter: Size does matter.
Hadoop Hardware @Twitter: Size does matter.Hadoop Hardware @Twitter: Size does matter.
Hadoop Hardware @Twitter: Size does matter.
 
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DMUpgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
 
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
 
Architecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric BaldeschwielerArchitecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric Baldeschwieler
 
Kerry osborne hadoop meets exadata
Kerry osborne hadoop meets exadataKerry osborne hadoop meets exadata
Kerry osborne hadoop meets exadata
 
Cloud computing era
Cloud computing eraCloud computing era
Cloud computing era
 
GOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x HadoopGOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x Hadoop
 

More from OSSラボ株式会社

ジョブストリーム紹介資料
ジョブストリーム紹介資料ジョブストリーム紹介資料
ジョブストリーム紹介資料OSSラボ株式会社
 
Site Reliability Engineering (SRE)を可能にするOpenPIEのご紹介
Site Reliability Engineering (SRE)を可能にするOpenPIEのご紹介Site Reliability Engineering (SRE)を可能にするOpenPIEのご紹介
Site Reliability Engineering (SRE)を可能にするOpenPIEのご紹介OSSラボ株式会社
 
オープンソースNW監視ツールのご紹介
オープンソースNW監視ツールのご紹介オープンソースNW監視ツールのご紹介
オープンソースNW監視ツールのご紹介OSSラボ株式会社
 
CMDBuildを中心とした運用管理自動化基盤OpenPIEの事例紹介
CMDBuildを中心とした運用管理自動化基盤OpenPIEの事例紹介CMDBuildを中心とした運用管理自動化基盤OpenPIEの事例紹介
CMDBuildを中心とした運用管理自動化基盤OpenPIEの事例紹介OSSラボ株式会社
 
「今、ヨーロッパのオープンソースがアツい!」 クラウドの構成管理を自動化する基盤CMDBuild
「今、ヨーロッパのオープンソースがアツい!」クラウドの構成管理を自動化する基盤CMDBuild「今、ヨーロッパのオープンソースがアツい!」クラウドの構成管理を自動化する基盤CMDBuild
「今、ヨーロッパのオープンソースがアツい!」 クラウドの構成管理を自動化する基盤CMDBuildOSSラボ株式会社
 
Zabbix監視運用業務の自動化事例
Zabbix監視運用業務の自動化事例Zabbix監視運用業務の自動化事例
Zabbix監視運用業務の自動化事例OSSラボ株式会社
 

More from OSSラボ株式会社 (20)

220523JS7.pdf
220523JS7.pdf220523JS7.pdf
220523JS7.pdf
 
JS7 JobScheduler プレビュー
JS7 JobScheduler プレビューJS7 JobScheduler プレビュー
JS7 JobScheduler プレビュー
 
201023 jobscheduler os_cfall
201023 jobscheduler os_cfall201023 jobscheduler os_cfall
201023 jobscheduler os_cfall
 
ジョブストリーム紹介資料
ジョブストリーム紹介資料ジョブストリーム紹介資料
ジョブストリーム紹介資料
 
191010 opie2
191010 opie2191010 opie2
191010 opie2
 
CMDBuild V.3 update [Japanese]
CMDBuild V.3 update [Japanese]CMDBuild V.3 update [Japanese]
CMDBuild V.3 update [Japanese]
 
180729 jtf open-audit
180729 jtf open-audit180729 jtf open-audit
180729 jtf open-audit
 
170827 jtf garafana
170827 jtf garafana170827 jtf garafana
170827 jtf garafana
 
NMIS overview
NMIS overviewNMIS overview
NMIS overview
 
JobSchedulerアップデート2016
JobSchedulerアップデート2016JobSchedulerアップデート2016
JobSchedulerアップデート2016
 
Site Reliability Engineering (SRE)を可能にするOpenPIEのご紹介
Site Reliability Engineering (SRE)を可能にするOpenPIEのご紹介Site Reliability Engineering (SRE)を可能にするOpenPIEのご紹介
Site Reliability Engineering (SRE)を可能にするOpenPIEのご紹介
 
160901 osce2016sre
160901 osce2016sre160901 osce2016sre
160901 osce2016sre
 
160724 jtf2016sre
160724 jtf2016sre160724 jtf2016sre
160724 jtf2016sre
 
オープンソースNW監視ツールのご紹介
オープンソースNW監視ツールのご紹介オープンソースNW監視ツールのご紹介
オープンソースNW監視ツールのご紹介
 
Ansible2.0と実用例
Ansible2.0と実用例Ansible2.0と実用例
Ansible2.0と実用例
 
CMDBuildを中心とした運用管理自動化基盤OpenPIEの事例紹介
CMDBuildを中心とした運用管理自動化基盤OpenPIEの事例紹介CMDBuildを中心とした運用管理自動化基盤OpenPIEの事例紹介
CMDBuildを中心とした運用管理自動化基盤OpenPIEの事例紹介
 
「今、ヨーロッパのオープンソースがアツい!」 クラウドの構成管理を自動化する基盤CMDBuild
「今、ヨーロッパのオープンソースがアツい!」クラウドの構成管理を自動化する基盤CMDBuild「今、ヨーロッパのオープンソースがアツい!」クラウドの構成管理を自動化する基盤CMDBuild
「今、ヨーロッパのオープンソースがアツい!」 クラウドの構成管理を自動化する基盤CMDBuild
 
150726cmdbuild jtf2015
150726cmdbuild jtf2015150726cmdbuild jtf2015
150726cmdbuild jtf2015
 
CMDBuild Ready2Use紹介資料
CMDBuild Ready2Use紹介資料CMDBuild Ready2Use紹介資料
CMDBuild Ready2Use紹介資料
 
Zabbix監視運用業務の自動化事例
Zabbix監視運用業務の自動化事例Zabbix監視運用業務の自動化事例
Zabbix監視運用業務の自動化事例
 

Recently uploaded

Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 

Recently uploaded (20)

Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 

hadoop事例紹介

  • 1. 2010/5/20 OSS OSS Laboratories Inc.! http://www.ossl.co.jp Mail: funai@ossl.co.jp Twitter: http://twitter.com/satoruf LinkedIn: http://jp.linkedin.com/in/satorufunai/ja 1 Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved
  • 2. •  OSS •  Apache •  Google !  GFS (Google File System) : HDFS (Hadoop Distributed File System) !  Google MapReduce : Hadoop MapReduce !  Google Chubby : Hadoop Zookeeper !  DSL Google Sawzall : Hadoop Pig !  Google BigTable : Hadoop Hbase !  Google ? : Hadoop Hive •  •  •  Yahoo! Facebook Amazon China Mobile VISA JP Morgan Chase •  UFJ NTT •  ACID Atomic Consistent Isolated Durable BASE Basically Available Soft-State Eventual Consistency •  Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved 2
  • 3. Apache Hadoop ETL Tools BI Reporting RDBMS Pig (Data Flow) Hive (SQL) Sqoop Avro (Serialization) (Coordination) MapReduce (Job Scheduling/Execution System) Zookeepr HBase (key-value store) (Streaming/Pipes APIs) HDFS (Hadoop Distributed File System)
  • 4. HDFS: Hadoop Distributed File System HDFS 64MB
  • 6. Hadoop Business Intelligence Interactive Application OLAP Data Mart OLTP Data Store Engineers Hadoop: Storage and Batch Processing ETL/sqoop
  • 7. Hadoop !  : !  2x Quad Core Nehalems !  24GB !  12 * 1TB SATA (JBOD , RAID ) !  1 Gigabit Ethernet !  : !  HDFS : !  ! reserved for temp shuffle space, which leaves 9TB/node !  3 way replication leads to 3TB effective HDFS space/node !  But assuming 7x compression that becomes ~ 20TB/node TB :2 5 /TB
  • 8. Yahoo! •  Hadoop •  25,000 82PB Hadoop •  4,000 64TB 16PB 32,000 •  500 •  SearchAssistTM 26 20 •  1,500 1TB 62 •  3,700 1PB 16 Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved 8
  • 9. Facebook •  200 Hive DWH Hadoop •  +12TB/ •  135TB/ •  1,050 32TB 12.5PB 4,800 •  .*/"& !"#$ %)&*#"$ ($ %"&'"&($ +*,-*"&$ 5*'"$ %)&*#"657,118$ &"8/*)731 =,A1)$5*'"657,118$ 9/2(:"&$ 4$ 9/2(:"&$ Node = 0&1,2)314$5*'"657,118$ Disks Disks Disks Disks Disks Disks DataNode ;&7)/"$ .","&7:",$ Node Node Node Node Node Node + <=9$ 9/2(:"&$ Map-Reduce +>%?@$ 1 Gigabit 4 Gigabit Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved 9
  • 10. VISA •  2 Hadoop 340TB •  Hadoop #1 ~40Tb / 42 node •  Hadoop #1 ~300Tb / 28 node •  Hadoop ( ) •  ( ) Hadoop IP •  2 7 3000 36TB •  1 Hadoop 13 Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved 10
  • 11. •  5 •  CMCC CDR 1 5TB~9TB 2,000 1 300GB •  BC-PDM(Big Cloud based Parallel Data Mining) •  Hadoop HDFS Hyper-DFS Hadoop •  16 •  ETL 12 16 •  10 50 •  3 7 •  Hadoop 256 Hadoop Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved 11
  • 12. JP •  Hadoop •  PC (RDBMS) •  RDBMS SAN/NAS Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved 12
  • 13. •  Hadoop •  4,000 2,000 GB •  GB x •  •  •  •  150% •  Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved 15
  • 14. •  COOKPAD: 3.9 816 64 30 4 1 •  •  Amazon EC2 50 Hadoop •  •  http://business.nikkeibp.co.jp/article/tech/20100416/214016/ Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved 16
  • 15. http://www.cloudera.com/ •  Hadoop •  Cloudera Mike Olson Oracle SleepycatSoftware CEO) Christophe Bisciglia Google Dr.Amr Awadallah Yahoo! VivaSmart Jeff Hammerbacher Facebook •  Cloudera Diane Greene VMware CEO Mike Abbott Palm CaterinaFake Flickr Dr. Qi Lu Microsoft Yahoo! MartenMickos MySQL CEO Jeff Weiner LinkedIn Yahoo! Gideon Yu Facebook CFO YouTube CFO •  Yahoo! Facebook OpenPDC Codeplex Hadoop Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved 17
  • 16. Pentaho + Hadoop •  2010/7 •  Hadoop BI Hive Hadoop DFS Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved 18
  • 17. IBM InfoSphere BigInsights •  Apache Hadoop BigInsights Core Web BigSheets 2 •  BigSheets BigSheets BigInsights Core •  BigSheet Copyright 2010(C) OSS Laboratories Inc. All Rights Reserved 19