Hadoop: Past, Present and Future

7/12/14
Prepared for: Orange County Java Users Group
Presented by: "Big Data Joe" Rossi (@bigdatajoerossi)
  
Roadmap (~1 hour)
1 - What Makes Up Hadoop 1.x?
2 - What's New In Hadoop 2.x?
3 - The Future Of Hadoop …
  
What Makes Up Hadoop 1.x?
  
Hadoop 1.0: HDFS + MapReduce
[Diagram: a Client, a NameNode, a JobTracker, and four DataNode / TaskTracker nodes, with blocks labeled 1-1, 1-2 and 1-3.]
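
As a rough illustration of the storage side of this diagram, the Python sketch below models a NameNode-style block map: each block is copied to several DataNodes, and the map records where the copies live. Everything named here is an assumption made for illustration (the function names, the node names dn1 to dn4, the round-robin placement). The replication factor of 3 matches the HDFS default, but real HDFS placement is rack-aware and is not modeled.

```python
def place_replicas(num_blocks, datanodes, replication=3):
    """Assign each block's replicas to distinct DataNodes, round-robin.

    Toy stand-in for the NameNode's block map (hypothetical helper).
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    block_map = {}
    for block in range(1, num_blocks + 1):
        start = block - 1
        # Take `replication` consecutive nodes, wrapping around the list.
        block_map[block] = [
            datanodes[(start + i) % len(datanodes)] for i in range(replication)
        ]
    return block_map


def readable_after_failure(block_map, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in block_map.values()
    )


nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
block_map = place_replicas(num_blocks=4, datanodes=nodes)
for block, replicas in block_map.items():
    print(block, replicas)
print(readable_after_failure(block_map, "dn2"))
```

Because every block has replicas on three different nodes, losing any single node still leaves each block readable, which is the redundancy property the diagram hints at.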
  
Hadoop 1.0: HDFS + MapReduce
[Diagram: a Client, a NameNode, a JobTracker, and four DataNode / TaskTracker nodes, with blocks labeled 1-1 through 4-3 spread across the nodes and Map and Reduce tasks running on them.]
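
The Map and Reduce steps in this diagram can be sketched in a few lines of plain Python. This is a minimal in-memory word count, assuming nothing about the Hadoop API: the function names are illustrative, and the whole thing runs in one process rather than on TaskTrackers.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["hadoop stores data", "hadoop processes data"])
print(counts)
```

In a real cluster the map calls run next to the blocks they read and the shuffle moves data between nodes; the sketch only shows the shape of the computation.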
  
MapReduce v1 Limitations
- Scalability: Maximum cluster size is 4,000 nodes and maximum concurrent tasks is 40,000.
- Availability: JobTracker failure kills all queued and running jobs.
- Resources Partitioned into Map and Reduce: Hard partitioning of Map and Reduce slots led to low resource utilization.
- No Support for Alternate Paradigms / Services: Only MapReduce batch jobs, nothing else.
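
To see why hard partitioning of slots can waste capacity, here is a small back-of-the-envelope model in Python. It is a sketch under assumptions: the function names and the slot and task counts are made up for illustration, and the "pooled" case stands in loosely for a scheduler that can hand any free capacity to any task.

```python
def fixed_slot_utilization(map_slots, reduce_slots, map_tasks, reduce_tasks):
    """Fraction of slots busy when map and reduce slots are separate pools."""
    busy = min(map_slots, map_tasks) + min(reduce_slots, reduce_tasks)
    return busy / (map_slots + reduce_slots)


def pooled_utilization(total_slots, map_tasks, reduce_tasks):
    """Fraction of slots busy when any free slot can run any task."""
    busy = min(total_slots, map_tasks + reduce_tasks)
    return busy / total_slots


# Hypothetical node: 6 map slots, 4 reduce slots, a job still in its map phase.
fixed = fixed_slot_utilization(map_slots=6, reduce_slots=4, map_tasks=10, reduce_tasks=0)
pooled = pooled_utilization(total_slots=10, map_tasks=10, reduce_tasks=0)
print(fixed, pooled)
```

With ten pending map tasks and no reduce work, the four reduce slots sit idle in the fixed layout, while the pooled layout keeps every slot busy.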
  
Apache Hadoop 1.0: Single Use System
HADOOP 1.0: Single Use System, Batch Apps
- HDFS (redundant, reliable storage)
- MapReduce (cluster resource management and data processing)
- Pig, Hive
  
What's New In Hadoop 2.x?
  
YARN Replaces MapReduce
YARN: Yet Another Resource Negotiator
YARN will be the de facto distributed operating system for Big Data
  
YARN: Taking Hadoop Beyond Batch
Store DATA in one place. Interact with that data in MULTIPLE WAYS, with Predictable Performance and Quality of Service.
Applications Run Natively IN Hadoop:
- BATCH (MapReduce)
- INTERACTIVE (Tez)
- ONLINE (HBase)
- STREAMING (DataTorrent)
- GRAPH (Giraph)
YARN (cluster resource management)
HDFS2 (redundant, reliable storage)
  
YARN: Applications
Running all on the same Hadoop cluster to give applications access to all the same source data!
- MapReduce v2
- Stream Processing
- Master-Worker
- Online
- In-Memory
- Apache Storm
  
YARN: Moving Quickly
[Timeline: 2010, 2011, 2012, 2013, 2014, Today]
- Conceived at Yahoo!
- Alpha Releases – 2.0
- Beta Releases – 2.1
- GA Released – 2.2
- 100,000+ nodes, 400,000+ jobs daily
- 10 million+ hours of compute daily
- Version 2.3
- Version 2.4

YARN: Dr. Evil Approved
  
YARN: How It Works
[Diagram: a Client, a ResourceManager with its Scheduler, and four NodeManagers hosting an ApplicationMaster and three Containers.]
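
The roles in this diagram can be approximated with a toy allocator in Python. This is only a sketch under assumptions: the class and method names mirror the diagram labels but are not the YARN API, memory is the only resource tracked, and first-fit placement is a stand-in for the real schedulers.

```python
class NodeManager:
    """Tracks free memory on one worker node (toy model)."""

    def __init__(self, name, memory_mb):
        self.name = name
        self.free_mb = memory_mb


class ResourceManager:
    """Hands out containers on the first node with enough free memory."""

    def __init__(self, node_managers):
        self.node_managers = node_managers

    def allocate(self, app_id, memory_mb):
        """Return (node_name, app_id, memory_mb), or None if nothing fits."""
        for nm in self.node_managers:
            if nm.free_mb >= memory_mb:
                nm.free_mb -= memory_mb
                return (nm.name, app_id, memory_mb)
        return None


# Hypothetical two-node cluster with 4 GB per node.
rm = ResourceManager([NodeManager("nm1", 4096), NodeManager("nm2", 4096)])

# The client's application first gets a container for its ApplicationMaster,
# which then requests containers for the actual work.
am_container = rm.allocate("app-1", 1024)
task_containers = [rm.allocate("app-1", 2048) for _ in range(4)]
print(am_container)
print(task_containers)
```

The fourth task request returns None because no node has 2048 MB left, which is the point where a real application would wait for capacity to free up.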
  
YARN: What Has Changed?

YARN                                            MRv1
RM (ResourceManager), AM (ApplicationMaster)    JT (JobTracker)
Scheduler                                       Scheduler
NM (NodeManager)                                TT (TaskTracker)
Container                                       Map, Reduce

[Diagram: on the YARN side, a ResourceManager with its Scheduler and NodeManagers hosting an ApplicationMaster and Containers; on the MRv1 side, a JobTracker with its Scheduler and TaskTrackers running Map and Reduce.]
  
6 Benefits of YARN
- Scale
- New programming models and services
- Improved cluster utilization
- Agility
- Backwards compatible with MapReduce v1
- Mixed workloads on the same source of data
  
The Future of Hadoop
Projects and Roadmap
  
SQL on Hadoop
- Speed: Deliver interactive query performance.
- SQL: Support array of SQL semantics for analytic applications running against Hadoop.
- Scale: SQL interface to Hadoop designed for queries that scale from Terabytes to Petabytes.
  
	
  
Next Gen SQL on Hadoop
- Hive on Apache Tez (Hortonworks)
- Hive on Apache Spark (Cloudera)
- Cloudera Impala (Cloudera)
- Apache Drill (MapR)
  
HOYA: HBase (NoSQL) on YARN
- Dynamic Scaling: On-demand cluster size. Increase and decrease the size with load.
- Easier Deployment: APIs to create, start, stop and delete HBase clusters.
- Availability: Recover from Region Server loss with a new container.
  
Microsoft REEF (Retainable Evaluator Execution Framework)
- Machine Learning: Framework well suited for building machine learning jobs.
- Scalable / Fault Tolerant: Makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models.
- Maintain State: Users can build jobs that utilize data from where it's needed and also maintain state after jobs are done.
  
Heterogeneous Storages in HDFS
[Diagram labels: NameNode, Storage, NameNode, SATA, SSD, Fusion IO]
  
 
	
  
Hadoop Roadmap
- Apache Hadoop 2.5 (Q3 2014)
  - NodeManager Restart w/o disruption
  - Dynamic Resource Configuration
- Apache Hadoop 2.6 (Q4 2014)
  - Memory As Storage Tier
  - Support For Docker Containers
  
I Know You Have Questions …
No such thing as a stupid question.
  
One Last Thing …
OC Big Data Meetup
meetup.com/ocbigdata
3rd Wednesday Of The Month
Next: July 16th @ 5:45P
  
Thank You!
Big Data Joe Rossi
http://bigdatajoe.io/
@bigdatajoerossi
  

Hadoop - Past, Present and Future - v1.2

  • 1. 7/12/14. Prepared for: Orange County Java Users Group. Presented by: “Big Data Joe” Rossi, @bigdatajoerossi. Hadoop: Past, Present and Future
  • 2. Roadmap (~1 hour): 1 - What Makes Up Hadoop 1.x? 2 - What’s New In Hadoop 2.x? 3 - The Future Of Hadoop …
  • 3. What Makes Up Hadoop 1.x?
  • 4. Hadoop 1.0: HDFS + MapReduce. [Diagram labels: Client; NameNode; JobTracker; four DataNode / TaskTracker nodes; blocks 1-1, 1-2, 1-3]
  • 5. Hadoop 1.0: HDFS + MapReduce. [Diagram labels: Client; NameNode; JobTracker; four DataNode / TaskTracker nodes; blocks 1-1, 1-2, 1-3, 2-1, 2-2, 2-3, 3-1, 3-2, 3-3, 4-1, 4-2, 4-3; Map and Reduce tasks]
  • 6. MapReduce v1 Limitations. Scalability: Maximum cluster size is 4,000 nodes and maximum concurrent tasks is 40,000. Availability: JobTracker failure kills all queued and running jobs. Resources Partitioned into Map and Reduce: Hard partitioning of Map and Reduce slots led to low resource utilization. No Support for Alternate Paradigms / Services: Only MapReduce batch jobs, nothing else.
  • 7. HADOOP 1.0: Single Use System (Batch Apps). Apache Hadoop 1.0: Single Use System. HDFS (redundant, reliable storage); MapReduce (cluster resource management and data processing); Pig; Hive
  • 8. What’s New In Hadoop 2.x?
  • 9. YARN Replaces MapReduce. YARN: Yet Another Resource Negotiator. YARN will be the de facto distributed operating system for Big Data
  • 10. YARN: Taking Hadoop Beyond Batch. Store DATA in one place. Interact with that data in MULTIPLE WAYS with Predictable Performance and Quality of Service. Applications Run Natively IN Hadoop: BATCH (MapReduce), INTERACTIVE (Tez), ONLINE (HBase), STREAMING (DataTorrent), GRAPH (Giraph); all on YARN (cluster resource management) over HDFS2 (redundant, reliable storage)
  • 11. YARN: Applications. MapReduce v2; Stream Processing; Master-Worker Online; In-Memory; Apache Storm. Running all on the same Hadoop cluster to give applications access to all the same source data!
  • 12. YARN: Moving Quickly. Timeline (2010, 2011, 2012, 2013, 2014, Today): Conceived at Yahoo!; Alpha Releases (2.0); Beta Releases (2.1); GA Released (2.2); Version 2.3; Version 2.4. 100,000+ nodes, 400,000+ jobs daily; 10 million+ hours of compute daily
  • 13. YARN: Dr. Evil Approved
  • 14. YARN: How It Works. [Diagram labels: Client; ResourceManager; Scheduler; four NodeManagers; ApplicationMaster; three Containers]
  • 15. YARN: What Has Changed? YARN vs. MRv1: RM (ResourceManager) and AM (ApplicationMaster) vs. JT (JobTracker); Scheduler vs. Scheduler; NM (NodeManager) vs. TT (TaskTracker); Container vs. Map / Reduce. [Diagram labels: ResourceManager with Scheduler; NodeManagers hosting an ApplicationMaster and Containers; JobTracker with Scheduler; TaskTrackers hosting Map and Reduce]
  • 16. 6 Benefits of YARN: Scale; New programming models and services; Improved cluster utilization; Agility; Backwards compatible with MapReduce v1; Mixed workloads on the same source of data
  • 17. The Future of Hadoop: Projects and Roadmap
  • 18. SQL on Hadoop. Speed: Deliver interactive query performance. SQL: Support array of SQL semantics for analytic applications running against Hadoop. Scale: SQL interface to Hadoop designed for queries that scale from Terabytes to Petabytes
  • 19. Next Gen SQL on Hadoop. Hive on Apache Tez (Hortonworks); Hive on Apache Spark (Cloudera); Cloudera Impala (Cloudera); Apache Drill (MapR)
  • 20. HOYA: HBase (NoSQL) on YARN. Dynamic Scaling: On-demand cluster size. Increase and decrease the size with load. Easier Deployment: APIs to create, start, stop and delete HBase clusters. Availability: Recover from Region Server loss with a new container.
  • 21. Microsoft REEF (Retainable Evaluator Execution Framework). Machine Learning: Framework well suited for building machine learning jobs. Scalable / Fault Tolerant: Makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models. Maintain State: Users can build jobs that utilize data from where it’s needed and also maintain state after jobs are done.
  • 22. Heterogeneous Storages in HDFS. [Diagram labels: NameNode; Storage; NameNode; SATA; SSD; Fusion IO]
  • 23. Hadoop Roadmap. Q3 2014, Apache Hadoop 2.5: NodeManager Restart w/o disruption; Dynamic Resource Configuration. Q4 2014, Apache Hadoop 2.6: Memory As Storage Tier; Support For Docker Containers
  • 24. I Know You Have Questions … No such thing as a stupid question. Hadoop: Past, Present and Future
  • 25. One Last Thing … OC Big Data Meetup: meetup.com/ocbigdata. 3rd Wednesday Of The Month. Next: July 16th @ 5:45P
  • 26. Thank You! Hadoop: Past, Present and Future. Big Data Joe Rossi, http://bigdatajoe.io/, @bigdatajoerossi
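
Slide 6 attributes low utilization in MapReduce v1 to the hard partitioning of Map and Reduce slots, and slide 16 lists improved cluster utilization as a YARN benefit. The sketch below is a toy illustration of that point, not Hadoop code: the function names and the node sizes are hypothetical, and it assumes every generic container is the same size as one MRv1 slot.

```python
def mrv1_running(map_slots, reduce_slots, map_tasks, reduce_tasks):
    """Tasks that can run at once when slots are typed (MRv1-style).

    A map task may only use a map slot and a reduce task only a reduce slot,
    so idle slots of one type cannot help a backlog of the other type.
    """
    return min(map_slots, map_tasks) + min(reduce_slots, reduce_tasks)


def yarn_running(containers, map_tasks, reduce_tasks):
    """Tasks that can run at once when containers are generic (YARN-style).

    Simplifying assumption: any container can host any task.
    """
    return min(containers, map_tasks + reduce_tasks)


# Hypothetical node: 6 map slots and 4 reduce slots, i.e. 10 units in total.
map_slots, reduce_slots = 6, 4
total = map_slots + reduce_slots

# Map-heavy phase of a job: 10 map tasks waiting, no reduce tasks yet.
pending_maps, pending_reduces = 10, 0

mrv1 = mrv1_running(map_slots, reduce_slots, pending_maps, pending_reduces)
yarn = yarn_running(total, pending_maps, pending_reduces)

print(f"MRv1-style typed slots:    {mrv1}/{total} busy")
print(f"YARN-style generic units:  {yarn}/{total} busy")
```

With these made-up numbers the four reduce slots sit idle during the map phase under typed slots, while the generic pool can put all ten units to work.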