SlideShare a Scribd company logo
1 of 58
Download to read offline
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
{

    "_id" : ObjectId("4dcd3ebc9278000000005158"),

    "timestamp" : ISODate("2011-05-13T14:22:46.777Z"),

    "binary" : BinData(0,""),

    "string" : "abc",

    "number" : 3,

    "subobj" : {"subA": 1, "subB": 2 },

    "array" : [1, 2, 3],

    "dbref" : [_id1, _id2, _id3]

                        padding

}
{   db.coll.find({"string": "abc"});
db.coll.find({ "string" : /^a.*$/i });
    "_id" : ObjectId("4dcd3ebc9278000000005158"),

    "timestamp" : ISODate("2011-05-13T14:22:46.777Z"),
                   db.coll.find({"subobj.subA": 1});
                db.coll.find({"subobj.subB": {$exists: true} });
    "binary" : BinData(0,""),

    "string" : "abc",              db.coll.find({"number": 3});
                               db.coll.find({"number": {$gt: 1}});
    "number" : 3,

    "subobj" : {"subA": 1, "subB": 2 },

    "array" : [1, 2, 3],
                         db.coll.find({"array": {$all:[1, 2]} });
    "dbref" : [_id1, _id2, _id3]
                        db.coll.find({"array": {$in:[2, 4, 6]} });
                             padding

}
{

    "_id" : ObjectId("4dcd3ebc9278000000005158"),

    "timestamp" : ISODate("2011-05-13T14:22:46.777Z"),
          { $set : {"string": "def"} }

    "binary" : BinData(0,""), { $inc : {"number": 1} }

    "string" : "def",
                          { $pull : {"subobj": {"subB": 2 } } }
    "number" : 4,

    "subobj" : {"subA": 1, "subB": 2 },

    "array" : [1, 2, 3, 4, 5, 6],

    "dbref"$addToSet : { "array" : { $each : [ 4 , 5 , 6 ] } } }
         { : [_id1, _id2, _id3]


    "newkey" : "In-place"

}                              { $set : {"newkey": "In-place"} }
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
ScientificPython
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
def mapper(key, value):
   for word in value.split(): yield word,1
def reducer(key, values):
   yield key,sum(values)
if __name__ == "__main__":
   import dumbo
   dumbo.run(mapper, reducer)



dumbo start wordcount.py 
       -hadoop /path/to/hadoop 
       -input wc_input.txt 
       -output wc_output
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
[2011-07-01 12:01:48,447]
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
db.collection.insert(
 {hour:0,
   userId:”1234”,
   actionType:”login”,}
);
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
m = function(){
     this.tags.forEach{
          function(z) {
              emit(z, {count: 1});
          }
     };
};
r = function(key, values) {
     var total=0;
     for (i=0, i<values.length, i++)
          total += values[i].count;
     return { count : total };
}
res=db.things.mapReduce(m,!r);
#                              finalize
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
Examples
                Conclusions and Future Work


 Party Solutions




                                    Motivation
                                  Architecture
                                    Examples
                  Conclusions and Future Work


ummary of Features


 Hadoop-based: same limitations as Streaming (Dumbo) and
                       Streaming Jython Pydoop
 Jython (Happy), except for ease of use
           C/C++ Ext       Yes       No     Yes
 Other implementations: good if you have your own cluster
         Standard Lib      Full     Partial   Full
     Hadoop is the most widespread implementation
            MR API         No*       Full   Partial
           Java-like FW                   No           Yes            Yes
               HDFS                       No
                                  Leo, Zanetti
                                                       Yes            Yes
                                                 Pydoop: a Python MapReduce and HDFS API for Hadoop



     (*) you can only write the map and reduce parts as executable scripts.
Motivation
                               Architecture
                                 Examples
               Conclusions and Future Work


Hadoop Pipes



                                                      Communication with Java
                                                      framework via persistent
                                                      sockets
                                                      The C++ app provides a
                                                      factory used by the framework
                                                      to create MR components
                                                      Providing Mapper and
                                                      Reducer is mandatory




                               Leo, Zanetti   Pydoop: a Python MapReduce and HDFS API for Hadoop
Motivation
                              Architecture
                                Examples
              Conclusions and Future Work


Integration of Pydoop with C++


                                             Integration with Pipes:
                                                 Method calls flow from the
                                                 framework through the C++ and the
                                                 Pydoop API, ultimately reaching
                                                 user-defined methods
                                                 Results are wrapped by Boost and
                                                 returned to the framework
                                             Integration with HDFS:
                                                 Function calls initiated by Pydoop
                                                 Results wrapped and returned as
                                                 Python objects to the app
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
gawk '
 BEGIN{ reducenum='$REDUCE_NUM'; }
  { userid=$7; key=$8; }
  key ~ /a{GetLoginBonus}/ { incrby(userid,key,$9,a); next;}
  key ~ /a{SideJob}/          { incrby(userid,key,$11,a); next;}
  key ~ /a{CleanMyShop}/      { hincr(userid,key,$9,a); next; }
  key ~ /(GetAvatarPart|ChangeP|ChangeWakuwakuP|ChangeKonergy)/
                                { incrbydiff(userid,key,$9,a); next; }
 ...‘ $IN

# for reducer1 (such as “userid % reducenum == 0”)
# command userid key value
MULTI
HINCRBY 1111 a{ChangeGreed} 3
HINCRBY 1111 a{GianEvent} 7
HINCRBY 1111 a{TeamChallenge} 5
HINCRBY 2222 a{Battle} 3
HINCRBY 2222 a{ChangeMoney} 3
...
EXEC
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model

More Related Content

What's hot

Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Nathan Bijnens
 
Hadoop operations basic
Hadoop operations basicHadoop operations basic
Hadoop operations basicHafizur Rahman
 
Hadoop Interview Question and Answers
Hadoop  Interview Question and AnswersHadoop  Interview Question and Answers
Hadoop Interview Question and Answerstechieguy85
 
Debugging & Tuning in Spark
Debugging & Tuning in SparkDebugging & Tuning in Spark
Debugging & Tuning in SparkShiao-An Yuan
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersRahul Jain
 
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxData
 
Implementation of k means algorithm on Hadoop
Implementation of k means algorithm on HadoopImplementation of k means algorithm on Hadoop
Implementation of k means algorithm on HadoopLamprini Koutsokera
 
Hadoop interview questions
Hadoop interview questionsHadoop interview questions
Hadoop interview questionsKalyan Hadoop
 
Big Data Analysis With RHadoop
Big Data Analysis With RHadoopBig Data Analysis With RHadoop
Big Data Analysis With RHadoopDavid Chiu
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answersKalyan Hadoop
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduceUday Vakalapudi
 
Easy deployment & management of cloud apps
Easy deployment & management of cloud appsEasy deployment & management of cloud apps
Easy deployment & management of cloud appsDavid Cunningham
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWJonathan Katz
 
Hadoop 20111117
Hadoop 20111117Hadoop 20111117
Hadoop 20111117exsuns
 
Apache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialApache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialFarzad Nozarian
 

What's hot (20)

Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
 
Hadoop operations basic
Hadoop operations basicHadoop operations basic
Hadoop operations basic
 
mesos-devoxx14
mesos-devoxx14mesos-devoxx14
mesos-devoxx14
 
Hadoop Interview Question and Answers
Hadoop  Interview Question and AnswersHadoop  Interview Question and Answers
Hadoop Interview Question and Answers
 
Debugging & Tuning in Spark
Debugging & Tuning in SparkDebugging & Tuning in Spark
Debugging & Tuning in Spark
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
 
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
 
Implementation of k means algorithm on Hadoop
Implementation of k means algorithm on HadoopImplementation of k means algorithm on Hadoop
Implementation of k means algorithm on Hadoop
 
Hadoop interview questions
Hadoop interview questionsHadoop interview questions
Hadoop interview questions
 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
 
Big Data Analysis With RHadoop
Big Data Analysis With RHadoopBig Data Analysis With RHadoop
Big Data Analysis With RHadoop
 
Scala+data
Scala+dataScala+data
Scala+data
 
Hadoop - Introduction to mapreduce
Hadoop -  Introduction to mapreduceHadoop -  Introduction to mapreduce
Hadoop - Introduction to mapreduce
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Easy deployment & management of cloud apps
Easy deployment & management of cloud appsEasy deployment & management of cloud apps
Easy deployment & management of cloud apps
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
 
Hadoop 20111117
Hadoop 20111117Hadoop 20111117
Hadoop 20111117
 
Apache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialApache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce Tutorial
 

Viewers also liked

Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduceRyan Tabora
 
Compose Async with RxJS
Compose Async with RxJSCompose Async with RxJS
Compose Async with RxJSKyung Yeol Kim
 
[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기
[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기
[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기KTH, 케이티하이텔
 
Meteor 0.3.6 Preview
Meteor 0.3.6 PreviewMeteor 0.3.6 Preview
Meteor 0.3.6 PreviewJuntai Park
 
Mongo Web Apps: OSCON 2011
Mongo Web Apps: OSCON 2011Mongo Web Apps: OSCON 2011
Mongo Web Apps: OSCON 2011rogerbodamer
 
React in Native Apps - Meetup React - 20150409
React in Native Apps - Meetup React - 20150409React in Native Apps - Meetup React - 20150409
React in Native Apps - Meetup React - 20150409Minko3D
 
Ionic adventures - Hybrid Mobile App Development rocks
Ionic adventures - Hybrid Mobile App Development rocksIonic adventures - Hybrid Mobile App Development rocks
Ionic adventures - Hybrid Mobile App Development rocksJuarez Filho
 
React JS and why it's awesome
React JS and why it's awesomeReact JS and why it's awesome
React JS and why it's awesomeAndrew Hull
 
System webpack-jspm
System webpack-jspmSystem webpack-jspm
System webpack-jspmJesse Warden
 
MongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB
 
MongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
MongoDB Days Silicon Valley: MongoDB and the Hadoop ConnectorMongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
MongoDB Days Silicon Valley: MongoDB and the Hadoop ConnectorMongoDB
 
Mongo db and hadoop driving business insights - final
Mongo db and hadoop   driving business insights - finalMongo db and hadoop   driving business insights - final
Mongo db and hadoop driving business insights - finalMongoDB
 
Groovy overview, DSLs and ecosystem - Mars JUG - 2010
Groovy overview, DSLs and ecosystem - Mars JUG - 2010Groovy overview, DSLs and ecosystem - Mars JUG - 2010
Groovy overview, DSLs and ecosystem - Mars JUG - 2010Guillaume Laforge
 
Groovy Ecosystem - JFokus 2011 - Guillaume Laforge
Groovy Ecosystem - JFokus 2011 - Guillaume LaforgeGroovy Ecosystem - JFokus 2011 - Guillaume Laforge
Groovy Ecosystem - JFokus 2011 - Guillaume LaforgeGuillaume Laforge
 
groovy DSLs from beginner to expert
groovy DSLs from beginner to expertgroovy DSLs from beginner to expert
groovy DSLs from beginner to expertPaul King
 
Sharding
ShardingSharding
ShardingMongoDB
 
Mongo DB 완벽가이드 - 4장 쿼리하기
Mongo DB 완벽가이드 - 4장 쿼리하기Mongo DB 완벽가이드 - 4장 쿼리하기
Mongo DB 완벽가이드 - 4장 쿼리하기JangHyuk You
 

Viewers also liked (20)

Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduce
 
Compose Async with RxJS
Compose Async with RxJSCompose Async with RxJS
Compose Async with RxJS
 
[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기
[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기
[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기
 
Meteor 0.3.6 Preview
Meteor 0.3.6 PreviewMeteor 0.3.6 Preview
Meteor 0.3.6 Preview
 
Mongo Web Apps: OSCON 2011
Mongo Web Apps: OSCON 2011Mongo Web Apps: OSCON 2011
Mongo Web Apps: OSCON 2011
 
React in Native Apps - Meetup React - 20150409
React in Native Apps - Meetup React - 20150409React in Native Apps - Meetup React - 20150409
React in Native Apps - Meetup React - 20150409
 
Ionic adventures - Hybrid Mobile App Development rocks
Ionic adventures - Hybrid Mobile App Development rocksIonic adventures - Hybrid Mobile App Development rocks
Ionic adventures - Hybrid Mobile App Development rocks
 
React JS and why it's awesome
React JS and why it's awesomeReact JS and why it's awesome
React JS and why it's awesome
 
Grid FS
Grid FSGrid FS
Grid FS
 
Angular2 ecosystem
Angular2 ecosystemAngular2 ecosystem
Angular2 ecosystem
 
System webpack-jspm
System webpack-jspmSystem webpack-jspm
System webpack-jspm
 
MongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business Insights
 
MongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
MongoDB Days Silicon Valley: MongoDB and the Hadoop ConnectorMongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
MongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
 
Mongo db and hadoop driving business insights - final
Mongo db and hadoop   driving business insights - finalMongo db and hadoop   driving business insights - final
Mongo db and hadoop driving business insights - final
 
Groovy overview, DSLs and ecosystem - Mars JUG - 2010
Groovy overview, DSLs and ecosystem - Mars JUG - 2010Groovy overview, DSLs and ecosystem - Mars JUG - 2010
Groovy overview, DSLs and ecosystem - Mars JUG - 2010
 
Hadoop to spark-v2
Hadoop to spark-v2Hadoop to spark-v2
Hadoop to spark-v2
 
Groovy Ecosystem - JFokus 2011 - Guillaume Laforge
Groovy Ecosystem - JFokus 2011 - Guillaume LaforgeGroovy Ecosystem - JFokus 2011 - Guillaume Laforge
Groovy Ecosystem - JFokus 2011 - Guillaume Laforge
 
groovy DSLs from beginner to expert
groovy DSLs from beginner to expertgroovy DSLs from beginner to expert
groovy DSLs from beginner to expert
 
Sharding
ShardingSharding
Sharding
 
Mongo DB 완벽가이드 - 4장 쿼리하기
Mongo DB 완벽가이드 - 4장 쿼리하기Mongo DB 완벽가이드 - 4장 쿼리하기
Mongo DB 완벽가이드 - 4장 쿼리하기
 

Similar to MongoDB & Hadoop: Flexible Hourly Batch Processing Model

Introduction to Hadoop Developer Training Webinar
Introduction to Hadoop Developer Training WebinarIntroduction to Hadoop Developer Training Webinar
Introduction to Hadoop Developer Training WebinarCloudera, Inc.
 
Hadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansHadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansattilacsordas
 
Hadoop online training course
Hadoop online  training courseHadoop online  training course
Hadoop online training courseKamal A
 
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKIntroduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKSkills Matter
 
Using GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaUsing GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaTim Ellison
 
EuroPython 2015 - Big Data with Python and Hadoop
EuroPython 2015 - Big Data with Python and HadoopEuroPython 2015 - Big Data with Python and Hadoop
EuroPython 2015 - Big Data with Python and HadoopMax Tepkeev
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Søren Lund
 
Cascading on starfish
Cascading on starfishCascading on starfish
Cascading on starfishFei Dong
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...IndicThreads
 
Big Data Europe: Simplifying Development and Deployment of Big Data Applications
Big Data Europe: Simplifying Development and Deployment of Big Data ApplicationsBig Data Europe: Simplifying Development and Deployment of Big Data Applications
Big Data Europe: Simplifying Development and Deployment of Big Data ApplicationsBigData_Europe
 
Serverless GraphQL for Product Developers
Serverless GraphQL for Product DevelopersServerless GraphQL for Product Developers
Serverless GraphQL for Product DevelopersSashko Stubailo
 
Introduction to Cloud Computing with Google Cloud
Introduction to Cloud Computing with Google CloudIntroduction to Cloud Computing with Google Cloud
Introduction to Cloud Computing with Google Cloudwesley chun
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 
Datascience Training with Hadoop, Python Machine Learning & Scala, Spark
Datascience Training with Hadoop, Python Machine Learning & Scala, SparkDatascience Training with Hadoop, Python Machine Learning & Scala, Spark
Datascience Training with Hadoop, Python Machine Learning & Scala, SparkSequelGate
 

Similar to MongoDB & Hadoop: Flexible Hourly Batch Processing Model (20)

January 2011 HUG: Pig Presentation
January 2011 HUG: Pig PresentationJanuary 2011 HUG: Pig Presentation
January 2011 HUG: Pig Presentation
 
Introduction to Hadoop Developer Training Webinar
Introduction to Hadoop Developer Training WebinarIntroduction to Hadoop Developer Training Webinar
Introduction to Hadoop Developer Training Webinar
 
mapReduce.pptx
mapReduce.pptxmapReduce.pptx
mapReduce.pptx
 
Hadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansHadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticians
 
Hadoop online training course
Hadoop online  training courseHadoop online  training course
Hadoop online training course
 
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKIntroduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
 
Using GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaUsing GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with Java
 
EuroPython 2015 - Big Data with Python and Hadoop
EuroPython 2015 - Big Data with Python and HadoopEuroPython 2015 - Big Data with Python and Hadoop
EuroPython 2015 - Big Data with Python and Hadoop
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)
 
Cascading on starfish
Cascading on starfishCascading on starfish
Cascading on starfish
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Big Data Europe: Simplifying Development and Deployment of Big Data Applications
Big Data Europe: Simplifying Development and Deployment of Big Data ApplicationsBig Data Europe: Simplifying Development and Deployment of Big Data Applications
Big Data Europe: Simplifying Development and Deployment of Big Data Applications
 
Serverless GraphQL for Product Developers
Serverless GraphQL for Product DevelopersServerless GraphQL for Product Developers
Serverless GraphQL for Product Developers
 
Introduction to Cloud Computing with Google Cloud
Introduction to Cloud Computing with Google CloudIntroduction to Cloud Computing with Google Cloud
Introduction to Cloud Computing with Google Cloud
 
Data Science
Data ScienceData Science
Data Science
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Datascience Training with Hadoop, Python Machine Learning & Scala, Spark
Datascience Training with Hadoop, Python Machine Learning & Scala, SparkDatascience Training with Hadoop, Python Machine Learning & Scala, Spark
Datascience Training with Hadoop, Python Machine Learning & Scala, Spark
 

More from Takahiro Inoue

Treasure Data × Wave Analytics EC Demo
Treasure Data × Wave Analytics EC DemoTreasure Data × Wave Analytics EC Demo
Treasure Data × Wave Analytics EC DemoTakahiro Inoue
 
トレジャーデータとtableau実現する自動レポーティング
トレジャーデータとtableau実現する自動レポーティングトレジャーデータとtableau実現する自動レポーティング
トレジャーデータとtableau実現する自動レポーティングTakahiro Inoue
 
Tableauが魅せる Data Visualization の世界
Tableauが魅せる Data Visualization の世界Tableauが魅せる Data Visualization の世界
Tableauが魅せる Data Visualization の世界Takahiro Inoue
 
トレジャーデータのバッチクエリとアドホッククエリを理解する
トレジャーデータのバッチクエリとアドホッククエリを理解するトレジャーデータのバッチクエリとアドホッククエリを理解する
トレジャーデータのバッチクエリとアドホッククエリを理解するTakahiro Inoue
 
20140708 オンラインゲームソリューション
20140708 オンラインゲームソリューション20140708 オンラインゲームソリューション
20140708 オンラインゲームソリューションTakahiro Inoue
 
トレジャーデータ流,データ分析の始め方
トレジャーデータ流,データ分析の始め方トレジャーデータ流,データ分析の始め方
トレジャーデータ流,データ分析の始め方Takahiro Inoue
 
オンラインゲームソリューション@トレジャーデータ
オンラインゲームソリューション@トレジャーデータオンラインゲームソリューション@トレジャーデータ
オンラインゲームソリューション@トレジャーデータTakahiro Inoue
 
事例で学ぶトレジャーデータ 20140612
事例で学ぶトレジャーデータ 20140612事例で学ぶトレジャーデータ 20140612
事例で学ぶトレジャーデータ 20140612Takahiro Inoue
 
トレジャーデータ株式会社について(for all Data_Enthusiast!!)
トレジャーデータ株式会社について(for all Data_Enthusiast!!)トレジャーデータ株式会社について(for all Data_Enthusiast!!)
トレジャーデータ株式会社について(for all Data_Enthusiast!!)Takahiro Inoue
 
この Visualization がすごい2014 〜データ世界を彩るツール6選〜
この Visualization がすごい2014 〜データ世界を彩るツール6選〜この Visualization がすごい2014 〜データ世界を彩るツール6選〜
この Visualization がすごい2014 〜データ世界を彩るツール6選〜Takahiro Inoue
 
Treasure Data Intro for Data Enthusiast!!
Treasure Data Intro for Data Enthusiast!!Treasure Data Intro for Data Enthusiast!!
Treasure Data Intro for Data Enthusiast!!Takahiro Inoue
 
Hadoop and the Data Scientist
Hadoop and the Data ScientistHadoop and the Data Scientist
Hadoop and the Data ScientistTakahiro Inoue
 
MongoDB: Intro & Application for Big Data
MongoDB: Intro & Application  for Big DataMongoDB: Intro & Application  for Big Data
MongoDB: Intro & Application for Big DataTakahiro Inoue
 
An Introduction to Fluent & MongoDB Plugins
An Introduction to Fluent & MongoDB PluginsAn Introduction to Fluent & MongoDB Plugins
An Introduction to Fluent & MongoDB PluginsTakahiro Inoue
 
An Introduction to Tinkerpop
An Introduction to TinkerpopAn Introduction to Tinkerpop
An Introduction to TinkerpopTakahiro Inoue
 
An Introduction to Neo4j
An Introduction to Neo4jAn Introduction to Neo4j
An Introduction to Neo4jTakahiro Inoue
 
The Definition of GraphDB
The Definition of GraphDBThe Definition of GraphDB
The Definition of GraphDBTakahiro Inoue
 
Large-Scale Graph Processing〜Introduction〜(完全版)
Large-Scale Graph Processing〜Introduction〜(完全版)Large-Scale Graph Processing〜Introduction〜(完全版)
Large-Scale Graph Processing〜Introduction〜(完全版)Takahiro Inoue
 
Large-Scale Graph Processing〜Introduction〜(LT版)
Large-Scale Graph Processing〜Introduction〜(LT版)Large-Scale Graph Processing〜Introduction〜(LT版)
Large-Scale Graph Processing〜Introduction〜(LT版)Takahiro Inoue
 

More from Takahiro Inoue (20)

Treasure Data × Wave Analytics EC Demo
Treasure Data × Wave Analytics EC DemoTreasure Data × Wave Analytics EC Demo
Treasure Data × Wave Analytics EC Demo
 
トレジャーデータとtableau実現する自動レポーティング
トレジャーデータとtableau実現する自動レポーティングトレジャーデータとtableau実現する自動レポーティング
トレジャーデータとtableau実現する自動レポーティング
 
Tableauが魅せる Data Visualization の世界
Tableauが魅せる Data Visualization の世界Tableauが魅せる Data Visualization の世界
Tableauが魅せる Data Visualization の世界
 
トレジャーデータのバッチクエリとアドホッククエリを理解する
トレジャーデータのバッチクエリとアドホッククエリを理解するトレジャーデータのバッチクエリとアドホッククエリを理解する
トレジャーデータのバッチクエリとアドホッククエリを理解する
 
20140708 オンラインゲームソリューション
20140708 オンラインゲームソリューション20140708 オンラインゲームソリューション
20140708 オンラインゲームソリューション
 
トレジャーデータ流,データ分析の始め方
トレジャーデータ流,データ分析の始め方トレジャーデータ流,データ分析の始め方
トレジャーデータ流,データ分析の始め方
 
オンラインゲームソリューション@トレジャーデータ
オンラインゲームソリューション@トレジャーデータオンラインゲームソリューション@トレジャーデータ
オンラインゲームソリューション@トレジャーデータ
 
事例で学ぶトレジャーデータ 20140612
事例で学ぶトレジャーデータ 20140612事例で学ぶトレジャーデータ 20140612
事例で学ぶトレジャーデータ 20140612
 
トレジャーデータ株式会社について(for all Data_Enthusiast!!)
トレジャーデータ株式会社について(for all Data_Enthusiast!!)トレジャーデータ株式会社について(for all Data_Enthusiast!!)
トレジャーデータ株式会社について(for all Data_Enthusiast!!)
 
この Visualization がすごい2014 〜データ世界を彩るツール6選〜
この Visualization がすごい2014 〜データ世界を彩るツール6選〜この Visualization がすごい2014 〜データ世界を彩るツール6選〜
この Visualization がすごい2014 〜データ世界を彩るツール6選〜
 
Treasure Data Intro for Data Enthusiast!!
Treasure Data Intro for Data Enthusiast!!Treasure Data Intro for Data Enthusiast!!
Treasure Data Intro for Data Enthusiast!!
 
Hadoop and the Data Scientist
Hadoop and the Data ScientistHadoop and the Data Scientist
Hadoop and the Data Scientist
 
MongoDB: Intro & Application for Big Data
MongoDB: Intro & Application  for Big DataMongoDB: Intro & Application  for Big Data
MongoDB: Intro & Application for Big Data
 
An Introduction to Fluent & MongoDB Plugins
An Introduction to Fluent & MongoDB PluginsAn Introduction to Fluent & MongoDB Plugins
An Introduction to Fluent & MongoDB Plugins
 
An Introduction to Tinkerpop
An Introduction to TinkerpopAn Introduction to Tinkerpop
An Introduction to Tinkerpop
 
An Introduction to Neo4j
An Introduction to Neo4jAn Introduction to Neo4j
An Introduction to Neo4j
 
The Definition of GraphDB
The Definition of GraphDBThe Definition of GraphDB
The Definition of GraphDB
 
Large-Scale Graph Processing〜Introduction〜(完全版)
Large-Scale Graph Processing〜Introduction〜(完全版)Large-Scale Graph Processing〜Introduction〜(完全版)
Large-Scale Graph Processing〜Introduction〜(完全版)
 
Large-Scale Graph Processing〜Introduction〜(LT版)
Large-Scale Graph Processing〜Introduction〜(LT版)Large-Scale Graph Processing〜Introduction〜(LT版)
Large-Scale Graph Processing〜Introduction〜(LT版)
 
Advanced MongoDB #1
Advanced MongoDB #1Advanced MongoDB #1
Advanced MongoDB #1
 

Recently uploaded

NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 

Recently uploaded (20)

NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 

MongoDB & Hadoop: Flexible Hourly Batch Processing Model

  • 6. { "_id" : ObjectId("4dcd3ebc9278000000005158"), "timestamp" : ISODate("2011-05-13T14:22:46.777Z"), "binary" : BinData(0,""), "string" : "abc", "number" : 3, "subobj" : {"subA": 1, "subB": 2 }, "array" : [1, 2, 3], "dbref" : [_id1, _id2, _id3] padding }
  • 7. { db.coll.find({"string": "abc"}); db.coll.find({ "string" : /^a.*$/i }); "_id" : ObjectId("4dcd3ebc9278000000005158"), "timestamp" : ISODate("2011-05-13T14:22:46.777Z"), db.coll.find({"subobj.subA": 1}); db.coll.find({"subobj.subB": {$exists: true} }); "binary" : BinData(0,""), "string" : "abc", db.coll.find({"number": 3}); db.coll.find({"number": {$gt: 1}}); "number" : 3, "subobj" : {"subA": 1, "subB": 2 }, "array" : [1, 2, 3], db.coll.find({"array": {$all:[1, 2]} }); "dbref" : [_id1, _id2, _id3] db.coll.find({"array": {$in:[2, 4, 6]} }); padding }
  • 8. { "_id" : ObjectId("4dcd3ebc9278000000005158"), "timestamp" : ISODate("2011-05-13T14:22:46.777Z"), { $set : {"string": "def"} } "binary" : BinData(0,""), { $inc : {"number": 1} } "string" : "def", { $pull : {"subobj": {"subB": 2 } } } "number" : 4, "subobj" : {"subA": 1, "subB": 2 }, "array" : [1, 2, 3, 4, 5, 6], "dbref"$addToSet : { "array" : { $each : [ 4 , 5 , 6 ] } } } { : [_id1, _id2, _id3] "newkey" : "In-place" } { $set : {"newkey": "In-place"} }
  • 21. def mapper(key, value): for word in value.split(): yield word,1 def reducer(key, values): yield key,sum(values) if __name__ == "__main__": import dumbo dumbo.run(mapper, reducer) dumbo start wordcount.py -hadoop /path/to/hadoop -input wc_input.txt -output wc_output
  • 40. db.collection.insert( {hour:0, userId:”1234”, actionType:”login”,} );
  • 42. m = function(){ this.tags.forEach{ function(z) { emit(z, {count: 1}); } }; }; r = function(key, values) { var total=0; for (i=0, i<values.length, i++) total += values[i].count; return { count : total }; } res=db.things.mapReduce(m,!r); # finalize
  • 49. Examples Conclusions and Future Work Party Solutions Motivation Architecture Examples Conclusions and Future Work ummary of Features Hadoop-based: same limitations as Streaming (Dumbo) and Streaming Jython Pydoop Jython (Happy), except for ease of use C/C++ Ext Yes No Yes Other implementations: good if you have your own cluster Standard Lib Full Partial Full Hadoop is the most widespread implementation MR API No* Full Partial Java-like FW No Yes Yes HDFS No Leo, Zanetti Yes Yes Pydoop: a Python MapReduce and HDFS API for Hadoop (*) you can only write the map and reduce parts as executable scripts.
  • 50. Motivation Architecture Examples Conclusions and Future Work Hadoop Pipes Communication with Java framework via persistent sockets The C++ app provides a factory used by the framework to create MR components Providing Mapper and Reducer is mandatory Leo, Zanetti Pydoop: a Python MapReduce and HDFS API for Hadoop
  • 51. Motivation Architecture Examples Conclusions and Future Work Integration of Pydoop with C++ Integration with Pipes: Method calls flow from the framework through the C++ and the Pydoop API, ultimately reaching user-defined methods Results are wrapped by Boost and returned to the framework Integration with HDFS: Function calls initiated by Pydoop Results wrapped and returned as Python objects to the app
  • 55. gawk ' BEGIN{ reducenum='$REDUCE_NUM'; } { userid=$7; key=$8; } key ~ /a{GetLoginBonus}/ { incrby(userid,key,$9,a); next;} key ~ /a{SideJob}/ { incrby(userid,key,$11,a); next;} key ~ /a{CleanMyShop}/ { hincr(userid,key,$9,a); next; } key ~ /(GetAvatarPart|ChangeP|ChangeWakuwakuP|ChangeKonergy)/ { incrbydiff(userid,key,$9,a); next; } ...‘ $IN # for reducer1 (such as “userid % reducenum == 0”) # command userid key value MULTI HINCRBY 1111 a{ChangeGreed} 3 HINCRBY 1111 a{GianEvent} 7 HINCRBY 1111 a{TeamChallenge} 5 HINCRBY 2222 a{Battle} 3 HINCRBY 2222 a{ChangeMoney} 3 ... EXEC