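42.
// map: emit one {count: 1} record per tag of the current document
m = function() {
    this.tags.forEach(
        function(z) {
            emit(z, {count: 1});
        }
    );
};
// reduce: sum the counts emitted for a given tag
r = function(key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++)
        total += values[i].count;
    return {count: total};
};
// Note: recent MongoDB versions also require an output option as a third argument.
res = db.things.mapReduce(m, r);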
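The map and reduce functions above can be tried without a MongoDB server. The following is a minimal sketch in plain JavaScript, assuming a hand-written stand-in for `emit` and a few made-up sample documents; `emitted`, `things`, and `counts` are illustrative names, not part of the MongoDB API.

```javascript
// Stand-in for the emit() that MongoDB provides inside map functions.
// It groups every emitted value under its key.
var emitted = {};
function emit(key, value) {
    (emitted[key] = emitted[key] || []).push(value);
}

// Same map function as above: one {count: 1} record per tag.
var m = function () {
    this.tags.forEach(function (z) {
        emit(z, {count: 1});
    });
};

// Same reduce function as above: sum the counts for one tag.
var r = function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i].count;
    }
    return {count: total};
};

// Hypothetical sample documents standing in for the db.things collection.
var things = [
    {tags: ["dog", "cat"]},
    {tags: ["cat"]},
    {tags: ["mouse", "cat", "dog"]}
];

// Map phase: MongoDB calls the map function with the document bound to `this`.
things.forEach(function (doc) { m.call(doc); });

// Reduce phase: one call per key over all values emitted for that key.
var counts = {};
Object.keys(emitted).forEach(function (key) {
    counts[key] = r(key, emitted[key]);
});

console.log(counts);
```

The reduce function returns a value of the same shape as the values it receives (`{count: ...}`). This matters in MongoDB, where reduce may be applied again to its own partial results for the same key.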
# finalize
49. Motivation
Architecture
Examples
Conclusions and Future Work
Third Party Solutions
Hadoop-based: same limitations as Streaming (Dumbo) and Jython (Happy), except for ease of use
Other implementations: good if you have your own cluster
Hadoop is the most widespread implementation
Summary of Features
              Streaming  Jython   Pydoop
C/C++ Ext     Yes        No       Yes
Standard Lib  Full       Partial  Full
MR API        No*        Full     Partial
Java-like FW  No         Yes      Yes
HDFS          No         Yes      Yes
(*) you can only write the map and reduce parts as executable scripts.
Leo, Zanetti  Pydoop: a Python MapReduce and HDFS API for Hadoop
50. Hadoop Pipes
Communication with the Java framework via persistent sockets
The C++ app provides a factory used by the framework to create MR components
Providing Mapper and Reducer is mandatory
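The factory idea on this slide can be sketched in a few lines. The example below is a language-neutral illustration written in JavaScript. It is not the Hadoop Pipes C++ API: `WordCountMapper`, `SumReducer`, `factory`, and `runJob` are hypothetical names, and `runJob` is only a toy stand-in for the framework side, with no sockets involved.

```javascript
// Hypothetical mapper: splits each input record into words and emits (word, 1).
class WordCountMapper {
    map(key, value, emit) {
        value.split(/\s+/).filter(Boolean).forEach(function (word) {
            emit(word, 1);
        });
    }
}

// Hypothetical reducer: sums all values collected for one key.
class SumReducer {
    reduce(key, values, emit) {
        emit(key, values.reduce(function (a, b) { return a + b; }, 0));
    }
}

// The application hands over a factory, not ready-made instances,
// so the framework decides when the components are created.
const factory = {
    createMapper: function () { return new WordCountMapper(); },
    createReducer: function () { return new SumReducer(); }
};

// Toy stand-in for the framework: creates components through the factory,
// runs the map phase, groups values by key, then runs the reduce phase.
function runJob(factory, records) {
    if (typeof factory.createMapper !== "function" ||
        typeof factory.createReducer !== "function") {
        throw new Error("a factory must provide both a mapper and a reducer");
    }
    const mapper = factory.createMapper();
    const reducer = factory.createReducer();
    const groups = new Map();
    records.forEach(function (record, index) {
        mapper.map(index, record, function (k, v) {
            if (!groups.has(k)) { groups.set(k, []); }
            groups.get(k).push(v);
        });
    });
    const output = {};
    groups.forEach(function (values, key) {
        reducer.reduce(key, values, function (k, v) { output[k] = v; });
    });
    return output;
}

const jobOutput = runJob(factory, ["a b a", "b c"]);
console.log(jobOutput);
```

The check at the top of `runJob` mirrors the rule that a mapper and a reducer are mandatory: a factory missing either one is rejected before any record is processed.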
51. Integration of Pydoop with C++
Integration with Pipes:
Method calls flow from the framework through the C++ and the Pydoop API, ultimately reaching user-defined methods
Results are wrapped by Boost and returned to the framework
Integration with HDFS:
Function calls initiated by Pydoop
Results wrapped and returned as Python objects to the app