HIVE          PIG
MAPREDUCE

•   @hamburger_kid 2010 5   27   1
done




   CLOUDERA
HADOOP TRAINING
 FOR DEVELOPERS
done




Day 1
  Hadoop
  HDFS
  mapreduce

Day 2
  RDBMS Hadoop
  Hive
  Pig

Day 3
  mapreduce
  mapreduce
  mapreduce
                               Mr.Alex
Hive vs Pig




VS
mapreduce
[Figure: Hadoop cluster: ClientNode; NameNode, Secondary NameNode, JobTracker; DataNode/TaskTracker nodes holding Blocks]
Hive Pig
[Figure: the Hadoop cluster again (ClientNode; NameNode, Secondary NameNode, JobTracker; DataNode/TaskTracker nodes holding Blocks)]
mapreduce




THE END OF MONEY IS THE END OF LOVE

                  map

             shuffle&sort

                reduce
                      source:
                      http://techblog.yahoo.co.jp/cat207/cat209/hadoop/
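
The three stages above can be sketched in plain Python, using the slide's example sentence as input. This is a single-process illustration of the data flow only, not Hadoop's API; the function names map_phase, shuffle_and_sort, and reduce_phase are made up for this sketch.

```python
from collections import defaultdict

def map_phase(line):
    # map: emit a (word, 1) pair for every word in the input line
    return [(word, 1) for word in line.split()]

def shuffle_and_sort(pairs):
    # shuffle & sort: collect the values of each key and order the keys
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def reduce_phase(grouped):
    # reduce: sum the values collected for each key
    return [(key, sum(values)) for key, values in grouped]

sentence = "THE END OF MONEY IS THE END OF LOVE"
counts = reduce_phase(shuffle_and_sort(map_phase(sentence)))
print(counts)
```

On a real cluster the map and reduce steps run as many parallel tasks on the TaskTracker nodes, and the shuffle moves data between them over the network.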
Hive




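Developed at Facebook

SQL-like queries (Hive QL) are translated into mapreduce jobs
Data model: Table, Partitions, Buckets
Table definitions are kept in the Metastore
Data is stored on HDFS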
Hive



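Table, Partitions, Buckets
Table
  typed columns: int, float, string, boolean
Partitions
  split a table's data by the partitioning column(s)
  each partition is a separate HDFS directory
Buckets
  split the data further by hash; number of Buckets = number of Reduce tasks
  used for Sampling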
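
How rows are spread over Buckets can be sketched as "hash of the bucketing column, modulo the number of buckets". The sketch below is an illustration under assumptions: the CRC32 hash is a stand-in rather than the hash Hive itself applies, and the rows and the helper names bucket_for and sample_bucket are hypothetical.

```python
import zlib

def bucket_for(value, num_buckets):
    # Stand-in hash: CRC32 of the UTF-8 bytes (an assumption for this sketch,
    # not the hash function Hive uses internally).
    return zlib.crc32(value.encode("utf-8")) % num_buckets

def sample_bucket(rows, column, bucket, num_buckets):
    # Sampling: read only the rows that fall into one bucket.
    return [row for row in rows if bucket_for(row[column], num_buckets) == bucket]

# Hypothetical rows in the (word, freq) shape used later in this deck.
rows = [{"word": w, "freq": f}
        for w, f in [("hadoop", 5), ("hive", 3), ("pig", 2), ("hdfs", 4)]]
num_buckets = 2
buckets = [sample_bucket(rows, "word", b, num_buckets) for b in range(num_buckets)]
for b, part in enumerate(buckets):
    print(b, [row["word"] for row in part])
```

Because a given value always lands in the same bucket, reading one bucket gives a repeatable sample of the table.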
Hive Metastore




Metastore: holds the Table and Partitions metadata
Metastore     ClientNode                  NameNode
Metastore DB: Derby/MySQL



 Table
 HDFS
 Partitions
Hive




                        HDFS
HDFS directory: /user/hive/warehouse
Table: warehouse subdirectory
Partitions: Table subdirectory
data        reduce
        /user/hive/warehouse/table/partition/data
                                   SequenceFiles
SerDe                                                        format

        http://www.slideshare.net/ragho/hive-user-meeting-august-2009-facebook
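
The directory layout above can be sketched as a small path builder. It assumes the default warehouse directory shown on the slide and a table named producer partitioned by a dt column, like the one created later in this deck; the helper name partition_dir is made up for this sketch.

```python
import posixpath

WAREHOUSE_DIR = "/user/hive/warehouse"  # default location shown on the slide

def partition_dir(table, partition_spec):
    # Each partition column adds one "name=value" subdirectory under the table.
    parts = [f"{name}={value}" for name, value in partition_spec]
    return posixpath.join(WAREHOUSE_DIR, table, *parts)

print(partition_dir("producer", [("dt", "20100526")]))
# prints /user/hive/warehouse/producer/dt=20100526
```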
http://www.rakuten.co.jp/recruit/en/career/employee/appengineer.html
http://www.rakuten.co.jp/recruit/en/career/employee/systemproducer.html
HDFS
ls -al /home/hamburgerkid/workspace/techtalk/data/

hadoop fs -rmr hive

hadoop fs -mkdir hive/input
hadoop fs -put /home/hamburgerkid/workspace/techtalk/data/* hive/input
hadoop fs -ls /user/hamburgerkid/hive/input
Table                                   mapreduce

                     wordcount
hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar wordcount hive/input/app_eng hive/output/app_eng/
hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar wordcount hive/input/producer hive/output/producer/

hadoop fs -ls /user/hamburgerkid/hive/output/app_eng/
hadoop fs -ls /user/hamburgerkid/hive/output/producer/

hadoop fs -cat /user/hamburgerkid/hive/output/app_eng/part*
hadoop fs -cat /user/hamburgerkid/hive/output/producer/part*
wordcount output
CREATE TABLE producer (word STRING , freq INT) PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE ;
SHOW TABLES ;
DESCRIBE producer ;

LOAD DATA INPATH '/user/hamburgerkid/hive/output/producer/part*'
INTO TABLE producer PARTITION (dt='20100526') ;

SELECT * FROM producer
WHERE LENGTH(word) > 3 AND freq > 1 SORT BY freq DESC LIMIT 10 ;

EXPLAIN SELECT * FROM producer
WHERE LENGTH(word) > 3 AND freq > 1 SORT BY freq DESC LIMIT 10 ;

hadoop fs -ls /user/hive/warehouse/producer/dt=20100526/
hadoop fs -ls /user/hamburgerkid/hive/output/producer/
wordcount output
CREATE TABLE app_eng (word STRING , freq INT) PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE ;

LOAD DATA INPATH '/user/hamburgerkid/hive/output/app_eng/part*'
INTO TABLE app_eng PARTITION (dt='20100526') ;
Table            (JOIN)
CREATE TABLE develop (word STRING, p_freq INT, e_freq INT) ;

INSERT OVERWRITE TABLE develop
SELECT p.word, p.freq, e.freq FROM producer p
JOIN app_eng e ON (p.word = e.word)
WHERE p.freq > 1 AND e.freq > 1 ;

SELECT word, p_freq, e_freq, (p_freq + e_freq) AS ttl FROM develop
WHERE LENGTH(word) > 3 SORT BY ttl DESC LIMIT 10 ;
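
What the JOIN and the follow-up query compute can be sketched in plain Python. The word counts below are hypothetical sample data, and the helper names inner_join and top_by_total are made up for this sketch; Hive itself runs these steps as mapreduce jobs.

```python
def inner_join(producer, app_eng, min_freq=1):
    # producer / app_eng map word -> freq (hypothetical sample data below).
    rows = []
    for word, p_freq in producer.items():
        e_freq = app_eng.get(word)
        # JOIN ... ON (p.word = e.word) WHERE p.freq > 1 AND e.freq > 1
        if e_freq is not None and p_freq > min_freq and e_freq > min_freq:
            rows.append((word, p_freq, e_freq))
    return rows

def top_by_total(rows, min_len=3, limit=10):
    # WHERE LENGTH(word) > 3, highest (p_freq + e_freq) first, LIMIT 10
    kept = [(w, p, e, p + e) for w, p, e in rows if len(w) > min_len]
    return sorted(kept, key=lambda r: r[3], reverse=True)[:limit]

producer = {"service": 4, "team": 2, "plan": 1, "and": 9}
app_eng = {"service": 3, "team": 3, "code": 6, "and": 7}
develop = inner_join(producer, app_eng)
print(top_by_total(develop))
# prints [('service', 4, 3, 7), ('team', 2, 3, 5)]
```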
(OUTER JOIN)
SELECT e.word, e.freq, p.freq FROM app_eng e
LEFT OUTER JOIN producer p ON (e.word = p.word)
WHERE LENGTH(e.word) > 3 AND p.freq IS NULL
SORT BY e.freq DESC LIMIT 10 ;
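
The LEFT OUTER JOIN followed by the IS NULL filter keeps the words that appear only on the left side. A minimal Python sketch of that pattern, again with hypothetical word counts and made-up helper names:

```python
def left_outer_join(left, right):
    # Keep every left row; the right-hand freq is None when there is no match.
    return [(word, freq, right.get(word)) for word, freq in left.items()]

def only_in_left(left, right, min_len=3, limit=10):
    # WHERE LENGTH(e.word) > 3 AND p.freq IS NULL, highest freq first
    joined = left_outer_join(left, right)
    unmatched = [row for row in joined if row[2] is None and len(row[0]) > min_len]
    return sorted(unmatched, key=lambda row: row[1], reverse=True)[:limit]

app_eng = {"service": 3, "code": 6, "java": 2, "and": 7}
producer = {"service": 4, "plan": 1, "and": 9}
print(only_in_left(app_eng, producer))
# prints [('code', 6, None), ('java', 2, None)]
```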
Engineering with..
Pig




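Developed at Yahoo!

Pig Latin scripts are translated into mapreduce jobs
  join, group, filter, sort
Grunt (the Pig shell) or a script
No counterpart to the Hive Metastore
Nested data types: map, tuple, bag
Execution modes: Local and Hadoop (mapreduce)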
Pig




simple types: int, long, double, chararray, bytearray
        'apache.org' , '1.0'
tuple: an ordered set of fields
        <apache.org , 1.0>
bag: a collection of tuples
        {<apache.org , 1.0> , <flickr.com , 0.8>}
map: key/value pairs; the value can be any type
        [ 'apache' : <'search' , 'news'> ; 'cnn' : 'news' ]
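
As a rough analogy only, the same values can be written with Python's built-in types. The mapping from Pig types to Python types is an assumption of this sketch, not something Pig defines.

```python
# simple values, such as a chararray and a double
site, score = "apache.org", 1.0

# tuple: an ordered sequence of fields
t = (site, score)

# bag: a collection of tuples (duplicates are allowed, so a list is used here)
bag = [("apache.org", 1.0), ("flickr.com", 0.8)]

# map: keys mapped to values; a value may itself be a nested type
m = {"apache": ("search", "news"), "cnn": "news"}

print(t[0], len(bag), m["apache"][1])
# prints apache.org 2 news
```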
Hive



       HDFS
ls -al /home/hamburgerkid/workspace/techtalk/data/

hadoop fs -rmr pig

hadoop fs -mkdir pig/input
hadoop fs -put /home/hamburgerkid/workspace/techtalk/data/* pig/input
hadoop fs -ls /user/hamburgerkid/pig/input

              Hive Pig      mapreduce
wordcount

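-- load each input line as a single field
log = LOAD 'pig/input/app_eng' USING TextLoader() ;
-- split each line into words, one word per row
flatd = FOREACH log GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word ;
-- collect identical words into one group
grpd = GROUP flatd BY word ;
-- count the rows in each group; output is (count, word)
cntd = FOREACH grpd GENERATE COUNT(flatd) , group ;
STORE cntd INTO 'pig/output/app_eng' ;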

hadoop fs -ls /user/hamburgerkid/pig/output/app_eng
hadoop fs -cat /user/hamburgerkid/pig/output/app_eng/part*
wordcount

log = LOAD 'pig/input/producer' USING TextLoader() ;
flatd = FOREACH log GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word ;
grpd = GROUP flatd BY word ;
cntd = FOREACH grpd GENERATE COUNT(flatd) , group ;
STORE cntd INTO 'pig/output/producer' ;

hadoop fs -ls /user/hamburgerkid/pig/output/producer
hadoop fs -cat /user/hamburgerkid/pig/output/producer/part*
(JOIN)

eng = LOAD 'pig/output/app_eng' AS (freq , word) ;
pro = LOAD 'pig/output/producer' AS (freq , word) ;
cg = COGROUP eng BY word , pro BY word ;
flatd = FOREACH cg GENERATE FLATTEN(eng) , FLATTEN(pro.freq) AS freq2 ;
ttld = FOREACH flatd GENERATE word , SIZE(word) AS size , freq , freq2 , (freq + freq2) AS total ;
fltrd = FILTER ttld BY freq > 1 AND freq2 > 1 AND size > 3L ;
odrd = LIMIT (ORDER fltrd BY total DESC) 10 ;
DUMP odrd ;
STORE odrd INTO 'pig/output/develop' ;

hadoop fs -ls pig/output/develop
hadoop fs -cat pig/output/develop/part*
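
How COGROUP followed by FLATTEN behaves like an inner join can be sketched in plain Python. The rows below are hypothetical (freq, word) tuples, and the helper names cogroup and flatten_both are made up for this sketch.

```python
from collections import defaultdict
from itertools import product

def cogroup(eng, pro):
    # eng / pro are lists of (freq, word) tuples, as loaded in the script.
    groups = defaultdict(lambda: ([], []))
    for row in eng:
        groups[row[1]][0].append(row)
    for row in pro:
        groups[row[1]][1].append(row)
    # One record per word: (group, bag of eng rows, bag of pro rows)
    return [(word, bags[0], bags[1]) for word, bags in sorted(groups.items())]

def flatten_both(cg):
    # Flattening two bags yields their cross product, so a word with an
    # empty bag on either side produces no output row.
    out = []
    for _word, eng_bag, pro_bag in cg:
        for (freq, word), (freq2, _w) in product(eng_bag, pro_bag):
            out.append((freq, word, freq2))
    return out

eng = [(3, "service"), (6, "code")]
pro = [(4, "service"), (1, "plan")]
cg = cogroup(eng, pro)
print(cg)
print(flatten_both(cg))
# last line prints [(3, 'service', 4)]
```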
(OUTER JOIN)

eng = LOAD 'pig/output/app_eng' AS (freq , word) ;
pro = LOAD 'pig/output/producer' AS (freq , word) ;
cg = COGROUP eng BY word , pro BY word ;
outrd = FILTER cg BY COUNT(eng) == 0 ;
flatd = FOREACH outrd GENERATE FLATTEN(pro) ;
szd = FOREACH flatd GENERATE word , SIZE(word) AS size , freq ;
fltrd = FILTER szd BY size > 3L ;
odrd = LIMIT (ORDER fltrd BY freq DESC) 10 ;
DUMP odrd ;
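
The filter on an empty bag can be sketched the same way. As written, the script keeps the words that appear only in pro (the producer counts), which is the opposite direction from the Hive OUTER JOIN query earlier. The rows and the helper names bag_by_word and only_in_pro are hypothetical.

```python
from collections import defaultdict

def bag_by_word(rows):
    # Collect (freq, word) rows into one bag (list) per word.
    bags = defaultdict(list)
    for freq, word in rows:
        bags[word].append((freq, word))
    return bags

def only_in_pro(eng, pro, min_len=3, limit=10):
    eng_bags, pro_bags = bag_by_word(eng), bag_by_word(pro)
    rows = []
    for word, pro_bag in pro_bags.items():
        # FILTER cg BY COUNT(eng) == 0: keep words whose eng bag is empty
        if len(eng_bags.get(word, [])) == 0 and len(word) > min_len:
            rows.extend(pro_bag)  # FLATTEN(pro)
    # ORDER ... BY freq DESC, then LIMIT
    return sorted(rows, key=lambda r: r[0], reverse=True)[:limit]

eng = [(3, "service"), (6, "code")]
pro = [(4, "service"), (1, "plan"), (2, "budget"), (8, "pr")]
print(only_in_pro(eng, pro))
# prints [(2, 'budget'), (1, 'plan')]
```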
Produce for..
Hive Pig




           mapreduce < Hive < Pig
           mapreduce > Hive < Pig
           mapreduce > Hive > Pig

Hive/Pig         UDF
  /
Hive Pig


         Pig                 close   w



Core logic              mapreduce
                SQL        Hive
                             Pig
@hamburger_kid

Take control of your SAP testing with UiPath Test Suite
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Hive vs Pig for HadoopSourceCodeReading

  • 1. HIVE PIG MAPREDUCE • @hamburger_kid, 2010-05-27
  • 2. done CLOUDERA HADOOP TRAINING FOR DEVELOPERS
  • 4. Day 1: Hadoop, HDFS, mapreduce. Day 2: RDBMS Hadoop, Hive, Pig. Day 3: mapreduce, mapreduce, mapreduce.
  • 5. Day 1: Hadoop, HDFS, mapreduce. Day 2: RDBMS Hadoop, Hive, Pig. Day 3: mapreduce, mapreduce, mapreduce. Mr. Alex
  • 11. ClientNode; NameNode / JobTracker; Secondary NameNode; Block; DataNode / TaskTracker
  • 12. Hive, Pig; ClientNode; NameNode / JobTracker; Secondary NameNode; Block; DataNode / TaskTracker
  • 13. mapreduce: "THE END OF MONEY IS THE END OF LOVE"; map -> shuffle & sort -> reduce (source: http://techblog.yahoo.co.jp/cat207/cat209/hadoop/)
  • 14. Hive
  • 16. Hive: Facebook; SQL like, mapreduce, Hive QL; Table, Partitions, Buckets; Metastore; HDFS
  • 17. Hive: Table, Partitions, Buckets
      Table: column int, float, string, boolean
      Partitions: data, table partitioning; HDFS; Partitions
      Buckets: data; Buckets = Reduce; Sampling
  • 18. Hive Metastore
      Metastore: Table, Partitions
      Metastore: ClientNode, NameNode
      Derby/MySQL: DB Metastore
      Table; HDFS; Partitions
  • 19. Hive HDFS
      HDFS directory: /user/hive/warehouse
      Table: warehouse subdirectory
      Partitions: Table subdirectory
      data: reduce
      /user/hive/warehouse/table/partition/data
      SequenceFiles; SerDe format
      http://www.slideshare.net/ragho/hive-user-meeting-august-2009-facebook
  • 21. HDFS
      ls -al /home/hamburgerkid/workspace/techtalk/data/
      hadoop fs -rmr hive
      hadoop fs -mkdir hive/input
      hadoop fs -put /home/hamburgerkid/workspace/techtalk/data/* hive/input
      hadoop fs -ls /user/hamburgerkid/hive/input
  • 22. Table mapreduce wordcount
      hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar wordcount hive/input/app_eng hive/output/app_eng/
      hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar wordcount hive/input/producer hive/output/producer/
      hadoop fs -ls /user/hamburgerkid/hive/output/app_eng/
      hadoop fs -ls /user/hamburgerkid/hive/output/producer/
      hadoop fs -cat /user/hamburgerkid/hive/output/app_eng/part*
      hadoop fs -cat /user/hamburgerkid/hive/output/producer/part*
  • 23. wordcount output
      CREATE TABLE producer (word STRING, freq INT)
        PARTITIONED BY (dt STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        STORED AS TEXTFILE ;
      SHOW TABLES ;
      DESCRIBE producer ;
      LOAD DATA INPATH '/user/hamburgerkid/hive/output/producer/part*'
        INTO TABLE producer PARTITION (dt='20100526') ;
      SELECT * FROM producer
        WHERE LENGTH(word) > 3 AND freq > 1
        SORT BY freq DESC LIMIT 10 ;
      EXPLAIN SELECT * FROM producer
        WHERE LENGTH(word) > 3 AND freq > 1
        SORT BY freq DESC LIMIT 10 ;
      hadoop fs -ls /user/hive/warehouse/producer/dt=20100526/
      hadoop fs -ls /user/hamburgerkid/hive/output/producer/
  • 24. wordcount output
      CREATE TABLE app_eng (word STRING, freq INT)
        PARTITIONED BY (dt STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        STORED AS TEXTFILE ;
      LOAD DATA INPATH '/user/hamburgerkid/hive/output/app_eng/part*'
        INTO TABLE app_eng PARTITION (dt='20100526') ;
  • 25. Table (JOIN)
      CREATE TABLE develop (word STRING, p_freq INT, e_freq INT) ;
      INSERT OVERWRITE TABLE develop
        SELECT p.word, p.freq, e.freq
        FROM producer p JOIN app_eng e ON (p.word = e.word)
        WHERE p.freq > 1 AND e.freq > 1 ;
      SELECT word, p_freq, e_freq, (p_freq + e_freq) AS ttl
        FROM develop
        WHERE LENGTH(word) > 3
        SORT BY ttl DESC LIMIT 10 ;
  • 26. (OUTER JOIN)
      SELECT e.word, e.freq, p.freq
        FROM app_eng e LEFT OUTER JOIN producer p ON (e.word = p.word)
        WHERE LENGTH(e.word) > 3 AND p.freq IS NULL
        SORT BY e.freq DESC LIMIT 10 ;
  • 29. Pig
  • 31. Pig: Yahoo!; Pig Latin, mapreduce; join, group, filter, sort; Grunt (Pig shell), script; Hive Metastore; map, tuple, bag; mapreduce: Local, Hadoop
  • 32. Pig
      int, long, double, chararray, bytearray: 'apache.org' , '1.0'
      tuple: <apache.org , 1.0>
      bag (tuple): {<apache.org , 1.0> , <flickr.com , 0.8>}
      map (key/value, value OK): [ 'apache' : <'search' , 'news'> ; 'cnn' : 'news' ]
  • 33. Hive HDFS
      ls -al /home/hamburgerkid/workspace/techtalk/data/
      hadoop fs -rmr pig
      hadoop fs -mkdir pig/input
      hadoop fs -put /home/hamburgerkid/workspace/techtalk/data/* pig/input
      hadoop fs -ls /user/hamburgerkid/pig/input
      Hive Pig mapreduce
  • 34. wordcount
      log = LOAD 'pig/input/app_eng' USING TextLoader() ;
      flatd = FOREACH log GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word ;
      grpd = GROUP flatd BY word ;
      cntd = FOREACH grpd GENERATE COUNT(flatd) , group ;
      STORE cntd INTO 'pig/output/app_eng' ;
      hadoop fs -ls /user/hamburgerkid/pig/output/app_eng
      hadoop fs -cat /user/hamburgerkid/pig/output/app_eng/part*
  • 35. wordcount
      log = LOAD 'pig/input/producer' USING TextLoader() ;
      flatd = FOREACH log GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word ;
      grpd = GROUP flatd BY word ;
      cntd = FOREACH grpd GENERATE COUNT(flatd) , group ;
      STORE cntd INTO 'pig/output/producer' ;
      hadoop fs -ls /user/hamburgerkid/pig/output/producer
      hadoop fs -cat /user/hamburgerkid/pig/output/producer/part*
  • 36. (JOIN)
      eng = LOAD 'pig/output/app_eng' AS (freq , word) ;
      pro = LOAD 'pig/output/producer' AS (freq , word) ;
      cg = COGROUP eng BY word , pro BY word ;
      flatd = FOREACH cg GENERATE FLATTEN(eng) , FLATTEN(pro.freq) AS freq2 ;
      ttld = FOREACH flatd GENERATE word , SIZE(word) AS size , freq , freq2 , (freq + freq2) AS total ;
      fltrd = FILTER ttld BY freq > 1 AND freq2 > 1 AND size > 3L ;
      odrd = LIMIT (ORDER fltrd BY total DESC) 10 ;
      DUMP odrd ;
      STORE odrd INTO 'pig/output/develop' ;
      hadoop fs -ls pig/output/develop
      hadoop fs -cat pig/output/develop/part*
  • 37. (OUTER JOIN)
      eng = LOAD 'pig/output/app_eng' AS (freq , word) ;
      pro = LOAD 'pig/output/producer' AS (freq , word) ;
      cg = COGROUP eng BY word , pro BY word ;
      outrd = FILTER cg BY COUNT(eng) == 0 ;
      flatd = FOREACH outrd GENERATE FLATTEN(pro) ;
      szd = FOREACH flatd GENERATE word , SIZE(word) AS size , freq ;
      fltrd = FILTER szd BY size > 3L ;
      odrd = LIMIT (ORDER fltrd BY freq DESC) 10 ;
      DUMP odrd ;
  • 40. Hive Pig
      mapreduce < Hive < Pig
      mapreduce > Hive < Pig
      mapreduce > Hive > Pig
      Hive/Pig UDF /
  • 41. Hive Pig Pig close w Core logic mapreduce SQL Hive Pig
  • 42.
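
The map, shuffle & sort, reduce flow on slide 13 and the JOIN / OUTER JOIN queries on slides 25, 26, 36 and 37 can be sketched in plain Python. This is only an illustration of the semantics, not how Hadoop, Hive or Pig implement them: the function names are hypothetical, everything runs in memory in one process, and the second input sentence is made up for the example.

```python
from collections import defaultdict


def map_phase(line):
    """map: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """shuffle & sort: group the values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """reduce: sum the counts collected for each word."""
    return {word: sum(ones) for word, ones in grouped}


def wordcount(lines):
    """Run the three phases in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_and_sort(pairs))


def inner_join(left, right):
    """Words present in both tables, paired with both frequencies (JOIN)."""
    return {w: (left[w], right[w]) for w in left if w in right}


def left_only(left, right):
    """Words in `left` with no match in `right`
    (LEFT OUTER JOIN ... WHERE the right-hand freq IS NULL)."""
    return {w: f for w, f in left.items() if w not in right}


# First input: the sentence from slide 13. Second input: hypothetical data.
app_eng = wordcount(["THE END OF MONEY IS THE END OF LOVE"])
producer = wordcount(["THE END OF THE STORY"])

print(app_eng)
print(inner_join(producer, app_eng))
print(left_only(app_eng, producer))
```

The slide 26 query keeps words that appear only in app_eng, which corresponds to left_only(app_eng, producer). The Pig version on slide 37 appears to keep the opposite side (words only in producer), which would be left_only(producer, app_eng) in this sketch.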

Editor's Notes