SlideShare a Scribd company logo
1 of 20
Download to read offline
Habits of Effective Sqoop Users
Kate Ting, Customer Operations Engineer
kate@cloudera.com
Halp! Sqoop doesn't work!

    Now what?


2
Agenda

    •  First Things First
    •  Common Problems
    •  MySQL
      –  Connection Failure
      –  Importing into Hive
    •  Oracle
      –  Case-Sensitive Catalog Query Errors
      –  Sqoop Export Failing
    •  Effective Sqoop Habits

3
Agenda

    •  First Things First
    •  Common Problems
    •  MySQL
      –  Connection Failure
      –  Importing into Hive
    •  Oracle
      –  Case-Sensitive Catalog Query Errors
      –  Sqoop Export Failing
    •  Effective Sqoop Habits

4
First Things First
    Save time by providing this upfront:
    •  Versions: Sqoop, Hadoop, OS, JDBC
    •  Run with --verbose flag then attach log
    •  Sqoop command including options-file
    •  Expected output vs. actual output
    •  Table definition
    •  Input data set that triggers problem
    •  Hadoop task logs
    •  Check permissions on input files
    •  Divide and conquer
       –  e.g. Import that creates and populates a Hive table is failing
           •  First, do the import alone
           •  Second, create a Hive table without the import using the create-hive-
              table tool



5
Common Problems


6
Agenda

    •  First Things First
    •  Common Problems
    •  MySQL
      –  Connection Failure
      –  Importing into Hive
    •  Oracle
      –  Case-Sensitive Catalog Query Errors
      –  Sqoop Export Failing
    •  Effective Sqoop Habits

7
MySQL: Connection Failure
    java.lang.RuntimeException: java.lang.RuntimeException:
    com.mysql.jdbc.exceptions.jdbc4 .CommunicationsException: Communications link failure

    The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received
    any packets from the server.
    at com.cloudera.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:164 )
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:606)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.ja va:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)




8
MySQL: Connection Failure
    •    Problem: Communications Link Failure caused by incorrect permissions.
    •    Solution:
          –  Verify that you can connect to the database from the node where you
             are running Sqoop:
               •  $ mysql --host=<IP Address> --database=test --user=<username> --
                  password=<password>
          –  Add the network port for the server to your my.cnf file
          –  Set up a user account to connect via Sqoop. Grant permissions to the
             user to access the database over the network:
               •  Log into MySQL as root mysql -u root -p<ThisIsMyPassword>
               •  Issue the following command: mysql> grant all privileges on test.* to
                  'testuser'@'%' identified by 'testpassword'




9
MySQL: Importing into Hive
 •  Troubleshooting tips:
     –  Look at /tmp/${user}/hive.log
        •  Identifies exceptions during the load
     –  Look at /user/hive/warehouse
        •  View contents of the imported data




10
Agenda

 •  First Things First
 •  Common Problems
 •  MySQL
     –  Connection Failure
     –  Importing into Hive
 •  Oracle
     –  Case-Sensitive Catalog Query Errors
     –  Sqoop Export Failing
 •  Effective Sqoop Habits

11
Oracle: Case-Sensitive Catalog Query Errors
 INFO manager.OracleManager: Time zone has been set to GMT
 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
 INFO manager.SqlManager: Executing SQL statement:
 SELECT t.* FROM addlabel_pris t WHERE 1=0
 DEBUG manager.OracleManager$ConnCache: Caching
 released connection for jdbc:oracle:thin:
 ERROR sqoop.Sqoop: Got exception running Sqoop:
 java.lang.NullPointerException
 java.lang.NullPointerException
 at com.cloudera.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:148)
 at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:187)
 at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362)
 at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
 at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
 at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219)
 at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228)
 at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237)




12
Oracle: Case-Sensitive Catalog Query Errors

 •  Problem: NPE caused by using the wrong
    case for the user name and table name.
 •  Solution: Always specify the user and table
    names in upper case (unless it was
    created with mixed/lower case within
    quotes).




13
Oracle: Sqoop Export Failing
 INFO mapred.JobClient: Running job: job_201109231340_0785
 INFO mapred.JobClient: map 0% reduce 0%
 INFO mapred.JobClient: Task Id :
 attempt_201109231340_0785_m_000000_0, Status : FAILED
 java.lang.NullPointerException
 at
 com.cloudera.sqoop.mapreduce.db.DataDrivenDBRecordReader.getSe
 lectQuery(DataDrivenDBRecordReader.java:87)
 at com.cloudera.sqoop.mapreduce.db.DBRecordReader.nextKeyValue
 (DBRecordReader.java:225)
 at org.apache.hadoop.mapred.MapTask
 $NewTrackingRecordReader.nextKeyValue(MapTask.java:455)
 at org.apache.hadoop.mapreduce.MapContext.nextKeyValue
 (MapContext.java:67)




14
Oracle: Sqoop Export Failing
 •  Problem: IllegalArgumentException
    caused by not non-owner trying to connect
    to the table.
 •  Solution: Prefix the table name with the
    schema, for example
    SchemaName.OracleTableName.




15
Agenda

 •  First Things First
 •  Common Problems
 •  MySQL
     –  Connection Failure
     –  Importing into Hive
 •  Oracle
     –  Case-Sensitive Catalog Query Errors
     –  Sqoop Export Failing
 •  Effective Sqoop Habits

16
Effective Sqoop Habits

 •  Do create an empty export table.

 •  Don’t use the same table for both import
    and export.




17
Effective Sqoop Habits

 •  Do use --escaped-by option during import
    and --input-escaped-by during export.
 •  Do use fields-terminated-by during import
    and input-fields-terminated-by during
    export.

 •  Don’t reverse them.



18
Effective Sqoop Habits

 •  Do specify the direct mode option (--
    direct), if you use the direct connector.

 •  Don’t specify the query, if you use the
    direct connector.




19
How Do You Eat an Elephant?

 •  One bite at a time
     –  Versions
     –  Verbose flag
     –  Console log
     –  Exact command, etc

 •  Sqoop Troubleshooting Guide
     –  http://archive.cloudera.com/cdh/3/sqoop/
        SqoopUserGuide.html#_troubleshooting


20

More Related Content

What's hot

Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failingSandy Ryza
 
11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2Fabio Fumarola
 
Data analysis scala_spark
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_sparkYiguang Hu
 
Moving Data Between Exadata and Hadoop
Moving Data Between Exadata and HadoopMoving Data Between Exadata and Hadoop
Moving Data Between Exadata and HadoopEnkitec
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale Hakka Labs
 
Spark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronSpark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronDuyhai Doan
 
Apache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystemApache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystemDuyhai Doan
 
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...Lucidworks
 
Why your Spark Job is Failing
Why your Spark Job is FailingWhy your Spark Job is Failing
Why your Spark Job is FailingDataWorks Summit
 
Tanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata MigrationsTanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata MigrationsTanel Poder
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache HiveAvkash Chauhan
 
Using Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLCloudera, Inc.
 
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, HerokuPostgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, HerokuRedis Labs
 
Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder Oracle Scripts and Tools (2010)Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder Oracle Scripts and Tools (2010)Tanel Poder
 
How to build your query engine in spark
How to build your query engine in sparkHow to build your query engine in spark
How to build your query engine in sparkPeng Cheng
 
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 

What's hot (19)

Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failing
 
11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2
 
Data analysis scala_spark
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_spark
 
Moving Data Between Exadata and Hadoop
Moving Data Between Exadata and HadoopMoving Data Between Exadata and Hadoop
Moving Data Between Exadata and Hadoop
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale
 
Spark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronSpark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotron
 
Apache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystemApache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystem
 
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
 
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
 
Why your Spark Job is Failing
Why your Spark Job is FailingWhy your Spark Job is Failing
Why your Spark Job is Failing
 
Tanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata MigrationsTanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata Migrations
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 
Using Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETL
 
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, HerokuPostgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
 
Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder Oracle Scripts and Tools (2010)Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder Oracle Scripts and Tools (2010)
 
How to build your query engine in spark
How to build your query engine in sparkHow to build your query engine in spark
How to build your query engine in spark
 
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
 

Viewers also liked

Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use caseDavin Abraham
 
New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2DataWorks Summit
 
The Transporter 4 MySQL
The Transporter 4 MySQLThe Transporter 4 MySQL
The Transporter 4 MySQLSagar Nikam
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopCloudera, Inc.
 
Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop Guy Harrison
 
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2Cloudera, Inc.
 
Big data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and SqoopBig data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and SqoopJeyamariappan Guru
 
Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for HadoopLearning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for HadoopSomeshwar Kale
 
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaBridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaDataWorks Summit/Hadoop Summit
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionDataWorks Summit
 

Viewers also liked (10)

Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use case
 
New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2
 
The Transporter 4 MySQL
The Transporter 4 MySQLThe Transporter 4 MySQL
The Transporter 4 MySQL
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for Hadoop
 
Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop
 
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
 
Big data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and SqoopBig data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and Sqoop
 
Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for HadoopLearning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
 
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaBridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
 

Similar to Habits of Effective Sqoop Users

Getting started with Riak in the Cloud
Getting started with Riak in the CloudGetting started with Riak in the Cloud
Getting started with Riak in the CloudInes Sombra
 
Node object and roles - Fundamentals Webinar Series Part 3
Node object and roles - Fundamentals Webinar Series Part 3Node object and roles - Fundamentals Webinar Series Part 3
Node object and roles - Fundamentals Webinar Series Part 3Chef
 
Building an Impenetrable ZooKeeper - Kathleen Ting
Building an Impenetrable ZooKeeper - Kathleen TingBuilding an Impenetrable ZooKeeper - Kathleen Ting
Building an Impenetrable ZooKeeper - Kathleen Tingjaxconf
 
Writing Better Haskell
Writing Better HaskellWriting Better Haskell
Writing Better Haskellnkpart
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACKristofferson A
 
Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)andyhky
 
Building Out Your Kafka Developer CDC Ecosystem
Building Out Your Kafka Developer CDC  EcosystemBuilding Out Your Kafka Developer CDC  Ecosystem
Building Out Your Kafka Developer CDC Ecosystemconfluent
 
Velocity 2011 Chef OpenStack Workshop
Velocity 2011 Chef OpenStack WorkshopVelocity 2011 Chef OpenStack Workshop
Velocity 2011 Chef OpenStack WorkshopChef Software, Inc.
 
BTV PHP - Building Fast Websites
BTV PHP - Building Fast WebsitesBTV PHP - Building Fast Websites
BTV PHP - Building Fast WebsitesJonathan Klein
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for HadoopJoe Crobak
 
Host Health Monitoring with Docker Run
Host Health Monitoring with Docker RunHost Health Monitoring with Docker Run
Host Health Monitoring with Docker RunNoah Zoschke
 
Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...
Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...
Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...Nilesh Panchal
 
DrupalSouth 2015 - Performance: Not an Afterthought
DrupalSouth 2015 - Performance: Not an AfterthoughtDrupalSouth 2015 - Performance: Not an Afterthought
DrupalSouth 2015 - Performance: Not an AfterthoughtNick Santamaria
 
Kubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical ViewKubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical ViewLei (Harry) Zhang
 
Breaking the Monolith - Microservice Extraction at SoundCloud
Breaking the Monolith - Microservice Extraction at SoundCloudBreaking the Monolith - Microservice Extraction at SoundCloud
Breaking the Monolith - Microservice Extraction at SoundCloudJan Kischkel
 
TryStack: A Sandbox for OpenStack Users and Admins
TryStack: A Sandbox for OpenStack Users and AdminsTryStack: A Sandbox for OpenStack Users and Admins
TryStack: A Sandbox for OpenStack Users and AdminsAnne Gentle
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingGreat Wide Open
 
Building Spark as Service in Cloud
Building Spark as Service in CloudBuilding Spark as Service in Cloud
Building Spark as Service in CloudInMobi Technology
 
MariaDB: in-depth (hands on training in Seoul)
MariaDB: in-depth (hands on training in Seoul)MariaDB: in-depth (hands on training in Seoul)
MariaDB: in-depth (hands on training in Seoul)Colin Charles
 

Similar to Habits of Effective Sqoop Users (20)

Getting started with Riak in the Cloud
Getting started with Riak in the CloudGetting started with Riak in the Cloud
Getting started with Riak in the Cloud
 
Node object and roles - Fundamentals Webinar Series Part 3
Node object and roles - Fundamentals Webinar Series Part 3Node object and roles - Fundamentals Webinar Series Part 3
Node object and roles - Fundamentals Webinar Series Part 3
 
Building an Impenetrable ZooKeeper - Kathleen Ting
Building an Impenetrable ZooKeeper - Kathleen TingBuilding an Impenetrable ZooKeeper - Kathleen Ting
Building an Impenetrable ZooKeeper - Kathleen Ting
 
Writing Better Haskell
Writing Better HaskellWriting Better Haskell
Writing Better Haskell
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
 
Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)
 
Building Out Your Kafka Developer CDC Ecosystem
Building Out Your Kafka Developer CDC  EcosystemBuilding Out Your Kafka Developer CDC  Ecosystem
Building Out Your Kafka Developer CDC Ecosystem
 
Velocity 2011 Chef OpenStack Workshop
Velocity 2011 Chef OpenStack WorkshopVelocity 2011 Chef OpenStack Workshop
Velocity 2011 Chef OpenStack Workshop
 
BTV PHP - Building Fast Websites
BTV PHP - Building Fast WebsitesBTV PHP - Building Fast Websites
BTV PHP - Building Fast Websites
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
 
Host Health Monitoring with Docker Run
Host Health Monitoring with Docker RunHost Health Monitoring with Docker Run
Host Health Monitoring with Docker Run
 
Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...
Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...
Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...
 
DrupalSouth 2015 - Performance: Not an Afterthought
DrupalSouth 2015 - Performance: Not an AfterthoughtDrupalSouth 2015 - Performance: Not an Afterthought
DrupalSouth 2015 - Performance: Not an Afterthought
 
Kubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical ViewKubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical View
 
Breaking the Monolith - Microservice Extraction at SoundCloud
Breaking the Monolith - Microservice Extraction at SoundCloudBreaking the Monolith - Microservice Extraction at SoundCloud
Breaking the Monolith - Microservice Extraction at SoundCloud
 
TryStack: A Sandbox for OpenStack Users and Admins
TryStack: A Sandbox for OpenStack Users and AdminsTryStack: A Sandbox for OpenStack Users and Admins
TryStack: A Sandbox for OpenStack Users and Admins
 
Avoid boring work_v2
Avoid boring work_v2Avoid boring work_v2
Avoid boring work_v2
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed Debugging
 
Building Spark as Service in Cloud
Building Spark as Service in CloudBuilding Spark as Service in Cloud
Building Spark as Service in Cloud
 
MariaDB: in-depth (hands on training in Seoul)
MariaDB: in-depth (hands on training in Seoul)MariaDB: in-depth (hands on training in Seoul)
MariaDB: in-depth (hands on training in Seoul)
 

Recently uploaded

How to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in PakistanHow to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in Pakistandanishmna97
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTopCSSGallery
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxFIDO Alliance
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe中 央社
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...ScyllaDB
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Paige Cruz
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxFIDO Alliance
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...FIDO Alliance
 
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...SOFTTECHHUB
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfalexjohnson7307
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data SciencePaolo Missier
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...ScyllaDB
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxFIDO Alliance
 
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdfMuhammad Subhan
 
Cyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptx
Cyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptxCyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptx
Cyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptxMasterG
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 

Recently uploaded (20)

How to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in PakistanHow to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in Pakistan
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development Companies
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdf
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptx
 
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
 
Cyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptx
Cyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptxCyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptx
Cyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptx
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 

Habits of Effective Sqoop Users

  • 1. Habits of Effective Sqoop Users Kate Ting, Customer Operations Engineer kate@cloudera.com
  • 2. Halp! Sqoop doesn't work! Now what? 2
  • 3. Agenda •  First Things First •  Common Problems •  MySQL –  Connection Failure –  Importing into Hive •  Oracle –  Case-Sensitive Catalog Query Errors –  Sqoop Export Failing •  Effective Sqoop Habits 3
  • 4. Agenda •  First Things First •  Common Problems •  MySQL –  Connection Failure –  Importing into Hive •  Oracle –  Case-Sensitive Catalog Query Errors –  Sqoop Export Failing •  Effective Sqoop Habits 4
  • 5. First Things First Save time by providing this upfront: •  Versions: Sqoop, Hadoop, OS, JDBC •  Run with --verbose flag then attach log •  Sqoop command including options-file •  Expected output vs. actual output •  Table definition •  Input data set that triggers problem •  Hadoop task logs •  Check permissions on input files •  Divide and conquer –  e.g. Import that creates and populates a Hive table is failing •  First, do the import alone •  Second, create a Hive table without the import using the create-hive- table tool 5
  • 7. Agenda •  First Things First •  Common Problems •  MySQL –  Connection Failure –  Importing into Hive •  Oracle –  Case-Sensitive Catalog Query Errors –  Sqoop Export Failing •  Effective Sqoop Habits 7
  • 8. MySQL: Connection Failure java.lang.RuntimeException: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4 .CommunicationsException: Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. at com.cloudera.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:164 ) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:606) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.ja va:1127) at org.apache.hadoop.mapred.Child.main(Child.java:264) 8
  • 9. MySQL: Connection Failure •  Problem: Communications Link Failure caused by incorrect permissions. •  Solution: –  Verify that you can connect to the database from the node where you are running Sqoop: •  $ mysql --host=<IP Address> --database=test --user=<username> -- password=<password> –  Add the network port for the server to your my.cnf file –  Set up a user account to connect via Sqoop. Grant permissions to the user to access the database over the network: •  Log into MySQL as root mysql -u root -p<ThisIsMyPassword> •  Issue the following command: mysql> grant all privileges on test.* to 'testuser'@'%' identified by 'testpassword' 9
  • 10. MySQL: Importing into Hive •  Troubleshooting tips: –  Look at /tmp/${user}/hive.log •  Identifies exceptions during the load –  Look at /user/hive/warehouse •  View contents of the imported data 10
  • 11. Agenda •  First Things First •  Common Problems •  MySQL –  Connection Failure –  Importing into Hive •  Oracle –  Case-Sensitive Catalog Query Errors –  Sqoop Export Failing •  Effective Sqoop Habits 11
  • 12. Oracle: Case-Sensitive Catalog Query Errors INFO manager.OracleManager: Time zone has been set to GMT DEBUG manager.SqlManager: Using fetchSize for next query: 1000 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM addlabel_pris t WHERE 1=0 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin: ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException java.lang.NullPointerException at com.cloudera.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:148) at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:187) at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362) at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423) at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180) at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219) at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228) at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237) 12
  • 13. Oracle: Case-Sensitive Catalog Query Errors •  Problem: NPE caused by using the wrong case for the user name and table name. •  Solution: Always specify the user and table names in upper case (unless it was created with mixed/lower case within quotes). 13
  • 14. Oracle: Sqoop Export Failing INFO mapred.JobClient: Running job: job_201109231340_0785 INFO mapred.JobClient: map 0% reduce 0% INFO mapred.JobClient: Task Id : attempt_201109231340_0785_m_000000_0, Status : FAILED java.lang.NullPointerException at com.cloudera.sqoop.mapreduce.db.DataDrivenDBRecordReader.getSe lectQuery(DataDrivenDBRecordReader.java:87) at com.cloudera.sqoop.mapreduce.db.DBRecordReader.nextKeyValue (DBRecordReader.java:225) at org.apache.hadoop.mapred.MapTask $NewTrackingRecordReader.nextKeyValue(MapTask.java:455) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue (MapContext.java:67) 14
  • 15. Oracle: Sqoop Export Failing •  Problem: IllegalArgumentException caused by not non-owner trying to connect to the table. •  Solution: Prefix the table name with the schema, for example SchemaName.OracleTableName. 15
  • 16. Agenda •  First Things First •  Common Problems •  MySQL –  Connection Failure –  Importing into Hive •  Oracle –  Case-Sensitive Catalog Query Errors –  Sqoop Export Failing •  Effective Sqoop Habits 16
  • 17. Effective Sqoop Habits •  Do create an empty export table. •  Don’t use the same table for both import and export. 17
  • 18. Effective Sqoop Habits •  Do use --escaped-by option during import and --input-escaped-by during export. •  Do use fields-terminated-by during import and input-fields-terminated-by during export. •  Don’t reverse them. 18
  • 19. Effective Sqoop Habits •  Do specify the direct mode option (-- direct), if you use the direct connector. •  Don’t specify the query, if you use the direct connector. 19
  • 20. How Do You Eat an Elephant? •  One bite at a time –  Versions –  Verbose flag –  Console log –  Exact command, etc •  Sqoop Troubleshooting Guide –  http://archive.cloudera.com/cdh/3/sqoop/ SqoopUserGuide.html#_troubleshooting 20

Editor's Notes

  1. Sqoopdoes not guarantee intuitive error messages. But I guarantee that in the next ten minutes you will either learn or be reminded of a fewtips to make your next debugging session more effective.
  2. Specifying the query bypasses the direct connector