Steve Totman's presentation from the Big Data Warehouse HUG with comScore in NYC on Sept 23rd, covering Syncsort's contributions to Hadoop, smarter ETL on Hadoop, and more.
3. Why Syncsort?
• 50% of all mainframes run Syncsort
• 1,500 Mainframe Customers: Most used & trusted 3rd party mainframe software
• Speed leader for ETL & Sort
• A history of innovation
• 25+ Issued & Pending Patents
• Large global customer base
• 15,000+ deployments in 68 countries
• First-to-market, fully integrated approach to Hadoop ETL
For 40 years we have been helping companies solve their big data issues… even before they knew the name Big Data!
Our customers are achieving the impossible, every day!
Integrating Big Data… Smarter!
Key Technology Partners
6. Smart Contributions to Improve Hadoop
JIRA   Description
4807   Allow MapOutputBuffer to be pluggable
4808   Allow Reduce-side merge to be pluggable
4809   Make classes required for 2454 public
4812   Create reduce input merger plug-in
…and more!!
4842   Shuffle race can hang reducer
2461   HDFS file name globbing in libhdfs
4482   Backport of 2454 to MapReduce 1 & 1.2
Native Sort:
✗ Not modular
✗ Limited capabilities
✗ Difficult to fine-tune & configure (requires coding & compilation)
[Diagram: two Hadoop nodes, each labeled Native Sort]
Contribution:
Modular
Extensible
Configurable through use of external sorters on MapReduce nodes
[Diagram: two Hadoop nodes, each labeled Native Sort]
+1 Committed Jan 22nd 2013
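To make the "modular, extensible, configurable" idea on this slide concrete, here is a minimal Java sketch of a sort stage whose implementation is chosen by class name from configuration rather than compiled in. It is a toy model only: the `Sorter` interface, the `sketch.sorter.class` key, and both sorter classes are hypothetical names invented for illustration, and the sketch does not reproduce Hadoop's actual plug-in interfaces or property names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a pluggable sort stage. None of these names are Hadoop APIs. */
final class PluggableSortSketch {

    /** Hypothetical configuration key naming the sorter implementation. */
    static final String SORTER_KEY = "sketch.sorter.class";

    /** Hypothetical plug-in contract: return the keys in ascending order. */
    interface Sorter {
        List<String> sort(List<String> keys);
    }

    /** Stands in for the framework's built-in sort. */
    static final class BuiltInSorter implements Sorter {
        @Override
        public List<String> sort(List<String> keys) {
            List<String> out = new ArrayList<>(keys);
            Collections.sort(out);
            return out;
        }
    }

    /** Stands in for an external sorter: same result, different algorithm. */
    static final class ExternalSorter implements Sorter {
        @Override
        public List<String> sort(List<String> keys) {
            String[] out = keys.toArray(new String[0]);
            Arrays.parallelSort(out);
            return new ArrayList<>(Arrays.asList(out));
        }
    }

    /** Instantiates the configured sorter by class name, or the built-in one. */
    static Sorter loadSorter(Map<String, String> conf) {
        String name = conf.getOrDefault(SORTER_KEY, BuiltInSorter.class.getName());
        try {
            return Class.forName(name)
                    .asSubclass(Sorter.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load sorter " + name, e);
        }
    }

    public static void main(String[] args) {
        // Swapping the sorter is a configuration change, not a code change.
        Map<String, String> conf = new HashMap<>();
        conf.put(SORTER_KEY, ExternalSorter.class.getName());
        Sorter sorter = loadSorter(conf);
        System.out.println(sorter.getClass().getSimpleName()
                + " -> " + sorter.sort(Arrays.asList("pear", "apple", "fig")));
    }
}
```

With an empty configuration the sketch falls back to the built-in sorter, which mirrors the point of the slide: the default keeps working, while an alternative can be dropped in without recompiling the framework.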
7. Benefits to the Community
[Chart: TeraSort Benchmark. Elapsed time (min, 0 to 250) vs. file size (GB, 0 to 5000). Series: Native Sort, Syncsort]
JOIN
MERGE
AGGREGATION
CDC
COMPRESSION
LOOKUP
RANK
MATCH
8. The Smarter Approach to Hadoop ETL… and Mainframe
Connect – One tool to connect all your data
Translate – Best-in-class mainframe data access with seamless data translation & COBOL Copybooks support
Process – Hadoop ETL without coding. Develop, test & debug locally in Windows; deploy on Hadoop
PLUS…
Enterprise-grade security
Smarter deployment, monitoring & administration
Disruptive cost structure
Decades of Mainframe expertise
9. A Smarter Approach… and Quite Possibly The Only Approach!
Bridge the Gap Between Big Iron & Big Data!
Test drive dmx-h: www.syncsort.com/try
• Self-contained image
• Use case accelerators for mainframe, Hadoop and more!
Running on CDH & HDP
10. What Next?
Date: Wednesday, September 25, 2013
Time: 9:00 AM PT | 12:00 PM ET
Register & Attend Webinar for a Free T-Shirt