Big Data Warehousing
Meetup - April 8, 2014
Building a Big Data Warehouse
on the Cloud in 30 Minutes
Sponsored By:
Agenda
7:00 – 7:15: Networking (15 min)
Grab some food and drink… make some friends.
7:15 – 7:35: Bob Eilbacher (20 min), VP Sales, Caserta Concepts
Welcome + Intro: about the Meetup, about Caserta Concepts + Swag
7:35 – 8:20: Elliott Cordo (45 min), Chief Architect, Caserta Concepts
Building a Big Data Warehouse on the Cloud: live demo of Amazon Web Services S3, EMR, and Redshift
8:20 – 8:40: Ben Sgro (20 min), Sr. Software Engineer, Simulmedia
Implementing Redis on the Cloud: an ultra-low-latency customer segmentation tool with AWS ElastiCache
8:40 – 9:00: Q&A (10 min), More Networking (10 min)
Tell us what you’re up to…
Gathering music brought to you by…
BIG DATA: a paranoid electronic music project from the Internet, formed out of a general distrust of technology and The Cloud (despite a growing dependence on them).
bigdata.fm
About the BDW Meetup
• Big Data is a complex, rapidly changing landscape
• We want to share our stories and hear about yours
• Great networking opportunity for like-minded data nerds
• Opportunities to collaborate on exciting projects
• Founded by Caserta Concepts
  • Big Data Analytics, DW, BI Consulting
A BDW Meetup Milestone
Next BDW Meetup
Real-world Data Science w/Claudia Perlich
• Date: Tuesday, May 27, 2014, 7:00 PM
• Location: New Work City, Broadway & Canal
• Sponsor: Revolution Analytics
Caserta Concepts
• Technology innovation company with expertise in:
  • Big Data Solutions
  • Data Warehousing
  • Business Intelligence
• Core focus in the following industries:
  • eCommerce / Retail / Marketing
  • Financial Services / Insurance
  • Healthcare / Digital Media
• Established in 2001:
  • Increased growth year-over-year
  • Industry-recognized workforce
  • Consulting, Writing, Education
Innovation & Implementation: Data Science & Analytics, Data on the Cloud, Data Interaction & Visualization
Listed as one of the Top 20 Most Promising Data Analytics Consulting Companies
CIOReview looked at hundreds of data analytics consulting companies and shortlisted the ones at the forefront of tackling real analytics challenges. A distinguished panel comprising CEOs, CIOs, VCs, industry analysts, and the editorial board of CIOReview selected the final 20.
Expertise & Offerings
• Strategic Roadmap / Assessment / Education / Implementation
• Data Warehousing / ETL / Data Integration
• BI / Visualization / Analytics
• Big Data Analytics
Caserta Partners
• Hadoop Distributions
• Platforms / ETL
• Analytics & BI
Client Portfolio
• Finance, Healthcare & Insurance
• Retail/eCommerce & Manufacturing
• Education & Services
Join Our Network
Does this word cloud excite you? (Storm, Big Data Architect, HBase, Cassandra)
Speak with us about our open positions: leslie@casertaconcepts.com
SWAG
“Big Data is like water. There is little point in debating how much there is. It’s the flow and use that matters.”
@dominiek, Gigaom Structure Data, 3/20/2014 (#gigaomlive)
BUILDING A BIG DATA WAREHOUSE IN THE CLOUD IN 30 MIN
Elliott Cordo
Chief Architect, Caserta Concepts
What is a Big Data Warehouse?
• An enterprise system providing reliable ad-hoc analytics, reporting, and decision support
• Large scale: Big Data
• Not confined to the traditional dimensional model
Big Data Warehouse
• Data governance is still important!
  • Data quality
  • Metadata: naming, lineage, etc.
Data cannot be governed until it is structured.
Big Data
Warehouse
Data Science
Workspace
Data Lake – Integrated Sandbox
Landing – Source Data in “Full Fidelity”
Cloud
• Infrastructure is not fun
  • Server procurement takes months
  • Inability to handle growth
  • Servers idling all day doing nothing
• Cloud to the rescue
  • Unlimited cheap storage
  • Provision new servers in minutes
  • Use of elastic services! (e.g., EMR)
  • AWESOME for prototypes and POCs
About our sample data
• Consumer Yelp ratings
• Generated from a Kaggle dataset: 100 million rows
• The model is a star schema that looks something like this:
  • Fact: f_reviews
  • Dimensions: d_date, d_business, d_user
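The deck does not say how the 100 million rows were produced from the Kaggle data. As a minimal sketch, assuming a hypothetical approach of resampling a small seed of ratings and drawing foreign keys uniformly from each dimension, the generator below emits f_reviews fact rows; the column names (`date_key`, `business_id`, `user_id`, `stars`) are assumptions, since the slide only names the tables.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterator, List


@dataclass(frozen=True)
class ReviewFact:
    """One f_reviews row. Column names are hypothetical; the deck names only the tables."""
    date_key: int     # FK to d_date, encoded as YYYYMMDD
    business_id: int  # FK to d_business
    user_id: int      # FK to d_user
    stars: int        # rating, 1-5


def to_date_key(d: date) -> int:
    """Encode a calendar date as a YYYYMMDD integer key for d_date."""
    return d.year * 10000 + d.month * 100 + d.day


def generate_reviews(seed_stars: List[int], n_rows: int, n_users: int,
                     n_businesses: int, start: date, n_days: int,
                     rng_seed: int = 42) -> Iterator[ReviewFact]:
    """Scale a small seed sample up to n_rows synthetic fact rows.

    Ratings are resampled from seed_stars, so the seed's rating distribution is
    roughly preserved; foreign keys are drawn uniformly from each dimension.
    Rows are yielded lazily, so memory stays flat even for 100 million rows.
    """
    rng = random.Random(rng_seed)  # fixed seed -> reproducible output
    for _ in range(n_rows):
        day = start + timedelta(days=rng.randrange(n_days))
        yield ReviewFact(
            date_key=to_date_key(day),
            business_id=rng.randrange(1, n_businesses + 1),
            user_id=rng.randrange(1, n_users + 1),
            stars=rng.choice(seed_stars),
        )
```

For example, `generate_reviews([5, 4, 4, 3, 1], 1000, 100, 20, date(2014, 1, 1), 90)` yields 1,000 rows spread over the first 90 days of 2014. Because it is a generator, it can stream rows straight to load files without holding them all in memory.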
So let’s get cooking
1. Create an EMR cluster (on-demand Hadoop)
2. Provision a Redshift cluster (data warehouse)
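The demo performs these two steps live in the AWS console. A scripted equivalent is sketched below with boto3, the current AWS SDK for Python (the 2014 demo era used the older Boto library). The request builders return the keyword arguments for the EMR client's `run_job_flow()` and the Redshift client's `create_cluster()`; the cluster names, instance and node types, EMR release label, and database name are assumptions chosen for illustration.

```python
from typing import Any, Dict


def emr_cluster_request(name: str, log_uri: str, core_nodes: int = 2,
                        instance_type: str = "m5.xlarge",
                        keep_alive: bool = True) -> Dict[str, Any]:
    """Keyword arguments for boto3's EMR client run_job_flow() (step 1)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumption: any current EMR release
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}, {"Name": "Pig"}],
        "LogUri": log_uri,
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": instance_type,
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": instance_type,
                 "InstanceCount": core_nodes},
            ],
            # False = transient cluster that shuts down after its steps finish.
            "KeepJobFlowAliveWhenNoSteps": keep_alive,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default roles, created once with
        "ServiceRole": "EMR_DefaultRole",      # `aws emr create-default-roles`
    }


def redshift_cluster_request(cluster_id: str, db_name: str, master_user: str,
                             master_password: str, nodes: int = 2,
                             node_type: str = "dc2.large") -> Dict[str, Any]:
    """Keyword arguments for boto3's Redshift client create_cluster() (step 2)."""
    request: Dict[str, Any] = {
        "ClusterIdentifier": cluster_id,
        "DBName": db_name,
        "NodeType": node_type,
        "MasterUsername": master_user,
        "MasterUserPassword": master_password,
    }
    if nodes > 1:
        request.update(ClusterType="multi-node", NumberOfNodes=nodes)
    else:
        request["ClusterType"] = "single-node"
    return request


def provision(region: str, log_uri: str, redshift_password: str) -> Dict[str, str]:
    """Launch both clusters and return their identifiers (makes real AWS calls)."""
    import boto3  # imported here so the request builders work without the SDK

    emr = boto3.client("emr", region_name=region)
    redshift = boto3.client("redshift", region_name=region)
    job = emr.run_job_flow(**emr_cluster_request("bdw-demo", log_uri))
    cluster = redshift.create_cluster(
        **redshift_cluster_request("bdw-demo", "bdw", "awsuser", redshift_password))
    return {"emr": job["JobFlowId"],
            "redshift": cluster["Cluster"]["ClusterIdentifier"]}
```

Keeping the request construction separate from the API calls makes the cluster definitions easy to review and test offline. The password should come from a secrets store or environment variable rather than source code.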
Redshift
• Massively Parallel Processing (MPP)
• Columnar DBs that present themselves as relational
• MPPs grew up in parallel with Hadoop
  • Impala and HAWQ are MPPs themselves!
• OEM of Actian Matrix (formerly ParAccel)
• A modern MPP: clean, reliable, SCHEMA AGNOSTIC
Redshift is cheap? Make that inexpensive.
Enterprise-grade EDW at $1,000/TB per year
MPP Design Considerations
• JOINS
  • Shuffle – data is large and is redistributed by key across servers
  • Broadcast – data is small and is copied to all servers
  • Collocated – all data needed for the join is already on the same server
• Design considerations for MPP
  • Distribution key
    • Collocated joins
    • Even distribution of work across the cluster
    • A customer key will usually work well
  • Sort key
    • Fastest scan operations
    • The primary date field is usually best
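To make the distribution-key trade-offs concrete, here is a toy simulation, not Redshift's actual planner: rows are hashed on a distribution key into slices, `skew` measures how evenly work is spread, `local_join_count` shows that two tables distributed on the join key can be joined without moving data, and `join_strategy` is a simplified rule of thumb for choosing among collocated, broadcast, and shuffle joins. The CRC32 hash, the slice count, and the one-million-row broadcast threshold are assumptions; sort keys are not modeled.

```python
import zlib
from collections import Counter
from typing import Dict, List

Row = Dict[str, object]


def slice_for(key: object, n_slices: int) -> int:
    """Map a distribution-key value to a slice using a stable (CRC32) hash."""
    return zlib.crc32(repr(key).encode("utf-8")) % n_slices


def distribute(rows: List[Row], dist_key: str, n_slices: int) -> List[List[Row]]:
    """KEY distribution: rows with the same key value always land on one slice."""
    slices: List[List[Row]] = [[] for _ in range(n_slices)]
    for row in rows:
        slices[slice_for(row[dist_key], n_slices)].append(row)
    return slices


def skew(slices: List[List[Row]]) -> float:
    """Largest slice divided by the average slice; 1.0 means perfectly even work."""
    sizes = [len(s) for s in slices]
    average = sum(sizes) / len(sizes)
    return max(sizes) / average if average else 1.0


def local_join_count(left: List[List[Row]], right: List[List[Row]], key: str) -> int:
    """Count join matches using only rows that sit on the same slice (no network)."""
    total = 0
    for left_slice, right_slice in zip(left, right):
        right_keys = Counter(row[key] for row in right_slice)
        total += sum(right_keys[row[key]] for row in left_slice)
    return total


def join_strategy(left_dist_key: str, right_dist_key: str, join_key: str,
                  right_rows: int, broadcast_limit: int = 1_000_000) -> str:
    """Toy planner rule: prefer collocated, then broadcast, else shuffle."""
    if left_dist_key == join_key and right_dist_key == join_key:
        return "collocated"
    if right_rows <= broadcast_limit:
        return "broadcast"
    return "shuffle"
```

Distributing both f_reviews and d_user on the user key makes every matching pair share a slice, so the join needs no data movement. Distributing on a low-cardinality or heavily repeated value does the opposite: all rows pile onto one slice and `skew` approaches the slice count.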
ETL – Transform your data
• S3 is the ultimate staging ground
• Use EMR for the heavy lifting:
  • Run your ETL program and terminate the cluster when done!
  • Pay only for processing.
  • Pig, native MapReduce, Hadoop streaming
• For the right use case, Hive or Impala can be used for ETL too (mainly for aggregates and summaries)
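Hadoop streaming lets any executable that reads stdin and writes stdout act as a mapper or reducer. A minimal sketch of such a job in Python, assuming pipe-delimited review records in the hypothetical layout `date_key|business_id|user_id|stars`, counts reviews and averages stars per business; the script name `reviews_by_business.py` is also hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit business_id<TAB>stars for each pipe-delimited review record."""
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) != 4:
            continue  # skip malformed records instead of failing the job
        _date_key, business_id, _user_id, stars = fields
        yield f"{business_id}\t{stars}"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Input arrives sorted by key, so consecutive lines share a business_id."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for business_id, group in groupby(pairs, key=lambda kv: kv[0]):
        stars = [int(value) for _, value in group]
        yield f"{business_id}\t{len(stars)}\t{sum(stars) / len(stars):.2f}"


if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

The same script can be tested locally with `cat reviews.txt | python3 reviews_by_business.py map | sort | python3 reviews_by_business.py reduce`, where `sort` stands in for Hadoop's shuffle-and-sort phase. On a cluster it is passed to the streaming jar with `-files`, `-mapper "python3 reviews_by_business.py map"`, `-reducer "python3 reviews_by_business.py reduce"`, `-input`, and `-output`.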
Smaller data: don’t need EMR?
• Python ETL on EC2 (on demand)
  • Can later “graduate” to big data using Hadoop streaming
• Your favorite ETL tool is just fine too
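For the smaller-data path, a plain Python ETL on an EC2 instance can clean raw rows and write load-ready files. The sketch below assumes a hypothetical CSV input with `date`, `business_id`, `user_id`, and `stars` columns and hypothetical cleansing rules; it writes several gzipped, pipe-delimited files so that a later Redshift COPY can read them in parallel.

```python
import gzip
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def clean_row(row: Dict[str, str]) -> Optional[List[str]]:
    """Hypothetical cleansing rules: validate the 1-5 rating, turn an ISO date
    into a YYYYMMDD key, and strip characters that would break a '|' file."""
    try:
        stars = int(row["stars"])
        fields = [row["date"].replace("-", ""), row["business_id"], row["user_id"]]
    except (KeyError, ValueError, TypeError, AttributeError):
        return None  # reject rows with missing or malformed values
    if not 1 <= stars <= 5:
        return None
    safe = [f.strip().replace("|", " ").replace("\r", " ").replace("\n", " ")
            for f in fields]
    return safe + [str(stars)]


def write_load_files(rows: Iterable[Dict[str, str]], out_dir: Path,
                     n_files: int) -> List[Path]:
    """Round-robin clean rows into n_files gzipped, pipe-delimited files.

    Several similarly sized files let Redshift's COPY load them in parallel.
    """
    paths = [out_dir / f"f_reviews.{i:04d}.gz" for i in range(n_files)]
    handles = [gzip.open(p, "wt", encoding="utf-8") for p in paths]
    try:
        cleaned = (r for r in map(clean_row, rows) if r is not None)
        for i, fields in enumerate(cleaned):
            handles[i % n_files].write("|".join(fields) + "\n")
    finally:
        for handle in handles:
            handle.close()
    return paths
```

A typical call reads the source with the standard library, e.g. `write_load_files(csv.DictReader(open("reviews.csv", newline="")), Path("out"), 8)`, and then uploads the output directory to S3. Because the functions work on iterables of lines and rows, the same logic can later move into a Hadoop streaming mapper.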
Presentation Layer – Data Warehouse
How do you get your ETL data in?
• Hadoop distcp: high-performance transfer of data from S3 to HDFS
• Distributed COPY from S3 to Redshift
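A Redshift COPY pointed at an S3 prefix loads every file under that prefix, spreading the files across the cluster's slices, which is what makes it a distributed load. The helper below is a minimal sketch that builds such a statement for gzipped, pipe-delimited files; the bucket, table, and IAM role ARN in the usage are hypothetical. It uses the `IAM_ROLE` clause accepted by current Redshift versions (older examples pass access keys via `CREDENTIALS` instead), and the table name is assumed to come from trusted configuration, not user input.

```python
def copy_sql(table: str, s3_prefix: str, iam_role_arn: str,
             delimiter: str = "|", gzipped: bool = True) -> str:
    """Build a Redshift COPY that loads every file under s3_prefix.

    Redshift splits the files across slices and loads them in parallel, which
    is why the source should be many similarly sized files, not one big one.
    """
    if not s3_prefix.startswith("s3://"):
        raise ValueError("COPY source must be an s3:// URI")
    options = [f"IAM_ROLE '{iam_role_arn}'", f"DELIMITER '{delimiter}'"]
    if gzipped:
        options.append("GZIP")
    return f"COPY {table}\nFROM '{s3_prefix}'\n" + "\n".join(options) + ";"
```

For example, `copy_sql("f_reviews", "s3://my-bdw-bucket/f_reviews/", "arn:aws:iam::123456789012:role/RedshiftCopy")` targets every file under that prefix. The HDFS path is analogous: `hadoop distcp s3://my-bdw-bucket/f_reviews/ hdfs:///staging/f_reviews/` copies the same files into the cluster with a parallel MapReduce job.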
And how do you orchestrate all of this?
• AWS Data Pipeline
• AWS CLI
• Build a driver program using modules like Boto (Python)
• Cron or an external scheduler
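A driver program in the spirit of the last two bullets might look like the sketch below, written against boto3 rather than the original Boto. It submits a Hadoop streaming step to a running EMR cluster through `command-runner.jar`, waits for it with the `step_complete` waiter, runs the Redshift load statements, and always terminates the cluster afterwards. The step name and the `run_sql` callable (for example, a DB-API cursor's `execute` method) are assumptions; a scheduler such as cron would simply invoke this script.

```python
from typing import Any, Callable, Dict, List


def streaming_step(name: str, args: List[str]) -> Dict[str, Any]:
    """An EMR step that runs `hadoop-streaming` through command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # the driver decides what to do on failure
        "HadoopJarStep": {"Jar": "command-runner.jar",
                          "Args": ["hadoop-streaming", *args]},
    }


def run_pipeline(emr: Any, run_sql: Callable[[str], None], cluster_id: str,
                 streaming_args: List[str], copy_statements: List[str]) -> None:
    """Transform on EMR, load into Redshift, then terminate the cluster.

    `emr` is a boto3 EMR client; `run_sql` is any callable that executes one SQL
    statement against Redshift (for example, a DB-API cursor's execute method).
    """
    try:
        response = emr.add_job_flow_steps(
            JobFlowId=cluster_id,
            Steps=[streaming_step("transform-reviews", streaming_args)])
        step_id = response["StepIds"][0]
        # Blocks until the step finishes; raises a WaiterError if the step fails.
        emr.get_waiter("step_complete").wait(ClusterId=cluster_id, StepId=step_id)
        for statement in copy_statements:
            run_sql(statement)
    finally:
        # "Run your ETL program and kill it when done": pay only for processing.
        emr.terminate_job_flows(JobFlowIds=[cluster_id])
```

Passing the clients in, rather than creating them inside the function, keeps the orchestration logic testable with stand-in objects; in production the call would be something like `run_pipeline(boto3.client("emr"), cursor.execute, cluster_id, args, statements)`, followed by a commit on the Redshift connection.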
Back to AWS
1. Apply Redshift DDL and load tables
2. Run some queries
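The DDL itself is not shown in the deck. The sketch below renders Redshift `CREATE TABLE` statements that apply the design rules from the MPP slide: the fact table and d_user share a distribution key (standing in for the "customer" key) so that join is collocated, the fact table is sorted on its date key, and the smaller dimensions use `DISTSTYLE ALL` so every node holds a copy. All column names and types, the assumption that d_business is small enough to replicate, and the sample query are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TableSpec:
    name: str
    columns: List[Tuple[str, str]]  # (column name, Redshift type)
    dist_key: Optional[str] = None  # None -> DISTSTYLE ALL (copy on every node)
    sort_key: Optional[str] = None


def create_table_sql(table: TableSpec) -> str:
    """Render CREATE TABLE with Redshift distribution and sort-key attributes."""
    columns = ",\n  ".join(f"{col} {typ}" for col, typ in table.columns)
    dist = (f"DISTSTYLE KEY DISTKEY ({table.dist_key})" if table.dist_key
            else "DISTSTYLE ALL")
    sort = f" SORTKEY ({table.sort_key})" if table.sort_key else ""
    return f"CREATE TABLE {table.name} (\n  {columns}\n)\n{dist}{sort};"


# Hypothetical columns for the deck's star schema.
SCHEMA = [
    TableSpec("d_date", [("date_key", "INTEGER"), ("full_date", "DATE"),
                         ("cal_year", "SMALLINT"), ("cal_month", "SMALLINT")],
              sort_key="date_key"),
    TableSpec("d_business", [("business_id", "INTEGER"),
                             ("business_name", "VARCHAR(256)"),
                             ("city", "VARCHAR(64)")]),
    TableSpec("d_user", [("user_id", "INTEGER"), ("user_name", "VARCHAR(128)")],
              dist_key="user_id", sort_key="user_id"),
    TableSpec("f_reviews", [("date_key", "INTEGER"), ("business_id", "INTEGER"),
                            ("user_id", "INTEGER"), ("stars", "SMALLINT")],
              dist_key="user_id", sort_key="date_key"),
]

# A sample "run some queries" statement: review volume and rating by city and year.
REVIEWS_BY_CITY_SQL = """
SELECT b.city, d.cal_year, COUNT(*) AS reviews, AVG(f.stars::FLOAT) AS avg_stars
FROM f_reviews f
JOIN d_business b ON b.business_id = f.business_id
JOIN d_date d ON d.date_key = f.date_key
GROUP BY b.city, d.cal_year
ORDER BY reviews DESC
LIMIT 10;
"""
```

Executing each generated statement, then a COPY per table, then the query completes the demo's two steps. If d_business turned out to be large, giving it its own key distribution (or accepting a redistributed join) would be the alternative.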
elliott@casertaconcepts.com

Building a New Platform for Customer Analytics Caserta
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?Caserta
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation Caserta
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for EveryoneCaserta
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure CloudCaserta
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the CloudCaserta
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by DatabricksCaserta
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 

More from Caserta (20)

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven Marketing
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's Enterprise
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 

Recently uploaded

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Recently uploaded (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Build a Big Data Warehouse on the Cloud in 30 Minutes

  • 1. Big Data Warehousing Meetup - April 8, 2014 Building a Big Data Warehouse on the Cloud in 30 Minutes Sponsored By:
  • 2. • 7:00 – 7:15 Networking (15 min): Grab some food and drink... Make some friends. • 7:15 – 7:35 Bob Eilbacher (20 min), VP Sales, Caserta Concepts: Welcome + Intro. About the Meetup, about Caserta Concepts + Swag • 7:35 – 8:20 Elliott Cordo (45 min), Chief Architect, Caserta Concepts: Building a Big Data Warehouse on the Cloud. Live demo of Amazon Web Services: S3, EMR, and Redshift • 8:20 – 8:40 Ben Sgro (20 min), Sr. Software Engineer, Simulmedia: Implementing Redis on the Cloud. An ultra-low-latency customer segmentation tool with AWS ElastiCache • 8:40 – 9:00 Q&A (10 min), More Networking (10 min): Tell us what you're up to… Agenda
  • 3. Gathering music brought to you by… BIG DATA, a paranoid electronic music project from the Internet, formed out of a general distrust of technology and The Cloud (despite a growing dependence on them). bigdata.fm
  • 4. • Big Data is a complex, rapidly changing landscape • We want to share our stories and hear about yours • Great networking opportunity for like-minded data nerds • Opportunities to collaborate on exciting projects • Founded by Caserta Concepts • Big Data Analytics, DW, BI Consulting About the BDW Meetup
  • 5. A BDW Meetup Milestone
  • 6. Real-world Data Science w/Claudia Perlich • Date: • Tuesday May 27, 2014, 7:00 PM • Location: • New Work City, Broadway & Canal • Sponsor: • Revolution Analytics Next BDW Meetup
  • 7. Caserta Concepts • Technology innovation company with expertise in: • Big Data Solutions • Data Warehousing • Business Intelligence • Core focus in the following industries: • eCommerce / Retail / Marketing • Financial Services / Insurance • Healthcare / Digital Media • Established in 2001: • Increased growth year-over-year • Industry-recognized workforce • Consulting, Writing, Education • Data Science & Analytics • Data on the Cloud • Data Interaction & Visualization
  • 8. Innovation & Implementation: Listed as one of the Top 20 Most Promising Data Analytics Consulting Companies. CIOReview looked at hundreds of data analytics consulting companies and shortlisted the ones at the forefront of tackling real analytics challenges. A distinguished panel comprising CEOs, CIOs, VCs, industry analysts, and the editorial board of CIOReview selected the final 20.
  • 9. Expertise & Offerings • Strategic Roadmap / Assessment / Education / Implementation • Data Warehousing / ETL / Data Integration • BI / Visualization / Analytics • Big Data Analytics
  • 11. Client Portfolio • Finance, Healthcare & Insurance • Retail/eCommerce & Manufacturing • Education & Services
  • 12. Does this word cloud excite you? [Word cloud includes: Storm, Big Data Architect, HBase, Cassandra] Speak with us about our open positions: leslie@casertaconcepts.com Join Our Network
  • 13. SWAG
  • 14. Big Data is like water. There is little point in debating how much there is. It’s the flow and use that matters. #gigaomlive @dominiek 3/20/2014 Gigaom Structure Data
  • 15. BUILDING A BIG DATA WAREHOUSE IN THE CLOUD IN 30 MIN Elliott Cordo, Chief Architect, Caserta Concepts
  • 16. What is a Big Data Warehouse? • An enterprise system providing reliable ad hoc analytics, reporting, and decision support • Large Scale – Big Data • Not confined to the traditional dimensional model
  • 17. Big Data Warehouse • Data governance is still important! • Data Quality • Metadata: Naming, Lineage, etc. • Data cannot be governed until it is structured [Diagram, top to bottom: Big Data Warehouse • Data Science Workspace • Data Lake – Integrated Sandbox • Landing – Source Data in "Full Fidelity"]
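The governance point (data cannot be governed until it is structured) can be made concrete with a quality gate between the Data Lake and the Big Data Warehouse layers. This is a minimal Python illustration, not the presenter's method. The field names in REQUIRED_FIELDS, the 1-5 star range, and the validate/promote helpers are all hypothetical.

```python
from datetime import date

# Hypothetical structure a review record must have before leaving the data lake.
REQUIRED_FIELDS = {"review_id": str, "business_id": str, "user_id": str,
                   "stars": int, "review_date": str}

def validate(record: dict) -> list[str]:
    """Return a list of data-quality problems; an empty list means 'promotable'."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    if isinstance(record.get("stars"), int) and not 1 <= record["stars"] <= 5:
        problems.append("stars out of range 1-5")
    if isinstance(record.get("review_date"), str):
        try:
            date.fromisoformat(record["review_date"])
        except ValueError:
            problems.append("review_date is not ISO-8601")
    return problems

def promote(records):
    """Split lake records into warehouse-ready rows and rejects (with reasons)."""
    accepted, rejected = [], []
    for rec in records:
        issues = validate(rec)
        if issues:
            rejected.append((rec, issues))
        else:
            accepted.append(rec)
    return accepted, rejected
```

Keeping the rejects together with their reasons, rather than silently dropping them, is what lets data quality be reported on and governed.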
  • 18. Cloud • Infrastructure is not fun • Months-long server procurement • Inability to handle growth • Servers idling all day doing nothing • Cloud to the rescue • Unlimited cheap storage • Provision new servers in minutes • Use of elastic services! → EMR • AWESOME for prototypes and POCs
  • 19. About our sample data • Consumer Yelp ratings • Generated from a Kaggle dataset → 100 million rows • Model looks something like this: fact table f_reviews with dimensions d_date, d_business, d_user
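To make the f_reviews / d_date / d_business / d_user model concrete, the sketch below builds a toy version of the star schema in an in-memory SQLite database using Python's standard sqlite3 module, then runs a typical star join. SQLite is only a local stand-in for Redshift. The column names and the handful of sample rows are invented for illustration and are not taken from the demo's dataset.

```python
import sqlite3

def build_demo_star(conn):
    """Create and populate a tiny, hypothetical version of the review star schema."""
    conn.executescript("""
        CREATE TABLE d_date     (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER);
        CREATE TABLE d_business (business_key INTEGER PRIMARY KEY, business_name TEXT);
        CREATE TABLE d_user     (user_key INTEGER PRIMARY KEY, user_name TEXT);
        CREATE TABLE f_reviews  (
            date_key     INTEGER REFERENCES d_date(date_key),
            business_key INTEGER REFERENCES d_business(business_key),
            user_key     INTEGER REFERENCES d_user(user_key),
            stars        INTEGER
        );
    """)
    conn.executemany("INSERT INTO d_date VALUES (?, ?, ?)",
                     [(20140101, "2014-01-01", 2014), (20130615, "2013-06-15", 2013)])
    conn.executemany("INSERT INTO d_business VALUES (?, ?)",
                     [(1, "Business A"), (2, "Business B")])
    conn.executemany("INSERT INTO d_user VALUES (?, ?)", [(10, "user10"), (11, "user11")])
    conn.executemany("INSERT INTO f_reviews VALUES (?, ?, ?, ?)",
                     [(20140101, 1, 10, 5), (20140101, 1, 11, 3), (20130615, 2, 10, 2)])

# A typical star join: the fact table joined to its dimensions, then aggregated.
STAR_QUERY = """
    SELECT b.business_name, d.year, AVG(f.stars) AS avg_stars, COUNT(*) AS n_reviews
    FROM f_reviews f
    JOIN d_date d     ON f.date_key = d.date_key
    JOIN d_business b ON f.business_key = b.business_key
    GROUP BY b.business_name, d.year
    ORDER BY b.business_name, d.year
"""

conn = sqlite3.connect(":memory:")
build_demo_star(conn)
rows = conn.execute(STAR_QUERY).fetchall()
```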
  • 20. So let's get cooking: 1. Create an EMR cluster → On-Demand Hadoop 2. Provision a Redshift cluster → Data Warehouse
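The two provisioning steps can also be scripted. This sketch assumes the current boto3 SDK and only builds the keyword-argument dictionaries for the EMR client's run_job_flow call and the Redshift client's create_cluster call, so it runs without AWS credentials. The instance and node types, the application list, the helper names, and the default role names are assumptions. Choose values that fit your account and region.

```python
def emr_cluster_request(name, release_label, instance_type="m5.xlarge", instance_count=3):
    """Keyword arguments for boto3's EMR run_job_flow call (defaults are assumptions)."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}, {"Name": "Pig"}, {"Name": "Hive"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # True keeps the cluster up between ad hoc steps during a demo;
            # False lets EMR terminate it once its last step finishes.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # created by `aws emr create-default-roles`
        "ServiceRole": "EMR_DefaultRole",
    }

def redshift_cluster_request(identifier, db_name, master_user, master_password,
                             node_type="dc2.large", number_of_nodes=2):
    """Keyword arguments for boto3's Redshift create_cluster call (defaults are assumptions)."""
    request = {
        "ClusterIdentifier": identifier,
        "DBName": db_name,
        "NodeType": node_type,
        "MasterUsername": master_user,
        "MasterUserPassword": master_password,
        "ClusterType": "multi-node" if number_of_nodes > 1 else "single-node",
    }
    if number_of_nodes > 1:
        request["NumberOfNodes"] = number_of_nodes  # required for multi-node clusters
    return request

# Usage (needs boto3 and AWS credentials, so not executed here):
#   import boto3
#   boto3.client("emr").run_job_flow(**emr_cluster_request("bdw-demo", "<release-label>"))
#   boto3.client("redshift").create_cluster(
#       **redshift_cluster_request("bdw-demo", "bdw", "admin", "<password>"))
```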
  • 21. Redshift • Massively Parallel Processing (MPP) • Columnar DBs that present themselves as relational • MPPs grew up in parallel with Hadoop • Impala and HAWQ are MPPs themselves! • OEM of Actian Matrix (formerly ParAccel) • A modern MPP: clean, reliable, SCHEMA AGNOSTIC
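Here is a toy illustration of why a columnar store suits analytic scans. An aggregate over one column only needs that column, and values stored side by side compress well. Run-length encoding is one of the column encodings Redshift offers. The three rows are made up, and the Python is purely conceptual rather than a model of Redshift's storage format.

```python
from itertools import groupby

# The same three review rows, first in row-oriented form.
rows = [
    {"business_id": "b1", "user_id": "u1", "stars": 5},
    {"business_id": "b1", "user_id": "u2", "stars": 3},
    {"business_id": "b2", "user_id": "u1", "stars": 4},
]

def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {name: [] for name in records[0]}
    for rec in records:
        for name, value in rec.items():
            columns[name].append(value)
    return columns

def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

columns = to_columns(rows)
# In a column store, AVG(stars) reads just this one list; a row store on disk
# would have to read every field of every row to reach the stars values.
avg_stars = sum(columns["stars"]) / len(columns["stars"])
business_rle = run_length_encode(columns["business_id"])
```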
  • 22. Redshift is cheap, or rather, inexpensive: enterprise-grade EDW @ $1,000/TB per year
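The slide's price point gives a quick back-of-the-envelope estimate. The $1,000/TB/year rate is the 2014 figure quoted on the slide, not current AWS pricing, and the helper is hypothetical.

```python
SLIDE_RATE_USD_PER_TB_YEAR = 1000.0  # 2014 slide figure; check current Redshift pricing

def redshift_cost_estimate(terabytes: float,
                           rate_per_tb_year: float = SLIDE_RATE_USD_PER_TB_YEAR) -> dict:
    """Rough annual and monthly cost for a warehouse of the given size."""
    annual = terabytes * rate_per_tb_year
    return {"annual_usd": annual, "monthly_usd": annual / 12}
```

At the slide's rate, for example, a 10 TB warehouse works out to about $10,000 per year, or roughly $833 per month.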
  • 23. MPP Design Considerations • JOINS • Shuffle – data is large and redistributed by key across servers • Broadcast – data is small and gets copied to all servers • Collocated – all data needed for the join is already on the same server • Design Considerations for MPP • Distribution Key • Collocated joins • Even distribution of work across the cluster • A customer key will often work well • Sort Key • Fastest scan operations • Primary date field is usually best
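The join cases above can be simulated in a few lines. This is a simplified model, not Redshift's internals. CRC32 stands in for the engine's hash function, and the broadcast threshold in the hypothetical join_strategy helper is an arbitrary assumption. Distributing both reviews and users on user_id places every review on the same node as its user, which is what makes a collocated join possible.

```python
import zlib
from collections import defaultdict

def node_for(key, n_nodes):
    """Map a distribution-key value to a node; CRC32 stands in for the engine's hash."""
    return zlib.crc32(str(key).encode("utf-8")) % n_nodes

def distribute(rows, dist_key, n_nodes):
    """Hash-distribute rows across n_nodes on the value of dist_key."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[dist_key], n_nodes)].append(row)
    return nodes

def join_strategy(left_join_key, left_dist_key, right_join_key, right_dist_key,
                  right_row_count, broadcast_limit=1_000):
    """Choose a join strategy following the slide's three cases (simplified)."""
    if left_join_key == left_dist_key and right_join_key == right_dist_key:
        return "collocated"  # matching rows already share a node
    if right_row_count <= broadcast_limit:
        return "broadcast"   # copy the small side to every node
    return "shuffle"         # redistribute rows by the join key
```

The same distribute helper can also expose skew: if one key value dominates, one node's list grows far larger than the others, which works against even distribution of work.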
  • 24. ETL – Transform your data • S3 is the ultimate staging ground • Use EMR for the heavy lifting: • Run your ETL program and kill the cluster when done! • Pay just for processing. • Pig, native MapReduce, Hadoop Streaming • For the right use case, Hive or Impala can be used for ETL too (mainly for aggregates and summaries)
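Hadoop Streaming can run any executable that reads lines on stdin and writes tab-separated key/value lines on stdout. Hadoop sorts mapper output by key before the reducer sees it. The sketch below computes an average rating per business in that style. The pipe-delimited business_id|user_id|stars input layout is an assumption, and sorted() stands in for Hadoop's shuffle and sort when testing locally.

```python
from itertools import groupby

def mapper(lines):
    """Emit 'business_id<TAB>stars' for each business_id|user_id|stars input line."""
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 3:
            continue  # skip malformed lines instead of failing the whole job
        business_id, _user_id, stars = fields[:3]
        yield f"{business_id}\t{stars}"

def reducer(lines):
    """Average stars per business; input must arrive sorted by key, as Hadoop delivers it."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for business_id, group in groupby(pairs, key=lambda kv: kv[0]):
        stars = [int(value) for _, value in group]
        yield f"{business_id}\t{sum(stars) / len(stars):.2f}"
```

On a cluster, a thin wrapper script would iterate over sys.stdin, call mapper or reducer, and print each result. The generators themselves stay unchanged.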
  • 25. Smaller data – don't need EMR? • Python ETL on EC2 (On-Demand) • Can later "graduate" to big data using Hadoop Streaming • Your favorite ETL tool is just fine too
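One way to keep the "graduate later" option open is to put the per-record logic in a single function. A plain loop can call it today, and a Hadoop Streaming mapper can call it later. The JSON field names (review_id, business_id, user_id, stars, date) are assumptions about a Yelp-style raw feed. The pipe-delimited output is chosen to suit a later Redshift COPY.

```python
import json
import sys

def transform(json_line):
    """Turn one raw JSON review into a pipe-delimited row, or None if it is unusable."""
    try:
        rec = json.loads(json_line)
        date_key = int(rec["date"].replace("-", ""))  # "2014-04-08" -> 20140408
        return "|".join([rec["review_id"], rec["business_id"], rec["user_id"],
                         str(date_key), str(int(rec["stars"]))])
    except (ValueError, KeyError, TypeError, AttributeError):
        return None  # a real job would route these to a reject file

def run_local(lines):
    """Small data: a plain loop on a single on-demand EC2 instance."""
    return [row for row in map(transform, lines) if row is not None]

def streaming_main():
    """Bigger data: the same transform as a Hadoop Streaming mapper (stdin -> stdout)."""
    for line in sys.stdin:
        row = transform(line)
        if row is not None:
            print(row)
```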
  • 26. Presentation Layer – Data Warehouse: How do you get your ETL data in? • Hadoop distcp – high-performance transfer of data from S3 to HDFS • Distributed COPY from S3 to Redshift
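Redshift's COPY loads every S3 object that matches a key prefix in parallel across the cluster's slices. AWS recommends splitting input into several files, ideally a multiple of the number of slices. The helpers below are a hypothetical sketch. The table name, bucket, prefix, and IAM role ARN are placeholders. IAM_ROLE authorization is the current syntax; 2014-era scripts typically passed a CREDENTIALS string instead.

```python
def split_for_copy(rows, n_files):
    """Round-robin rows into n_files chunks; each chunk would become one S3 object."""
    chunks = [[] for _ in range(n_files)]
    for i, row in enumerate(rows):
        chunks[i % n_files].append(row)
    return chunks

def copy_statement(table, s3_prefix, iam_role_arn, delimiter="|", gzip=True):
    """Render a Redshift COPY that loads every object under s3_prefix in parallel."""
    sql = (f"COPY {table} FROM '{s3_prefix}' "
           f"IAM_ROLE '{iam_role_arn}' DELIMITER '{delimiter}'")
    return sql + (" GZIP;" if gzip else ";")
```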
  • 27. And how to orchestrate all of this? • AWS Data Pipeline • AWS CLI • Build a driver program using modules like Boto (Python) • Cron or an external scheduler
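A driver program mostly needs to run steps in dependency order and stop on the first failure. This sketch uses only the standard library (graphlib.TopologicalSorter, Python 3.9+). The step names are hypothetical. In a real driver, each callable would wrap a Boto or AWS CLI call rather than a stub.

```python
from graphlib import TopologicalSorter

def run_pipeline(steps, dependencies):
    """Run steps in dependency order and return the names in the order they ran.

    steps: name -> zero-argument callable
    dependencies: name -> iterable of step names that must finish first
    """
    graph = {name: set(dependencies.get(name, ())) for name in steps}
    unknown = set().union(*graph.values()) - set(steps)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    completed = []
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    for name in TopologicalSorter(graph).static_order():
        steps[name]()  # an exception here stops the pipeline at the failing step
        completed.append(name)
    return completed
```

A dependency map such as {"etl": {"create_emr"}, "copy": {"etl", "create_redshift"}, "terminate_emr": {"etl"}} captures the demo's flow. Scheduling the whole driver from cron is then a one-line crontab entry.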
  • 28. Back to AWS: 1. Apply Redshift DDL and load tables 2. Run some queries
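Following the MPP design considerations, a fact table would typically be distributed on the customer (user) key and sorted on the date key. Small dimensions can be copied to every node. The hypothetical helper below only renders Redshift CREATE TABLE text using the DISTSTYLE, DISTKEY, and SORTKEY table attributes. The column names and types are assumptions. You would run the output through any SQL client connected to the cluster.

```python
def redshift_ddl(table, columns, dist_key=None, sort_key=None, diststyle_all=False):
    """Render CREATE TABLE text with Redshift distribution and sort attributes.

    columns: list of (column_name, sql_type) pairs.
    """
    body = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    ddl = f"CREATE TABLE {table} (\n    {body}\n)"
    if diststyle_all:
        ddl += "\nDISTSTYLE ALL"  # small dimension: full copy on every node
    elif dist_key:
        ddl += f"\nDISTSTYLE KEY\nDISTKEY ({dist_key})"  # collocate rows sharing the key
    if sort_key:
        ddl += f"\nSORTKEY ({sort_key})"  # lets range-restricted scans skip blocks
    return ddl + ";"

fact_ddl = redshift_ddl(
    "f_reviews",
    [("date_key", "INTEGER"), ("business_key", "INTEGER"),
     ("user_key", "INTEGER"), ("stars", "SMALLINT")],
    dist_key="user_key", sort_key="date_key",
)
```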