SlideShare a Scribd company logo
1 of 22
Download to read offline
Hadoop and Microsoft.


Michael Rys | Principal Program Manager   @SQLServerMike
Session Objectives
• What is BigData?
• How it fits into the Windows and Windows Azure environments
• How do I program against it in the Microsoft Environment
What is Big Data?
• Traditionally:
  • Physics Experiments, Sensor data, Satellite data, …
• Now:
  •   Operational Logs
  •   Customer behavior
  •   Social interactions online
  •   …


• From Terabytes in the 1990 over Petabytes today to Zetabytes in the
  future
Big Data.
VOLUME      VARIETY      VELOCITY
 (Size)    (Structure)    (Speed)


          Big Data.
What’s the social sentiment                                     How do I better predict
of my product?                                                  future outcomes?




                                How do I optimize my services
                                based on patterns of weather,
                                traffic, etc.?




                              New Questions.
Hadoop is for Big Data.
What is Hadoop (v1)?
• Processing Platform for Big Data Processing
• Using the “Map-Reduce” Processing Paradigm

• Characteristics:
  • Highly-scalable (scaled out)
  • Commodity HW-based
  • Open Source
    => Very low cost for acquisition and storage costs
Hadoop Data Flow




Data   Hadoop   Analytics
Microsoft's Hadoop Story
Hadoop Capabilities

      Extract Load     Distributed
       Transform        Compute




 Predictive     Machine         Graph
  Analysis      Learning      Processing
HDInsight Ecosystem




                                                       ODBC
                              Distributed Processing
                                   (Map Reduce)

                           Distributed Storage
                                  (HDFS)
World’s Data (Azure Data
                           Windows Azure Storage
Marketplace)
HDInsight

Data      Knowledge   Action
HDFS on Azure: Tale of two File Systems
                               HDFS API

                                            Containers on Azure Blob Storage
              NameNode


                                                              Front end
                                                          Front end
                                                      Front end



  Data Node                                        Partition Layer
                         Data Node
                 …
                                                    Stream Layer


DFS (1 Data Node per Worker Role)         Azure Storage Vault (ASV)
and Compute Cluster
.Net Map/Reduce Support
•   Install NuGet
•   “NuGet” Microsoft .Net MapReduce API for Hadoop
•   Provide an implementation of a HadoopJob
•   Execute the job via either
    • MRLibMRRunner.exe -dll ConsoleAppHadoopJob.exe
       Or
    – HadoopJobExecutor.ExecuteJob<HadoopJobClass>();
• Collect your result on HDFS
Javascript Map/Reduce Support
• Provide a map and reduce function variable in JS file
• Use Javascript console with
  • runJS(‘/user/myself/MRjob.js’, ‘/path/to/inputfile’,
    ‘/path/to/output/dir’)
• Collect your result on HDFS
Invoking HiveQL Queries
•   Run queries in Hadoop Command Shell after invoking hive
•   Through the web console
•   Programmatically through ODBC
•   Coming soon: LINQ to Hive!
Polybase – Enhancing PDW query engine


                     Data Scientists
                        BI Users
                      DB Admins


                                       Regular         Results    Traditional schema-based DW
Social      Sensor                      T-SQL                              applications
Apps        & RFID
Mobile      Web                             Enhanced
 Apps       Apps                         PDW query engine


            Hadoop                                               PDW V2


Unstructured data                                                      Structured data
Microsoft Hadoop Vision
Better on Windows and Azure
  • Active Directory
  • System Center
  • .Net Programmability
Microsoft Data Connectivity
  • SQL Server / SQL Parallel Data Warehouse
  • Azure Storage / Azure Data Market
Microsoft Business Intelligence (BI)
  • Hive ODBC Connectivity
  • BI Tools for Big Data
Collaborate with and Contribute to OSS
  • Collaborate with HortonWorks
  • Provide improvements and Windows support back to OSS
Getting started
• On prem: http://www.microsoft.com/bigdata/
   •   Single node cluster (onebox) install
   •   C:hadoop
   •   Starts local services
   •   Can start/stop them with start-onebox.cmd/stop-onebox.cmd
   •   Comes with:
       • Hadoop command line (shell)
       • Hadoop Status for name node and map-reduce cluster
       • HDInsight Dashboard
• On Windows Azure: http://HadoopOnAzure.com/
   • 3 node cluster running as a service in Azure
   • Can be used for 5 days
   • Provides samples and HDInsight Dashboard
• TAP Program
Related Content and Links
Thank you

More Related Content

What's hot

Killer Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLKiller Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLMichael Rys
 
U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)Michael Rys
 
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Michael Rys
 
ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)Michael Rys
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersMichael Rys
 
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...Michael Rys
 
U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)Michael Rys
 
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...Michael Rys
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Michael Rys
 
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Michael Rys
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Michael Rys
 
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)Jason L Brugger
 
U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)Michael Rys
 
24 Hour of PASS: Taking SQL Server into the Beyond Relational Realm
24 Hour of PASS: Taking SQL Server into the Beyond Relational Realm24 Hour of PASS: Taking SQL Server into the Beyond Relational Realm
24 Hour of PASS: Taking SQL Server into the Beyond Relational RealmMichael Rys
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Michael Rys
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Michael Rys
 
Azure DocumentDB 101
Azure DocumentDB 101Azure DocumentDB 101
Azure DocumentDB 101Ike Ellis
 
3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sqlŁukasz Grala
 
SQL Server 2012 Beyond Relational Performance and Scale
SQL Server 2012 Beyond Relational Performance and ScaleSQL Server 2012 Beyond Relational Performance and Scale
SQL Server 2012 Beyond Relational Performance and ScaleMichael Rys
 

What's hot (20)

Killer Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLKiller Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQL
 
U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)
 
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
 
ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for Developers
 
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
 
U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)
 
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
 
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
 
Azure data lake sql konf 2016
Azure data lake   sql konf 2016Azure data lake   sql konf 2016
Azure data lake sql konf 2016
 
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
 
U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)
 
24 Hour of PASS: Taking SQL Server into the Beyond Relational Realm
24 Hour of PASS: Taking SQL Server into the Beyond Relational Realm24 Hour of PASS: Taking SQL Server into the Beyond Relational Realm
24 Hour of PASS: Taking SQL Server into the Beyond Relational Realm
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
 
Azure DocumentDB 101
Azure DocumentDB 101Azure DocumentDB 101
Azure DocumentDB 101
 
3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql
 
SQL Server 2012 Beyond Relational Performance and Scale
SQL Server 2012 Beyond Relational Performance and ScaleSQL Server 2012 Beyond Relational Performance and Scale
SQL Server 2012 Beyond Relational Performance and Scale
 

Viewers also liked

U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)Michael Rys
 
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)Michael Rys
 
Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Michael Rys
 
U-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance TuningU-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance TuningMichael Rys
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Michael Rys
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Michael Rys
 
U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)Michael Rys
 
U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)Michael Rys
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeBizTalk360
 

Viewers also liked (9)

U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)
 
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
 
Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)
 
U-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance TuningU-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance Tuning
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)
 
U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data Lake
 

Similar to Microsoft's Hadoop Story

Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol HARMAN Services
 
Modern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewModern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewGreat Wide Open
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
HdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft PlatformHdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft Platformnvvrajesh
 
Above the cloud: Big Data and BI
Above the cloud: Big Data and BIAbove the cloud: Big Data and BI
Above the cloud: Big Data and BIDenny Lee
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)Sascha Dittmann
 
Hadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsHadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsDataWorks Summit/Hadoop Summit
 
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentationBigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentationBigDataCloud
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune amrutupre
 

Similar to Microsoft's Hadoop Story (20)

Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
 
Modern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewModern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An Overview
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
HdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft PlatformHdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft Platform
 
Above the cloud: Big Data and BI
Above the cloud: Big Data and BIAbove the cloud: Big Data and BI
Above the cloud: Big Data and BI
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Hadoop programming
Hadoop programmingHadoop programming
Hadoop programming
 
Big data in Azure
Big data in AzureBig data in Azure
Big data in Azure
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Anju
AnjuAnju
Anju
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
 
Hadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsHadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the Experts
 
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentationBigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
 

Recently uploaded

UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
The Kubernetes Gateway API and its role in Cloud Native API Management
The Kubernetes Gateway API and its role in Cloud Native API ManagementThe Kubernetes Gateway API and its role in Cloud Native API Management
The Kubernetes Gateway API and its role in Cloud Native API ManagementNuwan Dias
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
UiPath Studio Web workshop series - Day 5
UiPath Studio Web workshop series - Day 5UiPath Studio Web workshop series - Day 5
UiPath Studio Web workshop series - Day 5DianaGray10
 
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdfPaige Cruz
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 

Recently uploaded (20)

UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
The Kubernetes Gateway API and its role in Cloud Native API Management
The Kubernetes Gateway API and its role in Cloud Native API ManagementThe Kubernetes Gateway API and its role in Cloud Native API Management
The Kubernetes Gateway API and its role in Cloud Native API Management
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
UiPath Studio Web workshop series - Day 5
UiPath Studio Web workshop series - Day 5UiPath Studio Web workshop series - Day 5
UiPath Studio Web workshop series - Day 5
 
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdf
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 

Microsoft's Hadoop Story

  • 1. Hadoop and Microsoft. Michael Rys | Principal Program Manager @SQLServerMike
  • 2. Session Objectives • What is BigData? • How it fits into the Windows and Windows Azure environments • How do I program against it in the Microsoft Environment
  • 3. What is Big Data? • Traditionally: • Physics Experiments, Sensor data, Satellite data, … • Now: • Operational Logs • Customer behavior • Social interactions online • … • From Terabytes in the 1990 over Petabytes today to Zetabytes in the future
  • 5. VOLUME VARIETY VELOCITY (Size) (Structure) (Speed) Big Data.
  • 6. What’s the social sentiment How do I better predict of my product? future outcomes? How do I optimize my services based on patterns of weather, traffic, etc.? New Questions.
  • 7. Hadoop is for Big Data.
  • 8. What is Hadoop (v1)? • Processing Platform for Big Data Processing • Using the “Map-Reduce” Processing Paradigm • Characteristics: • Highly-scalable (scaled out) • Commodity HW-based • Open Source => Very low cost for acquisition and storage costs
  • 9. Hadoop Data Flow Data Hadoop Analytics
  • 11. Hadoop Capabilities Extract Load Distributed Transform Compute Predictive Machine Graph Analysis Learning Processing
  • 12. HDInsight Ecosystem ODBC Distributed Processing (Map Reduce) Distributed Storage (HDFS) World’s Data (Azure Data Windows Azure Storage Marketplace)
  • 13. HDInsight Data Knowledge Action
  • 14. HDFS on Azure: Tale of two File Systems HDFS API Containers on Azure Blob Storage NameNode Front end Front end Front end Data Node Partition Layer Data Node … Stream Layer DFS (1 Data Node per Worker Role) Azure Storage Vault (ASV) and Compute Cluster
  • 15. .Net Map/Reduce Support • Install NuGet • “NuGet” Microsoft .Net MapReduce API for Hadoop • Provide an implementation of a HadoopJob • Execute the job via either • MRLibMRRunner.exe -dll ConsoleAppHadoopJob.exe Or – HadoopJobExecutor.ExecuteJob<HadoopJobClass>(); • Collect your result on HDFS
  • 16. Javascript Map/Reduce Support • Provide a map and reduce function variable in JS file • Use Javascript console with • runJS(‘/user/myself/MRjob.js’, ‘/path/to/inputfile’, ‘/path/to/output/dir’) • Collect your result on HDFS
  • 17. Invoking HiveQL Queries • Run queries in Hadoop Command Shell after invoking hive • Through the web console • Programmatically through ODBC • Coming soon: LINQ to Hive!
  • 18. Polybase – Enhancing PDW query engine Data Scientists BI Users DB Admins Regular Results Traditional schema-based DW Social Sensor T-SQL applications Apps & RFID Mobile Web Enhanced Apps Apps PDW query engine Hadoop PDW V2 Unstructured data Structured data
  • 19. Microsoft Hadoop Vision Better on Windows and Azure • Active Directory • System Center • .Net Programmability Microsoft Data Connectivity • SQL Server / SQL Parallel Data Warehouse • Azure Storage / Azure Data Market Microsoft Business Intelligence (BI) • Hive ODBC Connectivity • BI Tools for Big Data Collaborate with and Contribute to OSS • Collaborate with HortonWorks • Provide improvements and Windows support back to OSS
  • 20. Getting started • On prem: http://www.microsoft.com/bigdata/ • Single node cluster (onebox) install • C:hadoop • Starts local services • Can start/stop them with start-onebox.cmd/stop-onebox.cmd • Comes with: • Hadoop command line (shell) • Hadoop Status for name node and map-reduce cluster • HDInsight Dashboard • On Windows Azure: http://HadoopOnAzure.com/ • 3 node cluster running as a service in Azure • Can be used for 5 days • Provides samples and HDInsight Dashboard • TAP Program

Editor's Notes

  1. Big DataThis is a picture down the center isle of a shipping container from one of Microsoft’s datacenters. We put ~1800 computers inside one of these containers. Some of us had the privilege of working on the data storage and computational platform that powers Bing. We used 22 of these containers, spanning 40,000 machines where we stored over 100PB of data. This was three years ago, and now these servers are almost obsolete.Big Data is in constant motion and growing at an incredible rate,90% of the world’s data generated in just the past two years. That&apos;s remarkable growth. Technology history has taught us that the one with themost data wins. The empires of data like Twitter, Facebook, Yahoo all of whom are able to capitalize on the notion that data equates to power. More and more companies are increasingly utilizing Hadoop to power Big Data analytics and drive revenue and profit.It’s all about your Data.
  2. I’d like to introduce the 3V’s of Big DataIs it big as in Volume? Where your data exceeds limits of physical capabilities of systems today.Is it Velocity? The data is moving at a fast rate and value can decay over time.Is it Variability? of structure from unstructured, semi-structured to highly structured data.Doug Laney http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdfThe answer is it’s all of the above.Finally some refer to the fourth V of Big data as Value; the value of the insight that can be gained from extracting insight form your Big Data sources.
  3. Given all of this data, and the variety of sources there are new questions that we can answer today that weren’t possible just a few years ago.By asking and answering these questions you canreap the benefits of Big Data.Data is everywhere to be mined, but we have what one can call &quot;the pomegranate problem&quot; Imagine all of your data being inside a pomegranate. When you eat a pomegranate it’s a bit difficult getting into all of the little pieces inside the pomegranate out, it&apos;s a bit of work.That’s the process that you need to go through to extract insights out of your data.It’s useful to think of it in this way; where your data is the platform. Not the tooling that surrounds it. It’s all about the data. It’s all about the questions that you ask.
  4. The second thing I want to talk about is Hadoop and how Hadoop is setup to deliver Breakthrough Insights from your data.How many of you are familiar with Hadoop? How many of you are using Hadoop for projects today?How many are planning on using Hadoop in the next 12mo? How about in the cloud?When people talk about Hadoop they are often talking about specific computational patterns including map reduce, which emerged as a method to process lots of unstructured data on top of a distributed storage system in a highly fault tolerant and embarrassingly scalable way.   Hadoop allows us to store and process large amounts of data on commodity hardware. In the past you would spend large amounts of money on very specialized hardware. Today you can do this with off the shelf hardware running Hadoop. Now, Hadoop doesn’t have a monopoly on “big”, “real time” or “unstructured” but does provide some unique capabilities.  
  5. I’d like to share my experience with an internal Microsoft service; Halo 4. We launched Halo 4 recently; players are playing over 25 million games per day. Each of those games upload many metrics are coming in every minute.Something amazing happened when we moved the Halo4 event stream into Hadoop/Hive, we noticed a change in how we thought about data. We were freed from the constant anxiety of wondering how we were going to handle an ever-increasing amount of data, we shifted from trying to store only what we really needed to storing everything. It’s a digital shoebox of information. Then the questions started to shift from how and what to store; to how to gain breakthrough insights from the data.In a traditional database or data warehouse you have to define the structure of the data, or schema, up front, with Hadoop you define the structure of the data when you use it. It’s a schema on read vs a schema on write.
  6. Manage data of any type or sizeTo gain the full value of Big Data you need a modern data platform that manages data of any type, whether structured or unstructured, and of any size – from gigabytes to petabytes. Your Big Data solution should also manage data at rest or in motion. Leverage the power of HDInsight on Windows Server or as a Windows Azure Service. HDInsight provides simplicity, ease of management, and an open Enterprise-ready Hadoop service that runs on premise or in the cloud.Enrich your data with the worlds dataHDInsights enables you to realize new value in the data you have and can combine these new insights with 3rd party datasets simply and elegantly. The time spent by your data analysts trying to surface the right data and source for your precise needs is costly. By connecting to external data sources you can begin to answer new types of questions and deliver new value in ways that previously were not possible.Gain insight from any dataYou cannot begin to realize the value of Big Data until you can deliver new insights from all types of data- structured, unstructured, previously archived or discarded. The benefits of Big Data are not limited only to business intelligence experts or data scientists. Nearly everyone in your organization can analyze and make more informed decisions with the right tools including Microsoft Office Excel.Key CapabilitiesAny Data, Any Size, AnywhereMicrosoft Big Data offers an integrated platform for managing data of any shape or any size, whether it’s structured data in relational databases, unstructured data with Hadoop, or streaming data.Microsoft Big Data offers an integrated platform for managing data of any shape or any size, whether it’s structured data in relational databases, unstructured data with Hadoop, or streaming data.Enterprise-ready HadoopSeamlessly extend access privileges across HDInsight with Active Directory.Manage your HDInsight clusters easily with System Center 2012.Enjoy the reliability and high availability of 100% Apache Hadoop compatible HDInsight.Gain Windows Simplicity and Manageability for HadoopSimplicity on premise with a virtualized deployment model.Consistent platform on Windows or on Windows Azure with shared codebase.Deploy Hadoop easily thanks to smart packaging and Cloud optimization from Microsoft.Scale on Demand in the CloudBenefit from deployment options for Big Data on both Windows Server and Windows Azure.Enjoy elastic scalability in the cloud.Gain better control of your data and costs.Open Big Data PlatformGain from the strategic Microsoft and Hortonworks partnership. Leverage the benefits of Microsoft HDInsight that offers 100% compatibility with Apache Hadoop.Enterprise-ready HadoopSeamlessly extend access privileges across HDInsight with Active Directory.Manage your HDInsight clusters easily with System Center 2012.Enjoy the reliability and high availability of 100% Apache Hadoop compatible HDInsight.Gain Windows Simplicity and Manageability for HadoopSimplicity on premise with a virtualized deployment model.Consistent platform on Windows or on Windows Azure with shared codebase.Deploy Hadoop easily thanks to smart packaging and Cloud optimization from Microsoft.Scale on Demand in the CloudBenefit from deployment options for Big Data on both Windows Server and Windows Azure.Enjoy elastic scalability in the cloud.Gain better control of your data and costs.Open Big Data PlatformGain from the strategic Microsoft and Hortonworks partnership. Leverage the benefits of Microsoft HDInsight that offers 100% compatibility with Apache Hadoop.Connecting with the World’s DataMicrosoft offers unparalleled opportunities for discovery and enrichment by enabling end users to connect to the world’s data and services.Microsoft offers unparalleled opportunities for discovery and enrichment by enabling end users to connect to the world’s data and services.Connect Hadoop to the World via Windows Azure Marketplace Access a wide variety of data from reliable providers such as the U.S. Census Bureau, United Nations, Dunn and Bradstreet, to name a few.Take advantage of hundreds of applications built on the Windows Azure platform.Integrate smart data mining algorithms, such as Microsoft Translator, which uses machine learning for automated text translation.Enrich Your Data with External Information ServicesConvert raw data into useful information through data transformation and advanced analytics, and mashups with external data.Utilize out-of-the-box tools, such as SQL Server Integration Services (SSIS) and Data Quality Services for data transformation and cleansing.Enrich your raw data using smart analytical algorithms. (For instance, you can use a segmentation model to enhance targeting.)Access Predictive Analytics on HadoopGain new insights through predictive analytics, the process of inferring relationships and predictions from huge quantities of data. Unlock new insights from all of your data using smart data-mining tools in SQL Server Analysis Services.Simplifies the data mining process using the Data Mining Add-in for Excel.Integrate a range of data mining tools from the Open Source Community, such as Mahout and R. Connect Hadoop to the World via Windows Azure Marketplace Access a wide variety of data from reliable providers such as the U.S. Census Bureau, United Nations, Dunn and Bradstreet, to name a few.Take advantage of hundreds of applications built on the Windows Azure platform.Integrate smart data mining algorithms, such as Microsoft Translator, which uses machine learning for automated text translation.Enrich Your Data with External Information ServicesConvert raw data into useful information through data transformation and advanced analytics, and mashups with external data.Utilize out-of-the-box tools, such as SQL Server Integration Services (SSIS) and Data Quality Services for data transformation and cleansing.Enrich your raw data using smart analytical algorithms. (For instance, you can use a segmentation model to enhance targeting.)Access Predictive Analytics on HadoopGain new insights through predictive analytics, the process of inferring relationships and predictions from huge quantities of data. Unlock new insights from all of your data using smart data-mining tools in SQL Server Analysis Services.Simplifies the data mining process using the Data Mining Add-in for Excel.Integrate a range of data mining tools from the Open Source Community, such as Mahout and R. Immersive Insights, Wherever You AreMicrosoft Big Data empowers end users to gain insights from any data, whether structured or unstructured, with the familiar tools they use every day. Developers can build Big Data applications with tools for simplified Hadoop programming.
  7. I see the real breakthrough insights coming through when you take what is the traditional &quot;Business Intelligence&quot; and add more capabilities like machine learning, predictive analysis, statistical analysis, large scale graph processing, pattern mining, trend analysis, economic modeling. All of which today are a reality in Hadoop. The implications of this are quite astounding when you think about it. This is huge.
  8. Big Data; in terms of data volume, variability and velocity at scale are is the first problem. But the Big Data solutions and technology by themselves don&apos;t lead to solving business objectives. We don&apos;t have a Hadoop problem they have analytics, pattern mining, trend analysis, statistical inferenceing, economic modeling, market regression level problems.Data science starts where the utility class services like Big Data Hadoop end. The real opportunity is to expose data science to everyone.As powerful as Hadoop is, today it’s still more of a computer scientist’s or academically-trained analyst’s tool than it is an enterprise analytics product. Hadoop itself is controlled through programming code rather than anything that looks like it was designed for business unit personnel. Hadoop data is often more “raw” and “wild” than data typically fed to data warehouse and OLAP (Online Analytical Processing) systems. This is where I and Microsoft see opportunity.  Essentially; wouldn&apos;t it be cool if mere mortals could use this stuff and consume insights that are directly coming from Hadoop? Microsoft HDInsight enables you to gain insight from virtually any data, connect with the world of data, improve decision making, and enhance the development of the next generation of products and services.Nearly everyone in your organization can analyze and make more informed decisions with the right tools.PowerPivot for Microsoft Excel and Power View for SharePoint give nearly all users a view into structured and unstructured data.With the Hive Add-in for Excel and Hive ODBC Driver, almost anyone in your organization can directly access Hadoop datafrom end-user tools.Hadoop simplifies programming for developers with JavaScript for MapReduce jobs. The JavaScriptimplementation can also reduce your code by up to 10 times compared to Java. 
  9. Front End: Security/Auth and scaled out request handlerPartition Layer: Object Layer, Mapping of objects such as Tables, Blobs, Queues to streams (cached in Front End), CCStream Layer: 3-Node HA, Scale-out stream store