BIG DATA BIG ANALYTICS
A OHRI
Pre- Agenda
-Presenter Introduction
-Audience Introduction
-Expectations
--------------------------------------------
Presenter Introduction
www.linkedin.com/in/ajayohri
Working with Analytics since 2004
Educated at IIM Lucknow, DCE, U Tenn
Author (R for Business Analytics (Springer))
Blogger at www.decisionstats.com


Interviewed 100+ Analytics leaders
Audience Introduction

● Affiliation-Academic/ Govt/Private
● Years of working with Big Data-
● Specific Interest Area in Analytics-
Great Expectations
From You
1. No mobile rings, no sleeping (discreet sleeping),
2. Please take notes using pencil, parchment, paper, pen,
computer, tablet, stylus, mobile etc.,
3. Please ask questions at the END (from notes taken at
Step 2)
From Me
1 Breadth of Case Studies (!)
2 Open Source focus (R mostly, clojure, python)
3 Actionable Ideas are useful !
i.e., I spent 3 hours in talk X, but I did learn to do Y, or I am now interested in trying out Z
Agenda
-Presenter Introduction
-Audience Identification
-Expectations

--------------------------------------------
-Big Data
-Big Data Analytics using R
        -Case Study 1 (Amazon AWS, SAP Hana DB)
-Big Data Analytics using other tools
       -Case Study 2 (BigML.com, Picloud.com)
--------------------------------------------
Big Data
What is Big Data?
"Big data" is a term applied to data sets whose size is beyond the ability of
commonly used software tools to capture, manage, and process the data within
a tolerable elapsed time. Examples include web logs, RFID, sensor networks,
social networks, social data (due to the social data revolution), Internet text and
documents, Internet search indexing, call detail records, astronomy,
atmospheric science, genomics, biogeochemical, biological, and other complex
and often interdisciplinary scientific research, military surveillance, medical
records, photography archives, video archives, and large-scale e-commerce.

IBM-     http://www-01.ibm.com/software/data/bigdata/

Every day, we create 2.5 quintillion bytes of data — so much that 90% of the
data in the world today has been created in the last two years alone. This data
comes from everywhere: sensors used to gather climate information, posts to
social media sites, digital pictures and videos, purchase transaction records,
and cell phone GPS signals to name a few. This data is big data.
Big Data Conferences
--O'Reilly's Strata
--Hadoop World
--Many many conferences......including ours
Thought for Today
Data that is classified as Big Data in 2012 will
be classified as Little Data by 2018

True ----------False
?
What is Cloud Computing?
Cloud computing is a model for enabling ubiquitous,
convenient, on-demand network access to a shared pool of
configurable computing resources (e.g., networks, servers,
storage, applications, and services) that can be rapidly
provisioned and released with minimal management effort
or service provider interaction. This cloud model is
composed of five essential characteristics, three service
models, and four deployment models.
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf

 National Institute of Standards and Technology
--
Cloud Computing and
Big Data Analytics
Without cloud computing, the cost of computing on
Big Data would be too high.

Cloud runs on X OS predominantly, and needs
customized solutions as of 2012

Open source solutions (OS- Analytics) are
more easily customized
Sources of Big Data
--Internet
------Server Logs,Clickstream,Analytics

--Social Media

--Governments and UN bodies

--Internal Data from customers
Storing Big Data for R
--Lots of RAM (?!)
--RDBMS
--Documents (CouchDB, MongoDB)


--HDFS (Hadoop)
Storing Big Data for R
--Documents (CouchDB, MongoDB)

Package RMongo provides an R interface to a Java client
for MongoDB (http://en.wikipedia.org/wiki/MongoDB)
databases, which are queried using JavaScript rather than
SQL. Package rmongodb is another client, using
MongoDB's C driver.
https://github.com/wactbprot/R4CouchDB
R can talk to CouchDB using Couch's RESTful HTTP API:
construct HTTP calls with RCurl, then move on to the
R4CouchDB package for a higher level interface.
http://digitheadslabnotebook.blogspot.in/2010/10/couchdb-and-r.html
Big Data Packages in R- 1/2
http://cran.r-project.org/web/views/HighPerformanceComputing.html

●   The biglm package by Lumley uses incremental computations to offer lm()
    and glm() functionality for data sets stored outside of R's main memory.
●   The ff package by Adler et al. offers file-based access to data sets that are
    too large to be loaded into memory, along with a number of higher-level
    functions.
●   The bigmemory package by Kane and Emerson permits storing large
    objects such as matrices in memory (as well as via files) and uses external
    pointer objects to refer to them. This permits transparent access from R
    without bumping against R's internal memory limits. Several R processes
    on the same computer can also share big memory objects.
●   The HadoopStreaming package provides a framework for writing map/reduce scripts for use in Hadoop Streaming. It also facilitates
    operating on data in a streaming fashion, without Hadoop.
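To make the "incremental computations" idea concrete, here is a minimal base-R sketch that fits a linear regression one chunk at a time by accumulating the cross-products X'X and X'y. It is an illustration under assumptions, not biglm's own algorithm (biglm relies on an incremental QR decomposition), and the simulated data, column names and chunk size are all hypothetical.

```r
# Sketch only: chunked least squares via accumulated cross-products.
# Simulated data stands in for a file too large to load at once.
set.seed(42)
n <- 10000
dat <- data.frame(x1 = rnorm(n), x2 = runif(n))
dat$y <- 1 + 2 * dat$x1 - 3 * dat$x2 + rnorm(n, sd = 0.1)

chunk_size <- 1000                      # hypothetical chunk size
starts <- seq(1, n, by = chunk_size)

xtx <- matrix(0, 3, 3)                  # running sum of X'X
xty <- matrix(0, 3, 1)                  # running sum of X'y

for (s in starts) {
  rows  <- s:min(s + chunk_size - 1, n)
  chunk <- dat[rows, ]                  # in practice: read the next block from disk
  X <- cbind(1, chunk$x1, chunk$x2)     # design matrix with an intercept column
  xtx <- xtx + crossprod(X)
  xty <- xty + crossprod(X, chunk$y)
}

beta_chunked <- drop(solve(xtx, xty))   # solve the normal equations
names(beta_chunked) <- c("(Intercept)", "x1", "x2")

beta_full <- coef(lm(y ~ x1 + x2, data = dat))  # in-memory fit for comparison
print(round(beta_chunked, 4))
print(round(beta_full, 4))
```

Only the 3 x 3 and 3 x 1 accumulators stay in memory between chunks, which is why the approach scales to data stored outside R.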
Big Data Packages in R -2/2
●   http://cran.r-project.org/web/packages/biganalytics/

This package extends the bigmemory package with various analytics.
Functions bigkmeans and binit may also be used with native R objects
●   http://cran.r-project.org/web/packages/bigtabulate/index.html
This package extends the bigmemory package with table- and split-like support
for big.matrix objects. The functions may also be used with regular R matrices
for improving speed and memory-efficiency.
●   http://cran.at.r-project.org/web/packages/synchronicity/index.html
For mutex (locking) support for advanced shared-memory usage, see
synchronicity.
https://r-forge.r-project.org/R/?group_id=556 lists more projects. For linear
algebra support, see bigalgebra.
Big Data and Revolution
Analytics
Primary -RevoScaleR package /XDF format

Also sponsored RHadoop
https://github.com/RevolutionAnalytics/RHadoop
RHadoop -rhdfs package
rhdfs-

https://github.com/decisionstats/RHadoop/wiki/rhdfs
Overview
This R package provides basic connectivity to the Hadoop Distributed File System. R programmers can browse, read, write, and
modify files stored in HDFS. The following functions are part of this package
   ●    File Manipulations
   ●    hdfs.copy, hdfs.move, hdfs.rename, hdfs.delete, hdfs.rm, hdfs.del, hdfs.chown, hdfs.put, hdfs.get
   ●    File Read/Write
   ●    hdfs.file, hdfs.write, hdfs.close, hdfs.flush, hdfs.read, hdfs.seek, hdfs.tell, hdfs.line.reader, hdfs.read.text.file
   ●    Directory
   ●    hdfs.dircreate, hdfs.mkdir
   ●    Utility
   ●    hdfs.ls, hdfs.list.files, hdfs.file.info, hdfs.exists
   ●    Initialization
   ●    hdfs.init, hdfs.defaults
http://hadoop.apache.org/hdfs/
Hadoop Distributed File System (HDFS™) is the primary storage system used by Hadoop applications. HDFS creates multiple
replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid
computations
RHadoop -rhbase package
rhbase-

https://github.com/decisionstats/RHadoop/wiki/rhbase
Overview
This R package provides basic connectivity to HBASE, using the Thrift server. R programmers can browse, read, write, and modify
tables stored in HBASE. The following functions are part of this package
   ●    Table Manipulation
   ●    hb.new.table, hb.delete.table, hb.describe.table, hb.set.table.mode, hb.regions.table
   ●    Read/Write
   ●    hb.insert, hb.get, hb.delete, hb.insert.data.frame, hb.get.data.frame, hb.scan
   ●    Utility
   ●    hb.list.tables
   ●    Initialization
   ●    hb.defaults, hb.init
http://hbase.apache.org/

HBase is the Hadoop database. Think of it as a distributed, scalable, big data store.
RHadoop -rmr package
rmr-
Overview

This R package allows an R programmer to perform
statistical analysis via MapReduce on a Hadoop cluster.

● Average flight delay (Orbitz): original and updated
  version with presentation
● Network analysis: original and a summary
Also see https://github.com/decisionstats/RHadoop/wiki/Tutorial
for logistic regression and k-means
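The map/reduce pattern that rmr runs on a cluster can be sketched in plain R on one machine: map each record to a key-value pair, group the values by key, then reduce each group. The sketch below does not use rmr or Hadoop, and the flight records are invented for illustration.

```r
# Sketch only: the map -> shuffle -> reduce pattern in base R (no Hadoop, no rmr).
# Hypothetical flight records: carrier code and departure delay in minutes.
flights <- data.frame(
  carrier = c("AA", "UA", "AA", "DL", "UA", "AA"),
  delay   = c(10, 5, 20, 0, 15, 30),
  stringsAsFactors = FALSE
)

# Map: turn each record into a (key, value) pair.
map_fn <- function(record) list(key = record$carrier, value = record$delay)
pairs <- lapply(seq_len(nrow(flights)), function(i) map_fn(flights[i, ]))

# Shuffle: group values by key, as the framework would do between the phases.
keys    <- vapply(pairs, function(p) p$key, character(1))
values  <- vapply(pairs, function(p) p$value, numeric(1))
grouped <- split(values, keys)

# Reduce: collapse each key's values to a single result (the mean delay).
reduce_fn <- function(vals) mean(vals)
avg_delay <- vapply(grouped, reduce_fn, numeric(1))
print(avg_delay)
```

On a real cluster the map and reduce functions run on different nodes, so each must depend only on its own input.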
Big Data Social Network
 Analysis
Analyzing A Big Social Network using R and
distributed graph engines
http://thinkaurelius.com/2012/02/05/graph-degree-distributions-using-r-over-hadoop/
Big Data Social Media
Analysis
Can be used for Customers (and also for latent influencers) -
http://www.r-bloggers.com/an-example-of-social-network-analysis-with-r-using-package-igraph/
Big Data Social Media
Analysis
R package twitteR (http://cran.r-project.org/web/packages/twitteR/index.html) can be
used for prototyping, but Twitter's API is rate
limited to 1500 per hour(?)/day, so we can use the
Datasift API: http://datasift.com/pricing#costs
Big Data Social Media
Analysis
 How does information propagate through a
social network?
http://www.r-bloggers.com/information-transmission-in-a-social-network-dissecting-the-spread-of-a-quora-post/
Big Data Social Network
Analysis
Can be used for Terrorists (and also for potential protestors) -
Drew Conway http://riskecon.com/wp-content/uploads/2012/02/Conway-Socio_Terrorism.pdf

Primary focus is on three aspects of network analysis
1. Identifying leadership and key actors
2. Revealing underlying structure and intra-network community structure
3. Evolution and decay of social networks
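As a small illustration of the first aspect, key actors are often approximated by degree centrality, that is, by counting each actor's ties. The sketch below uses base R on an invented edge list; a graph package such as igraph would be the usual choice for real networks.

```r
# Sketch only: degree centrality from an undirected edge list, in base R.
# The actors and ties below are invented for illustration.
edges <- data.frame(
  from = c("A", "A", "A", "B", "C", "D"),
  to   = c("B", "C", "D", "C", "E", "E"),
  stringsAsFactors = FALSE
)

# Each undirected edge adds one to the degree of both endpoints.
degree <- table(c(edges$from, edges$to))

# Rank actors by degree; the top entries are candidate key actors.
ranked <- sort(degree, decreasing = TRUE)
print(ranked)

# Degree distribution: how many actors have each degree value.
degree_distribution <- table(as.vector(degree))
print(degree_distribution)
```

Degree is only the simplest centrality measure; it says who has many ties, not who bridges communities.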
Big Data and Revolution
Analytics
Primary -RevoScaleR package /XDF format
Also sponsored RHadoop

● For a case study, UpStream software (slide 16):
http://www.revolutionanalytics.com/news-events/free-webinars/2012/how-big-data-is-changing-retail-marketing-analytics/

● Big data GLMs (you might find the chart on this page
  useful):
http://blog.revolutionanalytics.com/2012/06/big-data-generalized-linear-models-with-revolution-r-enterprise.html

● Data distillation with Hadoop and R:
http://blog.revolutionanalytics.com/2012/06/data-distillation-with-hadoop-and-r.html

● Analysis of the million row movie data set (building
  recommendation engines):
http://blog.revolutionanalytics.com/2012/04/simple-tools-for-building-a-recommendation-engine.html
Big Data and Revolution
Analytics
Marketing analytics company UpStream Software used map-reduce to convert transactions from Omniture logs (web visits,
emails clicked on, ads displayed) into customer behaviors: response to an offer, research into a product, purchases.
More R and Hadoop Case
Studies
A few examples where R and Hadoop are used for data distillation:
 ● Using robust regression on a series of raw voice-over-IP packets to
    calculate how long participants talk during a phone conversation.
 ● Using graph theory (and R's igraph package) to quantify the number of
    close friends of members of a social network.
 ● Orbitz uses R and Hadoop to extract flights and hotels that will be
    presented during a travel search, based on previous transactions.
 ● Using k-means clustering to extract similar "groups" of transactions, which
    are then aggregated and used as the record level for structured analysis.
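The last item above can be sketched with the kmeans() function from R's built-in stats package: cluster the transactions, then aggregate each cluster into one summary record. The transactions are simulated, and the column names and the choice of three groups are assumptions made for illustration only.

```r
# Sketch only: cluster transactions with k-means, then aggregate to group level.
# Simulated transactions; the columns and k = 3 are illustrative assumptions.
set.seed(1)
transactions <- data.frame(
  amount = c(rnorm(50, 20, 3), rnorm(50, 100, 10), rnorm(50, 500, 40)),
  items  = c(rpois(50, 2), rpois(50, 6), rpois(50, 15))
)

# Scale the features so that no single column dominates the distance.
scaled <- scale(transactions)
fit <- kmeans(scaled, centers = 3, nstart = 10)
transactions$group <- fit$cluster

# Aggregate: one summary row per group becomes the new record level.
group_level <- aggregate(cbind(amount, items) ~ group,
                         data = transactions, FUN = mean)
group_level$n <- as.vector(table(transactions$group))
print(group_level)
```

The distilled table has one row per group, so later analysis runs on a handful of records instead of every transaction.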
Using RDBMS (Big?) Data
through R
--RDBMS - RODBC Package
http://cran.r-project.org/doc/manuals/R-data.html#Relational-databases
http://cran.r-project.org/web/packages/RMySQL/index.html RMySQL
http://cran.r-project.org/web/packages/ROracle/index.html ROracle
http://cran.r-project.org/web/packages/RPostgreSQL/index.html RPostgreSQL
http://cran.r-project.org/web/packages/DBI/index.html
http://cran.r-project.org/web/packages/RSQLite/index.html RSQLite
Using RDBMS (is it Big
Data?)
through R
--RDBMS - RODBC Package
http://cran.r-project.org/web/packages/RODBC/RODBC.pdf
> library(RODBC)
> odbcDataSources(type = c("all", "user", "system"))
                SQLServer              PostgreSQL30             PostgreSQL35W
             "SQL Server"    "PostgreSQL ANSI(x64)" "PostgreSQL Unicode(x64)"
                    MySQL
  "MySQL ODBC 5.1 Driver"
Querying Big Data
--RDBMS-SQL

--Hadoop-Pig (but many ways)
Big Data Analytics
- Challenges

---Traditional statistics theory grew up when data was
constrained

--Traditional analytics programming was NOT parallel
processing

--Shortage of trained people
Big Data Analytics
- Solutions

---Teaching more parallel programming and algorithms

--More focus on data reduction techniques like clustering,
segmentation than on hypothesis testing. Sampling,
anyone?

--Training more data scientists
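On the sampling suggestion: one standard way to sample from data too large to hold in memory is reservoir sampling, which keeps a uniform random sample of fixed size while reading the data once. The sketch below is a base-R illustration; the stream of integers and the helper names are hypothetical.

```r
# Sketch only: reservoir sampling (Algorithm R) keeps a uniform random
# sample of k items from a stream without holding the stream in memory.
reservoir_sample <- function(next_item, k) {
  reservoir <- vector("list", k)
  i <- 0
  repeat {
    item <- next_item()
    if (is.null(item)) break            # NULL signals the end of the stream
    i <- i + 1
    if (i <= k) {
      reservoir[[i]] <- item            # fill the reservoir first
    } else {
      j <- sample.int(i, 1)             # uniform position in 1..i
      if (j <= k) reservoir[[j]] <- item
    }
  }
  unlist(reservoir[seq_len(min(i, k))])
}

# Hypothetical stream: the integers 1..n delivered one at a time.
make_stream <- function(n) {
  pos <- 0
  function() {
    if (pos >= n) return(NULL)
    pos <<- pos + 1
    pos
  }
}

set.seed(7)
smp <- reservoir_sample(make_stream(100000), k = 1000)
print(length(smp))
print(mean(smp))   # expected to be near the stream mean for a uniform sample
```

The memory needed depends only on k, not on the length of the stream.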
Big Data Analytics
- Tools used
-Why R

-High Performance Computing

http://cran.r-project.org/web/views/HighPerformanceComputing.html


-Big Data Within R
http://www.slideshare.net/bytemining/r-hpc
Using R (interfaces)
--Using R Studio for easier development


--Using Rattle GUI for straight off the shelf data
mining and Using R Commander for Extensions

--Using Revolution Analytics RPE
-----Example of Snippets
Using R
--Using R for text mining
---Text Mining from Twitter Case Study
---Datasift Export to Amazon S3




--Using R for geo-coded analysis
---Hana DB

--Using R for Graphical Analysis of Big Data
TablePlot
3D using R Commander

--Using R for forecasting
Using R Commander Plugin E-Pack
Existing Big Data Case
Studies
Departure of Aeroplanes-SAP Hana 200m
http://allthingsr.blogspot.in/#!/2012/04/big-data-r-and-hana-analyze-200-million.html




R using SAP Hana

http://www.decisionstats.com/interview-blag-sap-labs-montreal-using-sap-hana-with-rstats/
SAP Hana DB uses R
http://scn.sap.com/community/in-memory-business-data-management/blog/2011/11/28/dealing-with-r-and-hana
Oracle R Enterprise
Case Studies and Examples
http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise/index.html
Revolution Analytics
RevoScaleR package
Use the RevoScaleR package to extract time series data from time-stamped logs (in
this case, the "US Domestic Flights From 1990 to 2009" dataset on
Infochimps):
Analyzing time series data of all sorts is a fundamental business analytics task
to which the R language is beautifully suited. In addition to the time series
functions built into the base stats library, there are dozens of R packages devoted to
time series...
We have shown how to use data manipulation functions of the RevoScaleR package
to extract time-stamped data from a large data file, aggregate it, and form it into
monthly time series that can easily be analyzed with standard R functions.

http://www.inside-r.org/howto/extracting-time-series-large-data-sets

http://blog.revolutionanalytics.com/2011/09/how-to-extract-time-series-from-large-timestamped-logs-with-r.html
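The extract, aggregate and form-a-series steps can be sketched in base R on a small simulated log. This does not use RevoScaleR or the Infochimps flights data; the column names and values are made up for illustration.

```r
# Sketch only: aggregate time-stamped records into a monthly time series
# with base R. Simulated log; RevoScaleR is not used here.
set.seed(3)
flight_log <- data.frame(
  timestamp  = as.Date("2008-01-01") + sample(0:730, 5000, replace = TRUE),
  passengers = rpois(5000, 120)
)

# Derive a year-month key from each time stamp.
flight_log$month <- format(flight_log$timestamp, "%Y-%m")

# Sum the measure within each month.
monthly <- aggregate(passengers ~ month, data = flight_log, FUN = sum)
monthly <- monthly[order(monthly$month), ]

# Form a regular monthly series that standard R time series functions accept.
series <- ts(monthly$passengers, start = c(2008, 1), frequency = 12)
print(series)
```

Once the data is a ts object, functions such as acf() and decompose() from the stats package apply directly.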
Using R on Amazon -Case
Study
--Bioconductor in the Cloud

--Custom Amazon Instance




--Concerns for non-American users of Amazon
Using BigML on cloud
Case Study
Classification using Clojure on Cloud
https://bigml.com/gallery/models/fraud_and_crime




--Concerns on depending on third party tools
--Example Cloudnumbers.com
Using Google APIs
https://code.google.com/apis/console/?pli=1



Google Storage API

Google Predictive Analysis API

Introduction to other APIS

----Concerns to users of Google APIs
Using Google APIs case
study
Google Storage API
Google Predictive Analysis API
http://code.google.com/p/google-prediction-api-r-client/
Using Google APIs case
study
Introduction to other Big Data Google APIS

----Concerns to users of Google APIs
Using Python - PiCloud
http://www.picloud.com/
Privacy hazards of big data
analytics.
Big Brother -1984 --- 2012

They know where you are (mobiles)
They know what you are looking for (internet)
They know your past (financial history +social media)
They can use your medical history
Laws authorize them (Patriot Act?)

--example Emotional Analysis of Images http://www.affectiva.com/
References and
Acknowledgements
David Smith,      Revolution Analytics
David Champagne, Revolution Analytics
All R Bloggers,Developers, Packagers
Blag - SAP Hana Analytics
Charlie Berger -and Oracle R Team
Jim Kobielus -IBM Big Data Team
R Development Core Team (2012). R: A language and environment for
statistical computing. R Foundation for Statistical Computing,
Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.
Thanks
Book- R for Business
Analytics
http://www.springer.com/statistics/book/978-1-4614-4342-1

More Related Content

What's hot

Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabatinabati
 
Data analytics using the cloud challenges and opportunities for india
Data analytics using the cloud   challenges and opportunities for india Data analytics using the cloud   challenges and opportunities for india
Data analytics using the cloud challenges and opportunities for india Ajay Ohri
 
Big Tools for Big Data
Big Tools for Big DataBig Tools for Big Data
Big Tools for Big DataLewis Crawford
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data StackZubair Nabi
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Simplilearn
 
big data overview ppt
big data overview pptbig data overview ppt
big data overview pptVIKAS KATARE
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache HadoopSuman Saurabh
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An OverviewArvind Kalyan
 
introduction to big data frameworks
introduction to big data frameworksintroduction to big data frameworks
introduction to big data frameworksAmal Targhi
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013boorad
 
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012Gigaom
 
Big Data Analytics - Introduction
Big Data Analytics - IntroductionBig Data Analytics - Introduction
Big Data Analytics - IntroductionAlex Meadows
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in detailsMahmoud Yassin
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyNishant Gandhi
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopEdureka!
 
Big data – a brief overview
Big data – a brief overviewBig data – a brief overview
Big data – a brief overviewDorai Thodla
 

What's hot (20)

Big Data Hadoop Training by Easylearning Guru
Big Data Hadoop Training by Easylearning GuruBig Data Hadoop Training by Easylearning Guru
Big Data Hadoop Training by Easylearning Guru
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
 
Data analytics using the cloud challenges and opportunities for india
Data analytics using the cloud   challenges and opportunities for india Data analytics using the cloud   challenges and opportunities for india
Data analytics using the cloud challenges and opportunities for india
 
Big Tools for Big Data
Big Tools for Big DataBig Tools for Big Data
Big Tools for Big Data
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
 
big data overview ppt
big data overview pptbig data overview ppt
big data overview ppt
 
Big data frameworks
Big data frameworksBig data frameworks
Big data frameworks
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
 
introduction to big data frameworks
introduction to big data frameworksintroduction to big data frameworks
introduction to big data frameworks
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013
 
Big Data simplified
Big Data simplifiedBig Data simplified
Big Data simplified
 
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
 
Big Data Analytics - Introduction
Big Data Analytics - IntroductionBig Data Analytics - Introduction
Big Data Analytics - Introduction
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
Whatisbigdataandwhylearnhadoop
 
Big data – a brief overview
Big data – a brief overviewBig data – a brief overview
Big data – a brief overview
 

Viewers also liked

ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...eswcsummerschool
 
A technical Introduction to Big Data Analytics
A technical Introduction to Big Data AnalyticsA technical Introduction to Big Data Analytics
A technical Introduction to Big Data AnalyticsPethuru Raj PhD
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Open source analytics
Open source analyticsOpen source analytics
Open source analyticsAjay Ohri
 
Rd big data & analytics v1.0
Rd big data & analytics v1.0Rd big data & analytics v1.0
Rd big data & analytics v1.0Yadu Balehosur
 
Benefícios e desafios que Big Data & Analytics traz para as empresas na jorna...
Benefícios e desafios que Big Data & Analytics traz para as empresas na jorna...Benefícios e desafios que Big Data & Analytics traz para as empresas na jorna...
Benefícios e desafios que Big Data & Analytics traz para as empresas na jorna...Flávio Secchieri Mariotti
 
Vertical vs Horizontal Scaling
Vertical vs Horizontal Scaling Vertical vs Horizontal Scaling
Vertical vs Horizontal Scaling Mark Myers
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopAvkash Chauhan
 
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaMark Kerzner
 
Klarity - Asia digital analytic summit
Klarity -  Asia digital analytic summitKlarity -  Asia digital analytic summit
Klarity - Asia digital analytic summitNDN Group
 
#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...
#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...
#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...Santiago Castelo
 
Big Data from Social Media and Crowdsourcing in Emergencies
Big Data from Social Media and Crowdsourcing in EmergenciesBig Data from Social Media and Crowdsourcing in Emergencies
Big Data from Social Media and Crowdsourcing in EmergenciesThomas Dybro Lundorf
 
Big Data and Social Media
Big Data and Social MediaBig Data and Social Media
Big Data and Social MediaAmy Shuen
 
Big Data Social Media & Smart Apps
Big Data Social Media & Smart AppsBig Data Social Media & Smart Apps
Big Data Social Media & Smart AppsGiacomo Nasilli
 
Product Placement: The Present & The Future
Product Placement: The Present & The FutureProduct Placement: The Present & The Future
Product Placement: The Present & The Futureitandlaw
 
Introduction to Social Media
Introduction to Social MediaIntroduction to Social Media
Introduction to Social MediaGerald Hensel
 
Big Data und Social Media
Big Data und Social MediaBig Data und Social Media
Big Data und Social MediaLukas Ott
 

Viewers also liked (20)

ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
 
A technical Introduction to Big Data Analytics
A technical Introduction to Big Data AnalyticsA technical Introduction to Big Data Analytics
A technical Introduction to Big Data Analytics
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Open source analytics
Open source analyticsOpen source analytics
Open source analytics
 
DevOps - Motivadores e Benefícios
DevOps - Motivadores e BenefíciosDevOps - Motivadores e Benefícios
DevOps - Motivadores e Benefícios
 
Rd big data & analytics v1.0
Rd big data & analytics v1.0Rd big data & analytics v1.0
Rd big data & analytics v1.0
 
Benefícios e desafios que Big Data & Analytics traz para as empresas na jorna...
Benefícios e desafios que Big Data & Analytics traz para as empresas na jorna...Benefícios e desafios que Big Data & Analytics traz para as empresas na jorna...
Benefícios e desafios que Big Data & Analytics traz para as empresas na jorna...
 
Vertical vs Horizontal Scaling
Vertical vs Horizontal Scaling Vertical vs Horizontal Scaling
Vertical vs Horizontal Scaling
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache Hadoop
 
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
 
Klarity - Asia digital analytic summit
Klarity -  Asia digital analytic summitKlarity -  Asia digital analytic summit
Klarity - Asia digital analytic summit
 
#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...
#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...
#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...
 
Big Data from Social Media and Crowdsourcing in Emergencies
Big Data from Social Media and Crowdsourcing in EmergenciesBig Data from Social Media and Crowdsourcing in Emergencies
Big Data from Social Media and Crowdsourcing in Emergencies
 
Social media & big data
Social media & big dataSocial media & big data
Social media & big data
 
Big Data and Social Media
Big Data and Social MediaBig Data and Social Media
Big Data and Social Media
 
Big Data Social Media & Smart Apps
Big Data Social Media & Smart AppsBig Data Social Media & Smart Apps
Big Data Social Media & Smart Apps
 
Product Placement: The Present & The Future
Product Placement: The Present & The FutureProduct Placement: The Present & The Future
Product Placement: The Present & The Future
 
Introduction to Social Media
Introduction to Social MediaIntroduction to Social Media
Introduction to Social Media
 
Big Data und Social Media
Big Data und Social MediaBig Data und Social Media
Big Data und Social Media
 

Similar to Big data Big Analytics

IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED
 
Big data Big Analytics

  • 2. Pre-Agenda -Presenter Introduction -Audience Introduction -Expectations
  • 3. Presenter Introduction www.linkedin.com/in/ajayohri Working with Analytics since 2004 Educated at IIM Lucknow, DCE, U Tenn Author (R for Business Analytics (Springer)) Blogger at www.decisionstats.com Interviewed 100+ Analytics leaders
  • 4. Audience Introduction ● Affiliation-Academic/ Govt/Private ● Years of working with Big Data- ● Specific Interest Area in Analytics-
  • 5. Great Expectations From You 1. No mobile rings, no sleeping (discreet sleeping), 2. Please take notes using pencil, parchment, paper, pen, computer, tablet, stylus, mobile etc., 3. Please ask questions at the END (from notes taken at Step 2) From Me 1. Breadth of Case Studies (!) 2. Open Source focus (R mostly, Clojure, Python) 3. Actionable Ideas are useful! i.e., I spent 3 hours in X talk but I did learn to do Y, or I am now interested in trying out Z
  • 6. Agenda -Presenter Introduction -Audience Identification -Expectations -Big Data -Big Data Analytics using R -Case Study 1 (Amazon AWS, SAP Hana DB) -Big Data Analytics using other tools -Case Study 2 (BigML.com, Picloud.com)
  • 7. Big Data What is Big Data? "Big data" is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Examples include web logs, RFID, sensor networks, social networks, social data (due to the social data revolution), Internet text and documents, Internet search indexing, call detail records, astronomy, atmospheric science, genomics, biogeochemical, biological, and other complex and often interdisciplinary scientific research, military surveillance, medical records, photography archives, video archives, and large-scale e-commerce. IBM- http://www-01.ibm.com/software/data/bigdata/ Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is big data.
  • 8. Big Data What is Big Data? Big Data Conferences --O'Reilly's Strata --Hadoop World --Many many conferences......including ours
  • 9. Thought for Today Data that is classified as Big Data in 2012 will be classified as Little Data by 2018. True or False?
  • 10. What is Cloud Computing? Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models. Source: National Institute of Standards and Technology, http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
  • 11. Cloud Computing and Big Data Analytics Without cloud computing, the cost of computing on Big Data would be prohibitive. The cloud runs predominantly on X OS and, as of 2012, needs customized solutions. Open source solutions (OS-Analytics) are more easily customized.
  • 12. Sources of Big Data --Internet ------Server Logs,Clickstream,Analytics --Social Media --Governments and UN bodies --Internal Data from customers
  • 13. Storing Big Data for R --Lots of RAM (?!) --RDBMS --Documents (Couch DB ,MongoDB) --HDFS (Hadoop)
  • 14. Storing Big Data for R --Documents (Couch DB, MongoDB) Package RMongo provides an R interface to a Java client for MongoDB (http://en.wikipedia.org/wiki/MongoDB) databases, which are queried using JavaScript rather than SQL. Package rmongodb is another client using MongoDB's C driver. https://github.com/wactbprot/R4CouchDB R talking to CouchDB using Couch's RESTful HTTP API. Construct HTTP calls with RCurl, then move on to the R4CouchDB package for a higher-level interface. http://digitheadslabnotebook.blogspot.in/2010/10/couchdb-and-r.html
  • 15. Big Data Packages in R- 1/2 http://cran.r-project.org/web/views/HighPerformanceComputing.html ● The biglm package by Lumley uses incremental computations to offer lm() and glm() functionality to data sets stored outside of R's main memory. ● The ff package by Adler et al. offers file-based access to data sets that are too large to be loaded into memory, along with a number of higher-level functions. ● The bigmemory package by Kane and Emerson permits storing large objects such as matrices in memory (as well as via files) and uses external pointer objects to refer to them. This permits transparent access from R without bumping against R's internal memory limits. Several R processes on the same computer can also share big memory objects. ● The HadoopStreaming package provides a framework for writing map/reduce scripts for use in Hadoop Streaming. It also facilitates operating on data in a streaming fashion, without Hadoop.
  • 16. Big Data Packages in R -2/2 ● http://cran.r-project.org/web/packages/biganalytics/ This package extends the bigmemory package with various analytics. Functions bigkmeans and binit may also be used with native R objects. ● http://cran.r-project.org/web/packages/bigtabulate/index.html This package extends the bigmemory package with table- and split-like support for big.matrix objects. The functions may also be used with regular R matrices for improving speed and memory-efficiency. ● http://cran.at.r-project.org/web/packages/synchronicity/index.html For mutex (locking) support for advanced shared-memory usage, see synchronicity. https://r-forge.r-project.org/R/?group_id=556 lists more projects. For linear algebra support, see bigalgebra.
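The "incremental computations" idea behind out-of-memory regression can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated, the chunking loop stands in for reading a large file piece by piece, and the sketch accumulates the normal equations, which is not necessarily how biglm is implemented internally.

```r
# Sketch: fit y ~ x1 + x2 one chunk at a time by accumulating X'X and X'y.
# Simulated data stand in for a file too large to load at once.
set.seed(42)
n <- 10000
dat <- data.frame(x1 = rnorm(n), x2 = runif(n))
dat$y <- 1 + 2 * dat$x1 - 3 * dat$x2 + rnorm(n, sd = 0.1)

chunk_size <- 1000
xtx <- matrix(0, 3, 3)   # running sum of X'X
xty <- matrix(0, 3, 1)   # running sum of X'y

for (start in seq(1, n, by = chunk_size)) {
  rows <- start:min(start + chunk_size - 1, n)
  X <- cbind(1, dat$x1[rows], dat$x2[rows])   # intercept + two predictors
  xtx <- xtx + crossprod(X)                   # only this chunk is "in memory"
  xty <- xty + crossprod(X, dat$y[rows])
}

# Solve the accumulated normal equations for the coefficients.
beta_chunked <- drop(solve(xtx, xty))

# For comparison only: the ordinary in-memory fit.
beta_full <- unname(coef(lm(y ~ x1 + x2, data = dat)))
```

Only the 3 x 3 and 3 x 1 running sums persist between chunks, so memory use does not grow with the number of rows.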
  • 17. Big Data and Revolution Analytics Primary -RevoScaleR package /XDF format Also sponsored RHadoop https://github.com/RevolutionAnalytics/RHadoop
  • 18. RHadoop -rhdfs package rhdfs- https://github.com/decisionstats/RHadoop/wiki/rhdfs Overview This R package provides basic connectivity to the Hadoop Distributed File System. R programmers can browse, read, write, and modify files stored in HDFS. The following functions are part of this package ● File Manipulations ● hdfs.copy, hdfs.move, hdfs.rename, hdfs.delete, hdfs.rm, hdfs.del, hdfs.chown, hdfs.put, hdfs.get ● File Read/Write ● hdfs.file, hdfs.write, hdfs.close, hdfs.flush, hdfs.read, hdfs.seek, hdfs.tell, hdfs.line.reader, hdfs.read.text.file ● Directory ● hdfs.dircreate, hdfs.mkdir ● Utility ● hdfs.ls, hdfs.list.files, hdfs.file.info, hdfs.exists ● Initialization ● hdfs.init, hdfs.defaults http://hadoop.apache.org/hdfs/ Hadoop Distributed File System (HDFS™) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations
  • 19. RHadoop -rhbase package rhbase- https://github.com/decisionstats/RHadoop/wiki/rhbase Overview This R package provides basic connectivity to HBASE, using the Thrift server. R programmers can browse, read, write, and modify tables stored in HBASE. The following functions are part of this package ● Table Manipulation ● hb.new.table, hb.delete.table, hb.describe.table, hb.set.table.mode, hb.regions.table ● Read/Write ● hb.insert, hb.get, hb.delete, hb.insert.data.frame, hb.get.data.frame, hb.scan ● Utility ● hb.list.tables ● Initialization ● hb.defaults, hb.init http://hbase.apache.org/ HBase is the Hadoop database. Think of it as a distributed, scalable, big data store.
  • 20. RHadoop -rmr package rmr- Overview This R package allows an R programmer to perform statistical analysis via MapReduce on a Hadoop cluster. ● Average flight delay (Orbitz): original and updated version with presentation ● Network analysis: original and a summary Also see https://github.com/decisionstats/RHadoop/wiki/Tutorial for logistic regression and k-means
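The map, shuffle, and reduce steps behind an "average flight delay" job can be sketched in plain base R, with no Hadoop cluster and without the rmr API. The carrier codes, delays, and the helper names map_fn and reduce_fn are made up for illustration.

```r
# Toy flight records; carrier codes and delays are made up for illustration.
flights <- data.frame(
  carrier = c("AA", "UA", "AA", "DL", "UA", "AA"),
  delay   = c(10, 5, 20, 0, 15, 30),
  stringsAsFactors = FALSE
)

# Map: emit one (key, value) pair per record.
map_fn <- function(rec) list(key = rec$carrier, value = rec$delay)
pairs <- lapply(seq_len(nrow(flights)), function(i) map_fn(flights[i, ]))

# Shuffle: group the emitted values by key.
keys    <- vapply(pairs, function(p) p$key, character(1))
values  <- vapply(pairs, function(p) p$value, numeric(1))
grouped <- split(values, keys)

# Reduce: collapse each key's values to a single result.
reduce_fn <- function(vals) mean(vals)
avg_delay <- vapply(grouped, reduce_fn, numeric(1))
```

On a real cluster the map and reduce functions would run on different nodes, and the framework would handle the shuffle; here everything runs in one R session to show the data flow.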
  • 21. Big Data Social Network Analysis Analyzing A Big Social Network using R and distributed graph engines http://thinkaurelius.com/2012/02/05/graph-degree-distributions-using-r-over-hadoop/
  • 22. Big Data Social Media Analysis Can be used for Customers (and also for latent influencers)- http://www.r-bloggers.com/an-example-of-social-network-analysis-with-r-using-package-igraph/
  • 23. Big Data Social Media Analysis The R package twitteR (http://cran.r-project.org/web/packages/twitteR/index.html) can be used for prototyping, but Twitter's API is rate limited to 1500 per hour(?)/day, so we can use the Datasift API (http://datasift.com/pricing#costs).
  • 24. Big Data Social Media Analysis How does information propagate through a social network? http://www.r-bloggers.com/information-transmission-in-a-social-network-dissecting-the-spread-of-a-quora-post/
  • 25. Big Data Social Network Analysis Can be used for Terrorists (and also for potential protestors)- Drew Conway http://riskecon.com/wp-content/uploads/2012/02/Conway-Socio_Terrorism.pdf Primary focus is on three aspects of network analysis 1. Identifying leadership and key actors 2. Revealing underlying structure and intra-network community structure 3. Evolution and decay of social networks
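Two of the ideas in the social network slides, the degree distribution and the identification of key actors, can be sketched in base R on a tiny edge list. The node names and edges are hypothetical, and degree is only the simplest of several possible measures of importance; packages such as igraph offer richer ones.

```r
# Hypothetical undirected edge list; the names are made up.
edges <- data.frame(
  from = c("ann", "ann", "ann", "ann", "bob", "dan"),
  to   = c("bob", "cat", "dan", "eve", "cat", "eve"),
  stringsAsFactors = FALSE
)

# Degree of each node: the number of edges touching it.
degree <- table(c(edges$from, edges$to))

# Degree distribution: how many nodes have each degree.
degree_dist <- table(as.integer(degree))

# Candidate key actors: the node(s) with the highest degree.
top_nodes <- names(degree)[degree == max(degree)]
```

Counting endpoints with table() scales to large edge lists far better than building an adjacency matrix, which is why degree counts are a common first pass on big graphs.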
  • 26. Big Data and Revolution Analytics Primary -RevoScaleR package /XDF format Also sponsored RHadoop ● For a case study, UpStream software ( slide 16): http://www.revolutionanalytics.com/news-events/free-webinars/2012/how-big-data-is-changing-retail-marketing-analytics/ ● Big data GLMs (you might find the chart on this page useful): http://blog.revolutionanalytics.com/2012/06/big-data-generalized-linear-models-with-revolution-r-enterprise.html ● Data distillation with Hadoop and R: http://blog.revolutionanalytics.com/2012/06/data-distillation-with-hadoop-and-r.html ● Analysis of the million row movie data set (building recommendation engines): http://blog.revolutionanalytics.com/2012/04/simple-tools-for-building-a-recommendation-engine.html
  • 27. Big Data and Revolution Analytics Marketing analytics company UpStream Software used map-reduce to convert transactions from Omniture logs (web visits, emails clicked on, ads displayed) into customer behaviors: response to an offer, research into a product, purchases.
  • 28. More R and Hadoop Case Studies A few examples where R and Hadoop are used for data distillation: ● Using robust regression on a series of raw voice-over-IP packets to calculate how long participants talk during a phone conversation. ● Using graph theory (and R's igraph package) to quantify the number of close friends of members of a social network. ● Orbitz uses R and Hadoop to extract flights and hotels that will be presented during a travel search, based on previous transactions. ● Using k-means clustering to extract similar "groups" of transactions, which are then aggregated and used as the record level for structured analysis.
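The last case study, clustering transactions and then aggregating them to one record per group, can be sketched with base R's kmeans() and aggregate(). The transaction data and column names are simulated for illustration, and the choice of two clusters is an assumption of the sketch.

```r
set.seed(1)
# Simulated transactions with two spending patterns, made up for illustration.
tx <- data.frame(
  amount = c(rnorm(50, mean = 20, sd = 2), rnorm(50, mean = 200, sd = 10)),
  items  = c(rnorm(50, mean = 2, sd = 0.5), rnorm(50, mean = 12, sd = 1))
)

# Cluster on standardized features so neither column dominates the distance.
fit <- kmeans(scale(tx), centers = 2, nstart = 10)
tx$group <- fit$cluster

# Aggregate to one row per group: the reduced data set for later analysis.
group_summary <- aggregate(cbind(amount, items) ~ group, data = tx, FUN = mean)
group_summary$n <- as.vector(table(tx$group))
```

The hundred raw rows collapse to two summary rows, which is the "data distillation" step; any structured analysis then runs on group_summary rather than on the raw transactions.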
  • 29. Using RDBMS (Big?) Data through R --RDBMS -RODBC Package http://cran.r-project.org/doc/manuals/R-data.html#Relational-databases http://cran.r-project.org/web/packages/RMySQL/index.html RMySQL http://cran.r-project.org/web/packages/ROracle/index.html ROracle http://cran.r-project.org/web/packages/RPostgreSQL/index.html RPostgreSQL http://cran.r-project.org/web/packages/DBI/index.html http://cran.r-project.org/web/packages/RSQLite/index.html RSQLite
  • 30. Using RDBMS (is it Big Data?) through R --RDBMS -RODBC Package http://cran.r-project.org/web/packages/RODBC/RODBC.pdf > library(RODBC) > odbcDataSources(type = c("all", "user", "system")) SQLServer PostgreSQL30 PostgreSQL35W "SQL Server""PostgreSQL ANSI(x64)" "PostgreSQL Unicode(x64)" MySQL "MySQL ODBC 5.1 Driver"
  • 32. Big Data Analytics - Challenges ---Traditional statistics theory grew up when data was constrained --Traditional analytics programming was NOT parallel processing --Shortage of trained people
  • 33. Big Data Analytics - Solutions ---Teaching more parallel programming and algorithms --More focus on data reduction techniques like clustering , segmentation than on hypothesis testing. Sampling, anyone? --Training more data scientists
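On the "Sampling, anyone?" point: when the data arrive as a stream of unknown length, reservoir sampling keeps a uniform random sample in fixed memory. The sketch below is a plain base R version; reservoir_sample and make_stream are hypothetical helper names, and the integer stream stands in for records read from a log or database cursor.

```r
# Reservoir sampling (Algorithm R): keep a uniform random sample of k items
# from a stream whose length is not known in advance.
reservoir_sample <- function(next_item, k) {
  reservoir <- vector("list", k)
  seen <- 0
  repeat {
    item <- next_item()
    if (is.null(item)) break          # NULL signals the end of the stream
    seen <- seen + 1
    if (seen <= k) {
      reservoir[[seen]] <- item       # fill the reservoir first
    } else {
      j <- sample.int(seen, 1)        # uniform draw from 1..seen
      if (j <= k) reservoir[[j]] <- item
    }
  }
  reservoir[seq_len(min(seen, k))]
}

# Hypothetical stream: the integers 1..n served one at a time.
make_stream <- function(n) {
  i <- 0
  function() {
    if (i >= n) return(NULL)
    i <<- i + 1
    i
  }
}

set.seed(7)
smp <- unlist(reservoir_sample(make_stream(100000), k = 100))
short <- unlist(reservoir_sample(make_stream(5), k = 10))
```

Memory use depends only on k, so the same function works whether the stream holds a thousand records or a billion.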
  • 34. Big Data Analytics - Tools used -Why R -High Performance Computing http://cran.r-project.org/web/views/HighPerformanceComputing.html -Big Data Within R http://www.slideshare.net/bytemining/r-hpc
  • 35. Using R (interfaces) --Using R Studio for easier development --Using Rattle GUI for straight off the shelf data mining and Using R Commander for Extensions --Using Revolution Analytics RPE -----Example of Snippets
  • 36. Using R --Using R for text mining ---Text Mining from Twitter Case Study ---Datasift Export to Amazon S3 --Using R for geo-coded analysis ---Hana DB --Using R for Graphical Analysis of Big Data TablePlot 3D using R Commander --Using R for forecasting Using Plugin R Commander E-Pack
  • 37. Existing Big Data Case Studies Departure of Aeroplanes-SAP Hana 200m http://allthingsr.blogspot.in/#!/2012/04/big-data-r-and-hana-analyze-200-million.html R using SAP Hana http://www.decisionstats.com/interview-blag-sap-labs-montreal-using-sap-hana-with-rstats/
  • 38. SAP Hana DB uses R http://scn.sap.com/community/in-memory-business-data-management/blog/2011/11/28/dealing-with-r-and-hana
  • 39. Oracle R Enterprise Case Studies and Examples http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise/index.html
  • 40. Oracle R Enterprise Case Studies and Examples http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise/index.html
  • 41. Revolution Analytics RevoScaleR package: Using the RevoScaleR package to extract time series data from time-stamped logs (in this case, the "US Domestic Flights From 1990 to 2009" dataset on Infochimps): Analyzing time series data of all sorts is a fundamental business analytics task to which the R language is beautifully suited. In addition to the time series functions built into the base stats library there are dozens of R packages devoted to time series... We have shown how to use data manipulation functions of the RevoScaleR package to extract time-stamped data from a large data file, aggregate it, and form it into a monthly time series that can easily be analyzed with standard R functions. http://www.inside-r.org/howto/extracting-time-series-large-data-sets http://blog.revolutionanalytics.com/2011/09/how-to-extract-time-series-from-large-timestamped-logs-with-r.html
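The extract-aggregate-form pattern described on this slide can be sketched without RevoScaleR. The version below is a plain-Python stand-in, not the RevoScaleR API: it streams a time-stamped log line by line and counts records per month. The log layout (a timestamp followed by comma-separated fields) and the sample rows are assumptions.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(lines):
    """Count log records per (year, month), reading one line at a time."""
    counts = Counter()
    for line in lines:
        # Assumed layout: "YYYY-MM-DD HH:MM:SS,<other fields>"
        stamp, _, _rest = line.partition(",")
        when = datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S")
        counts[(when.year, when.month)] += 1
    # Sort by (year, month) so the result reads as a time series.
    return sorted(counts.items())


# Made-up rows standing in for a large time-stamped file.
log_lines = [
    "1990-01-15 08:30:00,ORD,JFK",
    "1990-01-20 17:45:00,LAX,SFO",
    "1990-02-03 09:10:00,ORD,LAX",
]

series = monthly_counts(log_lines)
```

Because `monthly_counts` accepts any iterable of lines, an open file handle can be passed directly, so the full file never has to be loaded at once.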
  • 42. Using R on Amazon -Case Study --Bioconductor in the Cloud --Custom Amazon Instance --Concerns for non-American users of Amazon
  • 43. Using BigML on cloud Case Study Classification using Clojure on Cloud https://bigml.com/gallery/models/fraud_and_crime --Concerns on depending on third party tools --Example Cloudnumbers.com
  • 44. Using Google APIs https://code.google.com/apis/console/?pli=1 Google Storage API Google Predictive Analysis API Introduction to other APIS ----Concerns to users of Google APIs
  • 45. Using Google APIs case study Google Storage API Google Predictive Analysis API http://code.google.com/p/google-prediction-api-r-client/
  • 46. Using Google APIs case study Introduction to other Big Data Google APIS ----Concerns to users of Google APIs
  • 47. Using Python - PiCloud http://www.picloud.com/
  • 48. Privacy hazards of big data analytics. Big Brother -1984 --- 2012 They know where you are (mobiles) They know what you are looking for (internet) They know your past (financial history + social media) They can use your medical history Laws authorize them (Patriot Act?) --example Emotional Analysis of Images http://www.affectiva.com/
  • 49. References and Acknowledgements David Smith, Revolution Analytics David Champagne, Revolution Analytics All R Bloggers, Developers, Packagers Blag - SAP Hana Analytics Charlie Berger and Oracle R Team Jim Kobielus - IBM Big Data Team R Development Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.
  • 51. Book- R for Business Analytics http://www.springer.com/statistics/book/978-1-4614-4342-1