Introduction to Mahout
Ted Dunning
Slides for the talk I gave to the Twin Cities HUG
1.
©MapR Technologies 2013- Confidential
Introduction to Mahout and How to Build a Recommender
2.
Me, Us
Ted Dunning, Chief Application Architect, MapR
  – Committer and PMC member: Mahout, ZooKeeper, Drill
  – Bought the beer at the first HUG
MapR
  – Distributes more open source components for Hadoop
  – Adds major technology for performance, HA, industry-standard APIs
Tonight
  – Hashtag: #tchug
  – See also: @ApacheMahout, @ApacheDrill, @ted_dunning and @mapR
3.
Sidebar on Drill
Apache Drill: SQL on Hadoop (and other things)
  – Intended to solve problems for 1-5 years from now, not the problems from 1-10 years ago
  – Multiple levels of API supported
      • SQL-2003
      • Logical plan language (DAG in JSON)
      • Physical plan language (DAG with push-down, exchange markers)
      • Execution plan language (many DAGs)
Current state
  – SQL-2003 support in place
  – Logical plan interpreter useful for testing
  – Value vectors near completion
  – High-performance RPC working
4.
More on Drill
Just completed OSCON workshop
Workshop materials available shortly
  – Extracted technology demonstrators
  – Sample queries
Send me email or tweet for more info
5.
What’s Up?
What is Mahout?
  – Math library
  – Clustering, classifiers, other stuff
Recommendation
  – Generalities
  – Algorithm specifics
  – System design
  – Important things never mentioned
Final thoughts
6.
What is Mahout?
“Scalable machine learning”
  – not just Hadoop-oriented machine learning
  – not entirely, that is. Just mostly.
Components
  – math library
  – clustering
  – classification
  – decompositions
  – recommendations
8.
Mahout Math
9.
Mahout Math
Goals are
  – basic linear algebra,
  – and statistical sampling,
  – and good clustering,
  – decent speed,
  – extensibility,
  – especially for sparse data
But not
  – totally badass speed
  – comprehensive set of algorithms
  – optimization, root finders, quadrature
10.
Matrices and Vectors
At the core:
  – DenseVector, RandomAccessSparseVector
  – DenseMatrix, SparseRowMatrix
Highly composable API
Important ideas:
  – view*, assign and aggregate
  – iteration

    m.viewDiagonal().assign(v);
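To make the dense/sparse split concrete, here is a minimal sketch of two storage strategies behind one small interface. `Vec`, `DenseVec` and `HashSparseVec` are hypothetical stand-ins, not Mahout's `DenseVector` or `RandomAccessSparseVector`; the sketch only assumes that a random-access sparse vector keeps its non-zero entries in a hash map and reads every other index as zero.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal vector interface (not Mahout's Vector). */
interface Vec {
    int size();
    double get(int i);
    void set(int i, double value);
    int numNonZeros();
}

/** Dense storage: one double per index, O(1) get and set. */
final class DenseVec implements Vec {
    private final double[] values;

    DenseVec(int size) { values = new double[size]; }

    public int size() { return values.length; }
    public double get(int i) { return values[i]; }
    public void set(int i, double value) { values[i] = value; }

    public int numNonZeros() {
        int n = 0;
        for (double v : values) if (v != 0.0) n++;
        return n;
    }
}

/** Sparse storage: only non-zero entries live in a hash map; absent keys read as 0. */
final class HashSparseVec implements Vec {
    private final int size;
    private final Map<Integer, Double> values = new HashMap<>();

    HashSparseVec(int size) { this.size = size; }

    public int size() { return size; }
    public double get(int i) { return values.getOrDefault(i, 0.0); }

    public void set(int i, double value) {
        if (value == 0.0) values.remove(i);   // never store explicit zeros
        else values.put(i, value);
    }

    public int numNonZeros() { return values.size(); }
}
```

The sparse version pays a hash lookup per access but its memory grows with the number of non-zeros rather than with `size()`.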
11.
Assign? View?
Why assign?
  – Copying is the major cost for naïve matrix packages
  – In-place operations are critical to reasonable performance
  – Many kinds of updates are required, so a functional style is very helpful
Why view?
  – In-place operations are often required for blocks, rows, columns or diagonals
  – With views, we need #assign + #views methods
  – Without views, we need #assign × #views methods
Synergies
  – With both views and assign, many loops become a single line
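The "#assign + #views" argument can be seen in a small hypothetical sketch: a matrix stored row-major in one `double[]`, where row, column and diagonal views are all the same strided window over that array. Because every view is one class, each `assign` variant is written once and writes straight through to the matrix. `RowMajorMatrix` and `StridedView` are illustrative names, not Mahout classes.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical dense matrix stored row-major in a single array. */
final class RowMajorMatrix {
    final int rows, cols;
    final double[] data;

    RowMajorMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    double get(int r, int c) { return data[r * cols + c]; }

    // Each view is just (offset, stride, length) over the shared array; nothing is copied.
    StridedView viewRow(int r) { return new StridedView(data, r * cols, 1, cols); }
    StridedView viewColumn(int c) { return new StridedView(data, c, cols, rows); }
    StridedView viewDiagonal() { return new StridedView(data, 0, cols + 1, Math.min(rows, cols)); }
}

/** One view class, so every assign variant works for rows, columns and diagonals alike. */
final class StridedView {
    private final double[] data;
    private final int offset, stride, length;

    StridedView(double[] data, int offset, int stride, int length) {
        this.data = data;
        this.offset = offset;
        this.stride = stride;
        this.length = length;
    }

    int size() { return length; }
    double get(int i) { return data[offset + i * stride]; }

    /** Sets every element to a constant, in place. */
    StridedView assign(double value) {
        for (int i = 0; i < length; i++) data[offset + i * stride] = value;
        return this;
    }

    /** Copies another view element by element, in place. */
    StridedView assign(StridedView other) {
        for (int i = 0; i < length; i++) data[offset + i * stride] = other.get(i);
        return this;
    }

    /** Applies a function to every element, in place. */
    StridedView assign(DoubleUnaryOperator f) {
        for (int i = 0; i < length; i++) {
            int k = offset + i * stride;
            data[k] = f.applyAsDouble(data[k]);
        }
        return this;
    }
}
```

Adding another view (an anti-diagonal is offset `cols - 1`, stride `cols - 1`) needs no new assign code, and adding an assign variant needs no per-view code.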
12.
Assign
Matrices:
    Matrix assign(double value);
    Matrix assign(double[][] values);
    Matrix assign(Matrix other);
    Matrix assign(DoubleFunction f);
    Matrix assign(Matrix other, DoubleDoubleFunction f);
Vectors:
    Vector assign(double value);
    Vector assign(double[] values);
    Vector assign(Vector other);
    Vector assign(DoubleFunction f);
    Vector assign(Vector other, DoubleDoubleFunction f);
    Vector assign(DoubleDoubleFunction f, double y);
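The three functional shapes of `assign` differ only in where the second argument comes from. A hypothetical sketch on plain arrays, with `java.util.function.DoubleUnaryOperator` and `DoubleBinaryOperator` standing in for Mahout's `DoubleFunction` and `DoubleDoubleFunction`; `ArrayAssign` is an illustrative helper, not a library class.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical in-place assign helpers mirroring the vector signatures above. */
final class ArrayAssign {
    private ArrayAssign() {}

    /** x[i] = f(x[i]) */
    static double[] assign(double[] x, DoubleUnaryOperator f) {
        for (int i = 0; i < x.length; i++) x[i] = f.applyAsDouble(x[i]);
        return x;
    }

    /** x[i] = f(x[i], other[i]) */
    static double[] assign(double[] x, double[] other, DoubleBinaryOperator f) {
        if (x.length != other.length) throw new IllegalArgumentException("size mismatch");
        for (int i = 0; i < x.length; i++) x[i] = f.applyAsDouble(x[i], other[i]);
        return x;
    }

    /** x[i] = f(x[i], y) with a fixed second argument y */
    static double[] assign(double[] x, DoubleBinaryOperator f, double y) {
        for (int i = 0; i < x.length; i++) x[i] = f.applyAsDouble(x[i], y);
        return x;
    }
}
```

Each call overwrites `x` rather than allocating a result, which is the copying cost the previous slide argues against.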
13.
Views
Matrices:
    Matrix viewPart(int[] offset, int[] size);
    Matrix viewPart(int row, int rlen, int col, int clen);
    Vector viewRow(int row);
    Vector viewColumn(int column);
    Vector viewDiagonal();
Vectors:
    Vector viewPart(int offset, int length);
14.
Aggregates
Matrices:
    double zSum();
    Vector aggregateRows(VectorFunction f);
    Vector aggregateColumns(VectorFunction f);
    double aggregate(DoubleDoubleFunction combiner, DoubleFunction mapper);
Vectors:
    double zSum();
    double aggregate(DoubleDoubleFunction reduce, DoubleFunction map);
    double aggregate(Vector other, DoubleDoubleFunction aggregator, DoubleDoubleFunction combiner);
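`aggregate` is a map followed by a reduce over the elements. A hypothetical plain-array sketch, with `java.util.function` types in place of Mahout's function classes, shows how a sum of squares, a max-abs norm or a dot product all come from the same two methods. It assumes non-empty inputs, since the first mapped element seeds the reduction; `ArrayAggregate` is an illustrative name.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical map-reduce helpers mirroring the vector aggregate signatures above. */
final class ArrayAggregate {
    private ArrayAggregate() {}

    /** reduce(...reduce(map(x[0]), map(x[1]))..., map(x[n-1])); x must be non-empty. */
    static double aggregate(double[] x, DoubleBinaryOperator reduce, DoubleUnaryOperator map) {
        if (x.length == 0) throw new IllegalArgumentException("empty vector");
        double acc = map.applyAsDouble(x[0]);
        for (int i = 1; i < x.length; i++) acc = reduce.applyAsDouble(acc, map.applyAsDouble(x[i]));
        return acc;
    }

    /** Pairwise form: reduce over combine(x[i], y[i]), e.g. a dot product. */
    static double aggregate(double[] x, double[] y, DoubleBinaryOperator reduce, DoubleBinaryOperator combine) {
        if (x.length != y.length || x.length == 0) throw new IllegalArgumentException("bad sizes");
        double acc = combine.applyAsDouble(x[0], y[0]);
        for (int i = 1; i < x.length; i++) acc = reduce.applyAsDouble(acc, combine.applyAsDouble(x[i], y[i]));
        return acc;
    }
}
```

For example, `aggregate(x, Double::sum, v -> v * v)` is the squared Euclidean norm and `aggregate(x, y, Double::sum, (a, b) -> a * b)` is the dot product.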
15.
Predefined Functions
Many handy functions:
    ABS, ACOS, ASIN, ATAN, CEIL, COS, EXP, FLOOR, IDENTITY, INV, LOGARITHM,
    LOG2, NEGATE, RINT, SIGN, SIN, SQRT, SQUARE, SIGMOID, SIGMOIDGRADIENT, TAN
16.
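Examples
$A = \alpha$ (every entry set to a constant):
    double alpha, beta;   // scalars, set elsewhere
    a.assign(alpha);
$A = \alpha B + \beta$ (copy B into A, then apply the composed function in place):
    a.assign(b).assign(Functions.chain(Functions.plus(beta), Functions.times(alpha)));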
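The composed function is ordinary function composition. Assuming `chain(g, h)` means g(h(x)), as in the Colt library that Mahout's function classes descend from, `chain(plus(beta), times(alpha))` maps x to αx + β. A stand-alone sketch with `java.util.function.DoubleUnaryOperator`; `Compose`, `chain`, `plus` and `times` here are hypothetical local helpers written to mirror the slide, not Mahout's `Functions`.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical composition helpers mirroring the slide's Functions calls. */
final class Compose {
    private Compose() {}

    static DoubleUnaryOperator plus(double b) { return x -> x + b; }
    static DoubleUnaryOperator times(double a) { return x -> x * a; }

    /** chain(g, h) = g(h(x)): apply h first, then g. */
    static DoubleUnaryOperator chain(DoubleUnaryOperator g, DoubleUnaryOperator h) {
        return h.andThen(g);
    }

    /** a[i] = alpha * b[i] + beta, written into a. */
    static double[] scaleShift(double[] a, double[] b, double alpha, double beta) {
        DoubleUnaryOperator f = chain(plus(beta), times(alpha));
        for (int i = 0; i < a.length; i++) a[i] = f.applyAsDouble(b[i]);
        return a;
    }
}
```

The argument order matters: `chain(times(alpha), plus(beta))` would compute α(x + β) instead.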
17.
Sparse Optimizations
DoubleDoubleFunction abstract properties:
    public boolean isLikeRightPlus();
    public boolean isLikeLeftMult();
    public boolean isLikeRightMult();
    public boolean isLikeMult();
    public boolean isCommutative();
    public boolean isAssociative();
    public boolean isAssociativeAndCommutative();
    public boolean isDensifying();
And Vector properties:
    public boolean isDense();
    public boolean isSequentialAccess();
    public double getLookupCost();
    public double getIteratorAdvanceCost();
    public boolean isAddConstantTime();
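These flags let `assign(other, f)` skip work. If f(x, 0) = x for every x (the property the name `isLikeRightPlus` suggests), positions where `other` is zero leave the receiver unchanged, so only the non-zeros of a sparse `other` need visiting. A hypothetical sketch of that shortcut, with the sparse operand held as parallel index/value arrays; the `rightZeroIsIdentity` flag is an assumed stand-in, not Mahout's API.

```java
import java.util.function.DoubleBinaryOperator;

/** Hypothetical sparse-aware assign: x[i] = f(x[i], y[i]) with y given by its non-zeros. */
final class SparseAssign {
    private SparseAssign() {}

    /**
     * yIndex/yValue list the non-zeros of y (same logical length as x).
     * If rightZeroIsIdentity, we assume f(v, 0) == v and only visit y's non-zeros.
     * Otherwise every position is processed, with 0 where y has no entry.
     * Returns the number of f evaluations, to make the saving visible.
     */
    static int assign(double[] x, int[] yIndex, double[] yValue,
                      DoubleBinaryOperator f, boolean rightZeroIsIdentity) {
        int visited = 0;
        if (rightZeroIsIdentity) {
            for (int k = 0; k < yIndex.length; k++) {
                int i = yIndex[k];
                x[i] = f.applyAsDouble(x[i], yValue[k]);
                visited++;
            }
        } else {
            double[] yDense = new double[x.length];
            for (int k = 0; k < yIndex.length; k++) yDense[yIndex[k]] = yValue[k];
            for (int i = 0; i < x.length; i++) {
                x[i] = f.applyAsDouble(x[i], yDense[i]);
                visited++;
            }
        }
        return visited;
    }
}
```

Addition qualifies for the shortcut; multiplication does not, because x · 0 changes every position where y is zero.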
21.
Examples
The trace of a matrix:
    double trace = m.viewDiagonal().zSum();
Set diagonal to zero:
    m.viewDiagonal().assign(0);
Set diagonal to negative of row sums, excluding the diagonal:
    Vector diag = m.viewDiagonal().assign(0);
    diag.assign(m.rowSums().assign(Functions.NEGATE));
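Without Mahout on the classpath, the same three operations are short loops over a `double[][]`. This hypothetical sketch makes the "excluding the diagonal" detail explicit by zeroing the diagonal before summing; afterwards every row sums to zero. `DiagonalOps` is an illustrative name and the methods assume a non-empty square matrix.

```java
/** Hypothetical plain-array versions of the three diagonal examples (square, non-empty m). */
final class DiagonalOps {
    private DiagonalOps() {}

    /** Sum of the main diagonal. */
    static double trace(double[][] m) {
        double t = 0;
        for (int i = 0; i < m.length; i++) t += m[i][i];
        return t;
    }

    /** Sets the main diagonal to zero, in place. */
    static void zeroDiagonal(double[][] m) {
        for (int i = 0; i < m.length; i++) m[i][i] = 0;
    }

    /** Sets m[i][i] to minus the sum of the off-diagonal entries of row i, in place. */
    static void diagonalToNegativeRowSums(double[][] m) {
        zeroDiagonal(m);                        // so the row sums below exclude the diagonal
        for (int i = 0; i < m.length; i++) {
            double s = 0;
            for (double v : m[i]) s += v;
            m[i][i] = -s;
        }
    }
}
```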
22.
Iteration
Matrices are Iterable in Mahout
Vectors are densely or sparsely iterable

    // compute both row and column sums in one pass
    for (MatrixSlice row : m) {
      rSums.set(row.index(), row.zSum());
      cSums.assign(row, Functions.PLUS);
    }

    // entropy of a probability vector v, visiting only its non-zero entries
    double entropy = 0;
    for (Vector.Element e : v.nonZeroes()) {
      entropy -= e.get() * Math.log(e.get());
    }
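The one-pass idea does not depend on Mahout's `MatrixSlice`. A hypothetical plain-Java version over a rectangular, non-empty `double[][]` follows; the entropy helper assumes its input is the non-zero part of a probability vector and skips zeros because 0 · log 0 is taken as 0. `OnePass` is an illustrative name.

```java
/** Hypothetical plain-array versions of the iteration examples. */
final class OnePass {
    private OnePass() {}

    /** Returns {rowSums, columnSums}, computed in a single sweep over m. */
    static double[][] rowAndColumnSums(double[][] m) {
        double[] rSums = new double[m.length];
        double[] cSums = new double[m[0].length];
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < m[r].length; c++) {
                rSums[r] += m[r][c];
                cSums[c] += m[r][c];
            }
        }
        return new double[][] {rSums, cSums};
    }

    /** Shannon entropy in nats, -sum p log p, over the non-zero probabilities. */
    static double entropy(double[] nonZeroProbabilities) {
        double h = 0;
        for (double p : nonZeroProbabilities) {
            if (p > 0) h -= p * Math.log(p);
        }
        return h;
    }
}
```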
23.
Random Sampling
Samples from some type
Lots of kinds:
    ChineseRestaurant, Empirical, IndianBuffet, Missing, Multinomial,
    MultiNormal, Normal, PoissonSampler
Sampler interface:
    public interface Sampler<T> {
      T sample();
    }

    public abstract class AbstractSamplerFunction extends DoubleFunction implements Sampler<Double>
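As an illustration of the `Sampler<T>` contract, here is a hypothetical, self-contained Chinese restaurant process sampler: each call seats one new customer and returns the table index. It re-declares a local `Sampler<T>` matching the slide and follows the textbook one-parameter process (concentration `alpha`), not Mahout's `ChineseRestaurant` implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

interface Sampler<T> {
    T sample();
}

/** Hypothetical Chinese restaurant process sampler. */
final class CrpSampler implements Sampler<Integer> {
    private final double alpha;                 // larger alpha opens new tables more often
    private final Random rand;
    private final List<Integer> tableCounts = new ArrayList<>();
    private int customers = 0;

    CrpSampler(double alpha, long seed) {
        if (alpha <= 0) throw new IllegalArgumentException("alpha must be positive");
        this.alpha = alpha;
        this.rand = new Random(seed);
    }

    @Override
    public Integer sample() {
        // Table k has weight n_k, a new table has weight alpha; total weight = customers + alpha.
        double u = rand.nextDouble() * (customers + alpha);
        int table = tableCounts.size();          // default: open a new table
        for (int k = 0; k < tableCounts.size(); k++) {
            u -= tableCounts.get(k);
            if (u < 0) {
                table = k;
                break;
            }
        }
        if (table == tableCounts.size()) tableCounts.add(1);
        else tableCounts.set(table, tableCounts.get(table) + 1);
        customers++;
        return table;
    }

    int tableCount() { return tableCounts.size(); }
}
```

The number of occupied tables grows roughly like alpha · log n, which is what makes this process useful as a prior over an unknown number of clusters.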
24.
Clustering and Such
Streaming k-means and ball k-means
  – streaming reduces very large data to a cluster sketch
  – ball k-means is a high-quality k-means implementation
  – the cluster sketch is also usable for other applications
  – single-machine threaded and map-reduce versions available
SVD and friends
  – stochastic SVD has in-memory, single-machine out-of-core and map-reduce versions
  – good for reducing very large sparse matrices to tall, skinny dense ones
Spectral clustering
  – based on SVD, allows massive dimensional clustering
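The core of the streaming pass is a single decision per point: merge it into the nearest centroid, or start a new one with probability proportional to how far away it is. Below is a heavily simplified, hypothetical one-pass sketch under a fixed distance cutoff. The full streaming algorithm also grows the cutoff and re-collapses the sketch when it holds too many centroids; that step is omitted, and `StreamingSketch` is not Mahout's `StreamingKMeans`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical, simplified one-pass cluster sketch with a fixed distance cutoff. */
final class StreamingSketch {
    private final List<double[]> centroids = new ArrayList<>();
    private final List<Double> weights = new ArrayList<>();
    private final double distanceCutoff;   // assumed fixed here; the full algorithm adapts it
    private final Random rand;

    StreamingSketch(double distanceCutoff, long seed) {
        this.distanceCutoff = distanceCutoff;
        this.rand = new Random(seed);
    }

    void add(double[] x) {
        if (centroids.isEmpty()) {
            newCentroid(x);
            return;
        }
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.size(); i++) {
            double d = distance(centroids.get(i), x);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        // Far from every centroid: likely to start a new one. Close by: usually merged.
        if (rand.nextDouble() < bestDist / distanceCutoff) {
            newCentroid(x);
        } else {
            double w = weights.get(best);
            double[] c = centroids.get(best);
            for (int j = 0; j < c.length; j++) c[j] = (c[j] * w + x[j]) / (w + 1);   // weighted mean
            weights.set(best, w + 1);
        }
    }

    int size() { return centroids.size(); }

    double totalWeight() {
        double t = 0;
        for (double w : weights) t += w;
        return t;
    }

    private void newCentroid(double[] x) {
        centroids.add(x.clone());
        weights.add(1.0);
    }

    private static double distance(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return Math.sqrt(s);
    }
}
```

The weighted centroids are the "cluster sketch": far fewer points than the input, each carrying the number of points it absorbed, ready for a high-quality in-memory pass such as ball k-means.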
25.
Mahout Math Summary
Matrices, Vectors
  – views
  – in-place assignment
  – aggregations
  – iterations
Functions
  – lots built-in
  – cooperate with sparse vector optimizations
Sampling
  – abstract samplers
  – samplers as functions
Other stuff … clustering, SVD
26.
Recommenders
27.
Recommendations
Often known as collaborative filtering
Actors interact with items
  – observe successful interaction
We want to suggest additional successful interactions
Observations inherently very sparse
28.
The Big Ideas
Cooccurrence is the core operation (and it is pretty simple)
Cooccurrence can be extended to handle important new capabilities
Search technology is an ideal way to deploy recommendation systems
29.
Examples of Recommendations
Customers buying books (Linden et al.)
Web visitors rating music (Shardanand and Maes) or movies (Riedl et al.), (Netflix)
Internet radio listeners not skipping songs (Musicmatch)
Internet video watchers watching >30 s (Veoh)
Visibility in a map UI (new Google Maps)
30.
A simple recommendation architecture
Look at the history of interactions
Find significant item cooccurrence in user histories
Use these cooccurring items as “indicators”
For all indicators in user history, accumulate scores for related items
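The last step is a sum of votes: every item in the user's history contributes its indicator list, and related items are ranked by how many indicators point at them. A hypothetical sketch with indicators kept in a `Map<String, Set<String>>`; skipping items the user already has and breaking ties alphabetically are illustrative choices, and `IndicatorScorer` is not a Mahout class.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical indicator-based scorer: one vote per (history item, indicator) hit. */
final class IndicatorScorer {
    private IndicatorScorer() {}

    /** Items ranked by accumulated score, best first; items already in the history are skipped. */
    static List<String> recommend(Map<String, Set<String>> indicators, Collection<String> history) {
        Map<String, Integer> scores = new HashMap<>();
        for (String seen : history) {
            for (String related : indicators.getOrDefault(seen, Collections.emptySet())) {
                if (!history.contains(related)) scores.merge(related, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken alphabetically so the order is deterministic.
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked;
    }
}
```

This is exactly the query a search engine answers when each item's indicators are indexed as a field and the user's history is the query.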
31.
Recommendation Basics
History:
    User  Thing
    1     3
    2     4
    3     4
    2     3
    3     2
    1     1
    2     1
32.
Recommendation Basics
History as matrix:
          t1  t2  t3  t4
    u1     1   0   1   0
    u2     1   0   1   1
    u3     0   1   0   1
(t1, t3) cooccur 2 times, (t1, t4) once, (t2, t4) once, (t3, t4) once
33.
A Quick Simplification
Users who do $h$: $A h$
Also do $r$: $r = A^\top (A h) = (A^\top A) h$
  – $A^\top (A h)$: user-centric recommendations
  – $(A^\top A) h$: item-centric recommendations
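Matrix multiplication is associative, so both groupings give the same r; the item-centric grouping is what allows the item-by-item matrix AᵀA to be computed ahead of time. A minimal sketch with plain `double[][]` arithmetic checks this on the 3 × 4 history matrix of slide 32 for a hypothetical user whose history h contains only t1. `RecommendAlgebra` and its helpers are illustrative names.

```java
/** Hypothetical dense helpers for comparing A^T (A h) with (A^T A) h. */
final class RecommendAlgebra {
    private RecommendAlgebra() {}

    static double[] times(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < x.length; j++) y[i] += a[i][j] * x[j];
        return y;
    }

    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++) t[j][i] = a[i][j];
        return t;
    }

    static double[][] times(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int k = 0; k < b.length; k++)
                for (int j = 0; j < b[0].length; j++) c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    /** User-centric: find similar users (A h), then the items they have. */
    static double[] userCentric(double[][] a, double[] h) { return times(transpose(a), times(a, h)); }

    /** Item-centric: precomputable item-item matrix times the history. */
    static double[] itemCentric(double[][] a, double[] h) { return times(times(transpose(a), a), h); }
}
```

For h = (1, 0, 0, 0) both methods return (2, 0, 2, 1), the t1 row of the cooccurrence counts.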
34.
Recommendation Basics
Cooccurrence:
          t1  t2  t3  t4
    t1     2   0   2   1
    t2     0   1   0   1
    t3     2   0   2   1
    t4     1   1   1   2
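The cooccurrence matrix can be computed straight from the (user, thing) history rows without materializing A: for each user, every ordered pair of items in that user's history, including an item with itself, adds one. Each diagonal entry is then simply the number of users who have that item. A hypothetical sketch using the seven history rows of slide 31, with items numbered 1 to 4; `Cooccurrence` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical cooccurrence counter: C = A^T A computed from (user, item) pairs. */
final class Cooccurrence {
    private Cooccurrence() {}

    /**
     * history[k] = {user, item}, items numbered 1..numItems, each pair appearing at most once.
     * Returns C where C[i][j] = number of users who have both item i+1 and item j+1.
     */
    static int[][] fromHistory(int[][] history, int numItems) {
        Map<Integer, List<Integer>> byUser = new HashMap<>();
        for (int[] row : history) {
            byUser.computeIfAbsent(row[0], u -> new ArrayList<>()).add(row[1]);
        }
        int[][] c = new int[numItems][numItems];
        for (List<Integer> items : byUser.values()) {
            for (int a : items) {
                for (int b : items) c[a - 1][b - 1]++;
            }
        }
        return c;
    }
}
```

The per-user pair loop is quadratic in history length, which is why production implementations usually down-sample very long histories before counting.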
35.
Problems with Raw Cooccurrence
Very popular items co-occur with everything
  – Welcome document
  – Elevator music
That isn’t interesting
  – We want anomalous cooccurrence
36.
Recommendation Basics
Cooccurrence:
          t1  t2  t3  t4
    t1     2   0   2   1
    t2     0   1   0   1
    t3     2   0   2   1
    t4     1   1   1   2

              t3   not t3
    t1         2     1
    not t1     1     1
37.
Spot the Anomaly
Root LLR is roughly like standard deviations

              A      not A
    B         13     1000
    not B     1000   100,000
    root LLR: 0.44

              A      not A
    B         1      0
    not B     0      2
    root LLR: 0.98

              A      not A
    B         1      0
    not B     0      10,000
    root LLR: 2.26

              A      not A
    B         10     0
    not B     0      100,000
    root LLR: 7.15
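A minimal sketch of the log-likelihood ratio for a 2 × 2 table, written in the entropy form LLR = 2 · N · (mutual information), computed from unnormalized entropies of the row sums, column sums and cells. The cell layout follows the tables above: k11 = A with B, k12 = B without A, k21 = A without B, k22 = neither. It is an illustration under those assumptions, not a copy of Mahout's `LogLikelihood` class.

```java
/** Hypothetical G^2 (log-likelihood ratio) helpers for a 2 x 2 contingency table. */
final class Llr {
    private Llr() {}

    private static double xLogX(long x) { return x == 0 ? 0.0 : x * Math.log(x); }

    /** Unnormalized entropy N * H of a set of counts. */
    private static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long k : counts) {
            sum += k;
            parts += xLogX(k);
        }
        return xLogX(sum) - parts;
    }

    /** G^2 statistic for the table [[k11, k12], [k21, k22]]. */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Mathematically >= 0; clamp tiny negative values caused by round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    /** Signed square root: negative when A is less likely with B than without it. */
    static double rootLogLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double root = Math.sqrt(logLikelihoodRatio(k11, k12, k21, k22));
        boolean lessThanExpected = (double) k11 / (k11 + k12) < (double) k21 / (k21 + k22);
        return lessThanExpected ? -root : root;
    }
}
```

On the four tables above this sketch gives root-LLR values of roughly 0.90, 1.95, 4.52 and 14.29. Those are about twice the printed figures, so the slide appears to use a differently scaled score; the ranking of the four cases is the same.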
38.
Threshold by Score
Cooccurrence
      t1  t2  t3  t4
t1     2   0   2   1
t2     0   1   0   1
t3     2   0   2   1
t4     1   1   1   2
39.
Threshold by Score
Significant cooccurrence => Indicators
      t1  t2  t3  t4
t1     1   0   0   1
t2     0   1   0   1
t3     0   0   1   1
t4     1   0   0   1
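The thresholding step can be sketched as follows, under two assumptions that the slides do not state: the 2×2 table for each item pair is filled with user counts (users with both items, with only one, or with neither), and the cutoff on root LLR is 1.0. The helper names are hypothetical. On the toy history this keeps t1 with t3 and t2 with t4. With only three users the scores are small, and the output is not meant to reproduce the indicator matrix shown above.

```python
import math
from itertools import combinations


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def root_llr(k11, k12, k21, k22):
    """Signed root of the G^2 log-likelihood ratio for a 2x2 table."""
    llr = 2.0 * (entropy(k11 + k12, k21 + k22) + entropy(k11 + k21, k12 + k22)
                 - entropy(k11, k12, k21, k22))
    r = math.sqrt(max(0.0, llr))
    return -r if k11 * (k21 + k22) < k21 * (k11 + k12) else r


def indicators(a, threshold):
    """Map each item column to the other items whose root LLR with it exceeds threshold."""
    n_users, n_items = len(a), len(a[0])
    result = {i: [] for i in range(n_items)}
    for i, j in combinations(range(n_items), 2):
        both = sum(1 for row in a if row[i] and row[j])
        only_j = sum(1 for row in a if row[j] and not row[i])
        only_i = sum(1 for row in a if row[i] and not row[j])
        neither = n_users - both - only_i - only_j
        # Layout: rows j / not j, columns i / not i.
        if root_llr(both, only_j, only_i, neither) > threshold:
            result[i].append(j)
            result[j].append(i)
    return result


A = [[1, 0, 1, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 1]]
print(indicators(A, threshold=1.0))  # {0: [2], 1: [3], 2: [0], 3: [1]}
```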
40.
So Far, So Good
Classic recommendation systems based on these approaches
  – Musicmatch (ca. 2000)
  – Veoh Networks (ca. 2005)
Currently available in Mahout
  – See RowSimilarityJob
Very simple to deploy
  – Compute indicators
  – Store in search engine
  – Works very well with enough data
41.
What’s right about this?
42.
Virtues of Current State of the Art
Lots of well-publicized history
  – Musicmatch, Veoh, Netflix, Amazon, Overstock
Lots of support
  – Mahout, commercial offerings like Myrrix
Lots of existing code
  – Mahout, commercial codes
Proven track record
Well socialized solution
43.
What’s wrong about this?
44.
Problems for Recommenders
Cold start
Disjoint populations
Long tail
Multiple kinds of evidence (multi-modal recommendations)
  – unstructured add-on data
  – other transaction streams
  – textual descriptions
45.
What is this multi-modal stuff?
But people don’t just do one thing
One kind of behavior is useful for predicting other kinds
Having a complete picture is important for accuracy
What has the user said, viewed, clicked, closed, bought lately?
46.
Example Multi-modal Inputs
Overlap in restaurant visits is useful
Big spender cues
Cuisine as an indicator
Review text as an indicator
47.
Too Limited
People do more than one kind of thing
Different kinds of behaviors give different quality, quantity and kind of information
We don’t have to do co-occurrence
We can do cross-occurrence
Result is cross-recommendation
48.
Heh?
49.
For example
Users enter queries (A)
  – (actor = user, item = query)
Users view videos (B)
  – (actor = user, item = video)
$A^\top A$ gives query recommendation
  – “did you mean to ask for”
$B^\top B$ gives video recommendation
  – “you might like these videos”
50.
The punch-line
$B^\top A$ recommends videos in response to a query
  – (isn’t that a search engine?)
  – (not quite, it doesn’t look at content or meta-data)
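Cross-occurrence is the same product as before, but between two different behavior matrices that share the same users. The sketch below builds $B^\top A$ (videos × queries) from hypothetical toy data and ranks videos for a query by raw counts. The deck’s own approach keeps only anomalous cross-occurrences; that filtering step is omitted here for brevity, and all names and data are illustrative.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


# Hypothetical toy data for three users.
queries = ["paco de lucia", "flamenco"]
videos = ["v1", "v2", "v3"]
A = [[1, 0],     # users x queries: user 1 searched "paco de lucia"
     [1, 1],     # user 2 searched both
     [0, 1]]     # user 3 searched "flamenco"
B = [[1, 1, 0],  # users x videos: user 1 watched v1 and v2
     [0, 1, 0],  # user 2 watched v2
     [0, 0, 1]]  # user 3 watched v3

BtA = matmul(transpose(B), A)  # videos x queries cross-occurrence counts


def videos_for_query(query):
    """Rank videos by raw cross-occurrence with one query (no anomaly filtering)."""
    q = queries.index(query)
    scored = [(videos[v], row[q]) for v, row in enumerate(BtA) if row[q] > 0]
    return sorted(scored, key=lambda vs: (-vs[1], vs[0]))


print(videos_for_query("paco de lucia"))  # [('v2', 2), ('v1', 1)]
```

No video metadata is consulted: v2 ranks first only because the users who issued the query watched it.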
51.
Real-life example
Query: “Paco de Lucia”
Conventional meta-data search results:
  – “hombres del paco” times 400
  – not much else
Recommendation-based search:
  – Flamenco guitar and dancers
  – Spanish and classical guitar
  – Van Halen doing a classical/flamenco riff
52.
Real-life example
53.
Hypothetical Example
Want a navigational ontology?
Just put labels on a web page with traffic
  – This gives A = users x label clicks
Remember viewing history
  – This gives B = users x items
Cross recommend
  – $B^\top A$ = label to item mapping
After several users click, results are whatever users think they should be
54.
55.
Nice. But we can do better?
56.
[Figure: history matrix $A$, rows = users, columns = things]
57.
[Figure: block matrix $\begin{bmatrix} A_1 & A_2 \end{bmatrix}$, rows = users, columns of $A_1$ = thing type 1, columns of $A_2$ = thing type 2]
58.
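$$\begin{bmatrix} A_1 & A_2 \end{bmatrix}^\top \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^\top \\ A_2^\top \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^\top A_1 & A_1^\top A_2 \\ A_2^\top A_1 & A_2^\top A_2 \end{bmatrix}$$
$$\begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} A_1^\top A_1 & A_1^\top A_2 \\ A_2^\top A_1 & A_2^\top A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$
$$r_1 = \begin{bmatrix} A_1^\top A_1 & A_1^\top A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$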
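The last line says that recommendations for thing type 1 combine the ordinary cooccurrence term $A_1^\top A_1 h_1$ with the cross-occurrence term $A_1^\top A_2 h_2$. The sketch below evaluates $r_1$ that way on hypothetical toy matrices and cross-checks it against the top block of the full product with the side-by-side matrix $[A_1\ A_2]$.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


# Hypothetical toy data: 3 users, 2 things of type 1, 3 things of type 2.
A1 = [[1, 0], [1, 1], [0, 1]]
A2 = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
h1 = [1, 0]      # current user's type-1 history
h2 = [0, 1, 0]   # current user's type-2 history

A1t = transpose(A1)
# r1 = A1^T A1 h1 + A1^T A2 h2: type-1 recommendations from both kinds of history
r1 = [x + y for x, y in zip(matvec(matmul(A1t, A1), h1), matvec(matmul(A1t, A2), h2))]

# Cross-check against the full block product [A1 A2]^T [A1 A2] [h1; h2].
A = [row1 + row2 for row1, row2 in zip(A1, A2)]
full = matvec(matmul(transpose(A), A), h1 + h2)
print(r1, full[:len(h1)])  # [3, 3] [3, 3]
```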
59.
Summary
Input: Multiple kinds of behavior on one set of things
Output: Recommendations for one kind of behavior with a different set of things
Cross recommendation is a special case
60.
Now again, without the scary math
61.
Input Data
User transactions
  – user id, merchant id
  – SIC code, amount
  – Descriptions, cuisine, …
Offer transactions
  – user id, offer id
  – vendor id, merchant id’s
  – offers, views, accepts
62.
Input Data
User transactions
  – user id, merchant id
  – SIC code, amount
  – Descriptions, cuisine, …
Offer transactions
  – user id, offer id
  – vendor id, merchant id’s
  – offers, views, accepts
Derived user data
  – merchant id’s
  – anomalous descriptor terms
  – offer & vendor id’s
Derived merchant data
  – local top40
  – SIC code
  – vendor code
  – amount distribution
63.
Cross-recommendation
Per merchant indicators
  – merchant id’s
  – chain id’s
  – SIC codes
  – indicator terms from text
  – offer vendor id’s
Computed by finding anomalous (indicator => merchant) rates
64.
How can we deploy this?
65.
Search-based Recommendations
Sample document
  – Merchant Id
  – Field for text description
  – Phone
  – Address
  – Location
66.
Search-based Recommendations
Sample document
  – Merchant Id
  – Field for text description
  – Phone
  – Address
  – Location
  – Indicator merchant id’s
  – Indicator industry (SIC) id’s
  – Indicator offers
  – Indicator text
  – Local top40
67.
Search-based Recommendations
Sample document
  – Merchant Id
  – Field for text description
  – Phone
  – Address
  – Location
  – Indicator merchant id’s
  – Indicator industry (SIC) id’s
  – Indicator offers
  – Indicator text
  – Local top40
Sample query
  – Current location
  – Recent merchant descriptions
  – Recent merchant id’s
  – Recent SIC codes
  – Recent accepted offers
  – Local top40
68.
Search-based Recommendations
Sample document
  Original data and meta-data:
  – Merchant Id
  – Field for text description
  – Phone
  – Address
  – Location
  Derived from cooccurrence and cross-occurrence analysis:
  – Indicator merchant id’s
  – Indicator industry (SIC) id’s
  – Indicator offers
  – Indicator text
  – Local top40
Sample query (recommendation query)
  – Current location
  – Recent merchant descriptions
  – Recent merchant id’s
  – Recent SIC codes
  – Recent accepted offers
  – Local top40
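To make the query side concrete, here is a toy stand-in for the search engine. It assumes documents store indicator ids in set-valued fields and scores a document by how many query terms it matches across fields. This plain overlap count is a simplification chosen for illustration, not Solr’s relevance ranking. The merchant ids, SIC codes, field names and the `search` function are all hypothetical.

```python
# Hypothetical merchant documents: only the indicator fields are shown.
docs = {
    "m1": {"indicator_merchants": {"m7", "m9"}, "indicator_sic": {"5812"}},
    "m2": {"indicator_merchants": {"m9"}, "indicator_sic": {"5411"}},
    "m3": {"indicator_merchants": {"m4"}, "indicator_sic": {"5812"}},
}

# Hypothetical query built from one user's recent history.
query = {"indicator_merchants": {"m9", "m4"}, "indicator_sic": {"5812"}}


def search(docs, query, exclude=()):
    """Rank documents by the number of query terms they match, summed over fields."""
    results = []
    for doc_id, fields in docs.items():
        if doc_id in exclude:
            continue
        score = sum(len(fields.get(field, set()) & terms) for field, terms in query.items())
        if score > 0:
            results.append((doc_id, score))
    return sorted(results, key=lambda ds: (-ds[1], ds[0]))


print(search(docs, query))  # [('m1', 2), ('m3', 2), ('m2', 1)]
```

The `exclude` argument is one assumed way to drop merchants the user already visits. The key point carries over to a real deployment: the user’s recent history is the query, and the indicator fields computed offline are what it matches against.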
69.
Analyze with Map-Reduce
[Diagram: complete history; Cooccurrence (Mahout); item meta-data; Solr indexing; SolR indexers; index shards]
70.
Deploy with Conventional Search System
[Diagram: user history; web tier; Solr search; SolR indexers; item meta-data; index shards]
71.
Objective Results
At a very large credit card company
History is all transactions
Development time to minimum viable product: about 4 months
General release 2-3 months later
Search-based recs at least equal in quality to other techniques
72.
Contact:
  – tdunning@maprtech.com
  – @ted_dunning
  – @apachemahout
  – user-subscribe@mahout.apache.org
Slides and such: http://www.slideshare.net/tdunning
Hash tags: #mapr #apachemahout #recommendations