Likes and Locations: Adventure in Social Data Mining
Gene Chuang, Exec Dir of Social Eng, ATTi
Masahji Stewart, Founder, Synctree
Q CTO Dinner, 4/6/11, Lawry’s, Beverly Hills, CA
Dedication
Background
Social Local Mobile Loco
Why Mine Social and Local Data?
- Signals to improve the user experience
- Timely and “placely” engagement
- Provide value: save time, save money
- Opt-in and privacy
YP.com Infrastructure
- Ruby on Rails for web, login, and API
- Solr/Lucene for search
- Hadoop for the data pipeline
- Hive for ad hoc queries on Hadoop
- Ruby ETL scripts
OAuth 2
OAuth 2 is an open protocol that lets users share private resources (e.g. photos, videos, contact lists) stored on one site with another site without handing out their username and password; instead, they hand out tokens. Think of it as a valet key.
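A minimal sketch of the two app-side steps of the OAuth 2 authorization-code flow (the "valet key" handoff), assuming a generic provider. The endpoint, client ID, redirect URI, and `scope` value are hypothetical placeholders, not YP.com's or Facebook's actual configuration, and the network call to the token endpoint is left out.

```ruby
require "uri"

# Hypothetical provider endpoint and client settings (placeholders only).
AUTHORIZE_URL = "https://provider.example/oauth/authorize"
CLIENT_ID     = "your-client-id"
REDIRECT_URI  = "https://www.example.com/auth/callback"

# Step 1: build the URL the user's browser is sent to for approval.
# `state` is an unguessable value the app stores (e.g. in the session)
# so the callback can be tied back to this request.
def authorization_url(state:, scope: "user_likes")
  query = URI.encode_www_form(
    "response_type" => "code",
    "client_id"     => CLIENT_ID,
    "redirect_uri"  => REDIRECT_URI,
    "scope"         => scope,
    "state"         => state
  )
  "#{AUTHORIZE_URL}?#{query}"
end

# Step 2: the provider redirects back with ?code=...&state=...
# Check the state, then build the form fields the app would POST to the
# provider's token endpoint to trade the one-time code for an access token.
def token_request_params(callback_url, expected_state:, client_secret:)
  params = URI.decode_www_form(URI(callback_url).query.to_s).to_h
  raise ArgumentError, "state mismatch" unless params["state"] == expected_state

  {
    grant_type:    "authorization_code",
    code:          params.fetch("code"),
    redirect_uri:  REDIRECT_URI,
    client_id:     CLIENT_ID,
    client_secret: client_secret
  }
end
```

The access token returned from that POST is the valet key: the app stores it and uses it for later API calls instead of the user's password. In practice the `state` value might come from something like `SecureRandom.hex(16)`.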
YP.com Login/Registration
Login Layer A
OAuth 2 Dance
Semi-Social Search
Social Mining - Extract
Extract scripts pull data out of a database (such as Oracle), Hive, files, Facebook, or any other source and write JSON to STDOUT. For example, to get the number of users who signed up each day, along with the running total:

    $ RAILS_ENV=production sdm extract total-users-by-day 2011-02-14
    {"day":"2011-02-14","count":891,"total":1328636}
    {"day":"2011-02-15","count":1088,"total":1329724}
    {"day":"2011-02-16","count":1016,"total":1330740}
    {"day":"2011-02-17","count":1359,"total":1332099}
    {"day":"2011-02-18","count":1143,"total":1333242}
    {"day":"2011-02-19","count":660,"total":1333902}
    {"day":"2011-02-20","count":597,"total":1334499}
    {"day":"2011-02-21","count":874,"total":1335373}
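In the output above, each `total` equals the previous day's total plus that day's `count` (1328636 + 1088 = 1329724). A minimal sketch of the formatting half of such an extract script, assuming the per-day counts have already been queried and the total before the first day is known; the method names are hypothetical, not part of `sdm`.

```ruby
require "json"

# Hypothetical helper: turn per-day signup counts into rows carrying
# the day's count and the cumulative total up to and including that day.
def daily_signup_rows(counts_by_day, starting_total)
  running = starting_total
  counts_by_day.sort.map do |day, count| # ISO dates sort chronologically
    running += count
    { "day" => day, "count" => count, "total" => running }
  end
end

# One compact JSON object per line, the format downstream steps read from STDIN.
def to_json_lines(rows)
  rows.map(&:to_json).join("\n")
end
```

Usage: `puts to_json_lines(daily_signup_rows(counts, 1328636 - 891))`, where `counts` maps day strings to signup counts. Emitting one object per line keeps each step streamable through Unix pipes.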
Social Mining - Transform
Transform scripts read JSON from STDIN and write JSON to STDOUT. For example, to add YPIDs to existing Facebook likes and then keep only the name, phone, location, ID, and YPID-matching fields:

    $ cat data/facebook_likes_2011_01_12.json | sdm transform add-ypid | sdm transform filter-fields name phone location ypid_best_match ypids ypid_match_results id
    {"name":"SnuggleBunnies","location":{"city":"Carlisle","zip":"45005","country":"United States","state":"OH"},"id":"106864249335072","ypid_match_results":[]}
    {"name":"AssociateConstruction","location":{"city":"Franklin","zip":"45005","country":"United States","street":"31 Eagle Court","state":"OH"},"id":"235027821862","ypid_best_match":"6197197","phone":"(937)-746-2932"}
    {"name":"PHBistro","location":{"city":"Franklin","zip":"45005","country":"United States","street":"543 S Main Street","state":"OH"},"id":"261032274490","ypid_best_match":"1120570","phone":"(937)-743-0069"}
    {"name":"Bullwinkle's Top Hat Bistro - Miamisburg, OH","location":{"city":"Miamisburg","zip":"45342-2312","country":"United States","street":"19 North Main St","state":"OH"},"id":"260274607015","ypid_best_match":"12255503","phone":"(937)-859-7677"}
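A filter-fields style transform can be sketched in a few lines. This is a hypothetical stand-in, not the actual `sdm transform filter-fields` code: it keeps only the requested keys of each JSON record and silently skips keys a record lacks, which would explain why `ypids` is requested but absent from the sample output.

```ruby
require "json"

# Keep only the requested top-level keys of one JSON record.
# Hash#slice ignores keys the record does not have.
def filter_fields(json_line, fields)
  JSON.parse(json_line).slice(*fields)
end

# Stream JSON lines from `input` to `output`, skipping blank lines.
def filter_stream(input, output, fields)
  input.each_line do |line|
    next if line.strip.empty?
    output.puts(filter_fields(line, fields).to_json)
  end
end
```

Usage from a wrapper script: `filter_stream($stdin, $stdout, ARGV)`, invoked as the last stage of a pipe like the one above.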
Social Mining - Load
Load scripts read data from STDIN and load it into another system, such as a dashboard. For example, to load total Facebook accounts by day into the web dashboard:

    $ sdm extract total-fb-accounts-by-day 2011-01-10 | sdm load dashboard total_fb_accounts day total
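A sketch of the parsing side of a dashboard loader. It assumes, as a reading of the example command rather than a documented interface, that `total_fb_accounts day total` means metric name, x-axis key, and y-axis key. The `DashboardSeries` struct is hypothetical, and writing the series into the real dashboard store is not shown.

```ruby
require "json"

# Hypothetical container for one dashboard metric's data points.
DashboardSeries = Struct.new(:metric, :points)

# Read JSON lines (from a String or an IO such as $stdin) and collect
# [x, y] pairs for one metric; raises KeyError if a row lacks a key.
def build_series(input, metric, x_key, y_key)
  points = input.each_line.reject { |l| l.strip.empty? }.map do |line|
    row = JSON.parse(line)
    [row.fetch(x_key), row.fetch(y_key)]
  end
  DashboardSeries.new(metric, points)
end
```

Usage: `series = build_series($stdin, "total_fb_accounts", "day", "total")`. Failing loudly on a missing key keeps a malformed upstream step from silently charting gaps.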
Location Real-Time Fuzzy Matcher
FP0 (exact match):
- Append LISTING_NAME + ADDRESS + CITY + PHONE
- Tokenize, normalize, strip punctuation, and stem
- Append tokens
FP3 (fuzzy match):
- Append LISTING_NAME + ADDRESS + CITY + PHONE
- Tokenize, normalize, strip punctuation, and stem
- Remove tokens shorter than 2 characters
- Remove short upper-case tokens (e.g., MD, CPA, DDS)
- Remove short numerical tokens that are not part of the phone number
- Remove stopwords, taken from the 170 most frequent listing_name tokens
- Sort tokens alphabetically
- Append tokens

Example listing: Vijay K. Sammy CPA, LLC, 153 Orchard St, Elmwood Park, NJ 07407, (201) 218-0710

    FP Method   Value
    FP0         vijaiksammicpallc153orchardstelmwoodpark2012180710
    FP3         0710201218elmwoodorchardparksammistvijai
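The FP0 and FP3 steps above can be sketched as follows. The slides do not name the stemmer; the example (vijay to vijai, sammy to sammi) looks like a Porter-style "y to i" rule, so the sketch uses only that one rule as a stand-in. The "short" cut-offs and the stopword list are assumptions, since the real top-170 list is not given.

```ruby
# Hypothetical sketch of the FP0/FP3 listing fingerprints.
Token = Struct.new(:raw, :norm, :phone)

SHORT_UPPER_MAX  = 3 # assumed cut-off for short upper-case tokens (MD, CPA, DDS)
SHORT_NUMBER_MAX = 5 # assumed cut-off for short numeric tokens (street numbers)

# Toy stand-in for a real stemmer: only Porter's "y -> i" rule.
def stem(word)
  word.match?(/[aeiou].*y\z/) ? word.sub(/y\z/, "i") : word
end

# Split each field on punctuation/whitespace; keep the raw form for case
# checks and flag tokens that came from the phone field.
def tokenize(name:, address:, city:, phone:)
  [[name, false], [address, false], [city, false], [phone, true]].flat_map do |text, is_phone|
    text.scan(/[[:alnum:]]+/).map { |raw| Token.new(raw, stem(raw.downcase), is_phone) }
  end
end

# FP0 (exact match): every normalized token, in original order.
def fp0(listing)
  tokenize(**listing).map(&:norm).join
end

# FP3 (fuzzy match): drop noisy tokens, then sort what is left.
def fp3(listing, stopwords: [])
  tokenize(**listing)
    .reject { |t| t.norm.length < 2 }
    .reject { |t| t.raw.match?(/\A[[:upper:]]+\z/) && t.raw.length <= SHORT_UPPER_MAX }
    .reject { |t| !t.phone && t.raw.match?(/\A\d+\z/) && t.raw.length <= SHORT_NUMBER_MAX }
    .reject { |t| stopwords.include?(t.norm) }
    .map(&:norm)
    .sort
    .join
end
```

Called with `{ name: "Vijay K. Sammy CPA, LLC", address: "153 Orchard St", city: "Elmwood Park", phone: "(201) 218-0710" }` and no stopwords, this sketch reproduces both values in the table. Sorting is what makes FP3 tolerant of reordered words, and dropping suffixes like CPA and LLC lets "Vijay Sammy" and "Vijay K. Sammy CPA" collide.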
Social Data
- Valid Facebook access tokens: 14K
- Total unique likes: 300K
- Likes with locations and/or phones: 19%
- Likes mapped to a YPID: 38%
- Total check-ins: 530
Social Mining Mother Lode
- Social Search
- Local Recommendation Engine
- Discovery Wall
- Top 10 List
- Social e-Commerce
- Online Presence Management (Social CRM)
Questions?
genechuang@gmail.com
http://www.twitter.com/genechuang
http://www.quora.com/Gene-Chuang
http://www.linkedin.com/in/genechuang

