SlideShare a Scribd company logo
1 of 18
Download to read offline
Modeling taste with Cassandra




Affinity is based on user tastes, preferences, and interests

                                                               1
What is a taste profile?

               Operational definition: the set of things you like and dislike

Stuff I like                                   Stuff I don’t like




     Challenge: how do you build a set of things you like and dislike
           Operational definition: the taste profile for someone?               2
Thesis: Likes are correlated
Inferring correlations
        D               1)   User A:
                              •   Democrat
                              •   Likes Arugula
                        2)   User B:
                    C
                              •   Republican
    E
            ?                 •   Dislikes Arugula
                        3)   User C indicates:
                              •   Democrat

                        What would we infer is User C’s affinity for
                        Arugula?

A
                        Answer: User C would like Arugula
                B




                                                                       4
Inferring correlations

               Like arugula
                                      User A


                                      <3, 2.5>

                     <1,1>
Dislike                           Like
Obama                             Obama
          User B


      <-2,-1.5>


    <-3,-3>
              Dislike arugula

                              User C           If someone’s affinity
                                               for Obama is 2.0,
                              <2,?>
                                               what is their affinity
                                               for arugula?

                                                                        5
Discovering latent factors
                                                            Obama
                                                                             Liberal
                                                     Arugula        <5, 5>
                                Like arugula
                                                                <4, 4>
                                                   User A

                                                     <3, 2>

                                      <1,1>
                 Dislike                            Like
                 Obama                              Obama
                           User B


                        <-2,-1.5>

            Iceberg
                       <-3,-3>
               <-4, -4>        Dislike arugula
    GOP


 <-5, -5>                                      User C         Predict 1.5 for how
                                                              much this person will
                                               <2,1.5>
Conservative                                                  like arugula.


                                                                                       6
Taste space = many latent factors

                                      <0.7, 4.4, -.1>
                        Liberal

                                  <0.5, 2.4, -.4>
                                     A
                                     Extroverted


Masculine                                          Feminine



                        <-0.5, -3.1, 0.1>
        Introverted
                       B
                      Conservative




                                                              7
What is a taste profile profile?

                 Operational definition: a coordinate in taste space

Stuff I like (close to me in taste space)   Stuff I don’t like (far away in taste space)




           Operational definition: the set of things you like and dislike
        Challenge: how do you calculate taste coordinates?                            8
Calculating taste coordinates
                       D                     Edge weight = dot product of nodes
? <x, y>
                                             to constrain similar items to be
                   2            <1, -1>
                                             close to each other.
                                     C       Assume edge weights of:
               E                                +2 = “love”
                                                -2 = “hate”
      2    <1, -0.5>
                                             Democratic node must solve:
                                               1*x -2*y = 2 (edge from A)
           2
                           -2                  1*x -1*y = 2 (edge from C)
      A
                                             Solution = <2, 0>
 <1, -2>                         B
                           <-1, 2>




                                                                             9
Updating taste coordinates

          User A purchases a camera...

<1, -1>
                                                          <1, -0.5>
                          2         <1, -1>
                                                                                      2         <1, -1>
                                         C
                                                                                                     C
                                              <-1, 0.5>
                                                                                                          <-1, 0.5>
                  <1, -0.5>
                                                                      2       <1, -0.5>

              2
                               -2                                             2
                                              2                                            -2
          A                                                                                               2
                                                                      A
   <1, -2>                           B
                                                               <0.75, -2.5>                      B
                              <-1, 2>
                                                                                          <-1, 2>

          Resulting in blue coordinates changing.
v1 System overview - Model updates

                                     1) Receive event
  Rec.              Updater          (eg, Purchase)
  Engine



    3) Write user             2a) Write Purchase edge
    and item                  2b) Read other edges
    coordinates               for this user and item




Reco. DB            Taste graph
User -> coord
Item -> coord
v1 System overview - Rec serving

                     1) Page load        Rec.          Updater
                     requests            Engine
                     recommendations


                               2) Rec. engine
                               finds other
                               cameras close
                               to user’s
3) Recommendations             coordinates
shown to user


                                       Reco. DB        Taste graph
                                       User -> coord
                                       Item -> coord
v1 Taste Graph data size


40 billion edges
2 billion item nodes
200 million user nodes

5TB of data, takes up 10TB with Replication Factor of 2

We expect this to quadruple next year as we get more events and add
  new types of edges




                                                                      13
v1 Taste Graph DB configuration


32 Linux machines
  128GB RAM
  1TB iSCSI SSD
  10 GigE NIC


Cassandra version 1.0.8

8GB JVM heap space

Size-tiered compaction strategy
v1 Taste Graph schema

User Edges
              (timestamp, edge_type, item_id)   …
   user_id               <empty>
Item Edges
              (timestamp, edge_type, user_id)   …
    item_id              <empty>
User Nodes
                   tastevector
    user_id   200 bytes (50 floats)
Item Nodes
                   tastevector
    item_id   200 bytes (50 floats)
v1 Real-time taste updates

Edges and nodes read per second
v1 Real-time taste updates

Edges and nodes written per second
Questions?


tp@hunch.com




                        18

More Related Content

What's hot

Redis Streams for Event-Driven Microservices
Redis Streams for Event-Driven MicroservicesRedis Streams for Event-Driven Microservices
Redis Streams for Event-Driven MicroservicesRedis Labs
 
Machine Learning for retail and ecommerce
Machine Learning for retail and ecommerceMachine Learning for retail and ecommerce
Machine Learning for retail and ecommerceAndrei Lopatenko
 
Anatomy of an eCommerce Search Engine by Mayur Datar
Anatomy of an eCommerce Search Engine by Mayur DatarAnatomy of an eCommerce Search Engine by Mayur Datar
Anatomy of an eCommerce Search Engine by Mayur DatarNaresh Jain
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futuremarkgrover
 
Deep neural networks for Youtube recommendations
Deep neural networks for Youtube recommendationsDeep neural networks for Youtube recommendations
Deep neural networks for Youtube recommendationsAryan Khandal
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?lucenerevolution
 
Machine Learning at Netflix Scale
Machine Learning at Netflix ScaleMachine Learning at Netflix Scale
Machine Learning at Netflix ScaleAish Fenton
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System ExplainedCrossing Minds
 
Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence Shrutika Oswal
 
Recommendation Systems Basics
Recommendation Systems BasicsRecommendation Systems Basics
Recommendation Systems BasicsJarin Tasnim Khan
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBMike Dirolf
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureDan McKinley
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorialAlexandros Karatzoglou
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemMilind Gokhale
 
Deep neural network for youtube recommendations
Deep neural network for youtube recommendationsDeep neural network for youtube recommendations
Deep neural network for youtube recommendationsKan-Han (John) Lu
 
Collaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CFCollaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CFYusuke Yamamoto
 

What's hot (20)

Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Redis Streams for Event-Driven Microservices
Redis Streams for Event-Driven MicroservicesRedis Streams for Event-Driven Microservices
Redis Streams for Event-Driven Microservices
 
Machine Learning for retail and ecommerce
Machine Learning for retail and ecommerceMachine Learning for retail and ecommerce
Machine Learning for retail and ecommerce
 
Anatomy of an eCommerce Search Engine by Mayur Datar
Anatomy of an eCommerce Search Engine by Mayur DatarAnatomy of an eCommerce Search Engine by Mayur Datar
Anatomy of an eCommerce Search Engine by Mayur Datar
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
 
Deep neural networks for Youtube recommendations
Deep neural networks for Youtube recommendationsDeep neural networks for Youtube recommendations
Deep neural networks for Youtube recommendations
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
 
Machine Learning at Netflix Scale
Machine Learning at Netflix ScaleMachine Learning at Netflix Scale
Machine Learning at Netflix Scale
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System Explained
 
Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence
 
gRPC
gRPCgRPC
gRPC
 
Recommendation Systems Basics
Recommendation Systems BasicsRecommendation Systems Basics
Recommendation Systems Basics
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
LMAX Architecture
LMAX ArchitectureLMAX Architecture
LMAX Architecture
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 
Deep neural network for youtube recommendations
Deep neural network for youtube recommendationsDeep neural network for youtube recommendations
Deep neural network for youtube recommendations
 
Collaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CFCollaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CF
 

Viewers also liked

Neo4j - graph database for recommendations
Neo4j - graph database for recommendationsNeo4j - graph database for recommendations
Neo4j - graph database for recommendationsproksik
 
NoSQL: Why, When, and How
NoSQL: Why, When, and HowNoSQL: Why, When, and How
NoSQL: Why, When, and HowBigBlueHat
 
An Introduction to Graph Databases
An Introduction to Graph DatabasesAn Introduction to Graph Databases
An Introduction to Graph DatabasesInfiniteGraph
 
Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Neo4j
 
Converting Relational to Graph Databases
Converting Relational to Graph DatabasesConverting Relational to Graph Databases
Converting Relational to Graph DatabasesAntonio Maccioni
 
Graph Database, a little connected tour - Castano
Graph Database, a little connected tour - CastanoGraph Database, a little connected tour - Castano
Graph Database, a little connected tour - CastanoCodemotion
 
Relational to Graph - Import
Relational to Graph - ImportRelational to Graph - Import
Relational to Graph - ImportNeo4j
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 
Semantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational DatabasesSemantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational DatabasesCambridge Semantics
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jDebanjan Mahata
 
Introduction to graph databases GraphDays
Introduction to graph databases  GraphDaysIntroduction to graph databases  GraphDays
Introduction to graph databases GraphDaysNeo4j
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph DatabasesMax De Marzi
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4jNeo4j
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisDataminingTools Inc
 

Viewers also liked (17)

Neo4j - graph database for recommendations
Neo4j - graph database for recommendationsNeo4j - graph database for recommendations
Neo4j - graph database for recommendations
 
NoSQL: Why, When, and How
NoSQL: Why, When, and HowNoSQL: Why, When, and How
NoSQL: Why, When, and How
 
Lju Lazarevic
Lju LazarevicLju Lazarevic
Lju Lazarevic
 
An Introduction to Graph Databases
An Introduction to Graph DatabasesAn Introduction to Graph Databases
An Introduction to Graph Databases
 
Graph databases
Graph databasesGraph databases
Graph databases
 
Relational vs. Non-Relational
Relational vs. Non-RelationalRelational vs. Non-Relational
Relational vs. Non-Relational
 
Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...
 
Converting Relational to Graph Databases
Converting Relational to Graph DatabasesConverting Relational to Graph Databases
Converting Relational to Graph Databases
 
Graph Database, a little connected tour - Castano
Graph Database, a little connected tour - CastanoGraph Database, a little connected tour - Castano
Graph Database, a little connected tour - Castano
 
Relational to Graph - Import
Relational to Graph - ImportRelational to Graph - Import
Relational to Graph - Import
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Semantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational DatabasesSemantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational Databases
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4j
 
Introduction to graph databases GraphDays
Introduction to graph databases  GraphDaysIntroduction to graph databases  GraphDays
Introduction to graph databases GraphDays
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysis
 

More from DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 

More from DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Graph Based Recommendation Systems at eBay

  • 1. Modeling taste with Cassandra Affinity is based on user tastes, preferences, and interests 1
  • 2. What is a taste profile? Operational definition: the set of things you like and dislike Stuff I like Stuff I don’t like Challenge: how do you build a set of things you like and dislike Operational definition: the taste profile for someone? 2
  • 3. Thesis: Likes are correlated
  • 4. Inferring correlations D 1) User A: • Democrat • Likes Arugula 2) User B: C • Republican E ? • Dislikes Arugula 3) User C indicates: • Democrat What would we infer is User C’s affinity for Arugula? A Answer: User C would like Arugula B 4
  • 5. Inferring correlations Like arugula User A <3, 2.5> <1,1> Dislike Like Obama Obama User B <-2,-1.5> <-3,-3> Dislike arugula User C If someone’s affinity for Obama is 2.0, <2,?> what is their affinity for arugula? 5
  • 6. Discovering latent factors Obama Liberal Arugula <5, 5> Like arugula <4, 4> User A <3, 2> <1,1> Dislike Like Obama Obama User B <-2,-1.5> Iceberg <-3,-3> <-4, -4> Dislike arugula GOP <-5, -5> User C Predict 1.5 for how much this person will <2,1.5> Conservative like arugula. 6
  • 7. Taste space = many latent factors <0.7, 4.4, -.1> Liberal <0.5, 2.4, -.4> A Extroverted Masculine Feminine <-0.5, -3.1, 0.1> Introverted B Conservative 7
  • 8. What is a taste profile profile? Operational definition: a coordinate in taste space Stuff I like (close to me in taste space) Stuff I don’t like (far away in taste space) Operational definition: the set of things you like and dislike Challenge: how do you calculate taste coordinates? 8
  • 9. Calculating taste coordinates D Edge weight = dot product of nodes ? <x, y> to constrain similar items to be 2 <1, -1> close to each other. C Assume edge weights of: E +2 = “love” -2 = “hate” 2 <1, -0.5> Democratic node must solve: 1*x -2*y = 2 (edge from A) 2 -2 1*x -1*y = 2 (edge from C) A Solution = <2, 0> <1, -2> B <-1, 2> 9
  • 10. Updating taste coordinates User A purchases a camera... <1, -1> <1, -0.5> 2 <1, -1> 2 <1, -1> C C <-1, 0.5> <-1, 0.5> <1, -0.5> 2 <1, -0.5> 2 -2 2 2 -2 A 2 A <1, -2> B <0.75, -2.5> B <-1, 2> <-1, 2> Resulting in blue coordinates changing.
  • 11. v1 System overview - Model updates 1) Receive event Rec. Updater (eg, Purchase) Engine 3) Write user 2a) Write Purchase edge and item 2b) Read other edges coordinates for this user and item Reco. DB Taste graph User -> coord Item -> coord
  • 12. v1 System overview - Rec serving 1) Page load Rec. Updater requests Engine recommendations 2) Rec. engine finds other cameras close to user’s 3) Recommendations coordinates shown to user Reco. DB Taste graph User -> coord Item -> coord
  • 13. v1 Taste Graph data size 40 billion edges 2 billion item nodes 200 million user nodes 5TB of data, takes up 10TB with Replication Factor of 2 We expect this to quadruple next year as we get more events and add new types of edges 13
  • 14. v1 Taste Graph DB configuration 32 Linux machines 128GB RAM 1TB iSCSI SSD 10 GigE NIC Cassandra version 1.0.8 8GB JVM heap space Size-tiered compaction strategy
  • 15. v1 Taste Graph schema User Edges (timestamp, edge_type, item_id) … user_id <empty> Item Edges (timestamp, edge_type, user_id) … item_id <empty> User Nodes tastevector user_id 200 bytes (50 floats) Item Nodes tastevector item_id 200 bytes (50 floats)
  • 16. v1 Real-time taste updates Edges and nodes read per second
  • 17. v1 Real-time taste updates Edges and nodes written per second