Experimentation to Productization: Building a Dynamic Bidding System for a Location-aware Ecosystem. Slides from my Fifth Elephant talk, Bangalore, 2014
Building a Dynamic Bidding system for a location based Display advertising Platform
1. Experimentation to Productization of a Location-based Dynamic Bidding System
Ekta Grover
Data Scientist, AdNear
26th July, 2014
@ektagrover(Twitter)/ekta1007@gmail.com The Fifth Elephant, 2014 1/22
2. Structure of this Talk
Introduction to a Real Time Bidder (RTB)
System design to deliver performance at scale
Three specific Data products that we built
Building a low latency self learning system
3. The Data we get our hands on
5. What happens when a mobile user logs in
6. System design
[Diagram: system design, drawn as a flow with a feedback loop. Components as labelled:]
Data products & Experimentation: Simulation, A/B testing framework, Reporting
Bidder
Live feeds: Bidder gets all attributes it needs
Experiments run live
(Multiple) Kafka consumers
Dump raw JSON logs via consumers
Parse JSON logs & dump to Spark
Spark: in-memory processing of logs in δt
Update snapshots in Redis to consume
Assess business risk: target & control groups
Feedback loop
Online experimentation at Microsoft - Kohavi, Crook, Longbotham (2009)
32. What signals can we extract from a weblog (and more...)
33. Problem # 1: Dynamic bidding system
Guiding principle
The price we bid should reflect the quality of the inventory
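$$
\text{prob}_{\text{click}} = \mathrm{function}\big(
\text{Engagement}_{\text{app},\,\text{appcategory}},\;
\text{SessionContext}_{\text{depth},\,\text{length}},\;
\text{Engagement}_{\text{creative attributes}},\;
\text{Engagement}_{\text{vertical}},\;
\text{Engagement}_{\text{user profile},\,\text{collaborative profile}},\;
\text{Engagement}_{\text{handset attributes}},\;
\text{Time}_{\text{day},\,\text{week},\,\text{seasonality}}
\big)
$$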
34. Problem #1: And hence the price should reflect this quality
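$$
\text{price} \mid \text{prob}_{\text{click}} =
\text{Constant}_1 \cdot \mathbb{1}(0 \le \text{prob}_{\text{click}} \le \text{threshold}_1)
+ \text{Constant}_2 \cdot \mathbb{1}(\text{threshold}_1 \le \text{prob}_{\text{click}} \le \text{threshold}_2)
+ \text{Constant}_3 \cdot \mathbb{1}(\text{threshold}_2 \le \text{prob}_{\text{click}} \le \text{threshold}_3)
$$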
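The tiered pricing rule can be sketched as a simple lookup. This is a minimal illustration, not the production bidder: the threshold and constant values are made up, a probability exactly on a boundary is assigned to the lower tier (the slide's formula leaves boundaries ambiguous), and a probability above the last threshold returns 0, which is what the formula yields when no tier matches.

```python
def tiered_bid_price(prob_click, thresholds, constants):
    """Return the constant price of the first tier whose upper bound covers prob_click."""
    if len(thresholds) != len(constants):
        raise ValueError("need exactly one constant per threshold")
    if prob_click < 0:
        raise ValueError("prob_click must be non-negative")
    for upper, price in zip(thresholds, constants):
        if prob_click <= upper:
            return price
    return 0.0  # no tier matched: every indicator in the formula is zero


# Hypothetical tiers: (threshold1, threshold2, threshold3) and their constants.
THRESHOLDS = (0.01, 0.05, 1.0)
CONSTANTS = (0.10, 0.40, 1.20)

print(tiered_bid_price(0.03, THRESHOLDS, CONSTANTS))  # 0.4
```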
Modelled as a logistic regression with L1 regularization [1], with bagging
Converges & scales faster for large datasets: use the start_params from the last optimization call. Better fit & AUC
[1] Bid optimizing and inventory scoring in targeted online advertising - Perlich, Dalessandro, Hook, Stitelman, Raeder, Provost (2012)
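To make the warm-start idea concrete, here is a small pure-Python sketch of L1-regularized logistic regression fitted by proximal gradient descent, with bagging over bootstrap resamples. It is not the talk's implementation: the optimizer, the hyperparameter values and the function names are assumptions, and only the `start_params` argument name is taken from the slide. Each bag starts from the coefficients of the previous fit, which is one way to read "use the start_params from the last optimization call".

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(w, t):
    # Proximal operator of t * |w|: shrink w towards zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_l1_logistic(X, y, alpha=0.01, lr=0.5, n_iter=200, start_params=None):
    """Minimise mean log-loss + alpha * ||w||_1; start_params warm-starts the fit."""
    n, d = len(X), len(X[0])
    w = list(start_params) if start_params is not None else [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad[j] += err * xi[j] / n
        # Gradient step on the log-loss, then shrinkage for the L1 penalty.
        w = [soft_threshold(wj - lr * gj, lr * alpha) for wj, gj in zip(w, grad)]
    return w


def bagged_fit(X, y, n_bags=5, seed=0, **fit_kwargs):
    """Fit on bootstrap resamples, warm-starting each fit from the last, then average."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    fits, start = [], None
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        start = fit_l1_logistic([X[i] for i in idx], [y[i] for i in idx],
                                start_params=start, **fit_kwargs)
        fits.append(start)
    return [sum(f[j] for f in fits) / n_bags for j in range(d)]
```

For simplicity the sketch penalises every coefficient, including any constant column used as an intercept; a real model would usually leave the intercept unpenalised.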
36. Problem #2: Setting the context
Chih Lee, Jalali, Dasdan: Real time bid optimization with smooth budget delivery in online advertising (ADKDD, 2013)
37. Problem #2: Comprehensive Mobile App-Ranking system
Primary goal:
Capture the cream of the apps, at the right time, for the right price
Conventional approach: scrape the number of downloads, app category, in-app purchases, trends. Lots of noise!
Key observations: apps show peculiarities with respect to time of day, win rate & demand signals
Combat the Winner's curse: uncover the right spread for the price to bid, so that the bid reflects the probability of a click, at the right price
The Sweet Spot: tames the market in the long run by shaving off the bid price & helps in price discovery
38. Problem #2: How we approached this problem
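Guiding principle:
An app that "delivers" should be in high demand, and hence "should" show up with a low win rate in the live feeds
Stage 1:
H0: CTR depends on Winrate, BidFloor, PriceSpread, Density, Category
$$
\begin{aligned}
\text{CTR}_{\text{appid}_i,\,\text{time}_t} &= \mathrm{function}\big(
\text{Winrate}_{\text{appid}_i,\,\text{time}_t},\;
\text{Bidfloor}_{\text{appid}_i,\,\text{time}_t},\;
\text{PriceSpread}_{\text{appid}_i,\,\text{time}_t},\;
\text{density}_{\text{appid}_i,\,\text{time}_t},\;
\text{Category}_{\text{appid}_i,\,\text{time}_t}\big) \\
\delta_{\text{appid}_i,\,\text{time}_{t+1}} &= \mathrm{function}\big(
\text{Winrate}_{\text{appid}_i,\,\text{time}_t},\;
\text{Bidfloor}_{\text{appid}_i,\,\text{time}_t},\;
\text{CTR}_{\text{appid}_i,\,\text{time}_t},\;
\text{density}_{\text{appid}_i,\,\text{time}_t},\;
\text{Category}_{\text{appid}_i,\,\text{time}_t}\big) \\
\text{BidPrice}_{\text{appid}_i,\,\text{time}_{t+1}} &= \delta_{\text{appid}_i,\,\text{time}_t} + \text{Winprice}_{\text{appid}_i,\,\text{time}_t}
\end{aligned}
$$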
39. Problem #2 : But Mobile Apps have their own nuances
The "outcome" class is yet to come
A low win rate could also be because of an expectation of high CTR
Stage 1 is tightly coupled with campaign budgets
Over-penalizes a rock star app with too few moments of truth in the last snapshot
Other idiosyncrasies, broken input pipes, incoherent data
40. Problem #2 cont..
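Stage 2: Decay factor of 80% every hour
This means that the signal is only 26.2% informative after 6 time periods (0.80^6 ≈ 0.262)
$$
\text{CTR}_{\text{appid}_i,\,\text{time}_t} = \mathrm{function}\big(
(0.80)^1 \cdot \text{CTR}_{\text{appid}_i,\,\text{time}_{t-1}},\;
(0.80)^2 \cdot \text{CTR}_{\text{appid}_i,\,\text{time}_{t-2}},\;
(0.80)^3 \cdot \text{CTR}_{\text{appid}_i,\,\text{time}_{t-3}},\;
(0.80)^4 \cdot \text{CTR}_{\text{appid}_i,\,\text{time}_{t-4}},\;
(0.80)^5 \cdot \text{CTR}_{\text{appid}_i,\,\text{time}_{t-5}}\big)
$$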
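The Stage 2 weighting is easy to check numerically. The sketch below computes the decay weight for a given lag and builds the five decayed CTR inputs from a history list; the function names and the oldest-to-newest ordering of the history are assumptions made for illustration.

```python
DECAY = 0.80  # hourly decay factor from the slide


def decay_weight(periods_ago, decay=DECAY):
    """Weight applied to a CTR observation that is `periods_ago` periods old."""
    return decay ** periods_ago


def decayed_ctr_features(ctr_history, n_lags=5, decay=DECAY):
    """ctr_history is ordered oldest -> newest and ends at time t-1.

    Returns [0.8^1 * CTR_{t-1}, 0.8^2 * CTR_{t-2}, ...], up to n_lags entries.
    """
    recent_first = list(reversed(ctr_history))[:n_lags]
    return [decay_weight(k, decay) * ctr
            for k, ctr in enumerate(recent_first, start=1)]


print(round(decay_weight(6), 3))  # 0.262, i.e. 26.2% after 6 periods
```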
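Stage 3: Different time periods decay differently, for each appid_i
$$
\text{CTR}_{\text{appid}_i,\,\text{time}_t} = \mathrm{function}\big(
\text{CTR}_{\text{appid}_i,\,\text{time}_{t-1}},\;
\text{CTR}_{\text{appid}_i,\,\text{time}_{t-2}},\;
\text{CTR}_{\text{appid}_i,\,\text{time}_{t-3}},\;
\text{CTR}_{\text{appid}_i,\,\text{time}_{t-4}},\;
\text{Winrate}_{\text{appid}_i,\,\text{time}_t},\;
\text{Winrate}_{\text{appid}_i,\,\text{time}_{t-1}},\;
\text{Winrate}_{\text{appid}_i,\,\text{time}_{t-2}},\;
\text{Winrate}_{\text{appid}_i,\,\text{time}_{t-3}},\;
\text{Winrate}_{\text{appid}_i,\,\text{time}_{t-4}},\;
\text{Winrate}_{\text{appid}_i,\,\text{time}_{t-5}},\;
\text{category}_{\text{appid}_i},\;
\text{density}_{\text{appid}_i,\,\text{time}_t},\;
\dots\big)
$$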
Trade-off: dampens the CTR signal, while cushioning for system
failures, broken pipes & outliers
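In Stage 3 the lags enter the model unweighted, so the fitted coefficients decide how quickly each period decays. A minimal sketch of assembling that feature row for one app is shown below; the per-period lists, the feature key names and the omission of category and density are assumptions made to keep the example short.

```python
def lag_features(ctr, winrate, t, ctr_lags=4, winrate_lags=5):
    """Build the Stage 3 inputs for time index t from per-period lists.

    ctr[k] and winrate[k] hold the values observed in period k for one app.
    """
    if t < max(ctr_lags, winrate_lags) or t >= len(winrate) or t > len(ctr):
        raise ValueError("not enough history for time index t")
    # CTR lags t-1 .. t-4 (the CTR at t itself is the prediction target).
    features = {f"ctr_t-{k}": ctr[t - k] for k in range(1, ctr_lags + 1)}
    # Win rate at t plus lags t-1 .. t-5.
    features["winrate_t"] = winrate[t]
    features.update(
        {f"winrate_t-{k}": winrate[t - k] for k in range(1, winrate_lags + 1)}
    )
    return features
```

One row per app and period can then be fed to whichever regression is used, instead of fixing the 0.80 weights by hand as in Stage 2.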
41. Problem #2 : But still has another limitation
Chicken & egg problem!
We need sufficient mass for each mobile application. Enter pooled learning algorithms: a hybrid of fuzzy matching and Levenshtein distance
The distance metric helps map the performing & non-performing apps from multiple exchanges, which means we have larger "support"
And we create data points that are better than a blind/naive bidding strategy
Can we reduce the candidate set? Lots of bookkeeping to maintain the appids across exchanges
What we actually implemented: a hybrid approach of all these models, iterated multiple times over!
Levenshtein with C bindings & pandas itertuples
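As an illustration of the matching step, the sketch below implements Levenshtein distance in pure Python and uses a normalised similarity to pair app names across two exchanges. The talk used C bindings for speed; this version, the normalisation, the 0.8 cut-off and the sample names are assumptions for readability, not the production code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via two-row DP."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical names (ignoring case and outer spaces), towards 0.0 otherwise."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def match_apps(names_a, names_b, min_similarity=0.8):
    """Map each name in names_a to its closest name in names_b, if close enough."""
    matches = {}
    for name in names_a:
        best = max(names_b, key=lambda other: similarity(name, other), default=None)
        if best is not None and similarity(name, best) >= min_similarity:
            matches[name] = best
    return matches


print(levenshtein("kitten", "sitting"))  # 3
```

Comparing every name against every other is quadratic in the number of apps, which is where the slide's question about reducing the candidate set comes in.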
42. Problem #3: User-mobility patterns to generate user profiles
Guiding principle
Generate a probabilistic picture of activity patterns & affinity towards activities
Data nodes: Users, Categories of places checked in, Categories of Apps
Represent this as a bipartite graph; then we just need to get the top-k, or activate k segments over a certain critical mass
Can we do better?
Need to get this for each lat-long only once: Memoization & Bookkeeping
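A toy version of this profile step is sketched below: each lat-long is resolved to a place category once (memoised), a user's check-ins become normalised edge weights, and segments are activated by top-k with a critical-mass cut-off. The lookup table, the grid resolution of two decimal places and all names are hypothetical.

```python
from functools import lru_cache

# Hypothetical lookup: (lat, long) grid cell at 2 decimal places -> place category.
PLACE_CATEGORY = {(1297, 7759): "Food", (1293, 7762): "Movies"}


@lru_cache(maxsize=None)
def category_for(lat, lon):
    """Resolve a lat-long to a place category; repeat calls are served from cache."""
    return PLACE_CATEGORY.get((round(lat * 100), round(lon * 100)), "Unknown")


def affinities(checkins):
    """Turn one user's check-ins into user -> category edge weights summing to 1."""
    counts = {}
    for lat, lon in checkins:
        cat = category_for(lat, lon)
        counts[cat] = counts.get(cat, 0) + 1
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


def top_k_segments(weights, k=2, critical_mass=0.0):
    """Keep at most k categories whose weight reaches the critical mass."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [cat for cat, w in ranked if w >= critical_mass][:k]


user_checkins = [(12.971, 77.594)] * 3 + [(12.934, 77.621)]
profile = affinities(user_checkins)  # {"Food": 0.75, "Movies": 0.25}
print(top_k_segments(profile, k=2, critical_mass=0.3))  # ['Food']
```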
43. Problem #3: Representation
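[Figure: bipartite graph. Users (User 1, User 2, User 3, User 4) are linked to App category nodes (Category6, Category7, Category8, Category9) and to Checked in place categories (Movies, Entertainment, Arts, Workplace, Food). Edge weights shown: 0.0568, 0.0043, 0.0029, 0.0091, 0.0033, 0.0903, 0.0903, 0.0953, 0.456, 0.0667, 0.0867]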
44. Problem #3: Why Graphs?
Tractability of the problem
Interesting properties: for v ∈ V, deg+(v), deg−(v), sinks, joining communities
Abstraction & reusability: multiple ways of measuring similarity of Apps, Users, Places
Behavior dilution & ghost clicks
Better hypotheses: maturing your data products
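The degree properties listed above fall out of a single pass over the edge list. The sketch below treats user-to-category links as directed edges and reports out-degree, in-degree and sinks (nodes with no outgoing edges); the edge direction and the sample edges are assumptions for illustration.

```python
from collections import defaultdict


def degree_summary(edges):
    """edges: iterable of (source, target) pairs of a directed graph.

    Returns (out-degree per node, in-degree per node, sorted list of sinks).
    """
    out_deg, in_deg, nodes = defaultdict(int), defaultdict(int), set()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))
    # A sink receives edges but emits none.
    sinks = sorted(v for v in nodes if out_deg.get(v, 0) == 0)
    return dict(out_deg), dict(in_deg), sinks


edges = [("User 1", "Food"), ("User 2", "Food"), ("User 1", "Movies")]
out_deg, in_deg, sinks = degree_summary(edges)
# out_deg: User 1 -> 2, User 2 -> 1; in_deg: Food -> 2, Movies -> 1
# sinks: Food, Movies (categories only receive edges here)
```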
45. What we learnt: Build a self-learning, assisted-healing system
Own the statistics
Cover the baseline: universal sink
Log everything
Forward lookup: abstract the error & try-catch
Proximity to the data producer
Reuse data prep cycles; better still, productize them
Loosely couple each system: fail fast & forward
Feature engineering & metadata engine: it's more than just YOUR data