This talk focuses on Netflix's transition from Oracle to SimpleDB -- a cloud-hosted, key-value store -- during Netflix's transition to the cloud (i.e. AWS). Stay tuned for future talks as Netflix evaluates more technologies, e.g. Cassandra.
3. Why Are You Here?
”What I need is an exact list of specific unknown
problems we might encounter."
-- anonymous
@r39132 - #netflixcloud 3
4.
5. Motivation
Circa late 2008, Netflix had a single data center
Single-point-of-failure (a.k.a. SPOF)
Approaching limits on cooling, power, space, traffic
capacity
Alternatives
Build more data centers
Outsource the majority of our capacity planning and
scale out
@r39132 - #netflixcloud 5
6. Motivation
Winner : Outsource the majority of our capacity planning and
scale out
Leverage a leading Infrastructure-as-a-service provider
Amazon Web Services
Footnote : As it has taken us a while (i.e. ~2+ years) to realize
our vision of running on the cloud, we needed a interim solution
to handle growth
We did build a second data center along the way
We did outgrow it
6@r39132 - #netflixcloud
7.
8. Cloud Migration Strategy
Components
Applications and Software Infrastructure
Data
Migration Considerations
Security
PII and PCI DSS stays in our DC, rest can go to the cloud
Scalability and Availability for Business Success
@r39132 - #netflixcloud 8
9. Cloud Migration Strategy
Scalability and Availability for Business Success
High Growth or High Traffic Growth Data
Video starts, Personalized Video choosing
High Traffic Growth Applications
Same as above
Log Processing
Time-to-market Critical Batch Processing
Video encoding
Not Included
DVD inventory and shipment
We are a streaming company that also ships DVD
@r39132 - #netflixcloud 9
10. Cloud Migration Strategy
Examples of Data that can be moved
Video-centric data
Critics’ reviews
Metadata
User-video-centric data – some of our largest data sets
User-video queue
Previously streamed and shipped video history
Ratings (i.e. a 5-star rating system)
Video streaming metadata (e.g. streaming bookmarks)
@r39132 - #netflixcloud 10
11.
12. Cloud Migration Strategy
High-level Requirements for our Site
No big-bang migrations
New functionality needs to launch in the cloud when
possible
High-level Requirements for our Data
Data needs to migrate before applications
Data needs to be shared between applications running in
the cloud and our data center during the transition period
@r39132 - #netflixcloud 12
14. Cloud Migration Strategy
Low-level Requirements for our Data
Pick a (key-value) data store in the cloud
Challenges
Translate RDBMS concepts to KV store concepts
Work-around Issues specific to the chosen KV store
Create a bi-directional DC-Cloud data replication
pipeline
@r39132 - #netflixcloud 14
15.
16. Pick a Data Store in the Cloud
An ideal storage solution should have the following features:
Hosted
Managed Distribution Model
Works in AWS
AP from CAP
Handles a majority of use-cases accessing high-growth, high-traffic data
Specifically, key access by customer id, movie id, or both
@r39132 - #netflixcloud 16
17. Pick a Data Store in the Cloud
We picked SimpleDB and S3
SimpleDB was targeted as the AP equivalent of our RDBMS
databases in our Data Center
S3 was used for data sets where item or row data
exceeded SimpleDB limits and could be looked up purely
by a single key (i.e. does not require secondary indices and
complex query semantics)
Video encodes
Streaming device activity logs (i.e. CLOB, BLOB, etc…)
Compression of old Rental History
@r39132 - #netflixcloud 17
18.
19. Technology Overview : SimpleDB
SimpleDB Hash Table Relational Databases
Domain Hash Table Table
Item Entry Row
Item Name Key Mandatory Primary Key
Attribute Part of the Entry Value Column
@r39132 - #netflixcloud 19
Terminology
20. Technology Overview : SimpleDB
@r39132 - #netflixcloud 20
Soccer Players
Key Value
ab12ocs12v9 First Name = Harold Last Name = Kewell
Nickname = Wizard of
Oz
Teams = Leeds United,
Liverpool, Galatasaray
b24h3b3403b First Name = Pavel Last Name = Nedved
Nickname = Czech
Cannon
Teams = Lazio,
Juventus
cc89c9dc892 First Name = Cristiano Last Name = Ronaldo
Teams = Sporting,
Manchester United,
Real Madrid
SimpleDB’s salient characteristics
• SimpleDB offers a range of consistency options
• SimpleDB domains are sparse and schema-less
• The Key and all Attributes are indexed
• Each item must have a unique Key
• An item contains a set of Attributes
• Each Attribute has a name
• Each Attribute has a set of values
• All data is stored as UTF-8 character strings (i.e. no support for types such as numbers or dates)
21. Technology Overview : SimpleDB
What does the API look like?
Manage Domains
CreateDomain
DeleteDomain
ListDomains
DomainMetaData
Access Data
Retrieving Data
GetAttributes – returns a single item
Select – returns multiple items using SQL syntax
Writing Data
PutAttributes – put single item
BatchPutAttributes – put multiple items
Removing Data
DeleteAttributes – delete single item
BatchDeleteAttributes – delete multiple items
@r39132 - #netflixcloud 21
22. Technology Overview : SimpleDB
@r39132 - #netflixcloud 22
Options available on reads and writes
Consistent Read
Read the most recently committed write
May have lower throughput/higher latency/lower
availability
Conditional Put/Delete
i.e. Optimistic Locking
Useful if you want to build a consistent multi-master data
store – you will still require your own anti-entropy
We do not use this currently, so we don’t know how it
performs
23.
24. Translate RDBMS Concepts to Key-Value Store
Concepts
Relational Databases are known for relations
First, a quick refresher on Normal forms
@r39132 - #netflixcloud 24
25. Normalization
NF1 : All occurrences of a record type must contain the same number of
fields -variable repeating fields and groups are not allowed
NF2 : Second normal form is violated when a non-key field is a fact about
a subset of a key
Violated here
Fixed here
@r39132 - #netflixcloud 25
Part Warehouse Quantity Warehouse-
Address
Part Warehouse Quantity Warehouse Warehouse-
Address
26. Normalization
Issues
Wastes Storage
The warehouse address is repeated for every Part-WH pair
Update Performance Suffers
If the address of the warehouse changes, I must update
many Part-WH pairs
Data inconsistencies possible
I can update the warehouse address for one Part-WH pair
and miss Parts for the same WH
Data Loss Possible
If at some point in time there are no parts, the WH address
will be lost
@r39132 - #netflixcloud 26
27. Normalization
RDBMS KV Store migrations can’t simply accept
denormalization!
Especially many-to-many and many-to-one entity relationships
Instead, pick your data set candidates carefully!
Keep relational data in RDBMS
Move key-look-ups to KV stores
Luckily for Netflix, most data is accessed by Customer, Video,
or both : i.e. Key Lookups
@r39132 - #netflixcloud 27
28. Translate RDBMS Concepts to Key-Value Store
Concepts
Aside from relations, relational databases typically
offer the following:
Transactions
Locks
Sequences
Triggers
Clocks
A structured query language (i.e. SQL)
Database server-side coding constructs (i.e. PL/SQL)
Constraints
@r39132 - #netflixcloud 28
29. Translate RDBMS Concepts to Key-Value Store
Concepts
Partial or no SQL support. Loosely-speaking, SimpleDB supports a
subset of SQL
BEST PRACTICE
Do GROUP BY and JOIN operations in the application layer
involving smallish data sets
No relations between domains
BEST PRACTICE
Compose relations in the application layer
No transactions
BEST PRACTICE
Use SimpleDB’s Optimistic Concurrency Control API: ConditionalPut
and ConditionalDelete
@r39132 - #netflixcloud 29
30. Translate RDBMS Concepts to Key-Value Store
Concepts
No schema - This is non-obvious. A query for a misspelled attribute
name will not fail with an error
BEST PRACTICE
Implement a schema validator in a common data access layer
No sequences
BEST PRACTICE
Sequences are often used as primary keys
In this case, use a naturally occurring unique key
If no naturally occurring unique key exists, use a UUID
Sequences are also often used for ordering
Use a distributed sequence generator
@r39132 - #netflixcloud 30
31. Translate RDBMS Concepts to Key-Value Store
Concepts
No clock operations, PL/SQL, Triggers
BEST PRACTICE
Do without
No constraints. Specifically,
No uniqueness constraints
No foreign key or referential constraints
No integrity constraints
BEST PRACTICE
Read Repair and Anti-entropy processes using Conditional
Put/Delete
@r39132 - #netflixcloud 31
32.
33. Work-around Issues specific to the chosen KV
store
Missing / Strange Functionality
No back-up and recovery
No native support for types (e.g. Number, Float, Date, etc…)
You cannot update one attribute and null out another one for an
item in a single API call
Mis-cased or misspelled attribute names in operations fail silently.
Why is SimpleDB case-sensitive?
Neglecting "limit N" returns a subset of information. Why does the
absence of an optional parameter not return all of the data?
Users need to deal with data set partitioning
Beware of Nulls
Poor Performance
@r39132 - #netflixcloud 33
34. Work-around Issues specific to the chosen KV
store
No Native Types – Sorting, Inequalities Conditions,
etc…
Since sorting is lexicographical, if you plan on sorting by certain
attributes, then
zero-pad logically-numeric attributes
e.g. –
000000000000000111111 this is bigger
000000000000000011111
use Joda time to store logical dates
e.g. –
2010-02-10T01:15:32.864Z this is more recent
2010-02-10T01:14:42.864Z
@r39132 - #netflixcloud 34
35. Work-around Issues specific to the chosen KV
store
Anti-pattern : Avoid the anti-pattern Select SOME_FIELD_1 from
MY_DOMAIN where SOME_FIELD_2 is null as this is a full domain
scan
Nulls are not indexed in a sparse-table
BEST PRACTICE
Instead, replace this check with a (indexed) flag column
called IS_FIELD_2_NULL: Select SOME_FIELD_1 from
MY_DOMAIN where IS_FIELD_2_NULL = 'Y'
Anti-pattern : When selecting data from a domain and sorting by an
attribute, items missing that attribute will not be returned
In Oracle, rows with null columns are still returned
BEST PRACTICE
Use a flag column as shown previously
@r39132 - #netflixcloud 35
36. Work-around Issues specific to the chosen KV
store
BEST PRACTICE : Aim for high index selectivity when you formulate
your select expressions for best performance
SimpleDB select performance is sensitive to index selectivity
Index Selectivity
Definition : # of distinct attribute values in specified attribute /
# of items in domain
e.g. Good Index Selectivity (i.e. 1 is the best)
A table having 100 records and one of its indexed column
has 88 distinct values, then the selectivity of this index is
88 / 100= 0.88
e.g. Bad Index Selectivity
lf an index on a table of 1000 records had only 5 distinct
values, then the index's selectivity is 5 / 1000 = 0.005
@r39132 - #netflixcloud 36
37. Work-around Issues specific to the chosen KV
store
Sharding Domains
There are 2 reasons to shard domains
You are trying to avoid running into one of the sizing limits
e.g. 10GB of space or 1 Billion Attributes
You are trying to scale your writes
To scale your writes further, use BatchPutAttributes and
BatchDeleteAttributes where possible
@r39132 - #netflixcloud 37
38.
39. Create a Bi-directional DC-Cloud Data
Replication Pipeline
Home-grown Data Replication Framework known as IR for Item
Replication
2 schemes in use currently
Polls the main table (a.k.a. Simple IR)
Doesn’t capture deletes but easy to implement
Polls a journal table that is populated via a trigger on the
main table (a.k.a. Trigger-journaled IR)
Captures every CRUD, but requires the development
of triggers
@r39132 - #netflixcloud 39
41. Create a Bi-directional DC-Cloud Data
Replication Pipeline
How often do we poll Oracle?
Every 5 seconds
What does the poll query look like?
select *
from QLOG_0
where LAST_UPDATE_TS > :CHECKPOINT Get recent
and LAST_UPDATE_TS < :NOW_MINUS_30s Exclude
most recent
order by LAST_UPDATE_TS Process in order
@r39132 - #netflixcloud 41
42. Create a Bi-directional DC-Cloud Data
Replication Pipeline
Data Replication Challenges & Best Practices
SimpleDB throttles traffic aggressively via 503 HTTP Response
codes (“Service Unavailable”)
With Singleton writes, I see 70-120 write TPS/domain
IR
Shard domains (i.e. partition data sets) to work-around these limits
Employs Slow ramp up
Uses BatchPutAttributes instead of (Singleton) PutAttributes call
Exercises an exponential bounded-back-off algorithm
Uses attribute-level replace=false when fork-lifting data
@r39132 - #netflixcloud 42
44. Create a Bi-directional DC-Cloud Data
Replication Pipeline
Data Replication Challenges & Best Practices
Implementing Multi-mastering and an Eventually-consistent
Replication Pipeline
SimpleDB offers optimistic concurrency control in the form of
conditional put (and deletes)
For our data, it is ok to be “consistent, but not accurate”
With this relaxation, we do not need to be concerned with
synchronizing logical clocks
We simply just need to ensure that each conditional put puts a large
strictly increasing value into the “version” column
@r39132 - #netflixcloud 44
Editor's Notes
Existing functionality needs to move in phases
Limits the risk and exposure to bugs
Limits conflicts with new product launches
Existing functionality needs to move in phases
Limits the risk and exposure to bugs
Limits conflicts with new product launches
Existing functionality needs to move in phases
Limits the risk and exposure to bugs
Limits conflicts with new product launches
Dynamo storage doesn’t suffer from this!
This is an issue with any SQL-like Query layer over a Sparse-data model. It can happen in other technologies.
Cannot treat SimpleDB like a black-box for performance critical applications.
We found the write availability was affected by the right partitioning scheme. We use a combination of forwarding tables and modulo addressing
Mention trickle lift
We like that it is available, hosted, and managed.
We don’t like the performance issues
We are looking into Cassandra and other KV stores