Svccg nosql 2011_v4

Netflix’s Transition to NoSQL P.1 Siddharth “Sid” Anand @r39132 Silicon Valley Cloud Computing Group February 2011

Special Thanks Sebastian Stadil – Scalr, SVCCG Meetup Organizer SriSatish Ambati – Datastax Lynn Bender – Geek Austin Scott MacVicar – Facebook 2 @r39132 - #netflixcloud

Motivation Netflix’s motivation for moving to the cloud

Motivation Circa late 2008, Netflix had a single data center Single-point-of-failure (a.k.a. SPOF) Approaching limits on cooling, power, space, traffic capacity Alternatives Build more data centers Outsource the majority of our capacity planning and scale out Allows us to focus on core competencies @r39132 - #netflixcloud 4

Motivation Winner: Outsource the majority of our capacity planning and scale out Leverage a leading Infrastructure-as-a-service provider Amazon Web Services Footnote : As it has taken us a while (i.e.~2+ years) to realize our vision of running on the cloud, we needed an interim solution to handle growth We did build a second data center along the way We did outgrow it 5 @r39132 - #netflixcloud

Cloud Migration Strategy What to Migrate?

Cloud Migration Strategy Components Applications and Software Infrastructure Data Migration Considerations Avoid sensitive data for now PII and PCI DSS stays in our DC, rest can go to the cloud Favor Web Scale applications & data @r39132 - #netflixcloud 7

Cloud Migration Strategy Examples of Data that can be moved Video-centric data Critics’ and Users’ reviews Video Metadata (e.g. director, actors, plot description, etc…) User-video-centric data – some of our largest data sets Video Queue Watched History Video Ratings (i.e. a 5-star rating system) Video Playback Metadata (e.g. streaming bookmarks, activity logs) @r39132 - #netflixcloud 8

Cloud Migration Strategy How and When to Migrate?

Cloud Migration Strategy High-level Requirements for our Site No big-bang migrations New functionality needs to launch in the cloud when possible High-level Requirements for our Data Data needs to migrate before applications Data needs to be shared between applications running in the cloud and our data center during the transition period @r39132 - #netflixcloud 10

Cloud Migration Strategy @r39132 - #netflixcloud 11

Cloud Migration Strategy Pick a (key-value) data store in the cloud Challenges Translate RDBMS concepts to KV store concepts Work-around Issues specific to the chosen KV store Create a bi-directional DC-Cloud data replication pipeline @r39132 - #netflixcloud 14

Pick a Data Store in the Cloud

Pick a Data Store in the Cloud An ideal storage solution should have the following features: ,[object Object]

Handles a majority of use-cases accessing high-growth, high-traffic data

Specifically, key access by customer id, movie id, or both@r39132 - #netflixcloud 16

Pick a Data Store in the Cloud We picked SimpleDB and S3 SimpleDB was targeted as the AP equivalent of our RDBMS databases in our Data Center S3 was used for data sets where item or row data exceeded SimpleDB limits and could be looked up purely by a single key (i.e. does not require secondary indices and complex query semantics) Video encodes Streaming device activity logs (i.e. CLOB, BLOB, etc…) Compressed (old) Rental History @r39132 - #netflixcloud 17

Technology Overview : SimpleDB @r39132 - #netflixcloud 19 Terminology

Technology Overview : SimpleDB @r39132 - #netflixcloud 20 SimpleDB’s salient characteristics ,[object Object]

SimpleDB domains are sparse and schema-less

The Key and all Attributes are indexed

Each item must have a unique Key

An item contains a set of Attributes

Each Attribute has a set of values

All data is stored as UTF-8 character strings (i.e. no support for types such as numbers or dates),[object Object]

Technology Overview : SimpleDB @r39132 - #netflixcloud 22 Options available on reads and writes Consistent Read Read the most recently committed write May have lower throughput/higher latency/lower availability Conditional Put/Delete i.e. Optimistic Locking Useful if you want to build a consistent multi-master data store – you will still require your own anti-entropy We do not use this currently, so we don’t know how it performs

Challenge 1 Translate RDBMS Concepts to Key-Value Store Concepts

Translate RDBMS Concepts to Key-Value Store Concepts Relational Databases are known for relations First, a quick refresher on Normal forms @r39132 - #netflixcloud 24

Normalization NF1 : All occurrences of a record type must contain the same number of fields -- variable repeating fields and groups are not allowed NF2 : Second normal form is violated when a non-key field is a fact about a subset of a key Violated here Fixed here @r39132 - #netflixcloud 25

Normalization Issues Wastes Storage The warehouse address is repeated for every Part-WH pair Update Performance Suffers If the address of a warehouse changes, I must update every part in that warehouse – i.e. many rows Data Inconsistencies Possible I can update the warehouse address for one Part-WH pair and miss Parts for the same WH (a.k.a. update anomaly) Data Loss Possible An empty warehouse does not have a row, so the address will be lost. (a.k.a. deletion anomaly) @r39132 - #netflixcloud 26

Normalization RDBMS  KV Store migrations can’t simply accept denormalization! Especially many-to-many and many-to-one entity relationships Instead, pick your data set candidates carefully! Keep relational data in RDBMS Move key-look-ups to KV stores Luckily for Netflix, most Web Scale data is accessed by Customer, Video, or both i.e. Key Lookups that do not violate 2NF or 3NF @r39132 - #netflixcloud 27

Translate RDBMS Concepts to Key-Value Store Concepts Aside from relations, relational databases typically offer the following: Transactions Locks Sequences Triggers Clocks A structured query language (i.e. SQL) Database server-side coding constructs (i.e. PL/SQL) Constraints @r39132 - #netflixcloud 28

Translate RDBMS Concepts to Key-Value Store Concepts Partial or no SQL support (e.g. no Joins, Group Bys, etc…) BEST PRACTICE Carry these out in the application layer for smallish data No relations between domains BEST PRACTICE Compose relations in the application layer No transactions BEST PRACTICE SimpleDB : Conditional Put/Delete (best effort) w/ fixer jobs Cassandra : Batch Mutate + the same column TS for all writes @r39132 - #netflixcloud 29

Translate RDBMS Concepts to Key-Value Store Concepts No schema - This is non-obvious. A query for a misspelled attribute name will not fail with an error BEST PRACTICE Implement a schema validator in a common data access layer No sequences BEST PRACTICE Sequences are often used as primary keys In this case, use a naturally occurring unique key If no naturally occurring unique key exists, use a UUID Sequences are also often used for ordering Use a distributed sequence generator or rely on client timestamps @r39132 - #netflixcloud 30

Translate RDBMS Concepts to Key-Value Store Concepts No clock operations, PL/SQL, Triggers BEST PRACTICE Clocks : Instead rely on client-generated clocks and run NTP. If using clocks to determine order, be aware that this is problematic over long distances. PL/SQL, Triggers : Do without No constraints. Specifically, No uniqueness constraints No foreign key or referential constraints No integrity constraints BEST PRACTICE Applications must implement this functionality @r39132 - #netflixcloud 31

Challenge 2 Work-around Issues specific to the chosen KV store SimpleDB

Work-around Issues specific to the chosen KV store Missing / Strange Functionality No back-up and recovery No native support for types (e.g. Number, Float, Date, etc…) You cannot update one attribute and null out another one for an item in a single API call Mis-cased or misspelled attribute names in operations fail silently. Why is SimpleDB case-sensitive? Neglecting "limit N" returns a subset of information. Why does the absence of an optional parameter not return all of the data? Users need to deal with data set partitioning Beware of Nulls Write throughput not as high as we need for certain use-cases @r39132 - #netflixcloud 33

Work-around Issues specific to the chosen KV store No Native Types – Sorting, Inequalities Conditions, etc… Since sorting is lexicographical, if you plan on sorting by certain attributes, then zero-pad logically-numeric attributes e.g. – 000000000000000111111  this is bigger 000000000000000011111 use Joda time to store logical dates e.g. – 2010-02-10T01:15:32.864Z this is more recent 2010-02-10T01:14:42.864Z @r39132 - #netflixcloud 34

Work-around Issues specific to the chosen KV store Anti-pattern : Avoid the anti-pattern Select SOME_FIELD_1 from MY_DOMAIN where SOME_FIELD_2 is null as this is a full domain scan Nulls are not indexed in a sparse-table BEST PRACTICE Instead, replace this check with a (indexed) flag column called IS_FIELD_2_NULL: Select SOME_FIELD_1 from MY_DOMAIN where IS_FIELD_2_NULL = 'Y' Anti-pattern : When selecting data from a domain and sorting by an attribute, items missing that attribute will not be returned In Oracle, rows with null columns are still returned BEST PRACTICE Use a flag column as shown previously @r39132 - #netflixcloud 35

Work-around Issues specific to the chosen KV store BEST PRACTICE : Aim for high index selectivity when you formulate your select expressions for best performance SimpleDB select performance is sensitive to index selectivity Index Selectivity Definition : # of distinct attribute values in specified attribute/# of items in domain e.g. Good Index Selectivity (i.e. 1 is the best) A table having 100 records and one of its indexed column has 88 distinct values, then the selectivity of this index is 88 / 100= 0.88 e.g. Bad Index Selectivity lf an index on a table of 1000 records had only 5 distinct values, then the index's selectivity is 5 / 1000 = 0.005 @r39132 - #netflixcloud 36

Work-around Issues specific to the chosen KV store Sharding Domains There are 2 reasons to shard domains You are trying to avoid running into one of the sizing limits e.g. 10GB of space or 1 Billion Attributes You are trying to scale your writes To scale your writes further, use BatchPutAttributes and BatchDeleteAttributes where possible @r39132 - #netflixcloud 37

Challenge 3 Create a Bi-directional DC-Cloud Data Replication Pipeline

Create a Bi-directional DC-Cloud Data Replication Pipeline Home-grown Data Replication Framework known as IR for Item Replication 2 schemes in use currently Polls the main table (a.k.a. Simple IR) Doesn’t capture deletes but easy to implement Polls a journal table that is populated via a trigger on the main table (a.k.a. Trigger-journaled IR) Captures every CRUD, but requires the development of triggers @r39132 - #netflixcloud 39

Create a Bi-directional DC-Cloud Data Replication Pipeline @r39132 - #netflixcloud 40

Create a Bi-directional DC-Cloud Data Replication Pipeline How often do we poll Oracle? Every 5 seconds What does the poll query look like? select * from QLOG_0 where LAST_UPDATE_TS > :CHECKPOINT  Get recent and LAST_UPDATE_TS < :NOW_MINUS_30s  Exclude most recent order by LAST_UPDATE_TS  Process in order @r39132 - #netflixcloud 41

Create a Bi-directional DC-Cloud Data Replication Pipeline Data Replication Challenges & Best Practices SimpleDB throttles traffic aggressively via 503 HTTP Response codes (“Service Unavailable”) With Singleton writes, I see 70-120 write TPS/domain IR Shard domains (i.e. partition data sets) to work-around these limits Employs Slow ramp up Uses BatchPutAttributes instead of (Singleton) PutAttributes call Exercises an exponential bounded-back-off algorithm Uses attribute-level replace=false when fork-lifting data @r39132 - #netflixcloud 42

Netflix’s Transition to NoSQL P.1.5 Cassandra

Data Model : Cassandra @r39132 - #netflixcloud 45 Terminology

Data Model : Cassandra @r39132 - #netflixcloud 46

Data Model : Cassandra @r39132 - #netflixcloud 47

APIs for Reads Reads I want to continue watching Tron from where I left off (quorum reads)? datastore.get(“Netflix”, ”Sid_Anand”, Streaming Bookmarks  Tron , ConsistencyLevel.QUORUM) When did the True Grit DVD get shipped and returned (fastest read)? datastore.get_slice(“Netflix”, ”Sid_Anand”, (DVD) Rental History  5678, [“Ship_TS”, “Return_TS”], ConsistencyLevel.ONE) How many DVD have been shipped to me (fastest read)? datastore.get_count(“Netflix”, ”Sid_Anand”, (DVD) Rental History, ConsistencyLevel.ONE) @r39132 - #netflixcloud 49

APIs for Writes Writes Replicate Netflix Hub Operation Shipments as Batched Writes : True Grit and Tron shipped together to Sid datastore.batch_mutate(“Netflix”, mutation_map, ConsistencyLevel.QUORUM) @r39132 - #netflixcloud 50

The Promise of Performance @r39132 - #netflixcloud 52

The Promise of Performance Manage Reads Rows and Top-Level Columns are stored and indexed in sorted order giving logarithmic time complexity for look up These help Bloom Filters at the Row Level Key Cache Large OS Page Cache These do not help Disk seeks on reads It gets worse with more row-redundancy across SSTables  Compaction is a necessary evil Compaction wipes out the Key Cache @r39132 - #netflixcloud 53

The Promise of Performance @r39132 - #netflixcloud 54

Svccg nosql 2011_v4

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Svccg nosql 2011_v4

Similar to Svccg nosql 2011_v4 (20)

More from Sid Anand

More from Sid Anand (20)

Svccg nosql 2011_v4

Editor's Notes