Netflix’s Transition to NoSQL P.1 Siddharth “Sid” Anand @r39132 Silicon Valley Cloud Computing Group  February 2011
Special Thanks Sebastian Stadil – Scalr, SVCCG Meetup Organizer SriSatish Ambati – Datastax Lynn Bender – Geek Austin Scott MacVicar – Facebook 2 @r39132 - #netflixcloud
Motivation Netflix’s motivation for moving to the cloud
Motivation Circa late 2008, Netflix had a single data center Single-point-of-failure (a.k.a. SPOF)  Approaching limits on cooling, power, space, traffic capacity Alternatives Build more data centers Outsource the majority of our capacity planning and scale out  Allows us to focus on core competencies @r39132 - #netflixcloud 4
Motivation Winner: Outsource the majority of our capacity planning and scale out Leverage a leading Infrastructure-as-a-Service provider → Amazon Web Services Footnote: As it has taken us a while (i.e. ~2+ years) to realize our vision of running on the cloud, we needed an interim solution to handle growth We did build a second data center along the way We did outgrow it 5 @r39132 - #netflixcloud
Cloud Migration Strategy What to Migrate?
Cloud Migration Strategy Components Applications and Software Infrastructure Data Migration Considerations Avoid sensitive data for now PII and PCI DSS stays in our DC, rest can go to the cloud Favor Web Scale applications & data @r39132 - #netflixcloud 7
Cloud Migration Strategy Examples of Data that can be moved Video-centric data Critics’ and Users’ reviews  Video Metadata (e.g. director, actors, plot description, etc…) User-video-centric data – some of our largest data sets Video Queue Watched History Video Ratings (i.e. a 5-star rating system) Video Playback Metadata (e.g. streaming bookmarks, activity logs) @r39132 - #netflixcloud 8
Cloud Migration Strategy How and When to Migrate?
Cloud Migration Strategy High-level Requirements for our Site No big-bang migrations   New functionality needs to launch in the cloud when possible High-level Requirements for our Data  Data needs to migrate before applications Data needs to be shared between applications running in the cloud and our data center during the transition period @r39132 - #netflixcloud 10
Cloud Migration Strategy [diagram slides 11–13] @r39132 - #netflixcloud
Cloud Migration Strategy Pick a (key-value) data store in the cloud Challenges Translate RDBMS concepts to KV store concepts Work-around Issues specific to the chosen KV store  Create a bi-directional DC-Cloud data replication pipeline  @r39132 - #netflixcloud 14
Pick a Data Store in the Cloud
Pick a Data Store in the Cloud An ideal storage solution should have the following features:
Managed Distribution Model
Works in AWS
AP from CAP
Handles a majority of use-cases accessing high-growth, high-traffic data
Specifically, key access by customer id, movie id, or both @r39132 - #netflixcloud 16
Pick a Data Store in the Cloud We picked SimpleDB and S3 SimpleDB was targeted as the AP equivalent of our RDBMS databases in our Data Center S3 was used for data sets where item or row data exceeded SimpleDB limits and could be looked up purely by a single key (i.e. does not require secondary indices and complex query semantics) Video encodes Streaming device activity logs (i.e. CLOB, BLOB, etc…) Compressed (old) Rental History @r39132 - #netflixcloud 17
Technology Overview SimpleDB
Technology Overview : SimpleDB @r39132 - #netflixcloud 19 Terminology
Technology Overview : SimpleDB @r39132 - #netflixcloud 20 SimpleDB’s salient characteristics
 SimpleDB domains are sparse and schema-less
 The Key and all Attributes are indexed
 Each item must have a unique Key
 An item contains a set of Attributes
Each Attribute has a name
 Each Attribute has a set of values
All data is stored as UTF-8 character strings (i.e. no support for types such as numbers or dates)
Technology Overview : SimpleDB @r39132 - #netflixcloud 22 Options available on reads and writes Consistent Read  Read the most recently committed write  May have lower throughput/higher latency/lower availability Conditional Put/Delete i.e. Optimistic Locking  Useful if you want to build a consistent multi-master data store – you will still require your own anti-entropy  We do not use this currently, so we don’t know how it performs
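Conditional Put is compare-and-set. A minimal sketch of the optimistic-locking semantics, using an in-memory dict to stand in for a SimpleDB domain (the real call is PutAttributes with an Expected condition; the class and attribute names here are illustrative, not the AWS API):

```python
# Sketch of conditional-put (optimistic locking) semantics against an
# in-memory stand-in for a SimpleDB domain.
class OptimisticStore:
    def __init__(self):
        self.items = {}  # item_name -> {attribute: value}

    def conditional_put(self, item_name, attrs, expected_name, expected_value):
        """Apply attrs only if the item's current expected_name == expected_value."""
        current = self.items.get(item_name, {})
        if current.get(expected_name) != expected_value:
            return False  # lost the race; caller must re-read and retry
        merged = dict(current)
        merged.update(attrs)
        self.items[item_name] = merged
        return True

store = OptimisticStore()
store.items["user_1"] = {"version": "1", "plan": "1-disc"}
# Writer A wins: it read version 1 and bumps the item to version 2.
ok_a = store.conditional_put("user_1", {"plan": "2-disc", "version": "2"}, "version", "1")
# Writer B loses: it also read version 1, but the item is now at version 2.
ok_b = store.conditional_put("user_1", {"plan": "3-disc", "version": "2"}, "version", "1")
```

Writer B's failed put is exactly why the slide mentions anti-entropy: the losing writer must re-read and reconcile, and fixer jobs mop up anything that never retried.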
Challenge 1 Translate RDBMS Concepts to Key-Value Store Concepts
Translate RDBMS Concepts to Key-Value Store Concepts Relational Databases are known for relations First, a quick refresher on Normal forms @r39132 - #netflixcloud 24
Normalization NF1 : All occurrences of a record type must contain the same number of fields -- variable repeating fields and groups are not allowed NF2 : Second normal form is violated when a non-key field is a fact about a subset of a key [diagram: a schema violating 2NF, and the fixed version] @r39132 - #netflixcloud 25
Normalization Issues Wastes Storage The warehouse address is repeated for every Part-WH pair Update Performance Suffers If the address of a warehouse changes, I must update every part in that warehouse – i.e. many rows Data Inconsistencies Possible I can update the warehouse address for one Part-WH pair and miss Parts for the same WH (a.k.a. update anomaly) Data Loss Possible An empty warehouse does not have a row, so the address will be lost. (a.k.a. deletion anomaly) @r39132 - #netflixcloud 26
Normalization RDBMS → KV Store migrations can’t simply accept denormalization! Especially many-to-many and many-to-one entity relationships Instead, pick your data set candidates carefully! Keep relational data in RDBMS Move key-look-ups to KV stores Luckily for Netflix, most Web Scale data is accessed by Customer, Video, or both, i.e. key lookups that do not violate 2NF or 3NF @r39132 - #netflixcloud 27
Translate RDBMS Concepts to Key-Value Store Concepts Aside from relations, relational databases typically offer the following: Transactions Locks Sequences Triggers Clocks A structured query language (i.e. SQL) Database server-side coding constructs (i.e. PL/SQL) Constraints @r39132 - #netflixcloud 28
Translate RDBMS Concepts to Key-Value Store Concepts Partial or no SQL support (e.g. no Joins, Group Bys, etc…) BEST PRACTICE Carry these out in the application layer for smallish data No relations between domains BEST PRACTICE Compose relations in the application layer No transactions BEST PRACTICE SimpleDB : Conditional Put/Delete (best effort) w/ fixer jobs Cassandra : Batch Mutate + the same column TS for all writes  @r39132 - #netflixcloud 29
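What "carry these out in the application layer" looks like in practice: a sketch of a client-side join and group-by over two smallish result sets already fetched from separate domains. The domain contents and field names are illustrative, not Netflix's actual schema:

```python
from collections import defaultdict

# Two result sets fetched separately, since the KV store cannot join them.
ratings = [  # from a hypothetical RATINGS domain
    {"customer_id": "c1", "video_id": "v1", "stars": 5},
    {"customer_id": "c1", "video_id": "v2", "stars": 3},
    {"customer_id": "c2", "video_id": "v1", "stars": 4},
]
videos = {  # from a hypothetical VIDEO_METADATA domain, keyed by video_id
    "v1": {"title": "Tron"},
    "v2": {"title": "True Grit"},
}

# "Join": decorate each rating with its video title, client-side.
joined = [dict(r, title=videos[r["video_id"]]["title"]) for r in ratings]

# "Group by": average stars per video, computed client-side.
totals = defaultdict(lambda: [0, 0])  # video_id -> [sum of stars, count]
for r in ratings:
    totals[r["video_id"]][0] += r["stars"]
    totals[r["video_id"]][1] += 1
avg_stars = {vid: s / n for vid, (s, n) in totals.items()}
```

This only stays cheap while both sides fit comfortably in memory, which is why the slide qualifies it with "smallish data".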
Translate RDBMS Concepts to Key-Value Store Concepts No schema - This is non-obvious. A query for a misspelled attribute name will not fail with an error BEST PRACTICE Implement a schema validator in a common data access layer No sequences BEST PRACTICE Sequences are often used as primary keys In this case, use a naturally occurring unique key If no naturally occurring unique key exists, use a UUID Sequences are also often used for ordering Use a distributed sequence generator or rely on client timestamps @r39132 - #netflixcloud 30
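The key-generation rule above can be sketched in a few lines: prefer a naturally occurring unique key, and fall back to a random UUID when none exists (the function and parameter names are illustrative):

```python
import uuid

def make_item_key(customer_id=None, video_id=None):
    """Primary key without a database sequence."""
    if customer_id and video_id:
        # Naturally occurring composite key, e.g. for customer-video data sets.
        return f"{customer_id}:{video_id}"
    # No natural key: random UUID, 32 hex chars, collision-safe in practice.
    return uuid.uuid4().hex

k1 = make_item_key("c1", "v42")
k2 = make_item_key()
k3 = make_item_key()
```

Note that unlike a sequence, a UUID carries no ordering; ordering needs the distributed sequence generator or client timestamps mentioned above.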
Translate RDBMS Concepts to Key-Value Store Concepts No clock operations, PL/SQL, Triggers BEST PRACTICE Clocks : Instead rely on client-generated clocks and run NTP. If using clocks to determine order, be aware that this is problematic over long distances. PL/SQL, Triggers : Do without No constraints. Specifically,  No uniqueness constraints No foreign key or referential constraints No integrity constraints BEST PRACTICE Applications must implement this functionality @r39132 - #netflixcloud 31
Challenge 2 Work-around Issues specific to the chosen KV store  SimpleDB
Work-around Issues specific to the chosen KV store Missing / Strange Functionality No back-up and recovery No native support for types (e.g. Number, Float, Date, etc…) You cannot update one attribute and null out another one for an item in a single API call Mis-cased or misspelled attribute names in operations fail silently. Why is SimpleDB case-sensitive? Neglecting "limit N" returns a subset of information. Why does the absence of an optional parameter not return all of the data? Users need to deal with data set partitioning Beware of Nulls Write throughput not as high as we need for certain use-cases @r39132 - #netflixcloud 33
Work-around Issues specific to the chosen KV store No Native Types – Sorting, Inequality Conditions, etc… Since sorting is lexicographical, if you plan on sorting by certain attributes, then zero-pad logically-numeric attributes e.g. 000000000000000111111 (bigger) > 000000000000000011111 use Joda time to store logical dates e.g. 2010-02-10T01:15:32.864Z (more recent) > 2010-02-10T01:14:42.864Z @r39132 - #netflixcloud 34
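The zero-padding and ISO-8601 tricks can be verified directly: without padding, lexicographic order disagrees with numeric order; with padding (and with fixed-width ISO-8601 timestamps), string comparison gives the right answer. A small sketch:

```python
# Why zero-padding matters in a string-only store: lexicographic vs numeric order.
counts = [7, 110, 23]
unpadded = sorted(str(n) for n in counts)          # "110" < "23" < "7" — numerically wrong
padded = sorted(str(n).zfill(10) for n in counts)  # padding restores numeric order

# Fixed-width ISO-8601 timestamps (what Joda time emits) sort chronologically as strings.
timestamps = ["2010-02-10T01:15:32.864Z", "2010-02-10T01:14:42.864Z"]
latest = max(timestamps)  # string max == chronologically latest
```

The padding width must cover the largest value you will ever store; re-padding a live domain later means rewriting every item.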
Work-around Issues specific to the chosen KV store Anti-pattern : Avoid Select SOME_FIELD_1 from MY_DOMAIN where SOME_FIELD_2 is null as this is a full domain scan; nulls are not indexed in a sparse table BEST PRACTICE Instead, replace this check with an (indexed) flag column called IS_FIELD_2_NULL: Select SOME_FIELD_1 from MY_DOMAIN where IS_FIELD_2_NULL = 'Y' Anti-pattern : When selecting data from a domain and sorting by an attribute, items missing that attribute will not be returned In Oracle, rows with null columns are still returned BEST PRACTICE Use a flag column as shown previously @r39132 - #netflixcloud 35
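The flag column has to be maintained on the write path, since the store will not do it for you. A sketch of that write-side discipline, using a dict as a stand-in domain (attribute names follow the slide's example; the helper is illustrative):

```python
def write_item(domain, item_name, some_field_2=None, **attrs):
    """Write an item, keeping the IS_FIELD_2_NULL flag in sync with SOME_FIELD_2."""
    item = dict(attrs)
    if some_field_2 is None:
        item["IS_FIELD_2_NULL"] = "Y"   # indexed, queryable stand-in for "is null"
    else:
        item["SOME_FIELD_2"] = some_field_2
        item["IS_FIELD_2_NULL"] = "N"
    domain[item_name] = item
    return item

domain = {}
write_item(domain, "a", some_field_2="x")
write_item(domain, "b")  # SOME_FIELD_2 absent; flag set instead

# The "is null" query becomes an indexed equality predicate:
nulls = [k for k, v in domain.items() if v["IS_FIELD_2_NULL"] == "Y"]
```

Because every item now carries the flag, the sort-by-attribute anti-pattern is also avoided: no item is missing the attribute you filter or sort on.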
Work-around Issues specific to the chosen KV store BEST PRACTICE : Aim for high index selectivity when you formulate your select expressions for best performance SimpleDB select performance is sensitive to index selectivity Index Selectivity Definition : # of distinct attribute values in the specified attribute / # of items in the domain e.g. Good Index Selectivity (i.e. 1 is the best) A table having 100 records whose indexed column has 88 distinct values has selectivity 88 / 100 = 0.88 e.g. Bad Index Selectivity If an index on a table of 1000 records has only 5 distinct values, then the index's selectivity is 5 / 1000 = 0.005 @r39132 - #netflixcloud 36
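The definition is a one-liner; a sketch that reproduces both worked examples from the slide:

```python
def index_selectivity(values):
    """# of distinct attribute values / # of items; 1.0 is ideal."""
    values = list(values)
    return len(set(values)) / len(values)

# Good: 100 items, 88 distinct values in the indexed attribute.
good = index_selectivity(list(range(88)) + [0] * 12)

# Bad: 1000 items, only 5 distinct values.
bad = index_selectivity(i % 5 for i in range(1000))
```

Low-selectivity attributes (flags, enums) are fine as secondary filters but make poor leading predicates in a select expression.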
Work-around Issues specific to the chosen KV store Sharding Domains There are 2 reasons to shard domains You are trying to avoid running into one of the sizing limits e.g. 10GB of space or 1 Billion Attributes You are trying to scale your writes To scale your writes further, use BatchPutAttributes and BatchDeleteAttributes where possible @r39132 - #netflixcloud 37
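Sharding a domain usually means deterministic, hash-based routing of each item key to one of N physical domains. A minimal sketch (the domain naming scheme and shard count are assumptions, not Netflix's actual layout):

```python
import hashlib

def shard_for(item_key, num_shards=4, base_name="MY_DOMAIN"):
    """Route an item to one of num_shards physical domains, deterministically."""
    # md5 (rather than Python's hash()) so routing is stable across processes.
    digest = hashlib.md5(item_key.encode("utf-8")).hexdigest()
    return f"{base_name}_{int(digest, 16) % num_shards}"

d1 = shard_for("customer_123:video_456")
d2 = shard_for("customer_123:video_456")  # same key -> same shard, always
```

Reads for a single key go to exactly one shard; queries that span an entire logical data set must fan out across all N domains and merge results in the application.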
Challenge 3 Create a Bi-directional DC-Cloud Data Replication Pipeline
Create a Bi-directional DC-Cloud Data Replication Pipeline Home-grown Data Replication Framework known as IR for Item Replication 2 schemes in use currently Polls the main table (a.k.a. Simple IR) Doesn’t capture deletes but easy to implement Polls a journal table that is populated via a trigger on the main table (a.k.a. Trigger-journaled IR) Captures every CRUD, but requires the development of triggers @r39132 - #netflixcloud 39
Create a Bi-directional DC-Cloud Data Replication Pipeline @r39132 - #netflixcloud 40
Create a Bi-directional DC-Cloud Data Replication Pipeline How often do we poll Oracle? Every 5 seconds What does the poll query look like?
select *
from QLOG_0
where LAST_UPDATE_TS > :CHECKPOINT      -- get recent changes
and LAST_UPDATE_TS < :NOW_MINUS_30s     -- exclude the most recent
order by LAST_UPDATE_TS                 -- process in order
@r39132 - #netflixcloud 41
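A sketch of the checkpoint logic behind that poll query, against an in-memory journal instead of Oracle. The 30-second exclusion window leaves out the newest rows, whose transactions may not yet all be visible in commit order (function and field names are illustrative):

```python
def poll(journal, checkpoint, now, window=30):
    """Return rows with checkpoint < ts < now - window, oldest first,
    plus the advanced checkpoint."""
    rows = sorted(
        (r for r in journal if checkpoint < r["ts"] < now - window),
        key=lambda r: r["ts"],            # "order by LAST_UPDATE_TS"
    )
    new_checkpoint = rows[-1]["ts"] if rows else checkpoint
    return rows, new_checkpoint

journal = [
    {"ts": 10, "op": "update user_1"},
    {"ts": 50, "op": "update user_2"},
    {"ts": 95, "op": "update user_3"},   # within the 30s window; picked up next poll
]
rows, ckpt = poll(journal, checkpoint=0, now=100)
```

On the next cycle the caller passes `ckpt` back in, so each committed row is replicated exactly once and in order.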
Create a Bi-directional DC-Cloud Data Replication Pipeline Data Replication Challenges & Best Practices SimpleDB throttles traffic aggressively via 503 HTTP response codes (“Service Unavailable”) With singleton writes, I see 70-120 write TPS/domain IR works around these limits as follows: Shards domains (i.e. partitions data sets) Employs a slow ramp-up Uses BatchPutAttributes instead of (singleton) PutAttributes calls Exercises an exponential bounded back-off algorithm Uses attribute-level replace=false when fork-lifting data @r39132 - #netflixcloud 42
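A sketch of the exponential bounded back-off mentioned above, for retrying 503-throttled calls. The base, factor, and cap values are illustrative, not the deck's actual tuning:

```python
import random

def backoff_delays(base=0.1, factor=2.0, cap=5.0, attempts=8, jitter=False):
    """Delay before each retry: base * factor**n, bounded above by cap."""
    delays = []
    for n in range(attempts):
        d = min(cap, base * factor ** n)   # exponential growth, bounded
        if jitter:
            d = random.uniform(0, d)       # jitter spreads out retry storms
        delays.append(d)
    return delays

delays = backoff_delays()
```

The bound matters as much as the growth: without the cap, a long throttling episode pushes retry intervals out far enough that the replication pipeline falls behind its checkpoint window.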
Netflix’s Transition to NoSQL P.1.5 Cassandra
Data Model Cassandra
Data Model : Cassandra @r39132 - #netflixcloud 45 Terminology
Data Model : Cassandra [diagram slides 46–47] @r39132 - #netflixcloud
API in Action Cassandra
APIs for Reads Reads I want to continue watching Tron from where I left off (quorum read)? datastore.get("Netflix", "Sid_Anand", Streaming Bookmarks → Tron, ConsistencyLevel.QUORUM) When did the True Grit DVD get shipped and returned (fastest read)? datastore.get_slice("Netflix", "Sid_Anand", (DVD) Rental History → 5678, ["Ship_TS", "Return_TS"], ConsistencyLevel.ONE) How many DVDs have been shipped to me (fastest read)? datastore.get_count("Netflix", "Sid_Anand", (DVD) Rental History, ConsistencyLevel.ONE) @r39132 - #netflixcloud 49
APIs for Writes Writes Replicate Netflix Hub Operation Shipments as Batched Writes : True Grit and Tron shipped together to Sid datastore.batch_mutate(“Netflix”, mutation_map, ConsistencyLevel.QUORUM) @r39132 - #netflixcloud 50
Performance Model Cassandra
The Promise of Performance @r39132 - #netflixcloud 52
The Promise of Performance Manage Reads Rows and Top-Level Columns are stored and indexed in sorted order giving logarithmic time complexity for look up These help Bloom Filters at the Row Level Key Cache Large OS Page Cache These do not help Disk seeks on reads It gets worse with more row-redundancy across SSTables  Compaction is a necessary evil Compaction wipes out the Key Cache @r39132 - #netflixcloud 53
The Promise of Performance @r39132 - #netflixcloud 54

Cloud Native Data Pipelines (in Eng & Japanese)  - QCon TokyoCloud Native Data Pipelines (in Eng & Japanese)  - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
 
Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)
Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)
Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)
 
Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016
 
Airflow @ Agari
Airflow @ Agari Airflow @ Agari
Airflow @ Agari
 
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
 
Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)
 
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
 
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
 
Building a Modern Website for Scale (QCon NY 2013)
Building a Modern Website for Scale (QCon NY 2013)Building a Modern Website for Scale (QCon NY 2013)
Building a Modern Website for Scale (QCon NY 2013)
 
Hands On with Maven
Hands On with MavenHands On with Maven
Hands On with Maven
 
Learning git
Learning gitLearning git
Learning git
 
LinkedIn Data Infrastructure Slides (Version 2)
LinkedIn Data Infrastructure Slides (Version 2)LinkedIn Data Infrastructure Slides (Version 2)
LinkedIn Data Infrastructure Slides (Version 2)
 

  • 11. Cloud Migration Strategy @r39132 - #netflixcloud 11
  • 12. Cloud Migration Strategy @r39132 - #netflixcloud 12
  • 13. Cloud Migration Strategy @r39132 - #netflixcloud 13
  • 14. Cloud Migration Strategy Pick a (key-value) data store in the cloud Challenges Translate RDBMS concepts to KV store concepts Work-around Issues specific to the chosen KV store Create a bi-directional DC-Cloud data replication pipeline @r39132 - #netflixcloud 14
  • 15. Pick a Data Store in the Cloud
  • 16.
  • 20. Handles a majority of use-cases accessing high-growth, high-traffic data
  • 21. Specifically, key access by customer id, movie id, or both @r39132 - #netflixcloud 16
  • 22. Pick a Data Store in the Cloud We picked SimpleDB and S3 SimpleDB was targeted as the AP equivalent of our RDBMS databases in our Data Center S3 was used for data sets where item or row data exceeded SimpleDB limits and could be looked up purely by a single key (i.e. does not require secondary indices and complex query semantics) Video encodes Streaming device activity logs (i.e. CLOB, BLOB, etc…) Compressed (old) Rental History @r39132 - #netflixcloud 17
  • 24. Technology Overview : SimpleDB @r39132 - #netflixcloud 19 Terminology
  • 25.
  • 26. SimpleDB domains are sparse and schema-less
  • 27. The Key and all Attributes are indexed
  • 28. Each item must have a unique Key
  • 29. An item contains a set of Attributes
  • 31. Each Attribute has a set of values
  • 32.
  • 33. Technology Overview : SimpleDB @r39132 - #netflixcloud 22 Options available on reads and writes Consistent Read Read the most recently committed write May have lower throughput/higher latency/lower availability Conditional Put/Delete i.e. Optimistic Locking Useful if you want to build a consistent multi-master data store – you will still require your own anti-entropy We do not use this currently, so we don’t know how it performs
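The Conditional Put mentioned above is optimistic locking. A minimal sketch of the idea, using an in-memory dict as a stand-in for a SimpleDB domain (the class and attribute names are illustrative, not the SimpleDB API):

```python
# Sketch of optimistic locking via conditional put. An in-memory dict
# stands in for a SimpleDB domain; names here are illustrative.

class ConditionalPutFailed(Exception):
    pass

class Domain:
    def __init__(self):
        self.items = {}  # item_name -> attribute dict

    def conditional_put(self, item_name, attrs, expected_version):
        """Write attrs only if the stored 'version' matches expected_version."""
        current = self.items.get(item_name, {})
        if current.get("version") != expected_version:
            raise ConditionalPutFailed(item_name)
        attrs = dict(attrs)
        attrs["version"] = (expected_version or 0) + 1
        self.items[item_name] = attrs

domain = Domain()
domain.items["user_1"] = {"plan": "2-at-a-time", "version": 1}

# A read-modify-write succeeds when no concurrent writer raced us;
# a second writer holding the stale version would get ConditionalPutFailed
# and have to re-read (this is where "fixer jobs" come in).
domain.conditional_put("user_1", {"plan": "3-at-a-time"}, expected_version=1)
```
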
  • 34. Challenge 1 Translate RDBMS Concepts to Key-Value Store Concepts
  • 35. Translate RDBMS Concepts to Key-Value Store Concepts Relational Databases are known for relations First, a quick refresher on Normal forms @r39132 - #netflixcloud 24
  • 36. Normalization NF1 : All occurrences of a record type must contain the same number of fields -- variable repeating fields and groups are not allowed NF2 : Second normal form is violated when a non-key field is a fact about a subset of a key (the slide shows a Part-Warehouse table that violates this, then the fix) @r39132 - #netflixcloud 25
  • 37. Normalization Issues Wastes Storage The warehouse address is repeated for every Part-WH pair Update Performance Suffers If the address of a warehouse changes, I must update every part in that warehouse – i.e. many rows Data Inconsistencies Possible I can update the warehouse address for one Part-WH pair and miss Parts for the same WH (a.k.a. update anomaly) Data Loss Possible An empty warehouse does not have a row, so the address will be lost. (a.k.a. deletion anomaly) @r39132 - #netflixcloud 26
  • 38. Normalization RDBMS → KV Store migrations can’t simply accept denormalization! Especially many-to-many and many-to-one entity relationships Instead, pick your data set candidates carefully! Keep relational data in RDBMS Move key-look-ups to KV stores Luckily for Netflix, most Web Scale data is accessed by Customer, Video, or both i.e. Key Lookups that do not violate 2NF or 3NF @r39132 - #netflixcloud 27
  • 39. Translate RDBMS Concepts to Key-Value Store Concepts Aside from relations, relational databases typically offer the following: Transactions Locks Sequences Triggers Clocks A structured query language (i.e. SQL) Database server-side coding constructs (i.e. PL/SQL) Constraints @r39132 - #netflixcloud 28
  • 40. Translate RDBMS Concepts to Key-Value Store Concepts Partial or no SQL support (e.g. no Joins, Group Bys, etc…) BEST PRACTICE Carry these out in the application layer for smallish data No relations between domains BEST PRACTICE Compose relations in the application layer No transactions BEST PRACTICE SimpleDB : Conditional Put/Delete (best effort) w/ fixer jobs Cassandra : Batch Mutate + the same column TS for all writes @r39132 - #netflixcloud 29
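"Compose relations in the application layer" means the app does the join the database used to do. A hedged sketch with illustrative data shapes (two in-memory result sets standing in for items fetched from two domains):

```python
# Sketch: composing a "join" in the application layer, since the KV store
# has no relations between domains. Data shapes are illustrative.

queue_items = [  # items from a hypothetical QUEUE domain, keyed by customer
    {"customer_id": "c1", "video_id": "v1"},
    {"customer_id": "c1", "video_id": "v2"},
]
video_metadata = {  # items from a hypothetical VIDEO_METADATA domain
    "v1": {"title": "Tron"},
    "v2": {"title": "True Grit"},
}

def join_queue_with_titles(queue_items, video_metadata):
    """App-side equivalent of QUEUE JOIN VIDEO_METADATA ON video_id."""
    return [
        {**item, "title": video_metadata[item["video_id"]]["title"]}
        for item in queue_items
        if item["video_id"] in video_metadata
    ]
```

This is viable for smallish data, as the slide says; large joins belong back in an RDBMS or a batch system.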
  • 41. Translate RDBMS Concepts to Key-Value Store Concepts No schema - This is non-obvious. A query for a misspelled attribute name will not fail with an error BEST PRACTICE Implement a schema validator in a common data access layer No sequences BEST PRACTICE Sequences are often used as primary keys In this case, use a naturally occurring unique key If no naturally occurring unique key exists, use a UUID Sequences are also often used for ordering Use a distributed sequence generator or rely on client timestamps @r39132 - #netflixcloud 30
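The primary-key guidance above (natural key first, UUID as fallback) can be sketched in a few lines; the helper name is illustrative:

```python
import uuid

def make_item_key(natural_key=None):
    """Prefer a naturally occurring unique key (e.g. a customer id);
    fall back to a UUID when none exists. Illustrative helper, not
    from the original deck."""
    return natural_key if natural_key is not None else str(uuid.uuid4())
```
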
  • 42. Translate RDBMS Concepts to Key-Value Store Concepts No clock operations, PL/SQL, Triggers BEST PRACTICE Clocks : Instead rely on client-generated clocks and run NTP. If using clocks to determine order, be aware that this is problematic over long distances. PL/SQL, Triggers : Do without No constraints. Specifically, No uniqueness constraints No foreign key or referential constraints No integrity constraints BEST PRACTICE Applications must implement this functionality @r39132 - #netflixcloud 31
  • 43. Challenge 2 Work-around Issues specific to the chosen KV store SimpleDB
  • 44. Work-around Issues specific to the chosen KV store Missing / Strange Functionality No back-up and recovery No native support for types (e.g. Number, Float, Date, etc…) You cannot update one attribute and null out another one for an item in a single API call Mis-cased or misspelled attribute names in operations fail silently. Why is SimpleDB case-sensitive? Neglecting "limit N" returns a subset of information. Why does the absence of an optional parameter not return all of the data? Users need to deal with data set partitioning Beware of Nulls Write throughput not as high as we need for certain use-cases @r39132 - #netflixcloud 33
  • 45. Work-around Issues specific to the chosen KV store No Native Types – Sorting, Inequality Conditions, etc… Since sorting is lexicographical, if you plan on sorting by certain attributes, then zero-pad logically-numeric attributes e.g. 000000000000000111111 (bigger) vs. 000000000000000011111 use Joda time to store logical dates e.g. 2010-02-10T01:15:32.864Z (more recent) vs. 2010-02-10T01:14:42.864Z @r39132 - #netflixcloud 34
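The zero-padding trick is easy to demonstrate: unpadded numbers sort wrong as strings, padded ones sort right, and ISO-8601 timestamps sort lexicographically in time order for free.

```python
def pad_numeric(n, width=20):
    """Zero-pad so lexicographic order matches numeric order
    (SimpleDB compares and sorts attribute values as strings)."""
    return str(n).zfill(width)

values = [9, 10, 2]
lexical = sorted(str(v) for v in values)         # wrong order: '10' < '2' < '9'
padded = sorted(pad_numeric(v) for v in values)  # numeric order restored

# ISO-8601 timestamps (e.g. as produced by Joda-Time) sort correctly as strings
earlier, later = "2010-02-10T01:14:42.864Z", "2010-02-10T01:15:32.864Z"
```
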
  • 46. Work-around Issues specific to the chosen KV store Anti-pattern : Avoid the anti-pattern Select SOME_FIELD_1 from MY_DOMAIN where SOME_FIELD_2 is null as this is a full domain scan Nulls are not indexed in a sparse-table BEST PRACTICE Instead, replace this check with an (indexed) flag column called IS_FIELD_2_NULL: Select SOME_FIELD_1 from MY_DOMAIN where IS_FIELD_2_NULL = 'Y' Anti-pattern : When selecting data from a domain and sorting by an attribute, items missing that attribute will not be returned In Oracle, rows with null columns are still returned BEST PRACTICE Use a flag column as shown previously @r39132 - #netflixcloud 35
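On the write path, the flag-column best practice means every writer materializes the null-ness explicitly. A sketch of the item-building side, reusing the slide's attribute names:

```python
def to_item(field_2_value):
    """Build the attribute dict for an item, storing an explicit indexed
    flag (IS_FIELD_2_NULL) instead of relying on a missing attribute --
    nulls are not indexed in a sparse, schema-less store. Sketch only."""
    item = {"IS_FIELD_2_NULL": "Y" if field_2_value is None else "N"}
    if field_2_value is not None:
        item["SOME_FIELD_2"] = field_2_value
    return item
```

Queries can then filter and sort on `IS_FIELD_2_NULL` with an index behind them.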
  • 47. Work-around Issues specific to the chosen KV store BEST PRACTICE : Aim for high index selectivity when you formulate your select expressions for best performance SimpleDB select performance is sensitive to index selectivity Index Selectivity Definition : # of distinct attribute values in specified attribute / # of items in domain e.g. Good Index Selectivity (i.e. 1 is the best) A table having 100 records where an indexed column has 88 distinct values gives a selectivity of 88 / 100 = 0.88 e.g. Bad Index Selectivity If an index on a table of 1000 records had only 5 distinct values, then the index's selectivity is 5 / 1000 = 0.005 @r39132 - #netflixcloud 36
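The selectivity definition above is a one-liner, reproducing both worked examples from the slide:

```python
def index_selectivity(values):
    """# of distinct attribute values / # of items in the domain.
    1.0 is the best possible selectivity."""
    return len(set(values)) / len(values)

# Good: 100 items, 88 distinct values -> 0.88
good = index_selectivity(list(range(88)) + [0] * 12)

# Bad: 1000 items, only 5 distinct values -> 0.005
bad = index_selectivity([1, 2, 3, 4, 5] * 200)
```
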
  • 48. Work-around Issues specific to the chosen KV store Sharding Domains There are 2 reasons to shard domains You are trying to avoid running into one of the sizing limits e.g. 10GB of space or 1 Billion Attributes You are trying to scale your writes To scale your writes further, use BatchPutAttributes and BatchDeleteAttributes where possible @r39132 - #netflixcloud 37
  • 49. Challenge 3 Create a Bi-directional DC-Cloud Data Replication Pipeline
  • 50. Create a Bi-directional DC-Cloud Data Replication Pipeline Home-grown Data Replication Framework known as IR for Item Replication 2 schemes in use currently Polls the main table (a.k.a. Simple IR) Doesn’t capture deletes but easy to implement Polls a journal table that is populated via a trigger on the main table (a.k.a. Trigger-journaled IR) Captures every CRUD, but requires the development of triggers @r39132 - #netflixcloud 39
  • 51. Create a Bi-directional DC-Cloud Data Replication Pipeline @r39132 - #netflixcloud 40
  • 52. Create a Bi-directional DC-Cloud Data Replication Pipeline How often do we poll Oracle? Every 5 seconds What does the poll query look like? select * from QLOG_0 where LAST_UPDATE_TS > :CHECKPOINT (get recent) and LAST_UPDATE_TS < :NOW_MINUS_30s (exclude most recent) order by LAST_UPDATE_TS (process in order) @r39132 - #netflixcloud 41
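The poll query's checkpoint window can be sketched as a query builder (epoch-second timestamps substituted for bind variables, purely for illustration):

```python
SAFETY_LAG_S = 30  # exclude the most recent rows, which may still be in flight

def build_poll_query(checkpoint_ts, now_ts):
    """Windowed poll of the journal table, as in the IR poller: rows newer
    than the last checkpoint but older than now - 30s, processed in order.
    Sketch only; real code would use bind variables."""
    upper = now_ts - SAFETY_LAG_S
    return (
        "select * from QLOG_0"
        f" where LAST_UPDATE_TS > {checkpoint_ts}"  # get recent
        f" and LAST_UPDATE_TS < {upper}"            # exclude most recent
        " order by LAST_UPDATE_TS"                  # process in order
    )
```

After a successful batch, the checkpoint advances to the largest LAST_UPDATE_TS processed, so nothing is skipped across poller restarts.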
  • 53. Create a Bi-directional DC-Cloud Data Replication Pipeline Data Replication Challenges & Best Practices SimpleDB throttles traffic aggressively via 503 HTTP Response codes (“Service Unavailable”) With singleton writes, I see 70-120 write TPS/domain To work around these limits, IR Shards domains (i.e. partitions data sets) Employs slow ramp-up Uses BatchPutAttributes instead of (singleton) PutAttributes calls Exercises an exponential bounded back-off algorithm Uses attribute-level replace=false when fork-lifting data @r39132 - #netflixcloud 42
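The exponential bounded back-off mentioned above is the standard throttling response. A sketch with illustrative parameter values (an `IOError` stands in for a SimpleDB 503 "Service Unavailable"):

```python
import random
import time

def with_backoff(call, max_retries=6, base_s=0.05, cap_s=2.0, sleep=time.sleep):
    """Retry a throttled call with exponential back-off, bounded by cap_s,
    plus jitter. Sketch; parameter values are illustrative, and IOError
    stands in for a 503 response."""
    for attempt in range(max_retries):
        try:
            return call()
        except IOError:
            # delay doubles each attempt but never exceeds the cap
            delay = min(cap_s, base_s * (2 ** attempt))
            sleep(random.uniform(0, delay))
    return call()  # final attempt; let any exception propagate
```

The jitter (random delay up to the bound) keeps a fleet of throttled writers from retrying in lock-step.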
  • 54. Netflix’s Transition to NoSQL P.1.5 Cassandra
  • 56. Data Model : Cassandra @r39132 - #netflixcloud 45 Terminology
  • 57. Data Model : Cassandra @r39132 - #netflixcloud 46
  • 58. Data Model : Cassandra @r39132 - #netflixcloud 47
  • 59. API in Action Cassandra
  • 60. APIs for Reads Reads I want to continue watching Tron from where I left off (quorum reads)? datastore.get(“Netflix”, ”Sid_Anand”, Streaming Bookmarks → Tron, ConsistencyLevel.QUORUM) When did the True Grit DVD get shipped and returned (fastest read)? datastore.get_slice(“Netflix”, ”Sid_Anand”, (DVD) Rental History → 5678, [“Ship_TS”, “Return_TS”], ConsistencyLevel.ONE) How many DVDs have been shipped to me (fastest read)? datastore.get_count(“Netflix”, ”Sid_Anand”, (DVD) Rental History, ConsistencyLevel.ONE) @r39132 - #netflixcloud 49
  • 61. APIs for Writes Writes Replicate Netflix Hub Operation Shipments as Batched Writes : True Grit and Tron shipped together to Sid datastore.batch_mutate(“Netflix”, mutation_map, ConsistencyLevel.QUORUM) @r39132 - #netflixcloud 50
  • 63. The Promise of Performance @r39132 - #netflixcloud 52
  • 64. The Promise of Performance Manage Reads Rows and Top-Level Columns are stored and indexed in sorted order, giving logarithmic time complexity for look up These help: Bloom Filters at the Row Level, Key Cache, Large OS Page Cache These do not help: Disk seeks on reads It gets worse with more row-redundancy across SSTables → Compaction is a necessary evil Compaction wipes out the Key Cache @r39132 - #netflixcloud 53
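A row-level Bloom filter answers "definitely not here" or "maybe here", which is what lets Cassandra skip SSTables that cannot contain a key without a disk seek. A minimal sketch (sizes and hash scheme are illustrative, not Cassandra's implementation):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: no false negatives, small chance of
    false positives. m bits, k hash functions; values are illustrative."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key):
        # Derive k bit positions by salting the key; md5 here is only a
        # convenient deterministic hash, not what Cassandra uses.
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{key}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all(self.bits & (1 << p) for p in self._positions(key))
```
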
  • 65. The Promise of Performance @r39132 - #netflixcloud 54
  • 67. Distribution Model No time here… Read up on the following: Merkle Trees + Gossip → Anti-Entropy Read-Repair Consistent Hashing SEDA (a.k.a. Staged Event Driven Architecture) paper Dynamo paper @r39132 - #netflixcloud 56
  • 68. Features We Like Cassandra
  • 69. Features We Like @r39132 - #netflixcloud 58 Rich Value Model : Value is a set of columns or super-columns Efficiently address, change, and version individual columns Does not require read-whole-row-before-alter semantics Effectively No Column or Column Family Limits SimpleDB Limits: 256 Attributes / Item 1 Billion Attributes / Domain 1 KB Attribute Value Length Growing a Cluster (a.k.a. Resharding a KeySpace) is Manageable SimpleDB users must solve this problem themselves: application code needs to do the migration
  • 70. Features We Like @r39132 - #netflixcloud 59 Better handling of Types SimpleDB Everything is a UTF-8 string Cassandra Native support for Key and Column Key types (for sorting) Column values are never looked at and are just byte[] Open Source & Java Implement our own Backup & Recovery Implement our own Replication Strategy We know Java best, though we think Erlang is cool, with the exception of the fact that each digit in an integer is 1 byte of memory!!! We can make it work in AWS
  • 71. Features We Like @r39132 - #netflixcloud 60 No Update-Delete Anomalies Specify a batch mutation with a delete and a mutation for a single row_key/column-family pair in a single batch Must use the same column Time Stamp Tunable Consistency/Availability Tradeoff Strong Consistency: Quorum Reads & Writes Eventual Consistency: R=1, W=1 (fastest read and fastest write) R=1, W=QUORUM (fastest read and potentially-slower write) R=QUORUM, W=1 (potentially-slower read and fastest write)
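The consistency levels on the slide follow one rule: a read is guaranteed to see the latest write when the read and write replica counts overlap, i.e. R + W > N. A tiny check of the slide's combinations (N = 3, so QUORUM = 2):

```python
def is_strongly_consistent(n, r, w):
    """Quorum rule: a read overlaps the latest write iff R + W > N."""
    return r + w > n

N = 3       # replication factor, illustrative
QUORUM = 2  # majority of 3 replicas
```

Quorum reads and writes (2 + 2 > 3) give strong consistency; the remaining combinations on the slide trade that away for speed.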
  • 72. Where Does That Leave Us? Cassandra
  • 73. Where Are We With This List? KV Store Missing / Strange Functionality No back-up and recovery No native support for types (e.g. Number, Float, Date, etc…) You cannot update one attribute and null out another one for an item in a single API call Mis-cased or misspelled attribute names in operations fail silently. Neglecting "limit N" returns a subset of information. Why does the absence of an optional parameter not return all of the data? Users need to deal with data set partitioning Beware of Nulls Write throughput not as high as we need for certain use-cases @r39132 - #netflixcloud 62
  • 74.
  • 78. Handles a majority of use-cases accessing high-growth, high-traffic data
  • 79. Specifically, key access by customer id, movie id, or both
  • 82.
  • 83. HBase
  • 84. Other Custom Systems There are a bunch of Netflix and Cloud Systems folks here, feel free to reach out to them!!

Editor's Notes

  1. Existing functionality needs to move in phases. Limits the risk and exposure to bugs. Limits conflicts with new product launches.
  6. Dynamo storage doesn’t suffer from this!
  7. This is an issue with any SQL-like Query layer over a Sparse-data model. It can happen in other technologies.
  8. Cannot treat SimpleDB like a black-box for performance critical applications.
  9. We found the write availability was affected by the right partitioning scheme. We use a combination of forwarding tables and modulo addressing
  10. Mention trickle lift