Update talk on Cassandra at Netflix, presented at the Silicon Valley NoSQL meetup on 9 Feb 2012. Includes an introduction to Astyanax, an open source cassandra client written in java.
AI You Can Trust - Ensuring Success with Data Integrity Webinar
Cassandra from the trenches: migrating Netflix (update)
1. Cassandra from the trenches:
migrating Netflix
Jason Brown
Senior Software Engineer
Netflix
@jasobrown jasedbrown@gmail.com
http://www.linkedin.com/in/jasedbrown
2. History, 2008
• In the beginning, there was the webapp
– And a database
– In one datacenter
• Then we grew, and grew, and grew
– More databases, all conjoined
– Database links, PL/SQL, Materialized views
– Multi-Master replication (MMR)
• Then it melted down
– Couldn’t ship DVDs for ~3 days
3. History, 2009
• Time to rethink everything
– Abandon our datacenter
– Ditch the monolithic webapp
– Migrate single point of failure database to …
4. History, 2010
• SimpleDB/S3
– Managed by Amazon, not us
– Got us started with NoSQL in the cloud
– Problems:
• High latency, rate limiting (throttling)
• (no) auto-sharding, no backups
5. Shiny new toy (2011)
• We switched to Cassandra
– Similar to SimpleDB, with limits removed
– Dynamo-model appealed to us
– Column-based, key-value data model seemed
sufficient for most needs
– Performance looked great (rudimentary tests)
7. About Netflix’s AB Testing
• Basic concepts
– Test – An experiment where several competing
behaviors are implemented and compared
– Cell – different experiences within a test that are
being compared against each other
– Allocation – a customer-specific assignment to a
cell within a test
8. Data Modeling - background
• AB has two sets of data
– metadata about tests
– allocations
9. AB - allocations
• Single table to hold allocations
– Currently at > 1 billion records
– Plus indices!
• One record for every test that every customer
is allocated into
• Unique constraint on customer/test
10. AB – relational model
• Typical parent-child table relationship
• Not updated frequently, so service can cache
11. Data modeling in Cassandra
• Every where I looked, the Internet told me to
understand my data use patterns
• Identify the questions that you need to
answer from the data
• Know how to query your data set and make
the persistence model match
12. Identifying the AB questions that need
to be answered
• High traffic
– get all allocations for a customer
• Low traffic
– get count of customers in test/cell
– find all customers in a test/cell
– find all customers in a test who were added within
a date range
13. Modeling allocations in Cassandra
• Read all allocations for a customer
– as fast as possible
• Find all of customers in a test/cell
– reverse index
• Get count of customers in test/cell
– count the entries in the reverse index
14. Denormalization - HOWTO
• No real world examples
– ‘Normalization is for sissies’, Pat Helland
• Denormalize allocations per customer
– Trivial with a schema-less database
16. Implementing allocations
• As allocation for a customer has a handful of
data points, they logically can be grouped
together
• Avoided blobs, json or otherwise
• Using a standard column family, with
composite columns
17. Composite columns
• Composite columns are sorted by each ‘token’
in name
• Allocation column naming convention
– <testId>:<field>
– 42:cell = 2
– 42:enabled = Y
– 47:cell = 0
– 47:enabled = Y
18. Modeling AB metadata in cassandra
• Explored several models, including json
blobs, spreading across multiple CFs, differing
degrees of denormalization
• Reverse index to identify all tests for loading
19. Implementing metadata
• One CF, one row for all test’s data
– Every data point is a column – no blobs
• Composite columns
– type:id:field
• Types = base info, cells, allocation plans
• Id = cell number, allocation plan (gu)id
• Field = type-specific
– Base info = test name, description, enabled
– Cell’s name / description
– Plan’s start/end dates, country to allocate to
20. Implementing indices
• Cassandra’s secondary indices vs. hand-built
and maintained alternate indices
• Secondary indices work great on uniform data
between rows
• But sparse column data not easy to index
21. Hand-built Indices, 1
• Reverse index
– Test/cell (key) to custIds (columns)
• Column value is timestamp
• Updating index when allocating a customer
into test (double write)
22. Hand-built indices, 2
• Counter column family
– Test/cell to count of customers in test columns
– Mutate on allocating a customer into test
• Counters are not idempotent!
• Mutates need to write to every node that
hosts that key
23. Index rebuilding
• To keep the index consistent, it needs to be
rebuilt occasionally
• Even Oracle needs to have it’s indices rebuilt
32. Astyanax latency aware
• Samples response times from Cassandra
nodes
• Favors faster responding nodes in pool
• Use with token aware connection pooling
33. Allocation mutates
• AB allocations are immutable, so we need to
prevent mutating
• Oracle - unique table constraint
• Cassandra - read before write
– data race!
34. Running cassandra
• Compactions happen
– how Cassandra is maintained
– Mutations are written to memory (Memtable)
– Flushed to disk (SSTable) on triggering threshold
– Eventually, Cassandra merges SSTables as data for
individual rows becomes scattered
35. Compactions, 2
• Latency spikes happen, especially on read-
heavy systems
– Everything can slow down
– Throttling in newer Cassandra versions helps
– Astyanax avoids this problem with latency
awareness
36. Tunings, 1
• Key and row caches
– Left unbounded can consume JVM memory
needed for normal work
– Latencies will spike as the JVM fights for free
memory
– Off-heap row cache is better but still maintains
data structures on-heap
37. Tunings, 2
• mmap() as in-memory cache
– When the Cassandra process is terminated, mmap
pages are returned to the free list
• Row cache helps at startup
38. Tunings, 3
• Sizing memtable flushes for optimizing
compactions
– Easier when writes are uniformly
distributed, timewise – easier to reason about
flush patterns
– Best to optimize flushes based on memtable
size, not time
39. Tunings, 4
• Sharding
– If a single row has disproportionately high
gets/mutates, the nodes holding it will become
hot spots
– If a row grows too large, it can’t fit into memory
40. Takeaways
• Netflix is making all of our components
distributed and fault tolerant as we grow
domestically and internationally.
• Cassandra is a core piece of our cloud
infrastructure.
• Netflix is open sourcing it’s cloud
platform, including Cassandra support