Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Running Cassandra in the Cloud: An Introduction to Priam
1. Running Cassandra
in the Cloud:
An Introduction to Priam
Jason Brown
@jasobrown jasedbrown@gmail.com
www.linkedin.com/in/jasedbrown
2. About me
● Senior Software Engineer, Netflix
● Apache Cassandra committer
● E-Commerce Architect, Major League
Baseball Advanced Media
● Wireless developer (J2ME and BREW)
4. Cassandra meet EC2
● shell script(s)
● python scripts
● backup / restore
● centralized model
● installing 2.7 broke CentOS yum
● first time we ran it in prod, my cluster was
destroyed
5. Hello, Priam!
Priam, the father of Cassandra
(http://en.wikipedia.org/wiki/Priam)
Java web app
● Token Assignment
● Backup / Restore
● Multi-region support
● Configuration management
6. Branches
each priam branch corresponds to a c* version
● priam 1.1 -> c* 1.1
● priam master -> c* 1.2
● ??? -> c* trunk
7. Token Assignment
● Cassandra needs an assigned token
● Priam tries to
○ replace a dead instance
○ join as a new node
● External storage for known cluster members
○ host name/IP addr/instance id
○ token
○ region/availability zone
8.
9. Replacing a dead node
● Get known nodes in region/AZ from storage
○ {A, B, C}
● Get live nodes in region/AZ from ASG api
○ {A, B}
● Take over a dead node's token
○ C
● uses c*'s replace_token
10. Joining as a new node
● Calculate token
○ per-region offset
○ determine 'slot' in region/AZ
○ derive token
11. Region hash offset
● Each region needs a different base offset
○ avoids token collisions
int hash = "us-east-1".hashCode();
13. Node Slotting Layout
+--------+--------+--------+
| zone A | zone B | zone C |
+--------+--------+--------+
| 0 | 1 | 2 |
+--------+--------+--------+
| 3 | 4 | 5 |
+--------+--------+--------+
| 6 | 7 | 8 |
+--------+--------+--------+
| 9 | 10 | 11 |
+--------------------------+
(ascii art rocks)
14. Here's your token
MAXIMUM_TOKEN
.divide(regionNodeCount)
.multiply(mySlot)
.add(regionHashOffset);
example:
100 / 10 (ten nodes in region)
3 + (in slot three) + 12
= 42
15. Seeds
● first node in each AZ, in every region
● except if current node is in the first slot
○ seeds cannot auto bootstrap
16. Multi-region communication
AWS security groups block ingress requests
Intra-region: whitelist by other in-region SG
Inter-region: whitelist by IP address
○ must use public IP address!
17. Whitelisting IP address
● Seed nodes compare
○ current region's SG IP address
○ entries in SimpleDB database
● Add new nodes's to SG
● Remove dead nodes from SG
21. Backup path
Bucket: netflix-cassandra-data
Path:
base dir / region / cluster name / token / snapshot time /
[SNP | SST | META] / keyspace / column family / data file
example:
test_backup/us-east-1/cass_jasobrown/42/1234567/
SNP/jasobrown/dog/jasobrown-dog-ja-1-Data.db
22. Restore
● best with same size cluster as source
● best if tokens match with source
Uses (besides the obvious)
● prod to test refresh
● reproduce prod data problems
● incremental restore - WIP
23. Configuration Management
Control aspects of priam and c*
● yaml
● startup script(s) env values
Netflix needs this as we have ~55 production
clusters, with slightly different configs
24. So, does Netflix actually use Priam?
55 production clusters, > 750 nodes
Internal extensions
● Hook into internal DNS, properties systems
● Alternative storage to SimpleDB
● BI messaging integration - WIP
● C* JMX monitoring
25. Monitoring
● Poll C* every 60 seconds
● selected JMX metrics
● publish to internal metrics aggregator
○ currently uses Netflix's OSS Servo library (github.
com/Netflix/servo)
26. Next directions
Commit log backups
Datastax Enterprise support
● security
● solr
● configuration
c* 1.2 virtual nodes (a/k/a vnodes)
auto scaling