Cassandra as the central nervous
system of your distributed systems

            /*
            Joe Stein
            http://www.linkedin.com/in/charmalloc
            @allthingshadoop
            @cassandranosql
            @allthingsscala
            @charmalloc
            */



            http://www.medialets.com




Overview
• Architecture
• Aggregate Metrics/Time Series
• Implementation Over Cassandra




Medialets

Architecture




Medialets
•   Largest deployment of rich media ads for mobile devices
•   Over 300,000,000 devices supported
•   3-4 TB of new data every day
•   Thousands of services in production
•   Hundreds of thousands of events received every second
•   Response times are measured in microseconds
•   Languages
     – 35% JVM (20% Scala & 10% Java)
     – 30% Ruby
     – 20% C/C++
     – 13% Python
     – 2% Bash


The million foot view

[Diagram: AdServing, Collection, Kafka, Hadoop, Cassandra, Muse, and three mysql instances]

Medialets

Aggregate Metrics/Time Series




Let's look at just one data point captured

•   09/10/2011 11:12:13
•   App = Yahoo!
•   Platform = iOS
•   OS = 4.3.4
•   Device = iPad2,1
•   Resolution = 768x1024
•   Events
    – videoPlayPercent = 38
    – Taste = great




The time series part of it

• 09/10/2011 11:12:13

       Quarter                   Q3
       Month                     201109
       Week                      201136
       Day                       20110910
       Hour                      2011091011
       Minute                    201109101112
       Second                    20110910111213
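
The bucket labels above can be derived from one timestamp. A minimal sketch, assuming ISO-8601 week numbering (it yields week 36 for this date, matching the table, but the deck does not say which numbering is used); the class and method names are hypothetical:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.IsoFields;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the original deck.
final class TimeBuckets {
    private static String fmt(LocalDateTime t, String pattern) {
        return t.format(DateTimeFormatter.ofPattern(pattern, Locale.ROOT));
    }

    // Returns one label per granularity, coarsest first.
    static Map<String, String> bucketsFor(LocalDateTime t) {
        Map<String, String> buckets = new LinkedHashMap<>();
        buckets.put("Quarter", "Q" + t.get(IsoFields.QUARTER_OF_YEAR));
        buckets.put("Month", fmt(t, "yyyyMM"));
        // Assumption: ISO-8601 week-based year and week number.
        buckets.put("Week", String.format(Locale.ROOT, "%04d%02d",
                t.get(IsoFields.WEEK_BASED_YEAR),
                t.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR)));
        buckets.put("Day", fmt(t, "yyyyMMdd"));
        buckets.put("Hour", fmt(t, "yyyyMMddHH"));
        buckets.put("Minute", fmt(t, "yyyyMMddHHmm"));
        buckets.put("Second", fmt(t, "yyyyMMddHHmmss"));
        return buckets;
    }
}
```

Calling TimeBuckets.bucketsFor(LocalDateTime.of(2011, 9, 10, 11, 12, 13)) returns the seven labels in the order listed in the table.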




Metrics For Different Wants

Yahoo! + iOS + 4.3.4 + iPad2,1 + 768x1024

Yahoo! + videoPlayPercent = 30 + Taste = great

Yahoo! + Taste = great

Yahoo! + videoPlayPercent = 30

iPad2,1 + videoPlayPercent = 30 + Taste = great

768x1024 + videoPlayPercent = 30 + Taste = great

iOS + 4.3.4 + iPad2,1

Medialets

Implementation Over Cassandra




Storing the time series

CREATE COLUMN FAMILY ByDay
WITH default_validation_class=CounterColumnType
AND key_validation_class=UTF8Type AND comparator=UTF8Type;

CREATE COLUMN FAMILY ByHour
WITH default_validation_class=CounterColumnType
AND key_validation_class=UTF8Type AND comparator=UTF8Type;

CREATE COLUMN FAMILY ByMinute
WITH default_validation_class=CounterColumnType
AND key_validation_class=UTF8Type AND comparator=UTF8Type;

CREATE COLUMN FAMILY BySecond
WITH default_validation_class=CounterColumnType
AND key_validation_class=UTF8Type AND comparator=UTF8Type;

Column families hold your rows of data. The row key in each column family
is the time period you are dealing with, so an “event” occurring at
09/10/2011 12:13:14 becomes 4 rows:

BySecond = 20110910121314
ByMinute = 201109101213
ByHour = 2011091012
ByDay = 20110910




Why multiple column families?
http://www.datastax.com/docs/1.0/configuration/storage_configuration




Generically group by
• app+platform+osversion+device+resolution

• app+event1+event2

• app+event1

• app+event2

• device+event1+event2

• resolution+event1+event2

• platform+osversion+device



As columns – names are composites

• app+platform+osversion+device+resolution#Yahoo!+iOS+4.3.4+iPad2,1+768x1024

• app+event1+event2#Yahoo!+videoPlayPercent=30+Taste=great

• app+event1#Yahoo!+Taste=great

• app+event2#Yahoo!+videoPlayPercent=30

• device+event1+event2#iPad2,1+videoPlayPercent=30+Taste=great

• resolution+event1+event2#768x1024+videoPlayPercent=30+Taste=great

• platform+osversion+device#iOS+4.3.4+iPad2,1
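
Each name above is the group-by dimensions joined with "+", then "#", then the matching values joined with "+". A minimal sketch of that convention; the helper name is hypothetical, and the deck shows only the resulting strings:

```java
import java.util.List;

// Hypothetical helper; it only reproduces the naming convention on this slide.
final class CompositeColumnNames {
    // Builds "dim1+dim2#value1+value2" from parallel lists of dimensions and values.
    static String columnName(List<String> dimensions, List<String> values) {
        if (dimensions.isEmpty() || dimensions.size() != values.size()) {
            throw new IllegalArgumentException("need exactly one value per dimension");
        }
        return String.join("+", dimensions) + "#" + String.join("+", values);
    }
}
```

For example, columnName(List.of("platform", "osversion", "device"), List.of("iOS", "4.3.4", "iPad2,1")) gives the last name in the list above.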




The rows

• ByHour=2011091011
   – app+platform+osversion+device+resolution#Yahoo!+iOS+4.3.4+iPad2,1+768x1024
   – app+event1+event2#Yahoo!+videoPlayPercent=30+Taste=great
   – app+event1#Yahoo!+Taste=great
   – app+event2#Yahoo!+videoPlayPercent=30
   – device+event1+event2#iPad2,1+videoPlayPercent=30+Taste=great
   – resolution+event1+event2#768x1024+videoPlayPercent=30+Taste=great
   – platform+osversion+device#iOS+4.3.4+iPad2,1

• ByDay=20110910
   – app+platform+osversion+device+resolution#Yahoo!+iOS+4.3.4+iPad2,1+768x1024
   – app+event1+event2#Yahoo!+videoPlayPercent=30+Taste=great
   – app+event1#Yahoo!+Taste=great
   – app+event2#Yahoo!+videoPlayPercent=30
   – device+event1+event2#iPad2,1+videoPlayPercent=30+Taste=great
   – resolution+event1+event2#768x1024+videoPlayPercent=30+Taste=great
   – platform+osversion+device#iOS+4.3.4+iPad2,1
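
One event therefore increments the same set of columns in one row per time period. A minimal in-memory sketch of that fan-out, using nested maps as a stand-in for counter column families; this is not a Cassandra client and the class is hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for counter column families; an assumption for illustration only.
final class CounterFanOut {
    // column family -> row key -> column name -> count
    private final Map<String, Map<String, Map<String, Long>>> store = new TreeMap<>();

    // Adds 1 to every (column family, row key, column name) combination for one event.
    void record(Map<String, String> rowKeyByColumnFamily, List<String> columnNames) {
        rowKeyByColumnFamily.forEach((columnFamily, rowKey) -> {
            Map<String, Long> row = store
                    .computeIfAbsent(columnFamily, k -> new TreeMap<>())
                    .computeIfAbsent(rowKey, k -> new TreeMap<>());
            for (String name : columnNames) {
                row.merge(name, 1L, Long::sum);
            }
        });
    }

    // Returns the current count, or 0 if the counter was never incremented.
    long count(String columnFamily, String rowKey, String columnName) {
        return store.getOrDefault(columnFamily, Map.of())
                .getOrDefault(rowKey, Map.of())
                .getOrDefault(columnName, 0L);
    }
}
```

Two events in different hours of the same day land in two ByHour rows but accumulate in the same ByDay row.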




Inserting data with Hector
• mutator.insertCounter("20110910", "ByDay",
  HFactory.createCounterColumn("app+platform+osversion+device+resolution#Yahoo!+iOS+4.3.4+iPad2,1+768x1024", 1));

• mutator.insertCounter("20110910", "ByDay",
  HFactory.createCounterColumn("app+event1+event2#Yahoo!+videoPlayPercent=30+Taste=great", 1));

• mutator.insertCounter("20110910", "ByDay",
  HFactory.createCounterColumn("app+event1#Yahoo!+Taste=great", 1));

• mutator.insertCounter("20110910", "ByDay",
  HFactory.createCounterColumn("app+event2#Yahoo!+videoPlayPercent=30", 1));

• mutator.insertCounter("20110910", "ByDay",
  HFactory.createCounterColumn("device+event1+event2#iPad2,1+videoPlayPercent=30+Taste=great", 1));

• mutator.insertCounter("20110910", "ByDay",
  HFactory.createCounterColumn("resolution+event1+event2#768x1024+videoPlayPercent=30+Taste=great", 1));

• mutator.insertCounter("20110910", "ByDay",
  HFactory.createCounterColumn("platform+osversion+device#iOS+4.3.4+iPad2,1", 1));


Inserting data with Skeletor
           Skeletor is the Scala wrapper of Hector for Cassandra
                     https://github.com/joestein/skeletor
aggregateColumnNames("AppPlatformOSVersionDeviceResolution") =
   "app+platform+osversion+device+resolution#"

def ccAppPlatformOSVersionDeviceResolution(c: (String) => Unit) = {
  c(aggregateColumnNames("AppPlatformOSVersionDeviceResolution") + app + p(platform) + p(osversion) +
    p(device) + p(resolution))
}

//rows we are going to write to
aggregateKeys(KEYSPACE \ "ByMonth") = month //201109
aggregateKeys(KEYSPACE \ "ByDay") = day //20110910
aggregateKeys(KEYSPACE \ "ByHour") = hour //2011091012
aggregateKeys(KEYSPACE \ "ByMinute") = minute //201109101213


def r(columnName: String): Unit = {
  aggregateKeys.foreach { tuple: (ColumnFamily, String) => {
    val (columnFamily, row) = tuple
    if (row != null && row.size > 0)
      rows add (columnFamily -> row has columnName inc) //increment the counter
    }
  }
}

ccAppPlatformOSVersionDeviceResolution(r)
Retrieving Data
                    MultigetSliceCounterQuery

•   setColumnFamily("ByDay")
•   setKeys("20110910")
•   setRange("app+event1#","app+event1#~",false,1000)
•   We will get all the apps and counts for event1

•   setRange("app+event2#","app+event2#~",false,1000)
•   We will get all the apps and the counts for event2

By app tastes great vs less filling

• Sample code for the aggregate metrics and retrieving them
  https://github.com/joestein/apophis

• What is with the tilde?
Sort for success
Not magic, just Cassandra
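
The tilde works because "~" (0x7E) sorts after the other printable ASCII characters, so a slice from a prefix to the prefix plus "~" covers every column name that starts with that prefix. A minimal sketch using a sorted map as a stand-in for one counter row, assuming plain ASCII column names so that Java's String ordering agrees with byte ordering; the class name and the "ExampleApp" value are made up for illustration:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for one row of sorted counter columns; not a Cassandra client.
final class PrefixSlice {
    // Returns the columns from prefix through prefix + "~", both ends inclusive.
    static NavigableMap<String, Long> slice(NavigableMap<String, Long> row, String prefix) {
        return row.subMap(prefix, true, prefix + "~", true);
    }

    // Builds a small sample row using column names in the style of the earlier slides.
    static NavigableMap<String, Long> sampleRow() {
        NavigableMap<String, Long> row = new TreeMap<>();
        row.put("app+event1#Yahoo!+Taste=great", 5L);
        row.put("app+event1#ExampleApp+Taste=great", 2L); // made-up app name
        row.put("app+event2#Yahoo!+videoPlayPercent=30", 7L);
        row.put("app+event1+event2#Yahoo!+videoPlayPercent=30+Taste=great", 3L);
        return row;
    }
}
```

PrefixSlice.slice(PrefixSlice.sampleRow(), "app+event1#") keeps only the two app+event1 columns. The app+event1+event2 and app+event2 columns sort after "app+event1#~" and fall outside the range.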




A few more things about retrieving data

• You need to start backwards from here: decide how you will read the
  data, then design how you write it.

• If you want to do things ad hoc, then map/reduce is better

• Sometimes more rows are better, allowing more nodes to do work
  – If you need to look at 100,000 metrics it is better to pull this out
    of 100 rows than out of 1
  – Don’t be afraid to make CFs and composite keys out of Time +
    Aggregate data
      • 20111023+app=Yahoo!
      • This could be the row that holds ALL of the app information
        for that day, if you want to look at 100 apps at once with 1000
        metrics for each per time period, this could be the way to go
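
A row key such as 20111023+app=Yahoo! can be built by prefixing the aggregate with the time bucket. A minimal sketch; the helper and its parameter names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for Time+Aggregate row keys such as "20111023+app=Yahoo!".
final class CompositeRowKeys {
    // Joins the time bucket and one aggregate as "bucket+dimension=value".
    static String rowKey(String timeBucket, String dimension, String value) {
        return timeBucket + "+" + dimension + "=" + value;
    }

    // One row key per value, so each app (for example) gets its own row for the period.
    static List<String> rowKeys(String timeBucket, String dimension, List<String> values) {
        List<String> keys = new ArrayList<>();
        for (String value : values) {
            keys.add(rowKey(timeBucket, dimension, value));
        }
        return keys;
    }
}
```

With 100 apps, rowKeys returns 100 keys for the day, so a multiget across them reads from 100 rows instead of one.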




Q&A
/*
* Joe Stein
* http://www.linkedin.com/in/charmalloc
* @allthingshadoop
* @cassandranosql
* @allthingsscala
* @charmalloc
* http://github.com/joestein
*/


Medialets
The rich media
ad platform for mobile.
                       connect@medialets.com
                       www.medialets.com/showcase





 

jstein.cassandra.nyc.2011

  • 1. Cassandra as the central nervous system of your distributed systems
      /*
      Joe Stein
      http://www.linkedin.com/in/charmalloc
      @allthingshadoop
      @cassandranosql
      @allthingsscala
      @charmalloc
      */
      http://www.medialets.com
  • 2. Overview
      • Architecture
      • Aggregate Metrics/Time Series
      • Implementation Over Cassandra
  • 4. Medialets
      • Largest deployment of rich media ads for mobile devices
      • Over 300,000,000 devices supported
      • 3-4 TB of new data every day
      • Thousands of services in production
      • Hundreds of thousands of events received every second
      • Response times are measured in microseconds
      • Languages
          – 35% JVM (20% Scala & 10% Java)
          – 30% Ruby
          – 20% C/C++
          – 13% Python
          – 2% Bash
  • 5. The million foot view
      [Architecture diagram. Boxes: AdServing, Collection, Kafka, mysql, Hadoop, Cassandra, mysql, Muse, mysql]
  • 7. Let's look at just one data point captured
      • 09/10/2011 11:12:13
      • App = Yahoo!
      • Platform = iOS
      • OS = 4.3.4
      • Device = iPad2,1
      • Resolution = 768x1024
      • Events
          – videoPlayPercent = 38
          – Taste = great
  • 8. The time series part of it
      • 09/10/2011 11:12:13

      Quarter   Q3
      Month     201109
      Week      201136
      Day       20110910
      Hour      2011091011
      Minute    201109101112
      Second    20110910111213
  • 9. Metrics For Different Wants
      Yahoo! + iOS + 4.3.4 + iPad2,1 + 768x1024
      Yahoo! + videoPlayPercent = 30 + Taste = great
      Yahoo! + Taste = great
      Yahoo! + videoPlayPercent = 30
      iPad2,1 + videoPlayPercent = 30 + Taste = great
      768x1024 + videoPlayPercent = 30 + Taste = great
      iOS + 4.3.4 + iPad2,1
  • 11. Storing the time series
      Column Families hold your rows of data. Each row in each column family will be equal to the time period you are dealing with. So an "event" occurring at 09/10/2011 12:13:14 will become 4 rows:

      BySecond = 20110910121314
      ByMinute = 201109101213
      ByHour   = 2011091012
      ByDay    = 20110910

          CREATE COLUMN FAMILY ByDay
          WITH default_validation_class=CounterColumnType
          AND key_validation_class=UTF8Type AND comparator=UTF8Type;

          CREATE COLUMN FAMILY ByHour
          WITH default_validation_class=CounterColumnType
          AND key_validation_class=UTF8Type AND comparator=UTF8Type;

          CREATE COLUMN FAMILY ByMinute
          WITH default_validation_class=CounterColumnType
          AND key_validation_class=UTF8Type AND comparator=UTF8Type;

          CREATE COLUMN FAMILY BySecond
          WITH default_validation_class=CounterColumnType
          AND key_validation_class=UTF8Type AND comparator=UTF8Type;
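The mapping from one event timestamp to one row key per column family can be sketched as follows. This is a minimal illustration using only the Java standard library; the class name `TimeBuckets` and the method `rowKeys` are hypothetical and do not come from the talk or from Hector.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the talk's code): derives the row key
// for each time-bucketed column family from a single event timestamp.
final class TimeBuckets {
    // Column family name -> pattern that truncates a timestamp to that period.
    private static final Map<String, DateTimeFormatter> FORMATS = new LinkedHashMap<>();
    static {
        FORMATS.put("ByDay", DateTimeFormatter.ofPattern("yyyyMMdd"));
        FORMATS.put("ByHour", DateTimeFormatter.ofPattern("yyyyMMddHH"));
        FORMATS.put("ByMinute", DateTimeFormatter.ofPattern("yyyyMMddHHmm"));
        FORMATS.put("BySecond", DateTimeFormatter.ofPattern("yyyyMMddHHmmss"));
    }

    // Returns column family name -> row key, in the order declared above.
    static Map<String, String> rowKeys(LocalDateTime eventTime) {
        Map<String, String> keys = new LinkedHashMap<>();
        for (Map.Entry<String, DateTimeFormatter> e : FORMATS.entrySet()) {
            keys.put(e.getKey(), eventTime.format(e.getValue()));
        }
        return keys;
    }

    public static void main(String[] args) {
        // The event from the slide: 09/10/2011 12:13:14
        System.out.println(rowKeys(LocalDateTime.of(2011, 9, 10, 12, 13, 14)));
    }
}
```

Because every key is a prefix of the next finer one, a coarser bucket can always be derived from a finer key by truncation.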
  • 12. Why multiple column families?
      http://www.datastax.com/docs/1.0/configuration/storage_configuration
  • 13. Generically group by
      • app+platform+osversion+device+resolution
      • app+event1+event2
      • app+event1
      • app+event2
      • device+event1+event2
      • resolution+event1+event2
      • platform+osversion+device
  • 14. As columns – names are composites
      • app+platform+osversion+device+resolution#Yahoo!+iOS+4.3.4+iPad2,1+768x1024
      • app+event1+event2#Yahoo!+videoPlayPercent=30+Taste=great
      • app+event1#Yahoo!+Taste=great
      • app+event2#Yahoo!+videoPlayPercent=30
      • device+event1+event2#iPad2,1+videoPlayPercent=30+Taste=great
      • resolution+event1+event2#768x1024+videoPlayPercent=30+Taste=great
      • platform+osversion+device#iOS+4.3.4+iPad2,1
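One way to build these composite names is to join the group-by dimension names with `+`, append `#`, then join the matching values with `+`. The sketch below assumes an event is a flat map from dimension name to value; `CompositeNames` and `columnName` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not from the talk): builds "<dims>#<values>" column names.
final class CompositeNames {
    // groupBy lists dimension names in order; event maps dimension name -> value.
    static String columnName(List<String> groupBy, Map<String, String> event) {
        List<String> values = new ArrayList<>();
        for (String dim : groupBy) {
            String value = event.get(dim);
            if (value == null) {
                throw new IllegalArgumentException("event has no value for " + dim);
            }
            values.add(value);
        }
        return String.join("+", groupBy) + "#" + String.join("+", values);
    }

    public static void main(String[] args) {
        Map<String, String> event = new LinkedHashMap<>();
        event.put("app", "Yahoo!");
        event.put("platform", "iOS");
        event.put("osversion", "4.3.4");
        event.put("device", "iPad2,1");
        event.put("resolution", "768x1024");
        System.out.println(columnName(Arrays.asList("platform", "osversion", "device"), event));
    }
}
```

Putting the dimension names before the `#` keeps all columns of one group-by adjacent when the comparator sorts names as UTF-8 strings.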
  • 15. The rows
      • ByHour=2011091011
          – app+platform+osversion+device+resolution#Yahoo!+iOS+4.3.4+iPad2,1+768x1024
          – app+event1+event2#Yahoo!+videoPlayPercent=30+Taste=great
          – app+event1#Yahoo!+Taste=great
          – app+event2#Yahoo!+videoPlayPercent=30
          – device+event1+event2#iPad2,1+videoPlayPercent=30+Taste=great
          – resolution+event1+event2#768x1024+videoPlayPercent=30+Taste=great
          – platform+osversion+device#iOS+4.3.4+iPad2,1
      • ByDay=20110910
          – app+platform+osversion+device+resolution#Yahoo!+iOS+4.3.4+iPad2,1+768x1024
          – app+event1+event2#Yahoo!+videoPlayPercent=30+Taste=great
          – app+event1#Yahoo!+Taste=great
          – app+event2#Yahoo!+videoPlayPercent=30
          – device+event1+event2#iPad2,1+videoPlayPercent=30+Taste=great
          – resolution+event1+event2#768x1024+videoPlayPercent=30+Taste=great
          – platform+osversion+device#iOS+4.3.4+iPad2,1
  • 16. Inserting data with Hector
      • mutator.insertCounter("20110910", "ByDay", HFactory.createCounterColumn("app+platform+osversion+device+resolution#Yahoo!+iOS+4.3.4+iPad2,1+768x1024", 1));
      • mutator.insertCounter("20110910", "ByDay", HFactory.createCounterColumn("app+event1+event2#Yahoo!+videoPlayPercent=30+Taste=great", 1));
      • mutator.insertCounter("20110910", "ByDay", HFactory.createCounterColumn("app+event1#Yahoo!+Taste=great", 1));
      • mutator.insertCounter("20110910", "ByDay", HFactory.createCounterColumn("app+event2#Yahoo!+videoPlayPercent=30", 1));
      • mutator.insertCounter("20110910", "ByDay", HFactory.createCounterColumn("device+event1+event2#iPad2,1+videoPlayPercent=30+Taste=great", 1));
      • mutator.insertCounter("20110910", "ByDay", HFactory.createCounterColumn("resolution+event1+event2#768x1024+videoPlayPercent=30+Taste=great", 1));
      • mutator.insertCounter("20110910", "ByDay", HFactory.createCounterColumn("platform+osversion+device#iOS+4.3.4+iPad2,1", 1));
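To see what these counter increments add up to without a running cluster, the sketch below uses an in-memory stand-in for counter columns. It is not Hector and not Cassandra: `CounterStore`, `increment`, and `get` are hypothetical names, and the nested maps only imitate the column family, row key, column name layout.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for counter column families (illustration only).
final class CounterStore {
    // columnFamily -> rowKey -> (column name -> count); columns stay sorted by name.
    private final Map<String, Map<String, TreeMap<String, Long>>> data = new HashMap<>();

    // Adds delta to the named counter, creating the row and column on first use.
    void increment(String columnFamily, String rowKey, String column, long delta) {
        data.computeIfAbsent(columnFamily, cf -> new HashMap<>())
            .computeIfAbsent(rowKey, k -> new TreeMap<>())
            .merge(column, delta, Long::sum);
    }

    // Returns the current count, or 0 if the counter was never incremented.
    long get(String columnFamily, String rowKey, String column) {
        Map<String, TreeMap<String, Long>> rows = data.get(columnFamily);
        if (rows == null) return 0L;
        TreeMap<String, Long> row = rows.get(rowKey);
        if (row == null) return 0L;
        return row.getOrDefault(column, 0L);
    }

    public static void main(String[] args) {
        CounterStore store = new CounterStore();
        // Two events on the same day bump the same column twice.
        store.increment("ByDay", "20110910", "app+event1#Yahoo!+Taste=great", 1);
        store.increment("ByDay", "20110910", "app+event1#Yahoo!+Taste=great", 1);
        System.out.println(store.get("ByDay", "20110910", "app+event1#Yahoo!+Taste=great"));
    }
}
```

Each incoming event touches one column per group-by in each time-bucket row, so the write fan-out is the number of group-bys multiplied by the number of column families.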
  • 17. Inserting data with Skeletor
      Skeletor is the Scala wrapper of Hector for Cassandra: https://github.com/joestein/skeletor

          aggregateColumnNames("AppPlatformOSVersionDeviceResolution") = "app+platform+osversion+device+resolution#"

          def ccAppPlatformOSVersionDeviceResolution(c: (String) => Unit) = {
            c(aggregateColumnNames("AppPlatformOSVersionDeviceResolution") + app + p(platform) + p(osversion) + p(device) + p(resolution))
          }

          // rows we are going to write to
          aggregateKeys(KEYSPACE \ "ByMonth") = month    // 201109
          aggregateKeys(KEYSPACE \ "ByDay") = day        // 20110910
          aggregateKeys(KEYSPACE \ "ByHour") = hour      // 2011091012
          aggregateKeys(KEYSPACE \ "ByMinute") = minute  // 201109101213

          def r(columnName: String): Unit = {
            aggregateKeys.foreach { tuple: (ColumnFamily, String) => {
                val (columnFamily, row) = tuple
                if (row != null && row.size > 0)
                  rows add (columnFamily -> row has columnName inc) // increment the counter
              }
            }
          }

          ccAppPlatformOSVersionDeviceResolution(r)
  • 18. Retrieving Data
      MultigetSliceCounterQuery
      • setColumnFamily("ByDay")
      • setKeys("20110910")
      • setRange("app+event1=", "app+event1=~", false, 1000)
          • We will get all the apps and counts for event1
      • setRange("app+event2=", "app+event2=~", false, 1000)
          • We will get all the apps and the counts for event2
      By app tastes great vs less filling
      • Sample code for the aggregate metrics and retrieving them: https://github.com/joestein/apophis
      • What is with the tilde?
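The tilde works because `~` (0x7E) sorts after the other printable ASCII characters, so a slice from a prefix to that prefix followed by `~` covers every column name that starts with the prefix. The sketch below imitates a sorted row with a `TreeMap`. It assumes ASCII column names and the `#` separator from the composite column names shown earlier in the deck; `TildeSlice`, `slice`, and the app `ExampleApp` are hypothetical.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration of a prefix range slice over sorted column names.
final class TildeSlice {
    // Returns the columns whose names fall between prefix and prefix + "~", inclusive.
    // Assumption: the character following the prefix is ASCII and sorts below '~'.
    static SortedMap<String, Long> slice(TreeMap<String, Long> row, String prefix) {
        return row.subMap(prefix, true, prefix + "~", true);
    }

    public static void main(String[] args) {
        TreeMap<String, Long> row = new TreeMap<>();
        row.put("app+event1#Yahoo!+Taste=great", 5L);
        row.put("app+event1#ExampleApp+Taste=great", 2L);
        row.put("app+event1+event2#Yahoo!+videoPlayPercent=30+Taste=great", 4L);
        row.put("app+event2#Yahoo!+videoPlayPercent=30", 3L);
        // Only the two "app+event1#..." columns are returned.
        System.out.println(slice(row, "app+event1#"));
    }
}
```

The `app+event1+event2#...` column is excluded because `+` sorts after `#`, which places it beyond the upper bound. A name whose first character after the prefix sorts above `~` (non-ASCII text, for example) would be missed by this bound.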
  • 19. Sort for success
      Not magic, just Cassandra
  • 20. A few more things about retrieving data
      • You need to start backwards from here.
      • If you want to do things ad hoc, then map/reduce is better
      • Sometimes more rows are better, allowing more nodes to do work
          – If you need to look at 100,000 metrics, it is better to pull them out of 100 rows than out of 1
          – Don't be afraid to make CFs and composite keys out of Time + Aggregate data
              • 20111023+app=Yahoo!
              • This could be the row that holds ALL of the app information for that day. If you want to look at 100 apps at once with 1000 metrics for each per time period, this could be the way to go
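A composite row key of the form time bucket plus aggregate can be sketched as below. The helper names `CompositeRowKeys`, `rowKey`, and `rowKeys` are hypothetical, as is the second app name in the usage example. The resulting list is the kind of key set one would hand to a multiget so that each app's metrics come from its own row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the talk): builds "<time>+<dimension>=<value>" row keys.
final class CompositeRowKeys {
    static String rowKey(String timeBucket, String dimension, String value) {
        return timeBucket + "+" + dimension + "=" + value;
    }

    // One row key per value, e.g. one row per app for a given day.
    static List<String> rowKeys(String timeBucket, String dimension, List<String> values) {
        List<String> keys = new ArrayList<>();
        for (String value : values) {
            keys.add(rowKey(timeBucket, dimension, value));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(rowKeys("20111023", "app", Arrays.asList("Yahoo!", "ExampleApp")));
    }
}
```

Spreading the metrics over one row per app lets reads for many apps be served by different nodes instead of a single wide row.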