To be relational, or not?
      That's not the question!
@sbtourist aka Sergio Bossa
@alexsnaps aka Alex Snaps

                                   sergio bossa & alex snaps - @sbtourist & @alexsnaps
Agenda

Act 1 – Relational databases:
   a difficult love affair.
Interlude 1 – Problems in the modern era:
    big data and the CAP theorem.
Act 2 – Non-relational databases:
   love is in the air.
Interlude 2 – Relational or non-relational?
    Not the correct question.
Act 3 – Building scalable apps.
The End
About us

Sergio Bossa
   Software Engineer at Bwin Italy.
   Long time open source contributor.
   (Micro)Blogger - @sbtourist.
Alex Snaps
   Software engineer at Terracotta …
   … after 10 years of industry experience.
   www.codespot.net - @alexsnaps
Act I
       Relational databases
A difficult love affair …



The relational model

Defines constraints
    Finite model
      aka relation variable, relation or table
    Candidate keys
    Foreign keys
Queries are relations themselves
    Heading & body
SQL differs slightly from the "pure" relational model
The ACID guarantees

Let people easily reason about the problem.
    Atomic.
         We see all changes, or no changes at all.
    Consistent.
         Changes respect our rules and constraints.
    Isolated.
         Concurrent changes don't interfere: each behaves as if it ran alone.
    Durable.
         Once committed, we keep the effect of our changes, even across crashes.
Fits a simplified model of our reality.
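The atomicity and consistency guarantees above can be seen in a few lines of Python using the standard `sqlite3` module. This is a minimal sketch: the `account` table, the account names and the amounts are hypothetical, and an in-memory SQLite database stands in for a real RDBMS.

```python
import sqlite3

# Hypothetical schema: the CHECK constraint plays the role of "our rules and constraints".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst: both updates are applied, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```

A transfer of 30 from alice to bob succeeds; a following transfer of 500 fails on the constraint and leaves the balances at 70 and 30, because the credit already executed inside the transaction is rolled back together with the failed debit.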

The SQL ubiquity

SQL is everywhere
    To be honest, I'm still trying to figure out why my blog
      uses a relational database
SQL is known by everyone
    Raise your hand if you've never written a SQL query
... and if you don't want to write one
    ORMs are there for you
    ActiveRecord if you're into Rails
Simple persistence for all our objects!
Interlude 1
Problems of the Modern Era



Big Data

What's Big Data for you?
Not a question of quantity.
     Gigabytes?
     Terabytes?
     Petabytes?
It's all about supporting your business growth.
     Growth in terms of schema evolution.
     Growth in terms of data storage.
     Growth in terms of data processing.
Is your data stack capable of handling such a growth?

CAP Theorem

Year 2000: Formulated by Eric Brewer as a conjecture.
Year 2002: Formally proven by Gilbert and Lynch.
Nowadays: A religion for many distributed systems guys.
CAP Theorem in a few words:
    Consistency.
    Availability.
    Partition-Tolerance.
    Pick (at most) two.
More later.

Act II
Non-relational databases
Love is in the air

Non-relational databases


   From the origins to the current explosion...
              How do they differ?




Data Model (1)


Column-family.
    Key-identified rows
      with a sparse set of columns.
    Columns grouped in families.
    Multiple families for the same key.
    Dynamically add/remove columns.
    Efficiently access same-family columns
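To make the column-family model concrete, here is a toy in-memory sketch in Python. It is not the API of any particular product: the class and method names are hypothetical, families are assumed to be declared up front while columns are added and removed freely, and rows only store the columns they actually have.

```python
class ColumnFamilyStore:
    """Toy column-family model: row key -> family -> column -> value."""

    def __init__(self, families):
        self.families = frozenset(families)  # families are declared up front
        self.rows = {}                       # sparse: only existing columns are stored

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        row = self.rows.setdefault(row_key, {})
        row.setdefault(family, {})[column] = value  # new columns appear on the fly

    def delete(self, row_key, family, column):
        self.rows.get(row_key, {}).get(family, {}).pop(column, None)

    def get_family(self, row_key, family):
        # Reading one family only touches that family's columns.
        return dict(self.rows.get(row_key, {}).get(family, {}))
```

Two rows in the same store can carry completely different columns: one user row may have `name` and `email` in a `profile` family while another has only `name`.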




Data Model (2)


Graph.
    Vertices represent your data.
    Edges represent meaningful relations between nodes.
    Key/Value properties attached to both.
    Indexed properties.
    Efficient traversal.
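A toy property graph in Python illustrating vertices, labeled edges, key/value properties, a property index and a breadth-first traversal. The class and its methods are hypothetical; real graph databases persist this structure and offer much richer query languages.

```python
from collections import deque


class PropertyGraph:
    def __init__(self):
        self.vertices = {}  # vertex id -> properties
        self.edges = {}     # vertex id -> list of (label, target id, properties)
        self.index = {}     # (property, value) -> set of vertex ids

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        self.edges.setdefault(vid, [])
        for key, value in props.items():
            self.index.setdefault((key, value), set()).add(vid)

    def add_edge(self, src, label, dst, **props):
        self.edges[src].append((label, dst, props))

    def lookup(self, key, value):
        # Indexed property lookup instead of a full scan.
        return set(self.index.get((key, value), set()))

    def traverse(self, start, label, max_depth):
        """Breadth-first traversal following only edges with the given label."""
        seen = {start}
        frontier = deque([(start, 0)])
        reached = []
        while frontier:
            vid, depth = frontier.popleft()
            if depth == max_depth:
                continue
            for edge_label, dst, _ in self.edges.get(vid, []):
                if edge_label == label and dst not in seen:
                    seen.add(dst)
                    reached.append(dst)
                    frontier.append((dst, depth + 1))
        return reached
```

Following edges from a vertex costs time proportional to the edges actually visited, which is the contrast with repeated foreign-key joins drawn later in the talk.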




Data Model (3)


Document.
   Schemaless documents.
       With denormalized data.
   Stored as a whole unit.
   Clients can update/query contents.
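A minimal sketch of a document bucket in Python, assuming JSON documents; the class and method names are hypothetical. Each document is serialized and stored as one unit, so an update reads the whole document, changes it and writes it back, and related data (such as the customer name inside an order) is kept denormalized.

```python
import json


class DocumentBucket:
    """Toy document bucket: each key maps to one whole JSON document."""

    def __init__(self):
        self._docs = {}  # key -> serialized JSON text

    def put(self, key, document):
        self._docs[key] = json.dumps(document)

    def get(self, key):
        return json.loads(self._docs[key])

    def update(self, key, fn):
        # Read the whole document, apply the change, write the whole document back.
        doc = fn(self.get(key))
        self.put(key, doc)
        return doc

    def query(self, predicate):
        result = {}
        for key, text in self._docs.items():
            doc = json.loads(text)
            if predicate(doc):
                result[key] = doc
        return result
```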




Data Model (4)


Key/Value.
    Opaque values.
    Maximize flexibility.
    Efficiently store and retrieve by key.




Consistency Model (1)


Strict (Sequential) Consistency.
    Every read/write operation acts on either:
        The last value read.
        The last value written.




Consistency Model (2)


Eventual Consistency.
    All replicas will eventually reach a consistent state,
      once updates stop arriving.
        Stale data may be served.
        Versions may diverge.
        Repair may be needed.
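Vector clocks are one common way to detect diverging versions and decide when repair is needed (Dynamo-style stores use them); they are not necessarily what any product mentioned here uses. The sketch below, with hypothetical function names, compares two versions and merges concurrent ones with an application-supplied `resolve` function.

```python
def compare(a, b):
    """Compare vector clocks (dict node -> counter).
    Returns 'equal', 'before' (a happened before b), 'after' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def increment(clock, node):
    """Return a new clock recording one more write accepted by `node`."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge_clocks(a, b):
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def repair(v1, v2, resolve):
    """Keep the newer of two (clock, value) versions, or merge concurrent ones."""
    (c1, x1), (c2, x2) = v1, v2
    order = compare(c1, c2)
    if order in ("equal", "after"):
        return v1
    if order == "before":
        return v2
    # Diverged versions: neither wins, so the application decides how to merge.
    return merge_clocks(c1, c2), resolve(x1, x2)
```

For example, two replicas that both update the same shopping cart while partitioned produce concurrent versions, and a set-union `resolve` keeps the items added on either side.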




Partitioning (1)


Client-side partitioning.
    Every server is self-contained.
    Clients partition data per-request.
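A minimal sketch of client-side partitioning in Python: the servers know nothing about each other and the client picks one by hashing the key modulo the number of servers. Server names and the class name are hypothetical, and MD5 is used only because Python's built-in `hash()` is randomized per process.

```python
import hashlib


def stable_hash(key: str) -> int:
    # Deterministic across processes, unlike the built-in hash().
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class PartitioningClient:
    """The client decides, per request, which self-contained server owns a key."""

    def __init__(self, servers):
        self.servers = list(servers)

    def server_for(self, key):
        return self.servers[stable_hash(key) % len(self.servers)]
```

The weakness of plain modulo hashing shows up when the server list changes: going from 3 to 4 servers moves roughly three quarters of the keys, which is what server-side consistent hashing tries to avoid.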




Partitioning (2)


Server-side partitioning.
    Servers automatically partition data.
    Ring-based consistent hashing
      is the most widely used technique.
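A sketch of ring-based consistent hashing in Python, using virtual nodes to spread each server around the ring. The class name and the default of 64 virtual nodes per server are assumptions for illustration, not the settings of any specific product.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes  # virtual nodes per server smooth out the distribution
        self._points = []     # sorted hash positions on the ring
        self._owners = {}     # hash position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = node

    def remove(self, node):
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owners[point]

    def node_for(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owners[self._points[idx]]
```

When a node is added, only the keys falling into its new ring segments move, and they all move to that node; removing it again restores the previous mapping.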




Replication (1)


Master/Slave.
   Master propagates changes to slaves.
       Log replication.
       Statement replication.
   Slaves may take over master
     in case of failures.
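A single-process sketch of master/slave log replication in Python: the master appends every change to a log and each slave replays the entries it has not applied yet. Class names are hypothetical, and the log here records resulting values rather than SQL statements. Because slaves pull asynchronously, replication lag (and therefore stale reads on slaves) becomes visible.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.log = []  # append-only change log of (key, value) entries

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Slave:
    def __init__(self):
        self.data = {}
        self.applied = 0  # how many log entries have been replayed so far

    def pull(self, master):
        # Replay, in order, only the entries this slave has not seen yet.
        new_entries = master.log[self.applied:]
        for key, value in new_entries:
            self.data[key] = value
        self.applied += len(new_entries)

    def read(self, key):
        return self.data.get(key)
```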



Replication (2)


N-Replication.
    Nodes are all peers.
        No master, no slaves.
    Each node replicates its data
     to a subset (N) of nodes.
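One common way to pick the N replica nodes, used by Dynamo-style stores, is to take the key's owner on a hash ring plus the next N-1 distinct nodes clockwise. This sketch assumes one ring position per node for brevity; the function name is hypothetical.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def replicas_for(key, nodes, n):
    """Return the nodes holding `key`: its owner plus the next n-1 peers clockwise."""
    ring = sorted((ring_hash(node), node) for node in nodes)
    points = [point for point, _ in ring]
    start = bisect.bisect_right(points, ring_hash(key)) % len(ring)
    count = min(n, len(ring))  # cannot replicate to more nodes than exist
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

Every node is a peer here: there is no master, and each key simply has its own set of N responsible nodes.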




Processing


A distributed system is built by:
            Moving data toward its behavior.
                                               ... or ...
            Moving behavior toward its data.
An efficient distributed system is built by:
            Moving behavior toward its data.
Map/Reduce is the most common and efficient approach.
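A word-count sketch of Map/Reduce in plain Python. The partitions are hypothetical stand-ins for data living on different nodes: the map function runs next to each partition (behavior moved toward data) and ships back only small partial counts, which the reduce function merges.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions: each list stands for the documents stored on one node.
partitions = [
    ["to be relational", "or not"],
    ["that is not the question"],
    ["to be or not to be"],
]


def map_partition(docs):
    """Map phase: runs where the data lives and emits partial word counts."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    return counts


def reduce_counts(left, right):
    """Reduce phase: merges partial results coming from different nodes."""
    return left + right


partials = [map_partition(p) for p in partitions]  # one map task per node
totals = reduce(reduce_counts, partials, Counter())
```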




CAP - Problem

CAP Theorem.
    Consistency.
    Availability.
    Partition tolerance.
    Pick two.
    Do you remember?
Makes sense only
  when dealing with partitions/failures ...
CAP - Trade-offs

Consistency + Availability.
      Requests will wait until partitions heal.
             Full consistency.
             Full availability.
             Max latency.
Consistency + Partition tolerance.
      Some requests will act on partial data.
      Some requests will be refused.
             Full consistency.
             Reduced availability.
             Min latency.
Availability + Partition tolerance.
      All requests will be fulfilled.
             Sacrifice consistency.
             Max availability.
             Min latency.

Interlude II
Relational or non-relational? 
Not the correct question!

Relational or non-relational? Not the correct question!

Freedom to build the right solution for your problems.
Freedom to scale your solution as your problems scale.
Know your use case.
      Understand the problem domain, then choose technology.
Know your data.
      Understand your data and data access patterns, then choose technology.
Know your tools.
      Understand the available tools, don't choose blindly.
Pick the simplest solution.
      Choose the simplest technology that works for your problem.
Build on that.

Act III
Building scalable apps



Building scalable apps
Relational databases



Scaling out...


Adding app servers will work...
  ... for a little while!
But at some point we need to scale the database as well

[Diagram: three application servers sending READ and WRITE traffic to a single RDBMS]

Master-Slave

One master
    gets all the writes
    replicates to slaves
Slaves (& master)
    get the read operations

We didn't really scale writes
Static topology
SPOF remains
Master-Master


Multiple masters
      All participants accept writes & reads
Writes are replicated to all masters
Synchronously
      Expensive 2PC
Asynchronously
      Conflicts have to be resolved

Complex
Static topology
Limited performance again
Solved SPOF, sort of…


Vertical Partitioning


Data is split across multiple database servers based on
     the functional area

Joins are moved to the application
     Not relational anymore

SPOF is back
What about one functional area growing "out of hand"?


Horizontal partitioning


Data is split across multiple database servers based on
     Key sharding

Joins are moved to the application
     Not relational anymore

SPOF is back
What about one shard growing "out of hand"?
Routing required
     Where's Order#123?
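A sketch of key sharding with routing and an application-side join, using two in-memory SQLite databases as hypothetical shards. Modulo routing and the table layout are assumptions; real deployments often use a lookup directory or a hash ring instead.

```python
import sqlite3

# Hypothetical shards: two independent databases with the same schema.
shards = [sqlite3.connect(":memory:") for _ in range(2)]
for db in shards:
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER)")
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")


def shard_for(key: int) -> sqlite3.Connection:
    # Routing: answers "where's Order#123?" by key sharding (modulo, for simplicity).
    return shards[key % len(shards)]


def add_customer(cid, name):
    with shard_for(cid) as db:
        db.execute("INSERT INTO customers VALUES (?, ?)", (cid, name))


def add_order(oid, cid, total):
    with shard_for(oid) as db:
        db.execute("INSERT INTO orders VALUES (?, ?, ?)", (oid, cid, total))


def order_with_customer(oid):
    """Application-side join: the order and its customer may live on different shards."""
    row = shard_for(oid).execute(
        "SELECT customer_id, total FROM orders WHERE id = ?", (oid,)
    ).fetchone()
    if row is None:
        return None
    cid, total = row
    (name,) = shard_for(cid).execute(
        "SELECT name FROM customers WHERE id = ?", (cid,)
    ).fetchone()
    return {"order": oid, "customer": name, "total": total}
```

The join that a single database would do in one query becomes two routed lookups in application code, which is why the slide says the result is "not relational anymore".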

Building scalable apps
Caching



Caching


If going to the database is so
    expensive...
    ... we should just avoid it!
Put a cache in front of the database

Data remains closer to the processing unit
If needed, distribute
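A minimal read-through cache in front of a slow loader, in Python. The `load` function stands in for a database query, and the LRU eviction policy and the capacity are assumptions for the sketch, not something the slides prescribe.

```python
from collections import OrderedDict


class ReadThroughCache:
    """A cache in front of a slow store: only misses reach the database."""

    def __init__(self, load, capacity=1000):
        self._load = load              # callable key -> value, e.g. a SQL query
        self._capacity = capacity
        self._entries = OrderedDict()  # kept in least-recently-used order

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = self._load(key)             # cache miss: go to the database
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value

    def invalidate(self, key):
        self._entries.pop(key, None)
```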

Distributed Caching


What if data isn't perfectly partitioned?
How do we keep this all in sync?
Peer-to-peer?

[Diagram: three application servers, each with its own local Cache, reading from and writing to a shared RDBMS]

Distributed Caching


Cached data remains close to the
   processing unit
Central unit
   orchestrates it all

[Diagram: a local L1 cache on each of three application servers, with a shared L2 between them and the RDBMS]

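A hypothetical sketch of this two-level arrangement in Python: each application server has a local L1, and a shared L2 acts as the central unit that loads from the database and invalidates the other L1 copies on writes. It only illustrates the idea; it is not how Ehcache or Terracotta actually implement coherence, and writes go straight through to the database for simplicity.

```python
class L2Cache:
    """Shared level in front of the database; also keeps the L1s coherent."""

    def __init__(self, load, store):
        self._load = load    # reads a value from the database
        self._store = store  # writes a value to the database
        self._entries = {}
        self._l1s = []

    def register(self, l1):
        self._l1s.append(l1)

    def get(self, key):
        if key not in self._entries:
            self._entries[key] = self._load(key)
        return self._entries[key]

    def put(self, key, value, writer):
        self._store(key, value)  # write-through to the database (simplification)
        self._entries[key] = value
        for l1 in self._l1s:
            if l1 is not writer:
                l1.invalidate(key)  # other servers drop their stale copy


class L1Cache:
    """Local, fastest and smallest level: one per application server."""

    def __init__(self, l2):
        self._l2 = l2
        self._entries = {}
        l2.register(self)

    def get(self, key):
        if key not in self._entries:
            self._entries[key] = self._l2.get(key)
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._l2.put(key, value, writer=self)

    def invalidate(self, key):
        self._entries.pop(key, None)
```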
Distributed Caching


[Diagram: CPU cache hierarchy, with one L1 cache per Core, a shared L2 and RAM on top; going up, levels get SLOWER and LARGER; going down, FASTER and SMALLER]

Distributed Caching


[Diagram: the same hierarchy applied to the application, with an L1 cache per application server (FASTER, SMALLER), a shared L2, and the RDBMS on top (SLOWER, LARGER)]

Distributed Caching
We've scaled our reads
But what about writes?

[Diagram: L1 caches on each application server, a shared L2, and the RDBMS]

Write-behind Cache

Rather than writing changes directly to the slowest participant
    Write to the fastest durable store (persistent queue)
        required for recovery in the face of failure
    Only write to the database later
        in batches and/or coalesced
        while still controlling
            the lag
            the load on the database
In a distributed environment, handling failures means
    we can only enforce "happens at least once"
    which loosens the contract vs. "once and only once"!
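A single-process sketch of a write-behind cache in Python. The in-memory `OrderedDict` stands in for the durable queue (a real one must survive crashes), and `write_batch` is a hypothetical callback that writes a batch to the database. Writes to the same key are coalesced, each `flush_once` call writes at most `max_batch` entries so the writer's schedule controls lag and load, and entries leave the queue only after a successful write, which gives at-least-once semantics.

```python
from collections import OrderedDict
from itertools import islice


class WriteBehindCache:
    def __init__(self, write_batch, max_batch=100):
        self._write_batch = write_batch  # callable(list of (key, value)); may raise
        self._max_batch = max_batch
        self._cache = {}
        self._pending = OrderedDict()    # stand-in for the durable queue: key -> latest value

    def put(self, key, value):
        self._cache[key] = value
        # Coalescing: a newer write to the same key replaces the pending one.
        self._pending[key] = value

    def get(self, key):
        return self._cache.get(key)

    def pending(self):
        return len(self._pending)

    def flush_once(self):
        """Write at most one batch; meant to be called by a writer on its own schedule."""
        batch = list(islice(self._pending.items(), self._max_batch))
        if batch:
            self._write_batch(batch)  # if this raises, the batch stays queued for retry
            for key, _ in batch:
                del self._pending[key]  # dropped only after a successful write
        return len(batch)
```

Because a batch that fails part-way is retried in full, the database side should apply writes idempotently (for example as upserts).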

Write-through


[Diagram: Application code, Cache and RDBMS stacked; writes go through the Cache to the RDBMS]

Write-behind


[Diagram: Application code writes to the Cache; Writer processes later flush the queued changes to the RDBMS]

Building scalable apps
Non-relational databases



The case for non-relationals


We've seen how to scale our relational database.
We've seen how to add caching.
We've seen how to make caching scale.
So why go non-relational?
     A matter of use case




Rich Data

Frequent schema changes.
    Relational databases
     cannot easily handle table modifications.
    Column, Document and Graph databases can.
Tracking/Processing of large relations.
    Relational databases keep and
     traverse table relations by foreign-key joins.
         Joins are expensive.
    Graph databases provide cheap and
     fast traversal operations.

Runtime Data

Poorly structured data.
     Relational model
       doesn't fit unstructured data.
          Unless you want BLOBs.
     Non-relational databases provide more flexible models.
Throw-away data.
     Relational databases provide maximum data durability.
     But not all kinds of data are the same.
     When you can afford to possibly lose some data, non-relational
      databases let you trade durability for higher flexibility and
      performance.

Massive Data

Large quantity of data.
      Relational databases are typically single instance.
            Otherwise they cost too much.
      Choose a partitioned non-relational database.
High volume of writes.
      Scaling writes with relational databases is difficult.
            Even if using write-behind caching.
      Choose a partitioned non-relational database.
      Eventual consistency may also help.
Complex, aggregated queries.
      Complex aggregations and queries
        are expensive in standard relational databases.
            Remember joins?
      Choose a non-relational database supporting distributed data processing.
            Hint: Map/Reduce.


Building scalable apps
Multi-paradigm



A case for multi-paradigm: CQRS




CQRS - Explained

Command Store.
       Strictly modeled after the problem domain.
       Commands model domain changes.
       Domain changes cause events to be published.
Query Store.
       Fed by published events.
       Strictly modeled after the user interface.
       Possibly denormalized to accommodate queries.
Offline Store.
       Fed by published events.
       Strictly modeled after processing needs.
                 Business Intelligence.
                 Reporting.
                 Statistical aggregations, …
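A minimal in-process sketch of CQRS in Python: the command store validates commands and publishes events, the query store keeps one denormalized view per customer, and the offline store keeps the raw event stream for reporting. The event type, the stores and the synchronous in-memory bus are all hypothetical; in practice events are usually delivered asynchronously.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:  # event published by the command side
    order_id: str
    customer: str
    amount: int


class EventBus:
    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        for handler in self._subscribers:
            handler(event)


class CommandStore:
    """Modeled after the domain: validates commands, then publishes events."""

    def __init__(self, bus):
        self.orders = {}
        self._bus = bus

    def place_order(self, order_id, customer, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        if order_id in self.orders:
            raise ValueError("duplicate order")
        self.orders[order_id] = (customer, amount)
        self._bus.publish(OrderPlaced(order_id, customer, amount))


class QueryStore:
    """Modeled after a UI screen: one denormalized view per customer."""

    def __init__(self):
        self.views = {}

    def on_event(self, event):
        view = self.views.setdefault(
            event.customer, {"customer": event.customer, "orders": [], "total": 0}
        )
        view["orders"].append(event.order_id)
        view["total"] += event.amount


class OfflineStore:
    """Modeled after processing needs: keeps every event for reporting."""

    def __init__(self):
        self.events = []

    def on_event(self, event):
        self.events.append(event)

    def revenue(self):
        return sum(e.amount for e in self.events)
```

A rejected command publishes nothing, so neither the query store nor the offline store ever sees it.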

What about the stores' implementation?



Introducing Ehcache


Started in 2003 by Greg Luck
Apache 2.0 License
Integrated into lots of projects and products
Hibernate Provider implemented 2003
Web Caching 2004
Distributed Caching 2006
REST and SOAP APIs 2008
Acquired
    by Terracotta Sept. 2009

Introducing Ehcache 2.4


New Search API
New consistency modes
New transactional modes

Still:
         Small memory footprint
         Cache Writers
         JTA
         Bulk load
Grows with your application
    with only two lines of configuration
         Scale Up - BigMemory (hundreds of GB, in-process, no GC)
         Scale Out - Clustering Platform (Up to 2 Terabytes, HA)


Ehcache & Terracotta

Terabyte scale with minimal footprint
Server array with striping for linear scale
High availability
High performance data persistence
In-memory performance as you scale



Introducing Terrastore

Document Store.
   Ubiquitous.
   Consistent.
   Distributed.
   Scalable.
Written in Java.
Based on Terracotta.
Open Source.
Apache-licensed.
Terrastore Architecture




CQRS - Implemented
Command Store.
     Express your domain as an object model.
     Process and store it with Ehcache and Terracotta.
     Optionally write it back to a relational database.
Query Store.
     Map queries to user (screen) views.
     Map views to JSON documents.
     Store and get them back with Terrastore.
Offline Store.
     Collect data from input commands.
     Keep data denormalized.
     Aggregate and process with Terrastore Map/Reduce.

The end
Be a polyglot!



Conclusion

Be a polyglot
    Go out and play with all this
    Know your options (all of them)
Understand your domain
    Its current and future requirements
Be wise, and choose well!



Creative Commons by wwworks




More info

www.ehcache.org
www.terracotta.org
code.google.com/p/terrastore





Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 

Recently uploaded (20)

Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 

To be relational, or not to be relational? That's NOT the question!

  • 1. To be relational, or not? That's not the question! @sbtourist aka Sergio Bossa, @alexsnaps aka Alex Snaps.
  • 2. Agenda. Act 1: Relational databases, a difficult love affair. Interlude 1: Problems in the modern era, big data and the CAP theorem. Act 2: Non-relational databases, love is in the air. Interlude 2: Relational or non-relational? Not the correct question. Act 3: Building scalable apps. The End.
  • 3. About us. Sergio Bossa: software engineer at Bwin Italy, long-time open source contributor, (micro)blogger at @sbtourist. Alex Snaps: software engineer at Terracotta, after 10 years of industry experience; www.codespot.net, @alexsnaps.
  • 4. Act I. Relational databases: a difficult love affair…
  • 5. The relational model. Defines constraints: a finite model (aka relation variable, relation or table), candidate keys and foreign keys. Queries are relations themselves, with a heading and a body. SQL differs slightly from the "pure" relational model.
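A minimal sketch of these ideas using Python's built-in `sqlite3` module. The engine is our choice, because the slides name none, and the table and column names are hypothetical. A primary key and a `UNIQUE` column both act as candidate keys, and a `REFERENCES` clause declares a foreign key. The result of a join comes back as a new relation, with a heading (`cursor.description`) and a body (the rows).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Candidate keys: both `id` and `email` uniquely identify a customer.
conn.execute("""
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    )""")
# Foreign key: every order must point at an existing customer.
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        total       REAL NOT NULL
    )""")
conn.execute("INSERT INTO customer VALUES (1, 'ada@example.com')")
conn.execute("INSERT INTO orders VALUES (10, 1, 42.0)")

def rejected(sql):
    """Return True if the database refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_key = rejected("INSERT INTO customer VALUES (2, 'ada@example.com')")
dangling_fk = rejected("INSERT INTO orders VALUES (11, 99, 1.0)")

# A query result is itself a relation: a heading (column names) and a body (rows).
cur = conn.execute("""
    SELECT c.email AS email, o.total AS total
    FROM customer c JOIN orders o ON o.customer_id = c.id""")
heading = [column[0] for column in cur.description]
body = cur.fetchall()
print(duplicate_key, dangling_fk, heading, body)
```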
  • 6. The ACID guarantees. Let people easily reason about the problem. Atomic: we see all changes, or no changes at all. Consistent: changes respect our rules and constraints. Isolated: concurrent changes appear to happen independently of each other. Durable: we keep the effect of our changes forever. Fits a simplified model of our reality.
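To make "Atomic" and "Consistent" concrete, here is a small sketch that again uses Python's `sqlite3` module. The account names are hypothetical. A `CHECK` constraint encodes the rule that balances never go negative. The connection's context manager commits on success and rolls back if an exception escapes, so a transfer applies both of its updates or neither of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Our rule, expressed as a constraint: a balance can never go negative.
conn.execute("""
    CREATE TABLE account (
        id      TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )""")
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])

def transfer(src, dst, amount):
    """Credit dst, then debit src, inside one transaction."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK failed, and the credit to dst was rolled back too

def balances():
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

print(transfer("alice", "bob", 30))    # True
print(transfer("alice", "bob", 500))   # False: alice would go negative
print(balances())                      # {'alice': 70, 'bob': 30}
```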
  • 7. The SQL ubiquity. SQL is everywhere (to be honest, I'm still trying to figure out why my blog uses a relational database). SQL is known by everyone: raise your hand if you've never written a SQL query... and if you don't want to write SQL, ORMs are there for you (ActiveRecord, if you're into Rails). Simple persistence for all our objects!
  • 8. Interlude 1. Problems of the modern era.
  • 9. Big Data. What's Big Data for you? It's not a question of quantity: gigabytes? Terabytes? Petabytes? It's all about supporting your business growth, in terms of schema evolution, data storage and data processing. Is your data stack capable of handling such growth?
  • 10. CAP Theorem. 2000: formulated by Eric Brewer. 2002: proven by Gilbert and Lynch. Nowadays: a religion for many distributed systems guys. The CAP theorem in a few words: Consistency, Availability, Partition tolerance; pick (at most) two. More later.
  • 11. Act II. Non-relational databases: love is in the air.
  • 12. Non-relational databases. From the origins to the current explosion... How do they differ?
  • 13. Data Model (1): Column-family. Key-identified rows with a sparse number of columns. Columns are grouped in families, and the same key can have multiple families. Columns can be dynamically added or removed, and same-family columns are accessed efficiently.
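A toy, in-memory sketch of the column-family model. The class and method names are hypothetical and do not come from any particular database. Each row key maps to families, and each family maps to whatever columns that row happens to have. Families are declared up front, while columns come and go per row.

```python
from collections import defaultdict

class ColumnFamilyStore:
    """Toy column-family model: row key -> family -> column -> value.

    Rows are sparse: each row stores only the columns it actually has.
    """

    def __init__(self, families):
        self.families = set(families)  # families are declared up front...
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value  # ...columns are added on the fly

    def delete(self, row_key, family, column):
        if row_key in self.rows:
            self.rows[row_key][family].pop(column, None)

    def get_family(self, row_key, family):
        """Read one family without touching the row's other families."""
        return dict(self.rows.get(row_key, {}).get(family, {}))

store = ColumnFamilyStore(["profile", "activity"])
store.put("user:1", "profile", "name", "Ada")
store.put("user:1", "activity", "logins", 3)
store.put("user:2", "profile", "name", "Linus")
store.put("user:2", "profile", "twitter", "@linus")  # a column only this row has
store.delete("user:2", "profile", "twitter")
print(store.get_family("user:1", "profile"))  # {'name': 'Ada'}
```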
  • 14. Data Model (2): Graph. Vertices represent your data. Edges represent meaningful relations between vertices. Key/value properties are attached to both, and properties can be indexed. Efficient traversal.
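A toy property-graph sketch, with hypothetical names throughout. Vertices and edges both carry key/value properties, and an index maps property values to vertices so that a starting point can be found without a scan. Traversal follows edges directly rather than computing joins.

```python
from collections import deque

class PropertyGraph:
    """Toy property graph: vertices and edges both carry key/value properties."""

    def __init__(self):
        self.vertices = {}  # vertex id -> properties
        self.edges = {}     # vertex id -> list of (label, target id, properties)
        self.index = {}     # (property, value) -> set of vertex ids

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        self.edges.setdefault(vid, [])
        for key, value in props.items():
            self.index.setdefault((key, value), set()).add(vid)

    def add_edge(self, src, label, dst, **props):
        self.edges[src].append((label, dst, props))

    def find(self, key, value):
        """Indexed property lookup: no scan over all vertices."""
        return self.index.get((key, value), set())

    def traverse(self, start, label, max_depth):
        """Breadth-first walk along `label` edges, up to `max_depth` hops."""
        seen, reached = {start}, []
        frontier = deque([(start, 0)])
        while frontier:
            vid, depth = frontier.popleft()
            if depth == max_depth:
                continue
            for edge_label, dst, _props in self.edges[vid]:
                if edge_label == label and dst not in seen:
                    seen.add(dst)
                    reached.append(dst)
                    frontier.append((dst, depth + 1))
        return reached

g = PropertyGraph()
for vid, name in [("a", "Ada"), ("b", "Alan"), ("c", "Grace"), ("d", "Edsger")]:
    g.add_vertex(vid, name=name)
g.add_edge("a", "knows", "b", weight=1.0)
g.add_edge("b", "knows", "c")
g.add_edge("c", "knows", "d")
(start,) = g.find("name", "Ada")
print(g.traverse(start, "knows", max_depth=2))  # ['b', 'c']
```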
  • 15. Data Model (3): Document. Schemaless documents with denormalized data, stored as a whole unit. Clients can update and query their contents.
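A minimal in-memory sketch of the document model, in which all names are hypothetical. Each key maps to one whole JSON document. Related data, here the customer, is denormalized into the document instead of being joined from another table. Clients can still update a single nested field or query by content.

```python
import json

class DocumentStore:
    """Toy document store: each key maps to one whole JSON document."""

    def __init__(self):
        self.docs = {}  # key -> serialized JSON text

    def put(self, key, doc):
        self.docs[key] = json.dumps(doc)  # the document is stored as a single unit

    def get(self, key):
        return json.loads(self.docs[key])

    def update(self, key, path, value):
        """Set one (possibly nested) field, e.g. path=("customer", "city")."""
        doc = self.get(key)
        target = doc
        for part in path[:-1]:
            target = target.setdefault(part, {})
        target[path[-1]] = value
        self.put(key, doc)

    def query(self, predicate):
        """Return the keys of documents whose contents satisfy `predicate`."""
        return sorted(k for k, raw in self.docs.items() if predicate(json.loads(raw)))

store = DocumentStore()
# Customer data is denormalized into each order instead of joined from another table.
store.put("order:1", {"customer": {"name": "Ada", "city": "Rome"},
                      "items": [{"sku": "A1", "qty": 2}]})
store.put("order:2", {"customer": {"name": "Alan", "city": "London"}, "items": []})
store.update("order:2", ("customer", "city"), "Rome")
print(store.query(lambda d: d["customer"]["city"] == "Rome"))  # ['order:1', 'order:2']
```

Because there is no schema, one document can gain a field, such as a `status`, that the others lack.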
  • 16. Data Model (4): Key/Value. Opaque values, for maximum flexibility. Values are efficiently stored and retrieved by key.
  • 17. Consistency Model (1): Strict (sequential) consistency. Every read/write operation acts on the latest value: either the last value read or the last value written.
  • 18. Consistency Model (2): Eventual consistency. All read/write operations will eventually reach a consistent state; in the meantime, stale data may be served, versions may diverge, and repair may be needed.
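The slides do not say how diverging versions are detected. One common technique is vector clocks, which are used, for example, in Amazon's Dynamo design. This sketch assumes that technique. Each replica bumps its own counter on every write. Two versions have diverged exactly when neither clock is less than or equal to the other, and a repaired version can carry the pointwise maximum of both clocks.

```python
def compare(a, b):
    """Compare two vector clocks (dicts: node id -> counter).

    Returns "equal", "before" (a happened before b), "after", or "concurrent"
    (the versions diverged and need repair).
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"

def increment(clock, node):
    """Return a new clock with `node`'s counter bumped, as done on each local write."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def merge(a, b):
    """Pointwise maximum: the starting clock for a version that reconciles a and b."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

v0 = increment({}, "A")   # first write, on replica A
v1 = increment(v0, "A")   # A writes again
v2 = increment(v0, "B")   # B writes concurrently, starting from the stale v0
print(compare(v0, v1), compare(v1, v2))  # before concurrent
repaired = increment(merge(v1, v2), "A")  # repair: A writes a reconciled value
```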
  • 19. Partitioning (1): Client-side partitioning. Every server is self-contained; clients partition data on a per-request basis.
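A minimal sketch of client-side partitioning, assuming simple hash-modulo routing. The server addresses are hypothetical. The servers know nothing about each other, so every client must apply the same deterministic function to find a key's server.

```python
import zlib

# Hypothetical, self-contained database servers.
SERVERS = ["db-0:5432", "db-1:5432", "db-2:5432"]

def server_for(key, servers=SERVERS):
    """Every client must apply the same function to find a key's server."""
    # crc32 gives the same result in every process, unlike Python's salted hash() on str.
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]

def moved_fraction(keys, old_servers, new_servers):
    """Share of keys whose server changes when the server list changes."""
    moved = sum(server_for(k, old_servers) != server_for(k, new_servers) for k in keys)
    return moved / len(keys)

keys = [f"user:{i}" for i in range(10_000)]
print(server_for("user:42"))
print(moved_fraction(keys, SERVERS, SERVERS + ["db-3:5432"]))  # most keys move
```

With plain modulo hashing, growing from three servers to four remaps roughly three keys out of four. Every client must also switch to the new server list at the same time.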
  • 20. Partitioning (2): Server-side partitioning. Servers automatically partition data; ring-based consistent hashing is the most widely used technique.
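A minimal sketch of ring-based consistent hashing. The use of MD5 and of 100 virtual points per server are arbitrary choices for illustration, and the node names are hypothetical. A key belongs to the first point found clockwise from its position on the ring. Adding a server therefore only moves the keys that its new points take over, and removing it restores the old assignment.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hashing ring; each server owns many virtual points on it."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _position(label):
        # First 8 bytes of an MD5 digest as the point on the ring (not for security).
        return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest()[:8], "big")

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._position(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [entry for entry in self.ring if entry[1] != node]

    def node_for(self, key):
        """Walk clockwise from the key's position to the first virtual point."""
        idx = bisect.bisect_left(self.ring, (self._position(key),))
        return self.ring[idx % len(self.ring)][1]  # wrap around past the end

keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["node-0", "node-1", "node-2"])
before = {k: ring.node_for(k) for k in keys}
ring.add("node-3")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys taken over by the new node move; everything else stays put.
print(len(moved) / len(keys), {after[k] for k in moved})
```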
  • 21. Replication (1): Master/Slave. The master propagates changes to slaves, through either log replication or statement replication. Slaves may take over as master in case of failures.
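A minimal sketch of log replication, with hypothetical class names. The master appends every change to an ordered log. Each slave replays the log from the offset it last reached, so the slave lags until its next pull. Statement replication would instead ship the statements themselves and re-execute them on each slave.

```python
class Master:
    """Takes every write and appends it to a replication log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered (key, value) changes; this is what gets shipped

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class Slave:
    """Read-only copy that replays the master's log from where it left off."""

    def __init__(self):
        self.data = {}
        self.offset = 0  # how many log entries have been applied so far

    def pull(self, master):
        for key, value in master.log[self.offset:]:
            self.data[key] = value
        self.offset = len(master.log)

    def promote(self):
        """Fail-over: become the new master with whatever has been replicated."""
        new_master = Master()
        new_master.data = dict(self.data)
        return new_master

master, slave = Master(), Slave()
master.write("x", 1)
slave.pull(master)
master.write("x", 2)
print(slave.data["x"])   # 1: the slave lags until its next pull
slave.pull(master)
print(slave.data["x"])   # 2
```

A promoted slave only has what it had pulled. With asynchronous replication, any writes that the master accepted but had not yet shipped are lost on fail-over.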
  • 22. Replication (2): N-Replication. Nodes are all peers: no master, no slaves. Each node replicates its data to a subset (N) of nodes.
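The slide does not say how the N replicas are chosen. A common approach, assumed here in the style of Dynamo, places the peers on a hash ring. Each key is stored on the first peer clockwise from the key's position and on the next N-1 distinct peers after it. The peer names are hypothetical.

```python
import bisect
import hashlib

def _position(label):
    """Point on the hash ring for a key or peer name."""
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest()[:8], "big")

def replicas_for(key, nodes, n):
    """Return the N distinct peers responsible for `key` (all peers if fewer than N)."""
    ring = sorted((_position(node), node) for node in nodes)
    start = bisect.bisect_left(ring, (_position(key),))
    chosen = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]  # keep walking clockwise, wrapping around
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == n:
            break
    return chosen

peers = ["peer-a", "peer-b", "peer-c", "peer-d", "peer-e"]
print(replicas_for("order:123", peers, n=3))
```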
  • 23. Processing. A distributed system is built by either moving data toward its behavior, or moving behavior toward its data. An efficient distributed system is built by moving behavior toward its data. Map/Reduce is the most common and efficient approach.
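A single-process simulation of the Map/Reduce flow, using word count as the classic example. The lists in `partitions` stand in for data held on separate nodes. The map function runs once per partition, which is the "behavior moves toward the data" step. In a real system, the framework ships the functions to the nodes and performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: runs next to the data, emitting (word, 1) pairs."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key (done by the framework in a real system)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)

# Each inner list stands for the documents held by one (hypothetical) node.
partitions = [
    ["to be or not to be"],
    ["that is not the question"],
]
mapped = chain.from_iterable(map_phase(doc) for part in partitions for doc in part)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["to"], counts["not"])  # 2 2
```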
  • 24. CAP: the problem. CAP Theorem: Consistency, Availability, Partition tolerance; pick two. Do you remember? It makes sense only when dealing with partitions/failures...
  • 25. CAP: trade-offs. Consistency + Availability: requests will wait until partitions heal; full consistency, full availability, max latency. Consistency + Partition tolerance: some requests will act on partial data, and some requests will be refused; full consistency, reduced availability, min latency. Availability + Partition tolerance: all requests will be fulfilled, sacrificing consistency; max availability, min latency.
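A toy illustration of two of these trade-offs. It assumes majority quorums, and all names are hypothetical. It covers only the "refused" half of Consistency + Partition tolerance and the stale reads of Availability + Partition tolerance, and it does not model any specific database. A three-node value is split by a partition. The "CP" mode refuses requests that cannot reach a majority, while the "AP" mode answers with whatever the reachable nodes hold.

```python
class Unavailable(Exception):
    """Raised when a CP system refuses a request rather than risk inconsistency."""

class ReplicatedValue:
    """One value replicated on `n` nodes; a client can only reach some of them."""

    def __init__(self, n, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [(0, None)] * n  # per node: (version, value)
        self.majority = n // 2 + 1

    def _check(self, reachable):
        if self.mode == "CP" and len(reachable) < self.majority:
            raise Unavailable(f"{len(reachable)} of {len(self.replicas)} nodes reachable")

    def write(self, value, reachable):
        self._check(reachable)
        version = max(self.replicas[i][0] for i in reachable) + 1
        for i in reachable:
            self.replicas[i] = (version, value)

    def read(self, reachable):
        self._check(reachable)
        # Newest version among the reachable nodes. Two majorities always overlap,
        # so in CP mode this includes the latest successful write.
        return max((self.replicas[i] for i in reachable), key=lambda r: r[0])[1]

everyone, side_a, side_b = {0, 1, 2}, {0, 1}, {2}  # a partition splits side_a from side_b
for mode in ("CP", "AP"):
    store = ReplicatedValue(3, mode)
    store.write("v1", everyone)
    store.write("v2", side_a)  # the majority side keeps accepting writes
    try:
        print(mode, store.read(side_b))  # AP v1: served, but stale
    except Unavailable:
        print(mode, "refused")           # CP refused
```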
  • 26. Interlude II. Relational or non-relational? Not the correct question!
  • 27. Relational or non-relational? Not the correct question! Freedom to build the right solution for your problems, and to scale it as your problems scale. Know your use case: understand the problem domain, then choose technology. Know your data: understand your data and data access patterns, then choose technology. Know your tools: understand the available tools, don't go in blind. Pick the simple solution: choose the simplest technology that works for your problem, and build on that.
  • 28. Act III. Building scalable apps.
  • 29. Building scalable apps: relational databases.
  • 30. Scaling out... Adding app servers will work... for a little while! But at some point we need to scale the database as well. [Diagram: several app servers sending reads and writes to a single RDBMS.]
  • 31. Master-Slave. One master gets all the writes and replicates them to the slaves. The slaves (and the master) serve the read operations. We didn't really scale writes. Static topology. The SPOF remains.
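A minimal sketch of the read/write splitting that a master-slave setup implies. The router and the server names are hypothetical. `SELECT` statements rotate across the master and its slaves, while every other statement goes to the master. This shows why reads scale with the number of slaves but writes do not.

```python
import itertools

class ReadWriteRouter:
    """Send writes to the master; spread reads over the master and its slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._readers = itertools.cycle([master, *slaves])  # simple round-robin

    def route(self, sql):
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return next(self._readers)  # reads scale with the number of slaves
        return self.master              # every write still lands on one server

router = ReadWriteRouter("master", ["slave-1", "slave-2"])
print([router.route("SELECT * FROM orders") for _ in range(4)])
# ['master', 'slave-1', 'slave-2', 'master']
print(router.route("UPDATE orders SET total = 0 WHERE id = 1"))  # master
```

Because replication to the slaves is usually asynchronous, a read routed to a slave may not yet see a write that has just been sent to the master.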
  • 32. Master-Master. Multiple masters; writes and reads go to all participants. Writes are replicated to the other masters either synchronously (expensive 2PC) or asynchronously (conflicts have to be resolved). Complex. Static topology. Limited performance again. Solved the SPOF, sort of…
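The slide leaves open how asynchronous conflicts are resolved. One simple strategy, assumed here, is last-writer-wins. It keeps the write with the newest timestamp and breaks ties deterministically, so that every master converges on the same winner. Note that it silently discards the losing write and trusts the masters' clocks. The names and values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # wall-clock time of the write on the originating master
    origin: str       # id of that master, used only to break timestamp ties

def resolve(local, remote):
    """Last-writer-wins: keep the newest write.

    Ties are broken by master id, so every master picks the same winner no
    matter in which order replication delivers the two versions.
    """
    return max(local, remote, key=lambda v: (v.timestamp, v.origin))

eu = Versioned("shipped", 1000.0, "master-eu")
us = Versioned("cancelled", 1000.5, "master-us")
print(resolve(eu, us).value, resolve(us, eu).value)  # cancelled cancelled
```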
  • 33. Vertical Partitioning. Data is split across multiple database servers based on the functional area. Joins are moved to the application: not relational anymore. The SPOF is back. What about one functional area growing "out of hand"?
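A minimal sketch of what "joins are moved to the application" means. Two in-memory `sqlite3` databases stand in for two servers, one per functional area, with hypothetical schemas. The application fetches the orders, collects the customer ids, and looks those ids up on the other server.

```python
import sqlite3

# Two separate databases, one per functional area (in-memory stand-ins).
customers_db = sqlite3.connect(":memory:")
orders_db = sqlite3.connect(":memory:")
customers_db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
customers_db.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Alan")])
orders_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
orders_db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                      [(10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5)])

def orders_with_customer_names():
    """The JOIN the databases can no longer do, done in application code."""
    orders = orders_db.execute("SELECT id, customer_id, total FROM orders ORDER BY id").fetchall()
    ids = sorted({customer_id for _, customer_id, _ in orders})
    placeholders = ",".join("?" * len(ids))
    names = dict(customers_db.execute(
        f"SELECT id, name FROM customer WHERE id IN ({placeholders})", ids))
    # Nothing enforces referential integrity across the two servers any more.
    return [(order_id, names.get(customer_id), total) for order_id, customer_id, total in orders]

print(orders_with_customer_names())
```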
  • 34. Horizontal partitioning. Data is split across multiple database servers based on key sharding. Joins are moved to the application: not relational anymore. The SPOF is back. What about one functional area growing "out of hand"? Routing is required: where's Order#123?
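A minimal sketch of one way to answer "where's Order#123?", assuming range-based sharding on the order id. The bounds and shard names are hypothetical. A binary search over the shard boundaries returns the shard that holds the key.

```python
import bisect

# Hypothetical routing table: shard i holds ids below UPPER_BOUNDS[i];
# the last shard takes everything from the last bound upward.
UPPER_BOUNDS = [100_000, 200_000]
SHARDS = ["orders-db-0", "orders-db-1", "orders-db-2"]

def shard_for(order_id):
    """Answer "where's Order#123?" with a binary search over the range bounds."""
    return SHARDS[bisect.bisect_right(UPPER_BOUNDS, order_id)]

print(shard_for(123))      # orders-db-0
print(shard_for(150_000))  # orders-db-1
print(shard_for(250_000))  # orders-db-2
```

With ranges, range scans stay on one shard, but all new orders land on the last shard, which can then grow out of hand. Hash-based routing spreads the load instead, at the price of range scans touching every shard.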
  • 35. Building scalable apps: caching.
  • 36. Caching. If going to the database is so expensive... we should just avoid it! Put a cache in front of the database, so that data remains closer to the processing unit. If needed, distribute it.
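A minimal cache-aside sketch with least-recently-used eviction. The class name and the loader are hypothetical. Reads check the cache first and only go to the database on a miss. After a database update, the caller invalidates the cached entry so that readers do not keep seeing the old value.

```python
from collections import OrderedDict

class CacheAside:
    """Check the cache first; on a miss, load from the database and keep the result."""

    def __init__(self, load_from_db, capacity=1000):
        self.load_from_db = load_from_db
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered by recency of use, for LRU eviction

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            return self.entries[key]
        value = self.load_from_db(key)         # the expensive trip we want to avoid
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        return value

    def invalidate(self, key):
        """Call after the database row changes, so readers don't see stale data."""
        self.entries.pop(key, None)

db_calls = []

def slow_lookup(key):
    """Hypothetical stand-in for a real database query."""
    db_calls.append(key)
    return key.upper()

cache = CacheAside(slow_lookup, capacity=2)
for key in ["a", "a", "b", "c", "a"]:
    cache.get(key)
print(db_calls)  # ['a', 'b', 'c', 'a']: the 2nd "a" was a hit; "a" was later evicted
```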
  • 37. Distributed Caching. What if data isn't perfectly partitioned? How do we keep all this in sync? Peer-to-peer? [Diagram: three app servers, each with its own cache, reading from and writing to the RDBMS.]
  • 38. Distributed Caching. Cached data remains close to the processing unit (L1), while a central unit (L2) orchestrates it all. [Diagram: three app servers with their own L1 caches, sharing an L2 in front of the RDBMS.]
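A toy sketch of the L1/L2 arrangement. The L1 and L2 naming comes from the slide, but the classes and the invalidation scheme are hypothetical. Each app server keeps a small local L1, and all servers share an L2. When a key changes, the L2 drops every L1 copy of that key, which is one way a central unit could "orchestrate it all". Writing changes back to the database is left out of this sketch.

```python
class SharedL2:
    """Central tier (L2): shared entries, plus invalidation of every L1 on change."""

    def __init__(self, load_from_db):
        self.entries = {}
        self.l1_caches = []
        self.load_from_db = load_from_db

    def get(self, key):
        if key not in self.entries:
            self.entries[key] = self.load_from_db(key)
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        for l1 in self.l1_caches:
            l1.pop(key, None)  # drop stale local copies; they reload on next read

class LocalL1:
    """Per-app-server cache (L1): fastest and smallest, backed by the shared L2."""

    def __init__(self, l2):
        self.entries = {}
        self.l2 = l2
        l2.l1_caches.append(self.entries)

    def get(self, key):
        if key not in self.entries:
            self.entries[key] = self.l2.get(key)
        return self.entries[key]

    def put(self, key, value):
        self.l2.put(key, value)

db_reads = []

def load(key):
    """Hypothetical stand-in for a database read."""
    db_reads.append(key)
    return f"row-{key}"

l2 = SharedL2(load)
server_a, server_b = LocalL1(l2), LocalL1(l2)
server_a.get("42")
server_b.get("42")              # served by L2: still only one database read
server_b.put("42", "row-42-v2") # L2 invalidates both L1 copies
print(server_a.get("42"), db_reads)  # row-42-v2 ['42']
```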
  • 39. Distributed Caching. [Diagram: the L1/L2 topology, with per-server L1 caches sharing an L2 in front of the RDBMS.]
  • 40. Distributed Caching. [Diagram: CPU cache analogy; per-core L1 caches (faster, smaller), a shared L2, then RAM (slower, larger).]
  • 41. Distributed Caching. [Diagram: the same hierarchy for applications; per-server L1 caches (faster, smaller), a shared L2, then the RDBMS (slower, larger).]
  • 42. Distributed Caching. We've scaled our reads, but what about writes? [Diagram: the L1/L2/RDBMS hierarchy.]
  • 43. Write-behind Cache. Rather than writing changes directly to the slowest participant, write them to the fastest durable store (a persistent queue), which is required for recovery in the face of failure. Only write to the database later, in batches and/or coalesced, while still controlling both the lag and the load on the database. In a distributed environment, handling failures means we enforce "happens at least once", which loosens the contract vs. "once and only once"!
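A minimal in-memory sketch of a write-behind cache, with hypothetical names. It is not any product's API. Writes update the cache and a pending queue that coalesces repeated writes to the same key. A periodic flush pushes the queue to the database in batches. A batch stays queued if its write fails, so a retry may apply it twice. That is the "at least once" contract, which is why the database writer should be idempotent. A real deployment would keep the queue durable so that it survives a crash.

```python
class WriteBehindCache:
    """Writes hit the cache and a pending queue; the database is updated later.

    In-memory sketch only: a real deployment keeps `pending` in a durable
    queue so that queued writes survive a crash and can be recovered.
    """

    def __init__(self, write_batch, max_batch=100):
        self.cache = {}
        self.pending = {}          # key -> latest value, oldest key first
        self.write_batch = write_batch
        self.max_batch = max_batch

    def put(self, key, value):
        self.cache[key] = value    # callers return at cache speed
        self.pending.pop(key, None)
        self.pending[key] = value  # coalesce: only the newest value per key is queued

    def get(self, key):
        return self.cache.get(key)

    def flush(self):
        """Run periodically; batch size and flush interval bound lag and database load."""
        while self.pending:
            batch = list(self.pending.items())[: self.max_batch]
            self.write_batch(batch)  # if this raises, the batch stays queued for retry
            for key, _ in batch:
                del self.pending[key]

database, batches = {}, []

def upsert_batch(batch):
    """Idempotent writer: replaying the same batch twice is harmless."""
    batches.append(batch)
    database.update(batch)

cache = WriteBehindCache(upsert_batch, max_batch=2)
cache.put("a", 1)
cache.put("b", 2)
cache.put("a", 3)          # coalesced with the earlier write to "a"
print(database)            # {}: nothing written yet
cache.flush()
print(batches, database)   # [[('b', 2), ('a', 3)]] {'b': 2, 'a': 3}
```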
  • 44-47. Write-through. [Animated diagram: application code, cache and RDBMS.]
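For contrast with write-behind, here is a minimal write-through sketch with hypothetical names. Each write goes synchronously to the database first and then to the cache, so the caller waits for the database on every write. If the database write fails, the cache is left untouched and never gets ahead of the database.

```python
class WriteThroughCache:
    """Each write goes synchronously to the database, then to the cache."""

    def __init__(self, db):
        self.db = db      # any dict-like store stands in for the RDBMS here
        self.cache = {}

    def put(self, key, value):
        self.db[key] = value      # if the database write fails, the cache is untouched
        self.cache[key] = value   # the caller waits for both: writes run at database speed

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.db[key]
        return self.cache[key]

class UnavailableDB(dict):
    """Hypothetical database stand-in whose writes always fail."""

    def __setitem__(self, key, value):
        raise ConnectionError("database unavailable")

cache = WriteThroughCache({})
cache.put("order:123", "shipped")
print(cache.db, cache.cache)  # both hold the new value

broken = WriteThroughCache(UnavailableDB())
try:
    broken.put("order:124", "placed")
except ConnectionError:
    print(broken.cache)  # {}: the cache never gets ahead of the database
```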
  • 48-58. Write-behind. [Animated diagram: application code, cache, writer and RDBMS.]
  • 59. Building scalable apps: non-relational databases.
  • 60. The case for non-relationals. We've seen how to scale our relational database, how to add caching, and how to make caching scale. So why go non-relational? It's a matter of use case.
  • 61. Rich Data. Frequent schema changes: relational databases cannot easily handle table modifications, while column, document and graph databases can. Tracking/processing of large relations: relational databases keep and traverse table relations through foreign-key joins, and joins are expensive; graph databases provide cheap and fast traversal operations.
  • 62. Runtime Data. Poorly structured data: the relational model doesn't fit unstructured data, unless you want BLOBs; non-relational databases provide more flexible models. Throw-away data: relational databases provide maximum data durability, but not all kinds of data are the same. When you can afford to lose some data, non-relational databases let you trade durability for higher flexibility and performance.
  • 63. Massive Data. Large quantity of data: relational databases are typically single-instance, otherwise they cost too much; choose a partitioned non-relational database. High volume of writes: scaling writes with relational databases is difficult, even when using write-behind caching; choose a partitioned non-relational database, where eventual consistency may also help. Complex, aggregated queries: complex aggregations and queries are expensive in standard relational databases (remember joins?); choose a non-relational database supporting distributed data processing. Hint: Map/Reduce.
  • 64. Building scalable apps: multi-paradigm.
  • 65. A case for multi-paradigm: CQRS.
  • 66. CQRS explained. Command Store: strictly modeled after the problem domain; commands model domain changes, and domain changes cause events to be published. Query Store: fed by published events, strictly modeled after the user interface, and possibly denormalized to accommodate queries. Offline Store: fed by published events and strictly modeled after processing needs, such as business intelligence, reporting, statistical aggregations, …
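A toy sketch of the three CQRS stores wired together by events. All class and event names are hypothetical. The command store enforces domain rules and publishes an event for each change. The query store and the offline store each build their own shape of the data from those events. The bus here is synchronous for simplicity. With an asynchronous bus, the query and offline stores would be eventually consistent with the command store.

```python
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        for handler in self.subscribers:
            handler(event)

class CommandStore:
    """Write side: modeled after the domain; each accepted command publishes an event."""

    def __init__(self, bus):
        self.orders = {}
        self.bus = bus

    def place_order(self, order_id, customer, amount):
        if order_id in self.orders:
            raise ValueError("duplicate order")  # domain rule enforced on the command side
        self.orders[order_id] = {"customer": customer, "amount": amount}
        self.bus.publish({"type": "OrderPlaced", "order_id": order_id,
                          "customer": customer, "amount": amount})

class QueryStore:
    """Read side: denormalized for one screen, e.g. a 'my orders' page."""

    def __init__(self):
        self.orders_by_customer = defaultdict(list)

    def on_event(self, event):
        if event["type"] == "OrderPlaced":
            self.orders_by_customer[event["customer"]].append(event["order_id"])

class OfflineStore:
    """Offline side: shaped for reporting, here revenue per customer."""

    def __init__(self):
        self.revenue_by_customer = defaultdict(float)

    def on_event(self, event):
        if event["type"] == "OrderPlaced":
            self.revenue_by_customer[event["customer"]] += event["amount"]

bus = EventBus()
commands, queries, offline = CommandStore(bus), QueryStore(), OfflineStore()
bus.subscribe(queries.on_event)
bus.subscribe(offline.on_event)
commands.place_order("o1", "ada", 20.0)
commands.place_order("o2", "ada", 5.0)
print(queries.orders_by_customer["ada"], offline.revenue_by_customer["ada"])  # ['o1', 'o2'] 25.0
```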
  • 67. What about the stores' implementation?
  • 68. Introducing Ehcache. Started in 2003 by Greg Luck. Apache 2.0 license. Integrated by lots of projects and products. Hibernate provider implemented in 2003, web caching in 2004, distributed caching in 2006, REST and SOAP APIs in 2008. Acquired by Terracotta in Sept. 2009.
  • 69. Introducing Ehcache 2.4. New Search API, new consistency modes, new transactional modes. Still: small memory footprint, cache writers, JTA, bulk load. Grows with your application with only two lines of configuration. Scale up: BigMemory (hundreds of GB, in process, no GC). Scale out: clustering platform (up to 2 terabytes, HA).
  • 70. Ehcache & Terracotta. Terabyte scale with a minimal footprint. Server array with striping for linear scale. High availability. High-performance data persistence. In-memory performance as you scale.
  • 71. Introducing Terrastore. Document store: ubiquitous, consistent, distributed, scalable. Written in Java, based on Terracotta. Open source, Apache-licensed.
  • 72. Terrastore Architecture.
  • 73. CQRS implemented.
  • 74. CQRS implemented. Command Store: express your domain as an object model, process and store it with Ehcache and Terracotta, and optionally write it back to a relational database. Query Store: map queries to user (screen) views, map views to JSON documents, and store and get them back with Terrastore. Offline Store: collect data from input commands, keep it denormalized, and aggregate and process it with Terrastore Map/Reduce.
  • 75. The end. Be a polyglot!
  • 76. Conclusion. Be polyglot: go out and play with all this. Know your options (all of them). Understand your domain and its current and future requirements. Be wise, and choose well!
  • 77. Creative Commons, by wwworks.
  • 78. More info: www.ehcache.org, www.terracotta.org, code.google.com/p/terrastore.