Abdelmonaim Remani | Just.me Inc.


The Rise of NoSQL and Polyglot Persistence
About Me
• Software Architect at Just.me Inc.
• Interested in technology evangelism and enterprise software
  development and architecture
• Frequent speaker (JavaOne, JAX, OSCON, ORDEV, etc…)
• Open-source advocate
• President and founder of a number of user groups
   – NorCal Java User Group
   – The Silicon Valley Spring User Group
   – The Silicon Valley Dart Meetup
• Bio:         http://about.me/PolymathicCoder
• Twitter:     @PolymathicCoder
• Email:       abdelmonaim.remani@gmail.com
License




• Creative Commons Attribution Non-Commercial 3.0 Unported
   – http://creativecommons.org/licenses/by-nc/3.0


• Disclaimer: The graphics and the logo in the presentation
  belong to their rightful owners
The Golden Age of Relational Databases
Relational Data Stores
• Relational Data Stores have been the
  predominant choice in storing data
  – The existence of mature solutions
    • Oracle, MySQL, MS SQL Server, etc…
  – Wide adoption and familiarity
    • Developers and even advanced business users
  – An abundance of tools
  – Etc…
• They became the de facto standard
The Relational Model
• Data
  – Stored in
     • 2 dimensional tables (Relations)
     • Rows (tuples) and columns (attributes)
 • Has a well-defined, enforced schema
   – Relations themselves
   – Integrity constraints
• Normalization
  – Smaller tables with well-defined relationships
    between them
  – Why?
      • Minimized redundancy
      • No modification anomalies
          – Modification Propagation or cascading
The Relational Model
• Supported by SQL (Structured Query
  Language)
  – A somewhat standardized query language
  – Very flexible
  – Many Operations
    • Across multiple relations such as JOIN
    • Aggregations such as GROUP BY
    • Etc…
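A minimal sketch of the two operations named above (JOIN and GROUP BY), using Python's built-in sqlite3 module with an in-memory database. The customers and orders tables, their columns, and their rows are hypothetical and exist only for illustration.

```python
import sqlite3


def order_totals_by_customer():
    """Return (customer name, order count, total amount) for each customer."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and data, invented for this example.
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount INTEGER NOT NULL
        );
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Linus');
        INSERT INTO orders VALUES (1, 1, 30), (2, 1, 20), (3, 2, 5);
    """)
    rows = conn.execute("""
        SELECT c.name, COUNT(o.id), SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id   -- across two relations
        GROUP BY c.name                            -- aggregate per customer
        ORDER BY c.name
    """).fetchall()
    conn.close()
    return rows
```

The query is written without knowing in advance how the tables would be read, which is the flexibility the slide refers to.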
The Relational Model
• Transactional
  • ACID
    – Atomicity
        » All or nothing
    – Consistency
        » From one valid state to another
    – Isolation
        » Concurrent transactions still result in a valid state
    – Durability
        » Once committed, it’s forever
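Atomicity ("all or nothing") can be sketched with Python's built-in sqlite3 module. The accounts table, the names, and the balances are hypothetical. The credit is applied first on purpose, so that a failing debit has something to roll back.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint if src would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
    conn.commit()
    first = transfer(conn, "alice", "bob", 60)   # succeeds
    second = transfer(conn, "alice", "bob", 60)  # would overdraw alice
    balances = dict(conn.execute("SELECT name, balance FROM accounts"))
    conn.close()
    return first, second, balances
```

After the failed second transfer, bob's credit is undone along with the rejected debit, so the database moves from one valid state to another or not at all.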
The Relational Model
• Designed with the assumptions that
 – The end-user will directly interact with the database

   » It makes sense that the RDBMS should manage concurrency
     and integrity

   » Access Patterns are unknown

     » A flexible query language that is close to English

     » Data structure with no bias towards a particular pattern of
       querying

 – The database runs on a single machine

   » The only way to promise true ACID
Road Bumps
• We started building more complex applications on top
  of relational databases
 – Business logic moved out of the RDBMS

   » Fewer triggers and stored procedures, replaced by
     equivalent application-layer code

 – The applications themselves evolved beyond the procedural
   paradigm to a more OOP approach

   » The Object-Relational impedance mismatch

     » ORM frameworks to the rescue
Scalability
We became data hoarders!
• As our datasets grew out of control
• Performance degrades sharply
  – We buy beefier machines
     • Larry Ellison’s most expensive RAC, which makes
       him even richer
• This put off the problem for a little while
Optimization
• We hire a guy
  – Indexes half of the database
     • Made those queries a little faster
  – Creates materialized views for complex joins
     • Nightmare to maintain, get stale, etc…
  – He de-normalizes
     • Anything but a smooth transition!
     • Redundancy
  – He introduces Caching
     • Data too stale
     • More redundancy
Clustering
• We hire another guy
   – Tells us that we have hit the limit of a single machine
   – We need to scale out (horizontally)
      • Master/Slave
          – Assuming you read more than you write
          – Write to the Master and Read from the Slaves
          – Master needs to replicate data across the slaves
              » Risk incorrect reads
          – How’s that consistent?!!
      • Sharding
          –   Improves reads as much as writes
          –   Can’t join across partitions
          –   No referential integrity
          –   Requires modification of client applications
          –   Introduces a single point of failure
          –   How’s that consistent?!!
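The sharding idea above can be sketched as hash-based routing. This is a toy under stated assumptions: each shard is an in-memory dict standing in for a separate database server, and the ShardedStore class and its method names are hypothetical.

```python
import hashlib


class ShardedStore:
    """Routes each key to one of N in-memory shards (stand-ins for servers)."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def shard_for(self, key):
        # Use a stable hash: Python's built-in hash() of a str is salted
        # per process, so it cannot be used for routing across restarts.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)
```

Note that the routing logic lives in the client, and that any query touching more than one key may have to visit several shards. That is where cross-partition joins and referential integrity are lost.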
What’s the Point?
• We horizontally scale our relational
  database
  – We’re no longer consistent
  – No ACIDity?
  – We lose query flexibility
• Are we doing something wrong?
The CAP Theorem
The CAP Theorem
• Eric Brewer on distributed systems
  – Pick two out of
    • Consistency
    • Availability
    • Partition Tolerance
• Think of Fast, Cheap, Good service
  – Cheap Good service won’t be Fast
  – Fast Good service won’t be Cheap
  – Fast Cheap service won’t be Good
Relational Model & CAP
• Relational Data Stores happen to favor
  – Consistency and Availability
  – For historical reasons
     • They are key to certain types of applications
     • The bank example
        – I deposit $100 in my friend’s bank account
        – Blah blah blah…
• According to CAP, a CA system cannot also be
  Partition Tolerant, meaning that horizontal
  scaling is off the table
Damn!
• We’re in a pickle
  – Too much data in CA model
  – Vertical Scaling
     • Too expensive
     • Not sustainable
• Forced to explore other alternatives in
  light of CAP
What AP Looks Like
• Partition Tolerance
  – Since we reached the limit of the one machine
    we have no choice but to scale horizontally
  – Which means to be partition tolerant
• Availability
  – Nobody is willing to give it up most of the time
  – This becomes even better with distribution
  – In a cluster of servers
     • The individual node might be unreliable by itself
     • But the cluster as a whole is inherently reliable
What AP Looks Like
• According to CAP, we simply cannot have C
• Consistency
  – I make an update and all subsequent reads return the most
    recent value
  – Unfortunately this is impossible as it takes time for
    the change to be replicated across each node of
    the cluster
• What a bummer?!
• Let’s look at an AP system
  – DNS (Domain Name System)
     • Not all the nodes have the most up-to-date records (you
       register a domain name and wait a few days to be
       sure that every DNS server knows about it)
Eventual Consistency
• This is not so bad
   – It means that we just settled for a lesser degree of
     consistency
• So what if
   – Mohammad in Morocco updated his relationship status
     to single on some edge node
   – His cousin who lives in Spain saw it immediately because
     they happen to be on the same edge node
   – His secret admirer Sara who lives in the United States
     could not see it until an hour later
   – His brother in Japan got the update the next day
   – They all got it eventually!
• Eventual Consistency as Opposed to Immediate
  Consistency
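The relationship-status story can be simulated with a toy cluster. This is only a sketch: the EventuallyConsistentStore class is hypothetical, replication is modeled as an explicit queue drained by hand, and conflicting concurrent writes are ignored entirely.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy cluster: a write lands on one replica and reaches the others later."""

    def __init__(self, replica_names):
        self.replicas = {name: {} for name in replica_names}
        self.pending = deque()  # (target replica, key, value) not yet applied

    def write(self, replica, key, value):
        # The local replica sees the write immediately...
        self.replicas[replica][key] = value
        # ...every other replica only gets it once replication catches up.
        for other in self.replicas:
            if other != replica:
                self.pending.append((other, key, value))

    def read(self, replica, key):
        return self.replicas[replica].get(key)

    def replicate_one(self):
        """Deliver a single pending update; return False when none are left."""
        if not self.pending:
            return False
        target, key, value = self.pending.popleft()
        self.replicas[target][key] = value
        return True
```

A read from a replica that has not yet received the update returns the old value (here, nothing); once the queue is drained, every replica agrees.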
The Compromise
• We settle for a weaker consistency model
  – BASE
    • Basically Available
    • Soft state
    • Eventual Consistency
• ACID on the individual node, BASE on
  the cluster
The Slippery Slope of the Faithless
You might as well Question…
• Schema
 – Logical
   • Well-defined and rigid in relational databases
   • Why not a flexible one or even no schema
 – Physical
   • B Trees in most relational databases
   • Why not use some other underlying data
     structure
You might as well Question…
• Integrity Constraints
  – Who cares?
• A Query Language
  – Anything would do…
• Security
  – None
• Name it…
NoSQL: Going Rogue…
NoSQL
• A wide range of specialized data stores
  with the goal of addressing the challenges
  of the relational model
• Eric Evans
  – The whole point of seeking alternatives is that
    you need to solve a problem that relational
    databases are a bad fit for
• Let me make it easier
  – It is not anti-SQL or anti-relational
  – Any data store that is non-relational
• “Not Only SQL” instead of “NO SQL”
SQL                 vs.   NoSQL
A single machine          A cluster
CA                        AP/CA/CP
Scale Vertically          Scale Horizontally
SQL                       Custom APIs
ACID                      BASE
Full Indexes              Mostly on Keys

There are outliers of course
SQL                 vs.   NoSQL
Rigid Schema              Schema-less
Flexible Queries          Pre-defined Queries

• SQL (Relational)
  – Concerned about what the data consists of
• NoSQL (Non-Relational)
  – Concerned with how the data is queried

There are outliers of course
The Zoo
Key-Value Data Stores
• Basically a big hash map (associative array)
   – Very Simple
   – Very fast read and write
   – No secondary indexes
• Use When
   – Your data is not highly related
   – All you need is basic CRUD
• Challenges
   – Complex queries
• Check out the Amazon Dynamo Paper
       • http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
• Featured Projects
   – DynamoDB http://aws.amazon.com/dynamodb/
   – Riak http://wiki.basho.com/
   – Redis http://redis.io/
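A key-value store can be sketched as a thin wrapper around a hash map. The KeyValueStore class and its method names are hypothetical and do not mirror the API of any of the projects listed above.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value store: basic CRUD by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Create or update: the store does not look inside the value.
        self._data[key] = value

    def get(self, key, default=None):
        # Reads by key are a single hash lookup.
        return self._data.get(key, default)

    def delete(self, key):
        """Remove the key; return True if it was present."""
        existed = key in self._data
        self._data.pop(key, None)
        return existed

    def scan(self, predicate):
        """With no secondary indexes, a query by value visits every entry."""
        return sorted(k for k, v in self._data.items() if predicate(v))
```

The scan method shows why complex queries are the challenge here: anything other than a lookup by key degenerates into a full pass over the data.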
Columnar Stores
•   In a table, data of the same column is stored together
     – Storage is not wasted on null values as in row-based stores (RDBMS)
     – Great for sparse tables
     – Very fast column operation including aggregation
•   Use When
     – Big Data (Excellent leverage of Map Reduce)
     – Need compression or versioning
•   Challenges
     – You had better know your access patterns beforehand
     – Key design is not trivial
•   Check out Google’s BigTable Paper
     – http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf
•   Featured Projects
     – HBase http://hbase.apache.org/
     – Cassandra http://cassandra.apache.org/
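The "same column stored together" idea can be sketched by pivoting rows into per-column lists. This is an illustration of the layout only, not of how HBase or Cassandra store data on disk; the function names and sample rows are hypothetical.

```python
def to_columns(rows):
    """Pivot row dicts into one list of (row id, value) pairs per column.

    A missing value simply has no entry, so sparse columns cost nothing.
    """
    columns = {}
    for row_id, row in enumerate(rows):
        for name, value in row.items():
            columns.setdefault(name, []).append((row_id, value))
    return columns


def column_sum(columns, name):
    """Aggregate one column without touching any of the others."""
    return sum(value for _, value in columns.get(name, []))
```

For example, with rows `[{"name": "a", "price": 10}, {"name": "b"}, {"name": "c", "price": 5}]`, the price column holds only two entries, and summing it never reads the name column.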
Document Data Stores
•   Nested structures of keys and their values
     – A document can be
          •   Simply a key and its value
          •   A key with another document as its value
          •   No limit in depth
     –   Very Flexible schema
     –   Well-Indexed data
     –   Works well with OOP (No impedance mismatch)
     –   De-normalize as a best practice
•   Use when
     – You don’t know much about the schema
     – The schema is very likely to change
•   Challenges
     – Complex Join-like queries
     – Self-referencing documents and circular dependencies
•   Projects
     – MongoDB http://www.mongodb.org/
     – CouchDB http://couchdb.apache.org/
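A document can be sketched as a nested dictionary. The order document below is hypothetical; it embeds the customer and line items directly (de-normalized) instead of referencing other tables. The get_path helper is also made up for illustration and is not part of MongoDB's or CouchDB's API.

```python
# Hypothetical de-normalized document: everything the order needs is embedded.
ORDER_DOCUMENT = {
    "id": "order-1",
    "customer": {
        "name": "Mohammad",
        "address": {"city": "Casablanca", "country": "Morocco"},
    },
    "items": [
        {"sku": "book-42", "quantity": 1},
        {"sku": "pen-7", "quantity": 3},
    ],
}


def get_path(document, path, default=None):
    """Follow a dotted path such as 'customer.address.city' through nested dicts."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current
```

Because no schema is enforced, a missing field is not an error; the reader decides what to do with the default.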
Graph Data Stores
• A graph
   –   Perfect for highly interconnected data
   –   Allows for explicit relationships
   –   Fine-grained graph traversal
   –   Very Flexible
   –   Works well with OOP (No impedance mismatch)
• Use when
   – Your data looks like a graph and requires graph queries
   – You are smart enough not to try this on another data store
• Challenges
   – Doesn’t scale well horizontally
• Featured Projects
   – Neo4j http://neo4j.org/
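A typical graph question ("who is within two hops of this person?") can be sketched as a breadth-first traversal over an adjacency list. The within_hops function and the plain-dict graph representation are assumptions for illustration; a graph database such as Neo4j exposes this through its own query language instead.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {node: hop count} for nodes reachable from start in <= max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the start node is not its own neighbor
    return distances
```

In a relational store the same question needs one self-join per hop, which is why the slide suggests not trying this on another kind of data store.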
Relational Data Stores
• Use when
   – Your data is highly relational
   – There is a need to break data into small pieces and
     assemble it in different ways
   – Consistency is king
   – Access patterns are unknown
   – Reporting
• Challenges
   – Doesn’t scale well horizontally
• Featured Projects
   –   Oracle http://www.oracle.com/index.html
   –   Postgres http://www.postgresql.org/
   –   MS SQL Server https://www.microsoft.com/sql-server
   –   MySQL http://www.mysql.com/
How do you choose?
If It Doesn’t Fit, You Must Acquit!
• Data
  –   Does it have a natural structure?
  –   How is it connected?
  –   How is it distributed?
  –   How much?
• Access Patterns
  – Reads/Writes ratio?
  – Uniform or random?
• CAP
Other Considerations
•   Maturity
•   Stability
•   Maintainability
•   Durability
•   Cost
•   Tools
•   Familiarity
For Fairness’ Sake!
For Fairness’ Sake!
• Relational data stores did not fail us
  – They actually perform very well
• We failed ourselves
  – By using them as solutions for problems
    they weren’t designed to solve to begin
    with
• Misuse any data store and you’ll get just as
  much trouble
For Fairness’ Sake!
• You can’t expect
  – A flathead screwdriver to work on a Phillips
    screw as well as one with the matching Phillips
    blade
  – A crosshead screwdriver to work on a
    flathead screw
Polyglot Persistence
Polyglot Persistence
• Enterprise applications are complex and
  combine complex problems
  – The assumption that we should use one data store is
    absurd
  – You can’t try to fit everything into one model and expect no
    problems
• Polyglot Persistence
  – To leverage multiple data stores, based on the
    way data is used by the application
     • Associated with a learning curve
     • Long term investment (More productive in the long-run)
  – Leverage the strength of multiple data stores
Polyglot Persistence
• Example
  –   MongoDB for the product catalog
  –   Redis for shopping cart
  –   DynamoDB for social profile info
  –   Neo4j for the social graph
  –   HBase for inbox and public feed messages
  –   MySQL for payment and account info
  –   Cassandra for audit and activity log
• Disclaimer: I’m not making any
  recommendation here.
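One way to picture polyglot persistence in application code is a small router that sends each kind of data to the store chosen for it. Everything here is hypothetical: InMemoryStore is a stand-in for a real client library, and the save/load interface is an assumption made for the sketch, not the API of any product named above.

```python
class InMemoryStore:
    """Stand-in for a real data store client (document, key-value, relational...)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def save(self, key, value):
        self.data[key] = value

    def load(self, key):
        return self.data.get(key)


class PolyglotRepository:
    """Routes each kind of data to the store that was chosen for it."""

    def __init__(self, routing):
        self.routing = routing  # kind of data -> store instance

    def save(self, kind, key, value):
        self.routing[kind].save(key, value)

    def load(self, kind, key):
        return self.routing[kind].load(key)
```

A usage example, again with made-up names: `PolyglotRepository({"product": InMemoryStore("document store"), "cart": InMemoryStore("key-value store")})` keeps catalog entries and shopping carts in separate stores behind one interface.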
NoSQL in the Cloud
NoSQL in the Cloud
• NoSQL as a commodity
  – Fully managed data stores (No
    maintenance)
  – Elastic scaling
  – Cheap storage
• Featured:
  – Amazon AWS
  – Heroku Add-ons
  – CloudFoundry
As Promised!
The A’s to the Q’s in the Abstract
• What does the rise of all these NoSQL data stores mean
  to my enterprise?
   – I’m guessing a lot
• What is NoSQL to begin with?
   – Any non-relational data store
• Does it mean “NO SQL”?
   – No
• Could this be just another fad?
   – I don’t think so
The A’s to the Q’s in the Abstract
• Is it a good idea to bet the future of my
  enterprise on these new exotic
  technologies and simply abandon
  proven mature RDBMS?
  – It’s up to you. I will say “No guts, no glory!”
• How scalable is scalable?
  – However much you need it to be
The A’s to the Q’s in the Abstract
• Assuming that I am sold, how do I
  choose the one that fits my needs the
  best?
  – I’ll tell you if you hire me
• Is there a middle ground somewhere?
  – Polyglot Persistence
• What is this Polyglot Persistence I hear
  about?
  – It’s the middle ground
Any Other Questions?
Thank You All!

@PolymathicCoder

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 

The Rise of NoSQL and Polyglot Persistence

  • 1. Abdelmonaim Remani | Just.me Inc. The Rise of NoSQL and Polyglot Persistence
  • 2. About Me • Software Architect at Just.me Inc. • Interested in technology evangelism and enterprise software development and architecture • Frequent speaker (JavaOne, JAX, OSCON, ORDEV, etc…) • Open-source advocate • President and founder of a number of user groups – NorCal Java User Group – The Silicon Valley Spring User Group – The Silicon Valley Dart Meetup • Bio: http://about.me/PolymathicCoder • Twitter: @PolymathicCoder • Email: abdelmonaim.remani@gmail.com
  • 3. License • Creative Commons Attribution Non-Commercial 3.0 Unported – http://creativecommons.org/licenses/by-nc/3.0 • Disclaimer: The graphics and the logo in the presentation belong to their rightful owners
  • 4. The Golden Age of Relational Databases
  • 5. Relational Data Stores • Relational Data Stores have been the predominant choice in storing data – The existence of mature solutions • Oracle, MySQL, MS SQL Server, etc… – Wide adoption and familiarity • Developers and even advanced business users – An abundance of tools – Etc… • It became the de facto standard
  • 6. The Relational Model • Data – Stored in • 2-dimensional tables (Relations) • Rows (tuples) and columns (attributes) • Has a well-defined, enforced schema – Relations themselves – Integrity constraints • Normalization – Smaller tables with well-defined relationships between them – Why? • Minimized redundancy • No modification anomalies – Modification Propagation or cascading
  • 7. The Relational Model • Supported by SQL (Structured Query Language) – A somewhat standardized query language – Very flexible – Many Operations • Across multiple relations such as JOIN • Aggregations such as GROUP BY • Etc…
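The JOIN and GROUP BY operations mentioned on this slide can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are hypothetical, invented here for illustration only.

```python
import sqlite3

# Hypothetical schema and rows for illustration; not taken from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5);
""")

# JOIN across two relations, then aggregate per customer with GROUP BY.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(totals)
```

Because the data is normalized into two small tables, the same rows could be reassembled in other ways (per order, per day, and so on) without changing how they are stored.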
  • 8. The Relational Model • Transactional • ACID – Atomicity » All or nothing – Consistency » From one valid state to another – Isolation » Concurrent transactions result in a valid state – Durability » Once committed, it’s forever
  • 9. The Relational Model • Designed with the assumptions that – The end-user will directly interact with the database » It makes sense that the RDBMS should manage concurrency and integrity » Access Patterns are unknown » A flexible query language that is close to English » Data structure with no bias towards a particular pattern of querying – The database runs on a single machine » The only way to promise true ACID
  • 10. Road Bumps • We started building more complex applications on top of relational databases – Business logic moved out of the RDBMS » Fewer triggers and stored procedures, replaced by equivalent application-layer code – The applications themselves evolved beyond the procedural paradigm to a more OOP approach » The Object-Relational impedance mismatch » ORM frameworks to the rescue
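As a minimal sketch of the "all or nothing" property, the following uses Python's built-in `sqlite3` module. The `accounts` table, the `transfer` and `balances` helpers, and the amounts are assumptions made for illustration; a CHECK constraint stands in for the bank's "no negative balance" rule.

```python
import sqlite3

# Hypothetical accounts table; the CHECK constraint forbids overdrafts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("me", 50), ("friend", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(conn, "me", "friend", 100)   # would overdraw "me"
after_failure = balances(conn)
succeeded = transfer(conn, "me", "friend", 30)
after_success = balances(conn)
print(failed, after_failure, succeeded, after_success)
```

In the failed transfer the credit to `friend` is applied first and then undone by the rollback, so no partial state is ever visible after the transaction ends.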
  • 12. We became data hoarders! • As our datasets grew out of control • Performance decreases exponentially – We buy a beefier machine • Larry Ellison’s most expensive RAC, making him even richer • This puts off the problem for a little while
  • 13. Optimization • We hire a guy – Indexes half of the database • Made those queries a little faster – Creates materialized views for complex joins • Nightmare to maintain, get stale, etc… – He de-normalizes • Anything but a smooth transition! • Redundancy – He introduces Caching • Data too stale • More redundancy
  • 14. Clustering • We hire another guy – Tells us that we hit the limit of the one machine – You need to scale out (Horizontally) • Master/Slave – Assuming you read more than you write – Write to the Master and Read from the Slaves – Master needs to replicate data across the slaves » Risk of incorrect reads – How’s that consistent?!! • Sharding – Improves reads as much as writes – Can’t join across partitions – No referential integrity – Requires modification of client applications – Introduces a single point of failure – How’s that consistent?!!
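The sharding trade-offs listed on this slide can be illustrated with a small in-memory sketch. `ShardedStore` is a hypothetical class, plain dictionaries stand in for separate database servers, and hash-modulo routing is only one of several possible partitioning schemes.

```python
import hashlib


class ShardedStore:
    """Hypothetical client-side router; each dict stands in for one server."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns the record.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        # Lookups by key touch exactly one shard.
        return self._shard_for(key).get(key)

    def scan(self, predicate):
        # Anything that is not a key lookup has to visit every shard and
        # be stitched together by the client application.
        return sorted(
            key
            for shard in self.shards
            for key, value in shard.items()
            if predicate(value)
        )


store = ShardedStore(shard_count=4)
store.put("user:1", {"name": "Ada", "country": "MA"})
store.put("user:2", {"name": "Linus", "country": "ES"})
store.put("user:3", {"name": "Grace", "country": "MA"})

print(store.get("user:2"))
print(store.scan(lambda user: user["country"] == "MA"))
```

The routing logic lives in the client, which is why sharding requires changes to client applications, and the `scan` method shows why joins and other cross-partition queries stop being cheap.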
  • 15. What’s the Point? • We horizontally scale our relational database – We’re no longer consistent – No ACIDity? – We lose query flexibility • Are we doing something wrong?
  • 17. The CAP Theorem • Eric Brewer on distributed systems – Pick two out of • Consistency • Availability • Partition Tolerance • There is no Fast, Cheap, and Good service – Cheap Good service won’t be Fast – Fast Good service won’t be Cheap – Fast Cheap service won’t be Good
  • 18. Relational Model & CAP • Relational Data Stores happen to favor – Consistency and Availability – For historical reasons • They are key to certain types of applications • The bank example – I deposit $100 in my friend’s bank account – Blah blah blah… • According to CAP, a CA system gives up Partition Tolerance, meaning that horizontal scaling is off the table
  • 19. Scheiße! • We’re in a pickle – Too much data in a CA model – Vertical Scaling • Too expensive • Not sustainable • Forced to explore other alternatives in light of CAP
  • 20. What AP Looks Like • Partition Tolerance – Since we reached the limit of the one machine we have no choice but to scale horizontally – Which means to be partition tolerant • Availability – Nobody is willing to give it up most of the time – This becomes even better with distribution – In a cluster of servers • The individual node might be unreliable by itself • But the whole is inherently reliable
  • 21. What AP Looks Like • According to CAP we simply cannot have C • Consistency – I make an update and all subsequent reads return the most up-to-date value – Unfortunately this is impossible as it takes time for the change to be replicated across each node of the cluster • What a bummer! • Let’s look at an AP system – DNS (Domain Name System) • Not all the nodes have the most up-to-date records (You register that domain name and wait for a few days to guarantee that every DNS server knows about it)
  • 22. Eventual Consistency • This is not so bad – It means that we just settled for a lesser degree of Consistency • So what if – Mohammad in Morocco updated his relationship status to single on some edge node – His cousin who lives in Spain saw it immediately because they happen to be on the same edge node – His secret admirer Sara who lives in the United States could not see it until an hour later – His brother in Japan got the update the next day – They all got it eventually! • Eventual Consistency as Opposed to Immediate Consistency
  • 23. The Compromise • We settle for a weaker consistency model – BASE • Basically Available • Soft state • Eventual Consistency • ACID on the individual node, BASE on the cluster
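The relationship-status story above can be simulated in a few lines. This is a toy sketch, not how any particular data store replicates: `Replica` and `Cluster` are hypothetical classes, a single shared counter stands in for real versioning, and "replication delay" is modeled as an inbox that is only applied when `drain` is called.

```python
from collections import deque


class Replica:
    """One node: a local store plus replicated writes not yet applied."""

    def __init__(self, name):
        self.name = name
        self.store = {}       # key -> (version, value)
        self.inbox = deque()  # pending (key, version, value) updates

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

    def apply(self, key, version, value):
        current = self.store.get(key)
        if current is None or version > current[0]:  # last writer wins
            self.store[key] = (version, value)

    def drain(self):
        # Models replication finally catching up on this node.
        while self.inbox:
            self.apply(*self.inbox.popleft())


class Cluster:
    def __init__(self, names):
        self.replicas = {name: Replica(name) for name in names}
        self.clock = 0  # simplification: one global version counter

    def write(self, node, key, value):
        self.clock += 1
        self.replicas[node].apply(key, self.clock, value)
        for name, replica in self.replicas.items():
            if name != node:
                replica.inbox.append((key, self.clock, value))


cluster = Cluster(["morocco", "usa", "japan"])
cluster.write("morocco", "status", "single")

seen_locally = cluster.replicas["morocco"].read("status")
seen_in_usa_before = cluster.replicas["usa"].read("status")
cluster.replicas["usa"].drain()
cluster.replicas["japan"].drain()
seen_in_usa_after = cluster.replicas["usa"].read("status")
seen_in_japan_after = cluster.replicas["japan"].read("status")
print(seen_locally, seen_in_usa_before, seen_in_usa_after, seen_in_japan_after)
```

Every replica stays available for reads the whole time; the cost is that a read on a node that has not caught up yet returns an older value (here, nothing at all).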
  • 24. The Slippery Slope of the Faithless
  • 25. You might as well Question… • Schema – Logical • Well-defined and rigid in relational databases • Why not a flexible one or even no schema? – Physical • B-trees in most relational databases • Why not use some other underlying data structure?
  • 26. You might as well Question… • Integrity Constraints – Who cares? • A Query Language – Anything would do… • Security – None • Name it…
  • 28. NoSQL • A wide range of specialized data stores with the goal of addressing the challenges of the relational model • Eric Evans – The whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for • Let me make it easier – It is not anti-SQL or anti-Relational – Any data store that is non-relational • “Not Only SQL” instead of “NO SQL”
  • 29. SQL vs. NoSQL (SQL | NoSQL) • A single machine | A cluster • CA | AP/CA/CP • Scale Vertically | Scale Horizontally • SQL | Custom APIs • ACID | BASE • Full Indexes | Indexes Mostly on Keys • There are outliers of course
  • 30. SQL vs. NoSQL (SQL | NoSQL) • Rigid Schema | Schema-less • Flexible Queries | Pre-defined Queries • SQL (Relational) – Concerned about what the data consists of • NoSQL (Non-Relational) – Concerned with how the data is queried • There are outliers of course
  • 33. Key-Value Data Stores • Basically a big hash map (associative array) – Very Simple – Very fast reads and writes – No secondary indexes • Use When – Your data is not highly related – All you need is basic CRUD • Challenges – Complex queries • Check out the Amazon Dynamo Paper • http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf • Featured Projects – DynamoDB http://aws.amazon.com/dynamodb/ – Riak http://wiki.basho.com/ – Redis http://redis.io/
  • 34. Columnar Stores • In a table, data of the same column is stored together – Storage is not wasted on null values as in row-based stores (RDBMS) – Great for sparse tables – Very fast column operations including aggregation • Use When – Big Data (Excellent leverage of Map Reduce) – Need compression or versioning • Challenges – You better know your access patterns beforehand – Key design is not trivial • Check out Google’s BigTable Paper – http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf • Featured Projects – HBase http://hbase.apache.org/ – Cassandra http://cassandra.apache.org/
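To make "data of the same column is stored together" concrete, here is a toy sketch in plain Python. The sample rows and the `to_columns` helper are hypothetical; real columnar stores add compression, versioning, and on-disk layouts that this deliberately ignores.

```python
# Row-oriented layout: each record keeps all of its attributes together.
rows = [
    {"id": 1, "country": "MA", "amount": 10},
    {"id": 2, "country": "ES", "amount": 20},
    {"id": 3, "country": "MA", "amount": 5},
]


def to_columns(rows):
    """Pivot records into one list per column (assumes identical keys per row)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregation now reads a single contiguous list instead of every record.
total_amount = sum(columns["amount"])
print(columns)
print(total_amount)
```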
  • 35. Document Data Stores • Nested structures of hashes and their values – A document can be • Simply a hash and its value • Hash and another document as its value • No limit in depth – Very Flexible schema – Well-Indexed data – Works well with OOP (No impedance mismatch) – De-normalize as a best practice • Use when – You don’t know much about the schema – The schema is very likely to change • Challenges – Complex Join-like queries – Self-referencing documents and circular dependencies • Projects – MongoDB http://www.mongodb.org/ – CouchDB http://couchdb.apache.org/
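A document in the sense described above can be pictured as a nested Python dictionary. The `product` document and the `get_path` helper below are hypothetical and do not use any real document database API; they only show a de-normalized, schema-flexible record and a dotted-path lookup into it.

```python
# A hypothetical, de-normalized product document: related data is nested
# inside the document instead of living in separate tables.
product = {
    "sku": "A-100",
    "name": "Espresso machine",
    "price": {"amount": 199, "currency": "USD"},
    "reviews": [
        {"author": "sara", "stars": 5},
        {"author": "mohammad", "stars": 4},
    ],
}


def get_path(document, path, default=None):
    """Follow a dotted path such as 'price.currency' through nested dicts."""
    current = document
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return default
    return current


currency = get_path(product, "price.currency")
missing = get_path(product, "dimensions.width", default="unknown")
print(currency, missing)
```

A second product could carry a `dimensions` field without any schema migration, which is the flexibility the slide refers to; the price is that join-like queries across documents fall to the application.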
  • 36. Graph Data Stores • A graph – Perfect for highly interconnected data – Allows for explicit relationships – Fine-grained graph traversal – Very Flexible – Works well with OOP (No impedance mismatch) • Use when – Your data looks like a graph and requires graph questions – You are smart enough not to try this on another data store • Challenges – Doesn’t scale well horizontally • Featured Projects – Neo4j http://neo4j.org/
  • 37. Relational Data Stores • Use when – Your data is highly relational – There is a need to break data into small pieces and assemble it in different ways – When consistency is king – Access patterns are unknown – Reporting • Challenges – Doesn’t scale well horizontally • Featured Projects – Oracle http://www.oracle.com/index.html – Postgres http://www.postgresql.org/ – MS SQL Server http://www.microsoft.com/sql-server – MySQL http://www.mysql.com/
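A typical "graph question" is "who is within two hops of this person?". The sketch below answers it with a breadth-first traversal over a hypothetical adjacency list; `within_hops` and the sample social graph are assumptions for illustration and are unrelated to any graph database's query language.

```python
from collections import deque

# Hypothetical social graph as an adjacency list (undirected friendships).
graph = {
    "sara": ["mohammad"],
    "mohammad": ["sara", "cousin", "brother"],
    "cousin": ["mohammad"],
    "brother": ["mohammad", "aiko"],
    "aiko": ["brother"],
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal returning {person: distance} up to max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


reachable = within_hops(graph, "sara", max_hops=2)
print(reachable)
```

In a relational store the same question would need one self-join per hop, which is the kind of workload the slide suggests not attempting on another data store.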
  • 38. How do you choose?
  • 39. If It Doesn’t Fit, You Must Acquit! • Data – Does it have a natural structure? – How is it connected? – How is it distributed? – How much? • Access Patterns – Reads/Writes ratio? – Uniform or random? • CAP
  • 40. Other Considerations • Maturity • Stability • Maintainability • Durability • Cost • Tools • Familiarity
  • 42. For Fairness’ Sake! • Relational data stores did not fail us – They actually perform very well • We failed ourselves – By using them as solutions for problems they weren’t designed to solve to begin with • Misuse any data store and you’ll get just as much trouble
  • 43. For Fairness’ Sake! • You can’t expect – A flathead screwdriver to work on a Phillips screw as well as one with the matching Phillips blade – A crosshead screwdriver to work on a flathead screw
  • 45. Polyglot Persistence • Enterprise applications are complex and combine complex problems – The assumption that we should use one data store is absurd – You can’t try to fit everything in one model and expect no problems • Polyglot Persistence – To leverage multiple data stores, based on the way data is used by the application • Associated with a learning curve • Long-term investment (More productive in the long run) – Leverage the strengths of multiple data stores
  • 46. Polyglot Persistence • Example – MongoDB for the product catalog – Redis for shopping cart – DynamoDB for social profile info – Neo4j for the social graph – HBase for inbox and public feed messages – MySQL for payment and account info – Cassandra for audit and activity log • Disclaimer: I’m not making any recommendation here.
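One way to picture the example above in application code is a thin repository that routes each kind of data to its own store. This is a sketch under stated assumptions: `InMemoryStore` and `PolyglotRepository` are hypothetical classes, and in-memory dictionaries stand in for the real MongoDB, Redis, and MySQL clients, whose APIs are not shown here.

```python
class InMemoryStore:
    """Stand-in for a real data store client; the label is only descriptive."""

    def __init__(self, label):
        self.label = label
        self.items = {}

    def save(self, key, value):
        self.items[key] = value

    def load(self, key):
        return self.items.get(key)


class PolyglotRepository:
    """Routes each kind of data to the store chosen for its access pattern."""

    def __init__(self, routes):
        self.routes = routes  # kind of data -> store

    def store_for(self, kind):
        try:
            return self.routes[kind]
        except KeyError:
            raise ValueError(f"no data store configured for {kind!r}") from None

    def save(self, kind, key, value):
        self.store_for(kind).save(key, value)

    def load(self, kind, key):
        return self.store_for(kind).load(key)


repository = PolyglotRepository({
    "catalog": InMemoryStore("document store"),
    "cart": InMemoryStore("key-value store"),
    "payment": InMemoryStore("relational store"),
})

repository.save("cart", "user:1", ["A-100"])
repository.save("catalog", "A-100", {"name": "Espresso machine"})

cart = repository.load("cart", "user:1")
catalog_label = repository.store_for("catalog").label
print(cart, catalog_label)
```

Keeping the routing in one place means the rest of the application does not need to know which store backs which kind of data, which softens the learning curve the previous slide mentions.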
  • 47. NoSQL in the Cloud
  • 48. NoSQL in the Cloud • NoSQL as a commodity – Fully managed data stores (No maintenance) – Elastic scaling – Cheap storage • Featured: – Amazon AWS – Heroku Add-ons – CloudFoundry
  • 50. The A’s to the Q’s in the Abstract • What does the rise of all these NoSQL data stores mean to my enterprise? – I’m guessing a lot • What is NoSQL to begin with? – Any non-relational data store • Does it mean “NO SQL”? – No • Could this be just another fad? – I don’t think so
  • 51. The A’s to the Q’s in the Abstract • Is it a good idea to bet the future of my enterprise on these new exotic technologies and simply abandon proven mature RDBMS? – It’s up to you. I will say “No guts, no glory!” • How scalable is scalable? – However much you need it to be
  • 52. The A’s to the Q’s in the Abstract • Assuming that I am sold, how do I choose the one that fits my needs the best? – I’ll tell you if you hire me • Is there a middle ground somewhere? – Polyglot Persistence • What is this Polyglot Persistence I hear about? – It’s the middle ground