Anatomy of NoSQL Databases
Date: 11/10/13
Amit Kumar
Agenda
+Background
+What are NoSQL Databases
+Relational vs NoSQL Databases
+HBase
+Cassandra
+Design Strategies behind NoSQL Databases
Background
+Traditional Applications
Limited Data
Top priority on consistency
Focus on average latency
Ideally fit with RDBMS
Utilized the DB intrinsic features well
Good part of logic resided in DB
+Next Gen Applications
Web Scale (~infinite)
ALWAYS available
High performance in ALL cases
Data in the form of key/value pair
Logic part of Application Layer
RDBMS with Nextgen Apps – Failure
+Scale
Limit to maximum data supported
Sharding is an option, but then RDBMS features are lost
+Economy
Requires large arrays of fast, expensive disks
Very expensive
+Availability still an issue
NoSQL Databases
+Name is confusing
Not RDBMS at all
NoREL Databases a better name
+Key Value Store
+Extremely scalable
+High performance
+Always available
+Weak Consistency (CAP Theorem)
+Distributed
Use commodity hardware - Cheap
+Might not hold ACID properties
+Only for specific use cases – not a good fit for every workload
RDBMS vs NoSQL Databases
+Go for RDBMS when
Small instances of simple, straightforward systems
Joins, secondary indexing, referential integrity, group by/order by
+Go for NoSQL when
Data scale
Read/write scale
Data model is
Flexible
Semi-structured
NoSQL Current Limitations
+Maturity
+Support
+Analytics & Business Intelligence
+Administration
+Ease of Use
Some famous NoSQL Databases
+Open-source
HBase
Cassandra
Voldemort
Dynomite
Hypertable
CouchDB
VPork
MongoDB
Riak
+Closed-source
BigTable
Dynamo
PNUTS
HBase
+Based on Google BigTable
+Sparse distributed persistent multi-dimensional sorted map
+On top of Hadoop HDFS
+Master Slave Model
Single Master (SPOF)
+Especially good when
Objects are huge
Data production/consumption is distributed and is tunneled through map/reduce jobs
+Loose Data Model
Column Families
+Timestamp based versioning
+Not supported on Windows
+Major Users – Adobe, Twitter, Yahoo, Veoh, Streamy, Trend Micro
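The phrase "sparse distributed persistent multi-dimensional sorted map" with timestamp-based versioning can be pictured as nested sorted maps: row key, then column, then timestamp. The sketch below is a toy in-memory model only, not HBase code; the class name SparseSortedMap and its methods are hypothetical, and distribution and persistence are left out.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of a BigTable-style map: row -> column -> timestamp -> value (hypothetical names). */
final class SparseSortedMap {
    // Rows and columns sort lexicographically; a missing cell simply has no entry (sparse).
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            // Versions are kept newest-first, so the first entry is the latest write.
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest version of a cell, or null if the cell was never written. */
    String getLatest(String row, String column) {
        NavigableMap<Long, String> versions = versionsOf(row, column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    /** Returns the newest version written at or before the given timestamp, or null. */
    String getAsOf(String row, String column, long timestamp) {
        NavigableMap<Long, String> versions = versionsOf(row, column);
        if (versions == null) return null;
        // In descending order, ceilingEntry yields the largest timestamp <= the argument.
        Map.Entry<Long, String> hit = versions.ceilingEntry(timestamp);
        return hit == null ? null : hit.getValue();
    }

    /** Returns the smallest row key, showing that rows are kept in sorted order. */
    String firstRow() {
        return rows.isEmpty() ? null : rows.firstKey();
    }

    private NavigableMap<Long, String> versionsOf(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        SparseSortedMap table = new SparseSortedMap();
        table.put("row2", "cf:a", 1L, "old");
        table.put("row2", "cf:a", 5L, "new");
        table.put("row1", "cf:b", 3L, "x");
        System.out.println(table.getLatest("row2", "cf:a"));
        System.out.println(table.getAsOf("row2", "cf:a", 4L));
        System.out.println(table.firstRow());
    }
}
```

Strings stand in for the byte[] keys and values that HBase actually stores, purely to keep the sketch short.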
HBase Architecture & Table Structure
+Range-partitioned by row key (not Dynamo-style consistent hashing)
+Table made up of regions
Region specified by startkey and endkey
A region may live on a different node.
+Tables sorted by Rows
+Schema defines column families only
Each family consists of any number of columns
Each column consists of any number of versions
Columns within a family are sorted & stored together
+Everything except the table name is a byte[]
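Because rows are sorted and each region is bounded by a start key and an end key, finding the region for a row amounts to a "greatest start key not larger than the row key" lookup. The following is a minimal sketch under that assumption; RegionLocator, addRegion and locate are hypothetical names, not the HBase client API, and it needs Java 9+ for Arrays.compareUnsigned.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Toy region directory: maps each region's start key to the node hosting it (hypothetical names). */
final class RegionLocator {
    // Keys are byte[] compared lexicographically as unsigned bytes, mirroring sorted row keys.
    private final TreeMap<byte[], String> regionsByStartKey =
            new TreeMap<byte[], String>((a, b) -> Arrays.compareUnsigned(a, b));

    /** Registers a region; an empty start key marks the first region of the table. */
    void addRegion(String startKey, String node) {
        regionsByStartKey.put(startKey.getBytes(StandardCharsets.UTF_8), node);
    }

    /** Returns the node whose region has the greatest start key <= rowKey. */
    String locate(String rowKey) {
        Map.Entry<byte[], String> region =
                regionsByStartKey.floorEntry(rowKey.getBytes(StandardCharsets.UTF_8));
        if (region == null) {
            throw new IllegalStateException("no region covers row key " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionLocator locator = new RegionLocator();
        locator.addRegion("", "node-a");   // rows before "g"
        locator.addRegion("g", "node-b");  // rows from "g" up to "p"
        locator.addRegion("p", "node-c");  // rows from "p" onward
        System.out.println(locator.locate("apple"));
        System.out.println(locator.locate("grape"));
        System.out.println(locator.locate("zebra"));
    }
}
```

The end key of each region is implied here by the next region's start key, which is why only start keys are stored.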
Connecting to HBase
+Java Client API
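// Assumes the HBase client library and an hbase-site.xml on the classpath.
// HBase 1.0+ replaces HTable and Put.add with ConnectionFactory/Table and Put.addColumn.
Configuration config = HBaseConfiguration.create();
HTable table = new HTable(config, "table_name");
// A Put is keyed by row; each cell is addressed by column family, then column name.
Put p = new Put(Bytes.toBytes("key"));
p.add(Bytes.toBytes("columnfamily"), Bytes.toBytes("columnname"), Bytes.toBytes("value"));
table.put(p);
// Read the row back by its key.
Get g = new Get(Bytes.toBytes("key"));
Result r = table.get(g);
table.close();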
+HBase Shell
$ ${HBASE_HOME}/bin/hbase shell
hbase> describe 'table_name'
hbase> put 'table_name', 'key', 'columnfamily:columnname', 'value'
hbase> get 'table_name', 'key'
hbase> scan 'table_name'
+Thrift Gateway
+REST Gateway
+Many other non-java clients
Cassandra
+Based on Amazon Dynamo
+Open sourced by Facebook in 2008
+Peer to Peer Model
No Master Node
+Works on Windows as well
+Distributed Key/Value Store
+Configurable parameters for Consistency/Availability
+Especially suited if
Number of Objects is huge
Objects are small (<1 MB)
+Major Users: Facebook, Digg, Twitter etc.
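The "configurable parameters for Consistency/Availability" bullet comes down to simple arithmetic over the replication factor N, the number of replicas a read waits for (R), and the number a write waits for (W). A read is guaranteed to overlap the latest acknowledged write only when R + W > N. The sketch below shows that arithmetic; QuorumMath and its method names are hypothetical and are not part of any Cassandra API.

```java
/** Arithmetic behind tunable consistency in a Dynamo-style store (hypothetical helper). */
final class QuorumMath {
    /** Size of a majority quorum for the given replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must intersect every write set, i.e. R + W > N. */
    static boolean readsSeeLatestWrite(int n, int r, int w) {
        return r + w > n;
    }

    /** Number of replicas that may be down while an operation needing `required` acks still succeeds. */
    static int toleratedFailures(int n, int required) {
        return n - required;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        // Quorum reads plus quorum writes: consistent, and one replica may be down.
        System.out.println(readsSeeLatestWrite(n, q, q) + " " + toleratedFailures(n, q));
        // Single-replica reads and writes: more available and faster, but reads may be stale.
        System.out.println(readsSeeLatestWrite(n, 1, 1) + " " + toleratedFailures(n, 1));
    }
}
```

Lowering R and W favors availability and latency; raising them favors consistency, which is the trade-off the slide refers to.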
NoSQL Databases – Assumptions
+Data size is huge
System must partition its data across multiple nodes
+Reliable
Data must be safe even when disks and nodes fail
System must replicate data
+Performance
Needs to perform well on cheap hardware and maintain low latency ALWAYS
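The first two assumptions (partition data across nodes, replicate it so failures are survivable) are commonly met together with a consistent-hash ring, the approach described in the Dynamo paper. Below is a minimal sketch of that idea, assuming MD5 as the hash and a fixed number of virtual nodes per server; ConsistentHashRing and its methods are hypothetical names, not taken from any product.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Toy consistent-hash ring with virtual nodes and replica placement (hypothetical names). */
final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    /** Places the node at several pseudo-random positions to even out the load. */
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    /** Walks clockwise from the key's position and returns up to `replicas` distinct nodes. */
    List<String> nodesFor(String key, int replicas) {
        Set<String> owners = new LinkedHashSet<>();
        long position = hash(key);
        collect(ring.tailMap(position, true).values(), owners, replicas);
        collect(ring.headMap(position, false).values(), owners, replicas); // wrap around
        return new ArrayList<>(owners);
    }

    private static void collect(Iterable<String> candidates, Set<String> owners, int replicas) {
        for (String node : candidates) {
            if (owners.size() >= replicas) return;
            owners.add(node);
        }
    }

    /** First 8 bytes of the MD5 digest, read as a signed long ring position. */
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required by this sketch", e);
        }
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(16);
        ring.addNode("node-a");
        ring.addNode("node-b");
        ring.addNode("node-c");
        System.out.println(ring.nodesFor("user:42", 2));
    }
}
```

Removing a node only reassigns the keys that node owned; keys whose owners are all still present keep the same placement, which is what makes the scheme attractive when nodes fail.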
NoSQL Databases – Design Strategies
+Complex Distributed System
+Partitioning
Consistent Hashing
+Consistency
Eventual Consistency
Vector Clocks
+Data Models
Primary Key -> Value
Value can be semi-structured
Multi-version Storage
+Storage Layouts
Column storage with Locality groups
Log structured Merge Trees
+Cluster Management
Peer to Peer vs Master/Slave approach
Gossip
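Of the strategies listed, vector clocks are the least self-explanatory: each version of a value carries a per-node counter map, and two versions conflict when neither map dominates the other. The sketch below illustrates the comparison and merge rules under that definition; the VectorClock class and its method names are hypothetical, not the API of any of the databases above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy immutable vector clock for detecting concurrent updates (hypothetical names). */
final class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Long> counters = new HashMap<>();

    /** Returns a copy of this clock with the given node's counter advanced by one. */
    VectorClock increment(String node) {
        VectorClock next = new VectorClock();
        next.counters.putAll(counters);
        next.counters.merge(node, 1L, Long::sum);
        return next;
    }

    /** Compares this clock with another; CONCURRENT means the versions conflict. */
    Order compare(VectorClock other) {
        boolean someSmaller = false;
        boolean someLarger = false;
        Set<String> nodes = new HashSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        for (String node : nodes) {
            long mine = counters.getOrDefault(node, 0L);
            long theirs = other.counters.getOrDefault(node, 0L);
            if (mine < theirs) someSmaller = true;
            if (mine > theirs) someLarger = true;
        }
        if (someSmaller && someLarger) return Order.CONCURRENT;
        if (someSmaller) return Order.BEFORE;
        if (someLarger) return Order.AFTER;
        return Order.EQUAL;
    }

    /** Element-wise maximum, used once conflicting versions have been reconciled. */
    VectorClock merge(VectorClock other) {
        VectorClock merged = new VectorClock();
        merged.counters.putAll(counters);
        other.counters.forEach((node, count) -> merged.counters.merge(node, count, Long::max));
        return merged;
    }

    public static void main(String[] args) {
        VectorClock base = new VectorClock().increment("A");
        VectorClock viaA = base.increment("A"); // update coordinated by node A
        VectorClock viaB = base.increment("B"); // independent update coordinated by node B
        System.out.println(base.compare(viaA));
        System.out.println(viaA.compare(viaB));
        System.out.println(viaA.merge(viaB).compare(viaA));
    }
}
```

Under eventual consistency, a CONCURRENT result is the signal that the application (or a last-write-wins rule) has to reconcile the two versions.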
References
+Bigtable: A Distributed Storage System for Structured Data
http://labs.google.com/papers/bigtable-osdi06.pdf
+Dynamo: Amazon's Highly Available Key-value Store
http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
+NOSQL debrief, June 2009
http://static.last.fm/johan/nosql-20090611/intro_nosql.pdf
http://static.last.fm/johan/nosql-20090611/hbase_nosql.pdf
http://static.last.fm/johan/nosql-20090611/cassandra_nosql.ppt
+NoSQL Databases Official Site
http://nosql-database.org
+HBase – Hadoop Wiki
http://wiki.apache.org/hadoop/Hbase
+Apache Cassandra Wikipedia
http://en.wikipedia.org/wiki/Apache_Cassandra
Questions + Answers
Thank You

Editor's Notes

  1. DB features like joins, db links, constraints, streams,