New database technologies appear with every iteration cycle, so it is important to understand whether you are using the right tool for the job. Each database technology has its own benefits and drawbacks. In this presentation, we discuss a variety of database platforms and as-a-service offerings and help break down which database solution might best accomplish your goals.
What Kind of Relationship Are You Seeking With Your Database?
1. January 20, 2015
Sean Anderson
Manager, Data Services
@seanandersonBD
Making choices:
What kind of relationship are you seeking
with your database?
2. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
What are we going to talk about today?
•Databases are complicated tools
•There are numerous choices
– How did we get here?
•Understanding some of our choices
– SQL: Relational
– MongoDB: Documents
– Redis: Key-value
– Hadoop: Large distributed files
•How should I think about managing them?
3. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Common advice these days from smart people
8. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
App Development is Changing
• Traditional apps (CRM, HR, Finance apps)
– Infrastructure: custom-built for the app
– Data trend: mostly resides on premises
• Modern apps (mobile, social, media, games)
– Infrastructure: programmable by the app
– Data trend: mostly resides on cloud
9. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Applications are becoming systems of engagement
• Traditional apps (CRM, HR, Finance apps): Systems of Record
– Characteristics of the system: highly structured, slow to change, transactional, stable, core to the business, not very social
– Data trend: mostly resides on premises
• Modern apps (mobile, social, media, games): Systems of Engagement
– Characteristics of the system: loosely structured, quick to adapt, conversational, dynamic and in flux, edge of the business, fundamentally social
– Data trend: mostly resides on cloud
10. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
MEDIA GAMING M2M MOBILE SOCIAL
SOME UNIQUE SCENARIOS
Cloud scale and fast growth
High speed data retrieval needs
Frequently written, rarely read
Binary files
Short term data
Multi-location access
Zero downtime needs
Dynamic or object oriented models
Trying to avoid RAID / storage limits
Large files
We are building different kinds of applications
11. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Source: “15 Years of Hard Drive History: Capacities outran performance” (November 27, 2006)
http://www.tomshardware.com/reviews/15-years-of-hard-drive-history,1368-6.html
In the 15 year period before 2006, storage density increased 10,000x,
but performance only increased about 100x
12. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
As a result, a revolution ensued in the world of Data Services
Polyglot persistence is here to stay: there are about 150+ choices just in the “NoSQL” subset
13. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Two key issues
How do you ensure
best fit for your app?
What is the long term
view of your relationship
with your database?
15. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Understand the personality of your database
Let’s use these examples
• Relational (SQL): data integrity, SQL
• Documents (MongoDB): flexible schema, scale
• Key-value (Redis): fast retrieval, data structures
• Distributed large sets (Hadoop): distributed processing, Big Data
16. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Relational databases (SQL)
They literally saved the world from running on paper
Strengths
• Data integrity through data types and semantic rules
• AGE >= 0
• Person must have a NAME
• Querying
• Aggregation
• SQL
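The integrity rules and aggregation listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the person table, its columns, and the sample rows are hypothetical and do not come from the presentation.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT    NOT NULL,                  -- a person must have a NAME
        age  INTEGER NOT NULL CHECK (age >= 0)  -- AGE >= 0
    )
    """
)
conn.execute("INSERT INTO person (name, age) VALUES (?, ?)", ("Ada", 36))
conn.execute("INSERT INTO person (name, age) VALUES (?, ?)", ("Grace", 45))


def try_insert(name, age):
    """Return True if the row was accepted, False if a rule rejected it."""
    try:
        conn.execute("INSERT INTO person (name, age) VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        return False


rejected_negative_age = not try_insert("Bob", -1)   # violates AGE >= 0
rejected_missing_name = not try_insert(None, 30)    # violates NOT NULL

# Aggregation: count the rows and average the ages that passed the rules.
row_count, average_age = conn.execute(
    "SELECT COUNT(*), AVG(age) FROM person"
).fetchone()
```

The database itself refuses the invalid rows, so every application that writes to the table gets the same guarantees without re-implementing the checks.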
“Weaknesses”
• Complex development, as developers must map the
relational model to object-oriented code
• Complexity grows exponentially as the relational
model grows
• Difficult to scale
• Expensive (hardware, software)
If your operation depends on the integrity
of your business rules, the relational
model rules.
Scaling is a little difficult and
performance is key.
17. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
• Allow new data without a defined schema
• Designed for scale
• Faster, agile development
• Databases in the cloud!
The complexities of relational databases led to NoSQL
19. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
•Leading NoSQL database
•Open Source
•Agility and flexibility (no set schema)
•Better fit to modern development methodologies
•New types of records (fields) are added easily
•Imagine it like a folder you add pages to
MongoDB has emerged as a leader in Document databases
20. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
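// Insert one document; the friends collection is created on first write.
db.friends.insertOne(
  {
    name: "J.R.",
    email: "email@rackspace.com",
    twitter_handle: "jrarredondo",
    teams: [ "Mariners", "Rangers" ],
    group: 1
  }
)
// Index the group field (createIndex replaces the deprecated ensureIndex).
db.friends.createIndex( { group: 1 } )
// Query: all friends whose group is greater than 0.
var myCursor = db.friends.find( { group: { $gt: 0 } } )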
• Document databases and collections
• Indexes
• Rich query language
• Replication (transparent to the app)
– Writes to primary ensure consistency
– Configurable reads to secondaries to help performance
– Eventual consistency on secondary reads
– Election on failures of primary nodes
– Configurable write concerns for flexible write guarantees
depending on app needs
• Shards for horizontal scaling
– Shard Key used to partition data based on ranges or hashes
– Partition strategy depends on how evenly you want data
distributed, and the nature of your queries (single vs. ranges)
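The trade-off between range-based and hash-based shard keys can be sketched in a few lines of Python. This is a toy model, not MongoDB's actual chunk or hashing logic: the shard names, the range boundaries, and the use of MD5 are all assumptions made for illustration.

```python
import bisect
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

# Range partitioning: each shard owns a contiguous range of key values.
# Exclusive upper bounds for shard0 and shard1; shard2 takes the rest.
RANGE_BOUNDS = [100, 200]


def range_shard(key: int) -> str:
    """Pick the shard whose range contains the key."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, key)]


def hash_shard(key: int) -> str:
    """Pick a shard from a hash of the key, ignoring key order."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


# A range query over neighbouring keys touches a single shard...
range_targets = {range_shard(k) for k in range(10, 20)}
# ...while hashing spreads a large set of keys across all shards.
hash_targets = {hash_shard(k) for k in range(1000)}
```

Range keys keep neighbouring values together, which suits range queries but can concentrate load on one shard. Hashed keys distribute writes more evenly but turn a range query into a request to every shard.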
MongoDB
21. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Flexibility of data model (and its problems) with document databases
Appboy: App marketing automation platform for mobile apps
Courtesy of Jon Hyman, CIO and Co-Founder of Appboy
23. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
MySQL and MongoDB together
• Heavily used during weekends and at
night
• Complex SQL queries
• “What are my friends drinking?”
• “Where can I find this beer?”
A social discovery and sharing
network for beer drinkers
24. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
MySQL and MongoDB together
What works best for the workflow?
- MySQL worked best for
reference data for us
- Not everything moved to
MongoDB
What stayed in MySQL?
Check-ins
Users
Relationships Data
Primary Datastore
What moved to MongoDB?
Activity Feed (Friend’s Graph)
Recommendation Data
Location-based Check-ins
25. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
• Think about it as a single huge hash table
• Simple concepts
– GET / SET / DELETE <data> based on some <key>
• High performance, in memory
• Persistence
– Point-in-time Snapshots
– Append only / Journal
• Partitioning
– Redis Cluster (future)
– Proxy-based solutions such as Twemproxy
Key-value stores: Redis
Key Value
<key> <value>
<key> <value>
<key> <value>
<key> <value>
26. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
• Volatile keys: automatic expiration of keys
– SET <key> <value> EX <seconds>
– SETEX <key> <seconds> <value>
• Data structures
– LISTS, SETS / SORTED SETS, HASHES
• Publish / Subscribe
– SUBSCRIBE <channel>
– PUBLISH <channel> <message>
• Transactions (*)
– MULTI
• Commands to be executed as a single, atomic isolated operation
– EXEC / DISCARD
– (*) Warning: VERY different behaviors than in SQL
• Eviction policies
– Useful to implement Least Recently Used caches
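Volatile keys and LRU eviction, two of the ideas listed above, can be sketched as a small in-memory cache in Python. This is a toy model and not how Redis is implemented (Redis approximates LRU and also expires keys in the background); the TinyCache class, its method names, and the session keys are hypothetical.

```python
import time
from collections import OrderedDict


class TinyCache:
    """Toy key-value store with per-key expiry and LRU eviction."""

    def __init__(self, max_keys, clock=time.monotonic):
        self.max_keys = max_keys
        self.clock = clock            # injectable so expiry can be tested
        self.data = OrderedDict()     # key -> (value, expires_at or None)

    def set(self, key, value, ex=None):
        """Store a value; ex is an optional time-to-live in seconds."""
        expires_at = None if ex is None else self.clock() + ex
        self.data[key] = (value, expires_at)
        self.data.move_to_end(key)    # most recently used keys sit at the end
        while len(self.data) > self.max_keys:
            self.data.popitem(last=False)   # evict the least recently used key

    def get(self, key):
        if key not in self.data:
            return None
        value, expires_at = self.data[key]
        if expires_at is not None and self.clock() >= expires_at:
            del self.data[key]        # lazily expire on access
            return None
        self.data.move_to_end(key)    # mark as recently used
        return value


now = [0.0]                                   # fake clock for the example
cache = TinyCache(max_keys=2, clock=lambda: now[0])
cache.set("session:1", "alice", ex=30)        # volatile key, 30-second TTL
cache.set("session:2", "bob")
cache.get("session:1")                        # session:2 is now the LRU key
cache.set("session:3", "carol")               # cache is full: evicts session:2
now[0] = 31.0                                 # session:1 has now expired
```

After these steps only session:3 is still readable: session:2 was evicted to make room, and session:1 passed its time-to-live.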
Key-value stores: Redis
http://robots.thoughtbot.com/redis-pub-sub-how-does-it-work
27. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Cache
Making another application better
Data
Structures
(Example: Leaderboards!)
LISTS
SETS
SORTED SETS
HASHES
Redis Scenarios
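The leaderboard scenario maps naturally onto a sorted set: unique members, each with a score, read back in score order. The Python sketch below imitates that behaviour in memory only. The Leaderboard class and the player names are hypothetical, and the comments name the Redis commands each method loosely resembles (ZADD, ZINCRBY, ZREVRANGE) without calling Redis.

```python
class Leaderboard:
    """Toy stand-in for a sorted set: unique members ordered by score."""

    def __init__(self):
        self.scores = {}

    def add(self, member, score):
        # Loosely like ZADD: insert the member or overwrite its score.
        self.scores[member] = score

    def increment(self, member, amount):
        # Loosely like ZINCRBY: add to the score, starting from 0 if missing.
        self.scores[member] = self.scores.get(member, 0) + amount
        return self.scores[member]

    def top(self, n):
        # Loosely like ZREVRANGE ... WITHSCORES: highest scores first.
        # Ties are broken by member name here, a choice of this sketch.
        ranked = sorted(self.scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


board = Leaderboard()
board.add("ana", 120)
board.add("bo", 95)
board.add("cy", 150)
bo_score = board.increment("bo", 40)
top_two = board.top(2)
```

This sketch sorts on every read. A real sorted set keeps members ordered as they are written, which is why the structure suits frequently updated rankings.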
28. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
• Full text search based on Apache Lucene
• Runs alongside MongoDB, Hadoop, MySQL, and many other databases
• Allows for quick full text search of your data set
• Highly-Available by default
• Optimized Hardware
Elasticsearch
29. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
• MongoDB Users
• Hadoop Users
• JSON formatted Databases
• Users with Large Data Sets
Any current ObjectRocket MongoDB or Rackspace Cloud Big Data customer will be able to connect to
Elasticsearch through a simple, documented tool.
Who might want to use Elasticsearch?
Klout: If you’re one of the many household names using Klout to create campaigns that target social influencers, you’re using Elasticsearch. Elasticsearch made it possible for Klout to provide their forthcoming self-service option to their customers, which Klout predicts will allow them to at least double their current revenues.
XING: XING is the leading business social network in Europe, with half its users located in Germany and the other half throughout the rest of Europe, Asia and Australia. XING has called their relationship with Elasticsearch a strategic partnership, far beyond a simple customer and service provider relationship. We’ve forged these deep ties with our customer by enabling XING to keep their users’ updates flowing in real-time.
GitHub: Elasticsearch empowers GitHub’s 4 million ‘social coders’ through providing search across GitHub’s 8 million + code repositories. The GitHub team also makes use of Elasticsearch to monitor for abuse using some fairly clever logging hacks.
30. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
“Big Data”: generating insights with Hadoop
V3C: Volume, Velocity, Variety, Complexity
Mining social data for sentiment
Analyzing web clickstreams
Analyzing log data for security breaches
Telemetry from sensors and machines
eCommerce predictive analytics
31. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Fundamentals of Hadoop v1
Service layers: Data Services, Core Services, Operational Services
• HDFS: distributed file system
• HBase: distributed, scalable, non-relational database
• HCatalog: metadata and table management system
• Pig: data flow scripting language
• Hive: data warehouse analysis layer through HiveQL (SQL-like) queries
• MapReduce: data processing framework
• Ambari: installation, monitoring, administration
• Oozie: workflow and job scheduling
• Zookeeper: configuration, sync and naming registry
• Falcon: data pipeline framework
• Knox: auth and access
• Flume: log data aggregation and movement
• Sqoop: bulk data transfer from and to relational DB
32. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
MapReduce
…
Large, distributed
files
Algorithm
MAP
REDUCE
MAP MAP MAP MAP MAP
It’s more efficient to send
the algorithm to the data,
than moving data to the
algorithm
REDUCE
Partial answers
Answer
Simple example: how many times does
each word appear in all files?
mapper (filename, file-contents):
    for each word in file-contents:
        emit (word, 1)
reducer (word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit (word, sum)
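The word-count pseudocode can be run as ordinary Python on a single machine. This sketch only mimics the map, shuffle, and reduce phases in memory and does not use Hadoop. The map_reduce driver, the lowercasing of words, and the sample file names and contents are assumptions added for illustration.

```python
from collections import defaultdict


def mapper(filename, file_contents):
    """Emit a (word, 1) pair for every word in one file."""
    for word in file_contents.split():
        yield word.lower(), 1


def reducer(word, values):
    """Sum the partial counts collected for one word."""
    return word, sum(values)


def map_reduce(files):
    """Run the mapper over every file, group by key, then reduce each group."""
    grouped = defaultdict(list)            # the "shuffle" step
    for filename, contents in files.items():
        for word, count in mapper(filename, contents):
            grouped[word].append(count)
    return dict(reducer(word, values) for word, values in grouped.items())


files = {  # hypothetical in-memory "files"
    "a.txt": "the quick brown fox",
    "b.txt": "the lazy dog and the fox",
}
counts = map_reduce(files)
```

In a real cluster each mapper would run on the node that already holds its file block, and only the small (word, count) pairs would cross the network to the reducers.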
35. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Really understand the personality of your database
First impressions can be deceiving
“Redis is ‘just a cache’”
• SET
• GET
Redis is a server for data structures
• Strings
• Hashes
• Lists
• Sets / Sorted Sets
• Publish / Subscribe
Huge difference!
36. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Focus on the tradeoffs
• SQL: data integrity, business rules, consistency, transaction isolation, atomicity ... and rigidity
• NoSQL: flexibility of schema, dynamic data models, horizontal scale, easier to get started ... and inconsistency of data
37. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Understand the personality of your database
Let’s use these examples
Relational (SQL), Documents (MongoDB), Key-value (Redis), Distributed large sets (Hadoop)
Customer contact
Reference data
Order Details
(Ship To, Bill To
SKU, Quantity, Price)
Billing transactions
Inventory
Prices
Member Info (user, pwd)
Customer relationships
Notes / Social
Partitions (shards)
Promotional materials
Dynamic schemas
Statements
Product Catalog, Images
Product Configuration
Personalized catalog
Member Comments
Product Reviews
Product Q&As
Session info
Cart
Recent orders
Home page info
Latest comments
Recommendations
Product “stars”
Upsell/Cross sell
Customer attributes
(non personally
identifiable information,
geo)
Sales history
Churn info
Price history
Social info
Comments “NPS”
Recommendations
All kinds of analysis
38. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
It’s good to understand the fundamental “theory”
What does your problem really need?
ACID
• Atomicity: A transaction either happens
completely, or not at all
– No partial transactions
• Consistency: Transactions end in a “valid” state
– No violation of rules
• Isolation: Transaction appears as if it is the only
thing happening to the database
– Relaxed most times
– Deals with phantom reads, dirty reads, and non-repeatable reads
• Durability: Committed transactions are permanent
– Even after failure
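Atomicity and consistency can be seen in a small transfer example using Python's built-in sqlite3 module. This is a sketch under assumptions: the account table, the names, the balances, and the transfer helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(amount, source, target):
    """Move money in one transaction; return True only if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(30, "alice", "bob")        # valid: both updates commit
failed = transfer(500, "alice", "bob")   # debit breaks balance >= 0: rolled back
```

In the failed transfer the credit to bob has already been applied when the debit violates the rule. The rollback undoes it, so no partial transaction is ever visible.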
BASE
• Basically available:
– Supporting partial failures without complete system
failure
– Design as if users would end up in different partitions
• Soft state:
– Things can be in flux for a little bit of time
• Eventual consistency:
– Things right themselves
http://queue.acm.org/detail.cfm?id=1394128
New ways of thinking:
Do customers really need to know the level of
inventory of a product to place an order? Maybe
all they want is to know that it is not zero
39. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Know your CAP, really
Consistency, Availability and Partition Tolerance
You can only have 2 out of 3 in CAP!
• Partitions are not generally common
• Choosing Consistency or Availability is not final
• “It depends”
– Maybe on user
– Maybe on system
– Maybe on type of data
• Just think:
– How am I going to detect a problem in the network? (P)
– How am I going to limit operations once I detect that?
– How am I going to compensate to recover?
Wait! It’s not that simple
Hurst 2010 (http://blog.nahurst.com/visual-guide-to-nosql-systems)
Eric Brewer 2012 (http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed)
40. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
• Stability
• Fit for core scenarios
• Configurability to different scenarios
• Integration with development languages
• Integration with other databases
• SQL compatibility
• End user vs. Developer skillset
• Conceptual changes
• Platform availability
• Data type and semantic needs
• Security
The “ilities” and their cousins
These are some of the challenges indirectly related to data that we must deal with
• Performance
• Scalability
• Consistency
• Resiliency
• Data model
• Flexibility
• Cost
• Training
• Tools availability
• Development experience
41. Our vision is Data as a Service
From databases to data as a service
42. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Two key issues
How do you ensure
best fit for your app?
What is the long term
view of your relationship
with your database?
43. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Data-as-a-Service: more time building,
less time managing databases
Four levels of DaaS transparency
Source: “Choosing The Right Cloud Provider” (December 5, 2013)
http://www.rackspace.com/blog/choosing-the-right-cloud-provider-for-your-mongodb-database/
• For some businesses, database or infrastructure
management IS the core of the business
• For most software-based businesses, database or
infrastructure management represents time and
resources not spent building the application
• You must answer for yourself: are you in the
business of managing infrastructure, or in the
business of [your market here]?
More time
spent
building
the app
44. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
From Database-as-a-Service to Data-as-a-Service
Focus on building your app, not managing databases
Manage hardware infrastructure
Manage software infrastructure
(i.e. databases)
Build your application
(i.e. game, startup, mobile app, site)
YOU WANT TO BE
FOCUSED HERE
This is the only job that YOU MUST
DO without anybody’s help because
this is your intellectual property
YOU DON’T WANT TO HAVE
TO MANAGE DATABASES
OR SERVERS
It only takes away from time
building your application
Highest value activity for your application
45. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Data
as a service
The next vision for databases: Data-as-a-Service
Applications just access the data as a service, while the database is transparent
The app just
interacts with
THE DATA
The application does not see the
infrastructure
Towards transparent databases
hostname, port number
Build your application
(i.e. game, startup, mobile app, site)
YOU WANT TO BE
FOCUSED HERE
This is the only job that YOU MUST
DO without anybody’s help because
this is your intellectual property
Highest value activity for your application
46. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Public Cloud
Managed
Cloud
Your Private
Cloud on
prem
Private
Cloud
Data has mass and gravity: you need choices for your hybrid app
(Or: “Divorces are expensive”)
47. RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Rackspace Offerings for the Data Tier
Infrastructure
For Data
Managed
Offerings of Most
Popular
Big Data, SQL, &
NoSQL Databases
Managed
Database Services
for Production
Apps
Cloud IaaS
Get started fast
Dedicated Hosting
Predictable costs &
performance
OnMetal
Cloud Elasticity & Dedicated
Performance
• Automatic DBA: Sharding, Backup, & HA
• Entire Stack Optimized on Bare Metal
• Supported 24x7x365 by experts
• More than MongoDB
• Architecture & Design
• Tuning & Monitoring
• 24 x 7 x 365 Support
• Cost Effective
DBA Services