The document discusses the history and concepts of NoSQL databases. It begins with the hype around NoSQL and then gives a brief history of database models from the 1960s to today. It covers key concepts such as the CAP theorem, BASE, eventual consistency, and polyglot persistence, describes common anti-patterns that arise when relational databases are used for certain tasks, and proposes NoSQL alternatives. Overall, it is an overview of NoSQL databases that weighs benefits against tradeoffs relative to relational databases.
4. Models
• Hierarchical (IMS): late 1960’s and 1970’s
• Directed graph (CODASYL): 1970’s
• Relational: 1970’s and early 1980’s
• Entity-Relationship: 1970’s
• Extended Relational: 1980’s
• Semantic: late 1970’s and 1980’s
• Object-oriented: late 1980’s and early 1990’s
• Object-relational: late 1980’s and early 1990’s
• Semi-structured (XML): late 1990’s to late 2000’s
• The next big thing: ???
ref: What Goes Around Comes Around, by Michael Stonebraker and Joey Hellerstein
Wednesday, September 8, 2010
15. Anti Patterns
• Evolution from SQL Anti Patterns (NoSQL:br May 2010)
• More than just RDBMS
• Large volumes of data
• Distribution
• Architecture
• Research on other tools
• Message Queues, DHT, Job Schedulers, NoSQL
• Indexing, Map/Reduce
16. RDBMS Anti Patterns
Not everything fits in a relational database, single-node or distributed
• The eternal table-as-a-tree
• Dynamic table creation
• Table as cache
• Table as queue
• Table as log file
• Stoned Procedures
• Row Alignment
• Extreme JOINs
• Your schema must be printed on an A3 sheet
• Your ORM issues full queries for dataset iterations
17. Doing it wrong, Junior !
18. The eternal tree
Problem: Most threaded-discussion examples use a single table
that holds all threads and replies, related to each other by an id.
Usually the developer ends up writing their own binary-tree
variant to manage this mess.
id - parent_id - author - text
1 - 0 - gleicon - hello world
2 - 1 - elvis - shout !
Alternative: Document storage:
{ thread_id: 1, title: 'the meeting', author: 'gleicon', replies: [
    {
      author: 'elvis', text: 'shout', replies: [{...}]
    }
  ]
}
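A minimal sketch of how such a nested thread document could be walked in application code, with no parent_id bookkeeping. The sample data (including the second reply) and the helper names count_replies and flatten are made up for illustration; a real document store would return a structure of roughly this shape.

```python
# Hypothetical thread document, shaped like the slide's example.
thread = {
    "thread_id": 1,
    "title": "the meeting",
    "author": "gleicon",
    "replies": [
        {"author": "elvis", "text": "shout", "replies": [
            # Invented nested reply, only to show a second level.
            {"author": "gleicon", "text": "calm down", "replies": []},
        ]},
    ],
}


def count_replies(node):
    """Count every reply below a node by recursing into 'replies'."""
    return sum(1 + count_replies(child) for child in node.get("replies", []))


def flatten(node, depth=0):
    """Yield (depth, author, text) for each reply, in display order."""
    for child in node.get("replies", []):
        yield depth, child["author"], child["text"]
        yield from flatten(child, depth + 1)


if __name__ == "__main__":
    for depth, author, text in flatten(thread):
        print("  " * depth + f"{author}: {text}")
```

The whole discussion is read in one fetch, and the tree structure is simply the nesting of the document.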
19. Dynamic table creation
Problem: To avoid huge tables, one must come up with a "dynamic
schema". For example, let's think about a document
management company that is adding new facilities across the
country. For each storage facility, a new table is created:
item_id - row - column - stuff
1 - 10 - 20 - cat food
2 - 12 - 32 - trout
Now you have to come up with "dynamic queries", which will
probably query a "central storage" table and issue a huge join to
check whether you have enough cat food across the country.
Alternatives:
- Document storage, modeling a facility as a document
- Key/Value, modeling each facility as a SET
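A rough sketch of the "facility as a document" alternative, using plain Python dictionaries as a stand-in for a document store. The facility names and the helper stock_by_item are hypothetical; the item fields follow the columns shown on the slide.

```python
from collections import Counter

# Hypothetical documents: one per storage facility, instead of one table each.
facilities = [
    {"facility": "facility_a", "items": [
        {"row": 10, "column": 20, "stuff": "cat food"},
        {"row": 12, "column": 32, "stuff": "trout"},
    ]},
    {"facility": "facility_b", "items": [
        {"row": 3, "column": 7, "stuff": "cat food"},
    ]},
]


def stock_by_item(docs):
    """Count stored items per kind across all facility documents."""
    totals = Counter()
    for doc in docs:
        for item in doc["items"]:
            totals[item["stuff"]] += 1
    return totals


if __name__ == "__main__":
    print(stock_by_item(facilities)["cat food"])
```

Adding a facility means adding a document, not creating a table, and the country-wide check is a single pass over the documents rather than a generated join.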
20. Table as cache
Problem: Complex queries demand that a result be stored in a
separate table, so it can be queried quickly. Worse than views.
Alternatives:
- Really ?
- Memcached
- Redis + AOF + EXPIRE
- De-normalization
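To illustrate what a cache with expiration gives you over a cache table, here is a small in-memory stand-in. It is not Memcached or Redis; TTLCache and its set/get methods are hypothetical names, and the injectable clock exists only to make the behavior easy to check.

```python
import time


class TTLCache:
    """In-memory sketch of a key/value cache whose entries expire."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be simulated
        self._data = {}       # key -> (value, expires_at)

    def set(self, key, value, ttl):
        """Store value under key for ttl seconds."""
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key):
        """Return the cached value, or None if missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazy purge on read
            return None
        return value


if __name__ == "__main__":
    cache = TTLCache()
    cache.set("SELECT ... expensive query", [("row", 1)], ttl=60)
    print(cache.get("SELECT ... expensive query"))
```

Stale results disappear on their own, so nobody has to write the purge job that a cache table eventually needs.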
21. Table as queue
Problem: A table which holds messages to be completed.
Worse, they must be ordered by time of creation.
Corollary: Job Scheduler table
Alternatives:
- RestMQ, Resque
- Any other message broker
- Redis (LISTS - LPUSH + RPOP)
- Use the right tool
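The LPUSH + RPOP pairing above amounts to a FIFO queue: producers push on the left, consumers pop from the right. The sketch below mimics those two operations in memory with a deque; ListQueue is a hypothetical stand-in, not a Redis client.

```python
from collections import deque


class ListQueue:
    """In-memory stand-in for a list used as a queue (LPUSH + RPOP)."""

    def __init__(self):
        self._items = deque()

    def lpush(self, message):
        """Push on the left end; returns the new length of the list."""
        self._items.appendleft(message)
        return len(self._items)

    def rpop(self):
        """Pop from the right end; returns None when the list is empty."""
        return self._items.pop() if self._items else None


if __name__ == "__main__":
    jobs = ListQueue()
    jobs.lpush("resize image 1")
    jobs.lpush("resize image 2")
    print(jobs.rpop())  # the oldest message comes out first
```

Ordering by time of creation falls out of the data structure itself, with no ORDER BY on a timestamp column and no rows to mark as processed.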
22. Table as log file
Problem: A table in which data gets written as a log file. From
time to time it needs to be purged. Truncating this table once a
day usually is the first task assigned to new DBAs.
Alternatives:
- MongoDB capped collection
- Redis, and the RRD pattern
- Riak
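The idea behind a capped collection or an RRD-style store is a fixed-size log that overwrites its oldest entries. A minimal in-memory sketch of that behavior follows; CappedLog is a hypothetical name and this is not the MongoDB API.

```python
from collections import deque


class CappedLog:
    """Fixed-size log: once full, each append drops the oldest entry."""

    def __init__(self, max_entries):
        self._entries = deque(maxlen=max_entries)

    def append(self, line):
        self._entries.append(line)

    def tail(self, n=None):
        """Return the last n entries (all of them when n is None)."""
        items = list(self._entries)
        if n is None:
            return items
        return items[max(len(items) - n, 0):]


if __name__ == "__main__":
    log = CappedLog(max_entries=3)
    for i in range(5):
        log.append(f"event {i}")
    print(log.tail())
```

Because the size is bounded by construction, there is no daily truncate job to hand to the new DBA.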
23. Stoned procedures
Problem: Stored procedures hold most of your application's
logic. Also, some triggers are used to - well - trigger important
data events.
Stored procedures and triggers have the magic property of
vanishing from our memories and being impossible to keep versioned.
Alternative:
- Now be careful not to use map/reduce as modern
stoned procedures. It is unfit for real-time search/processing
- Use your preferred language for business stuff, and leave event
handling to pub/sub or message queues.
24. Row Alignment
Problem: Extra columns are created but not used, just in case.
Usually they are named a1, a2, a3, a4 and called padding.
There's good will behind that, especially when version 1 of the
software needed an extra column in a 150M-row database and
it took 2 days to run an ALTER TABLE. But that's no excuse.
Alternative:
- Quit being cheap. Quit feeling 'hacker' about padding
- Document-based databases such as MongoDB and CouchDB have
no schema. New attributes are local to the document and can be
added easily.
25. Extreme JOINs
Problem: Business stuff modeled as tables. Table inheritance
(Product -> SubProduct_A). To find the complete data for a user
plan, one must issue gigantic queries with lots of JOINs.
Alternative:
- Document storage, such as MongoDB, might help by keeping
important information together.
- De-normalization
- Serialized objects
26. Your schema fits on an A3 sheet
Problem: Huge data schemas are difficult to manage. Extreme
specialization creates tables which converge to a key/value
model. Normal forms get priority over common sense.
Product_A Product_B
id - desc id - desc
Alternatives:
- De-normalization
- Another schema?
- Document store for flattening model
- Key/Value
- See 'Extreme JOINs'
27. Your ORM ...
Problem: Your ORM issues full queries for dataset iterations,
your ORM maps and creates tables which mimic your classes,
even the inheritance, and the performance is bad because the
queries are huge, etc.
Alternative:
- Apart from denormalization and good old common sense, keep in
mind that ORMs are trying to bridge two models with an impedance mismatch.
- Nothing in the relational model maps cleanly to
classes and objects, not even the basic unit, which is the
domain (set) of each column. Black magic?
28. No silver bullet
- Think about data handling and your system architecture
- Think outside the norm
- De-normalize
- Simplify
- Know stuff (message queues, NoSQL, DHT)
29. Cycle of changes - Product A
1. There was the database model.
2. Then the cache was needed. Performance was no good.
3. Cache key: query, value: resultset
4. High or nonexistent expiration time [w00t]
(Now there's a turning point. Data didn't need to change often.
Denormalization was a given with cache)
5. The cache needs to be warmed or the app won't work.
6. Key/Value storage was a natural choice. No data on MySQL
anymore.
30. Cycle of changes - Product B
1. Postgres DB storing crawler results.
2. There was a counter in each row, and updating this counter
caused contention errors.
3. Memcache for reads. Performance is better.
4. First MongoDB test, no more deadlocks from counter update.
5. Data model was simplified, the entire crawled doc was
stored.
31. Stuff to think about
Consider whether the data you use isn't already de-normalized
somewhere (cached).
Most of the anti-patterns signal that there are architectural
issues instead of only database issues.
The NoSQL route (or at least a partial NoSQL route) may
simplify things.
Are you dependent on cache? Does your application fail when
there is no cache? Does it just slow down?
Think about how you put your data into the database and how
you get it back (be it SQL or NoSQL).
45. BASE
ref: BASE: An Acid Alternative, by Dan Pritchett
46. Basically
Available
Soft state
Eventually consistent
47. Eventual
Consistency
ref: Eventually Consistent, by Werner Vogels
48. "eventual" in English:
will happen at some
point
"eventual" in Portuguese:
may or may not happen
49. Consistency
at an Undetermined
Moment ("Consistência em Momento Indeterminado")
@mdediana
50. consistency
W + R > N
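In the usual reading of this inequality (as in Vogels' article), N is the number of replicas of a piece of data, W is how many replicas must acknowledge a write, and R is how many replicas are contacted on a read. When W + R > N, every read set overlaps every write set, so a read sees at least one replica with the latest write. The function name below is an illustrative assumption.

```python
def read_write_sets_overlap(n, w, r):
    """Return True when W + R > N, i.e. reads always overlap the last write.

    n: number of replicas
    w: replicas that must acknowledge a write
    r: replicas contacted on a read
    """
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return w + r > n


if __name__ == "__main__":
    # Three replicas, quorum writes and quorum reads.
    print(read_write_sets_overlap(n=3, w=2, r=2))
    # Three replicas, one ack per write and one replica per read:
    # faster, but a read may miss the latest write.
    print(read_write_sets_overlap(n=3, w=1, r=1))
```

Lower W and R buy latency and availability at the cost of reads that may be stale for a while, which is the eventual-consistency tradeoff the previous slides describe.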
51. durability
ref: The End of an Architectural Era, by Michael Stonebraker et al.
52. there's more...
★ latency
★ performance
★ partitioning
★ distribution
★ replication
53. remember
you are not building an
intergalactic-scale
solution with
tolerance to random failures
across datacenters
spread over several
geographic locations and
other dimensions
54. see why
architecture
matters?
55. with so many definitions...
with so many concepts...
with so many tradeoffs...
with so many....
56. how did NoSQL
become so
sexy and popular?