2. • Humongous: Slang. Extraordinarily large;
expressive coinage, perhaps reflecting huge
and monstrous, with stress pattern of
tremendous
• Open source NoSQL database
• Written in C++
• https://github.com/mongodb/mongo
11. Collection
• Flexible: no fixed structure
• No ALTER TABLE needed: schema changes are implicit
• Created on the first insertion (same for
databases)
• Capped collection: maintain insert order,
fixed size
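A capped collection behaves roughly like a fixed-size ring buffer: documents keep their insertion order and, once the collection is full, the oldest ones are discarded. The plain JavaScript sketch below imitates that behaviour in memory. The `CappedList` class and its `max` parameter are hypothetical names used only for illustration, and it caps by document count, whereas the server sizes capped collections in bytes.

```javascript
// In-memory imitation of a capped collection (illustrative only).
class CappedList {
  constructor(max) {
    this.max = max;   // hypothetical limit, counted in documents
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);             // new documents go to the end
    if (this.docs.length > this.max) {
      this.docs.shift();             // the oldest document is dropped first
    }
  }
  find() {
    return this.docs.slice();        // natural order is insertion order
  }
}

const log = new CappedList(3);
['a', 'b', 'c', 'd'].forEach(function (m) { log.insert({ msg: m }); });
console.log(log.find().map(function (d) { return d.msg; })); // [ 'b', 'c', 'd' ]
```

In the shell, a real capped collection is created explicitly, for example with db.createCollection('log', {capped: true, size: 1048576}), where 'log' is just an example name.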
12. Document
• JSON document
• _id (ObjectId)
• unique for the collection
• it can be a document itself
• Fields: numeric, string, date
• Arrays and subdocuments
18. Hands on: let’s get started
• Run a mongod (--fork) instance
• Run a mongo shell (mongo) that connects
to this instance
19. The mongo shell: basics
• show dbs
• use db_name
• show collections (current db)
• show users (current db)
20. Insertion
Suppose a collection of GUL courses.
db.courses.insert ({
name : 'Full Metal Mongo',
date : new Date(),
presenter: 'isra',
attendants : [
{name: 'ana', age: 23},
{name: 'luis', age: 32}
]
});
21. Querying
//Full Metal Mongo course
db.gul.find({name:'Full Metal Mongo'})
//Courses attended by ana
db.gul.find({'attendants.name':'ana'})
//Course names given by isra
db.gul.find({presenter:'isra'}, {name:1})
22. Querying II
//Courses ordered by name
db.gul.find().sort({name:1});
//The first 5 courses
db.gul.find().limit(5);
//Next five courses
db.gul.find().skip(5).limit(5);
//First course (natural order)
db.gul.findOne()
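The skip/limit pair above generalises to simple paging. A minimal sketch, assuming 1-based page numbers; `pageWindow` is a hypothetical helper, not part of the shell:

```javascript
// Hypothetical helper: turn a 1-based page number into skip/limit values.
function pageWindow(page, pageSize) {
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

const second = pageWindow(2, 5);
console.log(second); // { skip: 5, limit: 5 }

// In the mongo shell the values would be used as:
//   db.gul.find().sort({name: 1}).skip(second.skip).limit(second.limit);
```

Sorting before skipping keeps the pages stable between calls.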
23. Querying III
//Courses attended by any under-age
db.gul.find({'attendants.age':{$lt:18}});
//Last year's courses between Monday and Thursday
//(JavaScript months are zero-based: 3 is April)
db.gul.find({date:{
$gt:new Date(2012,3,8),
$lt:new Date(2012,3,11)}
});
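Date ranges are easy to get wrong because the JavaScript Date constructor counts months from 0. The sketch below runs in plain JavaScript (no database needed) and checks which days the two bounds of the query above actually fall on; the variable names are illustrative.

```javascript
// JavaScript's Date constructor counts months from 0 (0 = January).
const from = new Date(2012, 3, 8);   // 8 April 2012, local time
const to = new Date(2012, 3, 11);    // 11 April 2012, local time

const days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
              'Thursday', 'Friday', 'Saturday'];
console.log(from.getMonth(), days[from.getDay()]); // 3 'Sunday'
console.log(to.getMonth(), days[to.getDay()]);     // 3 'Wednesday'

// The same range document that is passed to find().
const range = { date: { $gt: from, $lt: to } };
```

Checking the bounds this way before running the query helps confirm that the range covers the intended weekdays.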
24. Querying IV
//Courses attended by pedro or ana
db.gul.find({'attendants.name':
{$in:['pedro', 'ana']}
});
//Courses attended by 10 people
db.gul.find({attendants:
{$size:10}
});
25. $ operators
• $in / $nin
• $all (default is any)
• $gt(e) / $lt(e)
• $ne
• $elemMatch (conditions in the same subdoc)
• $exists
• $regex
• $natural (order)
• $toLower / $toUpper
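The difference between dotted conditions and $elemMatch is easiest to see on a small example. The sketch below models both in plain JavaScript; `matchesDotted` and `matchesElemMatch` are hypothetical helpers that imitate the matching rules and are not MongoDB functions.

```javascript
const course = {
  name: 'Full Metal Mongo',
  attendants: [
    { name: 'ana', age: 23 },
    { name: 'luis', age: 17 }
  ]
};

// Like {'attendants.name': 'ana', 'attendants.age': {$lt: 18}}:
// each condition may be satisfied by a different array element.
function matchesDotted(doc) {
  return doc.attendants.some(function (a) { return a.name === 'ana'; }) &&
         doc.attendants.some(function (a) { return a.age < 18; });
}

// Like {attendants: {$elemMatch: {name: 'ana', age: {$lt: 18}}}}:
// a single array element must satisfy both conditions.
function matchesElemMatch(doc) {
  return doc.attendants.some(function (a) {
    return a.name === 'ana' && a.age < 18;
  });
}

console.log(matchesDotted(course));    // true  (ana exists, luis is under 18)
console.log(matchesElemMatch(course)); // false (ana herself is 23)
```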
27. Update
//updates if it exists; inserts if new
db.gul.save(x)
//update speakers in the crafty course
db.gul.update(
{name:'Crafty'},
{$set:{presenter:['javi','isra']}}
);
//new attendant to a course (not multi)
db.gul.update(
{name:'mongoDB'},
{$push:
{attendants:{name:'pepe', age:19}}
}
);
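To make the effect of the update operators concrete, here is a simplified in-memory model of $set and $push. It is only a sketch: `applyUpdate` is a hypothetical function, it handles top-level fields only, and the real server supports many more operators and options.

```javascript
// Simplified model of two update operators (top-level fields only).
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a copy
  Object.keys(update.$set || {}).forEach(function (field) {
    result[field] = update.$set[field];           // $set overwrites the field
  });
  Object.keys(update.$push || {}).forEach(function (field) {
    // $push appends one value, creating the array if it is missing
    result[field] = (result[field] || []).concat([update.$push[field]]);
  });
  return result;
}

const before = { name: 'mongoDB', presenter: 'isra', attendants: [] };
const after = applyUpdate(before, {
  $set: { presenter: ['javi', 'isra'] },
  $push: { attendants: { name: 'pepe', age: 19 } }
});
console.log(after.presenter, after.attendants.length);
```

Note that the operator name is the outer key and the field name the inner one, which is the shape the shell expects.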
30. Database references: direct linking
//Query
isra = db.gul_members.findOne()
//Response from the query
{_id: ObjectId('ad234fea23482348'),
name:'isra', age:31, languages:'js'}
//Find by id
db.gul.find({'attendants._id':isra._id})
32. Import example data
• Download a short courses collection from
• http://www.it.uc3m.es/igrojas/mongo/initDB.json
//Import dataset in JSON
mongoimport --db gul --collection courses initDB.json
33. Hands on: querying
• Add a new course with data similar to the
existing ones
• Update your course to add attendants
• Query courses with speaker “Jesús Espino”
• Query courses held on a Friday
• Query courses tagged as “android”
35. Aggregation I
//Number of courses
db.gul.count();
//Number of courses given by isra
db.gul.count({presenter:'isra'});
//Distinct attendants to all courses
db.gul.distinct('attendants.name');
37. Hands on: aggregation
• Distinct course speakers
• Distinct tags and count
• Number of courses per weekday
38. Map/Reduce
• Batch processing of data and aggregation
operations
• Used where GROUP BY would be used in SQL
• Input from a collection and output going
to a collection
39. Map/reduce (II)
• Courses attended per individual
var map = function(){
for(var i in this.attendants){
emit(this.attendants[i].name,1);
}
}
40. Map/reduce (III)
• Courses attended per individual
var reduce = function(key, values){
var sum=0;
for (var i in values){
sum+=values[i];
}
return sum;
}
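The two functions can be tried without a server. The sketch below simulates the map and reduce phases in plain JavaScript over an in-memory array; the `courses` data, the `emitted` store and the stand-in `emit` function are assumptions made for illustration (in the shell, emit is provided by MongoDB).

```javascript
const courses = [
  { name: 'Full Metal Mongo', attendants: [{ name: 'ana' }, { name: 'luis' }] },
  { name: 'Crafty', attendants: [{ name: 'ana' }] }
];

const emitted = {};                       // key -> list of emitted values
function emit(key, value) {               // stand-in for the shell's emit
  (emitted[key] = emitted[key] || []).push(value);
}

function map() {
  for (var i in this.attendants) {
    emit(this.attendants[i].name, 1);
  }
}

function reduce(key, values) {
  var sum = 0;
  for (var i in values) {
    sum += values[i];
  }
  return sum;
}

// Map phase: run map with each document bound to `this`.
courses.forEach(function (doc) { map.call(doc); });

// Reduce phase: collapse the values emitted under each key.
const result = {};
Object.keys(emitted).forEach(function (key) {
  result[key] = reduce(key, emitted[key]);
});
console.log(result); // { ana: 2, luis: 1 }
```

Against a real collection the same functions would be passed to the shell, for example db.gul.mapReduce(map, reduce, {out: 'attendance'}), where 'attendance' is an example output collection name.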
49. Indexes
• Objective: Query optimization
• Used in the query itself and/or the ordering
• B-Tree indexes
• _id index is automatic (unique)
db.gul.ensureIndex({ name:1 })
db.gul.getIndexes()
db.gul.stats() //Size of the index
50. Indexes (II)
• For arrays, the index is multikey (one
index entry per array element)
• Field names are not in indexes
//Compound indexes
db.gul.ensureIndex({ name:1, age:1})
//For nested fields (subdocs)
db.gul.ensureIndex({ 'attendants.name':1 })
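A multikey index can be pictured as a sorted list with one entry per array element, each pointing back to its document. The sketch below builds such a list in plain JavaScript; the `entries` structure is a conceptual model, not MongoDB's actual on-disk B-tree format.

```javascript
const docs = [
  { _id: 1, attendants: [{ name: 'ana' }, { name: 'luis' }] },
  { _id: 2, attendants: [{ name: 'ana' }] }
];

// One (key, _id) entry per array element; only values are stored,
// not the field name 'attendants.name'.
const entries = [];
docs.forEach(function (doc) {
  doc.attendants.forEach(function (a) {
    entries.push({ key: a.name, id: doc._id });
  });
});

// Keep the entries ordered by key, then by _id, as an index would.
entries.sort(function (x, y) {
  if (x.key < y.key) { return -1; }
  if (x.key > y.key) { return 1; }
  return x.id - y.id;
});

console.log(entries.length); // 3 entries for 2 documents
```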
52. Indexes options
• dropDups: drop documents with duplicate keys
when creating a unique index
• background: created in the background on
primary of the replica set, in the foreground
on secondaries
53. More about Indexes
• Covered index
• query covered completely by the index
• Selectivity of an index
• Explain
db.gul.find().explain()
• Hints
db.gul.find().hint({name:1})
54. Geospatial indexes
• 2d-only
• compound indexes may be used
db.places.ensureIndex({'loc':'2d'})
db.places.find({loc:{
$near:[20,40],
$maxDistance:2}
}).limit(50)
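What the $near query asks for can be sketched as a brute-force scan in plain JavaScript: keep the points within the maximum distance, nearest first. The `near` function and the sample `places` are hypothetical, and the sketch assumes flat (Euclidean) distance in the same units as the coordinates. A 2d index answers the same question without scanning every document.

```javascript
const places = [
  { name: 'a', loc: [20, 41] },
  { name: 'b', loc: [25, 40] },
  { name: 'c', loc: [21, 40.5] }
];

// Brute-force model of $near + $maxDistance + limit on a flat plane.
function near(docs, point, maxDistance, limit) {
  return docs
    .map(function (d) {
      const dx = d.loc[0] - point[0];
      const dy = d.loc[1] - point[1];
      return { doc: d, dist: Math.sqrt(dx * dx + dy * dy) };
    })
    .filter(function (r) { return r.dist <= maxDistance; })
    .sort(function (x, y) { return x.dist - y.dist; })   // nearest first
    .slice(0, limit)
    .map(function (r) { return r.doc; });
}

console.log(near(places, [20, 40], 2, 50).map(function (p) { return p.name; }));
// [ 'a', 'c' ]
```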
55. Creating indexes: examples
• Optimize our courses database
• Think of common queries
• Implement the appropriate indexes
62. Commands for dbas
• mongotop
• time of activity per collection
• info about total, read, write, etc.
• mongostat (command line)
• every x seconds
• info about insert, update, delete,
getmore, command, flushes, mapped,
vsize, res, faults, etc.
68. What is a replica set?
• Info replicated among several nodes
• 1 primary
• n secondaries (at least 3 nodes in total, to get a majority)
• When the primary goes down, an election is held and a
majority is needed to elect a new primary
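The majority rule explains the minimum of three nodes. A short sketch of the arithmetic, assuming every member has one vote; `majority` and `faultTolerance` are hypothetical helper names:

```javascript
// Votes needed to elect a primary in a set of n voting members.
function majority(n) {
  return Math.floor(n / 2) + 1;
}

// How many members can fail while an election is still possible.
function faultTolerance(n) {
  return n - majority(n);
}

[2, 3, 5].forEach(function (n) {
  console.log(n, majority(n), faultTolerance(n));
});
// 2 2 0
// 3 2 1
// 5 3 2
```

With two nodes the loss of either one leaves no majority, so three is the smallest set that survives a failure.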
69. Types of nodes in a replica set
• Regular
• Arbiter: only votes in elections (holds no data)
• Delayed: cannot be elected primary
• Hidden: used for analytics (not primary)
71. Write concern
• Journal: list of operations (inserts, updates)
done, saved on disk (permanent)
• getLastError (managed by the driver)
• w: wait until the write is applied in memory
(the app receives an ack). Used to detect
errors, such as a unique index violation.
• j: wait until write is saved in the journal
72. Oplog and write concern
• oplog.rs: capped collection with the
operations made in the replica set, stored
in natural order
• write concern
• w: n, wait for acknowledgement from n nodes in the
replica set
• w: 'majority', wait for a majority of the
nodes
74. What is sharding?
• Scalability
• Horizontal partitioning of a database
• A BSON document stored in ONE shard
• Shard key
• Does not need to be unique
• Unique indexes on other fields are not supported across shards
• Mongo offers auto-sharding
75. What is sharding?
• Auto balancing
• Easy addition of new machines
• Up to 1k nodes
• No single point of failure
• Automatic failover
• Choose an appropriate shard key
76. Sharding config
• Requires config servers
• store metadata about chunks
• mongod --configsvr
• Requires routers
• mongos (accessed by the apps)
77. Sharding operations
• chunk: a contiguous range of shard key values
stored in one shard
• operations
• split: dividing a chunk to balance the size
of the chunks
• migrate: moving a chunk from one shard to
another
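Chunks, split and migrate can be modelled in a few lines of plain JavaScript. This is a conceptual sketch only: the `chunks` table, the shard names and the three helper functions are hypothetical, the shard key is assumed to be numeric, and Infinity stands in for the minimum and maximum key bounds.

```javascript
// Each chunk covers shard key values in [min, max) and lives on one shard.
const chunks = [
  { min: -Infinity, max: 50, shard: 'shard0' },
  { min: 50, max: Infinity, shard: 'shard1' }
];

// Routing: find the chunk whose range contains the key.
function chunkFor(key) {
  return chunks.find(function (c) { return key >= c.min && key < c.max; });
}

// split: cut one chunk in two at `point`; both halves stay on the same shard.
function split(chunk, point) {
  const i = chunks.indexOf(chunk);
  chunks.splice(i, 1,
    { min: chunk.min, max: point, shard: chunk.shard },
    { min: point, max: chunk.max, shard: chunk.shard });
}

// migrate: hand a chunk over to another shard.
function migrate(chunk, shard) {
  chunk.shard = shard;
}

split(chunkFor(10), 25);          // [-Infinity, 25) and [25, 50) on shard0
migrate(chunkFor(30), 'shard2');  // [25, 50) moves to shard2
console.log(chunkFor(10).shard, chunkFor(30).shard, chunkFor(70).shard);
// shard0 shard2 shard1
```

A document is always routed to exactly one chunk, which is why each document lives in a single shard.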