Mongodb(via MongoEngine) join query with aggregate
Since Mongodb 3.2 and MongoEngine 0.9, we can use $aggregate command to perform join queries on multiple collections in a database. This post would be a simple tutorial for join queries on Mongodb(via MongoEngine in Python) with examples.
Models Setup
Let’s consider models defined as below:
import random
import mongoengine
class User(mongoengine.Document):
meta = {"indexes": ['rnd']}
name = mongoengine.StringField()
rnd = mongoengine.FloatField(default=random.random)
class Group(mongoengine.Document):
meta = {"indexes": ['rnd']}
name = mongoengine.StringField()
rnd = mongoengine.FloatField(default=random.random)
class Relation(mongoengine.Document):
meta = {"indexes": ['user_id', 'group_id']}
user_id = mongoengine.ObjectIdField()
group_id = mongoengine.ObjectIdField()
Note:
- User and Group documents will be generated randomly.
- rnd fields in User and Group, are used for efficiently pick a random document.
- Relation documents are matched by randomly picking (User, Group) pairs.
Join Queries
Based on the Mongodb official document, aggregation enables pipelining several operators on the original data from a collection. From MongoEngine you may use aggregation via
cursor = SomeDocumentClass.objects.aggregate(*pipelines)
where pipelines is a list of stages defined by a dict(the keys of the dict can be found via the document). And let’s run a few examples.
Join User and Relation
The join is powered by the $lookup operator.
curosr = User.objects.aggregate(*[
{
'$lookup': {
'from': Relation._get_collection_name(),
'localField': '_id',
'foreignField': 'user_id',
'as': 'relation'}
}])
A object from the result set would be like
{
'_id': ObjectId('58725790958bee0e467d12fe'),
'rnd': 0.4821115717624539,
'relation': [
{
'user_id': ObjectId('58725790958bee0e467d12fe'),
'_id': ObjectId('58725790958bee0e467d12ff'),
'group_id': ObjectId('58725790958bee0e467d12e4')
}
],
'name': 'user_sqkXPqverW'}
Use $unwind to deal with list
You may have noticed from above results, the relation in a result object was a list since there may be one-to-many mapping. The $unwind will let you expand the list. And remember that aggregate is an pipeline, so we just append $unwind operators behind.
curosr = User.objects.aggregate(*[
{
'$lookup': {
'from': Relation._get_collection_name(),
'localField': '_id',
'foreignField': 'user_id',
'as': 'relation'},
'$unwind': {
'$unwind': "$relation"}
}])
And from the results, you will notice that relation became a dict:
{
'_id': ObjectId('58725790958bee0e467d12fe'),
'rnd': 0.4821115717624539,
'relation': {
'user_id': ObjectId('58725790958bee0e467d12fe'),
'_id': ObjectId('58725790958bee0e467d12ff'),
'group_id': ObjectId('58725790958bee0e467d12e4')
},
'name': 'user_sqkXPqverW'}
Join more collections: User, Relation and Group
Join more collections are pretty simple, you can just append more $lookup operators in the pipeline.
cursor = User.objects.aggregate(*[
{'$lookup': {'from': Relation._get_collection_name(),
'localField': '_id',
'foreignField': 'user_id',
'as': 'relation'}},
{'$unwind': "$relation"},
{'$lookup': {'from': Group._get_collection_name(),
'localField': 'relation.group_id',
'foreignField': '_id',
'as': 'group'}},
{'$unwind': "$group"},
])
And in result objects, there would be group object:
{
'group': {
'_id': ObjectId('58725790958bee0e467d12e4'),
'rnd': 0.9648027926390293,
'name': 'group_xgsYceERw'
},
'_id': ObjectId('58725790958bee0e467d12fe'),
'rnd': 0.4821115717624539,
'relation': {
'user_id': ObjectId('58725790958bee0e467d12fe'),
'_id': ObjectId('58725790958bee0e467d12ff'),
'group_id': ObjectId('58725790958bee0e467d12e4')},
'name': 'user_sqkXPqverW'
}
Several Notes
ObjectId Field
You should use ObjectId to perform join with other collection’s _id field(which is also an ObjectId), Mongodb can not do type conversion from ObjectId to string(or vice vesa) in a aggregation query.
There are several issues on Mongodb official jira addressing the problem(SERVER-11400, SERVER-22781, SERVER-24947).