Mongodb(via MongoEngine) join query with aggregate

2017-01-09

Since Mongodb 3.2 and MongoEngine 0.9, we can use $aggregate command to perform join queries on multiple collections in a database. This post would be a simple tutorial for join queries on Mongodb(via MongoEngine in Python) with examples.

Models Setup

Let’s consider models defined as below:

import random
import mongoengine


class User(mongoengine.Document):
    meta = {"indexes": ['rnd']}
    name = mongoengine.StringField()
    rnd = mongoengine.FloatField(default=random.random)


class Group(mongoengine.Document):
    meta = {"indexes": ['rnd']}
    name = mongoengine.StringField()
    rnd = mongoengine.FloatField(default=random.random)


class Relation(mongoengine.Document):
    meta = {"indexes": ['user_id', 'group_id']}
    user_id = mongoengine.ObjectIdField()
    group_id = mongoengine.ObjectIdField()

Note:

  • User and Group documents will be generated randomly.
  • rnd fields in User and Group, are used for efficiently pick a random document.
  • Relation documents are matched by randomly picking (User, Group) pairs.

Join Queries

Based on the Mongodb official document, aggregation enables pipelining several operators on the original data from a collection. From MongoEngine you may use aggregation via

cursor = SomeDocumentClass.objects.aggregate(*pipelines)

where pipelines is a list of stages defined by a dict(the keys of the dict can be found via the document). And let’s run a few examples.

Join User and Relation

The join is powered by the $lookup operator.

curosr = User.objects.aggregate(*[
         {
          '$lookup': {
              'from': Relation._get_collection_name(),
              'localField': '_id',
              'foreignField': 'user_id',
              'as': 'relation'}
         }])

A object from the result set would be like

{
 '_id': ObjectId('58725790958bee0e467d12fe'),
 'rnd': 0.4821115717624539,
 'relation': [
     {
      'user_id': ObjectId('58725790958bee0e467d12fe'),
      '_id': ObjectId('58725790958bee0e467d12ff'),
      'group_id': ObjectId('58725790958bee0e467d12e4')
     }
 ],
 'name': 'user_sqkXPqverW'}

Use $unwind to deal with list

You may have noticed from above results, the relation in a result object was a list since there may be one-to-many mapping. The $unwind will let you expand the list. And remember that aggregate is an pipeline, so we just append $unwind operators behind.

curosr = User.objects.aggregate(*[
         {
          '$lookup': {
              'from': Relation._get_collection_name(),
              'localField': '_id',
              'foreignField': 'user_id',
              'as': 'relation'},
          '$unwind': {
              '$unwind': "$relation"}
         }])

And from the results, you will notice that relation became a dict:

{
 '_id': ObjectId('58725790958bee0e467d12fe'),
 'rnd': 0.4821115717624539,
 'relation': {
      'user_id': ObjectId('58725790958bee0e467d12fe'),
      '_id': ObjectId('58725790958bee0e467d12ff'),
      'group_id': ObjectId('58725790958bee0e467d12e4')
     },
 'name': 'user_sqkXPqverW'}

Join more collections: User, Relation and Group

Join more collections are pretty simple, you can just append more $lookup operators in the pipeline.

cursor = User.objects.aggregate(*[
             {'$lookup': {'from': Relation._get_collection_name(),
                          'localField': '_id',
                          'foreignField': 'user_id',
                          'as': 'relation'}},
             {'$unwind': "$relation"},
             {'$lookup': {'from': Group._get_collection_name(),
                          'localField': 'relation.group_id',
                          'foreignField': '_id',
                          'as': 'group'}},
             {'$unwind': "$group"},
         ])

And in result objects, there would be group object:

{
 'group': {
     '_id': ObjectId('58725790958bee0e467d12e4'),
     'rnd': 0.9648027926390293,
     'name': 'group_xgsYceERw'
 },
 '_id': ObjectId('58725790958bee0e467d12fe'),
 'rnd': 0.4821115717624539,
 'relation': {
     'user_id': ObjectId('58725790958bee0e467d12fe'),
     '_id': ObjectId('58725790958bee0e467d12ff'),
     'group_id': ObjectId('58725790958bee0e467d12e4')},
     'name': 'user_sqkXPqverW'
 }

Several Notes

ObjectId Field

You should use ObjectId to perform join with other collection’s _id field(which is also an ObjectId), Mongodb can not do type conversion from ObjectId to string(or vice vesa) in a aggregation query.

There are several issues on Mongodb official jira addressing the problem(SERVER-11400, SERVER-22781, SERVER-24947).