MongoDB – What about cross reference queries ?
Disclaimer – This post does NOT contain code, since its about architecting the DB model for the application.
MongoDB is great document database, and we use it in one of Tikal's customers, taking advantage (among other advantages) of its “schema-less” functionality. This schema-less is considered more flexible than the standard RDBMS, since you don't have to define your schema in advance. As I described in my previous post, I used this schema-less to resolve the “dynamic model” requirement for the customer. But...While, evaluating the MongoDB I encountered a important limitation – “cross reference” queries. I'll try to explain the problem using an example.
Lets say your model contains two entities : Person and Car. Lets assume for the simplicity that a person can have many car, but a car can be associated with only one person (@OneToMany in JPA world). This will be reflected in RDBMS in two tables accordingly “persons” table and “cars” table. The “cars” table will also have a foreign-key on a column “person_id” pointing the associated person and posing a constraint and index in the RDBMS. This simple schema enables us to run SQL for both sides of the relations and apply “cross-reference “ queries. For example we can return all cars for adults (age>25) for modern cars (created after 2009) ordered by the car's color and then (secondary order) by the car model. All you need to do for this kind of queries is do “join” query and then you can apply both criteria and ordering on your query. Assume that your DB contains many persons and cars so, and you need to rerun only the first 20 results. Almost all RDBMS can handle this retirement easily without any burden on the application. So far so good, but lets see what happens in MongoDB...
MongoDB anatomy is around the notion of documents. Documents are aggregated within collections. You can apply queries on a collection to get back your desired documents accordingly. How would you apply the above model with MongoDB?
Well, we have (at least) two options. The first option is to create a separate collection for the cars and a separate collections for the persons. Nevertheless, this design is turned to be limited in both functionality and performance. MongoDB has no notion of “join” between collections. So in order to apply this design, you will have to put a “person_id” reference in the car document. This reference can be used for simple cases, when you have a car and you want to get back the associated person. But our use case is more complicated - We want to “cross” the collection and apply a criteria on the persons documents and on the cars document and orderer the data (again, on both collections) at the same query. Yes...you are right – it will NOT go this way...
Lets consider the second option – Lets put the cars data inside the persons documents. This means that we'll have only one collection – person collection. MongoDB support this, since you can put array f data inside a document. This demoralization is actually encouraged by MongoDB ,and can boost performance for some use cases, when you want to filter queries from both entities. However there is a limitation in MongoDB – When you query a collection, you always get back the top level of the document (persons in our case) as your result. This means you can NOT order the results according to the car's model, and the results will NOT reflect our requirement.
I had a correspondence with 10gen, the company behind MongoDB. Following their suggestion, I opened a new JIRA on it. I also tried to look on other document databases like CouchDB and XML databases. They all lack this functionality – they all miss the cross reference queries. MongoDB is flexible in many cases as RDBMS, and I hope the will close this gap also in future versions.
In summary, MongoDB is a great an flexible database. Maybe the most flexible/function-rich from all NoSQL databases. However the lack of cross reference functionality has no descent answer. I wish they will resolve it soon.