Asked  6 Months ago    Answers:  5   Viewed   72 times

When we run a Mongo find() query without any sort order specified, what does the database internally use to sort the results?

According to the documentation on the mongo website:

When executing a find() with no parameters, the database returns objects in forward natural order.

For standard tables, natural order is not particularly useful because, although the order is often close to insertion order, it is not guaranteed to be. However, for Capped Collections, natural order is guaranteed to be the insertion order. This can be very useful.

However, for standard (non-capped) collections, what field is used to sort the results? Is it the _id field or something else?

Edit:

Basically, I guess what I am trying to get at is that if I execute the following search query:

db.collection.find({"x":y}).skip(10000).limit(1000);

At two different points in time: t1 and t2, will I get different result sets:

  1. When there have been no additional writes between t1 & t2?
  2. When there have been new writes between t1 & t2?
  3. There are new indexes that have been added between t1 & t2?

I ran some tests on a temporary database and got the same answer (yes) in all three cases, but I wanted to be sure, because I am certain that my test cases weren't very thorough.

 Answers

14

What is the default sort order when none is specified?

The default internal sort order (or natural order) is an undefined implementation detail. Maintaining order is extra overhead for storage engines and MongoDB's API does not mandate predictability outside of an explicit sort() or the special case of fixed-sized capped collections which have associated usage restrictions. For typical workloads it is desirable for the storage engine to try to reuse available preallocated space and make decisions about how to most efficiently store data on disk and in memory.

Without any query criteria, results will be returned by the storage engine in natural order (aka in the order they are found). Result order may coincide with insertion order but this behaviour is not guaranteed and cannot be relied on (aside from capped collections).

Some examples that may affect storage (natural) order:

  • WiredTiger uses a different representation of documents on disk versus the in-memory cache, so natural ordering may change based on internal data structures.
  • The original MMAPv1 storage engine (removed in MongoDB 4.2) allocates record space for documents based on padding rules. If a document outgrows the currently allocated record space, the document location (and natural ordering) will be affected. New documents can also be inserted in storage marked available for reuse due to deleted or moved documents.
  • Replication uses an idempotent oplog format to apply write operations consistently across replica set members. Each replica set member maintains local data files that can vary in natural order, but will have the same data outcome when oplog updates are applied.
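
As a rough illustration of the space-reuse point above, here is a toy in-memory model in plain JavaScript. It is only a sketch: `ToyStore` and its methods are hypothetical names, and no real storage engine is implemented this way. It shows how a document inserted last can still be the first one found by a scan once freed space is reused.

```javascript
// Toy model of record storage with slot reuse (hypothetical, for illustration only).
class ToyStore {
  constructor() {
    this.slots = []; // each slot holds a document, or null when the slot is free
  }

  insert(doc) {
    const free = this.slots.indexOf(null);
    if (free === -1) {
      this.slots.push(doc); // no free slot: append at the end
    } else {
      this.slots[free] = doc; // reuse space freed by an earlier delete
    }
  }

  remove(id) {
    const i = this.slots.findIndex((d) => d !== null && d._id === id);
    if (i !== -1) this.slots[i] = null;
  }

  // "Natural order": documents in the order they are found in storage.
  scan() {
    return this.slots.filter((d) => d !== null).map((d) => d._id);
  }
}

const store = new ToyStore();
["a", "b", "c"].forEach((id) => store.insert({ _id: id }));
store.remove("a");
store.insert({ _id: "d" }); // inserted last, but lands in the first slot

const naturalOrder = store.scan();
console.log(naturalOrder); // [ 'd', 'b', 'c' ]
```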

What if an index is used?

If an index is used, documents will be returned in the order they are found (which does not necessarily match insertion order or I/O order). If more than one index is used then the order depends internally on which index first identified the document during the de-duplication process.

If you want a predictable sort order you must include an explicit sort() with your query and have unique values for your sort key.
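
To see why the sort key needs unique values, consider this minimal in-memory sketch in plain JavaScript. No MongoDB is involved, and the `docs` array, the `score` field and the `page` helper are made up for illustration. Two documents tie on `score`, so sorting on `score` alone leaves their relative order dependent on the order in which they were found, while adding the unique `_id` as a tie-breaker makes each page deterministic.

```javascript
// Hypothetical documents; two of them tie on `score`.
const docs = [
  { _id: 3, score: 10 },
  { _id: 1, score: 20 },
  { _id: 2, score: 10 },
  { _id: 4, score: 5 },
];

// The same documents as they might be found at a different point in time.
const reordered = [docs[2], docs[3], docs[0], docs[1]];

// Sort key without unique values.
const byScore = (a, b) => a.score - b.score;

// Sort key made unique by using _id as a tie-breaker.
const byScoreThenId = (a, b) => a.score - b.score || a._id - b._id;

// Emulates sort + skip + limit on an in-memory array and returns the _id values.
function page(input, compare, skip, limit) {
  return [...input].sort(compare).slice(skip, skip + limit).map((d) => d._id);
}

// With ties, the page depends on the order the documents arrived in.
const looseA = page(docs, byScore, 1, 2);
const looseB = page(reordered, byScore, 1, 2);

// With the tie-breaker, both input orders produce the same page.
const stableA = page(docs, byScoreThenId, 1, 2);
const stableB = page(reordered, byScoreThenId, 1, 2);

console.log({ looseA, looseB, stableA, stableB });
```

In the mongo shell, the equivalent idea for the query in the question would be something like db.collection.find({"x":y}).sort({score: 1, _id: 1}).skip(10000).limit(1000), where `score` stands in for whatever field you actually sort on.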

How do capped collections maintain insertion order?

The implementation exception noted for natural order in capped collections is enforced by their special usage restrictions: documents are stored in insertion order but existing document size cannot be increased and documents cannot be explicitly deleted. Ordering is part of the capped collection design that ensures the oldest documents "age out" first.
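
The "age out" behaviour can be pictured with a small in-memory sketch in plain JavaScript. This is a simplified model, not MongoDB's implementation: `ToyCapped` is a hypothetical class, and it is bounded by document count only, whereas real capped collections are bounded by a size in bytes (with an optional maximum document count).

```javascript
// Toy capped collection limited by document count (hypothetical, for illustration only).
class ToyCapped {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc); // always append, so storage order equals insertion order
    if (this.docs.length > this.max) {
      this.docs.shift(); // the oldest document ages out first
    }
  }

  // Documents in the order they are stored.
  scan() {
    return this.docs.map((d) => d._id);
  }
}

const capped = new ToyCapped(3);
[1, 2, 3, 4, 5].forEach((id) => capped.insert({ _id: id }));

const remaining = capped.scan();
console.log(remaining); // [ 3, 4, 5 ]
```

Because documents are only ever appended at the end and removed from the front, there is no freed space in the middle to reuse, which is why insertion order can be preserved.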

Tuesday, June 1, 2021
 
ioleo
answered 6 Months ago
23

If you don't specify an ORDER BY, then there is NO ORDER defined.

The results can be returned in an arbitrary order - and that might change over time, too.

There is no "natural order" or anything like that in a relational database (at least in all that I know of). The only way to get a reliable ordering is by explicitly specifying an ORDER BY clause.

Update: for those who still don't believe me, here are two excellent blog posts that illustrate this point (with code samples!):

  • Conor Cunningham (Architect on the Core SQL Server Engine team): No Seatbelt - Expecting Order without ORDER BY
  • Alexander Kuznetsov: Without ORDER BY, there is no default sort order (post in the Web Archive)
Wednesday, June 2, 2021
 
Neysor
answered 6 Months ago
46

There is no guarantee which two rows you get. It will just be the first two retrieved from the table scan.

The TOP iterator in the execution plan will stop requesting rows once two have been returned.

Likely for a scan of a heap this will be the first two rows in allocation order but this is not guaranteed. For example SQL Server might use the advanced scanning feature which means that your scan will read pages recently read from another concurrent scan.

Wednesday, July 21, 2021
 
mgraph
answered 5 Months ago
76

You don't need the full notation as the placeholder has already moved to that position in the array.

db.junk.update(
    { "commandes.voyagesSouscrits.idVoyage": "123" },
    {$pull: { "commandes.$.voyagesSouscrits": { idVoyage: "123" } }}
)

This part:

idVoyage: { <query> }

is only needed because the positional operator in "commandes.$.voyagesSouscrits" can only match the first array position found in the query.
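
To make the effect of that update concrete, here is an in-memory emulation in plain JavaScript. It is only a sketch of the behaviour described above: the sample document and the `pullVoyage` helper are hypothetical, and no MongoDB driver is involved.

```javascript
// Hypothetical document shaped like the one the update above targets.
const doc = {
  commandes: [
    { voyagesSouscrits: [{ idVoyage: "999" }] },
    { voyagesSouscrits: [{ idVoyage: "123" }, { idVoyage: "456" }] },
  ],
};

// Emulates the update on a single document: `$` resolves to the first element
// of `commandes` matched by the query, and $pull removes the matching
// sub-documents from that element's `voyagesSouscrits` array.
function pullVoyage(document, idVoyage) {
  const matches = (v) => v.idVoyage === idVoyage;
  const position = document.commandes.findIndex((c) =>
    c.voyagesSouscrits.some(matches)
  );
  if (position === -1) return document; // the query matched nothing

  const commandes = document.commandes.map((c, i) =>
    i === position
      ? { ...c, voyagesSouscrits: c.voyagesSouscrits.filter((v) => !matches(v)) }
      : c
  );
  return { ...document, commandes };
}

const updated = pullVoyage(doc, "123");
console.log(JSON.stringify(updated.commandes[1].voyagesSouscrits)); // [{"idVoyage":"456"}]
```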

http://docs.mongodb.org/manual/reference/operator/projection/positional/

Hope that clears it up.

Friday, August 6, 2021
 
dmp
answered 4 Months ago
83

TL;DR:

Use the async driver if the operations are slow, or use the regular driver in most cases. You shouldn't use the core driver.

MongoDB Regular Driver:

A general-purpose driver that you can use to search, create, read, update and delete documents. The find(...), updateMany(...), deleteMany(...) and similar methods block until the result is returned or the operation completes (synchronous behavior). This is the driver that most programs use, and it is a good fit in most cases.

Here is an example for inserting a single Document:

collection.insertOne(doc);
//Do something here.
System.out.println("Inserted!");

MongoDB Async Driver:

Another type of driver that you can use to search, create, read, update and delete documents. It offers methods similar to those of the regular driver (find(...), updateMany(...), deleteMany(...), etc.).

The difference with the regular driver is that the main thread will not hang because the async driver sends the result in a callback (asynchronous behavior). This driver is used when the operations can take a long time (a lot of data to go through, high latency, query on unindexed fields, etc.) and you do not want to manage multiple threads.

Here is an example of the callback when inserting a single Document:

collection.insertOne(doc, new SingleResultCallback<Void>() {
    @Override
    public void onResult(final Void result, final Throwable t) {
        //Do something here.
        System.out.println("Inserted!");
    }
});
// Do something to show that the Document was not inserted yet.
System.out.println("Inserting...");

For more information, read this.

MongoDB Core Driver

Base layer of the regular and async drivers. It contains low-level methods to do all the operations common to the regular and async drivers. Unless you are making a new API / Driver for MongoDB, you shouldn't use the core driver.

Tuesday, November 2, 2021
 
Industrial
answered 4 Weeks ago