Asked  7 Months ago    Answers:  5   Viewed   1.7k times

The data type of the field is String. I would like to fetch the data where character length of field name is greater than 40.

I tried these queries but returning error. 1.

db.usercollection.find(
{$where: "(this.name.length > 40)"}
).limit(2);

output :error: {
    "$err" : "TypeError: Cannot read property 'length' of undefined near '40)' ",
    "code" : 16722
}

this is working in 2.4.9 But my version is 2.6.5

 Answers

42

For MongoDB 3.6 and newer:

The $expr operator allows the use of aggregation expressions within the query language, thus you can leverage the use of $strLenCP operator to check the length of the string as follows:

db.usercollection.find({ 
    "name": { "$exists": true },
    "$expr": { "$gt": [ { "$strLenCP": "$name" }, 40 ] } 
})

For MongoDB 3.4 and newer:

You can also use the aggregation framework with the $redact pipeline operator that allows you to proccess the logical condition with the $cond operator and uses the special operations $$KEEP to "keep" the document where the logical condition is true or $$PRUNE to "remove" the document where the condition was false.

This operation is similar to having a $project pipeline that selects the fields in the collection and creates a new field that holds the result from the logical condition query and then a subsequent $match, except that $redact uses a single pipeline stage which is more efficient.

As for the logical condition, there are String Aggregation Operators that you can use $strLenCP operator to check the length of the string. If the length is $gt a specified value, then this is a true match and the document is "kept". Otherwise it is "pruned" and discarded.


Consider running the following aggregate operation which demonstrates the above concept:

db.usercollection.aggregate([
    { "$match": { "name": { "$exists": true } } },
    {
        "$redact": {
            "$cond": [
                { "$gt": [ { "$strLenCP": "$name" }, 40] },
                "$$KEEP",
                "$$PRUNE"
            ]
        }
    },
    { "$limit": 2 }
])

If using $where, try your query without the enclosing brackets:

db.usercollection.find({$where: "this.name.length > 40"}).limit(2);

A better query would be to to check for the field's existence and then check the length:

db.usercollection.find({name: {$type: 2}, $where: "this.name.length > 40"}).limit(2); 

or:

db.usercollection.find({name: {$exists: true}, $where: "this.name.length > 
40"}).limit(2); 

MongoDB evaluates non-$where query operations before $where expressions and non-$where query statements may use an index. A much better performance is to store the length of the string as another field and then you can index or search on it; applying $where will be much slower compared to that. It's recommended to use JavaScript expressions and the $where operator as a last resort when you can't structure the data in any other way, or when you are dealing with a small subset of data.


A different and faster approach that avoids the use of the $where operator is the $regex operator. Consider the following pattern which searches for

db.usercollection.find({"name": {"$type": 2, "$regex": /^.{41,}$/}}).limit(2); 

Note - From the docs:

If an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan. Further optimization can occur if the regular expression is a “prefix expression”, which means that all potential matches start with the same string. This allows MongoDB to construct a “range” from that prefix and only match against those values from the index that fall within that range.

A regular expression is a “prefix expression” if it starts with a caret (^) or a left anchor (A), followed by a string of simple symbols. For example, the regex /^abc.*/ will be optimized by matching only against the values from the index that start with abc.

Additionally, while /^a/, /^a.*/, and /^a.*$/ match equivalent strings, they have different performance characteristics. All of these expressions use an index if an appropriate index exists; however, /^a.*/, and /^a.*$/ are slower. /^a/ can stop scanning after matching the prefix.

Thursday, June 3, 2021
 
TheFrack
answered 7 Months ago
15

Considering the String class' length method returns an int, the maximum length that would be returned by the method would be Integer.MAX_VALUE, which is 2^31 - 1 (or approximately 2 billion.)

In terms of lengths and indexing of arrays, (such as char[], which is probably the way the internal data representation is implemented for Strings), Chapter 10: Arrays of The Java Language Specification, Java SE 7 Edition says the following:

The variables contained in an array have no names; instead they are referenced by array access expressions that use nonnegative integer index values. These variables are called the components of the array. If an array has n components, we say n is the length of the array; the components of the array are referenced using integer indices from 0 to n - 1, inclusive.

Furthermore, the indexing must be by int values, as mentioned in Section 10.4:

Arrays must be indexed by int values;

Therefore, it appears that the limit is indeed 2^31 - 1, as that is the maximum value for a nonnegative int value.

However, there probably are going to be other limitations, such as the maximum allocatable size for an array.

Tuesday, June 1, 2021
 
Ultimater
answered 7 Months ago
11

UTF-8 string length

In addition to fedorqui's correct answer, I would like to show the difference between string length and byte length:

myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
LANG=$oLang LC_ALL=$oLcAll
printf "%s is %d char len, but %d bytes len.n" "${myvar}" $chrlen $bytlen

will render:

Généralités is 11 char len, but 14 bytes len.

you could even have a look at stored chars:

myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
printf -v myreal "%q" "$myvar"
LANG=$oLang LC_ALL=$oLcAll
printf "%s has %d chars, %d bytes: (%s).n" "${myvar}" $chrlen $bytlen "$myreal"

will answer:

Généralités has 11 chars, 14 bytes: ($'G303251n303251ralit303251s').

Nota: According to Isabell Cowan's comment, I've added setting to $LC_ALL along with $LANG.

Length of an argument

Argument work same as regular variables

strLen() {
    local bytlen sreal oLang=$LANG oLcAll=$LC_ALL
    LANG=C LC_ALL=C
    bytlen=${#1}
    printf -v sreal %q "$1"
    LANG=$oLang LC_ALL=$oLcAll
    printf "String '%s' is %d bytes, but %d chars len: %s.n" "$1" $bytlen ${#1} "$sreal"
}

will work as

strLen théorème
String 'théorème' is 10 bytes, but 8 chars len: $'th303251or303250me'

Useful printf correction tool:

If you:

for string in Généralités Language Théorème Février  "Left: ?" "Yin Yang ?";do
    printf " - %-14s is %2d char lengthn" "'$string'"  ${#string}
done

 - 'Généralités' is 11 char length
 - 'Language'     is  8 char length
 - 'Théorème'   is  8 char length
 - 'Février'     is  7 char length
 - 'Left: ?'    is  7 char length
 - 'Yin Yang ?' is 10 char length

Not really pretty... For this, there is a little function:

strU8DiffLen () { 
    local bytlen oLang=$LANG oLcAll=$LC_ALL
    LANG=C LC_ALL=C
    bytlen=${#1}
    LANG=$oLang LC_ALL=$oLcAll
    return $(( bytlen - ${#1} ))
}

Then now:

for string in Généralités Language Théorème Février  "Left: ?" "Yin Yang ?";do
    strU8DiffLen "$string"
    printf " - %-$((14+$?))s is %2d chars length, but uses %2d bytesn" 
        "'$string'" ${#string} $((${#string}+$?))
  done 

 - 'Généralités'  is 11 chars length, but uses 14 bytes
 - 'Language'     is  8 chars length, but uses  8 bytes
 - 'Théorème'     is  8 chars length, but uses 10 bytes
 - 'Février'      is  7 chars length, but uses  8 bytes
 - 'Left: ?'      is  7 chars length, but uses  9 bytes
 - 'Yin Yang ?'   is 10 chars length, but uses 12 bytes

Unfortunely, this is not perfect!

But there left some strange UTF-8 behaviour, like double-spaced chars, zero spaced chars, reverse deplacement and other that could not be as simple...

Have a look at diffU8test.sh or diffU8test.sh.txt for more limitations.

Wednesday, June 9, 2021
 
Tak
answered 6 Months ago
Tak
86

You can use the length filter:

{% if some_var|length > 1 %}
Tuesday, August 3, 2021
 
freeMagee
answered 4 Months ago
72

Considering the amount of data, I would definitely run this directly on the server. Here is an example that works to use as a guide. It will replace the names field with a new array from the $split.

db.collectionName.aggregate(
    [
        { "$addFields": { 
            "names": { "$split": [ "$names", "," ] } 
        }},
        {$out:"collectionName"}
    ]
)
Sunday, August 29, 2021
 
mertak
answered 3 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :  
Share