
What's the syntax for doing a $lookup on a field that is an array of ObjectIds rather than just a single ObjectId?

Example Order Document:

{
  _id: ObjectId("..."),
  products: [
    ObjectId("..<Car ObjectId>.."),
    ObjectId("..<Bike ObjectId>..")
  ]
}

Not Working Query:

db.orders.aggregate([
    {
       $lookup:
         {
           from: "products",
           localField: "products",
           foreignField: "_id",
           as: "productObjects"
         }
    }
])

Desired Result

{
  _id: ObjectId("..."),
  products: [
    ObjectId("..<Car ObjectId>.."),
    ObjectId("..<Bike ObjectId>..")
  ],
  productObjects: [
    {<Car Object>},
    {<Bike Object>}
  ],
}

 Answers

70

2017 update

$lookup can now use an array directly as the local field (the improvement tracked as SERVER-22881, available from MongoDB 3.4), so $unwind is no longer needed.
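
As a rough illustration of what the array form does, here is a plain JavaScript sketch (not MongoDB code) that mimics the matching: a foreign document is included when its key equals any element of the local array. The lookupArray helper and the sample data are hypothetical, and strings stand in for ObjectIds.

```javascript
// Plain JavaScript sketch that mimics $lookup with an array localField.
// lookupArray and the sample data are hypothetical; strings stand in for ObjectIds.
function lookupArray(localDocs, foreignDocs, localField, foreignField, as) {
  return localDocs.map((doc) => {
    const raw = doc[localField];
    const keys = Array.isArray(raw) ? raw : [raw];
    // Keep every foreign document whose key equals any element of the local array.
    const matches = foreignDocs.filter((f) => keys.includes(f[foreignField]));
    return { ...doc, [as]: matches };
  });
}

const products = [
  { _id: "bike", name: "Bike" },
  { _id: "car", name: "Car" },
  { _id: "boat", name: "Boat" },
];
const orders = [{ _id: "order1", products: ["car", "bike"] }];

const joined = lookupArray(orders, products, "products", "_id", "productObjects");
console.log(JSON.stringify(joined, null, 2));
```

In this sketch the matches come back in the order of the foreign collection, not the order of the local array. It is safest not to assume that the real stage preserves the order of the products array either.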

Old answer

The $lookup aggregation pipeline stage will not work directly with an array. It is designed as a "left join" of the "one to many" kind (or really a "lookup") against possibly related data, and the local value is expected to be singular, not an array.

Therefore you must "de-normalise" the content before performing the $lookup for this to work, and that means using $unwind:

db.orders.aggregate([
    // Unwind the source
    { "$unwind": "$products" },
    // Do the lookup matching
    { "$lookup": {
       "from": "products",
       "localField": "products",
       "foreignField": "_id",
       "as": "productObjects"
    }},
    // Unwind the result arrays ( likely one or none )
    { "$unwind": "$productObjects" },
    // Group back to arrays
    { "$group": {
        "_id": "$_id",
        "products": { "$push": "$products" },
        "productObjects": { "$push": "$productObjects" }
    }}
])

After $lookup matches each array member, the result is itself an array, so you $unwind again and then $group, using $push to build the new arrays for the final result.

Note that any "left join" lookup that finds no match produces an empty array for "productObjects" on that product, so the document for that "product" element is dropped when the second $unwind is applied.
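
To make that dropping behaviour concrete, here is a plain JavaScript sketch (not MongoDB code) of the $unwind, $lookup, $unwind, $group flow applied to an order that references one product that does not exist. All helper names and sample data are hypothetical, and strings stand in for ObjectIds.

```javascript
// Plain JavaScript sketch of the $unwind -> $lookup -> $unwind -> $group flow.
// All helper names and sample data are hypothetical; strings stand in for ObjectIds.
const products = [{ _id: "car", name: "Car" }];
const orders = [{ _id: "order1", products: ["car", "missing"] }];

// $unwind: one output document per array element; an empty array yields nothing.
const unwind = (docs, field) =>
  docs.flatMap((d) => d[field].map((v) => ({ ...d, [field]: v })));

// $lookup on a singular value: attach the (possibly empty) array of matches.
const lookup = (docs, foreign, localField, foreignField, as) =>
  docs.map((d) => ({
    ...d,
    [as]: foreign.filter((f) => f[foreignField] === d[localField]),
  }));

// $group by _id, pushing both fields back into arrays.
const regroup = (docs) => {
  const groups = new Map();
  for (const d of docs) {
    if (!groups.has(d._id)) {
      groups.set(d._id, { _id: d._id, products: [], productObjects: [] });
    }
    const g = groups.get(d._id);
    g.products.push(d.products);
    g.productObjects.push(d.productObjects);
  }
  return [...groups.values()];
};

const unwound = unwind(orders, "products");
const looked = lookup(unwound, products, "products", "_id", "productObjects");
const result = regroup(unwind(looked, "productObjects"));
console.log(JSON.stringify(result));
```

The "missing" id disappears from both arrays in the result. If unmatched entries need to survive, the preserveNullAndEmptyArrays option of $unwind is worth looking at.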

Though applying it directly to an array would be nice, this is simply how it currently works: a singular value is matched against possibly many.

As $lookup is still very new, it currently works in a way that will be familiar to mongoose users: as a "poor man's version" of the .populate() method offered there. The difference is that $lookup performs the "join" on the server rather than on the client, and that $lookup still lacks some of the "maturity" that .populate() offers (such as resolving the lookup directly on an array).

This is actually an assigned improvement issue, SERVER-22881, so with some luck it will land in the next release or one soon after.

As a design principle, your current structure is neither good nor bad, just subject to overhead whenever a "join" is created. As such, the basic founding principle of MongoDB applies: if you "can" live with the data "pre-joined" in one collection, then it is best to do so.

The one other thing that can be said of $lookup as a general principle is that the "join" is intended to work the other way around from what is shown here. Rather than keeping the "related ids" of the other documents within the "parent" document, the design that works best is one where the "related documents" contain a reference to the "parent".

So $lookup can be said to "work best" with a "relation design" that is the reverse of how something like mongoose .populate() performs its client-side joins. By identifying the "one" within each "many" instead, you just pull in the related items without needing to $unwind the array first.
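
A minimal plain JavaScript sketch of that reversed design follows. It is not MongoDB code, and the orderItems collection and orderId field are hypothetical names chosen for illustration. Each child carries its parent's id, so the parent's singular _id is matched against many children.

```javascript
// Plain JavaScript sketch of the "reverse" design: each child stores its parent's id.
// The orderItems collection and orderId field are hypothetical names.
// The equivalent stage would look roughly like:
//   { $lookup: { from: "orderItems", localField: "_id",
//                foreignField: "orderId", as: "items" } }
const orders = [{ _id: "order1" }, { _id: "order2" }];
const orderItems = [
  { _id: "i1", orderId: "order1", name: "Car" },
  { _id: "i2", orderId: "order1", name: "Bike" },
];

// Singular local value matched against many children: no $unwind required.
const withItems = orders.map((o) => ({
  ...o,
  items: orderItems.filter((i) => i.orderId === o._id),
}));
console.log(JSON.stringify(withItems, null, 2));
```

An order with no children simply gets an empty items array, which is the "left join" behaviour described earlier.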

Tuesday, June 1, 2021
 
weegee
answered 7 Months ago
98

First of all, the field is all_category_id, not category_id. Secondly, you don't link the articles, so all documents will have exactly the same article_category array. Lastly, you probably want to filter out articles that don't have a matched category. The conditional pipeline should look more like this:

db.article.aggregate([
  { $match: {
      title: { $regex: /example/ }
  } },
  { $lookup: {
    from: "article_category",
    let: {
      article_id: "$article_id"
    },
    pipeline: [
      { $match: {
          $expr: { $and: [
              { $in: [ 8, "$all_category_id" ] },
              { $eq: [ "$article_id", "$$article_id" ] }
          ] }
      } }
    ],
    as: "article_category"
  } },
  { $match: {
    $expr: { $gt: [
      { $size: "$article_category"},
      0
    ] }
  } }
] )

UPDATE:

If you don't match on article_id, the $lookup will return an identical article_category array for all articles.

Let's say your article_category collection has another document:

{
  "article_id": 0,
  "all_category_id": [5,8,10]
}

With { $eq: [ "$article_id", "$$article_id" ] } in the pipeline the resulting article_category is

[ 
  { 
    "article_id" : 2015110920343902, 
    "all_category_id" : [ 5, 8, 10 ] 
  } 
]

without:

[ 
  { 
    "article_id" : 2015110920343902, 
    "all_category_id" : [ 5, 8, 10 ] 
  },
  {
    "article_id": 0,
    "all_category_id": [ 5, 8, 10 ]
  }
]

If the latter is what you need, it would be much simpler to make two find requests:

db.article.find({ title: { $regex: /example/ } })

and

db.article_category.find({ all_category_id: 8 })
Friday, June 11, 2021
 
madphp
answered 6 Months ago
64

This is my answer to the question after reading the post suggested by @Veeram

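db.collection.aggregate([{
"$group":{
    // $group requires an _id; null (a single group) is an assumption here
    "_id": null,
    "field": {
        "$push": {
            "$cond":[
                {"$gt":["$A", 0]},
                {"id": "$_id", "A":"$A"},
                null
            ]
        }
    },
    "secondField":{"$push":"$B"}
}},
{
    "$project": {
        // strip the null placeholders pushed for documents where A <= 0
        "A":{"$setDifference":["$field", [null]]},
        "B":"$secondField"
    }
}])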
Thursday, July 22, 2021
 
astaykov
answered 5 Months ago
76

array_contains() works; you only have to group the result by the player afterwards.

Let's start with two datasets, one for the players and one for the guitars:

val player = Seq(("Eric Clapton", Array(1,5)), ("Paco de Lucia", Array(1,2)), ("Jimi Hendrix", Array(3))).toDF("player", "guitars")
val guitar = Seq((1, "Gibson", "SG", "Electric"), (2, "Faustino Conde", "Media Luna", "Acoustic"), (3, "Pulsebeatguitars", "Spider", "Electric"), (4, "Yamaha", "FG800", "Acoustic"), (5, "Fender", "Stratocaster", "Electric")).toDF("guitarId", "make", "model", "type")
+-------------+-------+
|       player|guitars|
+-------------+-------+
| Eric Clapton| [1, 5]|
|Paco de Lucia| [1, 2]|
| Jimi Hendrix|    [3]|
+-------------+-------+
+--------+----------------+------------+--------+
|guitarId|            make|       model|    type|
+--------+----------------+------------+--------+
|       1|          Gibson|          SG|Electric|
|       2|  Faustino Conde|  Media Luna|Acoustic|
|       3|Pulsebeatguitars|      Spider|Electric|
|       4|          Yamaha|       FG800|Acoustic|
|       5|          Fender|Stratocaster|Electric|
+--------+----------------+------------+--------+

To make the grouping operation a bit easier, the idea is to combine the three columns of the guitar dataset into a struct before the join:

val guitar2 = guitar.withColumn("guitar", struct('make, 'model, 'type))

After the join, we group the result by the player and get the correct result:

player.join(guitar2, expr("array_contains(guitars, guitarId)"))
  .groupBy("player")
  .agg(collect_list('guitar))
  .show(false)

prints

+-------------+----------------------------------------------------------------+
|player       |collect_list(guitar)                                            |
+-------------+----------------------------------------------------------------+
|Jimi Hendrix |[[Pulsebeatguitars, Spider, Electric]]                          |
|Eric Clapton |[[Gibson, SG, Electric], [Fender, Stratocaster, Electric]]      |
|Paco de Lucia|[[Gibson, SG, Electric], [Faustino Conde, Media Luna, Acoustic]]|
+-------------+----------------------------------------------------------------+
Friday, August 20, 2021
 
Sufi
answered 4 Months ago
86

There is one more way to go. Instead of creating a wrapper object, you can keep a Map<group,List<child>> of selected items and add items to it in much the same way, by listening for this event:

mExpandableList.setOnChildClickListener(new OnChildClickListener() {
  @Override public boolean onChildClick(ExpandableListView parent,
    View v,int groupPosition, int childPosition, long id) {
      //toggle selections code here 
      return false; // return value is an assumption; true would mark the click as handled
  }
});

Here is the important part (the full working example is in the GitHub repo):

@Override
protected void onCreate(Bundle savedInstanceState) {
    ...// some init part 

    final MyAdapter adapter = new MyAdapter(schools, students);
    listView.setAdapter(adapter);

    listView.setOnChildClickListener(new ExpandableListView.OnChildClickListener() {
        @Override
        public boolean onChildClick(ExpandableListView parent, 
                   View v, int groupPosition, int childPosition, long id) {
            adapter.toggleSelection(groupPosition, childPosition);
            adapter.notifyDataSetInvalidated();
            return false;
        }
    });
}

There are several places where such a map of selected items could live, but in my project I keep it in a custom adapter class. A custom adapter is not strictly required: it is possible to put Map<G, List<C>> selectedItems; and the related functions (toggleSelection, isSelected, getSelectedItems) in the Activity. However, we still need to highlight the selected cells, so the adapter is usually the best place for it.

private class MyAdapter<G, C> extends BaseExpandableListAdapter {
    private List<G> groups;
    private Map<G, List<C>> childMap;
    private Map<G, List<C>> selectedItems;

    public MyAdapter(List<G> groups, Map<G, List<C>> childMap){
        this.groups = groups;
        this.childMap = childMap;
        this.selectedItems = new HashMap<>();
    }

    public boolean isSelected(int groupPosition, int childPosition){
        G group = groups.get(groupPosition);
        // getChild is adapter Fn and is the same as
        // G group = groups.get(groupPosition)
        // C child = childMap.get(group).get(childPosition);
        C child = getChild(groupPosition, childPosition);
        List<C> sel = selectedItems.get(group);
        return sel != null && sel.contains(child);
    }

    public void toggleSelection(int groupPosition, int childPosition){
        G group = groups.get(groupPosition);
        C child = getChild(groupPosition,childPosition);
        List<C> sel = selectedItems.get(group);
        if (sel == null){
            sel = new ArrayList<>(); // lazy list creation
            // alternatively, init all lists in the constructor and never check for nulls
            selectedItems.put(group, sel);
        }
        if (sel.contains(child)) sel.remove(child);
        else sel.add(child);
    }
    ... // the remaining adapter functions can be found in the git repo
}


Converting the resulting Map to a List is then an easy task:

private ArrayList<String> selectedAsList(Map<String, List<String>> selectedItems){
     ArrayList<String> result =  new ArrayList<>();
    for(List<String> students: selectedItems.values())
        result.addAll(students);
    return result;
}

or something similar.

PS. You can also experiment with the Map<group,List<child>> itself. It can be pretty much any data structure you want: two arrays, or maybe just a single array if there are no duplicates in your group data. You control it, so you can limit the number of selections and so on.

Saturday, October 30, 2021
 
Do Tog
answered 1 Month ago