MongoDB unwind multiple empty arrays

Question

The sample data in the database looks something like this:

{
'data':
[
    {
        'Log':
        {
            'IP':['8.8.8.8','8.8.4.4'],
            'URL':['www.google.com'],
            'Hash':['d2a12319bf1221ce7681928cc']
        }
    },
    {
        'Log':
        {
            'IP':['1.2.3.4'],
            'URL':['www.cnn.com'],
            'Hash':[]
        }
    }
]
}

I am trying to aggregate a list of unique IP, URL and Hash values from the above list of logs. My current query looks something like this:

db.loglist.aggregate([{'$match':{'data.Log':{'$exists':true}}},
                {'$unwind':'$data'},
                {'$unwind':'$data.Log.URL'},
                {'$unwind':'$data.Log.Hash'},       
                {'$unwind':'$data.Log.IP'},
                {'$group':{'_id':'$ioc',
                            'FHList':{'$addToSet':'$data.Log.Hash'},
                            'URLList':{'$addToSet':'$data.Log.URL'},
                            'IPList':{'$addToSet':'$data.Log.IP'}}
                }])

It works well if every log has at least one element in each of the three arrays. However, when an empty array appears in any one of the logs, Mongo returns empty for the whole query. I figured out from a few similar posts that this is the default behavior of $unwind. But what is the standard way to use $unwind then, so that if, say, we have no results for "Hash", we can still keep the results for "IP" and "URL"?
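
To make the behavior concrete, here is a minimal sketch in plain JavaScript (not MongoDB code) that models what $unwind does to the two sample logs. The helper name unwind is hypothetical and only handles a top-level array field; the point is just that an empty array produces zero output documents, so the log's IP and URL are lost along with it.

```javascript
// Plain-JavaScript model of $unwind on a top-level array field (not MongoDB code).
// unwind() is a hypothetical helper written only for this illustration.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    // One output document per array element; an empty array yields none.
    for (const value of doc[field]) {
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

const logs = [
  { IP: ['8.8.8.8', '8.8.4.4'], URL: ['www.google.com'], Hash: ['d2a12319bf1221ce7681928cc'] },
  { IP: ['1.2.3.4'], URL: ['www.cnn.com'], Hash: [] },
];

// Same order as the query above: URL, then Hash, then IP.
const afterUrl = unwind(logs, 'URL');
const afterHash = unwind(afterUrl, 'Hash');
const afterIp = unwind(afterHash, 'IP');

console.log(afterUrl.length);  // 2
console.log(afterHash.length); // 1 (the log with the empty Hash array is gone)
console.log(afterIp.map(doc => doc.IP)); // [ '8.8.8.8', '8.8.4.4' ]
```

In this model, '1.2.3.4' and 'www.cnn.com' never reach the grouping step, which matches the symptom described above.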

Thanks in advance for any answer.

Blakes Seven · Accepted Answer · 2015-07-30 01:15:12Z

The $cond operator is the main helper here: test whether each array is empty and, if it is, replace it with a placeholder value ([false]) that can be filtered out later:

db.loglist.aggregate([
  {"$match":{"data.Log":{"$exists":true}}},
  {"$unwind":"$data"},
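  // Swap each empty array for [false] so that $unwind keeps the document
  { "$project": {
     "ioc": 1,
     "data": {
         "Log": {
            "IP": { "$cond": [
              { "$ne": [ "$data.Log.IP", [] ] },
              "$data.Log.IP",
              [false]
            ]},
            "URL": { "$cond": [
              { "$ne": [ "$data.Log.URL", [] ] },
              "$data.Log.URL",
              [false]
            ]},
            "Hash": { "$cond": [
              { "$ne": [ "$data.Log.Hash", [] ] },
              "$data.Log.Hash",
              [false]
            ]}
         }
     }
  }},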
  {"$unwind":"$data.Log.URL"},
  {"$unwind":"$data.Log.Hash"},       
  {"$unwind":"$data.Log.IP"},
  {"$group":{
      "_id":"$ioc",
      "FHList":{"$addToSet":"$data.Log.Hash"},
      "URLList":{"$addToSet":"$data.Log.URL"},
      "IPList":{"$addToSet":"$data.Log.IP"}
  }},
  { "$project": {
      "FHList":{ "$setDifference": ["$FHList", [false]] },
      "URLList":{ "$setDifference": ["$URLList", [false]] },
      "IPList":{ "$setDifference": ["$IPList", [false]] }
  }}
])

Once each set is constructed, the unwanted placeholder value is filtered out of it.
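
The four steps (pad empty arrays, unwind, collect into sets, strip the placeholder) can be sketched in plain JavaScript to check the logic without a database. This is only a model of the pipeline, not MongoDB code: unwind and PLACEHOLDER are hypothetical names, the input is the two sample logs from the question, and JavaScript's Set keeps insertion order whereas $addToSet makes no ordering promise.

```javascript
// Plain-JavaScript model of the placeholder approach (not MongoDB code).
// unwind() and PLACEHOLDER are hypothetical names used only for this sketch.
const PLACEHOLDER = false;

function unwind(docs, field) {
  // One output document per array element; an empty array yields none.
  return docs.flatMap(doc => doc[field].map(value => ({ ...doc, [field]: value })));
}

const logs = [
  { IP: ['8.8.8.8', '8.8.4.4'], URL: ['www.google.com'], Hash: ['d2a12319bf1221ce7681928cc'] },
  { IP: ['1.2.3.4'], URL: ['www.cnn.com'], Hash: [] },
];
const fields = ['IP', 'URL', 'Hash'];

// Step 1 ($project with $cond): swap each empty array for [PLACEHOLDER].
const padded = logs.map(log => {
  const copy = { ...log };
  for (const f of fields) {
    if (copy[f].length === 0) copy[f] = [PLACEHOLDER];
  }
  return copy;
});

// Step 2 (three $unwind stages): no document is dropped any more.
let rows = padded;
for (const f of ['URL', 'Hash', 'IP']) rows = unwind(rows, f);

// Step 3 ($group with $addToSet), then step 4 ($setDifference against [false]).
const result = {};
for (const f of fields) {
  const set = new Set(rows.map(row => row[f]));
  set.delete(PLACEHOLDER);
  result[f] = [...set];
}

console.log(result.IP);   // [ '8.8.8.8', '8.8.4.4', '1.2.3.4' ]
console.log(result.URL);  // [ 'www.google.com', 'www.cnn.com' ]
console.log(result.Hash); // [ 'd2a12319bf1221ce7681928cc' ]
```

The placeholder only has to be a value that can never occur in the real data; false is the one used in the pipeline above.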

If your MongoDB version is older than 2.6 and you do not have $setDifference, then you can instead unwind the grouped arrays again and filter out the placeholder, presuming that no result array is expected to be empty here:

db.loglist.aggregate([
  {"$match":{"data.Log":{"$exists":true}}},
  {"$unwind":"$data"},
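  // Swap each empty array for [false] so that $unwind keeps the document
  { "$project": {
     "ioc": 1,
     "data": {
         "Log": {
            "IP": { "$cond": [
              { "$ne": [ "$data.Log.IP", [] ] },
              "$data.Log.IP",
              [false]
            ]},
            "URL": { "$cond": [
              { "$ne": [ "$data.Log.URL", [] ] },
              "$data.Log.URL",
              [false]
            ]},
            "Hash": { "$cond": [
              { "$ne": [ "$data.Log.Hash", [] ] },
              "$data.Log.Hash",
              [false]
            ]}
         }
     }
  }},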
  {"$unwind":"$data.Log.URL"},
  {"$unwind":"$data.Log.Hash"},       
  {"$unwind":"$data.Log.IP"},
  {"$group":{
      "_id":"$ioc",
      "FHList":{"$addToSet":"$data.Log.Hash"},
      "URLList":{"$addToSet":"$data.Log.URL"},
      "IPList":{"$addToSet":"$data.Log.IP"}
  }},
  { "$unwind": "$FHList" },
  { "$match": { "FHList": { "$ne": false } }},
  { "$unwind": "$URLList" },
  { "$match": { "URLList": { "$ne": false } }},
  { "$unwind": "$IPList" },
  { "$match": { "IPList": { "$ne": false } }},
  { "$group": {
      "_id": "$_id",
      "FHList":{ "$addToSet":"$FHList" },
      "URLList":{ "$addToSet":"$URLList" },
      "IPList":{ "$addToSet":"$IPList" }
  }}
])

If one of your grouped arrays could end up empty (that is, containing only the placeholder), the second form gets trickier, but it is still possible.
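
A small plain-JavaScript model (again, not MongoDB code) shows why that case is awkward. The grouped document below is hypothetical: it stands for a group in which no log had a hash, so FHList holds only the placeholder, and 'example-ioc' is a made-up _id. Unwinding FHList and then dropping the placeholder removes the only row for that group.

```javascript
// Plain-JavaScript model of the tail of the second pipeline (not MongoDB code).
// The document is hypothetical: no log in this group had a hash, so FHList
// contains only the placeholder. 'example-ioc' is a made-up _id value.
const grouped = [
  { _id: 'example-ioc', FHList: [false], URLList: ['www.cnn.com'], IPList: ['1.2.3.4'] },
];

// Models $unwind on FHList followed by $match { FHList: { $ne: false } }.
const survivors = grouped
  .flatMap(doc => doc.FHList.map(value => ({ ...doc, FHList: value })))
  .filter(doc => doc.FHList !== false);

console.log(survivors.length); // 0
```

Nothing is left for the final $group to work with, so the URLs and IPs for that group are lost as well. That is why the second form presumes every result array has at least one real value.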
