The sample data in the database looks something like this:

{
'data':
[
    {'Log':
    {
        'IP':['8.8.8.8','8.8.4.4'],
        'URL':['www.google.com'],
        'Hash':['d2a12319bf1221ce7681928cc']
    }},
    {'Log':
    {
        'IP':['1.2.3.4'],
        'URL':['www.cnn.com'],
        'Hash':[]
    }}
]
}

I am trying to aggregate a list of unique IP, URL and Hash values from the above list of logs. My current query looks something like this:

db.loglist.aggregate([{'$match':{'data.Log':{'$exists':true}}},
                {'$unwind':'$data'},
                {'$unwind':'$data.Log.URL'},
                {'$unwind':'$data.Log.Hash'},       
                {'$unwind':'$data.Log.IP'},
                {'$group':{'_id':'$ioc',
                            'FHList':{'$addToSet':'$data.Log.Hash'},
                            'URLList':{'$addToSet':'$data.Log.URL'},
                            'IPList':{'$addToSet':'$data.Log.IP'}}
                }]) 

It works well if every log has at least one element in each of the three arrays. However, when any one of the logs contains an empty array, Mongo returns an empty result for the whole query. From a few similar posts I figured out that this is the default behavior of $unwind. What is the standard way to use $unwind, then, so that if there are no results for "Hash" we still keep the results for "IP" and "URL"?

Thanks in advance for any answer.

1 Answer

The $cond operator is the main helper here: test whether each array is empty and, if it is, replace it with a placeholder value such as [false] that can be filtered out later:

db.loglist.aggregate([
  {"$match":{"data.Log":{"$exists":true}}},
  {"$unwind":"$data"},
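  // Swap each empty array for [false] so that $unwind keeps the document
  { "$project": {
     "ioc": 1,
     "data": {
         "Log": {
            "IP": { "$cond": [
              { "$ne": [ "$data.Log.IP", [] ] },
              "$data.Log.IP",
              [false]
            ]},
            "URL": { "$cond": [
              { "$ne": [ "$data.Log.URL", [] ] },
              "$data.Log.URL",
              [false]
            ]},
            "Hash": { "$cond": [
              { "$ne": [ "$data.Log.Hash", [] ] },
              "$data.Log.Hash",
              [false]
            ]}
         }
     }
  }},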
  {"$unwind":"$data.Log.URL"},
  {"$unwind":"$data.Log.Hash"},       
  {"$unwind":"$data.Log.IP"},
  {"$group":{
      "_id":"$ioc",
      "FHList":{"$addToSet":"$data.Log.Hash"},
      "URLList":{"$addToSet":"$data.Log.URL"},
      "IPList":{"$addToSet":"$data.Log.IP"}
  }},
  // Remove the [false] placeholder from each set
  { "$project": {
      "FHList":{ "$setDifference": ["$FHList", [false]] },
      "URLList":{ "$setDifference": ["$URLList", [false]] },
      "IPList":{ "$setDifference": ["$IPList", [false]] }
  }}
])  

Once each set is constructed, the unwanted placeholder value is filtered out of it.
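To make the idea concrete without a running server, here is a minimal sketch in plain JavaScript that models the stages. It is not MongoDB code: the helper names `unwind`, `padEmpty`, `collectUnique` and `unwindAll` are made up for illustration, and the logs are the sample data from the question flattened to one object per log.

```javascript
// Plain-JavaScript model of the pipeline stages; this is NOT MongoDB driver code.
// All helper names here are hypothetical and exist only for this illustration.

// Models $unwind: one output document per array element, none for an empty array.
function unwind(docs, field) {
  return docs.flatMap(doc => doc[field].map(value => ({ ...doc, [field]: value })));
}

// Models the $cond step: replace an empty array with the [false] placeholder.
function padEmpty(doc, fields) {
  const out = { ...doc };
  for (const f of fields) {
    if (out[f].length === 0) out[f] = [false];
  }
  return out;
}

// Models $group with $addToSet, then $setDifference against [false].
function collectUnique(docs, fields) {
  const result = {};
  for (const f of fields) {
    const values = new Set(docs.map(d => d[f]));
    values.delete(false);
    result[f] = [...values].sort();
  }
  return result;
}

const FIELDS = ["IP", "URL", "Hash"];
const logs = [
  { IP: ["8.8.8.8", "8.8.4.4"], URL: ["www.google.com"], Hash: ["d2a12319bf1221ce7681928cc"] },
  { IP: ["1.2.3.4"], URL: ["www.cnn.com"], Hash: [] }
];

// Unwind every field in turn, as the three $unwind stages do.
function unwindAll(docs) {
  return FIELDS.reduce((acc, f) => unwind(acc, f), docs);
}

// Without padding, the second log disappears because its Hash array is empty.
const naive = collectUnique(unwindAll(logs), FIELDS);

// With padding, the second log survives and the placeholder is removed at the end.
const padded = collectUnique(unwindAll(logs.map(d => padEmpty(d, FIELDS))), FIELDS);

console.log(naive);
console.log(padded);
```

In this model `naive` is missing `1.2.3.4` and `www.cnn.com`, while `padded` contains them, which is the effect the $cond and $setDifference stages aim for.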

If your MongoDB version is older than 2.6 and you do not have $setDifference, you can instead unwind again after grouping and filter the placeholder out, presuming that no result array is expected to be empty:

db.loglist.aggregate([
  {"$match":{"data.Log":{"$exists":true}}},
  {"$unwind":"$data"},
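  // Swap each empty array for [false] so that $unwind keeps the document
  { "$project": {
     "ioc": 1,
     "data": {
         "Log": {
            "IP": { "$cond": [
              { "$ne": [ "$data.Log.IP", [] ] },
              "$data.Log.IP",
              [false]
            ]},
            "URL": { "$cond": [
              { "$ne": [ "$data.Log.URL", [] ] },
              "$data.Log.URL",
              [false]
            ]},
            "Hash": { "$cond": [
              { "$ne": [ "$data.Log.Hash", [] ] },
              "$data.Log.Hash",
              [false]
            ]}
         }
     }
  }},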
  {"$unwind":"$data.Log.URL"},
  {"$unwind":"$data.Log.Hash"},       
  {"$unwind":"$data.Log.IP"},
  {"$group":{
      "_id":"$ioc",
      "FHList":{"$addToSet":"$data.Log.Hash"},
      "URLList":{"$addToSet":"$data.Log.URL"},
      "IPList":{"$addToSet":"$data.Log.IP"}
  }},
  { "$unwind": "$FHList" },
  { "$match": { "FHList": { "$ne": false } }},
  { "$unwind": "$URLList" },
  { "$match": { "URLList": { "$ne": false } }},
  { "$unwind": "$IPList" },
  { "$match": { "IPList": { "$ne": false } }},
  { "$group": {
      "_id": "$_id",
      "FHList":{ "$addToSet":"$FHList" },
      "URLList":{ "$addToSet":"$URLList" },
      "IPList":{ "$addToSet":"$IPList" }
  }}
])  

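If one of your grouped arrays could end up empty (that is, contain only the placeholder), the second form is trickier, because the $match after each $unwind then discards the whole group. Handling that case is still possible, but it takes more work.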
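The failure mode in the second form can be illustrated with a small plain-JavaScript model. This is only a sketch, not MongoDB code: `unwind` and `dropPlaceholder` are hypothetical helpers standing in for $unwind and the `$ne: false` $match, and the grouped document (including the `example-ioc` id) is invented for the example.

```javascript
// Plain-JavaScript model of the pipeline stages; this is NOT MongoDB driver code.

// Models $unwind: an empty array produces no output documents.
function unwind(docs, field) {
  return docs.flatMap(doc => doc[field].map(value => ({ ...doc, [field]: value })));
}

// Models { "$match": { field: { "$ne": false } } }.
function dropPlaceholder(docs, field) {
  return docs.filter(doc => doc[field] !== false);
}

// Hypothetical grouped result: no log had a hash, so FHList holds only the placeholder.
const grouped = [
  { _id: "example-ioc", FHList: [false], URLList: ["www.cnn.com"], IPList: ["1.2.3.4"] }
];

// Apply the unwind-then-match pair for each list, as the second pipeline does.
let rows = grouped;
for (const field of ["FHList", "URLList", "IPList"]) {
  rows = dropPlaceholder(unwind(rows, field), field);
}

console.log(rows.length); // 0: the URL and IP values are lost along with the group
```

Because the only FHList entry is the placeholder, the first filter removes every row for that group, so nothing is left to regroup.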
