The sample data in the database looks something like this:
{
  'data':
  [
    {
      'Log':
      {
        'IP':['8.8.8.8','8.8.4.4'],
        'URL':['www.google.com'],
        'Hash':['d2a12319bf1221ce7681928cc']
      }
    },
    {
      'Log':
      {
        'IP':['1.2.3.4'],
        'URL':['www.cnn.com'],
        'Hash':[]
      }
    }
  ]
}
I am trying to aggregate a list of unique IPs, URLs and hashes from the above list of logs. My current query looks something like this:
db.loglist.aggregate([{'$match':{'data.Log':{'$exists':true}}},
{'$unwind':'$data'},
{'$unwind':'$data.Log.URL'},
{'$unwind':'$data.Log.Hash'},
{'$unwind':'$data.Log.IP'},
{'$group':{'_id':'$ioc',
'FHList':{'$addToSet':'$data.Log.Hash'},
'URLList':{'$addToSet':'$data.Log.URL'},
'IPList':{'$addToSet':'$data.Log.IP'}}
}])
It works well if, for every log, there is at least one element in each of the three arrays. However, when an empty array appears in any one of the logs, Mongo returns an empty result for the whole query. From a few similar posts, I figured out that this is the default behavior of $unwind. But what is the standard way to use $unwind then, so that if, say, we have no results for "Hash", we still keep the results for "IP" and "URL"?
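To make the problem concrete, here is a rough sketch in plain JavaScript of how I understand $unwind to behave. It is only a simplified model, not MongoDB's actual implementation: the `unwind` helper is hypothetical, and the two logs are flattened to top-level fields for brevity. Each document is copied once per array element, so an empty array produces zero copies and that log's "IP" and "URL" values are lost along with it.

```javascript
// Simplified model of $unwind on a top-level array field (an illustration,
// not MongoDB's implementation): one output document per array element.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    // An empty array runs this loop zero times, so the document is dropped.
    for (const value of doc[field]) {
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

// The two logs from the sample data, flattened for brevity.
const logs = [
  { IP: ['8.8.8.8', '8.8.4.4'], URL: ['www.google.com'], Hash: ['d2a12319bf1221ce7681928cc'] },
  { IP: ['1.2.3.4'], URL: ['www.cnn.com'], Hash: [] },
];

// Unwind "Hash" first, then "IP", and collect the distinct IPs that survive.
const afterHash = unwind(logs, 'Hash');
const afterIP = unwind(afterHash, 'IP');
const ips = [...new Set(afterIP.map((d) => d.IP))];

console.log(afterHash.length); // 1: the log with the empty Hash array is gone
console.log(ips);              // 1.2.3.4 is missing from the result
```

In this model the second log disappears as soon as "Hash" is unwound, which matches what I see: its IP and URL never reach the $group stage.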
Thanks in advance for any answer.