
With this data:

{
    "_id" : ObjectId("576948b4999274493425c08a"),
    "virustotal" : {
        "scan_id" : "4a6c3dfc6677a87aee84f4b629303c40bb9e1dda283a67236e49979f96864078-1465973544",
        "sha1" : "fd177b8c50b457dbec7cba56aeb10e9e38ebf72f",
        "resource" : "4a6c3dfc6677a87aee84f4b629303c40bb9e1dda283a67236e49979f96864078",
        "response_code" : 1,
        "scan_date" : "2016-06-15 06:52:24",
        "results" : [ 
            {
                "sig" : "Gen:Variant.Mikey.29601",
                "vendor" : "MicroWorld-eScan"
            }, 
            {
                "sig" : null,
                "vendor" : "nProtect"
            }, 
            {
                "sig" : null,
                "vendor" : "CAT-QuickHeal"
            }, 
            {
                "sig" : "HEUR/QVM07.1.0000.Malware.Gen",
                "vendor" : "Qihoo-360"
            }
        ]
    }
},
{
    "_id" : ObjectId("5768f214999274362f714e8b"),
    "virustotal" : {
        "scan_id" : "3d283314da4f99f1a0b59af7dc1024df42c3139fd6d4d4fb4015524002b38391-1466529838",
        "sha1" : "fb865b8f0227e9097321182324c959106fcd8c27",
        "resource" : "3d283314da4f99f1a0b59af7dc1024df42c3139fd6d4d4fb4015524002b38391",
        "response_code" : 1,
        "scan_date" : "2016-06-21 17:23:58",
        "results" : [ 
            {
                "sig" : null,
                "vendor" : "Bkav"
            }, 
            {
                "sig" : null,
                "vendor" : "ahnlab"
            }, 
            {
                "sig" : null,
                "vendor" : "MicroWorld-eScan"
            }, 
            {
                "sig" : "Mal/DrodZp-A",
                "vendor" : "Qihoo-360"
            }
        ]
    }
}

I'm trying to group by vendor and count only the results where sig is not null, in order to obtain something like:

{
    "_id" : "Qihoo-360",
    "count" : 2
},
{
    "_id" : "MicroWorld-eScan",
    "count" : 1
},
{
    "_id" : "Bkav",
    "count" : 0
},
{
    "_id" : "CAT-QuickHeal",
    "count" : 0
}

At the moment, with this code:

db.analysis.aggregate([ 
    { $unwind: "$virustotal.results"  },
    {
        $group : {
             _id : "$virustotal.results.vendor", 
             count : { $sum : 1 }
        }
    },
    { $sort : { count : -1 } }
])

I get a count of every result, whether or not its sig is null:

{
    "_id" : "Qihoo-360",
    "count" : 2
},
{
    "_id" : "MicroWorld-eScan",
    "count" : 2
},
{
    "_id" : "Bkav",
    "count" : 1
},
{
    "_id" : "CAT-QuickHeal",
    "count" : 1
}

How can I make a result count as 0 when its sig is null?

2 Answers


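You need a conditional expression in your $sum operator that checks whether the "$virustotal.results.sig" key is null. Do this with the comparison operator $gt against null. In the BSON comparison order described in the documentation, null sorts below strings. Any non-null signature is therefore greater than null and counts as 1, while a null one counts as 0.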

You can restructure your pipeline by adding this expression as follows:

db.analysis.aggregate([
    { "$unwind": "$virustotal.results" },
    {
        "$group" : {
            "_id": "$virustotal.results.vendor", 
            "count" : { 
                "$sum": {
                    "$cond": [
                        { "$gt": [ "$virustotal.results.sig", null ] },
                        1, 0
                    ]
                }
            }
        }
    },
    { "$sort" : { "count" : -1 } }
])

Sample Output

/* 1 */
{
    "_id" : "Qihoo-360",
    "count" : 2
}

/* 2 */
{
    "_id" : "MicroWorld-eScan",
    "count" : 1
}

/* 3 */
{
    "_id" : "Bkav",
    "count" : 0
}

/* 4 */
{
    "_id" : "CAT-QuickHeal",
    "count" : 0
}

/* 5 */
{
    "_id" : "nProtect",
    "count" : 0
}

/* 6 */
{
    "_id" : "ahnlab",
    "count" : 0
}
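The following sketch makes the counting logic concrete. It is a plain-Python model of the same pipeline ($unwind, $group with $sum over $cond, then $sort), run over the two sample documents from the question with no MongoDB server involved. It rests on stated assumptions: it models only null and string values of sig, and it hard-codes their place in the BSON comparison order (null before strings). The helper names agg_gt and count_signed_by_vendor are made up for illustration.

```python
# Plain-Python model of the answer's pipeline; not MongoDB itself.
# Only None and str values of "sig" are modelled.

# Sample documents from the question, trimmed to the fields the pipeline uses.
DOCS = [
    {"virustotal": {"results": [
        {"sig": "Gen:Variant.Mikey.29601", "vendor": "MicroWorld-eScan"},
        {"sig": None, "vendor": "nProtect"},
        {"sig": None, "vendor": "CAT-QuickHeal"},
        {"sig": "HEUR/QVM07.1.0000.Malware.Gen", "vendor": "Qihoo-360"},
    ]}},
    {"virustotal": {"results": [
        {"sig": None, "vendor": "Bkav"},
        {"sig": None, "vendor": "ahnlab"},
        {"sig": None, "vendor": "MicroWorld-eScan"},
        {"sig": "Mal/DrodZp-A", "vendor": "Qihoo-360"},
    ]}},
]

# Partial BSON comparison order: null sorts before strings.
TYPE_RANK = {type(None): 1, str: 2}


def agg_gt(a, b):
    """Model the aggregation expression $gt, which compares values of
    different types by their BSON type order (no type bracketing)."""
    rank_a, rank_b = TYPE_RANK[type(a)], TYPE_RANK[type(b)]
    if rank_a != rank_b:
        return rank_a > rank_b
    if a is None:  # both null: equal, so not greater
        return False
    return a > b


def count_signed_by_vendor(docs):
    counts = {}
    for doc in docs:
        for result in doc["virustotal"]["results"]:        # $unwind
            vendor = result["vendor"]                        # $group _id
            hit = 1 if agg_gt(result["sig"], None) else 0    # $cond
            counts[vendor] = counts.get(vendor, 0) + hit     # $sum
    # $sort by count descending; MongoDB does not guarantee the order of ties
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


for vendor, count in count_signed_by_vendor(DOCS):
    print({"_id": vendor, "count": count})
```

This yields Qihoo-360 with 2, MicroWorld-eScan with 1 and the four other vendors with 0, which matches the sample output. The relative order of the vendors tied at 0 should not be relied on.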

5 Comments

I see. I was simply clicking the up arrow ("This answer is useful"), but since I have low reputation, the change is not shown :( Let me know if it's OK now :)
@MarioArancioni That's fine now :) Thank you
Hello, I didn't know whether to open another thread or continue here. The code works in the mongo shell, and now I'm translating it to Python: vt_list = results_db.analysis.aggregate( [ { "$unwind": "$virustotal.results" },{"$group": { "_id":"$virustotal.results.vendor","count": { "$sum": { "$cond": [ { "$gt": [ "$virustotal.results.sig","null" ] } ,1,0 ] } } } },{ "$sort": { "count" : -1 } } ] ) for vt in vt_list: report["states_count"][vt["_id"]] = vt["count"] The query returns the values unordered and, most importantly, with wrong (lower) counts! Do you have any idea?
Just change this line { "$gt": [ "$virustotal.results.sig", "null" ] } to { "$gt": [ "$virustotal.results.sig", None ] }. In Python, None is the equivalent of null, whereas "null" is just a string.
I would suggest creating another question, tagged with pymongo as well, showing your expected output along with the aggregation pipeline.
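For the pymongo side, here is a sketch of the answer's pipeline written as Python data, with None in place of null. The stages are copied from the answer. The helper name vendor_signature_pipeline is hypothetical.

```python
# Hypothetical helper: the answer's pipeline in pymongo form.
# pymongo encodes Python None as BSON null; the string "null" stays a string,
# so {"$gt": [sig, "null"]} would compare against text, not against null.


def vendor_signature_pipeline():
    """Return the aggregation pipeline from the answer as Python data."""
    return [
        {"$unwind": "$virustotal.results"},
        {
            "$group": {
                "_id": "$virustotal.results.vendor",
                "count": {
                    "$sum": {
                        "$cond": [
                            # None, not "null": this is the null comparison
                            {"$gt": ["$virustotal.results.sig", None]},
                            1,
                            0,
                        ]
                    }
                },
            }
        },
        {"$sort": {"count": -1}},
    ]
```

With pymongo installed and a server running, this could be passed as results_db.analysis.aggregate(vendor_signature_pipeline()), where results_db is the database handle from the comment above. Iterating the returned cursor yields documents in $sort order. If they are copied into a plain dict on Python versions before 3.7, that order is not guaranteed to survive, which may account for the unordered output.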

I replaced "null" with None and the numbers increased, but they still don't seem correct. Running the query in the mongo shell, I get, for example:

{ "_id" : "Kaspersky", "count" : 176.0 }

From Python: Kaspersky 64

One of these two is wrong :)

So I'm trying to find out which part of the Python query differs from the mongo shell one. I ran a simpler query. In the mongo shell:

rtmp = results_db.analysis.count( { "virustotal.results" : { "$elemMatch" : { "vendor": "Kaspersky", "sig": {"$ne": "null"} } }})

Result: 176

db.analysis.count( { "virustotal.results" : { $elemMatch : { "vendor": "Kaspersky", "sig": {$gt: null} } }})

Result: 0

Then I tried in Python:

rtmp = results_db.analysis.count( { "virustotal.results" : { "$elemMatch" : { "vendor": "Kaspersky", "sig": {"$ne": "null"} } }})

Result: 568

rtmp = results_db.analysis.count( { "virustotal.results" : { "$elemMatch" : { "vendor": "Kaspersky", "sig": {"$ne": "None"} } }})

Result: 568

rtmp = results_db.analysis.count( { "virustotal.results" : { "$elemMatch" : { "vendor": "Kaspersky", "sig": {"$gt": "None"} } }})

Result: 64

rtmp = results_db.analysis.count( { "virustotal.results" : { "$elemMatch" : { "vendor": "Kaspersky", "sig": {"$gt": "null"} } }})

Result: 6

It's hard to say which value is correct! I suppose 176, but I'm not able to reproduce it in Python...
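A small plain-Python model of the query operators can help reason about these numbers. These are the query operators used in count()/find(), not the aggregation expressions from the answer. The model assumes, per the MongoDB documentation on comparison order, that query $gt only compares values of the same BSON type ("type bracketing"), whereas $ne matches anything not equal to its operand. Under that model, {"$ne": "null"} also matches elements whose sig is null, because it compares against the string "null". {"$gt": "None"} and {"$gt": "null"} only match strings that sort after those words. {$gt: null} matches nothing, which fits the shell result of 0. This is also why $gt against null works inside the aggregation $cond, which compares across types, but not as a query filter. A filter for "sig is not null" would be {"$ne": None} in Python ({$ne: null} in the shell). The sketch uses only the question's sample data, not the real collection, and its helper names are hypothetical.

```python
# Plain-Python model of count() with $elemMatch on the question's sample data.
# Only None and str "sig" values are modelled; helper names are hypothetical.

DOCS = [
    {"virustotal": {"results": [
        {"sig": "Gen:Variant.Mikey.29601", "vendor": "MicroWorld-eScan"},
        {"sig": None, "vendor": "nProtect"},
        {"sig": None, "vendor": "CAT-QuickHeal"},
        {"sig": "HEUR/QVM07.1.0000.Malware.Gen", "vendor": "Qihoo-360"},
    ]}},
    {"virustotal": {"results": [
        {"sig": None, "vendor": "Bkav"},
        {"sig": None, "vendor": "ahnlab"},
        {"sig": None, "vendor": "MicroWorld-eScan"},
        {"sig": "Mal/DrodZp-A", "vendor": "Qihoo-360"},
    ]}},
]


def query_ne(value, operand):
    """Query $ne: true for any value not equal to the operand, of any type."""
    return value != operand


def query_gt(value, operand):
    """Query $gt with type bracketing: only same-type values are compared."""
    if operand is None:
        return False  # nothing in the null bracket is greater than null
    return isinstance(value, str) and value > operand


def count_elem_match(docs, vendor, op, operand):
    """Count documents with at least one results element for this vendor
    whose sig satisfies op(sig, operand), like count() with $elemMatch."""
    return sum(
        any(r["vendor"] == vendor and op(r["sig"], operand)
            for r in doc["virustotal"]["results"])
        for doc in docs
    )


for label, op, operand in [
    ('$ne None', query_ne, None),
    ('$ne "null"', query_ne, "null"),
    ('$gt "None"', query_gt, "None"),
    ('$gt "null"', query_gt, "null"),
    ('$gt None', query_gt, None),
]:
    print(label, count_elem_match(DOCS, "MicroWorld-eScan", op, operand))
```

MicroWorld-eScan has one signed result and one null result in the sample, so this prints 1, 2, 0, 0 and 0. The string $ne over-counts and the string $gt filters under-count, which is the same direction as the 568 and the 64/6 seen above.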
