How to optimize a query for MongoDB

Question

I have 300,000 documents in this specific collection. Each document represents one taxi trip and contains a TaxiStation number and a TaxiLicense number.

My goal is to figure out the number of trips per TaxiLicense per TaxiStation.
For example:
TaxiStation A License X had 5 trips.
TaxiStation A License Y had 9 trips. And so on.

How can I optimize my query? It takes upwards of 30 minutes to complete!

List /*of*/ taxistationOfCollection, taxiLicenseOfTaxistation;
        //Here I get all the distinct TaxiStation numbers in the collection
        taxistationOfCollection = coll.distinct("TaxiStation");

        BasicDBObject query, tripquery;
        int tripcount;

        //Now I have to loop through each Taxi Station
        for(int i = 0; i<taxistationOfCollection.size(); i++)
        {
            query = new BasicDBObject("TaxiStation", taxistationOfCollection.get(i));
            //Here, I make a list of each distinct Taxi License in the current Taxi station
            taxiLicenseOfTaxistation = coll.distinct("TaxiLicense", query);

            //Now I make a loop to process each Taxi License within the current Taxi station
            for(int k = 0; k<taxiLicenseOfTaxistation.size();k++)
            {
                tripcount=0;
                if(taxiLicenseOfTaxistation.get(k) !=null)
                {
                    //I'm looking for each Taxi Station with this Taxi License
                    tripquery= new BasicDBObject("TaxiStation", taxistationOfCollection.get(i)).append("TaxiLicense", taxiLicenseOfTaxistation.get(k));
                    DBCursor cursor = coll.find(tripquery);

                    try {
                        while(cursor.hasNext()) {
                            //Increasing my counter everytime I find a match
                            tripcount++;
                            cursor.next();
                        } 
                    } finally {
                        //Finally printing the results
                        System.out.println("Station: " + taxistationOfCollection.get(i) + " License:" + taxiLicenseOfTaxistation.get(k)
                                + " Trips: " + tripcount);
                    }



                }
            }
        }

Sample Document :

{
  "_id" : ObjectId("53df46ed9b2ed78fb7ca4f23"),
  "Version" : "2",
  "Display" : [],
  "Generated" : "2014-08-04,16:40:05",
  "GetOff" : "2014-08-04,16:40:05",
  "GetOffCellInfo" : "46001,43027,11237298",
  "Undisplay" : [],
  "TaxiStation" : "0000",
  "GetOn" : "2014-08-04,16:40:03",
  "GetOnCellInfo" : "46001,43027,11237298",
  "TaxiLicense" : "000000",
  "TUID" : "26921876-3bd5-432e-a014-df0fb26c0e6c",
  "IMSI" : "460018571356892",
  "MCU" : "CM8001MA121225V1",
  "System_ID" : "000",
  "MeterGetOffTime" : "",
  "MeterGetOnTime" : "",
  "Setup" : [],
  "MeterSID" : "",
  "MeterWaitTime" : "",
  "OS" : "4.2",
  "PackageVersion" : "201407300888",
  "PublishVersion" : "201312060943",
  "SWVersion" : "rel_touchbox_20101010",
  "MeterMile" : 0,
  "MeterCharged" : 0,
  "GetOnLongitude" : 0,
  "GetOnLatitude" : 0,
  "GetOffLongitude" : 0,
  "TripLength" : 2,
  "GetOffLatitude" : 0,
  "Clicks" : 0,
  "updateTime" : "2014-08-04 16:40:10"
}

Could you post a sample document? It's hard to puzzle that back together from the query.
– Trudbert, Commented Nov 12, 2014 at 7:32
@Trudbert Sure! I know the way I use find(tripquery) is inefficient, but I'm not really sure how to get around that.
– krikara, Commented Nov 12, 2014 at 7:37

Trudbert · Accepted Answer · 2014-11-12 08:39:39Z

2

Aggregation is probably what you are looking for. With an aggregation operation, all the work runs on the database and can be expressed in a few lines. Performance should also be a lot better: the database does the grouping and counting itself, instead of answering one query per station/licence pair and sending every matching document to the client just to be counted, and it can apply its own optimizations (such as indexes, where applicable).

From what you posted, this boils down to a simple $group operation. In the shell this would look like:

// substitute your collection's name for taxistationOfCollection
db.taxistationOfCollection.aggregate([
    {$group: {
        _id: {station: "$TaxiStation", licence: "$TaxiLicense"},
        count: {$sum: 1}
    }}
])

This will give you documents of the form

{_id : {station: stationid, licence: licence_number}, count: number_of_documents}

For Java it would look like this:

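// imports: java.util.Arrays; com.mongodb.AggregationOutput, BasicDBObject, DBObject
DBObject taxigroup = new BasicDBObject("$group",
                               new BasicDBObject("_id",
                                   new BasicDBObject("station", "$TaxiStation")
                                   .append("licence", "$TaxiLicense"))
                               .append("count", new BasicDBObject("$sum", 1)));
// coll is the DBCollection from the question, not the List of distinct stations
AggregationOutput aggout = coll.aggregate(Arrays.asList(taxigroup));
// each result has the form shown above: {_id: {station, licence}, count}
for (DBObject row : aggout.results()) {
    System.out.println(row);
}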

Please note that the code snippets are not tested.
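
To make concrete what the $group stage computes, here is a minimal plain-Java sketch of the same grouping done in memory. It does not talk to MongoDB at all; the class name TripCountSketch, the method countTrips, and the "station/licence" string key are made up for illustration. The point is only that each trip is visited once and a counter per (station, licence) pair is incremented, which is the work the database does for you in the aggregation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of what the $group stage computes; it does not use the MongoDB driver.
class TripCountSketch {

    // Each trip is a {station, licence} pair; the result maps "station/licence" to its trip count.
    static Map<String, Integer> countTrips(List<String[]> trips) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] trip : trips) {
            String key = trip[0] + "/" + trip[1]; // plays the role of the compound _id
            counts.merge(key, 1, Integer::sum);   // plays the role of {$sum: 1}
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: three stations/licences, four trips
        List<String[]> trips = new ArrayList<>();
        trips.add(new String[] {"A", "X"});
        trips.add(new String[] {"A", "Y"});
        trips.add(new String[] {"A", "X"});
        trips.add(new String[] {"B", "X"});
        for (Map.Entry<String, Integer> e : countTrips(trips).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Compared with the nested loops in the question, this is a single pass over the data with no per-pair lookup, which is why the aggregation version should finish far sooner.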

edited Nov 12, 2014 at 8:39

answered Nov 12, 2014 at 7:45


6 Comments

krikara Over a year ago

For some reason, I'm getting an error on the last part: aggregate(List<DBObject>) is undefined for the type List.

Trudbert Over a year ago

What driver version are you using? This is the relevant doc api.mongodb.org/java/current/com/mongodb/…

Trudbert Over a year ago

Could it be some naming conflict like here: stackoverflow.com/questions/9914873/… ?

Trudbert Over a year ago

The 2.2 in the doc is the database version, not the driver version. Driver v2.12 should support this on database 2.2.

krikara Over a year ago

I have the 2.12 driver from here central.maven.org/maven2/org/mongodb/mongo-java-driver.
