
I have 300,000 documents in this collection. Each document represents one taxi trip and contains a TaxiStation number and a TaxiLicense number.

My goal is to figure out the number of trips per TaxiLicense per TaxiStation.
For example:
TaxiStation A License X had 5 trips.
TaxiStation A License Y had 9 trips. And so on.

How can I optimize my query? It takes upwards of 30 minutes to complete!

List /*of*/ taxistationOfCollection, taxiLicenseOfTaxistation;
//Here I get all the distinct TaxiStation numbers in the collection
taxistationOfCollection = coll.distinct("TaxiStation");

BasicDBObject query, tripquery;
int tripcount;

//Now I have to loop through each Taxi Station
for(int i = 0; i<taxistationOfCollection.size(); i++)
{
    query = new BasicDBObject("TaxiStation", taxistationOfCollection.get(i));
    //Here, I make a list of each distinct Taxi License in the current Taxi station
    taxiLicenseOfTaxistation = coll.distinct("TaxiLicense", query);

    //Now I make a loop to process each Taxi License within the current Taxi station
    for(int k = 0; k<taxiLicenseOfTaxistation.size();k++)
    {
        tripcount=0;
        if(taxiLicenseOfTaxistation.get(k) !=null)
        {
            //I'm looking for each Taxi Station with this Taxi License
            tripquery= new BasicDBObject("TaxiStation", taxistationOfCollection.get(i)).append("TaxiLicense", taxiLicenseOfTaxistation.get(k));
            DBCursor cursor = coll.find(tripquery);

            try {
                while(cursor.hasNext()) {
                    //Increasing my counter every time I find a match
                    tripcount++;
                    cursor.next();
                }
            } finally {
                //Finally printing the results
                System.out.println("Station: " + taxistationOfCollection.get(i) + " License:" + taxiLicenseOfTaxistation.get(k)
                        + " Trips: " + tripcount);
            }
        }
    }
}

Sample document:

{
  "_id" : ObjectId("53df46ed9b2ed78fb7ca4f23"),
  "Version" : "2",
  "Display" : [],
  "Generated" : "2014-08-04,16:40:05",
  "GetOff" : "2014-08-04,16:40:05",
  "GetOffCellInfo" : "46001,43027,11237298",
  "Undisplay" : [],
  "TaxiStation" : "0000",
  "GetOn" : "2014-08-04,16:40:03",
  "GetOnCellInfo" : "46001,43027,11237298",
  "TaxiLicense" : "000000",
  "TUID" : "26921876-3bd5-432e-a014-df0fb26c0e6c",
  "IMSI" : "460018571356892",
  "MCU" : "CM8001MA121225V1",
  "System_ID" : "000",
  "MeterGetOffTime" : "",
  "MeterGetOnTime" : "",
  "Setup" : [],
  "MeterSID" : "",
  "MeterWaitTime" : "",
  "OS" : "4.2",
  "PackageVersion" : "201407300888",
  "PublishVersion" : "201312060943",
  "SWVersion" : "rel_touchbox_20101010",
  "MeterMile" : 0,
  "MeterCharged" : 0,
  "GetOnLongitude" : 0,
  "GetOnLatitude" : 0,
  "GetOffLongitude" : 0,
  "TripLength" : 2,
  "GetOffLatitude" : 0,
  "Clicks" : 0,
  "updateTime" : "2014-08-04 16:40:10"
}
  • Could you post a sample document? It's hard to puzzle that back together from the query. Commented Nov 12, 2014 at 7:32
  • @Trudbert Sure! I know it is inefficient in regard to how I use find(tripquery), but I'm not really sure how to get around that. Commented Nov 12, 2014 at 7:37

1 Answer


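Aggregation is probably what you are looking for. With an aggregation operation the whole computation runs on the database and can be expressed in a few lines. Performance should also be much better: instead of the client issuing one distinct() per station and one find() per station/license pair, and then pulling every matching document over the network just to count it, the database does all the work itself and can take advantage of indexes where they apply.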
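To put a rough number on that, the sketch below tallies how many server round trips the nested-loop code from the question issues for a given set of trips. It is only an illustration: the `RoundTripTally` class and the `{TaxiStation, TaxiLicense}` string pairs standing in for documents are hypothetical, and it does not count the cursor batches needed to stream the matching documents back.

```java
import java.util.*;

// Hypothetical illustration: counts the queries the question's nested loops would send.
// Each trip is a {TaxiStation, TaxiLicense} pair standing in for a full document.
class RoundTripTally {
    static int nestedLoopQueries(List<String[]> trips) {
        // Distinct licenses per station, mirroring coll.distinct("TaxiLicense", query)
        Map<String, Set<String>> licensesByStation = new LinkedHashMap<>();
        for (String[] t : trips) {
            Set<String> licenses = licensesByStation.computeIfAbsent(t[0], s -> new HashSet<>());
            if (t[1] != null) {              // the question's loop skips null licenses
                licenses.add(t[1]);
            }
        }
        int queries = 1;                     // coll.distinct("TaxiStation")
        for (Set<String> licenses : licensesByStation.values()) {
            queries += 1;                    // coll.distinct("TaxiLicense", query), once per station
            queries += licenses.size();      // coll.find(tripquery), once per station/license pair
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String[]> trips = Arrays.asList(
                new String[]{"A", "X"}, new String[]{"A", "Y"},
                new String[]{"A", "X"}, new String[]{"B", "X"});
        System.out.println("nested loops: " + nestedLoopQueries(trips) + " queries");
        System.out.println("aggregate(): 1 command");
    }
}
```

For these four sample trips it reports 6 queries (1 + 2 distinct() calls and 3 find() calls). The count grows with the number of distinct station/license pairs, whereas the aggregation is a single command.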

From what you posted, this boils down to a simple $group operation. In the shell it would look like this (replace taxistationOfCollection with the name of your collection):

db.taxistationOfCollection.aggregate([
    {$group:
        {_id: {station: "$TaxiStation",
               licence: "$TaxiLicense"},
         count: {$sum: 1}
        }
    }
])

This will give you documents of the form

{_id : {station: stationid, licence: licence_number}, count: number_of_documents}
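If it helps to see what the $group stage computes, here is a plain-Java, in-memory sketch of the same grouping using `Collectors.groupingBy`. It does not talk to MongoDB at all: the `GroupSketch` class and the sample trips are hypothetical, and each trip is reduced to its `{TaxiStation, TaxiLicense}` pair. The composite key plays the role of `_id`, and `Collectors.counting()` plays the role of `{$sum: 1}`.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical in-memory analogue of the $group stage; not the MongoDB API.
class GroupSketch {
    // Each trip is a {TaxiStation, TaxiLicense} pair; real documents carry many more fields.
    static Map<List<String>, Long> tripsPerStationAndLicense(List<String[]> trips) {
        return trips.stream().collect(Collectors.groupingBy(
                t -> Arrays.asList(t[0], t[1]),   // like _id: {station: "$TaxiStation", licence: "$TaxiLicense"}
                LinkedHashMap::new,               // keeps first-seen order, for readable output only
                Collectors.counting()));          // like count: {$sum: 1}
    }

    public static void main(String[] args) {
        List<String[]> trips = Arrays.asList(
                new String[]{"A", "X"}, new String[]{"A", "Y"},
                new String[]{"A", "X"}, new String[]{"B", "X"});
        tripsPerStationAndLicense(trips).forEach((key, count) ->
                System.out.println("Station: " + key.get(0) + " License: " + key.get(1) + " Trips: " + count));
    }
}
```

With the sample data this yields one entry per station/license pair: A/X with 2 trips, A/Y and B/X with 1 each. MongoDB does not guarantee any particular order for $group output; the LinkedHashMap here only keeps first-seen order for readability.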

For Java it would look like this:

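DBObject taxigroup = new BasicDBObject("$group",
        new BasicDBObject("_id",
                new BasicDBObject("station", "$TaxiStation")
                        .append("licence", "$TaxiLicense"))
                .append("count", new BasicDBObject("$sum", 1)));
// aggregate() must be called on the DBCollection (coll in the question),
// not on taxistationOfCollection, which is a List of station numbers
AggregationOutput aggout = coll.aggregate(Arrays.asList(taxigroup));
for (DBObject result : aggout.results()) {
    System.out.println(result);
}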

Please note that the code snippets are not tested.


Comments

For some reason, I'm getting an error on the last part: aggregate(List<DBObject>) is undefined for type List
What driver version are you using? This is the relevant doc api.mongodb.org/java/current/com/mongodb/…
Could it be some naming conflict like here: stackoverflow.com/questions/9914873/… ?
The 2.2 in the doc is the database version, not the driver version. Driver v2.12 should support this on database 2.2.
