
I am cleaning a dataset and have a gender field. In this field, there are entries such as Male, male, and MALE. To resolve this, I am trying to update my MongoDB database using pymongo.

In the database, the attribute is named Gender (with a capital G).

My code currently looks like this:

import pymongo
from pymongo import MongoClient


db_info = {
    'db_name': 'MentalHealth',
    'collection_name': 'MentalHealth',
}

if __name__ == "__main__":

    mongo_client = MongoClient()
    mongo_db = mongo_client[db_info['db_name']]
    mongo_collection = mongo_db[db_info['collection_name']]

    #normalize to lowercase
    mongo_collection.aggregate([{ '$project': { 'Gender':{ '$toLower':"$Gender"}}}])

The code runs without error, but the database is not updated, and I am unsure what is wrong with the code. Any help would be greatly appreciated. Thank you!

  • You are doing an aggregate, which returns all Gender fields cast to lower case but does not write anything back. To update records, use update. Commented Dec 30, 2017 at 4:54
  • You are almost there. You have many options. See my answer on the possible duplicate Commented Dec 30, 2017 at 21:39
  • @sstyvane this is the wrong duplicate. The OP is not updating the Gender field using the value of another field, but using the same field. Commented Dec 31, 2017 at 5:03
  • Another field or the same field, the process is the same; that is why we say "possible duplicate". I would have answered if that were not the case, because none of the answers here is useful except the one that mentions the $out pipeline stage operator. @GarbageCollector Commented Dec 31, 2017 at 14:29
  • You are missing the point here. The answers are not only for the OP. I once raised the issue of answers that teach bad practice on meta (see this comment), but I guess you don't want to see what I am pointing out, and that is your choice. @GarbageCollector Commented Dec 31, 2017 at 14:48

3 Answers


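MongoDB aggregation operations process data records and return computed results; a pipeline like the one in the question does not modify the stored documents. You can update them like this in the mongo shell:

// mongo shell; the collection from the question is MentalHealth, so db.mongo_collection would point at the wrong collection
db.MentalHealth.find({ Gender: { $type: "string" } }).forEach(function(doc) {
    db.MentalHealth.updateOne(
        { "_id": doc._id },
        { "$set": { "Gender": doc.Gender.toLowerCase() } }
    );
});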

2 Comments

This question is tagged pymongo, but your solution uses JavaScript.
Thank you for your help. As Garbage Collector said, this isn't Python, it's JavaScript, but I appreciate your assistance!

You are using an aggregation query, which returns the documents with the Gender field cast to lower case but does not write them back. If you want to change the stored value of a field, you have to use an update query.

Since you are using pymongo, your code could look like this:

from pymongo import MongoClient

db_info = {
    'db_name': 'MentalHealth',
    'collection_name': 'MentalHealth'
}

if __name__ == "__main__":

    mongo_client = MongoClient()
    mongo_db = mongo_client[db_info['db_name']]
    mongo_collection = mongo_db[db_info['collection_name']]

    # Fetch only _id and Gender; the with-block closes the no-timeout cursor when done
    with mongo_collection.find({}, {"Gender": 1}, no_cursor_timeout=True) as cursor:
        for doc in cursor:
            g = doc.get('Gender')
            # Skip missing or non-string values and values that are already lower case
            if isinstance(g, str) and g != g.lower():
                # update() was removed in PyMongo 4; update_one() works in 3.x and 4.x
                mongo_collection.update_one({"_id": doc["_id"]}, {"$set": {"Gender": g.lower()}})
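The loop above sends one write per document. A minimal sketch of a batched variant, assuming PyMongo 3.x or later: the hypothetical helper `plan_lowercase_updates` works out the `(filter, update)` pairs in plain Python, and the commented lines show how they could be sent in a single `bulk_write` call using `UpdateOne`. Only documents whose value actually changes get a write.

```python
def plan_lowercase_updates(docs, field="Gender"):
    """Hypothetical helper: return (filter, update) pairs that lower-case `field`.

    Documents where the field is missing, not a string, or already
    lower case are skipped, so no write is planned for them.
    """
    ops = []
    for doc in docs:
        value = doc.get(field)
        if isinstance(value, str) and value != value.lower():
            ops.append(({"_id": doc["_id"]}, {"$set": {field: value.lower()}}))
    return ops


# Usage against a live collection (needs a running MongoDB server):
#
#   from pymongo import MongoClient, UpdateOne
#   coll = MongoClient()["MentalHealth"]["MentalHealth"]
#   docs = coll.find({}, {"Gender": 1})
#   requests = [UpdateOne(f, u) for f, u in plan_lowercase_updates(docs)]
#   if requests:  # bulk_write raises on an empty request list
#       coll.bulk_write(requests)

if __name__ == "__main__":
    sample = [
        {"_id": 1, "Gender": "Male"},
        {"_id": 2, "Gender": "male"},
        {"_id": 3, "Gender": "MALE"},
        {"_id": 4},
    ]
    for flt, upd in plan_lowercase_updates(sample):
        print(flt, upd)
```

For a very large collection, the request list could be flushed in chunks rather than built all at once.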

1 Comment

My comment on the answer above applies to your answer as well

The aggregation framework you're using only reads data: a pipeline returns computed results and leaves the stored documents untouched unless it ends in a write stage. To actually perform writes, you need a $out stage that dumps the results into a collection. If you name an existing collection, that collection is replaced atomically, as described in https://docs.mongodb.com/manual/reference/operator/aggregation/out/#pipe._S_out
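A minimal sketch of the $out approach in pymongo, under two assumptions worth stating. First, the pipeline uses $addFields rather than the question's $project, because $out writes only the fields the pipeline emits, and $project would drop every field except Gender and _id. Second, non-string Gender values should pass through unchanged, since a bare $toLower turns a null or missing value into an empty string. `build_lowercase_out_pipeline` is a hypothetical helper name; it only builds the pipeline.

```python
def build_lowercase_out_pipeline(field, target_collection):
    """Hypothetical helper: lower-case `field` and write every document to `target_collection`."""
    ref = "$" + field
    return [
        # $addFields keeps all existing fields; $project would keep only `field` and _id
        {"$addFields": {
            field: {
                "$cond": [
                    {"$eq": [{"$type": ref}, "string"]},  # only touch real strings
                    {"$toLower": ref},
                    ref,  # anything else falls through with its original value
                ]
            }
        }},
        # $out must be the last stage; an existing target collection is replaced
        {"$out": target_collection},
    ]


# Usage against a live server (assumes MongoDB 3.4+ for $addFields and $type):
#
#   from pymongo import MongoClient
#   coll = MongoClient()["MentalHealth"]["MentalHealth"]
#   coll.aggregate(build_lowercase_out_pipeline("Gender", "MentalHealth"))
```

Writing to a differently named collection first and inspecting it before swapping is a safer way to try this on real data.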

Another option is to use an update operation to update just the documents with incorrect case.
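One way to do that in a single command, assuming MongoDB 4.2+ (which accepts an aggregation pipeline as the update document) and PyMongo 3.9+: match documents whose Gender is a string that differs from its lower-case form, then $set it with $toLower. `build_lowercase_update` is a hypothetical helper that only builds the filter and update; the commented lines show the `update_many` call.

```python
def build_lowercase_update(field):
    """Hypothetical helper: (filter, update) that lower-cases `field` server-side."""
    ref = "$" + field
    flt = {
        "$expr": {
            # $and stops at the first false operand, so $toLower only sees strings
            "$and": [
                {"$eq": [{"$type": ref}, "string"]},
                {"$ne": [ref, {"$toLower": ref}]},
            ]
        }
    }
    # A list (pipeline) as the update document requires MongoDB 4.2+
    update = [{"$set": {field: {"$toLower": ref}}}]
    return flt, update


# Usage against a live server:
#
#   from pymongo import MongoClient
#   coll = MongoClient()["MentalHealth"]["MentalHealth"]
#   flt, update = build_lowercase_update("Gender")
#   result = coll.update_many(flt, update)
#   print(result.matched_count, result.modified_count)
```

Because the filter skips values that are already lower case, re-running the command is harmless and modifies nothing the second time.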
