This is a continuation of this question.
I'm using the following code to find all documents from collection C_a whose text contains the word StackOverflow and store them in another collection called C_b:
from pymongo import MongoClient
client = MongoClient('127.0.0.1')  # MongoDB running locally
dbRead = client['C_a']  # the database is named C_a, like the source collection
# Build the pipeline: match on the text index, then write the matches to C_b.
# All field names and operators must be quoted strings in PyMongo.
pipeline = [{"$match": {"$text": {"$search": "StackOverflow"}}}, {"$out": "C_b"}]
dbRead.C_a.aggregate(pipeline)  # execute the pipeline
# Verify the size of the new collection (Collection.count() was removed in PyMongo 4)
print(dbRead.C_b.count_documents({}))
This works well. However, if I run the same snippet for multiple keywords, the results get overwritten. For example, I want the collection C_b to hold all documents that contain the keywords StackOverflow, StackExchange, or Programming. To do so, I simply run the snippet once per keyword, but each iteration overwrites the output of the previous one.
Question: How do I update the output collection instead of overwriting it?
Plus: Is there a clever way to avoid duplicates, or do I have to check for duplicates afterwards?
$out. B. Iterate results on a returned cursor and write updates back. Of course, B means transferring results and updates back "over the wire", which seems to be exactly what you are trying to avoid. You should have paid attention to the very clear lesson.
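For what it's worth, option B could look roughly like the sketch below. It drops the $out stage, iterates the cursor returned by aggregate, and upserts each document into the target collection keyed on _id, so a document matched by several keywords is stored only once. The helper names copy_matches and main are made up for illustration, and main assumes PyMongo is installed, a mongod is listening on 127.0.0.1, and C_a already has a text index.

```python
def copy_matches(source, target, keyword):
    """Upsert every document in `source` that matches `keyword` into `target`.

    `source` and `target` are expected to behave like PyMongo collections.
    Returns the number of documents written (including replacements).
    """
    # Same $match as in the question, but without the $out stage,
    # so the matches come back to the client on a cursor.
    pipeline = [{"$match": {"$text": {"$search": keyword}}}]
    written = 0
    for doc in source.aggregate(pipeline):
        # Keyed on _id: a document already copied for an earlier keyword
        # is replaced in place instead of being inserted a second time.
        target.replace_one({"_id": doc["_id"]}, doc, upsert=True)
        written += 1
    return written


def main():
    # Assumption: PyMongo is installed and a local mongod is running.
    from pymongo import MongoClient

    db = MongoClient("127.0.0.1")["C_a"]
    for keyword in ["StackOverflow", "StackExchange", "Programming"]:
        copy_matches(db.C_a, db.C_b, keyword)
    print(db.C_b.count_documents({}))
```

Calling main() runs the three searches against the local server. This does move every matched document over the wire twice, which is the cost described above. If the server is MongoDB 4.2 or later, the $merge stage may be worth a look instead, since it can write into an existing collection without replacing it.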