
I am trying to write a search script in MongoDB but can't figure out how to do it. What I want to do is as follows.

Let's say I have a string array XD = ["the", "new", "world"].

Now I want to search for the strings of XD in MongoDB documents (using regex) and get the matching documents. For example:

{ _id: 1, _content: "there was a boy" }
{ _id: 2, _content: "there was a boy in a new world" }
{ _id: 3, _content: "a boy" }
{ _id: 4, _content: "there was a boy in world" }

Now I want the results ranked by how many of the strings in XD each _content contains:

{ _id: 2, _content: "there was a boy in a new world", times: 3 }
{ _id: 4, _content: "there was a boy in world", times: 2 }
{ _id: 1, _content: "there was a boy", times: 1 }

The first document (_id: 2) contains all three ("the" in "there", "new", and "world"), so it gets 3.

The second document (_id: 4) contains only two ("the" in "there", and "world"), so it gets 2.

  • Have you looked at text searching? Typically words like "the" are ignored (and arguably should be) but "new" and "world" would match and rank just like you are expecting already. Commented Feb 9, 2016 at 5:43
  • It's just an example; the words can be anything. I am trying to use regex here (for example, if I search for "exam", documents containing "example" or "exammed" should be returned). I am looking for an aggregate function that could return these kinds of documents. Commented Feb 9, 2016 at 5:51
  • Just suggesting that maybe elasticsearch would be more suitable for your needs. Related question Commented Feb 9, 2016 at 6:34
  • Thanks, but I also want to know how many of the strings in the array each document contains, so that I can rank them. Commented Feb 9, 2016 at 6:41

1 Answer


Here is what you can do.

Create a regex to match against _content:

XD = ["the","new","world"];
regex = new RegExp(XD.join("|"), "g");
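Since the search words "can be anything", it may be worth escaping regex metacharacters before joining them, otherwise a term such as "c++" would produce an invalid or unintended pattern. A minimal sketch in plain JavaScript; the helper names escapeRegExp and buildSearchRegex are made up for illustration and are not part of MongoDB:

```javascript
// Escape regex metacharacters so each search term is matched literally.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one global alternation regex from a list of terms.
function buildSearchRegex(terms) {
  return new RegExp(terms.map(escapeRegExp).join("|"), "g");
}

const XD = ["the", "new", "world"];
const searchRegex = buildSearchRegex(XD);

console.log(searchRegex.source);                      // the|new|world
console.log(buildSearchRegex(["c++", "a.b"]).source); // c\+\+|a\.b
```

For plain words like the ones in the question the result is the same as XD.join("|"); the escaping only matters once terms contain characters such as ".", "+" or "(".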

Store a JS function on the server that matches _content against the regex and returns the number of matches:

db.system.js.save(
   {
     _id: "findMatchCount",
     // Returns how many times `regex` (a global RegExp) matches in `str`.
     value : function(str, regex) {
        var matches = str.match(regex);
        return (matches !== null) ? matches.length : 0;
     }
   }
)
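Before saving the function on the server, the counting logic can be checked locally. This is only a sketch that runs the same match-and-count step in plain JavaScript against the four sample documents from the question, without touching MongoDB:

```javascript
// Same counting logic as the stored function, as a plain local function.
function findMatchCount(str, regex) {
  const matches = str.match(regex);
  return matches !== null ? matches.length : 0;
}

const regex = /the|new|world/g;

// Sample documents copied from the question.
const docs = [
  { _id: 1, _content: "there was a boy" },
  { _id: 2, _content: "there was a boy in a new world" },
  { _id: 3, _content: "a boy" },
  { _id: 4, _content: "there was a boy in world" }
];

const counts = docs.map(function (doc) {
  return { _id: doc._id, times: findMatchCount(doc._content, regex) };
});

console.log(counts); // times for _id 1..4: 1, 3, 0, 2
```

Note that this counts every occurrence, not distinct words: a document containing "world" twice contributes 2 for that word.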

Use the function with mapReduce

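db.test.mapReduce(
    // map takes no arguments; values passed in `scope` are visible here as globals
    function() {
       emit(this._id, findMatchCount(this._content, new RegExp(pattern, "g")));
    },
    // each _id is emitted once, so reduce is not invoked; it still returns a single value
    function(key, values) {
        return Array.sum(values);
    },
    {
        out: { inline: 1 },
        // pass the pattern as a string and rebuild the global regex inside map
        scope: { pattern: regex.source }
    }
);

This will produce the output as below:

{
    "results" : [
        {
            "_id" : 1,
            "value" : 1
        },
        {
            "_id" : 2,
            "value" : 3
        },
        {
            "_id" : 3,
            "value" : 0
        },
        {
            "_id" : 4,
            "value" : 2
        }
    ],
    "timeMillis" : 1,
    "counts" : {
        "input" : 4,
        "emit" : 4,
        "reduce" : 0,
        "output" : 4
    },
    "ok" : 1
}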
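The inline results come back ordered by _id, not by count, so the ranking the question asks for still has to be done on the client. A rough sketch in plain JavaScript; the results array below is illustrative data holding the counts that /the|new|world/g yields for the four sample documents, and rankResults is a made-up helper name:

```javascript
// Illustrative inline results: one { _id, value } entry per document.
const results = [
  { _id: 1, value: 1 },
  { _id: 2, value: 3 },
  { _id: 3, value: 0 },
  { _id: 4, value: 2 }
];

// Drop documents with no match, then sort by match count, highest first.
function rankResults(entries) {
  return entries
    .filter(function (r) { return r.value > 0; })
    .sort(function (a, b) { return b.value - a.value; });
}

const ranked = rankResults(results);
console.log(ranked.map(function (r) { return r._id; })); // [ 2, 4, 1 ]
```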

I am not sure how efficient this solution is but it works.
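As an alternative to mapReduce (a different technique from the one above), MongoDB 4.2 and later provide the $regexFindAll aggregation operator, which can do the counting, filtering and sorting on the server. A sketch under the assumption that the server is 4.2+ and that the terms contain no regex metacharacters; buildRankingPipeline is a made-up helper name:

```javascript
// Build an aggregation pipeline that counts regex matches per document,
// drops documents without a match, and sorts by count, highest first.
function buildRankingPipeline(terms) {
  const pattern = terms.join("|"); // assumes terms need no escaping
  return [
    { $addFields: {
        times: { $size: { $regexFindAll: { input: "$_content", regex: pattern } } }
    } },
    { $match: { times: { $gt: 0 } } },
    { $sort: { times: -1 } }
  ];
}

const pipeline = buildRankingPipeline(["the", "new", "world"]);
console.log(JSON.stringify(pipeline, null, 2));

// In the mongo shell, with the collection named `test` as above:
// db.test.aggregate(pipeline)
```

Like the mapReduce version, this evaluates the regex against every document, so it is still a full collection scan.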

Hope this helps.
