I have a string:

str = "John: hey, what's your name?.. :haha \n Stella: :foo :xx: my name is ... stella :xx:"

I want to replace every smiley in the list ary = [":haha", ":xx:", ":foo", ":bar"] and every special character (except spaces) with (.*), so that the string becomes:

John(.*) hey(.*) what(.*)s your name(.*) Stella(.*) my name is (.*) stella (.*)

I tried this:

str.gsub(Regexp.new("^#{ary.join('|')}$")) { |w| "(.*)" }.gsub( /[\W ]+/, "(.*)")
# => "John(.*)hey(.*)what(.*)s(.*)your(.*)name(.*)haha(.*)Stella(.*)my(.*)name(.*)is(.*)stella(.*)"

Problem:

  • Spaces are still being replaced.
  • The space is being replaced because you included it in the pattern: [\W ]. Change it to [^\w\s]+ Commented Apr 18, 2015 at 21:55
  • @CarySwoveland please note that my question includes the text too, not only an image. Commented Apr 19, 2015 at 4:21
  • My apologies. I missed that. Commented Apr 19, 2015 at 4:30
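
To make the difference between the character classes discussed in these comments concrete, here is a small sketch. The sample string is made up for illustration; only the three patterns come from the question and the answers.

```ruby
sample = "a, b\nc"  # hypothetical input, not taken from the question

# \W already matches a space, so the space after the comma is swallowed too.
with_space = sample.gsub(/[\W ]+/, "(.*)")

# [^\w\s] leaves all whitespace alone, including the newline.
keep_whitespace = sample.gsub(/[^\w\s]+/, "(.*)")

# [^\w ] spares only the space character; the newline is still replaced.
keep_space_only = sample.gsub(/[^\w ]+/, "(.*)")

p with_space       # "a(.*)b(.*)c"
p keep_whitespace  # "a(.*) b\nc"
p keep_space_only  # "a(.*) b(.*)c"
```

Since the expected output in the question drops the \n, [^\w ] may be closer to what is wanted than [^\w\s], but that is an assumption about the asker's intent.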

2 Answers

I tried to find a more generic approach, but ended up with three steps. Since it seems impossible to avoid producing multiple consecutive (.*) while substituting, a third gsub collapses them as a post-processing step:

str = "John: hey, what's your name?.. :haha \n Stella: :foo :xx: my name is ... stella :xx:"
ary = [":haha", ":xx:", ":foo", ":bar"]
print str
  .gsub(Regexp.new(ary.join('|')), "(.*)")    # 1. replace the known smilies
  .gsub(/(?>\(\.\*\)|[^\w ]+)/, "(.*)")       # 2. replace runs of special characters
  .gsub(/\(\.\*\)(?>\s*\(\.\*\))*/, "(.*)")   # 3. collapse consecutive (.*)

Output of a sample program:

John(.*) hey(.*) what(.*)s your name(.*) Stella(.*) my name is (.*) stella (.*)
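
As an alternative, the same result for this input can apparently be reached in one pass by matching a whole run of "tokens" (a smiley or a single special character) separated by optional whitespace. This is only a sketch, not the method used above: the helper name replace_with_wildcards is made up, Regexp.union is used to escape the smilies, and it has only been checked against the sample string from the question.

```ruby
# Hypothetical helper, not part of the original answer.
def replace_with_wildcards(text, smilies)
  # Longest smilies first, so a shorter one cannot shadow a longer one.
  # Regexp.union escapes any regex metacharacters in the strings.
  smiley = Regexp.union(smilies.sort_by { |s| -s.length })
  # A token is a known smiley or any single character that is
  # neither a word character nor a space (this includes "\n").
  token = /#{smiley}|[^\w ]/
  # A run of tokens, possibly separated by whitespace, becomes one (.*).
  text.gsub(/#{token}(?:\s*#{token})*/, "(.*)")
end

str = "John: hey, what's your name?.. :haha \n Stella: :foo :xx: my name is ... stella :xx:"
ary = [":haha", ":xx:", ":foo", ":bar"]
result = replace_with_wildcards(str, ary)
puts result
# John(.*) hey(.*) what(.*)s your name(.*) Stella(.*) my name is (.*) stella (.*)
```

Listing the smiley alternative before the catch-all character class matters here: otherwise the leading colon of a smiley would be consumed as a special character and the letters after it would be left behind.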

2 Comments

@stribizhev: Also use Regexp.escape, in case a word in the array contains a special character: str.gsub(Regexp.new(ary.map { |x| Regexp.escape(x) }.join('|'))) { |w| "(.*)" }.gsub( /(?>\(\.\*\)|[^\w ]+)/, "(.*)").gsub(/\(\.\*\)(?>\s*\(\.\*\))*/,"(.*)")
@anonymousxxx: a very good hint, since we do not know what the actual smilie list looks like. I assumed they all look like a colon plus letters.
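
A short sketch of why the escaping matters. The smiley "^_^" and the sample text are hypothetical, chosen because ^ is a regex metacharacter; they do not appear in the question.

```ruby
text = "hi ^_^"      # hypothetical input
smilies = ["^_^"]    # hypothetical smiley made of regex metacharacters

# Unescaped, ^ is read as a start-of-line anchor, so nothing matches.
unescaped = Regexp.new(smilies.join('|'))
# Escaped, every character of the smiley is matched literally.
escaped = Regexp.new(smilies.map { |x| Regexp.escape(x) }.join('|'))

plain = text.gsub(unescaped, "(.*)")
safe = text.gsub(escaped, "(.*)")
# Regexp.union escapes string arguments and joins them with | in one call.
via_union = text.gsub(Regexp.union(smilies), "(.*)")

p plain      # "hi ^_^"
p safe       # "hi (.*)"
p via_union  # "hi (.*)"
```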

You can do it like so:

s = "John: hey, what's your name?.. :haha \n Stella: :foo :xx: my name is ... stella :xx:"

r = /\?\.\. :haha \n|: :foo :xx:|\.\.\.|:xx:|[^\w ]/

s.gsub(r,'(.*)')
  #=> "John(.*) hey(.*) what(.*)s your name(.*) Stella(.*) my name is (.*) stella (.*)" 

The only tricky bit is the order of the alternatives in the regex. In particular, a lone : must not be replaced before the three longer alternatives that contain a colon have been tried.
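
The effect of the ordering can be seen on a smaller, made-up string. Ruby tries the alternatives from left to right at each position, so a catch-all placed first wins over a longer smiley:

```ruby
s = "a :xx: b"  # hypothetical input, not from the question

# Catch-all first: each colon is replaced on its own and "xx" is left behind.
catch_all_first = s.gsub(/[^\w ]|:xx:/, "(.*)")

# Smiley first: the whole smiley is replaced as one unit.
smiley_first = s.gsub(/:xx:|[^\w ]/, "(.*)")

p catch_all_first  # "a (.*)xx(.*) b"
p smiley_first     # "a (.*) b"
```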

2 Comments

And now add .. to the input string, what will you get? Correct, it won't get converted to (.*). You also do not take into account the list of known smilies: they might appear in different contexts.
@stribizhev, you're right. I confess that I didn't read the entire question (I guess I was mesmerized by the pretty pictures). I made some changes to address the points you made. I assume that if .. were added it should be replaced by (.*)(.*), but if it (or any number of consecutive periods) should be replaced by just (.*), that's an easy change.
