Trouble matching multiple patterns in Ruby (regex)

Question

What is the difference between the following regexes: HEAD|POST, (HEAD|POST) & [HEAD|POST]?

Basically, I want to extract the number after either HEAD or POST.

irb(main):001:0> "This is HEAD and a POST".match("HEAD|POST")
=> #<MatchData "HEAD">
irb(main):002:0> "This is HEAD and a POST".match("(HEAD|POST)")
=> #<MatchData "HEAD" 1:"HEAD">
irb(main):003:0> "This is HEAD and a POST".match("[HEAD|POST]")
=> #<MatchData "T">
irb(main):004:0> "This is HEAD 1 and a POST 2".match("[HEAD|POST] (.)")
=> #<MatchData "D 1" 1:"1">
irb(main):005:0>

The last regex didn't match the "2" that is after "POST". Why? Also, why is "D 1" being matched?

Chowlett · Accepted Answer · 2012-07-11 13:19:10Z

4

HEAD|POST and (HEAD|POST) match the same strings (either HEAD or POST); the second one captures the string while the first doesn't.
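To make the capture difference concrete, here is a minimal sketch that runs both patterns against the string from the question and inspects the captures. The variable names are only illustrative.

```ruby
str = "This is HEAD and a POST"

plain   = str.match(/HEAD|POST/)    # alternation, no group
grouped = str.match(/(HEAD|POST)/)  # same alternation inside a capture group

plain[0]          # => "HEAD" (the whole match)
plain.captures    # => []     (nothing captured)
grouped[0]        # => "HEAD"
grouped.captures  # => ["HEAD"]
```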

[HEAD|POST] matches a single character, any of ADEHOPST or |. So "This is HEAD and a POST".match("[HEAD|POST]") matches the single character T in This.
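As a rough illustration, the class can be rewritten as a plain set of characters without changing what it matches. The name `same` below is just a hypothetical stand-in for that rewritten class.

```ruby
str = "This is HEAD and a POST"

klass = /[HEAD|POST]/
same  = /[ADEHOPST|]/   # the same set of characters, without the "words"

klass.match(str)[0]        # => "T"
same.match(str)[0]         # => "T"
klass.match(str).begin(0)  # => 0, the very first character of the string
klass.match?("|")          # => true, the pipe is a literal inside [...]
```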

On the other hand, "This is HEAD 1 and a POST 2".match("[HEAD|POST] (.)") can't match the leading T because it isn't followed by a space - instead it matches the single D at the end of HEAD, plus the space and 1 following, capturing the 1.
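The position of that match can be checked directly on the MatchData object. This is a small sketch using the string from the question:

```ruby
str = "This is HEAD 1 and a POST 2"
m = str.match(/[HEAD|POST] (.)/)

m[0]         # => "D 1"
m[1]         # => "1"
m.begin(0)   # => 11, the index of the "D" in "HEAD"
m.pre_match  # => "This is HEA"
```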

answered Jul 11, 2012 at 13:19


4 Comments

yetanotherstacker Over a year ago

So, matching stops after the first pattern is found, right? Also, you said that T isn't followed by a space, but isn't there a space in T 2?

Chowlett Over a year ago

That's right. A regex looks over the string from left to right, stopping when it's proved there's a match. And yes, there is a space in T 2, but the regex never gets there because it matches D 1 (from HEAD 1) first. If you want all matches, then seph's scan is good.
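One way to see that the later match is still there is to restart the search after the first one. The sketch below uses the optional start-position argument of String#match; the names `first` and `second` are illustrative.

```ruby
str = "This is HEAD 1 and a POST 2"
re  = /[HEAD|POST] (.)/

first  = str.match(re)                # leftmost match only
second = str.match(re, first.end(0))  # resume searching after the first match

first[0]   # => "D 1"
second[0]  # => "T 2"
```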

yetanotherstacker Over a year ago

Ok. One more thing: instead of doing [HEAD|POST] (.), I could have also done [D|T] (.) and it would give me the same thing. How do I match the full words rather than just matching single characters in a character class?

Chowlett Over a year ago

@yetanotherstacker - Replace the square brackets ("match any character") with round brackets ("group"). /(HEAD|POST) (.)/ will match "HEAD 1" (and stop), capturing "HEAD" into group 1 and "1" into group 2. If you don't want to capture, you can use the ?: extension: (?:HEAD|POST) (.) will match and just capture "1" into group 1.
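A short sketch of the two variants side by side, using the string from the question (variable names are illustrative):

```ruby
str = "This is HEAD 1 and a POST 2"

capturing     = str.match(/(HEAD|POST) (.)/)
non_capturing = str.match(/(?:HEAD|POST) (.)/)

capturing[0]            # => "HEAD 1"
capturing.captures      # => ["HEAD", "1"]
non_capturing[0]        # => "HEAD 1"
non_capturing.captures  # => ["1"]
```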

seph · Accepted Answer · 2012-07-11 13:16:14Z

1

Try scan:

"This is HEAD 1 and a POST 2".scan /(HEAD|POST)\s(\d)/

=> [["HEAD", "1"], ["POST", "2"]]

answered Jul 11, 2012 at 13:16


2 Comments

yetanotherstacker Over a year ago

Thanks for the answer, but you didn't explain why I was getting "D 1", or the differences between the patterns. That would greatly help me understand them.

seph Over a year ago

Chowlett did a great job of explaining what was going on with your code. This is basic regex involving captures and character classes. The beauty of scan is that a simple regex can sweep up a bunch of data and put it in an array: "if there is either a HEAD or POST followed by a whitespace character followed by a digit then give me HEAD or POST and the digit"
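As a hedged sketch of how the scan result might be used, the pairs can be turned into a Hash, and a non-capturing group can be used when only the digits are wanted. The names `pairs`, `digits` and `by_verb` are hypothetical.

```ruby
str = "This is HEAD 1 and a POST 2"

pairs   = str.scan(/(HEAD|POST)\s(\d)/)    # two groups: array of [verb, digit] pairs
digits  = str.scan(/(?:HEAD|POST)\s(\d)/)  # one group: array of one-element arrays
by_verb = pairs.to_h                       # pairs convert directly to a Hash

pairs            # => [["HEAD", "1"], ["POST", "2"]]
digits.flatten   # => ["1", "2"]
by_verb["POST"]  # => "2"
```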
