0

What is the difference between the following regexes: HEAD|POST, (HEAD|POST) & [HEAD|POST]?

Basically, I want to extract the number after either HEAD or POST.

irb(main):001:0> "This is HEAD and a POST".match("HEAD|POST")
=> #<MatchData "HEAD">
irb(main):002:0> "This is HEAD and a POST".match("(HEAD|POST)")
=> #<MatchData "HEAD" 1:"HEAD">
irb(main):003:0> "This is HEAD and a POST".match("[HEAD|POST]")
=> #<MatchData "T">
irb(main):004:0> "This is HEAD 1 and a POST 2".match("[HEAD|POST] (.)")
=> #<MatchData "D 1" 1:"1">
irb(main):005:0>

The last regex didn't match the "2" that is after "POST". Why? Also, why is "D 1" being matched?

2 Answers

4

HEAD|POST and (HEAD|POST) match the same strings (either HEAD or POST); the second one captures the string while the first doesn't.

[HEAD|POST] matches a single character, any of ADEHOPST or |. So "This is HEAD and a POST".match("[HEAD|POST]") matches the single character T in This.

On the other hand, "This is HEAD 1 and a POST 2".match("[HEAD|POST] (.)") can't match the leading T because it isn't followed by a space - instead it matches the single D at the end of HEAD, plus the space and 1 following, capturing the 1.
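To make the difference concrete, here is a minimal sketch that runs the patterns from the question side by side. It uses regex literals instead of strings, which should behave the same way here; the variable names are only for illustration.

```ruby
s = "This is HEAD 1 and a POST 2"

# Alternation: matches the whole word HEAD or POST, with no capture group.
plain = s.match(/HEAD|POST/)

# Grouped alternation: same match, and the word is also captured in group 1.
grouped = s.match(/(HEAD|POST)/)

# Character class: matches ONE character out of H, E, A, D, |, P, O, S, T.
char_class = s.match(/[HEAD|POST]/)

# Character class, then a space, then any single character (captured).
class_then_char = s.match(/[HEAD|POST] (.)/)

puts plain[0]            # HEAD
puts grouped[1]          # HEAD
puts char_class[0]       # T   (the T in "This")
puts class_then_char[0]  # D 1 (the D that ends "HEAD", the space, and "1")
puts class_then_char[1]  # 1
```

The first position where a class character is followed by a space is the D in HEAD, which is why the match is "D 1" and not anything near POST.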


4 Comments

So, matching stops after the first pattern is found, right? Also, you said that T isn't followed by a space, but isn't there a space between T and 2?
That's right. A regex looks over the string from left to right, stopping when it's proved there's a match. And yes, there is a space in T 2, but the regex never gets there because it matches D 1 (from HEAD 1) first. If you want all matches, then seph's scan is good.
Ok. One more thing, instead of doing [HEAD|POST] (.), I could have also done [D|T] (.) and it would give me the same thing. How do I match the full words rather than just matching single characters in a character class?
@yetanotherstacker - Replace the square brackets ("match any character") with round brackets ("group"). /(HEAD|POST) (.)/ will match "HEAD 1" (and stop), capturing "HEAD" into group 1 and "1" into group 2. If you don't want to capture, you can use the ?: extension: (?:HEAD|POST) (.) will match and just capture "1" into group 1.
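
A short sketch of the capturing versus non-capturing group described in the comment above, assuming the same input string as the question (the variable names are made up for the example):

```ruby
s = "This is HEAD 1 and a POST 2"

# Capturing group: the word lands in group 1, the character after it in group 2.
capturing = s.match(/(HEAD|POST) (.)/)

# Non-capturing group (?:...): the word is still required, but only the
# character after the space is captured, so it becomes group 1.
non_capturing = s.match(/(?:HEAD|POST) (.)/)

p capturing[0]            # "HEAD 1"
p capturing.captures      # ["HEAD", "1"]
p non_capturing[0]        # "HEAD 1"
p non_capturing.captures  # ["1"]
```

Both patterns match the same text and stop at the first hit; they differ only in what ends up in the numbered groups.
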
1

Try scan:

"This is HEAD 1 and a POST 2".scan /(HEAD|POST)\s(\d)/

=> [["HEAD", "1"], ["POST", "2"]]

2 Comments

Thanks for the answer, but you didn't explain why I was getting "D 1", or what the differences between the patterns are. That would greatly help me understand them.
Chowlett did a great job of explaining what was going on with your code. This is basic regex involving captures and character classes. The beauty of scan is that a simple regex can sweep up a bunch of data and put it in an array: "if there is either a HEAD or POST followed by a whitespace character followed by a digit then give me HEAD or POST and the digit"
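
Since the original goal was to extract the number after HEAD or POST, here is a hedged variation on the scan approach. The non-capturing group and the \d+ (one or more digits, instead of the single \d above) are assumptions on my part about what the asker wants, not something stated in the question.

```ruby
s = "This is HEAD 1 and a POST 2"

# Two capture groups: scan returns one [word, digit] pair per match.
pairs = s.scan(/(HEAD|POST)\s(\d)/)

# One capture group: scan returns one single-element array per match,
# so flatten it and convert the digit strings to integers.
numbers = s.scan(/(?:HEAD|POST)\s(\d+)/).flatten.map(&:to_i)

p pairs    # [["HEAD", "1"], ["POST", "2"]]
p numbers  # [1, 2]
```

Unlike match, scan keeps going after the first hit, which is why the 2 after POST shows up here.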
