I am trying to create a regex that finds instances of a string containing an unspaced /, e.g.:

some characters/morecharacters

I have come up with the expression below, which finds a word character or closing parenthesis before the / and a word character or opening parenthesis after it.

(\w|\))/(\(|\w)

This works well in most situations, but I come unstuck when the / is enclosed in quotes; in that case I'd like it to be ignored. I have seen a few different posts here and here, but I can't quite get them to work in my situation.

What I'd like is for the first three cases below to match and the last case to be ignored, allowing me to extract item 1 and item 3.

some text/more text
(formula)/dividethis
divideme/(byme)
"dont match/me"
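
To make the problem concrete, here is a minimal Python sketch that runs the expression above over the four sample lines. The names NAIVE, samples and naive_hits are only illustrative and not part of any existing code. It shows that the quoted line is reported along with the other three.

```python
import re

# The expression from the question: a word character or ")" before the
# slash, and a word character or "(" after it.
NAIVE = re.compile(r'(\w|\))/(\(|\w)')

samples = [
    'some text/more text',
    '(formula)/dividethis',
    'divideme/(byme)',
    '"dont match/me"',
]


def naive_hits(lines):
    """Return the lines in which the naive pattern finds an unspaced slash."""
    return [line for line in lines if NAIVE.search(line)]


# All four lines are printed, including the quoted one that should be ignored.
for line in naive_hits(samples):
    print(line)
```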
  • What happens with a string like match/me "but not/me"? Commented Nov 14, 2016 at 3:48
  • My expectation is that the first instance would match, but not the second. Commented Nov 14, 2016 at 3:50

2 Answers

6

It ain't pretty, but this will do what you want:

(?<!")(?:\(|\b)[^"\n]+\/[^"\n]+(?:\)|\b)(?!")

Demo on Regex101

Let's break it down a bit:

  • (?<!")(?:\(|\b) will match either an open bracket or a word boundary, as long as it's not preceded by a quotation mark. It does this by employing a negative lookbehind.
  • [^"\n]+ will match one or more characters, as long as they're neither a quotation mark nor a line break (\n).
  • \/ will match a literal slash character.
  • Finally, (?:\)|\b)(?!") will match either a closing bracket or a word boundary as long as it's not followed by a quotation mark. It does this by employing a negative lookahead. Note that the (?:\)|\b) will only work 100% correctly in this order - if you reverse them, it'll drop the match on the bracket, because it encounters a word boundary before it gets to the bracket.
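
As a quick check of the breakdown above, here is a minimal Python sketch (Python's re module accepts the same lookbehind and lookahead syntax). The helper name first_match is hypothetical. It only exercises the sample lines from the question and the one from the comments, so it is not proof that every quoted string is rejected.

```python
import re

# The pattern from this answer, written as a Python raw string.
PATTERN = re.compile(r'(?<!")(?:\(|\b)[^"\n]+\/[^"\n]+(?:\)|\b)(?!")')


def first_match(line):
    """Return the first matched text in the line, or None if nothing matches."""
    m = PATTERN.search(line)
    return m.group(0) if m else None


for line in ['some text/more text',
             '(formula)/dividethis',
             'divideme/(byme)',
             '"dont match/me"',
             'match/me "but not/me"']:
    print(repr(line), '->', repr(first_match(line)))
```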

2 Comments

That is great and certainly identifies the full string. Is it possible to split out the first part (i.e. before the /) and the second part (i.e. after the /)?
Actually, I figured it out: ((?<!\")(?:\(|\b)[^\"\n]+)/([^\"\n]+(?:\)|\b)(?!\")) This uses Python's approach to escaping rather than PHP's.
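
A minimal sketch of that two-group variant in Python, assuming the pattern is written as a raw string so the quotation marks need no escaping. The function name split_on_slash is hypothetical.

```python
import re

# Same pattern as the answer, with one capture group on each side of the slash.
SPLIT = re.compile(r'((?<!")(?:\(|\b)[^"\n]+)/([^"\n]+(?:\)|\b)(?!"))')


def split_on_slash(line):
    """Return (before, after) for the first unquoted a/b match, else None."""
    m = SPLIT.search(line)
    return m.groups() if m else None


print(split_on_slash('some text/more text'))
print(split_on_slash('(formula)/dividethis'))
print(split_on_slash('divideme/(byme)'))
print(split_on_slash('"dont match/me"'))
```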
0

This will only match word/word which is not inside quotation marks.

import re

text = """
some text/more text "dont match/me" divideme/(byme)
(formula)/dividethis
divideme/(byme) "dont match/me hel d/b lo a/b" divideme/(byme)
"dont match/me"
"""

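# Quoted strings are consumed by the first, non-capturing alternative,
# so only unquoted word/word matches end up in the capture group.
groups = re.findall(r"(?:\".*?\")|(\S+/\S+)", text)
# findall yields '' for each quoted string; drop those empty entries.
print(list(filter(None, groups)))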

Output:

['text/more', 'divideme/(byme)', '(formula)/dividethis', 'divideme/(byme)', 'divideme/(byme)']
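
  • (?:\".*?\") matches a whole quoted string, quotes included. The group is non-capturing, so findall returns an empty string for it.
  • (\S+/\S+) matches word/word outside the quotations, and this group is captured. Because \S stops at whitespace, some text/more text yields only text/more.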
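
The same idea can be wrapped in a small helper and tried on the string from the comments under the question. This is only a sketch: the function name unquoted_slashes is hypothetical, and it uses re.finditer so that quoted matches (where group 1 is None) can be skipped directly.

```python
import re

# Quoted strings are matched first and discarded; only the second
# alternative captures anything.
PATTERN = re.compile(r"(?:\".*?\")|(\S+/\S+)")


def unquoted_slashes(text):
    """Return every word/word occurrence that is not inside double quotes."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1)]


print(unquoted_slashes('match/me "but not/me"'))
```

Consuming the quoted text in the first alternative is what keeps the second alternative from ever starting inside it, which is why no lookarounds are needed here.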

Demo on Regex101
