I want to extract every repeated #-prefixed substring from this example with a regex (Python):

#txt1#txt2#txt3#txt4

I tested with this pattern:

\#(.*?)

but it doesn't work. Thank you.

  • What is the expected output? Commented Aug 14, 2017 at 10:16
  • txt1,txt2,txt3,txt4 Commented Aug 14, 2017 at 10:18
  • I don't feel you need a regex here: '#txt1#txt2#txt3#txt4'.strip('#').split('#') Commented Aug 14, 2017 at 10:19
  • @Chris_Rands: Perhaps, there can be cases like text0#text1#text2 and text0 is not an expected value in the result. Or is a part of a larger regex pattern. Commented Aug 14, 2017 at 10:31
  • please post an example of code. Commented Aug 14, 2017 at 10:32

1 Answer

A lazy dot pattern .*? at the end of a pattern always matches an empty string: .*? matches as few occurrences of the quantified pattern as possible, and since nothing follows it that would force it to expand, it matches 0 chars.
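
A minimal sketch of that behavior, run against the sample string from the question (the variable name lazy_result is only illustrative):

```python
import re

s = '#txt1#txt2#txt3#txt4'

# The lazy .*? is the last thing in the pattern, so nothing forces it to
# expand: every match is just the '#', and Group 1 stays empty.
lazy_result = re.findall(r"#(.*?)", s)
print(lazy_result)
# => ['', '', '', '']
```

So the pattern does find all four # characters, but captures nothing after them.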

Single char scenario

For strings having # delimited values, you may use a negated character class [^#] with a * quantifier:

import re
s = '#txt1#txt2#txt3#txt4'
print(re.findall(r"#([^#]*)", s))
# => ['txt1', 'txt2', 'txt3', 'txt4']

The #([^#]*) pattern matches a # and then matches and captures into Group 1 any 0+ characters other than #. re.findall finds all non-overlapping occurrences of the pattern and only returns the values captured into Group 1.

NOTE: To make sure you do not get empty values in the result, you should replace the * quantifier with a + one that matches 1 or more occurrences.
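
To illustrate the difference between the two quantifiers, here is a small sketch with a made-up input that contains consecutive and trailing # characters (this input is an assumption, not from the question):

```python
import re

# Hypothetical input with an empty field at the start and at the end
s = '##txt1#txt2#'

# * allows zero characters after '#', so empty values are captured too
star_result = re.findall(r"#([^#]*)", s)
# + requires at least one non-# character, so empty values are skipped
plus_result = re.findall(r"#([^#]+)", s)

print(star_result)  # => ['', 'txt1', 'txt2', '']
print(plus_result)  # => ['txt1', 'txt2']
```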

Multi-char delimiters

In this case, you should choose a splitting approach. In case you have just a hard-coded delimiter, like #|, all you need is str.split():

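s = '#|txt1#|txt2#|txt3#|txt4'
# filter(None, ...) drops the empty strings produced by split();
# list() is needed in Python 3, where filter() returns an iterator
res = list(filter(None, s.split('#|')))
print(res)
# => ['txt1', 'txt2', 'txt3', 'txt4']

Note that filter(None, ...) removes all empty strings from the split result. In Python 3 it returns a lazy iterator, so wrap it in list() to print the values.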

If the delimiter is not hard-coded (for example, it varies in form or is only known at run time), you may use re.split.
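
As a rough sketch of the re.split approach, assume two hypothetical situations: a delimiter that varies in length (a # followed by one or more |), and a delimiter string that is only known at run time. Neither rule comes from the question; both are illustrations.

```python
import re

# Case 1 (assumed rule): '#' followed by one or more '|'
s1 = '#|txt1#||txt2#|txt3#|||txt4'
parts_pattern = [p for p in re.split(r"#\|+", s1) if p]
print(parts_pattern)
# => ['txt1', 'txt2', 'txt3', 'txt4']

# Case 2 (assumed): the delimiter arrives as a plain string at run time.
# re.escape() makes sure characters such as '|' are treated literally.
delimiter = '#|'
s2 = '#|txt1#|txt2#|txt3#|txt4'
parts_escaped = [p for p in re.split(re.escape(delimiter), s2) if p]
print(parts_escaped)
# => ['txt1', 'txt2', 'txt3', 'txt4']
```

The list comprehensions play the same role as filter(None, ...): they discard the empty string that re.split returns when the input starts with the delimiter.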

7 Comments

Probably shorter: '[^#]+'
Yes, that is what I have been adding. But that is not actually clear, whether OP wants to match those empty strings or not.
@RohanAmrute For this exact case, yes. In the OP, there is only an input string, and a pattern that seems to be designed to match any text after #. It seems reasonable to use a negated character class in this scenario. There are other possible ways to match it, splitting with # will also work here, but I am focusing on why the original regex does not work and explain what the regex way is to solve it.
If the text is like '#|txt1#|txt2#|txt3#|txt4' and txt1, txt2, ... can contain # or |, I can't use a negated character class (thank you).
@RguezYasser: And what is the rule then? How do you explain the pattern you need in words? Do you want to get substrings between #| symbols?