
I have the following string:

txt='agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23'

These are the delimiters:

delimiters = " \t,;.?!-:@[](){}_*/"

As output, I want this list of values:

"agadsfa","2asdf","sdfsaf","asfsadf","adsf","klnalfk","jn234kmafs","adfs","nlnawr23"

I tried using regex:

re.split(delimiters,txt)

But I'm getting this error:

re.error: unterminated character set at position 10

What is wrong here?

    Is the actual input you want to capture always letters/numbers (A-Z, a-z, 0-9)? Commented Feb 1, 2019 at 11:42

4 Answers

2

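Your regular expression is incorrect: re.split() treats the delimiters string as a pattern, so the [ at position 10 opens a character set, and because the ] right after it is taken as a literal member of that set, the set is never closed. From the comments, you've also added the requirement that the delimiters string must not be changed.

What we need to do, then, is process the delimiters string and convert it into a proper regex that split() can use. Here's how: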

import re

# Enclose the delimiters in [] so the regex splits on any one of them;
# ']' and '-' are special inside a character class and must be escaped.
delimiters = ' \t,;.?!-:@[](){}_*/'
regex = delimiters.replace(']', r'\]').replace('-', r'\-')
regex = r'[{}]+'.format(regex)

The result is as expected:

txt = 'agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23'
re.split(regex, txt)
=> ['agadsfa', '2asdf', 'sdfsaf', 'asfsadf', 'adsf', 'klnalfk', 'jn234kmafs', 'adfs', 'nlnawr23']
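
If the delimiters can be arbitrary client-supplied characters, re.escape() can do the escaping instead of chained replace() calls, which would also cover a backslash or a leading ^ in the string. Here is a minimal sketch, assuming a hypothetical helper named split_on_delimiters (the name is not from the question) and assuming empty pieces should be dropped:

```python
import re

def split_on_delimiters(text, delimiters):
    """Hypothetical helper: split text on runs of any character in delimiters."""
    if not delimiters:
        # An empty character class "[]" is not a valid pattern, so skip the regex.
        return [text] if text else []
    # re.escape backslash-escapes regex metacharacters such as ']', '-', '^' and '\\',
    # which makes the string safe to place inside a character class.
    pattern = "[" + re.escape(delimiters) + "]+"
    # Drop the empty strings produced when text starts or ends with a delimiter.
    return [part for part in re.split(pattern, text) if part]

txt = 'agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23'
delimiters = " \t,;.?!-:@[](){}_*/"
result = split_on_delimiters(txt, delimiters)
print(result)
```

The delimiters string is passed in untouched; only the pattern built from it is escaped.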

2 Comments

Thanks Oscar, but I can't touch the delimiter string; it is determined by the client. I have to give them a function that splits the input string using the provided delimiters.
@avinashse you have no option, I'm afraid - the delimiter string as provided is simply not a valid regular expression for using it in split(). What you could do is take the delimiter string, escape the special characters and enclose the whole thing inside [...]+.
0

Python 3 code

import re

txt="agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23"

delimiters = r"_|;|,|\)|\(|\[|\]"

list(filter(None, re.split(delimiters, txt)))

Output

['agadsfa', '2asdf', 'sdfsaf', 'asfsadf', 'adsf', 'klnalfk', 'jn234kmafs', 'adfs', 'nlnawr23']

Separate your symbols with | and use Python's filter function to drop the empty strings.
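
To see where those empty strings come from, here is a small illustrative sketch (the shortened txt and the variable names are assumptions for the demo): an alternation matches one delimiter at a time, so two adjacent delimiters leave an empty string between them, whereas a character class followed by + consumes the whole run.

```python
import re

txt = "agadsfa_(2asdf"

# Alternation matches a single delimiter per split point, so the adjacent
# "_" and "(" produce an empty string between them.
one_at_a_time = re.split(r"_|\(", txt)

# A character class followed by + treats the whole run "_(" as one separator.
runs = re.split(r"[_(]+", txt)

print(one_at_a_time)  # ['agadsfa', '', '2asdf']
print(runs)           # ['agadsfa', '2asdf']
```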

1 Comment

No need to use | here, that's what character classes [ ] are for. And a properly built regex makes the filter step unnecessary. See my answer for a more concise solution.
0

You have to separate your delimiters with |, escaping the ones that are regex metacharacters:

import re

delimiters = r' |\t|,|;|\.|\?|!|-|:|@|\[|\]|\(|\)|\{|\}|_|\*|/'
# then use this to eliminate the empty strings you get when two delimiters are next to each other
print([w for w in re.split(delimiters, txt) if w])
# or list(filter(None, re.split(delimiters, txt)))

The result is:

['agadsfa', '2asdf', 'sdfsaf', 'asfsadf', 'adsf', 'klnalfk', 'jn234kmafs', 'adfs', 'nlnawr23']

1 Comment

No need to use | here, that's what character classes [] are for. See my answer for a more concise solution.
0

Try this:

import re

txt = "agadsfa_(2asdf_sdfsaf)asfs?adf[adsf_klna!lfk;jn234kmafs)adfs, nlnawr*23"

line = re.sub(r"[ \t,;\.?!\-:@\[\](){}_*/]+", r",", txt)

print(line.split(","))

2 Comments

The excessive backslashing looks like you don't really know which characters are special within a character class.
Only the characters that are special inside a character class need to be escaped.
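
As a quick check of that point, here is a sketch comparing the pattern from the answer with a less heavily escaped variant. The variable names are made up for illustration, and it assumes Python's re module, where . and [ are literal inside a character class while ] and a mid-class - still need a backslash:

```python
import re

txt = "agadsfa_(2asdf_sdfsaf)asfs?adf[adsf_klna!lfk;jn234kmafs)adfs, nlnawr*23"

# Pattern as written in the answer, with every suspicious character escaped.
over_escaped = r"[ \t,;\.?!\-:@\[\](){}_*/]+"
# Same class, keeping only the escapes that are required inside [...].
minimal = r"[ \t,;.?!\-:@[\](){}_*/]+"

# Both patterns split the text identically.
assert re.split(over_escaped, txt) == re.split(minimal, txt)
print(re.split(minimal, txt))
```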
