I have to remove any punctuation marks from the start and end of a word. I am using re.sub to do it:

re.sub(r'(\w.+)(?=[^\w]$)','\1',text)

The group backreference is not working: for the input Mihir4. all I get on the command line is ☺.

3 Comments

  • Please give a better example of the input and the expected output. Can there be multiple punctuation characters at the start/end of a word? Can the string consist of more than one word? Commented Jul 13, 2014 at 15:12
  • For multiple characters, take the regex pattern [\w+\-]+, which gives me only words separated by a -. But my question is how to replace a string using any pattern. Commented Jul 13, 2014 at 15:16
  • Alternatively, you could also use str.strip (as long as text is just that one word). Commented Jul 13, 2014 at 15:19
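
The str.strip suggestion in the last comment could look roughly like the sketch below. The helper name strip_punctuation is made up for illustration, and it assumes that "punctuation" means the ASCII characters in string.punctuation, which the question does not actually specify.

```python
import string

def strip_punctuation(word):
    """Remove ASCII punctuation from both ends of a single word.

    Hypothetical helper: str.strip treats its argument as a set of
    characters and removes them only from the two ends of the string.
    """
    return word.strip(string.punctuation)

print(strip_punctuation("Mihir4."))   # Mihir4
print(strip_punctuation("!3423?"))    # 3423
```

Because string.punctuation contains the hyphen, a word such as "--a-b--" comes back as "a-b": hyphens inside the word survive, hyphens at the edges do not.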

2 Answers

1

If you have a string with multiple words, such as

text = ".adfdf. 'df' !3423? ld! :sdsd"

this will do the trick (it will also work for single words, of course):

>>> re.sub(r'[^\w\s]*(\w+)[^\w\s]*', r'\1', text)
'adfdf df 3423 ld sdsd'

Notice the r prefix in r'\1'. A raw string keeps the backslash as is, so this is equivalent to '\\1'.

>>> re.sub(r'[^\w\s]*(\w+)[^\w\s]*', '\\1', text)
'adfdf df 3423 ld sdsd'
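
One caveat worth checking before using this on real text: the pattern also removes punctuation that sits between two runs of word characters, so an apostrophe inside a word is lost. The sketch below compares it with a split-and-strip variant. The names regex_clean and split_clean are invented here, and the variant assumes that words are separated by whitespace and that ASCII punctuation is all that needs removing.

```python
import re
import string

def regex_clean(text):
    # The substitution from this answer, unchanged.
    return re.sub(r'[^\w\s]*(\w+)[^\w\s]*', r'\1', text)

def split_clean(text):
    # Alternative sketch: split on whitespace, strip each token's edges,
    # and rejoin with single spaces (original spacing is not preserved).
    return ' '.join(w.strip(string.punctuation) for w in text.split())

sample = ".adfdf. 'df' !3423? ld! :sdsd"
print(regex_clean(sample))           # adfdf df 3423 ld sdsd
print(split_clean(sample))           # adfdf df 3423 ld sdsd

print(regex_clean("don't stop."))    # dont stop
print(split_clean("don't stop."))    # don't stop
```

Which behaviour is right depends on whether interior punctuation should count as part of the word, which the question leaves open.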

Further reading: the backslash plague

1

The string literal '\1' is equivalent to '\x01', a single control character. To refer to group 1 as a backreference, you need to escape the backslash ('\\1') or use a raw string literal (r'\1').
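
A short check of that claim, using the pattern and the input from the question (the variable names are only for illustration):

```python
import re

# Pattern and input copied from the question.
pattern = r'(\w.+)(?=[^\w]$)'
text = 'Mihir4.'

# Without the r prefix, '\1' is an octal escape: one character, code point 1.
print('\1' == '\x01')          # True
print(len('\1'), len(r'\1'))   # 1 2

broken = re.sub(pattern, '\1', text)    # replaces the match with chr(1)
fixed = re.sub(pattern, r'\1', text)    # replaces the match with group 1

print(repr(broken))   # '\x01.'
print(repr(fixed))    # 'Mihir4.'
```

Note that even with the raw string the trailing dot is still there: the lookahead (?=[^\w]$) only checks for the punctuation character and never consumes it, so the substitution puts back exactly what was matched.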

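BTW, you don't need a capturing group. Just replace the leading and trailing characters that are neither word characters nor hyphens with an empty string:

>>> re.sub(r'^[^-\w]+|[^-\w]+$', '', 'Mihir4.')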
'Mihir4'

2 Comments

Mihir4. is not static. Words are coming from a list.
@mjosh, you can replace 'Mihir4.' with text, as you did in the question.
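
Since the words come from a list, one way to apply this is a list comprehension over a precompiled pattern. This is only a sketch: the names EDGE_PUNCT and strip_edges and the sample list are invented for illustration, and the pattern uses + on both sides so that runs of trailing punctuation are removed as well.

```python
import re

# Leading or trailing runs of characters that are neither word
# characters nor hyphens.
EDGE_PUNCT = re.compile(r'^[^-\w]+|[^-\w]+$')

def strip_edges(words):
    """Return a new list with edge punctuation removed from each word."""
    return [EDGE_PUNCT.sub('', word) for word in words]

words = ['Mihir4.', '"hello!"', 'well-known,', '...']
print(strip_edges(words))   # ['Mihir4', 'hello', 'well-known', '']
```

An item made up entirely of punctuation becomes an empty string, so filter those out afterwards if they are not wanted.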
