0

I have a string such as:

s = "The code for the product is A8H4DKE3SP93W6J and you can buy it here."

The text in this string will not always be in the same format; it is dynamic, so I can't do a simple find and replace to obtain the product code.

I can see that:

re.sub(r'A[0-9a-zA-Z_]{14} ', '', s)

will get rid of the product code. How do I go about doing the opposite of this, i.e. deleting all of the text apart from the product code? The product code will always be a 15-character string starting with the letter A.

I have been racking my brain and Googling to find a solution, but can't seem to figure it out.

Thanks

  • re.findall Commented Feb 1, 2017 at 19:07
  • Just extract what you want to keep and discard the rest of the string. Commented Feb 1, 2017 at 19:08
  • Possible duplicate of Python regex findall Commented Feb 1, 2017 at 19:16

2 Answers

1

Instead of substituting away the rest of the string, use re.search() to find the product code directly:

In [1]: import re

In [2]: s = "The code for the product is A8H4DKE3SP93W6J and you can buy it here."

In [3]: re.search(r"A[0-9a-zA-Z_]{14}", s).group()
Out[3]: 'A8H4DKE3SP93W6J'
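Note that re.search() returns None when nothing matches, so calling .group() directly raises AttributeError on a string without a code. A minimal sketch of a safer variant follows. The helper name extract_code, the compiled PRODUCT_CODE constant, and the added \b word boundaries are assumptions for illustration, not part of the answer above; the boundaries assume the code is surrounded by spaces or punctuation.

```python
import re

# Same character class as above, plus \b word boundaries (an assumption:
# the code is delimited by non-word characters such as spaces or punctuation).
PRODUCT_CODE = re.compile(r"\bA[0-9A-Za-z_]{14}\b")


def extract_code(text):
    """Return the first product code found in text, or None if there is none."""
    match = PRODUCT_CODE.search(text)
    return match.group() if match else None


s = "The code for the product is A8H4DKE3SP93W6J and you can buy it here."
print(extract_code(s))               # A8H4DKE3SP93W6J
print(extract_code("no code here"))  # None

# If a string may hold several codes, findall() returns all of them.
other = "A" + "1" * 14  # hypothetical second code, 15 characters long
print(PRODUCT_CODE.findall(s + " Another: " + other + "."))
```

Without the word boundaries, the pattern would also match the first 15 characters of a longer alphanumeric run that happens to start with A.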

0

In a regex, you can capture the portion you want to keep by putting parentheses around that part of the pattern, and then refer to it in the replacement string with a backslash followed by the index of that group. In the code below, "(A[0-9A-Za-z_]{14})" is the portion you want to keep, and "\1" in the replacement string inserts it into the result.

re.sub(r'.*(A[0-9A-Za-z_]{14}).*', r'\1', s)
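The substitution approach has a few edge cases worth knowing. The sketch below runs the same pattern end to end; the re.DOTALL flag and the sample strings other than s are assumptions added for illustration.

```python
import re

s = "The code for the product is A8H4DKE3SP93W6J and you can buy it here."

# Parentheses capture the code; \1 in the replacement refers to that group.
# re.DOTALL (an addition here) lets . match newlines, so multi-line text
# around the code is stripped as well.
pattern = re.compile(r".*(A[0-9A-Za-z_]{14}).*", re.DOTALL)

result = pattern.sub(r"\1", s)
print(result)  # A8H4DKE3SP93W6J

# Caveat 1: when nothing matches, sub() returns the input unchanged.
unchanged = pattern.sub(r"\1", "no code here")
print(unchanged)  # no code here

# Caveat 2: the leading .* is greedy, so with several codes the LAST one is kept.
two_codes = "A" + "1" * 14 + " then " + "A" + "2" * 14  # hypothetical input
last = pattern.sub(r"\1", two_codes)
print(last)  # A22222222222222
```

Because a failed match leaves the string untouched, this approach cannot distinguish "no code found" from "the string was already just a code" without an extra check.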
