2

I'm struggling to separate a given string, foobar123, into a word and a number of unknown length with an underscore (result: foobar_123). I've tried using a regex to find the match r1 (that works), but after that I have no idea how to split the string at that match.

import re
x = "foobar123"
y = re.sub("[a-z]{1}\d{1}", "\1", x)
print(y) # Output: "fooba23"

I think it should be done with "\1" to access the previous match, so I tried replacing the found match with itself, but this results in fooba23. Shouldn't it be foobar123?

Thanks in advance.

UPDATE:

Sorry for the typo in the code above: it should be [a-z], not [0-9].
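For anyone wondering why the attempt appeared to print fooba23, here is a small sketch using only the standard re module (it assumes Python 3; the variable names are just for illustration). The non-raw "\1" is not a backreference at all, and \g<0> is what actually refers to the whole match:

```python
import re

x = "foobar123"

# In a normal (non-raw) string literal, "\1" is an octal escape for chr(1),
# so the replacement is one invisible control character, not a backreference.
assert "\1" == "\x01"
y = re.sub(r"[a-z]\d", "\1", x)
print(repr(y))  # 'fooba\x0123'

# As a raw string, r"\1" does reach the regex engine as a backreference,
# but the pattern has no capturing group, so re.sub raises re.error.
try:
    re.sub(r"[a-z]\d", r"\1", x)
except re.error as exc:
    print("re.error:", exc)

# \g<0> refers to the whole match, so this really replaces "r1" with itself.
same = re.sub(r"[a-z]\d", r"\g<0>", x)
print(same)  # foobar123
```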

4 Answers

6

This could do the trick, using a capture group for your digits:

import re
x = "foobar123"
y = re.sub(r'(\d+)', r'_\1', x)
print(y)

I used raw strings so the backslashes are passed to the regex engine as-is, something you forgot to do in yours =)
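A quick sketch of how the (\d+) substitution behaves on a few other inputs. The helper name underscore_digits is made up for this example; note that every run of digits gets an underscore, including one at the very start of the string:

```python
import re

def underscore_digits(s: str) -> str:
    # Same substitution as above: prefix every run of digits with "_".
    return re.sub(r'(\d+)', r'_\1', s)

print(underscore_digits("foobar123"))   # foobar_123
print(underscore_digits("foo12bar34"))  # foo_12bar_34
print(underscore_digits("123foo"))      # _123foo
```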


A fun alternative without a capturing group is to use the count parameter of re.sub:

import re
x = "foobar123"
y = re.sub(r'(?=\d)', '_', x, count=1)  # count=1: replace only the first match
print(y)

The pattern (?=\d) matches every position that is followed by a digit, but only the first one (hence a count of 1) is replaced by an underscore.
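To see why the count matters, here is the same lookahead with and without it (a sketch; passing count as a keyword also avoids the DeprecationWarning that Python 3.13+ emits for a positional count):

```python
import re

x = "foobar123"

# Without a count, every position before a digit gets an underscore.
every = re.sub(r'(?=\d)', '_', x)
print(every)  # foobar_1_2_3

# With count=1, only the first such position is replaced.
first = re.sub(r'(?=\d)', '_', x, count=1)
print(first)  # foobar_123
```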


Comments

2

You could capture the last letter followed by a digit and append an underscore:

import re
x = "foobar123"
y = re.sub(r'([a-z])(?=\d)', r'\1_', x)
print(y)  # foobar_123
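A small sketch contrasting this lookahead version with the digit-only one: it only inserts an underscore where a lowercase letter is directly followed by a digit, so leading digits are left alone. The helper name underscore_after_letter is hypothetical:

```python
import re

def underscore_after_letter(s: str) -> str:
    # Insert "_" after any lowercase letter that is directly followed by a digit.
    return re.sub(r'([a-z])(?=\d)', r'\1_', s)

print(underscore_after_letter("foobar123"))  # foobar_123
print(underscore_after_letter("abc1def2"))   # abc_1def_2
print(underscore_after_letter("123foo"))     # 123foo (no letter precedes the digits)
```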

Comments

2

You are matching two digits with [0-9]{1}\d{1}: the {1} quantifiers are not needed, and the a-z character before the digits is not taken into account.

You could do the replacement without a capturing group by using the whole match, \g<0>, followed by an underscore.

The pattern matches a single character [a-z] and uses a positive lookahead (?=\d) to assert that the next character is a digit.

import re
x = "foobar123"
y = re.sub(r"[a-z](?=\d)", r"\g<0>_", x)  # raw strings keep \d and \g intact
print(y) # Output: "foobar_123"
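A sketch showing that \g<0> in the replacement string means the same thing as m.group(0) in a replacement function; the variable names are just for illustration:

```python
import re

x = "foobar123"
pattern = r"[a-z](?=\d)"

# \g<0> in the replacement refers to the whole match, like m.group(0).
with_template = re.sub(pattern, r"\g<0>_", x)
with_function = re.sub(pattern, lambda m: m.group(0) + "_", x)

print(with_template)                   # foobar_123
print(with_template == with_function)  # True
```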

Comments

1

You may:

  • capture the letters in one group and the digits in another: ([a-z]+)([0-9]+)
  • replace with the letter group, an underscore, then the digits: \1_\2

I've added the re.I flag to ignore case.

import re
x = "Foobar123"
y = re.sub("([a-z]+)([0-9]+)", r"\1_\2", x, flags=re.I)
print(y)  # Foobar_123
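A sketch of what re.I actually changes here. For "Foobar123" the flag happens not to matter, because the lowercase run oobar already matches and the leading F is simply kept; the difference only shows up when the letters right before the digits are uppercase:

```python
import re

pattern = "([a-z]+)([0-9]+)"

# Without re.I, an all-uppercase word has no [a-z] run, so nothing matches.
no_flag = re.sub(pattern, r"\1_\2", "FOOBAR123")
print(no_flag)    # FOOBAR123

# With re.I, [a-z] also matches uppercase letters.
with_flag = re.sub(pattern, r"\1_\2", "FOOBAR123", flags=re.I)
print(with_flag)  # FOOBAR_123

# For "Foobar123" the lowercase run "oobar" matches even without the flag.
mixed = re.sub(pattern, r"\1_\2", "Foobar123")
print(mixed)      # Foobar_123
```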

3 Comments

You can simply replace the digits with an underscore followed by the digits; there's no need to capture the first part.
@ThierryLathuille We can capture the letters, the digits, or both; there are 3 answers here and each approach is represented ;)
A common issue here, use re.sub("([a-z]+)([0-9]+)", r"\1_\2", x, flags=re.I)
