Separate word and digits with regex

Question

I'm struggling to separate a given string such as foobar123 into its word part and its digit part (both of unknown length) with an underscore (result: foobar_123). I tried a regex to find the match r1, and that works. But after that I have no idea how to split the match itself.

import re
x = "foobar123"
y = re.sub("[a-z]{1}\d{1}", "\1", x)
print(y) # Output: "fooba23"

I think it should be done with "\1" to access the previous match. So I tried to replace the found match with itself, but this results in fooba23. Shouldn't it be foobar123?

Thanks in advance.

UPDATE:

Sorry for the typo in the code above, it should be [a-z] not [0-9].

JvdV · Accepted Answer · 2020-04-03 11:27:18Z

6

This could do the trick using a capture group of your digits?

import re
x = "foobar123"
y = re.sub(r'(\d+)', r'_\1', x)
print(y)

I escaped the backslashes by using raw strings, something you forgot to do in yours =)
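To see why the raw string matters, here is a small sketch (the names plain and raw are only for illustration, and [0-9] stands in for \d to keep the pattern free of backslashes). In a normal string literal, "\1" is an octal escape for the single control character U+0001, so re.sub never sees a group reference:

```python
import re

x = "foobar123"

plain = "\1"   # octal escape: one character, U+0001
raw = r"\1"    # two characters: a backslash and '1'
print(len(plain), len(raw))  # 1 2

# The question's call replaces "r1" with the control character.
result = re.sub("[a-z][0-9]", plain, x)
print(repr(result))  # 'fooba\x0123'

# With a raw string the reference reaches the regex engine, but the
# pattern has no group 1, so re.sub raises re.error.
try:
    re.sub("[a-z][0-9]", raw, x)
    raised = False
except re.error:
    raised = True
print(raised)  # True
```

That invisible control character is why the printed result looks like fooba23.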

A fun alternative without a capturing group is to use the count parameter of re.sub:

import re
x = "foobar123"
y = re.sub(r'(?=\d)', '_', x, count=1)
print(y)

The pattern (?=\d) matches every position that is followed by a digit, but only the first one (hence the 1 for count) gets replaced by an underscore.
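A quick way to check the role of count is to run the same substitution with and without it. This is only a sketch; the names every_gap and first_gap are made up for the comparison:

```python
import re

x = "foobar123"

# Without a limit, every position followed by a digit gets an underscore.
every_gap = re.sub(r'(?=\d)', '_', x)
print(every_gap)  # foobar_1_2_3

# Limiting the substitution to one replacement keeps only the first.
first_gap = re.sub(r'(?=\d)', '_', x, count=1)
print(first_gap)  # foobar_123
```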

edited Apr 3, 2020 at 11:27

answered Apr 3, 2020 at 10:11


yatu · Accepted Answer · 2020-04-03 10:11:28Z

2

You could capture the last letter followed by a digit and append an underscore:

import re
x = "foobar123"
y = re.sub(r'([a-z])(?=\d)', r'\1_', x)
print(y)  # foobar_123

answered Apr 3, 2020 at 10:11


The fourth bird · Accepted Answer · 2020-04-03 10:35:09Z

2

You are matching two digits with [0-9]{1}\d{1}. The {1} quantifiers are not needed, and a char a-z before the digits is not taken into account.

You could do the replacement without a capturing group by using the whole match, \g<0>, followed by an underscore.

The pattern matches a char [a-z] and uses a positive lookahead (?=\d) to assert that what is directly to the right is a digit.

import re
x = "foobar123"
y = re.sub(r"[a-z](?=\d)", r"\g<0>_", x)
print(y) # Output: "foobar_123"
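If it helps, the same replacement can also be written with a function instead of \g<0>, since re.sub accepts a callable that receives each match object. A sketch, where add_underscore is a hypothetical helper name and ab12cd34 is a made-up input with several letter-digit boundaries:

```python
import re

def add_underscore(match):
    # match.group(0) is the whole match, the same text \g<0> refers to.
    return match.group(0) + "_"

pattern = r"[a-z](?=\d)"

with_template = re.sub(pattern, r"\g<0>_", "foobar123")
with_function = re.sub(pattern, add_underscore, "foobar123")
print(with_template, with_function)  # foobar_123 foobar_123

# Every letter that is directly followed by a digit gets an underscore.
several = re.sub(pattern, r"\g<0>_", "ab12cd34")
print(several)  # ab_12cd_34
```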

edited Apr 3, 2020 at 10:35

answered Apr 3, 2020 at 10:25


azro · Accepted Answer · 2020-04-03 11:17:19Z

1

You can:

capture the letters in one group and the digits in another: ([a-z]+)([0-9]+)
replace with the letter group, an underscore, then the digit group: \1_\2

I've added the re.I flag to ignore case.

import re
x = "Foobar123"
y = re.sub("([a-z]+)([0-9]+)", r"\1_\2", x, flags=re.I)
print(y)  # Foobar_123
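Note that Foobar123 would come out right even without the flag, because [a-z]+ can still match oobar. A sketch with an all-uppercase input (FOO123 is an invented example) shows where re.I actually makes a difference:

```python
import re

pattern = "([a-z]+)([0-9]+)"
repl = r"\1_\2"

# Case-sensitive: no lowercase letter precedes the digits, so nothing matches.
strict = re.sub(pattern, repl, "FOO123")
print(strict)  # FOO123

# Case-insensitive: the uppercase letters are captured as group 1.
relaxed = re.sub(pattern, repl, "FOO123", flags=re.I)
print(relaxed)  # FOO_123

# The mixed-case input from above also works without the flag.
mixed = re.sub(pattern, repl, "Foobar123")
print(mixed)  # Foobar_123
```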

edited Apr 3, 2020 at 11:17

answered Apr 3, 2020 at 10:09


3 Comments

Thierry Lathuille Over a year ago

You can simply replace the digits by an underscore followed by the digits, there's no need to capture the first part.

azro Over a year ago

@ThierryLathuille We can capture the letters, the digits, or both. There are 3 answers here, and each approach is represented ;)

Wiktor Stribiżew Over a year ago

A common issue here, use re.sub("([a-z]+)([0-9]+)", r"\1_\2", x, flags=re.I)
