I want to extract the string before the 9 digit number below:

tmp = "place1_128017000_gw_cl_mask.tif"

The output should be place1

I could do this: tmp.split('_')[0] but I also want the solution to work for:

tmp = "place1_place2_128017000_gw_cl_mask.tif" where the result would be: place1_place2

You can assume that the number will always be 9 digits long.

Comments:

  • Try Python regular expressions: m = re.search(r"\d{9}", tmp); print(m.group()) Commented Jul 13, 2022 at 3:46
  • Have you looked at regular expressions at all? Check out re.search(). A regular expression for what you need would look like '.+(?=_\d{9}_)', that is: one or more characters followed by an underscore, 9 digits, and another underscore. Sites like regex101.com are useful for designing and testing regular expressions. Commented Jul 13, 2022 at 3:46

4 Answers

3

A regular expression with a lookahead gives a simple solution:

import re

tmp = "place1_place2_128017000_gw_cl_mask.tif"
m = re.search(r'.+(?=_\d{9}_)', tmp)
print(m.group())

Result:

place1_place2

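Note that \d{9} matches exactly 9 digits. The part of the regex inside (?= ... ) is a lookahead: it is not included in the match itself, but the match only succeeds if the lookahead pattern follows immediately after it. Because .+ is greedy, the match extends up to the last underscore that is followed by 9 digits and another underscore.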
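As a minimal sketch, the same pattern can be wrapped in a helper that also copes with names that contain no 9-digit segment, in which case re.search returns None. The names PREFIX_RE and prefix_before_id are hypothetical and not part of the original answer.

```python
import re

# Same lookahead pattern as in the answer, compiled once for reuse.
PREFIX_RE = re.compile(r'.+(?=_\d{9}_)')

def prefix_before_id(name):
    """Return the text before the 9-digit segment, or None if there is none."""
    m = PREFIX_RE.search(name)
    return m.group() if m else None

print(prefix_before_id("place1_128017000_gw_cl_mask.tif"))         # place1
print(prefix_before_id("place1_place2_128017000_gw_cl_mask.tif"))  # place1_place2
print(prefix_before_id("no_number_here.tif"))                      # None
```

Returning None leaves it to the caller to decide how to treat file names that do not follow the expected scheme.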


3

Assuming the problem can be phrased as wanting the substring up to, but not including, the underscore that precedes an all-digit segment, we can try:

import re

tmp = "place1_place2_128017000_gw_cl_mask.tif"
m = re.search(r'^([^_]+(?:_[^_]+)*)_\d+_', tmp)
print(m.group(1))  # place1_place2
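One caveat, shown here as a hedged sketch: \d+ accepts an all-digit segment of any length, so a later, shorter number can move the cut-off point. If the number is known to be 9 digits long, \d{9} is the stricter choice. The file name and the variable names loose and strict below are made up for illustration.

```python
import re

# The pattern from the answer (any number of digits) and a 9-digit variant.
loose = re.compile(r'^([^_]+(?:_[^_]+)*)_\d+_')
strict = re.compile(r'^([^_]+(?:_[^_]+)*)_\d{9}_')

name = "place1_128017000_5_mask.tif"  # hypothetical name with a second number

print(loose.match(name).group(1))   # place1_128017000
print(strict.match(name).group(1))  # place1
```

The greedy capture group backtracks from the right, so the loose pattern stops at the last all-digit segment that is followed by an underscore, which here is the single digit 5.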


1

Use a regular expression:

import re

places = (
    "place1_128017000_gw_cl_mask.tif",
    "place1_place2_128017000_gw_cl_mask.tif",
)
pattern = re.compile(r"(place\d+(?:_place\d+)*)_\d{9}")  # raw string keeps \d intact
for p in places:
    matched = pattern.match(p)
    if matched:
        print(matched.group(1))

prints:

place1
place1_place2

The regex works like this (adjust as needed, e.g., for fewer than 9 digits or a variable number of digits):

  • ( starts a capture group
  • place\d+ matches the literal "place" followed by one or more digits
  • (?: starts a group without capturing it (there is no need to capture it separately)
  • _place\d+ matches an underscore followed by another "place" segment
  • ) closes the non-capturing group
  • * repeats the previous group zero or more times
  • ) closes the capture group
  • _\d{9} matches an underscore followed by 9 digits

The result is in the first (and only) capture group.
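If the prefix segments are not always spelled "place", or the digit count varies, the pattern could be adjusted along these lines. This is only a sketch under assumptions: the helper build_pattern and its parameters are hypothetical, segments are taken to be any underscore-free text, and the number is assumed to end at an underscore, a dot, or the end of the name.

```python
import re

def build_pattern(min_digits=9, max_digits=9):
    """Compile a pattern capturing everything before the numeric segment."""
    return re.compile(
        rf"""
        ( [^_]+ (?: _ [^_]+ )*? )          # capture: as few segments as possible
        _ \d{{{min_digits},{max_digits}}}  # underscore, then the numeric segment
        (?= _ | \. | $ )                   # the number must end the segment
        """,
        re.VERBOSE,
    )

names = (
    "place1_128017000_gw_cl_mask.tif",
    "place1_place2_128017000_gw_cl_mask.tif",
)
pattern = build_pattern()
for n in names:
    matched = pattern.match(n)
    if matched:
        print(matched.group(1))  # place1, then place1_place2
```

The lazy quantifier *? makes the capture stop at the first numeric segment of the requested length, and re.VERBOSE allows the pattern to carry its own comments.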


1

Here's a possible solution without regex (unoptimized!):

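def extract(s):
    parts = s.split('_')
    for i, part in enumerate(parts):
        # The first segment made up of exactly 9 digits marks the cut-off.
        # Checking the string directly keeps leading zeros and rejects
        # signed values such as "-12345678", which int() would accept.
        if len(part) == 9 and part.isdigit():
            return '_'.join(parts[:i])
    return None  # no 9-digit segment found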

tmp = 'place1_128017000_gw_cl_mask.tif'
tmp2 = 'place1_place2_128017000_gw_cl_mask.tif'

print(extract(tmp))   # place1
print(extract(tmp2))  # place1_place2 
