3

I am looking for a regex that extracts everything up to the first . (period) in a string, plus the last . (period) and everything after it.

For example:

my_file.10.4.5.6.csv
myfile2.56.3.9.txt

Ideally the regex when run against these strings would return:

my_file.csv 
myfile2.txt

The numeric stamp in the file name will be different each time the script is run, so I essentially want to exclude it.

The following prints the string up to the first . (period):

print re.search("^[^.]*", data_file).group(0)

I am having trouble, though, getting it to also return the last period and the string after it.

Update, based on the feedback and comments below:

This does need to be a regex. The regex will be passed into the program from a configuration file. The user will not have access to the source code as it will be packaged. The user may need to change the regex based upon some arbitrary criteria, so they will need to update the config file, rather than edit the application and re-build the package.

Thanks

5
  • Regarding the comments below saying I don't need a regex: yes, you are correct that I could use the split method. However, I would like to do this with a regex, because the regex is passed into the program from a config file. It could change in the future if the file naming convention changes. Commented Dec 16, 2013 at 21:53
  • Regex are slow and unpythonic (a huge string of characters surrounded in slashes only discernible by intense introspection is pretty much precisely the antithesis of Python), so only use them when you need them. Python has a TON of string functions that can replace regex. Commented Dec 16, 2013 at 21:54
  • adsmith - Yeah, I hear where you are coming from. Unfortunately, for ease of use, the criteria for matching the file need to be stored in a config file and pulled into the program. Commented Dec 16, 2013 at 21:59
  • You could specify a callable in the config somehow; then users would be able to specify any arbitrary rule they want. I can appreciate how that's not as simple, though for anything beyond rudimentary regex, the difference in complexity is debatable. You'd also have the added benefit of allowing users to specify custom name collision behavior then as well. Commented Dec 16, 2013 at 22:03
  • Silas, thanks. That is an interesting option. I accepted F.J's answer because I asked for a regex solution and his fits the bill. However, I think I will follow up on the callable idea and see whether it offers another option. Commented Dec 16, 2013 at 22:07

4 Answers

4

You don’t need a regular expression!

parts = data_file.split(".")
print parts[0] + "." + parts[-1]

3

Instead of regular expressions, I would suggest using str.split. For example:

>>> data_file = 'my_file.10.4.5.6.csv'
>>> parts = data_file.split('.')
>>> print parts[0] + '.' + parts[-1]
my_file.csv

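However, if you insist on regular expressions, here is one approach. Because .* is greedy, the pattern matches everything from the first period through the last one, and that whole span is replaced with a single period: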

>>> print re.sub(r'\..*\.', '.', data_file)
my_file.csv
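
Since the question says the regex will come from a config file, here is a minimal sketch of how the substitution above might be driven by configured values. The CONFIG dict, its "pattern" and "replacement" keys, and the normalize_name helper are hypothetical names made up for illustration; in the real program the two strings would be read from whatever config format is in use.

```python
import re

# Hypothetical stand-in for values loaded from the user's config file.
CONFIG = {
    "pattern": r"\..*\.",   # first period through last period (greedy)
    "replacement": ".",     # collapse that span to a single period
}

def normalize_name(file_name, config=CONFIG):
    """Apply the configured regex substitution to a file name."""
    return re.sub(config["pattern"], config["replacement"], file_name)

for name in ("my_file.10.4.5.6.csv", "myfile2.56.3.9.txt", "plain.csv"):
    print(normalize_name(name))
```

A name with only one period, such as plain.csv, does not match the pattern at all, so re.sub returns it unchanged.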

0

You don't need a regex.

tokens = expanded_name.split('.')
compressed_name = '.'.join((tokens[0], tokens[-1]))

If you are concerned about performance, you could use a length limit and rsplit() to only chop up the string as much as you need.

compressed_name = expanded_name.split('.', 1)[0] + '.' + expanded_name.rsplit('.', 1)[1]
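
One caveat with the one-liner: if the name contains no period at all, rsplit('.', 1) returns a single-element list and the [1] index raises IndexError. A rough sketch of a guarded variant follows; it swaps split/rsplit for str.partition and str.rpartition, and the compress_name function name is just an illustrative choice.

```python
def compress_name(expanded_name):
    """Keep the text before the first period and after the last one."""
    # partition splits once at the first period; sep is "" if none exists.
    head, sep, _ = expanded_name.partition(".")
    if not sep:
        # No period in the name, so there is nothing to strip out.
        return expanded_name
    # rpartition splits once at the last period.
    _, _, extension = expanded_name.rpartition(".")
    return head + "." + extension

print(compress_name("my_file.10.4.5.6.csv"))
print(compress_name("no_extension"))
```

Like the split('.', 1) version, this only scans as far as the first and last period instead of chopping the whole string into pieces.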

0

Do you need a regex here?

>>> address = "my_file.10.4.5.6.csv"
>>> split_by_periods = address.split(".")
>>> "{}.{}".format(split_by_periods[0], split_by_periods[-1])
'my_file.csv'
