Python regex to extract substring at start and end of string

Question

I am looking for a regex that will extract everything up to the first . (period) in a string, plus everything from the last . (period) onward, including that period.

For example:

my_file.10.4.5.6.csv
myfile2.56.3.9.txt

Ideally the regex when run against these strings would return:

my_file.csv
myfile2.txt

The numeric stamp in the file will be different each time the script is run, so I am looking essentially to exclude it.

The following prints out the string up to the first . (period):

print(re.search("^[^.]*", data_file).group(0))

I am having trouble, though, getting it to also return the last period and the string after it.

Sorry, just to update this based on the feedback and comments below:

This does need to be a regex. The regex will be passed into the program from a configuration file. The user will not have access to the source code as it will be packaged. The user may need to change the regex based upon some arbitrary criteria, so they will need to update the config file, rather than edit the application and re-build the package.

Thanks
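
As a rough illustration of the setup described above, here is a minimal sketch of a config-driven rule. It assumes the config stores both a pattern and a replacement string (a single re.search match cannot return two non-adjacent pieces without groups, so a substitution is used instead). The section name `rename`, the keys `pattern` and `replacement`, and the helpers `load_rule` and `apply_rule` are all hypothetical, and the code targets Python 3's configparser.

```python
import configparser
import re

# Hypothetical config contents; a packaged program would more likely
# read this from a file on disk with parser.read(path).
CONFIG_TEXT = r"""
[rename]
pattern = \..*\.
replacement = .
"""

def load_rule(text):
    """Parse the (assumed) config layout into a compiled regex and a replacement."""
    # interpolation=None keeps any "%" in a user-supplied regex literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["rename"]
    return re.compile(section["pattern"]), section["replacement"]

def apply_rule(rule, name):
    """Apply the configured substitution to one file name."""
    pattern, replacement = rule
    return pattern.sub(replacement, name)

rule = load_rule(CONFIG_TEXT)
print(apply_rule(rule, "my_file.10.4.5.6.csv"))  # my_file.csv
```

Keeping the replacement in the config alongside the pattern means users can change either without touching the packaged code.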

With regard to the comments below saying I don't need a regex: yes, you are correct that I could use the split method. However, I would like to do this via a regex, as the regex is passed into the program from a config file. It could change in the future if the file naming convention changes.
– IntelligentHeating, Commented Dec 16, 2013 at 21:53
Regex are slow and unpythonic (a huge string of characters surrounded in slashes only discernible by intense introspection is pretty much precisely the antithesis of Python), so only use them when you need them. Python has a TON of string functions that can replace regex.
– Adam Smith, Commented Dec 16, 2013 at 21:54
adsmith - Yeah, I hear where you are coming from here. Unfortunately, for ease of use, the criteria for matching the file need to be stored in a config file and pulled into the program.
– IntelligentHeating, Commented Dec 16, 2013 at 21:59
You could specify a callable in the config somehow; then users would be able to specify any arbitrary rule they want. I can appreciate how that's not as simple, though for anything beyond rudimentary regex, the difference in complexity is debatable. You'd also have the added benefit of allowing users to specify custom name collision behavior as well.
– Silas Ray, Commented Dec 16, 2013 at 22:03
Silas, thanks. That is an interesting option. I accepted F.J's answer as I requested a regex solution and his fits the bill. However, I think I will follow up on the callable and see if it presents another option.
– IntelligentHeating, Commented Dec 16, 2013 at 22:07

Ry- · Answer · 2013-12-16 21:52:12Z

4

You don’t need a regular expression!

parts = data_file.split(".")
print(parts[0] + "." + parts[-1])

answered Dec 16, 2013 at 21:52


Andrew Clark · Accepted Answer · 2013-12-16 21:53:20Z

3

Instead of regular expressions, I would suggest using str.split. For example:

>>> data_file = 'my_file.10.4.5.6.csv'
>>> parts = data_file.split('.')
>>> print(parts[0] + '.' + parts[-1])
my_file.csv

However, if you insist on regular expressions, here is one approach:

>>> print(re.sub(r'\..*\.', '.', data_file))
my_file.csv
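
The substitution above works because `.*` is greedy: the match starts at the first period and extends through the last one, so the whole middle section collapses to a single period. A small sketch of that behaviour, including a name with only one period, might look like this (the helper name `strip_middle` is made up for illustration):

```python
import re

# Same pattern as in the answer: the greedy ".*" spans from the first
# period to the last period in the string.
MIDDLE = re.compile(r'\..*\.')

def strip_middle(name):
    """Hypothetical helper: replace everything between the first and last period."""
    return MIDDLE.sub('.', name)

print(strip_middle('my_file.10.4.5.6.csv'))  # my_file.csv
print(strip_middle('myfile2.56.3.9.txt'))    # myfile2.txt
print(strip_middle('plain.csv'))             # plain.csv (no middle part, so no match)
```

A name with fewer than two periods is returned unchanged, since the pattern needs both a first and a last period to match.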

answered Dec 16, 2013 at 21:53


Silas Ray · Answer · 2013-12-16 21:53:07Z

0

You don't need a regex.

tokens = expanded_name.split('.')
compressed_name = '.'.join((tokens[0], tokens[-1]))

If you are concerned about performance, you could pass a maxsplit of 1 to split() and rsplit() so the string is only chopped up as much as you need:

compressed_name = expanded_name.split('.', 1)[0] + '.' + expanded_name.rsplit('.', 1)[1]
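
Note that the one-liner above raises an IndexError when the name contains no period at all, because rsplit() then returns a single-element list. A possible variant that tolerates such names, swapping in str.partition and str.rpartition instead of split and rsplit, is sketched below; the function name `compress_name` is an assumption for illustration only.

```python
def compress_name(expanded_name):
    """Hypothetical helper: keep the text before the first period and after the last."""
    # partition() splits once at the first period; sep is '' if none was found.
    head, sep, _ = expanded_name.partition('.')
    if not sep:
        return expanded_name  # no period, so nothing to compress
    # rpartition() splits once at the last period.
    _, _, extension = expanded_name.rpartition('.')
    return head + '.' + extension

print(compress_name('my_file.10.4.5.6.csv'))  # my_file.csv
print(compress_name('README'))                # README
```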

answered Dec 16, 2013 at 21:53


rlms · Answer · 2013-12-16 21:53:25Z

0

Do you need a regex here?

>>> address = "my_file.10.4.5.6.csv"
>>> split_by_periods = address.split(".")
>>> "{}.{}".format(split_by_periods[0], split_by_periods[-1])
'my_file.csv'

edited Dec 16, 2013 at 21:53

answered Dec 16, 2013 at 21:52
