
I am writing a function that generates a file name that avoids repetitions by adding a suffix, similar to the "(1)" that most browsers and other programs append automatically when a downloaded file already exists.

I ran into the problem that a file extension does not always contain just one dot, as in example.zip. There are also extensions such as example.tar.gz, and I can't yet see how to handle such cases.

At first, I thought of simply treating everything after the first dot as the extension, but the system already contains uploaded files with dots in their names, so this approach no longer works for me.

I understand that such "two-level" extensions are rare, but I don't want a suffix to break a file extension later on and produce something like example.tar-1.gz.


I tried to cut the extension off at the last dot, but that breaks "two-level" extensions:

TARGET: example.zip
OK: example-1.zip

TARGET: example.tar.gz
BAD: example.tar-1.gz
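For reference, Python's standard os.path.splitext follows the same last-dot rule, so a minimal sketch that inserts the suffix with it reproduces exactly the OK and BAD results above. The helper name add_suffix_last_dot is hypothetical, used only for illustration.

```python
import os.path

def add_suffix_last_dot(filename: str, n: int) -> str:
    """Insert '-n' before the extension that os.path.splitext finds."""
    stem, ext = os.path.splitext(filename)  # splits at the LAST dot
    return f"{stem}-{n}{ext}"

print(add_suffix_last_dot("example.zip", 1))     # example-1.zip
print(add_suffix_last_dot("example.tar.gz", 1))  # example.tar-1.gz
```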

I also thought of defining everything after the first dot in the file name as its extension, but a file may also have dots in the name:

TARGET: example.tar.gz
OK: example-1.tar.gz

TARGET: product-v.1.0.0.zip
BAD: product-v-1.1.0.0.zip
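The first-dot rule can be sketched with str.partition, which splits at the first occurrence of the separator; it fixes example.tar.gz but breaks names that contain dots, matching the OK and BAD cases above. The helper name add_suffix_first_dot is hypothetical.

```python
def add_suffix_first_dot(filename: str, n: int) -> str:
    """Insert '-n' before everything that follows the FIRST dot."""
    stem, dot, ext = filename.partition(".")  # dot and ext are '' if there is no dot
    return f"{stem}-{n}{dot}{ext}"

print(add_suffix_first_dot("example.tar.gz", 1))       # example-1.tar.gz
print(add_suffix_first_dot("product-v.1.0.0.zip", 1))  # product-v-1.1.0.0.zip
```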
    I don't see how this would be possible without maintaining a list of "meaningful" extensions. You can't just go by the number of dots if you want foo.tar.gz to be recognized as a "two-level" extension but foo.bar.gz as just one level. Whether having tar and gz as individual items on your list would be enough, or whether you would have to explicitly put tar.gz in your list, depends on whether you want foo.tar.gz and foo.gz.tar to be treated the same or differently. Commented Oct 1 at 10:57
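A minimal sketch of the two list-based variants this comment describes. The sets COMPOUND and TOKENS and both function names are hypothetical illustrations, not a standard registry of extensions. split_compound matches whole entries such as tar.gz, so foo.gz.tar is treated differently from foo.tar.gz; split_tokens peels single-level entries one at a time, so both orders are treated the same.

```python
# Hypothetical extension lists; a real system would maintain its own.
COMPOUND = {"tar.gz", "tar", "gz", "zip"}  # whole extensions, compared as a unit
TOKENS = {"tar", "gz", "zip"}              # single-level extensions, peeled one by one

def split_compound(filename: str) -> tuple[str, str]:
    """Return (stem, ext) for the longest COMPOUND entry that ends the name."""
    parts = filename.split(".")
    # Try the longest tail first (everything after the first dot), then shorter ones.
    for i in range(1, len(parts)):
        ext = ".".join(parts[i:])
        if ext.lower() in COMPOUND:
            return ".".join(parts[:i]), ext
    return filename, ""

def split_tokens(filename: str) -> tuple[str, str]:
    """Strip known single-level extensions off the end, in any order."""
    parts = filename.split(".")
    i = len(parts)
    while i > 1 and parts[i - 1].lower() in TOKENS:
        i -= 1
    return ".".join(parts[:i]), ".".join(parts[i:])

print(split_compound("foo.tar.gz"))  # ('foo', 'tar.gz')
print(split_compound("foo.gz.tar"))  # ('foo.gz', 'tar')
print(split_tokens("foo.gz.tar"))    # ('foo', 'gz.tar')
print(split_tokens("foo.bar.gz"))    # ('foo.bar', 'gz')
```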

1 Answer


You didn't specify the environment, but I think it is safe to assume that you have regular expressions at hand. So, also assuming that the expected extensions come from a finite set, I would create a list of possible extensions and use a regex to separate names from extensions.

This sample set contains tar, tar.gz and zip extensions:

first.tar
second.file.tar
third.tar.gz
fourth.zip
fifth.stuff.zip

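Using the regex '^(.*)\.(tar|tar\.gz|zip)$' with the g, i and m flags, you get the file name in the first capture group and the extension in the second. (Python has no /gim suffix: pass re.IGNORECASE and re.MULTILINE to re.compile, and use re.finditer to get every match.) When you process one file name at a time, the m (multiline) flag can be omitted.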
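In Python, a single file name can be matched with re.match and re.IGNORECASE. The sketch below also builds the suffixed name the question asks for. Assumptions: split_name, unique_name and the "-n" suffix format are hypothetical, and the existing set stands in for however the system checks which names are taken. The pattern adds gz to the answer's list and uses a lazy (.*?) for the name: with the greedy (.*) and gz on the list, example.tar.gz would split as example.tar + gz, whereas the lazy quantifier picks the shortest stem and therefore the longest known extension.

```python
import re

# The answer's extensions plus "gz"; the lazy (.*?) makes the longest
# known extension win (tar.gz over gz).
EXT_RE = re.compile(r"^(.*?)\.(tar\.gz|tar|gz|zip)$", re.IGNORECASE)

def split_name(filename: str) -> tuple[str, str]:
    """Return (stem, extension without the dot); extension is '' if unknown."""
    m = EXT_RE.match(filename)
    if m is None:
        return filename, ""
    return m.group(1), m.group(2)

def unique_name(filename: str, existing: set[str]) -> str:
    """Append -1, -2, ... before the extension until the name is not taken."""
    if filename not in existing:
        return filename
    stem, ext = split_name(filename)
    suffix = f".{ext}" if ext else ""
    n = 1
    while f"{stem}-{n}{suffix}" in existing:
        n += 1
    return f"{stem}-{n}{suffix}"

print(split_name("third.tar.gz"))     # ('third', 'tar.gz')
print(split_name("fifth.stuff.zip"))  # ('fifth.stuff', 'zip')
print(unique_name("example.tar.gz", {"example.tar.gz", "example-1.tar.gz"}))  # example-2.tar.gz
print(unique_name("product-v.1.0.0.zip", {"product-v.1.0.0.zip"}))  # product-v.1.0.0-1.zip
```

Names whose extension is not on the list (for example notes.txt) fall back to getting the suffix at the very end, so the list has to cover every extension you care about.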

Is that something you wanted?
