0

I need to get a file name from a file's absolute path (I am aware of the file.getName() method, but I cannot use it here). EDIT: I cannot use file.getName() because I don't need the file name only; I need the part of the file's path as well (but again, not the entire absolute path). I need the part of the file's path AFTER a given base path.

Let's say the file is located in the folder:

C:\Users\someUser

On a Windows machine, if I build a pattern string as follows:

String patternStr = "C:\\Users\\someUser\\(.*+)";

I get an exception: java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence for backslash.

If I use Pattern.quote(File.pathSeparator):

String patternStr = "C:" + Pattern.quote(File.separator) + "Users" + Pattern.quote(File.separator) + "someUser" + Pattern.quote(File.separator) + "(.*+)";

the resulting pattern string is: C:\Q;\EUsers\Q;\EsomeUser\Q;\E(.*+) which of course has no match with the actual fileName "C:\Users\someUser\myFile.txt".

What am I missing here? What is the proper way to parse file name?

4
  • 1
    Why can't you use getName here? Commented Oct 10, 2011 at 8:31
  • Have you tried using this resource? This will give you the Java String expression for a regex. Commented Oct 10, 2011 at 8:34
  • 1
    Agree with @StephenC; in what situation would you be able to use a regular expression but not File.getName()? Commented Oct 10, 2011 at 8:36
  • The reason I can't use file.getName() is that I don't need the file name only. I need a part of the file's path (not the entire absolute path), so I need to parse the file's absolute path. Hope this clears things up a bit. Commented Oct 10, 2011 at 9:26

10 Answers

7

What is the proper way to parse file name?

The proper way to parse a file name is to use File(String). Using a regex for this is going to hard-wire platform dependencies into your code. That's a bad idea.

I know you said you can't use File.getName() ... but that is the proper solution. If you would care to say why you can't use File.getName() perhaps I could suggest an alternative solution.


3 Comments

It's not hard to imagine an API with a method like filterFiles(String regexp) for instance. If he says "I can't use File.getName", then I think we should assume that that is indeed the case.
Well at the very least I hope the OP has learned that that is not how one should design a filterFiles method.
@aioobe actually it is a "she" :)
3

If you do want to use a regular expression, you should use

String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";
                       ^^       ^^          ^^

instead.

Why? Your string literal

"C:\\Users\\someUser\\(.*+)"

is compiled to

C:\Users\someUser\(.*+)

Since \ is used for escaping in regular expressions too, you'll have to escape them "twice".
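To see the double escaping at work, here is a minimal sketch. The class name EscapedPathRegex and the helper relativePart are made up for illustration; only the pattern string comes from the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class EscapedPathRegex {

    // Four backslashes in the source become two in the string,
    // which the regex engine reads as one literal backslash.
    static final Pattern UNDER_USER_DIR =
            Pattern.compile("C:\\\\Users\\\\someUser\\\\(.*+)");

    // Returns the part of the path after the base directory,
    // or null when the path is not under that directory.
    static String relativePart(String absolutePath) {
        Matcher m = UNDER_USER_DIR.matcher(absolutePath);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints someDir\myFile.txt
        System.out.println(relativePart("C:\\Users\\someUser\\someDir\\myFile.txt"));
        // prints null
        System.out.println(relativePart("D:\\other\\myFile.txt"));
    }
}
```

Note that this still hard-codes the Windows separator into the pattern, so it only suits input that is known to use backslashes.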


Regarding your edit:

You probably want to have a look at URI.relativize(). Example:

File base = new File("C:/Users/someUser");
File file = new File("C:/Users/someUser/someDir/someFile.txt");

String relativePath = base.toURI().relativize(file.toURI()).getPath();

System.out.println(relativePath); // prints "someDir/someFile.txt"

(Note that / works as file-separator on Windows machines too.)


Btw, I don't know what you have as File.separator on your system, but if it's set to \, then

"C:" + Pattern.quote(File.separator) + "Users" + Pattern.quote(File.separator) +
    "someUser" + Pattern.quote(File.separator) + "(.*+)";

should yield

C:\Q\\EUsers\Q\\EsomeUser\Q\\E(.*+)
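Rather than quoting each separator, one option is to quote the whole base directory in a single call. The sketch below assumes the base directory is passed in with its trailing separator; the class and method names (QuotedPrefixRegex, stripPrefix) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class QuotedPrefixRegex {

    // Pattern.quote turns the whole base directory into a literal,
    // so its backslashes need no manual regex escaping.
    static String stripPrefix(String baseDir, String absolutePath) {
        Pattern p = Pattern.compile(Pattern.quote(baseDir) + "(.*)");
        Matcher m = p.matcher(absolutePath);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints myFile.txt
        System.out.println(
                stripPrefix("C:\\Users\\someUser\\", "C:\\Users\\someUser\\myFile.txt"));
    }
}
```

The backslashes still have to be doubled once for the Java string literal, but not a second time for the regex.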

3 Comments

@MichaelKjörling - and that's why regexes are the wrong solution for this problem.
@StephenC, It may be the case that the OP doesn't have a choice.
@aioobe - it may also be the case that the OP only thinks she / he has no alternative.
2
String patternStr = "C:\\Users\\someUser\\(.*+)";

Backslashes (\) are escape characters in the Java Language. Your string contains the following after compilation:

C:\Users\someUser\(.*+)

This string is then parsed as a regex, which uses backslashes as an escape character as well. The regex parser tries to understand the escaped \U, \s and \(. One of them is incorrect regarding the regex syntax (hence your exception), and none of them are what you are trying to achieve.

Try

String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";

Comments

1

If you want to solve it by pattern you need to escape your Pattern properly

String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";

Comments

0

Try putting double-double-backslashes in your pattern. You need a second backslash to escape one in the pattern, plus you'll need to double each one to escape them in the string. Hence you'll end up with something like:

String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";

Comments

0

Scan backwards from the end of the string to the first file path separator, or to the beginning of the string if there is none.

The file path separator can be / or \.

public static final char ALTERNATIVE_DIRECTORY_SEPARATOR_CHAR = '/';
public static final char DIRECTORY_SEPARATOR_CHAR = '\\';
public static final char VOLUME_SEPARATOR_CHAR = ':';


    public static String getFileName(String path) {

        if(path == null || path.isEmpty()) {
            return path;
        }

        int length = path.length();
        int index = length;

        while(--index >= 0) {

            char c = path.charAt(index);

            if(c == ALTERNATIVE_DIRECTORY_SEPARATOR_CHAR || c == DIRECTORY_SEPARATOR_CHAR || c == VOLUME_SEPARATOR_CHAR) {
                return path.substring(index + 1, length); 
            }
        }

        return path;
    }

Try to keep it simple ;-).

3 Comments

I would have gone for finding the platform fileSeparator character, then a lastIndexOf that character, then a substring for the string from that character to the end of the string. Then it would not be necessary to loop through all the characters in the String.
If I were working with a file from the file system, I would have gone for file.getName(). This code can be used for files that are stored in a database, for example (we could still use java.io.File, but it could be misleading). Java does not have anything like Path in version 6. As for lastIndexOf, it probably works the same way as the while loop in my code, which iterates from the end and returns the position. My code does not iterate through all the characters, so the two do more or less the same thing. But when you have only "fileName.txt", your code will fail with an index out of bounds exception.
You are right, lastIndexOf probably does much the same as your code to get its index. You are going backwards through the String, not forwards, so you will not loop through all the characters, only the relevant ones. Also, my pseudocode needs a lastIndexOf > 0 check.
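The lastIndexOf variant discussed in these comments could look roughly like the sketch below. It assumes the same three separator characters as the answer; the names LastSeparator and fileName are hypothetical. When no separator is present, lastIndexOf returns -1, so cut + 1 is 0 and the whole string comes back unchanged.

```java
// Hypothetical helper class, not part of the original answer.
final class LastSeparator {

    // Returns everything after the last '/', '\' or ':' in the path.
    static String fileName(String path) {
        if (path == null || path.isEmpty()) {
            return path;
        }
        // Take the right-most position among the three separators;
        // the result is -1 when none of them occurs.
        int cut = Math.max(
                Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\')),
                path.lastIndexOf(':'));
        return path.substring(cut + 1);
    }

    public static void main(String[] args) {
        // prints myFile.txt
        System.out.println(fileName("C:\\Users\\someUser\\myFile.txt"));
        // prints fileName.txt
        System.out.println(fileName("fileName.txt"));
    }
}
```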
0

Try this :

String ResultString = null;
try {
    Pattern regex = Pattern.compile("([^\\\\/:*?\"<>|\r\n]+$)");
    Matcher regexMatcher = regex.matcher(subjectString);
    if (regexMatcher.find()) {
        ResultString = regexMatcher.group(1);
    } 
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

Output :

myFile.txt

Also for input : C:/Users/someUser/myFile.txt

Output : myFile.txt

Comments

0

What am I missing here? What is the proper way to parse file name?

The proper way to parse a file name is to use the APIs that are already provided for the purpose. You've stated that you can't use File.getName(), without explanation. You are almost certainly mistaken about that.

1 Comment

@Maggie OK so you need the relativizing methods of java.net.URI.
0

I cannot use file.getName() because I don't need the file name only; I need the part of the file's path as well (but again, not the entire absolute path).

OK. So what you want is something like this.

    // Canonicalize paths to deal with ".", "..", symlinks, 
    // relative files and case sensitivity issues.
    // Note: getCanonicalPath() can throw IOException.
    String directory = new File(someDirectory).getCanonicalPath();
    String test = new File(somePathname).getCanonicalPath();

    if (!directory.endsWith(File.separator)) {
        directory += File.separator;
    }
    if (test.startsWith(directory)) {
        String pathInDirectory = test.substring(directory.length());
        ...
    }

Advantages:

  • No regexes needed.
  • Doesn't break if the path separator is something other than \.
  • Doesn't break if there are symbolic links on the path.
  • Doesn't break due to case sensitivity issues.
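A self-contained version of the approach above might look like this. It is only a sketch: the class and method names are invented, the IOException is wrapped as an unchecked exception for brevity, and the demo paths under the temp directory are placeholders that do not need to exist.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of the original answer.
final class PathInDirectory {

    // Returns the path of 'file' relative to 'directory',
    // or null when 'file' does not lie inside 'directory'.
    static String pathInDirectory(File directory, File file) {
        try {
            String dir = directory.getCanonicalPath();
            String test = file.getCanonicalPath();
            if (!dir.endsWith(File.separator)) {
                dir += File.separator;
            }
            return test.startsWith(dir) ? test.substring(dir.length()) : null;
        } catch (IOException e) {
            // Assumption for this sketch: surface I/O problems unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File base = new File(System.getProperty("java.io.tmpdir"), "someUser");
        File file = new File(new File(base, "someDir"), "myFile.txt");
        // Prints someDir, the platform separator, then myFile.txt.
        System.out.println(pathInDirectory(base, file));
    }
}
```

The result uses the platform's own separator, so callers that need a fixed separator would have to normalize it themselves.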

Comments

0

Suppose the file name contains special characters, especially when supporting Mac, where special characters are allowed in file names. On the server side, Path.GetFileName(fileName) fails and throws an error because of illegal characters in the path. The following code, which uses a regex, comes to the rescue.

The regex takes care of two things:

  1. In IE, when a file is uploaded, the file path contains the folders as well (e.g. c:\samplefolder\subfolder\sample.xls). The expression below replaces all folders with an empty string and retains the file name.

  2. On a Mac, the file name is the only thing supplied, because the browser is Safari, and special characters are allowed in the file name.

     var regExpDir = @"(^[\w]:\\)([\w].+\w\\)";
    
     fileName = Regex.Replace(fileName, regExpDir, string.Empty);
    

Comments
