I have a file containing records as below:

drwxr-xr-x   - root supergroup          0 2015-04-05 05:26 /user/root
drwxr-xr-x   - hadoop supergroup          0 2014-11-05 11:56 /user/root/input
drwxr-xr-x   - hadoop supergroup          0 2014-11-05 03:06 /user/root/input/foo
drwxr-xr-x   - hadoop supergroup          0 2015-04-28 03:06 /user/root/input/foo/bar
drwxr-xr-x   - hadoop supergroup          0 2013-11-06 15:54 /user/root/input/foo/bar/20120706
-rw-r--r--   3 hadoop supergroup          0 2013-11-06 15:54 /user/root/input/foo/bar/20120706/_SUCCESS
drwxr-xr-x   - hadoop supergroup          0 2013-11-06 15:54 /user/root/input/foo/bar/20120706/_logs
drwxr-xr-x   - hadoop supergroup          0 2013-11-06 15:54 /user/root/input/foo/bar/20120706/_logs/history

In my Java code, I use the Pattern and Matcher classes to extract the substrings that I want to process later. The code is listed below:

String filename = "D:\\temp\\files_in_hadoop_temp.txt";
Pattern thePattern
    = Pattern.compile("[a-z\\-]+\\s+(\\-|[0-9]) (root|hadoop)\\s+supergroup\\s+([0-9]+) ([0-9\\-]+) ([0-9:]+) (\\D+)\\/?.*");

try
{
    Files.lines(Paths.get(filename))
            .map(line -> thePattern.matcher(line))
            .collect(Collectors.toList())
            .forEach(theMather -> {
                if (theMather.find())
                {
                    System.out.println(theMather.group(3) + "-" + theMather.group(4) + "-" + theMather.group(6));
                }
            });
} catch (IOException e)
{
    e.printStackTrace();
}

and the result is as below:

0-2015-04-05-/user/root
0-2014-11-05-/user/root/input
0-2014-11-05-/user/root/input/foo
0-2015-04-28-/user/root/input/foo/bar
0-2013-11-06-/user/root/input/foo/bar/
0-2013-11-06-/user/root/input/foo/bar/
0-2013-11-06-/user/root/input/foo/bar/
0-2013-11-06-/user/root/input/foo/bar/

But I expect the results without the trailing "/", like the first four rows. I have tried many patterns to strip the trailing "/", but none of them worked.

Could you please suggest a pattern that strips the trailing "/"?

Thanks a lot.

1
  • A regex matches only what is actually in the string. The first 3 strings don't end with '/', so just use an if condition and add the trailing '/' if it isn't there. Commented Apr 29, 2015 at 4:03

2 Answers

1

Use a character class to make sure the last captured character isn't a slash. Thus, instead of

(\\D+)\\/?.*

try

(\\D*[^\\d/]).*

The part in parentheses matches the longest substring of nondigits, with the added restriction that the last character may not be a slash.

Note: Tested.
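
To see the effect in isolation, here is a minimal sketch that applies the question's pattern, with only the last group swapped for the one above, to three of the sample lines. The class name TrailingSlashDemo and the helper extract are made up for this illustration and do not come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingSlashDemo {
    // Pattern from the question; only the last capturing group has been changed.
    private static final Pattern LISTING = Pattern.compile(
            "[a-z\\-]+\\s+(\\-|[0-9]) (root|hadoop)\\s+supergroup\\s+([0-9]+) ([0-9\\-]+) ([0-9:]+) (\\D*[^\\d/]).*");

    // Hypothetical helper: returns "size-date-path", or null if the line does not match.
    static String extract(String line) {
        Matcher m = LISTING.matcher(line);
        return m.find() ? m.group(3) + "-" + m.group(4) + "-" + m.group(6) : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "drwxr-xr-x   - root supergroup          0 2015-04-05 05:26 /user/root",
            "drwxr-xr-x   - hadoop supergroup          0 2013-11-06 15:54 /user/root/input/foo/bar/20120706",
            "-rw-r--r--   3 hadoop supergroup          0 2013-11-06 15:54 /user/root/input/foo/bar/20120706/_SUCCESS"
        };
        for (String line : lines) {
            System.out.println(extract(line));
        }
    }
}
```

The greedy \\D* first grabs everything up to the first digit, including the slash in front of 20120706. Because [^\\d/] must then match one more character that is neither a digit nor a slash, the engine backtracks until the group ends on the "r" of "bar", so the slash is left to the trailing .* instead of being captured.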

1 Comment

This is indeed what I expected, thanks.
0

You can use a simple if statement to check whether the last character is a slash and, if it is, remove it with substring:

if (theMather.find())
{
    String data = theMather.group(3) + "-" + theMather.group(4) + "-" + theMather.group(6);
    // Drop the trailing slash, if there is one.
    if (data.charAt(data.length() - 1) == '/')
    {
        data = data.substring(0, data.length() - 1);
    }

    System.out.println(data);
}
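
The same check can be pulled out into a small helper so it is easy to try on its own. This is only a sketch: the class StripTrailingSlash and the method stripTrailingSlash are names invented for this example. It uses String.endsWith instead of charAt, which also avoids a StringIndexOutOfBoundsException if the string happens to be empty.

```java
public class StripTrailingSlash {
    // Hypothetical helper: removes a single trailing '/' if present, otherwise returns s unchanged.
    static String stripTrailingSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        // A row that ends with a slash, a row that does not, and the empty string.
        System.out.println(stripTrailingSlash("0-2013-11-06-/user/root/input/foo/bar/"));
        System.out.println(stripTrailingSlash("0-2015-04-05-/user/root"));
        System.out.println(stripTrailingSlash(""));
    }
}
```

Inside the forEach from the question, the call would wrap the concatenated string before it is printed.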

2 Comments

@MohanRaj That is usually the wrong question. A lot of posters here want to try to solve everything with regexes, but the best way to solve problems is the way that works and is most readable--and the complex regexes that are sometimes used to solve problems are not readable (and probably not efficient, either). For some reason, programmers fall in love with regexes and want to use them for everything. Rod's answer is understandable. It is not a "workaround".
You could also use if (data.endsWith("/")).
