I have several kinds of strings, such as these:

ProjectOne
ProjectOne-2017-05-03
ProjectOne_version2
ProjectOne-2017-04-24
ProjectOne-2017-04-10_Version2
ProjectTwo
ProjectTwo-2016-11-12
...

I would like to find a way, using regex or something simpler, to extract the project names and project dates. My aim is to keep, for each project, only its most recent version, based on the date. Entries with only a name, or only a name and a version, are considered older than entries with dates.

Is there a way to extract those different substrings using regex? I read a bit about it and it is quite confusing.

  • What is your expected output from this list? Commented May 3, 2017 at 8:54

2 Answers

That's the problem with input data that follows few or no rules: determining its content is hard.

In other words: first you have to step back and look at all the data in order to discover "patterns" in the data set. Then you come up with rules that can be used to sort entries into different buckets.

Example:

ProjectOne-2017-04-24

It seems that some entries follow the rule:

name separator iso-date

This means: a simple first check would be to test whether incoming strings match something like

(\w+)[-_](\d{4}[-_]\d{2}[-_]\d{2})

This regex matches:

  • a sequence of one or more "word" characters (letters, digits and the underscore; note that \w does not match the hyphen)
  • followed by a single - or _ as separator
  • followed by something consisting of 4 digits, 2 digits, 2 digits, with _ or - as separator between them
  • the regex contains two groups; so if you have a match, the first group will contain your project name, and the second group the ISO date value (as a string).
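
To make the two groups concrete, here is a minimal Java sketch that applies the pattern above to a few of the strings from the question. The class name NameDateDemo and the helper describe are made up for illustration, and converting the date with LocalDate.parse is an assumption about what you want to do with it, not something the question asks for.

```java
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameDateDemo {
    // Same pattern as above; backslashes are doubled inside a Java string literal.
    private static final Pattern NAME_DATE =
            Pattern.compile("(\\w+)[-_](\\d{4}[-_]\\d{2}[-_]\\d{2})");

    // Hypothetical helper: returns "name -> date", or a note when no date is present.
    static String describe(String input) {
        Matcher m = NAME_DATE.matcher(input);
        if (!m.find()) {
            return input + " -> no date found";
        }
        // Normalize '_' separators so LocalDate.parse accepts the ISO form.
        // parse() throws DateTimeParseException for impossible dates such as 2017-13-40.
        LocalDate date = LocalDate.parse(m.group(2).replace('_', '-'));
        return m.group(1) + " -> " + date;
    }

    public static void main(String[] args) {
        String[] inputs = {"ProjectOne-2017-04-24", "ProjectOne_version2", "ProjectOne"};
        for (String input : inputs) {
            System.out.println(describe(input));
        }
    }
}
```

Strings without a date simply do not match, which gives you a first "bucket": dated entries versus everything else.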

The above is just meant as "inspiration"; in the end, it is your project, so you have to sit down and learn and understand regular expressions. You can start here to learn the rules for such patterns, or here for a complete tutorial on the subject.

Long story short: there are no shortcuts. Don't expect SO to provide you with one magic regex that solves all your problems, especially when you are lacking an essential understanding of the concept you intend to use.

5 Comments

Good explanation, +1. I think I used some information similar to yours in my answer; is that OK, @GhostCat? ;)
Sure. But you should be careful not to go overboard here. "We" are not a code writing service, so you should be careful not to do "all" the work for the OP. The less code the OP has written himself, the less code should flow back to him. But sure, you put up a nice answer here. And probably the one that will be accepted in the end, because you did exactly what the OP was looking for ... most of the work for him ;-)
Thanks a lot! It was a great help for me. From your answer I could find a solution that matches my problem:
    String mydata = "ProjectOne-2017-04-24";
    List<String> allMatches = new ArrayList<String>();
    Pattern pattern1 = Pattern.compile("(\\w+)");
    Matcher matcher1 = pattern1.matcher(mydata);
    Pattern pattern2 = Pattern.compile("[-](\\d{4}[-]\\d{2}[-_]\\d{2})");
    Matcher matcher2 = pattern2.matcher(mydata);
    if (matcher1.find()) { System.out.println("1==> " + matcher1.group(1)); }
    if (matcher2.find()) { System.out.println("2==> " + matcher2.group(1)); }
You are welcome; always glad to help. Now toss a coin to decide which of the two answers to accept. I can lend you one, but it is biased, as it has a cat printed on each side ;-)
@SabrinaS Beyond that: please don't paste code into comments. If you think you have enough content for a substantial answer, you can always add your own answer. You should still upvote/accept other ones, but posting such a "this is how I solved it" answer is a valid thing to do.

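You have several cases, so you can use one group per case, for example:

  1. (project\\w+)[-_] to match project names (this requires a - or _ after the name, so a bare "ProjectOne" is not matched)
  2. ([0-9]{4}-[0-9]{2}-[0-9]{2}) to match the dates
  3. (version\\d+) to match the version of your project

The three are combined with | and prefixed with (?i) so that matching is case-insensitive. Your code could look like this: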

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProjectParser {
    public static void main(String[] args) {
        String[] projects = {"ProjectOne-2017-05-03", "ProjectOne_version2",
            "ProjectOne-2017-04-24", "ProjectTwo-2016-11-12",
            "ProjectOne-2017-04-10_Version2"};
        // Compile once, outside the loop; (?i) makes the whole pattern case-insensitive.
        Pattern pattern = Pattern.compile("(?i)(project\\w+)[-_]|([0-9]{4}-[0-9]{2}-[0-9]{2})|(version\\d+)");
        for (String project : projects) {
            System.out.println("Input : " + project);
            Matcher matcher = pattern.matcher(project);

            // Each find() matches exactly one alternative, so only one group is non-null.
            while (matcher.find()) {
                if (matcher.group(1) != null) {
                    System.out.println(matcher.group(1)); // project name
                }
                if (matcher.group(2) != null) {
                    System.out.println(matcher.group(2)); // date
                }
                if (matcher.group(3) != null) {
                    System.out.println(matcher.group(3)); // version
                }
            }
            System.out.println("******************************************");
        }
    }
}

Output

Input : ProjectOne-2017-05-03
ProjectOne
2017-05-03
******************************************
Input : ProjectOne_version2
ProjectOne
version2
******************************************
Input : ProjectOne-2017-04-24
ProjectOne
2017-04-24
******************************************
Input : ProjectTwo-2016-11-12
ProjectTwo
2016-11-12
******************************************
Input : ProjectOne-2017-04-10_Version2
ProjectOne
2017-04-10
Version2
******************************************
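
Once names and dates can be extracted, keeping only the most recent entry per project (what the question asks for) could be sketched as follows. This is only one possible approach under stated assumptions: the class name LatestPerProject and the method latest are made up for illustration, project names are assumed to start with "Project" and contain only letters and digits, and LocalDate.MIN stands in for "no date" so that undated entries always lose against dated ones.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LatestPerProject {
    // Name, then an optional "-yyyy-MM-dd" date; anything after it (e.g. "_Version2") is ignored.
    private static final Pattern ENTRY =
            Pattern.compile("(?i)^(project[a-z0-9]+)(?:-(\\d{4}-\\d{2}-\\d{2}))?");

    // Hypothetical helper: maps each project name to its most recent entry.
    static Map<String, String> latest(String[] entries) {
        Map<String, String> bestEntry = new LinkedHashMap<>();
        Map<String, LocalDate> bestDate = new LinkedHashMap<>();
        for (String entry : entries) {
            Matcher m = ENTRY.matcher(entry);
            if (!m.find()) {
                continue; // not a project entry at all
            }
            String name = m.group(1);
            // Assumption: LocalDate.MIN means "no date", so undated entries count as oldest.
            LocalDate date = m.group(2) == null ? LocalDate.MIN : LocalDate.parse(m.group(2));
            LocalDate current = bestDate.get(name);
            if (current == null || date.isAfter(current)) {
                bestDate.put(name, date);
                bestEntry.put(name, entry);
            }
        }
        return bestEntry;
    }

    public static void main(String[] args) {
        String[] entries = {"ProjectOne", "ProjectOne-2017-05-03", "ProjectOne_version2",
            "ProjectOne-2017-04-24", "ProjectOne-2017-04-10_Version2",
            "ProjectTwo", "ProjectTwo-2016-11-12"};
        latest(entries).forEach((name, entry) -> System.out.println(name + " => " + entry));
    }
}
```

When two entries for the same project are both undated, the first one seen is kept; if the version number should break such ties, that rule would have to be added.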

Regex demo
