
I have a bunch of strings in the following format:

"- (username) on (date) in (country) for (department)"

Examples:

- user.001 on July 15, 2012 in Africa for Human Resources \r\n\t\t\tEdit
- someusername on January 01, 2012 in United States for HR \r\n\t\t\tEdit
- userid on August 15, 2012 in Asia for Whatever\r\n\t\t\tEdit
- 100100.user on May 21, 2002 in New England for ABC \r\n\t\t\tEdit

How do I extract username, date, country and department using regex and C#?

Thanks for the help!

Edit 1: I discovered that some of the input strings have no department; it is optional, e.g. "- user.001 on July 15, 2012 in Africa\r\n\t\t\tEdit". How do I handle this?

  • Regex is the one thing that I just can't wrap my head around. I've been trying to extract the data using string.Substring, which is very tedious. Commented Aug 11, 2012 at 5:51
  • I'd recommend going and reading up on the subject and coming back here with any specific questions you have on the subject. Teach a man to fish, and all that. Commented Aug 11, 2012 at 5:52
  • Thank you. I have Expresso installed and have been fiddling around with the examples with no luck. Can you recommend a good book on Regex? I have been trying to learn Regex for a while now. Thanks again! Edit: Never mind, found a couple of good books on Amazon. Commented Aug 11, 2012 at 5:54
  • @tempid C# in a Nutshell has a very good explanation of regex. Commented Aug 11, 2012 at 5:59

3 Answers


You can try this:

- (.+) on (.+) in (.+) for (.+)\\r\\n\\t\\t\\tEdit

The matches I got (in $1, $2, $3, $4):

Match 1
1.  user.001
2.  July 15, 2012
3.  Africa
4.  Human Resources
Match 2
1.  someusername
2.  January 01, 2012
3.  United States
4.  HR
Match 3
1.  userid
2.  August 15, 2012
3.  Asia
4.  Whatever
Match 4
1.  100100.user
2.  May 21, 2002
3.  New England
4.  ABC
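A minimal C# sketch of using this pattern, assuming the \r\n\t\t\t in the question are real carriage-return, newline and tab characters rather than literal backslashes. In a verbatim string (@"...") the regex engine receives the \r, \n and \t escapes directly; typing the answer's \\r\\n\\t form inside an ordinary C# string literal produces the same pattern. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RecordParser
{
    // Verbatim string: the regex engine sees \r, \n and \t and matches real
    // carriage-return, newline and tab characters. Assumes the input contains
    // those control characters, not literal backslashes.
    private static readonly Regex Pattern =
        new Regex(@"- (.+) on (.+) in (.+) for (.+)\r\n\t\t\tEdit");

    // Returns null when the input does not have the expected shape
    // (for example, when the " for (department)" part is missing).
    public static string[] Parse(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
            return null;

        // Groups[0] is the whole match; Groups[1..4] are the parenthesised parts.
        // Trim() drops the space some inputs have before "\r\n".
        return new[]
        {
            m.Groups[1].Value.Trim(), // username
            m.Groups[2].Value.Trim(), // date
            m.Groups[3].Value.Trim(), // country
            m.Groups[4].Value.Trim(), // department
        };
    }
}
```

For example, RecordParser.Parse("- userid on August 15, 2012 in Asia for Whatever\r\n\t\t\tEdit") should give userid, August 15, 2012, Asia and Whatever.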

Edit:

If the department part is optional, make the last group optional and make the group before it (the country) non-greedy:

- (.+) on (.+) in (.+?)(?: for (.+))?\\r\\n\\t\\t\\tEdit

Match 5
1.  user.001
2.  July 15, 2012
3.  Africa
4.  (empty: group 4 did not take part in the match)
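In C#, an optional group that did not take part in the match has Group.Success set to false (its Value is the empty string), which is a convenient way to tell "no department" apart from a real value. A minimal sketch, again assuming real \r\n\t\t\t control characters before "Edit"; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalDepartmentParser
{
    // The country group is lazy (.+?) so it stops before an optional " for ..." part.
    // Assumes real \r\n\t\t\t control characters before "Edit".
    private static readonly Regex Pattern =
        new Regex(@"- (.+) on (.+) in (.+?)(?: for (.+))?\r\n\t\t\tEdit");

    // Returns the department, or null when the input has none.
    public static string Department(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
            throw new FormatException("Unexpected input: " + input);

        // Success is false when the optional " for (.+)" part was skipped.
        Group dept = m.Groups[4];
        return dept.Success ? dept.Value.Trim() : null;
    }

    public static string Country(string input) =>
        Pattern.Match(input).Groups[3].Value.Trim();
}
```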


The exact same regex, and at the exact same time. It is, though, undeniably the best solution for this problem.
Thank you very much! It works great. In some strings, the department field does not exist (I've updated the question). Is it possible to edit the regex to handle this?
Quick question: if I don't want the username field at all, how do I ignore it? I understand I can ignore it by not using the value in the groups. But how do I skip it completely so it doesn't even show up in Match.Groups ($1)?
Simply don't capture it (leave off the parentheses): - .+ on (.+) in (.+?)(?: for (.+))?\\r\\n\\t\\t\\tEdit. The date then becomes $1.
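A minimal C# sketch of the uncaptured-username version, showing that every later group moves down by one. It assumes real \r\n\t\t\t control characters before "Edit"; the class name and the tuple element names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NoUserNameParser
{
    // The username part ".+" has no parentheses, so it is matched but not captured.
    // Every later group moves down by one: Groups[1] is now the date.
    private static readonly Regex Pattern =
        new Regex(@"- .+ on (.+) in (.+?)(?: for (.+))?\r\n\t\t\tEdit");

    public static (string Date, string Country, string Department) Parse(string input)
    {
        Match m = Pattern.Match(input);
        return (m.Groups[1].Value,
                m.Groups[2].Value,
                // Optional department: null when the " for ..." part is absent.
                m.Groups[3].Success ? m.Groups[3].Value.Trim() : null);
    }
}
```

.NET also supports named groups such as (?<date>.+), read with m.Groups["date"], which avoids renumbering the remaining groups when one is removed.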

The regex you seem to need is:

"- (.*) on (.*) in (.*) for (.*?) ?\\r\\n\\t\\t\\t(.*)"

Note the spaces around the keywords. The optional space (" ?") before \\r is needed because the third example ("Whatever\r\n") has no space before the line break. Then read the groups from the match: in C#, match.Groups[1].Value is the username, match.Groups[2].Value is the date, and so on.

Groups[1] returns the substring matched by the first parenthesised part of the pattern, Groups[2] the second, and so on.
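A minimal C# sketch of what the group numbers refer to. To keep the input short, it uses the answer's pattern without the line-break suffix (an assumption, not the answer's exact pattern); GroupDemo and AllGroups are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Returns the value of every group, in index order.
    public static string[] AllGroups(string input)
    {
        Match m = Regex.Match(input, @"- (.*) on (.*) in (.*) for (.*)");
        var values = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
            values[i] = m.Groups[i].Value; // [0] = whole match, [1]..[4] = parenthesised parts
        return values;
    }
}
```

For "- user.001 on July 15, 2012 in Africa for Human Resources" this returns five values: the whole string, then user.001, July 15, 2012, Africa and Human Resources.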


Thanks for the tip on Groups. Works great.
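// Requires: using System.Text.RegularExpressions;
// The spaces around "on", "in" and "for" stop a word such as "jonathan" from being split.
// \s*\r?\n drops any trailing space and assumes a real line break follows the department.
Regex r = new Regex(@"- (.*?) on (.*?) in (.*?) for (.*?)\s*\r?\n");
Match m = r.Match(s);
string userName   = m.Groups[1].Value;
string date       = m.Groups[2].Value;
string country    = m.Groups[3].Value;
string department = m.Groups[4].Value;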
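If all the records arrive in one block of text, Regex.Matches with RegexOptions.Multiline can pull out every record in a single pass. A minimal sketch, assuming each record line is followed by a real "\r\n\t\t\tEdit" line as in the question; BulkParser and ParseAll are hypothetical names, and the department is optional as in the question's Edit 1.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BulkParser
{
    // ^ and $ match at every line because of RegexOptions.Multiline.
    // The trailing \s* absorbs the optional space and the \r before each \n,
    // so the captured values need no Trim(). The indented "Edit" lines do not
    // start with "- ", so they never match.
    private static readonly Regex Line = new Regex(
        @"^- (.+?) on (.+?) in (.+?)(?: for (.+?))?\s*$",
        RegexOptions.Multiline);

    public static List<string[]> ParseAll(string text)
    {
        var rows = new List<string[]>();
        foreach (Match m in Line.Matches(text))
        {
            rows.Add(new[]
            {
                m.Groups[1].Value,                              // username
                m.Groups[2].Value,                              // date
                m.Groups[3].Value,                              // country
                m.Groups[4].Success ? m.Groups[4].Value : null, // department (optional)
            });
        }
        return rows;
    }
}
```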

