0

I'm having an issue splitting each line of a text file into two parts with regex. Basically, the name of a class appears, and the room number may come just one whitespace character after it. I am not guaranteed the name of the room, otherwise I would split on that.

To illustrate, this splits perfectly fine:

WEB SITE DEVELOPMENT II     NKM 104

It will split because of the white spaces, so in my string[] array it looks like:

0 - WEB SITE DEVELOPMENT II
1 - NKM 104

Which is what I need. The problem lies in entries such as these:

PERSONAL COMPUTER APPLICATI NKM 106
PORTFOLIO DES & PROF PRACTI LCN 104

Which will show up as:

0 - PERSONAL COMPUTER APPLICATI NKM 106
1 - PORTFOLIO DES & PROF PRACTI LCN 104

When I need:

0 - PERSONAL COMPUTER APPLICATI
1 - NKM 106
2 - PORTFOLIO DES & PROF PRACTI 
3 - LCN 104

Any ideas on where to start with a regex for a situation like this? I know the room number is guaranteed to always be in the "XYZ 012" form, but the problem is that it comes after the name of the class. If it came before, I could easily just split on that. Any help is appreciated.

1
  • What if the class name is "SOME COURSE IN XYZ"? How could you possibly differentiate "XYZ" from a room name? Commented Feb 27, 2013 at 17:38

3 Answers

2

No need for regexes here...

// The room ("XYZ 012") is always the last 7 characters, preceded by at least one space.
var firstPart = line.Substring(0, line.Length - 8).TrimEnd();
var lastPart = line.Substring(line.Length - 7);

... and the complete example:

var data = lines.Split(new[] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
                .SelectMany(line => new[] {line.Substring(0, line.Length - 8), line.Substring(line.Length - 7)})
                .Select((part, i) => string.Format("{0} - {1}", i, part));

var asString = string.Join(Environment.NewLine, data);

2

The fact that the class names in your examples are all truncated at the same length makes me suspect your text file is fixed-width, in which case you do not need a regular expression. The FileHelpers project parses fixed-width text.

However, if your widths will always be the same for every file, you can simply extract the substrings with expressions like string field = inputLine.Substring(startcolumn, columnLength).
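As a rough sketch of that Substring approach: the column positions below (a 27-character class name, one separator, and a 7-character room starting at column 28) are only inferred from the sample lines in the question, and FixedWidthSketch and Parse are made-up names, so check them against the real file before relying on them.

```csharp
using System;

public static class FixedWidthSketch
{
    // Assumed layout, inferred from the question's sample lines only.
    private const int NameWidth = 27; // class name occupies columns 0-26
    private const int RoomStart = 28; // column 27 is a separator space
    private const int RoomWidth = 7;  // "XYZ 012"

    // Returns { className, room } for one fixed-width line.
    public static string[] Parse(string inputLine)
    {
        if (inputLine == null || inputLine.Length < RoomStart + RoomWidth)
            throw new FormatException("Line is shorter than the assumed layout.");

        // Short names are padded with spaces, so trim the padding.
        string name = inputLine.Substring(0, NameWidth).TrimEnd();
        string room = inputLine.Substring(RoomStart, RoomWidth);
        return new[] { name, room };
    }
}
```

Calling FixedWidthSketch.Parse on each line of the file gives a two-element array per line, which you can flatten into one string[] if you need the layout shown in the question.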


0

Here's the regex I would use (assuming you're reading one line at a time):

Regex regexObj = new Regex(@"^(.+)\s(\w+\s[0-9]{3})$");

You can access the two parts through the capture groups. The first capture group will get you the first part of the string, the second will get you the room number and building(?).

Assumptions:

  • The room number is the last thing in a line
  • You are reading this text file line by line, so when you're matching against a string, there's only one entry in it.
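For illustration, reading the capture groups could look something like the sketch below. It relies on the two assumptions above; RoomRegexSketch and Split are hypothetical names, and the TrimEnd call is my addition to drop the padding that the greedy first group picks up when several spaces precede the room.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RoomRegexSketch
{
    // Same pattern as above: group 1 is the class name, group 2 is the room.
    private static readonly Regex LinePattern =
        new Regex(@"^(.+)\s(\w+\s[0-9]{3})$");

    // Returns { className, room }, or null when the line does not match.
    public static string[] Split(string line)
    {
        Match match = LinePattern.Match(line);
        if (!match.Success)
            return null;

        // Groups[0] is the whole match; the captures start at index 1.
        string className = match.Groups[1].Value.TrimEnd();
        string room = match.Groups[2].Value;
        return new[] { className, room };
    }
}
```

A line without a trailing room in that form returns null, so the caller can decide whether to skip it or report it.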
