0

This is a stripped-down version of code I am working on. Its purpose is to take a string of information, break it down, and parse it into key/value pairs.

Using the info in the example below, a string might look like:

"DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567"

One further point about the example above: at least three of the features we have to parse out will occasionally include additional values. Here is an updated (fake) example string:

"DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568"

The problem with this is that the code refuses to split out DIVIDE and DIV information separately. Instead, it keeps splitting at DIV and then assigning the rest of the information as the value.

Is there a way to tell my code that DIVIDE and DIV need to be parsed out as two separate values, and to not turn DIVIDE into DIV?

public List<string> FeatureFilterStrings
    {
        // All possible feature types from the EWSD switch.  
        get
        {
            return new List<string>() { "DIVIDE", "DIV", "CLACOS", "INT"};
        }
    }

public void Parse(string input){

    Func<string, bool> queryFilter = delegate(string line) { return FeatureFilterStrings.Any(s => line.Contains(s)); };


    Regex regex = new Regex(@"(?=\\bDIVIDE|DIV|CLACOS|INT)");
    string[] ms = regex.Split(updatedInput);
    List<string> queryLines = new List<string>();
    // takes the parsed out data and assigns it to the queryLines List<string>
    foreach (string m in ms)
    {
        queryLines.Add(m);
    }

    var features = queryLines.Where(queryFilter);
    foreach (string feature in features)
        {
            foreach (Match m in Regex.Matches(workLine, valueExpression))
            {
                string key = m.Groups["key"].Value.Trim();
                string value = String.Empty;

                // Strip all whitespace from the value (\s, not the literal letter "s").
                value = Regex.Replace(m.Groups["value"].Value.Trim(), @"\s", String.Empty);
                AddKeyValue(key, value);
            }
        }

    private void AddKeyValue(string key, string value)
    {
        try
        {
            // Check if key already exists. If it does, remove the key and add the new key with updated value.
            // Value information appends to what is already there so no data is lost.
            if (this.ContainsKey(key))
            {
                this.Remove(key);
                this.Add(key, value.Split('&'));
            }
            else
            {
                this.Add(key, value.Split('&'));
            }
        }
        catch (ArgumentException)
        {
            // Already added to the dictionary.
        }
    }       
}

Further information: the strings do not have a set number of spaces between each key/value pair, a given string may not include all of the features, and the features aren't always in the same order. Welcome to parsing old telephone switch information.

1
  • How about splitting the input string by spaces, removing empty entries, and then iterating through the array of strings? Regexes are not always the best option. Commented Oct 31, 2015 at 23:53

2 Answers

2

I would create a dictionary from your input string:

string input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";

var dict = Regex.Matches(input, @"(\w+?) = (.+?)( |$)").Cast<Match>()
           .ToDictionary(m => m.Groups[1].Value, m => m.Groups[2].Value);

Test the code:

foreach(var kv in dict)
{
    Console.WriteLine(kv.Key + "=" + kv.Value);
}
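On the DIVIDE versus DIV part of the question: in a verbatim string, `\\b` is a literal backslash followed by `b`, not a word boundary, and alternation binds loosely, so even a correct `\b` at the front would apply only to the first alternative. Below is a minimal sketch of one way to handle that, assuming the key names from the question's FeatureFilterStrings list and assuming multiple values are separated by `,` or `&`. The class and method names (`FeatureParser`, `Parse`) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FeatureParser
{
    // Assumption: the known keys, taken from the question's FeatureFilterStrings.
    private static readonly string[] Keys = { "DIVIDE", "DIV", "CLACOS", "INT" };

    public static Dictionary<string, string[]> Parse(string input)
    {
        // Wrapping the whole alternation in \b...\b means DIV only matches
        // as a whole word, so it can never match inside DIVIDE.
        string keyAlternation = string.Join("|", Keys.Select(Regex.Escape));

        // A value runs lazily up to the next "KEY =" or the end of the input.
        string pattern =
            @"\b(?<key>" + keyAlternation + @")\b\s*=\s*" +
            @"(?<value>.*?)(?=\s*\b(?:" + keyAlternation + @")\b\s*=|\s*$)";

        var result = new Dictionary<string, string[]>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            // Assumption: ',' and '&' both separate multiple values.
            string[] values = m.Groups["value"].Value
                .Split(new[] { ',', '&' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(v => v.Trim())
                .Where(v => v.Length > 0)
                .ToArray();

            // If a key repeats, the last occurrence wins.
            result[m.Groups["key"].Value] = values;
        }
        return result;
    }
}
```

Calling `FeatureParser.Parse` on the updated example string from the question should give DIVIDE with three values, INT with two, and CLACOS and DIV with one each. Because the pattern looks for known keys rather than spaces, it does not depend on key order or on how many spaces separate the pairs.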

1

This might be a simple alternative for you.

Try this code:

var input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";

var parts = input.Split(new [] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);

var dictionary =
    parts.Select((x, n) => new { x, n })
         .GroupBy(xn => xn.n / 2, xn => xn.x)
         .Select(xs => xs.ToArray())
         .ToDictionary(xs => xs[0], xs => xs[1]);

I then get the following dictionary (shown as a screenshot in the original post): DIVIDE = KE48, CLACOS = 4556D, DIV = 3466, INT = 4567.

Based on your updated input, things get more complicated, but this works:

var input = "DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568";

Func<string, char, string> tighten =
    (i, c) => String.Join(c.ToString(), i.Split(c).Select(x => x.Trim()));

var parts =
    tighten(tighten(input, '&'), ',')
    .Split(new[] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);

var dictionary =
    parts
        .Select((x, n) => new { x, n })
        .GroupBy(xn => xn.n / 2, xn => xn.x)
        .Select(xs => xs.ToArray())
        .ToDictionary(
            xs => xs[0],
            xs => xs
                .Skip(1)
                .SelectMany(x => x.Split(','))
                .SelectMany(x => x.Split('&'))
                .ToArray());

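I get this dictionary (shown as a screenshot in the original post), where each value is a string array: DIVIDE = [KE48, KE49, KE50], CLACOS = [4566D], DIV = [3466], INT = [4567, 4568].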
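To make the `tighten` step easier to follow, here is a small sketch that runs it on its own. It uses the same logic as the lambda in this answer, written as a named method; the names `TightenDemo` and `Normalize` are only for illustration.

```csharp
using System;
using System.Linq;

public static class TightenDemo
{
    // Removes the whitespace around every occurrence of the separator,
    // so "4567 & 4568" becomes "4567&4568".
    public static string Tighten(string input, char separator)
    {
        return string.Join(
            separator.ToString(),
            input.Split(separator).Select(part => part.Trim()));
    }

    // Applies Tighten for '&' and then ',', as the answer's code does.
    public static string Normalize(string input)
    {
        return Tighten(Tighten(input, '&'), ',');
    }
}
```

For the updated example string, `Normalize` returns "DIVIDE = KE48,KE49,KE50 CLACOS = 4566D DIV = 3466 INT = 4567&4568". Once the multi-value lists contain no spaces, splitting on '=' and ' ' yields a clean alternating sequence of keys and values, which is what the `n / 2` grouping relies on.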

4 Comments

Would this still work in a situation such as DIVIDE = KE48, KE49, KE60 CLACOS = 4556D DIV = 3466 INT = 4567, 4599?
@jason - No. Does this mean that your input format can differ from the example in your question? If so, how can it differ?
Yes, sorry, I will go up and edit the question. I had the info in my original draft but apparently I can't copy and paste.
That seems to be working on the limited set I have available to me. I will try it in a live situation on Monday to see what happens. Thanks.
