
I am attempting to parse a complex string in JavaScript, and I'm pretty horrible with Regular Expressions, so I haven't had much luck. The data is loaded into a variable formatted as follows:

Miami 2.5 O (207.5) 125.0 | Oklahoma City -2.5 U (207.5) -145.0 (Feb 20, 2014 08:05 PM)

I am trying to parse that string following these parameters:

1) Each value must be loaded into its own variable (i.e. separate variables for Miami, 2.5 O, (207.5), etc.)
2) String must split at pipe character (I have this working with .split(" | ") )
3) I am dealing with city names that include spaces
4) The date at the end must be isolated and removed

I have a feeling regular expressions must be used, but I'm seriously hoping there is a different way to approach this. The example provided is just that, an example from a much larger data set. I can provide the full data set if requested.

More direct version of my question: Given the data above, what concepts / procedures can I use to intelligently parse the string elements into their own variables?

If RegEx must be used, will I need multiple expressions?

Thanks in advance for your help!

EDIT: In an effort to supply multiple pathways to a solution, I'll explain the overarching problem as well. This data is returned from an RSS / XML item. The string mentioned above holds sports odds, and is all contained in the title node of the feed I'm using. If anyone has a better XML / RSS feed for sports odds, I would be ecstatic for that as well.

EDIT 2: Thanks to the replies, I can run a RegEx that matches the data points needed. I'm now having trouble iterating through the matches and returning them correctly. I have the RegEx loaded into its own function:

function regExExtract (txt){
    var exp = /([^|\d]+) ([-\d.]+ [A-Z]) (\([^)]+\)) ([-\d.]+) (\([^)]+\))?/g;
    var comp_arr = exp.exec(txt);

    return comp_arr;        
}

And it is being called with:

var title_arr = regExExtract(title);  

Title is loaded with the data string listed above. I assume I'm using the global flag correctly to ensure all matches are considered, but I'm not sure I'm loading the matches correctly. I apologize for my ignorance, this is all brand new to me.

As requested below, my expected output is ultimately a table with a row for each city, and its subsequent data. Each cell in each row corresponds to a data point.

I have created a JS Fiddle with what I've done, and what the expected output is: http://jsfiddle.net/vDkQD/2/

Potential Final Edit: With the assistance of Robin and rewt, I have come up with:
http://jsfiddle.net/hMJx3/

  • You are unable to change the format of the string I suppose? This is horrible for an automated parsing (but not impossible). Commented Feb 21, 2014 at 0:41
  • Correct, I am unable to change the format of the string. It is being returned from a XML Feed file in which all that data is put into the title node. It is a freaking nightmare. Commented Feb 21, 2014 at 0:46
  • Can you show exactly the output you'd like? Commented Feb 21, 2014 at 1:34

2 Answers

2

Wouldn't a regex like

/([^|\d]+) ([-\d.]+ [A-Z]) (\([^)]+\)) ([-\d.]+) (\([^)]+\))?/g

do the trick? Obviously, this is based on the example string you gave, and if other patterns are possible it will need updating. But if the format is that fixed, it's not so complicated.

Afterwards you just have to go through the captured groups for each match, and you'll have your data parsed. Live demo for fun: http://regex101.com/r/kF5zD3
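In JavaScript, one way to go through the matches is to call exec() repeatedly on a regex that has the g flag: each call resumes where the previous match ended and returns null when nothing is left. The following is only a sketch against the sample string from the question. The helper name extractRows and the property names (spread, total, odds) are my own guesses at what the values mean, not something the feed defines.

```javascript
// Hypothetical helper: collects the captured groups of every match.
function extractRows(txt) {
    var exp = /([^|\d]+) ([-\d.]+ [A-Z]) (\([^)]+\)) ([-\d.]+) (\([^)]+\))?/g;
    var rows = [];
    var match;
    // With the g flag, each exec() call continues after the previous match
    // and returns null once there are no more matches.
    while ((match = exp.exec(txt)) !== null) {
        rows.push({
            city: match[1].trim(), // the city after the pipe is captured with a leading space
            spread: match[2],      // property names are assumptions about the data
            total: match[3],
            odds: match[4],
            date: match[5]         // undefined when the optional date group did not match
        });
    }
    return rows;
}

var title = "Miami 2.5 O (207.5) 125.0 | Oklahoma City -2.5 U (207.5) -145.0 (Feb 20, 2014 08:05 PM)";
var rows = extractRows(title);
```

Each entry in rows then holds the values for one city, which maps naturally onto one table row per city.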

Explanation

  • [^|\d] everything but a pipe or a digit. This is to account for unusual city names that [a-zA-Z ] might not catch
  • [-\d.] a digit, a dot or a hyphen
  • \([^)]+\) opening parenthesis, everything that isn't a closing parenthesis, closing parenthesis.

Quick incomplete pointers on regex

  • Here, the regex is the part between the two /. The g after it is a flag: thanks to it, the regex won't stop after hitting the first match and will return every match
  • The match is what the whole expression finds. Here, each match is one team's data, i.e. one side of the | in your string. Capturing groups are a very useful tool that lets you extract data from this match: they are delimited by parentheses, which are special characters in regex. (a)b will match ab, and the first captured group of this match will be a
  • [...] means any one of the characters inside will do. [abc] will match a or b or c.
  • + is a quantifier, another special character, meaning "one or more of what precedes me". a+ means "one or more a" and will match aaaaa.
  • \d is a shortcut for [0-9] (yes, - is a special range character inside [...]; that's why in [-\d.], which is equivalent to [-0-9.], it directly follows the opening bracket)
  • since parentheses are special characters, when you actually want to match a parenthesis you need to escape it: the regex (\(a\))b will match (a)b, and the first captured group of this match will be (a), parentheses included
  • ? means what precedes it is optional (zero or one instance)
  • ^ when put at the beginning of a [...] statement means "everything but what's in the brackets". [^a]+ will match bcd-*ù but not aa
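The pointers above can be tried out one at a time in a JavaScript console. This is just an illustrative sketch; the variable names are arbitrary.

```javascript
// (a)b matches "ab"; the first captured group is "a".
var groupMatch = /(a)b/.exec("ab");

// [abc] matches a single a, b or c.
var classHit = /[abc]/.test("b");

// a+ matches one or more consecutive "a".
var run = /a+/.exec("aaaaa")[0];

// \d is shorthand for [0-9]; a hyphen right after the opening bracket is literal.
var number = /[-\d.]+/.exec("-145.0")[0];

// Escaped parentheses match literal parentheses.
var literalParens = /(\(a\))b/.exec("(a)b");

// ? makes the preceding item optional.
var optional = /^ab?c$/.test("ac") && /^ab?c$/.test("abc");

// [^a]+ matches a run of anything except "a".
var negatedHit = /^[^a]+$/.test("bcd-*ù");
var negatedMiss = /[^a]+/.test("aa");

// Without the g flag only the first match is returned; with it, all of them.
var firstOnly = "a1 b2".match(/\d/);
var allDigits = "a1 b2".match(/\d/g);
```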

If you really know nothing about regex, and since I believe they're the right tool for your case, I suggest you take a quick look at a tutorial, just to get a better idea of what you're dealing with. The way to set flags and loop through matches and their respective captured groups will depend on your language and how you call your regex.


4 Comments

Holy cow, this might do it. Now I just need to learn how to implement it for testing. Will update comment once I do that.
If you know nothing about regex (and I think they're the right tool for your case), you might want to take a quick overview of even a short tuto so that you're not completely in the fog. I'll update with a few pointers
Thank you very much. I have read through a tutorial, but am having trouble dealing with multiple matches. Right now, the expression is doing its job just fine, but I can only return the first instance argument (the data before the pipe). I'm not sure how to access the data after the pipe.
@Vaune: That's probably because you haven't managed to set the g (global) flag correctly: the regex will stop at the first match, hence not going for matches after the pipe. If you're stuck, show some of your code and people will help. I updated with a few random info to get you started, but won't be back for some hours. Have fun !
1
[A-Z][a-z]+( [A-Z][a-z]+)* -?[0-9]+\.[0-9] [OU] \(-?[0-9]+\.[0-9]\) -?[0-9]+\.[0-9]

This should match a single part of your long string under the following assumptions:

  • The city consists only of alpha characters, each word starts with an uppercase character and is at least 2 characters long.
  • Numbers have an optional sign and exactly one digit after the decimal point
  • The single character is either O or U

Now it is up to you to:

  • Properly create capturing parentheses
  • Check whether my assumptions are right
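As a rough illustration of the first point, here is one possible placement of the capturing parentheses, run in JavaScript against the sample string from the question. It is a sketch under the assumptions listed above: the leading letter of each word is written as [A-Z] to follow the uppercase assumption, the city's inner group is made non-capturing with (?:...), and the variable names are hypothetical.

```javascript
// One capturing group per value: city, number, O/U, total, odds.
var partExp = /([A-Z][a-z]+(?: [A-Z][a-z]+)*) (-?[0-9]+\.[0-9]) ([OU]) \((-?[0-9]+\.[0-9])\) (-?[0-9]+\.[0-9])/g;

var title = "Miami 2.5 O (207.5) 125.0 | Oklahoma City -2.5 U (207.5) -145.0 (Feb 20, 2014 08:05 PM)";

var parts = [];
var m;
// The g flag lets exec() step through one match per side of the pipe.
while ((m = partExp.exec(title)) !== null) {
    // m[0] is the whole match; m.slice(1) keeps only the captured groups.
    parts.push(m.slice(1));
}
```

With this placement the parentheses around the total are matched but not captured, so the value comes back as a bare number string.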

In order to match the date:

\([JFMASOND][a-z]{2} [0-9]?[0-9], [0-9]{4} [0-9]{2}:[0-9]{2} [AP]M\)$
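A minimal JavaScript sketch of using that date expression to isolate and remove the date before splitting at the pipe, assuming the date always sits at the very end of the string as in the sample (variable names are illustrative):

```javascript
var dateExp = /\([JFMASOND][a-z]{2} [0-9]?[0-9], [0-9]{4} [0-9]{2}:[0-9]{2} [AP]M\)$/;

var title = "Miami 2.5 O (207.5) 125.0 | Oklahoma City -2.5 U (207.5) -145.0 (Feb 20, 2014 08:05 PM)";

// Isolate the date (null if the string does not end with one).
var dateMatch = dateExp.exec(title);
var date = dateMatch ? dateMatch[0] : null;

// Remove the date, drop the leftover trailing space, then split at the pipe.
var withoutDate = title.replace(dateExp, "").trim();
var sides = withoutDate.split(" | ");
```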

4 Comments

This is great, however it doesn't isolate the value components. It does however make it easier to target the individual phrases. It omits the date, and matches both core components before and after the pipe. Now I just need to find a way to further split the core values of each component.
@Vaune As I have written that's up to you. You basically need to put parentheses at the right places in my regex, I omitted them for readability.
Understood, thank you. I have 0 experience with RegEx, so I didn't initially understand. I wish I could upvote you :)
@Vaune Simply accept the most helpful answer when you can, this will gain you 2 reputation as well.
