0

I have strings like this:

ab
rx'
wq''
pok'''
oyu,
mi,,,,

Basically, I want to split the string into two parts. The first part should have the alphabetical characters intact, the second part should have the non-alphabetical characters. The alphabetical part is guaranteed to be 2-3 lowercase characters between a and z; the non-alphabetical part can be any length, and is guaranteed to only be the characters , or ', but not both in the same string (e.g. eex,', will never occur).

So the result should be:

[ab][]
[rx][']
[wq]['']
[pok][''']
[oyu][,]
[mi][,,,,]

How can I do this? I'm guessing a regular expression but I'm not particularly adept at coming up with them.

1
  • You could try to find the indexOf the first character that is a , or a ' and then split the string in two parts having that index. Commented Aug 10, 2012 at 6:38

6 Answers

2

Regular expressions have a nice special construct called a "word boundary" (\b). You can use it, well, to detect the boundary of a word, where a word is a sequence of alphanumeric characters (plus the underscore).

So all you have to do is

foo.split(/\b/)

For example,

"pok'''".split(/\b/) // ["pok", "'''"]
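
A quick check against the sample inputs from the question, as a rough sketch. The samples array and the defaulting step are illustrative additions, not part of the answer. A string with no marks, such as "ab", has no boundary inside it, so split returns a single-element array and the second part has to be defaulted.

```javascript
// Sample inputs taken from the question.
var samples = ["ab", "rx'", "wq''", "pok'''", "oyu,", "mi,,,,"];

var results = samples.map(function (s) {
    var parts = s.split(/\b/);
    // "ab" splits to ["ab"], so fall back to an empty marks part.
    return [parts[0], parts[1] || ""];
});

console.log(results);
```

This only holds while the input matches the format described in the question; split does no validation of its own.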

1 Comment

Cool, didn't know about word boundaries. And just for anyone visiting this page, here is a good explanation of them: stackoverflow.com/a/4541595/963396
2

If you can 100% guarantee that:

  1. Letter-strings are 2 or 3 characters
  2. There are always one or more primes/commas
  3. There is never any empty space before, after or in-between the letters and the marks
    (aside from line-break)

You can use:

/^([a-zA-Z]{2,3})('+|,+)$/gm

var arr = /^([a-zA-Z]{2,3})('+|,+)$/gm.exec("pok'''");
// arr is ["pok'''", "pok", "'''"]

arr = /^([a-zA-Z]{2,3})('+|,+)$/gm.exec("baf,,,");
// arr is ["baf,,,", "baf", ",,,"]

Of course, save yourself some sanity, and save that RegEx as a var.

And as a warning, if you haven't dealt with RegEx like this: If a match isn't found -- if you try to match foo','' by mixing marks, or you have 0-1 or 4+ letters, or 0 marks... ...then instead of getting an array back, you'll get null.

So you can do this:

var reg = /^([a-zA-Z]{2,3})('+|,+)$/gm,
    string = "foobar'',,''",

    result_array = reg.exec(string) || [string];

In this case, the result of the exec is null; by putting the || (or) there, we can return an array that has the original string in it, as index-0.

Why?

Because the result of a successful exec will have 3 slots; [*string*, *letters*, *marks*]. You might be tempted to just read the letters like result_array[1]. But if the match failed and result_array === null, then JavaScript will scream at you for trying null[1].

So returning the array at the end of a failed exec will allow you to get result_array[1] === undefined (ie: there was no match to the pattern, so there are no letters in index-1), rather than a JS error.
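
Putting the pieces together, here is a minimal sketch of a helper that wraps the exec call. The names MARKS_RE and splitMarks are hypothetical. Two things differ from the pattern above and are assumptions on my part: the quantifiers are '* and ,* (zero or more) so that a bare "ab" from the question matches, and the g and m flags are dropped because each call handles one single-line string.

```javascript
// Hypothetical helper; the names are not from the answer.
// Assumption: zero-or-more marks, no g/m flags (one single-line string per call).
var MARKS_RE = /^([a-zA-Z]{2,3})('*|,*)$/;

function splitMarks(str) {
    var result = MARKS_RE.exec(str);
    // exec returns null when nothing matches; pass that on instead of indexing into it.
    return result ? [result[1], result[2]] : null;
}

console.log(splitMarks("pok'''"));       // ["pok", "'''"]
console.log(splitMarks("ab"));           // ["ab", ""]
console.log(splitMarks("foobar'',,''")); // null
```

Returning null here leaves the decision about bad input to the caller; the || [string] fallback shown above is the alternative if you would rather always get an array back.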

5 Comments

Primes/commas can be zero or more.
Okay, so the answer to that is to change the ('+|,+) to ('*|,*). It will then look for 0 or more marks, instead of one or more.
The g means to check the whole line -- it's actually not needed for this one. It's useful if you're looking for, say, "oo" in a string, but it could be in multiple places. Like finding /ow/g in "How now, brown cow." - 4 matches. The m means if you've got a multi-line string, treat a line-break like the end of the string. So if you did one single line at a time, as a string (split the text at line-breaks ("\n") ), then m does nothing. If you read the whole text in as 1 string, or left line-breaks in the string, somehow, then without the m this regex doesn't work.
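
To make the two flags concrete, a small sketch (the variable names are only for illustration):

```javascript
// g: find every occurrence instead of stopping at the first one.
var all = "How now, brown cow.".match(/ow/g);
console.log(all.length); // 4

// m: ^ and $ also match at line breaks inside a multi-line string.
var text = "pok'''\nbaf,,,";
var re = /^([a-zA-Z]{2,3})('+|,+)$/;
var reMultiline = /^([a-zA-Z]{2,3})('+|,+)$/m;

console.log(re.test(text));          // false: $ only matches at the very end
console.log(reMultiline.test(text)); // true: $ matches before the line break
```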
Thanks for the explanation. I've chosen another answer as the accepted solution because it is simpler, but yours is definitely the most comprehensive and safest.
That is cool by me - just remember that the onus is on you, one way or another to make sure that your data is either validated on the way in, or on the way out. ie: if you are going to use word-boundaries, keep in mind that _ is considered a letter, as far as \w is concerned -- so check for that stuff if your data is not 100% perfect. Also validate the length of the letter-string, after the boundary-split. In the end, you do similar amounts of work -- it is a question of where you do the work and how much you can trust what is in your data (hint -- public site: none of it)
0

You could try something like this:

function splitString(string){
   var stringArray;
   // indexOf returns -1 (not 0) when the character is not found
   var match1 = string.indexOf(',');
   var match2 = string.indexOf("'");
   if(match1 !== -1){
      // letters up to the first mark, then everything from the mark onwards
      stringArray = [string.slice(0,match1),string.slice(match1)];
   }
   else if(match2 !== -1){
      stringArray = [string.slice(0,match2),string.slice(match2)];
   }
   else{
      stringArray = [string];
   }
   return stringArray;
}


0
var str = "mi,,,,";
var idx = str.search(/\W/); // index of the first non-word character, or -1 if there is none
var list = [str, ""];
if(idx !== -1) {
    list = [str.slice(0, idx), str.slice(idx)];
}

You'll have the parts in list[0] and list[1].

P.S. There might be some better ways than this.

2 Comments

The second part of str.slice should be with idx+1, if not you will have a char repeated in both parts.
Not really, the end argument of slice is exclusive.
0

yourStr.match(/(\w{2,3})([,']*)/)

2 Comments

Your regEx is going to allow for strings that contain ',' or '', or similar.
Right. But the goal, ultimately should be to reject false-positives, rather than accept potentially-broken data. Technically, your solution works fine, so long as every line of data entered is 100% perfect. However, a_b' will pass in your regex, for example. Will it ever happen? Hopefully not. But if it were a mission-critical system (or involved anything that any user touched), I'd prefer the defensive white-list, rather than the inclusive black-list.
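
A short sketch of the difference being discussed. The loose pattern is the one from this answer; the strict pattern is an assumed whitelist variant (anchored, lowercase letters only, one kind of mark), not something posted verbatim in the thread.

```javascript
var loose = /(\w{2,3})([,']*)/;          // pattern from this answer
var strict = /^([a-z]{2,3})('*|,*)$/;    // assumed whitelist variant

console.log(loose.test("a_b'"));     // true: \w also accepts the underscore
console.log(strict.test("a_b'"));    // false
console.log(loose.test("eex,',"));   // true: mixed marks are accepted
console.log(strict.test("eex,',"));  // false
console.log(strict.test("pok'''"));  // true
```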
0
// ,* and '* (zero or more) so that a string without marks, like "ab", also matches
var match = string.match(/^([a-z]{2,3})(,*|'*)$/);
if (match) {
    match = match.slice(1);
}
