
I'm trying to get a relatively simple regex to work the way I want. I want to split a string into an array while ignoring blank lines. Here's what I've got so far:

const regExp = /\s*(?:\n|$)\s*/;
const names = "\nBen\n\n\nLeah\nJosh\nJess";
console.log(names.split(regExp));

This returns the following array:

0: ""
1: "Ben"
2: "Leah"
3: "Josh"
4: "Jess"

As you can see, all of the repeated newlines are correctly ignored, but not when a newline is the first character. Can anyone suggest what amendment I need to make to get rid of that pesky blank first element?

  • Is names.trim().split(regExp) not an option? Commented Oct 31 at 11:57
  • A regex-only solution is simply not available here. If the string starts with 1 or more newlines, the regex instructs split to cut around that region. This will always result in a leading empty string. Commented Oct 31 at 12:29
  • @XavierPedraza well, a regex-only solution would work but not with splitting. Array.from(names.matchAll(/[^\s]+/g), matches => matches[0]) works exactly as required. So, in addition to not knowing why not just trim, I also don't know why this should be a split. Commented Oct 31 at 12:32
  • names.match(/\S(?:.*\S)?/g). Or .split(/[\r\n]/).map(l => l.trim()).filter(Boolean) Commented Nov 1 at 9:47
  • @sln because I don't answer XY questions. Commented Nov 1 at 20:25
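
The match-based idea from the comments above can be tried as a small sketch. The sample string here is an assumption (it adds a two-word name and stray spaces that are not in the question), and the `?? []` fallback is added because `match` with the `g` flag returns `null` when nothing matches.

```javascript
// Sample input is hypothetical: it extends the question's data with
// padding spaces and a name that contains a space.
const names = "\nBen \n\n\n Santa Claus\nJosh\nJess\n";

// Each match starts and ends on a non-whitespace character. "." does not
// cross line breaks, so blank lines and surrounding spaces are skipped.
const lines = names.match(/\S(?:.*\S)?/g) ?? [];

console.log(lines); // [ 'Ben', 'Santa Claus', 'Josh', 'Jess' ]
```

Because this collects the lines rather than cutting around delimiters, there is no leading or trailing empty string to remove.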

2 Answers


A regex-only solution backed by split is simply not available here. If the string starts with 1 or more newlines, the regex instructs split to cut around that region. This will always result in a leading empty string.
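
To make the behaviour concrete, here is a minimal sketch using the pattern and input from the question. The second input, with a trailing newline, is an added assumption to show that the same thing happens at the end of the string.

```javascript
// Pattern and input taken from the question.
const regExp = /\s*(?:\n|$)\s*/;
const names = "\nBen\n\n\nLeah\nJosh\nJess";

// The first match is the "\n" at index 0, so the piece in front of it is "".
const parts = names.split(regExp);
console.log(parts); // [ '', 'Ben', 'Leah', 'Josh', 'Jess' ]

// Hypothetical extra input: a trailing newline leaves an empty piece
// after the last match for the same reason.
const trailing = "Ben\nLeah\n".split(regExp);
console.log(trailing); // [ 'Ben', 'Leah', '' ]
```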

If you want a clean, self-documenting solution, you can do what VLAZ suggests in this comment (use trim before split). This makes your intent very clear. However, since you are pursuing a regex-only solution, it could be the case that you wish to optimize for performance. If so, this would be my proposal:

function isDelimiterChar(charCode) {
    // Checks if a character is 0x0A (\n).
    // Modify if you want to, for example, split by
    // all whitespace (0x09, 0x20, others?)
    return charCode === 0x0A;
}

function splitIgnoreBlank(string) {
    const ret = [];
    let start = 0;

    for (let i = 0; i < string.length; i++) {
        if (!isDelimiterChar(string.charCodeAt(i))) continue;
        // Only push non-empty segments, which skips blank lines.
        if (i !== start) ret.push(string.substring(start, i));
        start = i + 1;
    }

    // Push the final segment if the string does not end with a delimiter.
    if (start !== string.length) {
        ret.push(string.substring(start));
    }

    return ret;
}

console.log(splitIgnoreBlank(names));

7 Comments

"Modify if you want to, for example, split by all whitespace (0x09, 0x20, others?)": that seems to be more or less the intention. The split is done by /\s*(?:\n|$)\s*/, which covers newlines surrounded by any sequence of whitespace characters. "Alice\nBob" and "Alice \n Bob" would both split into ["Alice", "Bob"] using OP's code, while "Alice \n Santa Claus \n Bob" splits into ["Alice", "Santa Claus", "Bob"]. isDelimiterChar() can't determine that at the moment, since it would need to check the full sequence of whitespace characters to see whether any of them is a newline; only then is the sequence a delimiter.
The author simply using \s in their pattern is not an indicator that they truly want to delimit with respect to non-LF whitespace. This case does not appear in their sample data, so I provided code which gives the same results when used on the given sample data.
I've given you the only interpretation we can support with the evidence and facts we have. You are, of course, under no obligation to stick to it.
Why the minutiae of micro-code when trimming the boundary whitespace and then splitting on \s*\n\s* is all that's needed?
I did specifically mention this approach in my answer. I see no issue in providing additional approaches.
I consider a trim followed by a regex split to be a regex solution, since it's the simplest, most efficient, cleanest, and most understandable way to do it, with no time wasted. This is my opinion.
@sln I don't disagree. I stated in my answer that your regex solution (would not say regex-only) is "clean" and "self-documenting". The code I provided is desirable in few ways other than its relative performance.

You can just trim() the string before split().

JS Demo

var names = "\nBen\n\n\nLeah\nJosh\nJess\n";
console.log(names.trim().split(/\s*\n\s*/));

Output:

[ 'Ben', 'Leah', 'Josh', 'Jess' ]
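
One edge case worth knowing: if the input is empty or contains only whitespace, trim() leaves an empty string, and splitting an empty string with this pattern gives [''] rather than []. A minimal sketch that guards against this follows; the helper name `splitNonBlankLines` is hypothetical and not part of the answer above.

```javascript
// Hypothetical helper: trim, split on newline runs, then drop any "" left over.
function splitNonBlankLines(text) {
  return text
    .trim()
    .split(/\s*\n\s*/)
    .filter(Boolean); // removes the lone "" produced by empty input
}

console.log(splitNonBlankLines("\nBen\n\n\nLeah\nJosh\nJess\n")); // [ 'Ben', 'Leah', 'Josh', 'Jess' ]
console.log(splitNonBlankLines("  \n \n")); // []

// Without the filter, empty input yields a single empty string.
console.log("".trim().split(/\s*\n\s*/)); // [ '' ]
```

If the input is known to contain at least one name, the filter is unnecessary and the plain trim-then-split form is enough.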
