
I have a text file in which each line is a string.

The most extreme case could look like this:

A01B01C01D100E500F100.00G100.00H100.00

A little information about possibilities:

  • Each string will include at least one of those letters and a number
  • The numbers following the letters can have any number of digits, both before and after the decimal point.
  • The letters are not always in order

Another example of the data:

A01B400C62.578D77.297
C62.409D77.222
C62.259D77.113
C62.135D76.975
C62.042D76.815
C61.985D76.638
C61.973D76.529
A03B10000
A0C62.760 D77.336
A0E3.000
A01F400E0
A01B400E-0.100

What I would like to do is split the string at each letter, taking all of the numbers up to the next letter. The results should look like this:

A01, B01, C01, D100, E500, F100.00, G100.00, H100.00

I have tried a bunch of things, and the closest I have gotten is this

dicedLine = myLine.split(/[ABCDEFGH]/)

This gets me CLOSE to what I want, but I have found that if a string does not include all of the letters in the search, the results are not what I am after.

For example a line like this:

A30

Will give me results like this:

["", "30"]

Where I would really want results like this:

["A30", "",  "",  "", "", "", "", ""]

Any ideas are appreciated!

  • Shall "D62.409C77.222" result in ["", "", "C77.222", "D62.409", "", "", "", ""]? Can you be more specific? Commented May 25, 2017 at 1:41
  • once the string is split, the order doesn't matter so much. I'm planning to do results.indexOf('X') to manipulate the data as needed. Commented May 25, 2017 at 2:18
  • But you still want results like ["A30", "", "", "", "", "", "", ""] instead of ["A30"] or even Set ["A30"]? Commented May 25, 2017 at 2:33
  • exactly. I want the full 'set' for each string given. Even if the set is only one actual value and 7 blanks. This is basically just cleaning up the data so I can pass it to another function and parse out what I need. Commented May 25, 2017 at 2:35
  • I don't get it. You want to perform indexOf on the results - so I assume the index is important to you. But then you say the order doesn't matter - so I assume you would be equally happy with a Set. But then you say you also want the blanks - which doesn't make sense if you are not interested in the indices... Commented May 25, 2017 at 2:40

3 Answers

You can use positive lookahead to assert that a particular character exists, without actually consuming it:

const codes = [
  "A01B400C62.578D77.297",
  "C62.409D77.222",
  "C62.259D77.113",
  "C62.135D76.975",
  "C62.042D76.815",
  "C61.985D76.638",
  "C61.973D76.529",
  "A03B10000",
  "A0C62.760 D77.336",
  "A0E3.000",
  "A01F400E0",
  "A01B400E-0.100",
];

console.log(codes.map(code => code.split(/ *(?=[A-Z])/)));

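Note that I also added " *" (a space followed by *) before the lookahead, so any spaces in front of a letter are consumed by the split and removed from the results.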
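As a quick check against the single-letter case from the question, here is a minimal sketch of the same split. The helper name splitCodes is made up for illustration and is not part of the answer above.

```javascript
// Hypothetical helper: split a line in front of every capital letter,
// consuming any spaces that come directly before that letter.
const splitCodes = line => line.split(/ *(?=[A-Z])/);

// A line with a single letter stays in one piece, letter included.
const single = splitCodes("A30");

// The space before "D" is consumed by the separator.
const spaced = splitCodes("A0C62.760 D77.336");

console.log(single, spaced);
```

Because the lookahead does not consume the letter, each piece keeps its leading letter, which is the difference from splitting on /[ABCDEFGH]/ directly.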

1 Comment

That was the regex I was looking for. Worked perfectly.

You can do this fairly easily with match rather than split. It uses a simpler regular expression, so it is likely to be more widely compatible. Just be aware that if no matches are found, match returns null rather than an empty array.

var data = [
 'A01B400C62.578D77.297',
 'C62.409D77.222',
 'C62.259D77.113',
 'C62.135D76.975',
 'C62.042D76.815',
 'C61.985D76.638',
 'C61.973D76.529',
 'A03B10000',
 'A0C62.760 D77.336',
 'A0E3.000',
 'A01F400E0',
 'A01B400E-0.100'
];

var result = data.map(s => s.match(/[a-z][^a-z]+/gi));

console.log(result);
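Two details may be worth guarding against with this approach. The sketch below uses tokenize as an illustrative name (not from the answer): it falls back to an empty array when match returns null, and it trims each token, because [^a-z]+ also captures the space in 'A0C62.760 D77.336'.

```javascript
// Hypothetical wrapper around the match-based approach.
const tokenize = s =>
  (s.match(/[a-z][^a-z]+/gi) || []) // null (no matches) becomes an empty array
    .map(token => token.trim());    // drop whitespace captured at the end of a token

const withSpace = tokenize("A0C62.760 D77.336");
const noMatch = tokenize("");

console.log(withSpace, noMatch);
```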


Where I would really want results like this:

["A30", "", "", "", "", "", "", ""]

If the letters are ordered, this is easily accomplished by capturing each letter + digits in its own group as follows:

let re = /(A[^A-Z]+)?(B[^A-Z]+)?(C[^A-Z]+)?(D[^A-Z]+)?(E[^A-Z]+)?(F[^A-Z]+)?(G[^A-Z]+)?(H[^A-Z]+)?/;

console.log(re.exec("A01B01C01D100E500F100.00G100.00H100.00").slice(1));
console.log(re.exec("C30").slice(1));

Non-matching groups produce an undefined array entry. Those can easily be mapped to empty strings if desired.
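One way to do that mapping, as a minimal sketch (the variable names are illustrative only):

```javascript
// Same pattern as above: one optional capture group per letter, A through H.
const re = /(A[^A-Z]+)?(B[^A-Z]+)?(C[^A-Z]+)?(D[^A-Z]+)?(E[^A-Z]+)?(F[^A-Z]+)?(G[^A-Z]+)?(H[^A-Z]+)?/;

// slice(1) drops the full match; groups that did not participate are undefined.
const slots = re.exec("C30").slice(1).map(group => (group === undefined ? "" : group));

console.log(slots);
```

Since every group is optional, the pattern can match the empty string, so exec does not return null here and no null check is needed before slice.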

The letters are not always in order

You would need to replace the individual letters in the given regexp with their generic group [A-Z].

However, your question is ambiguous about unordered letters. Shall "C30B30" result in ["", "B30", "C30", "", "", "", "", ""]? Or ["B30", "C30", "", "", "", "", "", ""]? We can't answer that part of the question without more specifications.
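If the first interpretation is the intended one, a rough sketch could look like the following. It assumes that each letter from A to H owns a fixed slot, which the question does not confirm, and the names LETTERS and toSlots are invented for this example.

```javascript
// Assumption: each letter A to H owns a fixed slot (A is 0, B is 1, ..., H is 7).
const LETTERS = "ABCDEFGH";

// Hypothetical helper: place each "letter + number" token into its letter's slot.
function toSlots(line) {
  const slots = Array(LETTERS.length).fill("");
  // A token is one letter followed by everything up to the next letter or whitespace.
  const tokens = line.match(/[A-H][^A-H\s]+/g) || [];
  for (const token of tokens) {
    // If a letter appears more than once, the last occurrence wins.
    slots[LETTERS.indexOf(token[0])] = token;
  }
  return slots;
}

const unordered = toSlots("C30B30");
const single = toSlots("A30");

console.log(unordered, single);
```

With this layout the order of the letters in the input no longer matters, and a line such as "A30" produces the eight-element array asked for in the question.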
