Given this sample text extracted from a PDF:

Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19

My goal is to capture all months and days, i.e. it should capture all of the following:

  • August 31
  • October 19
  • March 18-22
  • December 24 - January 4
  • December 24-January 4

The hard part is capturing the ranges where the months are not the same. I came up with this RegExp:

/(January|February|March|April|May|August|September|October|November|December)\s([0-9]*-?[0-9]+)(\s*-\s*(January|February|March|April|May|August|September|October|November|December)\s([0-9]*-?[0-9]+))?/g

It works great for all except the last two examples listed above. On regexr, it shows that it captures it just fine in capture group #3, but I can't access that in JavaScript. Take this snippet for example:

const string = 'Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19';

const subRegex = '(January|February|March|April|May|August|September|October|November|December)\\s([0-9]*-?[0-9]+)';
const dateRegex = new RegExp(`${subRegex}(\s*-\s*${subRegex})?`, 'g');

console.log(string.match(dateRegex));

It seems like I can capture December 24 and January 4 separately, but not together. Is there any way to capture them together?

1 Answer

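You just need to tweak (and perhaps simplify) your original regex a bit: match the day or day range with [\d-]+, then allow an optional group in which any run of spaces and hyphens ([ -]*) leads into a second month and day: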

const str = 'Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19';
// str2 has "December 24-January 4" instead - no spaces
const str2 = 'Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24-January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19';
const re = /(January|February|March|April|May|August|September|October|November|December) [\d-]+([ -]*(January|February|March|April|May|August|September|October|November|December) \d+)?/g;
console.log(str.match(re));
console.log(str2.match(re));
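
As for why the snippet in the question splits the cross-month range: this is my reading of it, not something the asker confirmed, but inside an ordinary template literal `\s` is not a recognized escape, so the backslash is dropped and the pattern ends up containing a plain `s`. The sketch below reuses the question's `subRegex` on a shortened sample (the `text`, `broken` and `fixed` names are only for illustration) and compares that with `String.raw`, which keeps the backslashes. It also uses `matchAll`, since `match` with the `g` flag returns only the full matches and not the capture groups.

```javascript
// Shortened sample; only the cross-month range and one plain date.
const text = 'Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21';

// Same sub-pattern as in the question. In a normal string, '\\s' yields \s.
const subRegex = '(January|February|March|April|May|August|September|October|November|December)\\s([0-9]*-?[0-9]+)';

// As written in the question: `\s` in a plain template literal becomes "s",
// so the optional group is really (s*-s*...) and cannot skip the spaces.
const broken = new RegExp(`${subRegex}(\s*-\s*${subRegex})?`, 'g');

// String.raw leaves the backslashes in the literal parts untouched.
const fixed = new RegExp(String.raw`${subRegex}(\s*-\s*${subRegex})?`, 'g');

console.log(broken.source.includes('(s*-s*')); // true
console.log(text.match(broken)); // [ 'December 24', 'January 4', 'January 21' ]
console.log(text.match(fixed));  // [ 'December 24 - January 4', 'January 21' ]

// match() with the g flag drops capture groups; matchAll() exposes them.
for (const m of text.matchAll(fixed)) {
  console.log(m[0], '| group 3:', m[3]);
}
```

Writing the separator as `\\s*-\\s*` inside the template literal should work equally well; `String.raw` just avoids the doubled backslashes.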

1 Comment

Completely forgot about \d! Thanks.
