I want to test whether a string contains ONLY specific substrings (as whole words) and spaces.

I've written some code and it works, but I'm concerned that I don't understand the regex (I copied it from elsewhere).

Is there a way of doing this without complex regex?

const str1 = 'a♭ apple a a a a a apple   a♭ a'; // valid
const str2 = 'a♭ apple a a a a a apple   a♭ aa'; // invalid aa
const str3 = 'a♭ apple ad  a a a apple   a♭ a'; // invalid ad
const str4 = ' a♭ apple a a a a a apple   a♭ a'; // valid
const str5 = ' a♭ apple a a a a a apple   a♭ a '; // valid
const str6 = 'a♭ apple a a a a a apple   a♭ a '; // valid
const str7 = '      '; // invalid
const str8 = ''; // invalid

const allowedSubstrings = [
  'a', 'a♭', 'apple'
]

const isStringValid = str => {
  if (str.trim() === '') return false
  allowedSubstrings.forEach(sub => {
    // https://stackoverflow.com/a/6713427/1205871
    // regex for whole words only
    const strRegex = `(?<!\\S)${sub}(?!\\S)`
    const regex = new RegExp(strRegex, 'g')
    str = str.replace(regex, '')
  })
  str = str.replaceAll(' ', '')
  // console.log(str)
  return str === ''
}

console.log('str1', isStringValid(str1))
console.log('str2', isStringValid(str2))
console.log('str3', isStringValid(str3))
console.log('str4', isStringValid(str4))
console.log('str5', isStringValid(str5))
console.log('str6', isStringValid(str6))
console.log('str7', isStringValid(str7))
console.log('str8', isStringValid(str8))

  • @danday74, this seems to work, but I'm not sure; I'm not able to extract the capture group. Maybe someone can build on this! ((?![apple|a|a♭| ]).)* done on regexr.com Commented Jan 2, 2024 at 5:55
  • @mandy8055 if a keyword is matched, then it means it's an invalid value, right? Commented Jan 2, 2024 at 6:08
  • @mandy8055 I think what I made is wrong, apologies for the confusion! Commented Jan 2, 2024 at 6:27
  • Do you also want to match an empty string? (as your code currently does) Commented Jan 2, 2024 at 10:34
  • @Thefourthbird good point! I'm not sure of the answer myself yet, but I'll check how my code works and fix it if needed :) thx Commented Jan 4, 2024 at 2:29

2 Answers

One approach I can think of (which avoids complex regex) would be to:

  1. trim the string and split it on runs of one or more whitespace characters (spaces, tabs, or others)
  2. check that every word in the resulting words array is included in the allowedSubstrings array.

const str1 = 'a♭ apple a a a a a apple   a♭ a'; // valid
const str2 = 'a♭ apple a a a a a apple   a♭ aa'; // invalid aa
const str3 = 'a♭ apple ad  a a a apple   a♭ a'; // invalid ad
const str4 = ' a♭ apple a a a a a apple   a♭ a'; // valid
const str5 = ' a♭ apple a a a a a apple   a♭ a '; // valid
const str6 = 'a♭ apple a a a a a apple   a♭ a '; // valid
const str7 = '      '; // invalid
const str8 = ''; // invalid

const allowedSubstrings = [
  'a', 'a♭', 'apple'
];

const isStringValid = (str) => {
  // trim() drops leading/trailing whitespace so that split() does not
  // produce empty strings at either end (those would fail the check).
  const words = str.trim().split(/\s+/);
  // Blank input yields [''], and '' is not an allowed substring, so it is invalid.
  return words.every(word => allowedSubstrings.includes(word));
};

// One-line equivalent:
// const isStringValid = (str) => str.trim().split(/\s+/).every(word => allowedSubstrings.includes(word));

console.log('str1', isStringValid(str1));
console.log('str2', isStringValid(str2));
console.log('str3', isStringValid(str3));
console.log('str4', isStringValid(str4));
console.log('str5', isStringValid(str5));
console.log('str6', isStringValid(str6));
console.log('str7', isStringValid(str7));
console.log('str8', isStringValid(str8));
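
To see why the trim() call matters and why blank input is rejected, it can help to look at what split(/\s+/) returns for the edge cases. This is only an illustrative sketch; the helper name `tokens` is made up here and is not part of the answer's code:

```javascript
// Hypothetical helper (not in the original answer): the word list the check operates on.
const tokens = (str) => str.trim().split(/\s+/);

console.log(tokens(' a♭  apple a '));  // [ 'a♭', 'apple', 'a' ]
console.log(tokens('      '));         // [ '' ]
console.log(tokens(''));               // [ '' ]

// Without trim(), leading whitespace produces an empty first element,
// which is not in the allowed list, so the string would be rejected.
console.log(' a♭ a'.split(/\s+/));     // [ '', 'a♭', 'a' ]
```

Since the empty string '' is not in allowedSubstrings, every() returns false for both the all-spaces string and the empty string.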

2 Comments

Much more intelligent than my approach - will accept if no one comes up with a better answer shortly.
Accepted this answer, thx - added a trim() to your code after you pointed out the failing test cases - many thx :)

You can use a single regular expression that checks whether the string contains only the specified substrings, as whole words, separated by whitespace. It uses join('|') to build an alternation pattern (|) from the allowed substrings and then tests the string against this pattern using test().

const str1 = 'a♭ apple a a a a a apple   a♭ a'; // valid
const str2 = 'a♭ apple a a a a a apple   a♭ aa'; // invalid aa
const str3 = 'a♭ apple ad  a a a apple   a♭ a'; // invalid ad

const allowedSubstrings = [
  'a',
  'a♭',
  'apple'
];

const isStringValid = str => {
  const regex = new RegExp(`^((${allowedSubstrings.join('|')})(\\s+|$))+$`);
  return regex.test(str);
};

console.log('str1', isStringValid(str1));
console.log('str2', isStringValid(str2));
console.log('str3', isStringValid(str3));

1 Comment

This returns false for ' a♭ apple a a a a a apple   a♭ a ' and ' a♭ apple a a a a a apple   a♭ a', which per the OP should be true. You can fix this by adding an alternation with the start of the string (^) too, though.
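
Building on the comment above, here is a minimal sketch of one way the single-regex approach could tolerate leading whitespace. Rather than the alternation the comment suggests, it allows optional whitespace right after the start anchor. The names `escapeRegExp`, `pattern` and `isStringValidRegex` are made up for this example, and escaping the allowed words is an extra precaution (an assumption on my part, not something the answer does) in case a word contains regex metacharacters:

```javascript
const allowedSubstrings = ['a', 'a♭', 'apple'];

// Escape regex metacharacters so each allowed word is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// ^\s*                 optional leading whitespace
// (?:word(?:\s+|$))+   one or more allowed words, each followed by
//                      whitespace or the end of the string
const pattern = new RegExp(
  `^\\s*(?:(?:${allowedSubstrings.map(escapeRegExp).join('|')})(?:\\s+|$))+$`
);

// Hypothetical name, chosen to avoid clashing with isStringValid above.
const isStringValidRegex = (str) => pattern.test(str);

console.log(isStringValidRegex(' a♭ apple a a a a a apple   a♭ a ')); // true
console.log(isStringValidRegex('a♭ apple a a a a a apple   a♭ aa'));  // false
console.log(isStringValidRegex('      '));                             // false
```

Because at least one allowed word is required, blank and empty strings are still rejected.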
