
I have a small regexp that should verify whether a commit's subject adheres to the ReactJS commit message format. Since the expression works with my test strings, the behaviour of the code below left me baffled.

This small example should reproduce the behaviour:

#!/bin/bash

function test_subject {
  local subject="$1"
  local pattern="^(feat|fix|docs|style|refactor|test|chore)\([a-zA-Z0-9._-]+\): [^\n]+$"

  if ! [[ $subject =~ $pattern ]]; then
    echo "Invalid subject: $subject"
  else
    echo "  Valid subject: $subject"
  fi
}

test_subject "chore(gh-actions): add script for commit check"
test_subject "chore(gh-actions): add script for commit checking"
test_subject "feat(ABC-123): add new feature"
test_subject "fix(ABC123): add new feature"
test_subject "fix(ABC123): fix previously added feature"
test_subject "fix(scope): fix bug"

This leads to the following output:

  Valid subject: chore(gh-actions): add script for commit check
Invalid subject: chore(gh-actions): add script for commit checking
Invalid subject: feat(ABC-123): add new feature
Invalid subject: fix(ABC123): add new feature
  Valid subject: fix(ABC123): fix previously added feature
  Valid subject: fix(scope): fix bug
    Change your regex to: '^(feat|fix|docs|style|refactor|test|chore)\([a-zA-Z0-9._-]+\): .+$' Commented Dec 18, 2024 at 18:42

2 Answers

You will need to use . instead of [^\n] in your shell regex to match any character.

rx="[^\n]" doesn't mean "any character except newline". The regex engine sees \ and n as two separate characters, so [^\n] matches any character other than \ and n.

Note that your example strings number 2, 3, and 4 contain the letter n somewhere after the ": " part. The match therefore cannot extend to the end of the line, and the $ assertion at the end fails the match.
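A minimal sketch of that behaviour, assuming the pattern is passed through a variable as in the question. The helper name matches_fully and the sample strings are made up for illustration only:

```shell
#!/bin/bash

# Hypothetical helper: reports whether the string matches the given pattern.
matches_fully() {
  local string="$1" pattern="$2"
  if [[ $string =~ $pattern ]]; then
    echo "match"
  else
    echo "no match"
  fi
}

bad='^[^\n]+$'   # bracket expression excludes the two characters '\' and 'n'
good='^.+$'      # any character, one or more times

matches_fully "add script" "$bad"        # match: no 'n' and no '\'
matches_fully "add new feature" "$bad"   # no match: contains 'n'
matches_fully 'back\slash' "$bad"        # no match: contains '\'
matches_fully "add new feature" "$good"  # match
```

The third call is the one that separates [^\n] from [^n]: the string has no n, yet it is rejected because of the backslash.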

This should work for you:

test_subject() {
  local subject="$1"
  local pattern="^(feat|fix|docs|style|refactor|test|chore)\([a-zA-Z0-9._-]+\): .+$"

  if ! [[ $subject =~ $pattern ]]; then
    echo "Invalid subject: $subject"
  else
    echo "  Valid subject: $subject"
  fi
}

test_subject "chore(gh-actions): add script for commit check"
test_subject "chore(gh-actions): add script for commit checking"
test_subject "feat(ABC-123): add new feature"
test_subject "fix(ABC123): add new feature"
test_subject "fix(ABC123): fix previously added feature"
test_subject "fix(scope): fix bug"

Output:

  Valid subject: chore(gh-actions): add script for commit check
  Valid subject: chore(gh-actions): add script for commit checking
  Valid subject: feat(ABC-123): add new feature
  Valid subject: fix(ABC123): add new feature
  Valid subject: fix(ABC123): fix previously added feature
  Valid subject: fix(scope): fix bug

4 Comments

"[^\n] is being evaluated as [^n]" may not be correct: pat="[^\n]"; [[ '\' =~ $pat ]] && echo matches. The backslash is a literal character here. Compare with pat="[^n]"; [[ '\' =~ $pat ]] && echo matches
Please compare output of [[ 'renew' =~ [^\n]+ ]] && declare -p BASH_REMATCH and [[ 'renew' =~ [^n]+ ]] && declare -p BASH_REMATCH in your bash. For me it shows declare -ar BASH_REMATCH='([0]="re")' both the times.
That is different than my example. In my and OP's example, the pattern is embedded to a variable and the backslash is a literal character in this context. You may want to compare output of pat="[^\n]+"; [[ 'r\enew' =~ $pat ]] && declare -p BASH_REMATCH and pat="[^n]+"; [[ 'r\enew' =~ $pat ]] && declare -p BASH_REMATCH
Ah ok, I get your point. Thanks for pointing it out. I changed explanation in my answer to make it clear.

Bash regexps do not treat \n as a newline character. With the pattern stored in a variable, [^\n] excludes the literal characters \ and n, so the subjects your script marks as invalid are the ones with an "n" after the colon ("checking", "new").

The line boundary that bash regexps do recognize is $, the end of the line.

Another point: your test strings do not contain \n characters, so there is no real need to check for their absence. A simple <complex_pattern>: .+ is enough; it means the complex pattern is followed by a colon, a space, and at least one more character. There is no real need to end the pattern with $ either, since .+ already matches up to the end of the string.
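As a rough check of the point about $, the sketch below drops the trailing anchor and inspects BASH_REMATCH. The variable names whole_match and commit_type are illustrative only, and the subject is one of the test strings from the question:

```shell
#!/bin/bash

# Same pattern as in the question, but ending in .+ with no trailing $.
pattern='^(feat|fix|docs|style|refactor|test|chore)\([a-zA-Z0-9._-]+\): .+'
subject="feat(ABC-123): add new feature"

if [[ $subject =~ $pattern ]]; then
  whole_match="${BASH_REMATCH[0]}"   # text matched by the entire pattern
  commit_type="${BASH_REMATCH[1]}"   # first capture group: the commit type
  echo "matched: $whole_match"
  echo "type:    $commit_type"
fi
```

For a single-line subject the greedy .+ consumes everything after ": ", so the whole match equals the subject even without $.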

That said, if you just hit Enter inside the pattern, you will get a string like:

  local pattern="^(feat|fix|docs|style|refactor|test|chore)\([a-zA-Z0-9._-]+\): [^
]+$"

Bash (or at least some versions of it) will see this as a single string containing a real newline character, and the pattern will genuinely check that there is no newline inside the string.

Is it possible? Yes. Is it better than a simple .+? Definitely not. But as a funny feature and a possible way to confuse people, yes.
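If a real "no newline" check is ever wanted, one less surprising way to get the newline into the pattern might be bash's ANSI-C quoting, $'...', which expands \n before the regex is compiled. This is only a sketch; the names no_newline and is_single_line are made up for illustration:

```shell
#!/bin/bash

# $'...' turns \n into an actual newline character inside the pattern string.
no_newline=$'^[^\n]+$'

# Hypothetical helper: checks that the argument is non-empty and has no newline.
is_single_line() {
  if [[ $1 =~ $no_newline ]]; then
    echo "single line"
  else
    echo "empty or contains a newline"
  fi
}

is_single_line "fix(scope): add new feature"   # single line
is_single_line $'fix(scope): first\nsecond'    # empty or contains a newline
```

Unlike the pattern from the question, this one accepts the letter n, because the bracket expression now excludes only the newline character.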

1 Comment

Perhaps worth noting: \n is not only not special to bash, but does not have a specific meaning in POSIX extended regex in general.
