
I want to check whether a string contains the same letter repeated 6 or more times in a row, using bash's =~ operator.

a="aaaaaaazxc2"
if [[ $a =~ ([a-z])\1{5,} ]];
then
     echo "repeated characters"
fi

The code above does not work.

3 Answers


Bash's regex flavor, POSIX ERE, does not support backreferences. ksh93 and zsh do support them, though.

As an alternative, you can use grep with its extended regex option:

a="aaaaaaazxc2"
grep -qE '([a-zA-Z])\1{5}' <<< "$a" && echo "repeated characters"

repeated characters

EDIT: Some ERE implementations support backreferences as an extension. For example, Ubuntu 14.04 supports them. See the snippet below:

$> echo $BASH_VERSION
4.3.11(1)-release

$> a="aaaaaaazxc2"
$> re='([a-z])\1{5}'
$> [[ $a =~ $re ]] && echo "repeated characters"
repeated characters
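
Since the behavior depends on the platform, a script could probe for it at startup instead of assuming it. The following is only a sketch: the names `probe`, `have_backrefs`, and `has_six_in_a_row` are made up for illustration, and the fallback uses grep with a basic regular expression (BRE), the syntax for which POSIX does define backreferences.

```shell
#!/usr/bin/env bash

# Probe: with working backreferences, 'aa' matches and 'ab' does not.
# A regex compile error also makes the first test fail, so it counts as "unsupported".
probe='^(.)\1$'
if [[ aa =~ $probe ]] && ! [[ ab =~ $probe ]]; then
    have_backrefs=1
else
    have_backrefs=0
fi

# Hypothetical helper: succeeds if $1 contains a lowercase letter 6 times in a row.
has_six_in_a_row() {
    local re='([a-z])\1{5}'
    if (( have_backrefs )); then
        # Native path: relies on the libc extension described above.
        [[ $1 =~ $re ]]
    else
        # Fallback: the same pattern written as a BRE for grep.
        grep -q '\([a-z]\)\1\{5\}' <<< "$1"
    fi
}

a="aaaaaaazxc2"
has_six_in_a_row "$a" && echo "repeated characters"
```

Either branch should give the same answer; the probe only decides whether an external process is needed.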

2 Comments

I found out that backreferences are supported on Ubuntu.
Rather, on Ubuntu's libc. That means the behavior depends not on the specific build of bash (which would at least work anywhere the same version is compiled with the same options), but on the platform it's running on. That's about as nonportable as things get.

[[ $var =~ $regex ]] parses a regular expression in POSIX ERE syntax.

See the POSIX regex standard, emphasis added:

BACKREF - Applicable only to basic regular expressions. The character string consisting of a backslash character followed by a single-digit numeral, '1' to '9'.

The POSIX standard does not specify backreferences for ERE, so they are not guaranteed to be available in bash's native regex syntax (whether they work depends on platform-specific libc extensions). Portable scripts therefore need an external tool (awk, grep, etc.).
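
If spawning an external tool is undesirable, one alternative that is not part of the answer above is to skip regular expressions entirely and count the length of each run of identical characters in plain bash. The sketch below assumes a hypothetical helper named `has_run`; the `[a-z]` test mirrors the question and may behave differently in some locales.

```shell
#!/usr/bin/env bash

# has_run STRING [MIN]: succeed if some lowercase letter occurs MIN or more
# times in a row (MIN defaults to 6). Hypothetical helper; no regex involved.
has_run() {
    local s=$1 min=${2:-6}
    local prev='' run=0 ch i
    for (( i = 0; i < ${#s}; i++ )); do
        ch=${s:i:1}
        if [[ $ch == "$prev" ]]; then
            run=$(( run + 1 ))      # same character as before: extend the run
        else
            prev=$ch                # new character: start a new run
            run=1
        fi
        # Only runs of lowercase letters count, as in the question's [a-z].
        if [[ $ch == [a-z] ]] && (( run >= min )); then
            return 0
        fi
    done
    return 1
}

a="aaaaaaazxc2"
has_run "$a" && echo "repeated characters"
```

This trades the compactness of a regex for behavior that does not depend on the platform's regex library.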


You do not need the full power of backreferences for this specific case of a single repeated character. You can just build a regex that checks for a repeat of each lowercase letter in turn:

regex="a{6}"
for x in {b..z} ; do regex="$regex|$x{6}" ; done    
if [[ "$a" =~ ($regex) ]] ; then echo "repeated characters" ; fi

The regex built by the for loop above looks like this:

> echo "$regex" | fold -w60
a{6}|b{6}|c{6}|d{6}|e{6}|f{6}|g{6}|h{6}|i{6}|j{6}|k{6}|l{6}|
m{6}|n{6}|o{6}|p{6}|q{6}|r{6}|s{6}|t{6}|u{6}|v{6}|w{6}|x{6}|
y{6}|z{6}

This regular expression behaves as you would expect:

> if [[ "abcdefghijkl" =~ ($regex) ]] ; then \
  echo "repeated characters" ; else echo "no repeat detected" ; fi
no repeat detected
> if [[ "aabbbbbbbbbcc" =~ ($regex) ]] ; then \
  echo "repeated characters" ; else echo "no repeat detected" ; fi
repeated characters
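
The same construction can be wrapped in a function so that the repeat count is a parameter. This is a sketch only; `build_repeat_regex` is a name invented here, and it assumes the count is a positive integer.

```shell
#!/usr/bin/env bash

# build_repeat_regex N: print an ERE alternation a{N}|b{N}|...|z{N}.
# Hypothetical helper; uses only interval expressions, no backreferences.
build_repeat_regex() {
    local n=$1 x regex=''
    for x in {a..z}; do
        # Prepend '|' to every alternative except the first.
        regex+="${regex:+|}${x}{${n}}"
    done
    printf '%s\n' "$regex"
}

a="aaaaaaazxc2"
regex=$(build_repeat_regex 6)
if [[ "$a" =~ ($regex) ]]; then
    echo "repeated characters"
fi
```

Keeping the pattern in a variable and expanding it unquoted on the right-hand side of =~ makes bash treat it as a regex rather than a literal string.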

Updated following the comment from @sln: replaced the bound expression {6,} with a simple {6}.

1 Comment

You don't need the "or more" part; a plain {6} is enough, because matching more than 6 doesn't give you any additional information.
