Replace all instances of character in portion of string in bash

Question

I need to replace all instances of a character (period in my case) in 1+ portions/segments/ranges of a string. I'm using Bash on Linux. Ideally the solution is in Bash, but if it's either not possible or terribly complex I can call any app commonly found on Linux (sed, Python, etc).

Example:

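Starting String: "<mark>foo.bar.baz</mark> blah. blah. blah. <mark>abc.def.ghi</mark> ..." .

Needed transformation: Replace all periods "." between <mark> and </mark> with the string "<wbr />" .

Desired Result: "<mark>foo<wbr />bar<wbr />baz</mark> blah. blah. blah. <mark>abc<wbr />def<wbr />ghi</mark>" .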

EDITS:

The starting string will never contain <mark> or </mark> within a set of them (i.e. the range markers are never nested).

I'm asking for help with some built-in Bash capability to perform this. The obvious mechanism is to try to find <mark> and </mark>, and then perform substitution in the content between. I know Bash can do offset finding (in an indirect way), and substitution. But can it be performed on a subset?

For the comments regarding parsing this as XML: I did not say this is XML so you should not assume it. Ultimately it's irrelevant to my question; the range markers can be anything.

Here's something I got working. It's not pure Bash, but it's simple.

while $(echo "${my_str}" | grep -E '<mark>[^.]*\.[^<]*</mark>' >/dev/null 2>&1) ; do
    my_str=$(echo "${my_str}" | sed -E -e 's,(<mark>[^.]*)\.([^<]*</mark>),\1<wbr />\2,g')
done

This quick hack (which absolutely will not work for general XML strings) may help to get you started on a pure Bash solution: tmp=$string; newstr=; while [[ $tmp == *'<mark>'*'</mark>'* ]]; do tmp2=${tmp#*<mark>*</mark>}; tmp3=${tmp%"$tmp2"}; tmp=$tmp2; tmp4=${tmp3%%<mark>*}; tmp5=${tmp3#"$tmp4"}; tmp5=${tmp5//./'<wbr />'}; tmp5="<begin>${tmp5#<mark>}"; tmp5="${tmp5%</mark>}</end>"; newstr+=$tmp4$tmp5; done; newstr+=$tmp; printf '%s\n' "$newstr"
– pjh, Commented Mar 24 at 19:25
@Shawn - good observation! I changed the tags mid-edit and missed some. I've corrected the Desired Result.
– codesniffer, Commented Mar 24 at 20:33
You could start by replacing the while $(echo ... | grep ...); do with while grep -q -E '<mark>[^.]*\.[^<]*</mark>' <<< "${my_str}"; do to eliminate two subshell calls on each pass through the loop. The $(echo ... | sed ...) could be replaced with $(sed ... <<< "${my_str}") to eliminate another subshell, and this last subshell could be replaced with some creative parameter substitutions. I'd also look into how to compare ${my_str} to a regex and how that populates the BASH_REMATCH[] array; the BASH_REMATCH[] results can then be used to formulate the parameter substitution.
– markp-fuso, Commented Mar 24 at 21:33
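
A minimal sketch of the loop from the question with the here-string changes suggested in the comment above applied. It assumes the <mark> and <wbr /> markers and the my_str variable from the question. It still calls grep and sed, so it is not pure Bash.

```bash
my_str='<mark>foo.bar.baz</mark> blah. blah. blah. <mark>abc.def.ghi</mark>'

# grep -q reports a match through its exit status, so no $(...) or echo is needed
while grep -qE '<mark>[^.]*\.[^<]*</mark>' <<< "${my_str}"; do
    # each pass replaces the first remaining period in every matching segment
    my_str=$(sed -E 's,(<mark>[^.]*)\.([^<]*</mark>),\1<wbr />\2,g' <<< "${my_str}")
done

printf '%s\n' "${my_str}"
```

The loop runs once per period in the segment with the most periods, plus one final grep that finds nothing left to replace.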

markp-fuso · Accepted Answer · 2025-03-25 18:23:04Z

4

Setup:

string='<mark>foo.bar.baz</mark> blah. blah. blah. <mark>abc.def.ghi</mark>'

One bash solution:

regex='(<mark>[^<]*</mark>)'           # assumes no "<" between "<mark>" and "</mark>" tags
unset prev_string                      # used to test for a change to 'string'

# while we have a match and a change has been made to 'string' ...

while [[ "${string}" =~ ${regex} && "${prev_string}" != "${string}" ]]
do
    # typeset -p BASH_REMATCH          # uncomment to see contents of the BASH_REMATCH[] array

    prev_string="${string}"

    # use nested parameter substitutions to make replacement

    string="${string/${BASH_REMATCH[1]}/${BASH_REMATCH[1]//\./<wbr \/>}}"
done

NOTE: "${prev_string}" != "${string}" added as a quick hack to ensure we don't go into an infinite loop in the case where no modifications are made to string (e.g., no periods between the tags)

A variation on the above which adds a few cpu cycles while making the parameter substitutions easier to read and understand:

regex='(<mark>[^<]*</mark>)'
unset prev_string

while [[ "${string}" =~ ${regex} && "${prev_string}" != "${string}" ]]
do
    old="${BASH_REMATCH[1]}"           # copy the match; makes follow-on commands a bit cleaner
    new="${old//\./<wbr \/>}"          # replace all periods with "<wbr />"

    prev_string="${string}"
    string="${string/${old}/${new}}"   # update "string" by replacing "${old}" with "${new}"
done

These both generate:

$ typeset -p string
declare -- string="<mark>foo<wbr />bar<wbr />baz</mark> blah. blah. blah. <mark>abc<wbr />def<wbr />ghi</mark>"
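
One case the loops above do not cover: when the first segment matched by the regex contains no periods, the substitution leaves string unchanged, so the prev_string test ends the loop before any later segment is visited. A minimal sketch of a left-to-right variant that consumes the input one segment at a time is shown here. The function name replace_in_marks and its local variable names are hypothetical and not part of the answer, and it assumes the markers are never nested, as the question states.

```bash
# Hypothetical helper: prints $1 with every "." inside <mark>...</mark>
# replaced by "<wbr />". Text outside the markers is copied unchanged.
replace_in_marks() {
    local rest=$1 out='' head seg
    while [[ $rest == *'<mark>'*'</mark>'* ]]; do
        head=${rest%%'<mark>'*}              # text before the first <mark>
        rest=${rest#"$head"}                 # now starts with <mark>
        seg=${rest%%'</mark>'*}'</mark>'     # the first <mark>...</mark> segment
        rest=${rest#"$seg"}                  # whatever follows that segment
        out+=$head${seg//./'<wbr />'}        # only the segment is modified
    done
    printf '%s\n' "$out$rest"
}

replace_in_marks '<mark>foo</mark> blah. <mark>abc.def</mark>'
# prints: <mark>foo</mark> blah. <mark>abc<wbr />def</mark>
```

Because each segment is removed from rest after it is handled, the loop ends without a change-detection test, and quoting "$head" and "$seg" keeps characters such as * from being treated as glob patterns.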

edited Mar 25 at 18:23

answered Mar 24 at 21:51

markp-fuso


1 Comment

pjh Mar 27 at 17:17

Test with string='<mark>&.*A.B</mark>'. (The & causes a problem if patsub_replacement is enabled with Bash version 5.2 or later.)
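
A minimal sketch of one way around the case raised in this comment, based on the second loop in the answer. Quoting ${old} in the pattern position makes Bash match it literally instead of as a glob, and quoting ${new} in the replacement position keeps any & literal when patsub_replacement is enabled. The test string here is an assumption made up for illustration.

```bash
string='<mark>&.*A.B</mark> blah. <mark>abc.def</mark>'
regex='(<mark>[^<]*</mark>)'
unset prev_string

while [[ "${string}" =~ ${regex} && "${prev_string}" != "${string}" ]]
do
    old="${BASH_REMATCH[1]}"
    new="${old//./<wbr />}"              # replace all periods in the match

    prev_string="${string}"
    # quoted pattern: matched literally; quoted replacement: "&" stays literal
    # (no outer quotes needed: assignments are not word-split)
    string=${string/"${old}"/"${new}"}
done

typeset -p string
```

This only changes how the match and its replacement are quoted. The early exit on a segment without periods is unchanged.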

Cyrus · Accepted Answer · 2025-03-24 17:31:48Z

2

Feed Perl from stdin or append a file name:

perl -pe 's%(<mark>.*?</mark>)% $1 =~ s|\.|<wbr />|gr %eg'

Output:

<mark>foo<wbr />bar<wbr />baz</mark> blah. blah. blah. <mark>abc<wbr />def<wbr />ghi</mark>

Source: https://unix.stackexchange.com/a/152623/74329

answered Mar 24 at 17:31

Cyrus


3 Comments

Léa Gris Mar 24 at 18:25

Perl has native libxml support. It is counterproductive to parse XML with pcre.

Cyrus Mar 24 at 20:50

@LéaGris: codesniffer has now further specified the question.

codesniffer Mar 24 at 20:53

Impressive find, thanks @Cyrus ! While it's not pure Bash, I like that this solution does not require a separate script file.

DuesserBaest · Accepted Answer · 2025-03-25 08:41:30Z

2

This is probably very inefficient, but it uses only a single regex to search and replace, with no loop needed. I am no expert in shell scripts, so I will not provide one, but this should work inside a Perl call.

Try matching:

([^.]+|\G)\.(?=(?:(?!<mark>).)+<\/mark>)

and replacing with:

$1<wbr />

See: regex101

Explanation

MATCH:

Match all .:

( ... ): Capture to group 1 either
- [^.]+: anything but a dot
- |\G: or the end of the last match
\.: then match a dot

Ensure the dot is inside <mark> ... </mark> tags:

(?= ... ): Look ahead and assert
- (?: ... )+: that you match anything
 - (?!<mark>).: but it cannot be <mark>
- <\/mark>: Find </mark>, ensuring that you must be inside the tag

REPLACE:

$1: Keep the first group (everything before a dot, but inside tag)
<wbr />: and replace the dots with <wbr />

answered Mar 25 at 8:41

DuesserBaest


6 Comments

Philippe Mar 25 at 16:53

Quite powerful regex! Is it supposed to match . as well?

DuesserBaest Mar 26 at 8:19

Thanks @Philippe. I thought it would be quite a puzzle to do in one regex, so I tried it for fun^^ With regard to your question, I do not think so; see in the Question: "I need to replace all instances of a character (period in my case) in 1+ portions/segments/ranges of a string"

Philippe Mar 26 at 12:28

All the other answers can change ... to .

DuesserBaest Mar 26 at 12:34

@Philippe you could actually omit the first part of my regex to end up with \.(?=(?:(?!<mark>).?)+<\/mark>) which would just match any "." within mark tags.

Philippe Mar 26 at 13:50

That worked great, thank you! One last question though: in \.(?=(?:(?!<mark>).?)+<\/mark>), the <mark> is after the dot (.). How does it work?


Ed Morton · Accepted Answer · 2025-03-25 11:29:50Z

2

Using any awk in any shell on all Unix boxes:

$ awk '
BEGIN {
    FS = OFS = "</mark>"
}
{
    for (i = 1; i <= NF; i++) {
        if ( match($i, /<mark>.*/) ) {
            tgt = substr($i, RSTART, RLENGTH)
            gsub(/\./, "<wbr />", tgt)
            $i = substr($i, 1, RSTART - 1) tgt
        }
    }
    print
}
' file
<mark>foo<wbr />bar<wbr />baz</mark> blah. blah. blah. <mark>abc<wbr />def<wbr />ghi</mark>
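
To see why the script above works, it may help to look at how awk splits a line when the field separator is "</mark>". The following is a small sketch with a made-up input line. Each field that contains "<mark>" ends with the inside of one marked segment, which is the part the script passes to gsub().

```bash
fields=$(
    printf '%s\n' '<mark>a.b</mark> x. <mark>c.d</mark>' |
    awk -F'</mark>' '{ for (i = 1; i <= NF; i++) printf "field %d: [%s]\n", i, $i }'
)
printf '%s\n' "${fields}"
# field 1: [<mark>a.b]
# field 2: [ x. <mark>c.d]
# field 3: []
```

Setting OFS to the same string, as the script does, puts the "</mark>" separators back when the modified record is printed.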

edited Mar 25 at 11:29

answered Mar 25 at 11:24

Ed Morton



pjh · Accepted Answer · 2025-03-25 19:49:52Z

This Shellcheck-clean pure Bash code updates the value of the variable my_str:

tmp=$my_str
my_str=
while [[ $tmp =~ ^(.*)(\<mark\>.*\</mark\>)(.*)$ ]]; do
    tmp=${BASH_REMATCH[1]}
    my_str=${BASH_REMATCH[2]//./<wbr />}${BASH_REMATCH[3]}${my_str}
done
my_str=${tmp}${my_str}

The code makes no assumptions about characters between <mark> and </mark>. (E.g. < is OK.)
<mark>...</mark> substrings are processed right-to-left within the input string to work around the fact that matching of regular expressions in Bash is always greedy.
See mklement0's excellent answer to How do I use a regex in a shell script? for information about regular expressions in Bash.
See Substituting part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of the expansion mechanism (${var//old/new}) used in ${BASH_REMATCH[2]//./<wbr />}.
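
The right-to-left behaviour can be seen by running the regular expression once on a short string. This is a minimal sketch. The sample string and the variable names s, re, before, segment and after are made up for illustration, and the pattern is kept in a variable so that < and > need no shell escaping.

```bash
s='a<mark>1.1</mark>b<mark>2.2</mark>c'
re='^(.*)(<mark>.*</mark>)(.*)$'

if [[ $s =~ $re ]]; then
    before=${BASH_REMATCH[1]}     # greedy: swallows everything up to the LAST <mark>
    segment=${BASH_REMATCH[2]}    # so this is the last marked segment
    after=${BASH_REMATCH[3]}
fi

printf 'before:  %s\nsegment: %s\nafter:   %s\n' "$before" "$segment" "$after"
# before:  a<mark>1.1</mark>b
# segment: <mark>2.2</mark>
# after:   c
```

Each pass of the loop in the answer then continues with the "before" part, which is why the result is assembled from the end of the string towards the start.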
