How do I make it highlight every word in the ARRAY variable, not just the first element, in the text from the peptides file? Is there some sort of delimiter I can use? I want to feed in a paragraph and have it highlight every word from a database of words, but I can't get it to work.

Textfile:

bad
wait
fu
too

Script:

#!/bin/bash

#this is the array how do I make it get all the words not just wait...
ARRAY=(wait bad)

while read line ; do (echo $line | sed ''/$ARRAY/s//`printf "\033[32m$ARRAY\033[0m"`/'' ); done < peptides.txt

This is what I turned it into, but I cannot get the pipeline to keep going past a grep that finds no match. I don't understand the details behind it, but this is the version I got working. It only works with these kazoo words at the start of each .db file; I added them so that the pipes didn't exit. For some reason, not finding a match stopped the pipe with some exit code?

kazooa kazoob kazooc kazood kazooe kazoof kazoog kazooh kazooi kazooj kazook kazool kazoom kazooz

Script:

#!/bin/bash
set -euo pipefail

#echo "anything" | { grep e || test $? = 1; } | { grep e2 || test $? = 1; }
shopt -s expand_aliases

alias egrep-green="  GREP_COLOR='1;32' grep -E --color=always"
alias egrep-yellow-underline=" GREP_COLOR='4;33' grep -E --color=always"
alias egrep-yellow-highlight=" GREP_COLOR='7;33' grep -E --color=always"
alias egrep-yellow-blinking=" GREP_COLOR='5;33' grep -E --color=always"
alias egrep-yellow-italic=" GREP_COLOR='3;33' grep -E --color=always"
alias egrep-yellow=" GREP_COLOR='1;33' grep -E --color=always"
alias egrep-red="    GREP_COLOR='1;31' grep -E --color=always"
alias egrep-red-highlight="    GREP_COLOR='7;31' grep -E --color=always"
alias egrep-red-blinking="    GREP_COLOR='5;31' grep -E --color=always"
alias egrep-blue-italic="   GREP_COLOR='3;34' grep -E --color=always"
alias egrep-cyan-italic="   GREP_COLOR='3;36' grep -E --color=always"
alias egrep-magenta-italic="  GREP_COLOR='3;35' grep -E --color=always"
#kaplan traffic light system
#kaplan light yellow 
#slow down most to least
#comparison most
#contrast middle
#opposition middle
#sequences least yellow

#yellow/red authorkeywords

#kaplan go green 
#continuation

#kaplan stop red 
#evidence
#conclusion
#refutation



#&& { echo $?; echo Found ;} || { echo $?; echo Not found ;}
clear;
echo "continuation" | egrep-green continuation; echo "evidence" | egrep-red-highlight evidence; echo "conclusion" | egrep-red conclusion; echo "refutation" | egrep-red-blinking refutation; echo "comparison" | egrep-yellow-highlight comparison; \
echo "contrast"| egrep-yellow-underline contrast; echo "opposition" | egrep-yellow-underline opposition; echo "sequence" | egrep-yellow sequence; echo "positive" | egrep-yellow-italic positive; echo "negative" | egrep-blue-italic negative; \
echo "extreme" | egrep-cyan-italic extreme; echo "moderating" | egrep-magenta-italic moderating

cat frog.txt | egrep-green -z -w -f continuation.db \
| egrep-red-highlight -z -w -f evidence.db  \
| egrep-red -z -w -f conclusion.db \
| egrep-red-blinking -z -w -f refutation.db \
| egrep-yellow-highlight -z -w -f comparison.db \
| egrep-yellow-underline -z -w -f contrast.db \
| egrep-yellow-underline -z -w -f opposition.db  \
| egrep-yellow -z -w -f sequence.db \
| egrep-yellow-italic -z -w -f positive.db \
| egrep-blue-italic -z -w -f negative.db \
| egrep-cyan-italic -z -w -f extreme.db \
| egrep-magenta-italic -z -w -f moderating.db \
| fold -w 240 -s


#continuation.db
#contrast.db
#opposition.db
#sequence.db
#comparison.db
#conclusion.db  
#refutation.db  
#evidence.db
#positive.db
#negative.db
#extreme.db
#moderating.db
5 Comments

  • What are you trying to do with the array? You can loop over it with for word in "${ARRAY[@]}" Commented Jun 20, 2024 at 20:16
  • I suspect what you really want is something like sed -r 's/bad|wait/\x1b[32m&\x1b[0m/'. & is replaced with whatever matched the regexp. Commented Jun 20, 2024 at 20:18
  • See stackoverflow.com/questions/53839253/… for how to convert an array into a delimited string. Commented Jun 20, 2024 at 20:18
  • Do you want to match on whole words only, or will you allow matches on a substring of a word? For example, would you expect too to match on metoo or tools? Commented Jun 20, 2024 at 20:33
  • As with the proposed grep-based answers, sed can work on an entire file, so for this particular example there's no need for a while read loop that processes lines one at a time; you could replace the entire loop with the more efficient sed 'your-final-sed-script' peptides.txt. Also, the repeated (echo ... | sed ...) requires spawning two new subshells for each pass through the loop, while a single sed 'your-final-sed-script' peptides.txt spawns no subshells (for any appreciably sized file you should notice a huge improvement in performance). Commented Jun 20, 2024 at 21:34
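
The suggestions in these comments can be combined into a single sed call. The following is only a minimal sketch: it assumes a sed that supports -E (GNU and BSD sed both do) and that the words contain no regex metacharacters or slashes. The function name highlight_words is made up for illustration.

```bash
#!/usr/bin/env bash

# highlight_words WORD...: read text on stdin and wrap every occurrence
# of any WORD in green. (Hypothetical helper name, not from the question.)
highlight_words() {
    local pattern esc=$'\033'
    # Join the arguments with "|" to form one alternation, e.g. wait|bad
    pattern=$(IFS='|'; printf '%s' "$*")
    # "&" in the replacement stands for whatever text matched.
    sed -E "s/${pattern}/${esc}[32m&${esc}[0m/g"
}

words=(wait bad)
printf 'bad\nwait\nfu\ntoo\n' | highlight_words "${words[@]}"
```

Note that this matches substrings as well, so wait inside waiter would also be colored; the whole-word question raised above still applies.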

4 Answers

2

This depends on which Unix you have (and which version of grep), but assuming it is GNU grep (it probably is), I think what you really want is already covered by grep's options.

words.txt

one
two
three
^

file.txt

This should highlight the three words listed
in words.txt, not just the first one.
grep -f words.txt --color file.txt

Note the ^ line in the words file: it ensures that even lines that do not contain any of the words are displayed (there might be another grep option that does the same thing, but I am not aware of one).

EDIT

I realize that I misunderstood the role of your text file. It isn't a list of words but the text itself (I thought you were trying to first read the list from the file into an array, and then use that array to highlight). But that is easily solved:

array=(one two three)
grep --color -f <(printf "%s\n" "${array[@]}" ^) file.txt

# Variant, not using <() (relies on word splitting, so words must not contain spaces)
grep --color $(printf -- "-e %s " "${array[@]}" ^) file.txt

2 Comments

Using printf is much cleaner than my attempt grep -Ef <( sed 's/ /|/g; s/$/|^/' <<< "${a[@]}") --color. Please use lowercase for non-system variables.
@WalterA You're right, I should have used array instead of ARRAY (I recycled the OP variable name). Corrected.
0

Assumptions/understandings:

  • OP wants to match on whole words and not a substring of a word, ie, too will not match on metoo nor tool
  • OP wishes to display all lines regardless of whether there's a match

Sample text file:

$ cat textfile
mybad bad really_bad
waiter nowait wait water
foo fu tofu fufu full
two tu to too metoo tool

Words to search for:

############
# separate lines in a file

$ cat words.db
wait
bad

############
# as an array

ARRAY=( wait bad )

One idea using grep:

############
# match on lines from words.db

grep --color -z -w -f words.db textfile

############
# match on entries in ARRAY[]

grep --color -z -w -f <(printf "%s\n" "${ARRAY[@]}") textfile

Where:

  • -z is a trick to force all lines to be passed to stdout
  • -w match only complete words
  • -f read patterns from following file (or process substitution in the case of <(printf ...))
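
The effect of -z described in the list above can be seen by running the same search with and without it. This is a small sketch assuming GNU grep; the two-line sample text is made up for illustration.

```bash
#!/usr/bin/env bash

# Two-line sample: only the first line contains the word "bad".
sample=$'a bad day\nnothing here'

# Without -z grep works line by line and drops the non-matching line.
without_z=$(grep -w -e bad <<<"$sample")

# With -z the whole input is one NUL-terminated record, so a single match
# selects all of it. tr removes the trailing NUL that grep -z appends.
with_z=$(grep -z -w -e bad <<<"$sample" | tr -d '\0')

printf 'without -z:\n%s\n\nwith -z:\n%s\n' "$without_z" "$with_z"
```

One consequence of the trick: if no pattern matches anywhere in the input, grep -z selects nothing, prints nothing, and exits with status 1.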

Both of these generate:

[image: terminal output showing all four lines of textfile with the whole words bad and wait highlighted]

OP's sample sed script appears to want the output highlighted in green. We can modify grep's default color scheme by modifying the GREP_COLORS variable and in particular the ms= substring (see man grep, section titled GREP_COLORS for details).

############
# default: red == 31

GREP_COLORS='ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36'
             ^^^^^^^^

############
# desired: green = 32

GREP_COLORS='ms=01;32:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36'
             ^^^^^^^^

With this change the grep commands generate the following output:

[image: the same output with the whole words bad and wait highlighted in green]
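
GREP_COLORS does not have to be exported or spelled out in full. As a sketch (assuming GNU grep), it can be set for a single command with only the ms= entry overridden; the sample text reuses two lines of the textfile above.

```bash
#!/usr/bin/env bash

# Two lines borrowed from the sample textfile.
sample=$'mybad bad really_bad\nwaiter nowait wait water'

# Only ms= (the colour of selected matches) is overridden, and only for
# this one command; the other GREP_COLORS entries keep their defaults.
green=$(GREP_COLORS='ms=01;32' grep --color=always -z -w -e wait -e bad <<<"$sample" | tr -d '\0')

printf '%s\n' "$green"
```

--color=always is used here because the output is captured rather than written straight to a terminal.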

If OP wants to match on all strings (not just complete words) we can remove the -w flag to generate:

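[image: the same output with every occurrence of bad and wait highlighted, including inside mybad, really_bad, waiter and nowait]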

4 Comments

THANK YOU! You gave me the best answer! This helps a lot with my critical reasoning analysis section for my MCAT examination! There are many words to look for, and I have to become adept at identification. They only allow a single-color highlight tool, but recommend a multicolored stop light approach in the Kaplan. This will help me train! It's a very hard examination.
cat frog.txt | egrep-green -z -w -f continuation.db | egrep-red-highlight -z -w -f evidence.db | egrep-red -z -w -f conclusion.db | egrep-red-blinking -z -w -f refutation.db | egrep-yellow-highlight -z -w -f comparison.db | egrep-yellow-underline -z -w -f contrast.db | egrep-yellow-underline -z -w -f opposition.db | egrep-yellow -z -w -f sequence.db So I got it all working, but when one of the databases isn't filled, the output doesn't appear. Is there some logical operator I could use? Not working, for example, with evidence.db containing: because of, since, if, for example, why, the reason is
I think it's something to do with || true or the exit code when no match is found. I can't figure out how to get it when they don't appear without exiting in the pipes.
I suggest you ask a new question to address what looks (to me) like a new/different issue; in particular it appears you wish to apply multiple/different colors based on different sets of word matches
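
For the follow-up problem in these comments, one possible workaround is to let each stage fall back to passing the text through unchanged when its word list matches nothing. This is only a sketch under assumptions: GNU grep, text small enough to buffer in a shell variable, and a made-up helper name hl. It sets GREP_COLORS rather than the GREP_COLOR variable used in the aliases.

```bash
#!/usr/bin/env bash

# hl COLOR DBFILE: colour whole-word matches from DBFILE in the text on
# stdin. If nothing matches (or DBFILE is empty) the text passes through
# unchanged. (hl is a hypothetical helper name.)
hl() {
    local color=$1 db=$2 text
    text=$(cat)    # buffer stdin so it can be tested first, then coloured
    if grep -q -w -f "$db" <<<"$text"; then
        GREP_COLORS="ms=$color" grep --color=always -z -w -f "$db" <<<"$text" | tr -d '\0'
    else
        printf '%s\n' "$text"
    fi
}

# Usage sketch with file names from the pipeline above:
# hl '1;32' continuation.db < frog.txt | hl '7;31' evidence.db | hl '1;31' conclusion.db
```

Because grep's "no match" status is consumed by the if, no stage of the pipeline reports a failure, so this should also behave under set -e and pipefail without padding the .db files.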
0

You can also do the following, which is similar to the other answers but just modifies the array expansion:

array=(one two three)
grep --color ${array[@]/#/-e } file.txt

which is shorter, assuming the words contain no spaces.
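
If the words might contain spaces, a slightly longer variant is to build the -e options in a second array. This is a sketch with made-up sample words and input, not part of the answer above.

```bash
#!/usr/bin/env bash

array=(one two "three four")

# Build the option list one element at a time so that each pattern stays
# a single argument even when it contains spaces.
args=()
for word in "${array[@]}"; do
    args+=(-e "$word")
done

# args now holds: -e one -e two -e "three four"
matches=$(printf '%s\n' 'one fish' 'three four five' 'six' | grep "${args[@]}")
printf '%s\n' "$matches"
```

With a real file the last step would be grep --color "${args[@]}" file.txt.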

0

Borrowing @markp-fuso's sample input and adding foreground and background colors for each word in words.db:

$ cat textfile
mybad bad really_bad
waiter nowait wait water
foo fu tofu fufu full
two tu to too metoo tool

$ head words.db
wait green/yellow
bad white/red
tofu blue/
foo /cyan

You might be interested in something like this to control which colors are displayed for which words:

$ cat tst.sh
#!/usr/bin/env bash

awk '
    BEGIN {
        split("black red green yellow blue magenta cyan white",tputColors)
        for (i in tputColors) {
            colorName = tputColors[i]
            colorNr = i-1

            cmd = "tput setaf " colorNr
            fgEscSeq[colorName] = ( (cmd | getline escSeq) > 0 ? escSeq : "<" colorName ">" )
            close(cmd)

            cmd = "tput setab " colorNr
            bgEscSeq[colorName] = ( (cmd | getline escSeq) > 0 ? escSeq : "<" colorName ">" )
            close(cmd)
        }

        cmd = "tput sgr0"
        colorOff = ( (cmd | getline escSeq) > 0 ? escSeq : "<sgr0>" )
        close(cmd)
    }
    NR == FNR {
        split($2,colors,"/")
        fgColors[$1] = fgEscSeq[colors[1]]
        bgColors[$1] = bgEscSeq[colors[2]]
        next
    }
    {
        sep = ""
        for ( i=1; i<=NF; i++ ) {
            fgColor = ( $i in fgColors ? fgColors[$i] : "" )
            bgColor = ( $i in bgColors ? bgColors[$i] : "" )
            printf "%s%s%s%s%s", sep, fgColor, bgColor, $i, colorOff
            sep = OFS
        }
        print ""
    }
' words.db textfile

$ ./tst.sh
mybad bad really_bad
waiter nowait wait water
foo fu tofu fufu full
two tu to too metoo tool

[image: terminal output of ./tst.sh with wait, bad, tofu and foo shown in the colors assigned in words.db]

The above does full-word literal string matching (see How do I find the text that matches a pattern? for various pattern matching alternatives) and assumes you don't care about replacing white space between fields with a blank char. If that's not what you want then ask a new question with sample input/output demonstrating whatever it is you do want.
