apply sed only to the part of the file after last match in loop - shell / bash [closed]

Question

Closed. This question needs details or clarity. It is not currently accepting answers.

Closed 2 years ago.

I have a couple of large files (~1 GB each) with the following structure:

fooA iug9wa
fooA lauie
fooA nwgoieb
fooB wilgb
fooB rqgebepu
fooB ifbqeiu
...
fooN ibfiygb
fooN yvsiy
fooN aeviu

In the shell, I would like to replace each fooX (a name made of letters, digits, "." and "_"; all of them are listed in foo.list) with a sequential number from 1 to N.

I've used:

nfoos=$(wc -l < foo.list)

for i in $(seq 1 $nfoos)
do
    currentfoo=$(sed "${i}q;d" foo.list)
    sed -i "s/${currentfoo}/$i/g" file1
    sed -i "s/${currentfoo}/$i/g" file2
    sed -i "s/${currentfoo}/$i/g" filen
done

However, with large files this takes forever. Since each consecutive fooX always appears later in the files than foo(X-1), I thought I could make sed search only the part of fileX after the previous match, so that each foo has less text to search. I've been trying to use labels and some multiline approaches, but the syntax keeps beating me here.

Does anyone know how to make it work? (Doesn't necessarily have to use sed, but would be great if it worked in basic shell in Bash.)

Appreciate any help. And if you do, please explain each function/option/variable used so that I can figure out where I had been messing up.

Counting lines or enumerating line numbers so I can loop over them - why is this an anti-pattern?
– tripleee, Commented Nov 19, 2023 at 15:10

Walter A · Accepted Answer · 2023-11-19 15:45:23Z

2

You can use awk.
The first block of the awk command below fills the array a (each name from foo.list mapped to its line number); the second block replaces the first field of each line with that number.

awk 'NR==FNR { a[$1]=NR; next} $1 in a{$1=a[$1]; print}' foo.list file1
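To see what the two blocks do, here is a small worked run. It is only a sketch: the scratch directory, the sample names (foo.A, foo_B, fooC) and the `result` variable are made up for illustration and are not from the question.

```shell
# Made-up sample data in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' foo.A foo_B fooC > "$workdir/foo.list"
printf '%s\n' 'foo.A iug9wa' 'foo.A lauie' 'foo_B wilgb' 'fooC ibfiygb' > "$workdir/file1"

# NR==FNR is true only while awk reads the first file (foo.list):
#   a[$1]=NR stores each name with its line number, next skips to the next line.
# For the second file, a line whose first field is a key of a gets that field
# replaced by the stored number and is printed.
result=$(awk 'NR==FNR { a[$1]=NR; next } $1 in a { $1=a[$1]; print }' \
    "$workdir/foo.list" "$workdir/file1")
printf '%s\n' "$result"
rm -rf "$workdir"
```

This prints `1 iug9wa`, `1 lauie`, `2 wilgb` and `3 ibfiygb`, one per line. The names are compared as plain strings, so the "." in foo.A is not treated as a regex wildcard.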

If this output is what you want, you can loop over your files:

for f in file1 file2 filen; do
  awk 'NR==FNR { a[$1]=NR; next} $1 in a{$1=a[$1]; print}' foo.list "${f}" > "${f}.tmp" &&
  mv "${f}.tmp" "${f}"
done

The && makes sure the new file replaces the original only when awk exited successfully.
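The same idea can be wrapped in a function that also removes the temporary file when awk fails. This is a minimal sketch, not part of the answer as posted: `renumber_in_place` is a hypothetical helper name, and the temporary file naming with `$$` is an assumption.

```shell
# renumber_in_place is a hypothetical helper name used only for this sketch.
renumber_in_place() {
    list=$1
    shift
    for f in "$@"; do
        tmp="${f}.tmp.$$"    # $$ (the shell's PID) makes a name clash less likely
        if awk 'NR==FNR { a[$1]=NR; next } $1 in a { $1=a[$1]; print }' "$list" "$f" > "$tmp"
        then
            mv "$tmp" "$f"   # awk succeeded: replace the original
        else
            rm -f "$tmp"     # awk failed: drop the partial output, keep the original
            return 1
        fi
    done
}

# Made-up sample data in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' fooA fooB > "$workdir/foo.list"
printf '%s\n' 'fooA iug9wa' 'fooB wilgb' > "$workdir/file1"
renumber_in_place "$workdir/foo.list" "$workdir/file1"
result=$(cat "$workdir/file1")
leftovers=$(ls "$workdir" | grep -c 'tmp')
printf '%s\n' "$result"
rm -rf "$workdir"
```

Usage would be `renumber_in_place foo.list file1 file2 filen`. Each file is read once, however many names foo.list contains.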

answered Nov 19, 2023 at 15:45

Walter A


2 Comments

Walter A Over a year ago

Glad I could help. Next time please add more example cases, such as a small foo.list and the result you want from the given input. Example input with dots and underscores might be relevant.

Walter A Over a year ago

This solution removes lines from the input file that don't have a corresponding entry in your foo.list. If you want to keep those lines and replace their first field with something like foo0, add an if statement to the awk. Nice training!
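One possible shape for that if statement is sketched below. The awk variable `unknown`, its value foo0, and the sample data are assumptions for illustration; pick whatever placeholder suits your data.

```shell
# Made-up sample data; zzz is deliberately missing from foo.list.
workdir=$(mktemp -d)
printf '%s\n' a.1 b_2 > "$workdir/foo.list"
printf '%s\n' 'a.1 x' 'zzz y' 'b_2 z' > "$workdir/file1"

# -v unknown=foo0 sets the placeholder used for names that are not in foo.list.
# Every line is printed; only the value written into the first field differs.
result=$(awk -v unknown=foo0 '
    NR==FNR { a[$1]=NR; next }
    {
        if ($1 in a) $1 = a[$1]
        else         $1 = unknown
        print
    }' "$workdir/foo.list" "$workdir/file1")
printf '%s\n' "$result"
rm -rf "$workdir"
```

With this sample the output is `1 x`, `foo0 y` and `2 z`, so the unmatched line is kept instead of dropped.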

Verpous · Accepted Answer · 2023-11-19 14:45:40Z

0

Two optimizations:

Use awk to generate a sed script which does all the replacements in a single run.
Run sed -i with N file arguments instead of running sed N times with 1 file argument each.

awk '{ print "s/" $0 "/" NR "/g;" }' foo.list > temp_script
sed -i -f temp_script $(cat foo.list)

Now sed runs only once, instead of once per name per file.

edited Nov 19, 2023 at 14:45

answered Nov 19, 2023 at 14:38

Verpous

4 Comments

Walter A Over a year ago

OP wrote: fooX contains letters, numbers "." and "_". Values might be val. and valx, and your sed command will then match valx with the pattern for val.. You should replace the dots with [.] or \..
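A sketch of how the generated script could escape the dots, using the val. / valx case from this comment. It goes beyond the comment in two ways, both assumptions: each rule is anchored to the start of the line and requires the following space (the names are in the first column in the question's sample), and a `t` command ends rule processing for a line once a substitution has been made.

```shell
# Made-up sample data built from the val. / valx example.
workdir=$(mktemp -d)
printf '%s\n' val. valx > "$workdir/foo.list"
printf '%s\n' 'val. aaa' 'valx bbb' > "$workdir/file1"

awk '{
    name = $0
    gsub(/[.]/, "[.]", name)          # make each dot match only a literal dot
    print "s/^" name " /" NR " /"     # anchor at line start, require the following space
    print "t"                         # skip the remaining rules once one has matched
}' "$workdir/foo.list" > "$workdir/temp_script"

# Without -i the result goes to stdout, which is handy for checking first.
result=$(sed -f "$workdir/temp_script" "$workdir/file1")
printf '%s\n' "$result"
rm -rf "$workdir"
```

This prints `1 aaa` and `2 bbb`. With the unescaped rule `s/val./1/g`, the second line would have become `1 bbb` instead.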

Walter A Over a year ago

I think you want to replace sed -i -f temp_script $(cat foo.list) with sed -i -f temp_script file1 file2 filen.

MartynaM Over a year ago

Thanks @WalterA! That is exactly the problem I ran into. Replacing the "_"s kind of misses the whole point, as I would have to replace them in the large files too, and this would again take a lot of time. Regarding the second comment: yes, I noticed this too, but I got the point. I'll fix it in the reply.

Ed Morton Over a year ago

You never need to have awk generate a sed script and then call sed to execute it, just do whatever you want to do in the one call to awk. What you show will fail for various input values, e.g. if foo.list contains abc.e then the sed command will replace abcae, abc5e, abcXe, etc.
