I'm trying to find the "right" way to read files line-by-line.
I have been using for line in $(cat "$FILE"); do for a while, and I really like how clear it is.
I know that while IFS= read -r line; do ... done < "$FILE" should be more efficient (no subshell is spawned), but I don't like that the file is specified at the end of the loop. Also, when testing it, I ran into some weird issues with variable scope.
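For what it's worth, I suspect the scope issue I ran into is the usual pipeline-vs-redirection subshell behaviour; a minimal sketch of the difference I mean, using a throwaway counter:

#!/bin/bash
count=0
# Piping into the loop runs the while body in a subshell,
# so the change to count is lost once the loop ends.
printf 'a\nb\nc\n' | while IFS= read -r line; do
    count=$((count + 1))
done
echo "after pipe: $count"       # prints 0

count=0
# Redirecting into the loop keeps it in the current shell,
# so count keeps its value.
while IFS= read -r line; do
    count=$((count + 1))
done < <(printf 'a\nb\nc\n')
echo "after redirect: $count"   # prints 3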
Recently I found out about mapfile -t LINES < $FILE, which is supposed to be highly optimized and looks cleaner than while read, but my tests show that it is only faster on very small files.
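For reference, this is how I am using it: slurp the whole file into an array (so it all sits in memory at once) and loop over that; quoting $FILE here is just for safety:

# Read every line into the LINES array; -t strips the trailing newline from each element.
mapfile -t LINES < "$FILE"
echo "read ${#LINES[@]} lines"
# The whole file is now held in memory as one array.
for line in "${LINES[@]}"; do
    printf '%s\n' "$line"
done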
So, my question is: does it make any sense to use other methods rather than for line in $(cat "$FILE"); do?
The only scenario I can imagine where it would be slower is reading thousands of small files in a loop. In other cases, the difference seems negligible, while the other methods sacrifice readability.
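To make that scenario concrete, this is the shape of loop I have in mind, where every file costs an extra cat fork (the directory name is just a placeholder):

IFS=$'\n'
# Hypothetical directory holding thousands of small files.
for f in ./many-small-files/*.txt; do
    # Each iteration forks a separate cat, which is where the cost would add up.
    for line in $(cat "$f"); do
        : "$line"   # placeholder for real per-line work
    done
done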
I took files of various sizes and used the script below to compare.
################ test-med.txt (140698 lines) ###################
for line in $(cat "$FILE"); do
real 0m0,924s
user 0m0,812s
sys 0m0,128s
while IFS= read -r line; do
real 0m1,328s
user 0m1,113s
sys 0m0,215s
mapfile -t LINES < $FILE
real 0m1,240s
user 0m1,129s
sys 0m0,111s
################ test-small.txt (180 lines) ###################
for line in $(cat "$FILE"); do
real 0m0,050s
user 0m0,001s
sys 0m0,049s
while IFS= read -r line; do
real 0m0,001s
user 0m0,001s
sys 0m0,000s
mapfile -t LINES < $FILE
real 0m0,011s
user 0m0,006s
sys 0m0,005s
################ test-tiny.txt (32 lines) ###################
for line in $(cat "$FILE"); do
real 0m0,050s
user 0m0,000s
sys 0m0,050s
while IFS= read -r line; do
real 0m0,000s
user 0m0,000s
sys 0m0,000s
mapfile -t LINES < $FILE
real 0m0,000s
user 0m0,000s
sys 0m0,000s
Comparison script used:
#!/bin/bash
_t1() {
IFS=$'\n'
for line in $(cat "$FILE"); do
echo "$line"
done
}
_t2() {
while IFS= read -r line; do
echo "$line"
done < "$FILE"
}
_t3() {
mapfile -t LINES < $FILE
for line in "${LINES[@]}"; do
echo $line
done
}
for FILE in $(ls *.txt); do
CNT=$(cat $FILE | wc -l)
echo "################ $FILE ($CNT lines) ###################"
echo 'for line in $(cat "$FILE"); do'
time _t1 >/dev/null
echo 'while IFS= read -r line; do'
time _t2 >/dev/null
echo 'mapfile -t LINES < $FILE'
time _t3 >/dev/null
done
Comments:

The for line in $(cat "$FILE") loop and the while IFS= read -r line loop do different things, so a performance comparison by itself doesn't make sense. In general, if you care about performance, you probably shouldn't use a shell, and definitely not Bash. Also, there's never any reason to use $(ls *.txt); it can only break things.

Don't use for line in $(cat "$FILE"); do, as it breaks when the input contains spaces and/or globbing metacharacters, and it would skip any blank lines. Get a robust solution first and then think about performance.

If all you need is to print the lines, cat should probably do just fine; also, your argument is a textbook classic example of why you DRLWF (Don't Read Lines With For).

Your test of mapfile is leading you to the wrong conclusion. If you want to compare how fast mapfile works to how fast an equivalent read loop works, you should compare only mapfile -t LINES < "$FILE" to while IFS= read -r line; do LINES+=( "$line" ); done, i.e. how fast you can populate an array from file contents, not how fast you can print the contents of a file, which is what your current code measures. If you don't need an array, then you wouldn't use mapfile (aka readarray), as it exists to populate an array.

Putting "bash" and "performance" in the same question is barking up the wrong tree. Interpreted languages are inherently slower.
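To illustrate the comparison the mapfile comment describes (timing only the array-population step on each side), here is a minimal sketch; the file name is a placeholder:

#!/bin/bash
FILE=test-med.txt   # placeholder: any reasonably large text file

# Populate an array with mapfile.
_fill_mapfile() {
    mapfile -t LINES < "$FILE"
}

# Populate an array with an equivalent read loop.
_fill_readloop() {
    LINES=()
    while IFS= read -r line; do
        LINES+=( "$line" )
    done < "$FILE"
}

echo 'mapfile -t LINES < "$FILE"'
time _fill_mapfile
echo 'while IFS= read -r line; do LINES+=( "$line" ); done'
time _fill_readloop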