Revisions to wc -c not working in a loop in the script [duplicate]

added 597 characters in body

edited Jun 16, 2018 at 21:21

With bash, you can use the globstar shell option. When it is set, the ** glob pattern matches all pathnames beneath the given directory, so the script does not have to walk the directory structure explicitly:

#!/bin/bash

dir="$1"
size="$2"

shopt -s globstar

for pathname in "$dir"/**; do
    [ ! -f "$pathname" ] && continue

    filesize=$( wc -c <"$pathname" )

    if [ "$filesize" -lt "$size" ]; then
        printf 'Found %s, size is %d\n' "$pathname" "$filesize"
    fi
done

Rather than writing your own directory tree walker, you may instead use find. The following find command does what your code tries to do:

find /home/161161 -type f -size -100c

One difference between the explicit directory walker and the find variation is that find (when used as above) does not follow symbolic links, while the shell function above it does, because its -f and -d tests are also true for symbolic links to files and directories. This could cause directory loops to be traversed infinitely, or the same data to be counted more than once. Another difference is that the * and ** patterns skip names starting with a dot by default, whereas find includes them.

Using find, the contents of the files are not read to figure out the file size. Instead, an lstat() library call is made to query the filesystem for the file's size. This is usually much faster than running wc -c on each file.

On most (but not all) Unices, you may also use the command line utility stat to get the file size. See the manual for this utility on your system for how to use it (it works differently on Linux and BSD).
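
As a rough sketch of that difference: GNU stat (Linux) prints the size in bytes with -c %s, while BSD stat (including macOS) uses -f %z. The helper below, whose name file_size is made up for this illustration, tries the GNU form first and falls back to the BSD form. It assumes one of these two variants is installed, so check your system's manual before relying on it.

```shell
#!/bin/sh

# file_size: hypothetical helper that prints the size of a file in bytes
# without reading the file's contents.
file_size () {
    # GNU stat: -c takes a format string, %s is the size in bytes.
    if stat -c %s -- "$1" 2>/dev/null; then
        return 0
    fi
    # BSD stat: -f takes a format string, %z is the size in bytes.
    stat -f %z -- "$1"
}
```

With this in place, filesize=$( file_size "$pathname" ) could stand in for the wc -c call in the loop.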

added 9 characters in body

edited Jun 16, 2018 at 21:12

Kusalananda ♦

The issue is that your code does cd .. before it is done with all the files in a directory. In general, you don't have to cd into directories to get filenames from them, and going back and forth between directories in loops can be confusing. It can also lead to strange issues if you redirect output to filenames with relative paths, etc. inside the loops. In this script, you also would not know where (in what directory) a file was found, because you always look in the current directory.
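
If a cd really is needed in some other script, one common way to keep it from leaking into the rest of the loop is to do it in a subshell. The following is only a sketch of that idea, not what the script in this answer does; the function name list_names is invented for the example.

```shell
#!/bin/sh

# list_names: hypothetical helper that prints the names found in a directory.
list_names () {
    # The parentheses start a subshell. The cd only affects the subshell,
    # so the caller's working directory is unchanged afterwards and no
    # "cd .." is needed.
    (
        cd -- "$1" || exit 1
        for name in *; do
            # Skip the unexpanded pattern if the directory is empty.
            if [ -e "$name" ] || [ -L "$name" ]; then
                printf '%s\n' "$name"
            fi
        done
    )
}
```

After list_names some/dir returns, the current directory is still the one the caller started in.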

See also:

Why does my shell script choke on whitespace or other special characters?

Why *not* parse `ls` (and what to do instead)?

Why is printf better than echo?

Have backticks (i.e. `cmd`) in *sh shells been deprecated?

added 331 characters in body

edited Jun 15, 2018 at 6:24

Kusalananda ♦

The issue is that your code does cd .. before it is done with all the files in a directory. In general, you don't have to cd into directories to get filenames from them, and going back and forth between directories in loops can be confusing. It can also lead to strange issues if you redirect output to filenames with relative paths, etc. inside the loops.

This can be fixed by not using cd, and also by not using ls, which allows the script to work with filenames that have spaces and other unusual characters in them:

#!/bin/sh

find_smaller () {
    dir=$1
    size=$2

    for pathname in "$dir"/*; do
        if [ -f "$pathname" ]; then
            # this is a regular file (or a symbolic link to one), test its size
            filesize=$( wc -c <"$pathname" )
            if [ "$filesize" -lt "$size" ]; then
                printf 'Found %s, size is %d\n' "$pathname" "$filesize"
            fi
        elif [ -d "$pathname" ]; then
            # this is a directory (or a symbolic link to one), recurse
            printf 'Entering %s\n' "$pathname"
            find_smaller "$pathname" "$size"
        fi
    done
}

find_smaller "$@"

In the code above, $pathname is not just the filename of the current file or directory that we're looking at; it also includes the path leading to it, starting from the directory given to the script.

Note also the quoting of all variable expansions. Without quoting the $pathname variable, for example, the shell would split the value on whitespace and would also perform filename globbing if a filename contained characters like * or ?.
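
A small illustration of what the quoting changes. The helper count_args and the sample pathname are made up for this sketch, and the default value of IFS is assumed:

```shell
#!/bin/sh

# count_args: hypothetical helper that prints how many arguments it received.
count_args () {
    printf '%s\n' "$#"
}

# A pathname containing a space (no glob characters, to keep the demo predictable).
pathname='docs/my file.txt'

# Unquoted: the shell splits the value on the space, giving two arguments.
unquoted=$( count_args $pathname )

# Quoted: the value is passed on as a single argument.
quoted=$( count_args "$pathname" )

printf 'unquoted: %s argument(s), quoted: %s argument(s)\n' "$unquoted" "$quoted"
```

A command such as wc -c given the unquoted form would look for two files, docs/my and file.txt, rather than the one that exists.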

Rather than writing your own directory tree walker, you may instead use find. The following find command does what your code tries to do:

find /home/161161 -type f -size -100c

As a script:

#!/bin/sh
dir=$1
size=$2
find "$dir" -type f -size -"$size"c

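One difference between the explicit directory walker and the find variation is that find (when used as above) does not follow symbolic links, while the shell function above it does, because its -f and -d tests are also true for symbolic links to files and directories. This could cause directory loops to be traversed infinitely, or the same data to be counted more than once. Another difference is that the * pattern skips names starting with a dot by default, whereas find includes them.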
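
If the find variation should also print the size of each match in the same format as the loop, one portable option is to let find pass the matches to a small in-line sh script. This is a sketch rather than part of the script above; report_smaller is a made-up name, and wc -c is reused here only to keep the output identical to the loop version.

```shell
#!/bin/sh

# report_smaller: hypothetical helper.
# $1 is the directory to search, $2 is the size limit in bytes.
report_smaller () {
    # -size -"$2"c selects regular files smaller than $2 bytes;
    # -exec ... {} + hands the matching pathnames to the in-line
    # script in batches.
    find "$1" -type f -size -"$2"c -exec sh -c '
        for pathname do
            printf "Found %s, size is %d\n" "$pathname" "$(wc -c <"$pathname")"
        done' sh {} +
}
```

Calling report_smaller /home/161161 100 would then list the same files as the find command above, one "Found ..." line per file.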

See also:

The issue is that your code does cd .. before it is done with all the files in a directory. In general, you don't have to cd into directories to get filenames from them, and going back and forth between directories in loops can be confusing. It can also lead to strange issues if you redirect output to filenames with relative paths, etc.

This can be fixed by not using cd, and also by not using ls, which allows the script to work with filenames that have spaces and other unusual characters in them:

#!/bin/sh

find_smaller () {
    dir=$1
    size=$2

    for pathname in "$dir"/*; do
        if [ -f "$pathname" ]; then
            # this is a regular file, test its size
            filesize=$( wc -c <"$pathname" )
            if [ "$filesize" -lt "$size" ]; then
                printf 'Found %s, size is %d\n' "$pathname" "$filesize"
            fi
        elif [ -d "$pathname" ]; then
            # this is a directory, recurse
            printf 'Entering %s\n' "$pathname"
            find_smaller "$pathname" "$size"
        fi
    done
}

find_smaller "$@"

In the code above, $pathname is not just the filename of the current file or directory that we're looking at; it also includes the path leading to it, starting from the directory given to the script.

Note also the quoting of all variable expansions. Without quoting the $pathname variable, for example, the shell would split the value on whitespace and would also perform filename globbing if a filename contained characters like * or ?.

Rather than writing your own directory tree walker, you may instead use find. The following find command does what your code tries to do:

find /home/161161 -type f -size -100c

As a script:

#!/bin/sh
dir=$1
size=$2
find "$dir" -type f -size -"$size"c

See also:

answered Jun 15, 2018 at 5:58

Kusalananda ♦
