#!/bin/bash
find "$1" ! -type d |
while IFS= read -r fpath; do
    fname="${fpath##*/}"      # strip the directory part
    suffix="${fname##*.}"     # keep what follows the last dot
    if [[ "$suffix" == "$fname" ]]; then
        suffix="(none)"       # the name contains no dot at all
    fi
    size="$( stat --format '%s' "$fpath" )"
    printf '%s\t%d\n' "$suffix" "$size"
done |
awk -F '\t' '{ sz[$1] += $2 }
END { for (s in sz) { printf("%s: %d\n", s, sz[s]) } }'
Given a directory on the command line, the above bash script uses stat (see note 1) to get the size, in bytes, of each file in that directory and below. The while-loop also extracts the filename suffix of each file and outputs it, tab-separated, together with the file's size.
The awk script (see note 2) at the end sums the sizes per suffix and prints the totals.
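To make the suffix handling concrete, here is a minimal sketch of just that step, wrapped in a helper function. The name suffix_of is made up for illustration and is not part of the script above; the logic mirrors the two parameter expansions in the loop.

```bash
# suffix_of is a hypothetical helper, used here only to illustrate the
# parameter expansions from the loop.
suffix_of() (
    fname="${1##*/}"        # drop everything up to the last slash
    suffix="${fname##*.}"   # drop everything up to the last dot
    if [ "$suffix" = "$fname" ]; then
        suffix="(none)"     # nothing was removed, so there was no dot
    fi
    printf '%s\n' "$suffix"
)

suffix_of /home/kk/src/main.c
suffix_of /home/kk/src/Makefile
suffix_of backup.tar.gz
```

Note that only the part after the last dot counts, so a name like backup.tar.gz is filed under gz, and a dotfile such as .bashrc is filed under bashrc.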
Example, running over a directory of one of my work projects:
$ bash ./script.sh /home/kk/Work/Development/project/src/
c: 4559172
am: 369
h: 151369
o: 4613432
in: 42216
out: 3282712
(none): 2908962
Po: 18414
txt: 7129
The output may then be further filtered and formatted if need be.
Modifying this to show percentages of the total size, to use file to get the file type rather than relying on the filename suffix, or to output the sizes in a unit other than bytes, is left to the reader as an exercise.
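As a starting point for the percentage variant, the following is one possible sketch, not a definitive solution. It assumes its input is the tab-separated "suffix, size" lines that the while-loop prints; the function name summarize_with_percent is invented for this example.

```bash
# summarize_with_percent is a hypothetical replacement for the final awk
# stage. It reads "suffix<TAB>size" lines on standard input.
summarize_with_percent() {
    awk -F '\t' '
        { sz[$1] += $2; total += $2 }    # per-suffix sum and grand total
        END {
            for (s in sz) {
                # guard against division by zero when every file is empty
                pct = (total > 0) ? 100 * sz[s] / total : 0
                printf("%s: %d (%.1f%%)\n", s, sz[s], pct)
            }
        }'
}

# Small demonstration with made-up numbers; the order of the output
# lines is not defined, since awk iterates the array in arbitrary order.
printf 'c\t100\nh\t300\n' | summarize_with_percent
```

To use it, define the function at the top of the script and put summarize_with_percent after the final pipe in place of the awk command.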
1 The stat call here is tailored for GNU stat from the GNU coreutils package. The stat on OpenBSD is totally different.
2 The awk script is assumed to be run by an awk implementation that knows about associative arrays, such as GNU awk or mawk.