For about a week I've been waiting for the following command to complete

find -type d -empty -print -delete >> empty-folder-deletion-log.txt

but it is still running and it looks like it has a lot of work left to do. I could probably have imaged the entire drive several times by now, so either the command isn't optimal or there must be a way to speed it up.

To be fair, there are a huge number of files and folders: one hard drive has roughly 100 million used inodes and the other roughly 175 million.

What can I do to speed this up? I suspect the bottleneck is random IO: is there an equivalent command that could walk the tree in the order the metadata is laid out on the disk, reducing seeking? What options do I have to speed this up without interrupting the command, or is there a way to retry with a different program? Would remounting with noatime help, and if so, can I do it without interrupting the command?
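
For concreteness, I assume the remount itself would be something like the following, issued while find keeps running (the mount point below is a placeholder for my real one):

    # remount in place without unmounting; noatime stops atime writes on every read
    sudo mount -o remount,noatime /srv/data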

I'm using the latest version of Debian stable, which at the time of writing is Debian bullseye. The command is being run on two hard drives, an 8 TB drive and a 12 TB drive, both 5400 RPM.

  • Did you inspect your system while this find process is running (check that it is really running and not waiting for something)? Is it a server (with only a text console) or a desktop system? Is the system doing something else that keeps it busy at the same time, or is this the main process running? Commented Jun 4, 2023 at 19:23
  • How long is a piece of string? We don't know anything about the I/O subsystem (SATA? USB? SCSI?), the volume of data to sift through, the potential structure (depth of tree, branches per level?), how long it's been going for... Do you see iowait in top? How do you know that imaging would have been faster? Are you seeing errors in dmesg (the hard drive(s) could be failing)? Commented Jun 4, 2023 at 21:24
  • Is the output .txt file growing at all? Commented Jun 4, 2023 at 23:22
  • The number of used inodes shows the total number of dir/file objects on the whole partition, so it is an upper limit on the content of whatever directory you ran this from. df -i should be practically instantaneous. Commented Jun 4, 2023 at 23:29
  • @sudodus It's a text-only server, and it's not very busy. find takes minimal CPU, but iotop shows roughly 100% IO usage for the find commands. Commented Jun 6, 2023 at 3:34
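
The iotop observation in the last comment can be cross-checked with iostat from the sysstat package; a generic sketch, with a placeholder device name:

    # -d: device report, -x: extended statistics, repeated every 5 seconds
    iostat -dx /dev/sda 5
    # %util pinned near 100 while throughput stays low is the classic
    # signature of a seek-bound disk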

2 Answers


Take advantage of rmdir's refusal to delete non-empty directories (see man rmdir), and do something like:

find "$YourDisk" -depth -type d -print0 |
    xargs -0 -r \
        rmdir --ignore-fail-on-non-empty --verbose \
            >> empty-folder-deletion-log.txt
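
As a quick sanity check, the bottom-up behaviour is easy to see on a throwaway tree (the paths below are made up for the demo):

    # build a chain of nested empty directories, then remove them in one pass
    mkdir -p /tmp/rmdir-demo/a/b/c
    find /tmp/rmdir-demo -depth -type d -print0 |
        xargs -0 -r rmdir --ignore-fail-on-non-empty --verbose

Because -depth emits children before their parents, each directory becomes empty just in time for its own rmdir later in the same stream.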
  • It isn't clear that this would be faster than using -delete from find. Commented Mar 14 at 6:52
  • You might get the advantage of parallelism, @Kusalananda, where the rmdir is busy deleting stuff while the find is already looking elsewhere. Unchecked, and definitely not a validated idea. Commented Mar 14 at 9:39
  • @ChrisDavies I'm thinking that some filesystem-related operations might get duplicated if you run find and rmdir separately. I can't say that I've tested it on a slow disk, though. Commented Mar 14 at 9:45

I haven't used -empty myself, but it could potentially be costly. Also, without -depth to force a depth-first traversal, you will only find the empty directories at the bottom level and will not recursively delete otherwise-empty directory trees.

I would suggest using strace -ttt -T -p <find_pid> (and/or ltrace) to see what is taking so long.
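
If the full trace is too noisy, strace's summary mode aggregates time per syscall instead of logging every call (the PID below is a placeholder):

    # -c: tally calls, errors and time per syscall; -p: attach to the running process
    strace -c -p 12345
    # interrupt with Ctrl-C to detach and print the summary table

A large share of time in getdents64 (directory reads) rather than unlinkat (deletions) would suggest the treewalk itself, not the deleting, is the bottleneck.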

  • I'm confused about your -depth comment; the manual says "The -delete action also implies -depth." The command is working exactly as expected, the only issue is that it's taking a long time. Commented Jun 6, 2023 at 3:29
  • Regarding the strace command, I tried it and it just shows many, many repeated function calls, all of which complete very quickly. I'm still convinced that the root cause is random IO. I'm using mechanical hard drives, so sequential reads are fast but random IO is slow. I used filefrag and inspected some folders, and I can see that they are scattered all over the disk and that the order of the folders returned by find doesn't seem to match the order in which they are stored. Commented Jun 6, 2023 at 3:53
  • @TheMovieMan You don't say what filesystem you have, e.g. ext3 or xfs. (I still have NTFS partitions from an old dual-boot which I have yet to clean up.) Some use linear directory order, others maintain tree structures. I would expect find to be optimal at reading any kind of directory. As you are only deleting empty directories, there are no file inodes or data blocks, but an empty directory that previously held many files can be full of nulled entries. You are doing a full depth-first treewalk of the directories; this may just be a pathological case for find -depth. Commented Jun 6, 2023 at 8:08
