For about a week I've been waiting for the following command to complete

find -type d -empty -print -delete >> empty-folder-deletion-log.txt

but it is still running and it looks like it has a lot of work left to do. I could probably have imaged the entire drive several times by now, so either the command isn't optimal or there must be a way to speed it up.

To be fair, there are a huge number of files and folders: one hard drive has roughly 100 million used inodes and the other roughly 175 million.

What can I do to speed this up? I suspect the bottleneck is random IO: is there an equivalent command that could walk the tree in the order the metadata is laid out on the disk, reducing seeking? What options do I have to speed this up without interrupting the command, or is there a way to retry with a different program? Would remounting with noatime help, and if so, can I do it without interrupting the command?
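
For concreteness, I assume the remount itself would be something like the following, issued while find keeps running (the mount point below is a placeholder for my real one):

    # remount in place without unmounting; noatime stops atime writes on every read
    sudo mount -o remount,noatime /srv/data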

I'm using the latest version of Debian stable, which at the time of writing is Debian bullseye. The command is being run on two hard drives, an 8 TB drive and a 12 TB drive, both 5400 RPM.

  • Did you inspect your system while this find process is running (check that it is really running and not waiting for something)? Is it a server (with only a text console) or a desktop system? Is the system doing something else that keeps it busy at the same time, or is this the main process running? Commented Jun 4, 2023 at 19:23
  • How long is a piece of string? We don't know anything about the I/O subsystem (SATA? USB? SCSI?), the volume of data to sift through, the potential structure (depth of tree, branches per level?), how long it's been going for... Do you see iowait in top? How do you know that imaging would have been faster? Are you seeing errors in dmesg (the hard drive(s) could be failing)? Commented Jun 4, 2023 at 21:24
  • Is the output .txt file growing at all? Commented Jun 4, 2023 at 23:22
  • The number of used inodes shows the total number of dir/file objects on the whole partition, so it is an upper limit on the content of whatever directory you ran this from. df -i should be practically instantaneous. Commented Jun 4, 2023 at 23:29
  • @sudodus It's a text-only server, and it's not very busy. find takes minimal CPU, but iotop shows roughly 100% IO usage for the find commands. Commented Jun 6, 2023 at 3:34
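
The iotop observation in the last comment can be cross-checked with iostat from the sysstat package; a generic sketch, with a placeholder device name:

    # -d: device report, -x: extended statistics, repeated every 5 seconds
    iostat -dx /dev/sda 5
    # %util pinned near 100 while throughput stays low is the classic
    # signature of a seek-bound disk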

2 Answers


Take advantage of rmdir's refusal to delete non-empty directories (see man rmdir), and do something like:

find "$YourDisk" -depth -type d -print0 |
    xargs -0 -r \
        rmdir --ignore-fail-on-non-empty --verbose \
            >> empty-folder-deletion-log.txt
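
As a quick sanity check, the bottom-up behaviour is easy to see on a throwaway tree (the paths below are made up for the demo):

    # build a chain of nested empty directories, then remove them in one pass
    mkdir -p /tmp/rmdir-demo/a/b/c
    find /tmp/rmdir-demo -depth -type d -print0 |
        xargs -0 -r rmdir --ignore-fail-on-non-empty --verbose

Because -depth emits children before their parents, each directory becomes empty just in time for its own rmdir later in the same stream.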
  • It isn't clear that this would be faster than using -delete from find. Commented Mar 14 at 6:52
  • You might get the advantage of parallelism, @Kusalananda, where the rmdir is busy deleting stuff while the find is already looking elsewhere. Unchecked, and definitely not a validated idea. Commented Mar 14 at 9:39
  • @ChrisDavies I'm thinking that some filesystem-related operations might get duplicated if you run find and rmdir separately. I can't say that I've tested it on a slow disk, though. Commented Mar 14 at 9:45

I haven't used -empty myself, but it could potentially be costly. Also, without -depth to force a depth-first traversal, you will only find the empty directories at the bottom level and will not recursively delete otherwise-empty directory trees.

I would suggest using strace -ttt -T -p <find_pid> (and/or ltrace) to see what is taking so long.
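
If the full trace is too noisy, strace's summary mode aggregates time per syscall instead of logging every call (the PID below is a placeholder):

    # -c: tally calls, errors and time per syscall; -p: attach to the running process
    strace -c -p 12345
    # interrupt with Ctrl-C to detach and print the summary table

A large share of time in getdents64 (directory reads) rather than unlinkat (deletions) would suggest the treewalk itself, not the deleting, is the bottleneck.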

  • I'm confused about your -depth comment; the manual says "The -delete action also implies -depth." The command is working exactly as expected, the only issue is that it's taking a long time. Commented Jun 6, 2023 at 3:29
  • Regarding the strace command, I tried it and it just shows many, many repeated function calls, all of which complete very quickly. I'm still convinced that the root cause is random IO. I'm using mechanical hard drives, so sequential reads are fast but random IO is slow. I used filefrag and inspected some folders, and I can see that they are scattered all over the disk and that the order of the folders returned by find doesn't seem to match the order in which they are stored. Commented Jun 6, 2023 at 3:53
  • @TheMovieMan You don't say what filesystem you have, e.g. ext3 or xfs. (I still have NTFS partitions from an old dual-boot which I have yet to clean up.) Some use linear directory order, others maintain tree structures. I would expect find to be optimal at reading any kind of directory. As you are only deleting empty directories, there are no file inodes or data blocks, but an empty directory that previously held many files can be full of nulled entries. You are doing a full depth-first treewalk of the directories; this may just be a pathological case for find -depth. Commented Jun 6, 2023 at 8:08
