
I have a question concerning the find command in Linux.

In all the articles I've found online, it says that the test -size -10M, for example, returns files that are less than 10 MB in size. But when I tested this, it seemed that -size -10M returns files that are less than or equal to 9 MB in size.

If I do

find . -type f -size -1M

the find command returns only empty files (the unit is irrelevant, it can be -1G, -1k...).

find . -type f -size -2M

returns files <= 1M in size, etc.

The man page says:

Bear in mind that the size is rounded up to the next unit. Therefore -size -1M is not equivalent to -size -1048576c. The former only matches empty files, the latter matches files from 0 to 1,048,575 bytes.

Ok, so I guess -1M is rounded to 0M, -2M to -1M and so on... ?

But then

find . -type f -size 1M

returns files <= 1M (e.g. 100K and 512K files, but not empty files), while I would expect it to return files that are exactly 1M in size.

find . -type f -size 2M

returns files > 1M and <= 2M, etc.

Is this all normal or am I doing something wrong and what's the exact behavior of the -size parameter?

2 Answers

The GNU find man page says the following (this appears to be specific to GNU find; other implementations may differ, see below):

The + and - prefixes signify greater than and less than, as usual; i.e., an exact size of n units does not match. Bear in mind that the size is rounded up to the next unit. Therefore -size -1M is not equivalent to -size -1048576c. The former only matches empty files, the latter matches files from 0 to 1,048,575 bytes.

Question:

Ok, so I guess -1M is rounded to 0M, -2M to -1M and so on... ?

No. It's not the limit in the -size condition that's rounded, but the file size itself.

Take a file of 1234 bytes and a -size -1M directive. The file size is rounded up to the next whole unit named in the directive, here MB: 1234 B -> 1 MB. That doesn't match the condition, since -size -1M demands less than 1 MB (after this rounding). So, indeed, -size -1x for any unit x returns only empty files.

Similarly, -size 1M would match the above file, since after rounding, it's exactly 1 MB in size. (It would match any file with a size from 1 B to 1048576 B.) On the other hand, -size 1k would not match, since 1234 B rounds up to 2 kB.
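The rounding described above amounts to a ceiling division of the file size by the unit, done before the comparison. A minimal sketch in shell arithmetic, assuming GNU find behaves as the quoted man page says; the helper name size_in_units is made up for illustration and is not part of find:

```shell
#!/bin/sh
# size_in_units BYTES UNIT_BYTES
# Hypothetical helper: mimics how GNU find rounds a file size up to
# whole units before comparing it with the number given to -size.
size_in_units() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

size_in_units 1234 1048576      # 1234 B in 1M units   -> 1
size_in_units 1234 1024         # 1234 B in 1k units   -> 2
size_in_units 0 1048576         # empty file           -> 0
size_in_units 1048576 1048576   # exactly 1048576 B    -> 1
size_in_units 1048577 1048576   # one byte more        -> 2
```

Read against the conditions: -size -1M needs a result below 1 (only the empty file), -size 1M needs exactly 1, and -size +1M needs more than 1.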

Note that the - or + in front of the number in the condition is irrelevant for the rounding behaviour.

It may be useful to just always specify the sizes in bytes, since that way there's no rounding to stumble on. -size -$((1024*1024))c will reliably find files that are strictly less than 1 MB (or 1 MiB, if you will) in size. If you want a range, you can use e.g. \( -size +$((512*1024-1))c -size -$((1024*1024+1))c \) for files within [512 kB, 1024 kB]. (The parentheses are escaped so that the shell passes them through to find.)
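To see the byte-based range in action, here is a small throwaway test. It is only a sketch: it assumes mktemp -d and truncate are available (as with GNU coreutils), and the file names are arbitrary.

```shell
#!/bin/sh
# Create a scratch directory with files just inside and just outside
# the range [512 KiB, 1024 KiB], then select by byte-exact size.
# Assumes `mktemp -d` and `truncate` are available (e.g. GNU coreutils).
dir=$(mktemp -d)
lo=$((512*1024))
hi=$((1024*1024))
for n in $((lo-1)) "$lo" "$hi" $((hi+1)); do
    truncate -s "$n" "$dir/f$n"
done

# The c suffix means bytes, so no rounding is involved.
in_range=$(find "$dir" -type f -size +$((lo-1))c -size -$((hi+1))c | sort)
printf '%s\n' "$in_range"

rm -r "$dir"
```

Only the files of 524288 and 1048576 bytes should be listed; the two files that are one byte outside the range should not.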

Another question on this: Why does `find -size -1G` not find any files?


Gilles mentions in that linked question the fact that POSIX only specifies -size N as meaning size in 512-byte blocks (rounded as above: "the file size in bytes, divided by 512 and rounded up to the next integer"), and -size Nc as meaning the size in bytes. Both accept an optional plus or minus prefix. The other unit suffixes are left unspecified, and not all find implementations recognize them, or round like GNU find does.
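The block rounding quoted from POSIX can be checked with two files that differ by a single byte. This is a sketch under assumptions: it needs mktemp -d and truncate, and a find that rounds block counts up as described (GNU find does; Busybox, as noted below, may not). The file names are arbitrary.

```shell
#!/bin/sh
# Compare files one byte apart around a 512-byte block boundary.
# Assumes `mktemp -d` and `truncate` exist, and a find that rounds
# block counts up as POSIX describes (GNU find does).
dir=$(mktemp -d)
truncate -s 1024 "$dir/two_blocks"     # 1024 / 512 = 2 exactly
truncate -s 1025 "$dir/three_blocks"   # 1025 / 512 rounds up to 3

# With no suffix, -size counts 512-byte blocks.
match2=$(find "$dir" -type f -size 2)
match3=$(find "$dir" -type f -size 3)
echo "-size 2: $match2"
echo "-size 3: $match3"

rm -r "$dir"
```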

I tested with Busybox and the *BSD find on my Mac, and it seems they treat conditions with size specifiers in a way that feels more sensible, i.e. -size -1k matches files from 0 to 1023 bytes, the same as -size -1024c, and similarly for -size -1M == -size -1024k (Busybox only has c, b and k). Then again, Busybox doesn't seem to do the rounding even for sizes specified in blocks, against what the POSIX text seems to say it should.

So, YMMV and again, maybe better to stick with sizes in bytes.


Note that there's a similar issue with the -atime, -mtime and -ctime conditions:

-atime n
File was last accessed n*24 hours ago. When find figures out how many 24-hour periods ago the file was last accessed, any fractional part is ignored, so to match -atime +1, a file has to have been accessed at least two days ago.

And similarly, it may be easier to just use -amin +$((24*60-1)) to find files that have been last accessed at least a full 24 h ago. (Up to rounding to a minute, which you can't get rid of.)
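The truncation in the man page excerpt can be written as an integer division of the file's age by 86400 seconds. A rough sketch; age_in_days is a made-up helper for illustration, not a find option:

```shell
#!/bin/sh
# age_in_days SECONDS
# Hypothetical helper: number of whole 24-hour periods in an age,
# with the fractional part dropped, as the man page describes.
age_in_days() {
    echo $(( $1 / 86400 ))
}

age_in_days $((23*3600))   # 23 h ago -> 0
age_in_days $((47*3600))   # 47 h ago -> 1 (not greater than 1)
age_in_days $((48*3600))   # 48 h ago -> 2 (first age matching +1)
```

So -atime +1 compares the truncated value against 1, which is why a file accessed 47 hours ago still does not match.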

See also: Why does find -mtime +1 only return files older than 2 days?


Is this all normal or am I doing something wrong and what's the exact behavior of the -size parameter?

It's "normal" as far as the behaviour of GNU find is concerned, but I wouldn't call it exactly sensible. You're not wrong to be confused; it's find that is confusing.

  • Thanks a lot! The key for me was understanding that the file size is being rounded, not the limit set in the -size condition, which is not very clear from the man page... Commented Mar 9, 2021 at 11:09
  • Could you say something about the implementations of find that differ in this behavior (how they differ)? Commented Mar 9, 2021 at 11:18
  • OpenBSD find has a -size that only follows the POSIX spec. No other suffixes allowed than c. I haven't looked exactly how -ctime etc. works, but I know there is a difference there too. Commented Mar 9, 2021 at 11:54
  • @Andrew, mmm. Well, like you said, -size #M counts megabyte blocks, rounded up, so, -size -1M matches anything that rounded up to less than one, i.e. empty files. So, -not -size -1M gives you any non-empty files, even those just 1 byte long. I think that's a rather significant difference compared to >= 1M. Now, if you were to do -not -size -1024k, you'd get files of 1023*1024+1 bytes or larger, so closer to 1 M, but still including some file sizes less than 1 M. (Again, note that this applies only to GNU find, which is what the discussion was about.) Commented Sep 6, 2024 at 5:44
  • @Andrew, another issue there is that -size 1M etc. works differently in different implementations of find, so if you use it to your liking on one system, you might get different results when transferring to other systems. So yeah, I do think it's problematic, and in this case, it's because GNU effed it up. Other issues in the shell languages and the command line utilities are a wider discussion. Most of those are because of historical reasons and backwards compatibility over what, 40+ years? And the fact that filenames can be anything, which I'd change immediately if it was up to me. Commented Sep 6, 2024 at 8:47

Answer from find manual, -size section:

The + and - prefixes signify greater than and less than, as usual; i.e., an exact size of n units does not match. Bear in mind that the size is rounded up to the next unit. Therefore -size -1M is not equivalent to -size -1048576c. The former only matches empty files, the latter matches files from 0 to 1,048,575 bytes.

So in each situation mentioned in the question, the file size is rounded up to the next whole unit BEFORE it is compared with the -size argument. If -size uses "M" as the unit, then every file size is rounded up to whole megabytes.

  • Isn't that the part of the manual the question already quoted? Commented Mar 9, 2021 at 10:39
  • @ilkkachu Emphasis on "size rounded up" explains observed behaviour consistently. It was better to quote entire context from manual. Commented Mar 9, 2021 at 10:48
