
Using the package ramsey/uuid, I tried generating a large number of v4 UUIDs.

<?php

require __DIR__ . '/vendor/autoload.php';
use Ramsey\Uuid\Uuid;

$initialMemoryUsage = memory_get_usage(true) / 1024 / 1024;
$test = [];

for ($i = 0; $i < 100000; $i++) {
    $test[] = Uuid::uuid4()->toString();
}

var_dump(sprintf('Memory used: %d MB', (memory_get_usage(true) / 1024 / 1024) - $initialMemoryUsage));

This outputs: string(18) "Memory used: 10 MB"

<?php

$initialMemoryUsage = memory_get_usage(true) / 1024 / 1024;
$test = [];

for ($i = 0; $i < 100000; $i++) {
    $test[] = '97c2ca84-bcfe-4618-b8a3-4d404eead37a';
}

var_dump(sprintf('Memory used: %d MB', (memory_get_usage(true) / 1024 / 1024) - $initialMemoryUsage));

This outputs: string(17) "Memory used: 4 MB"

Just invoking the UUID generation, without storing the results, does not cause any memory increase:

for ($i = 0; $i < 100000; $i++) {
    Uuid::uuid4()->toString();
}

How come that in both cases the result is an array of 100000 string(36) elements, but the amount of memory used differs? Any ideas?

php -v

PHP 7.3.2-3+ubuntu16.04.1+deb.sury.org+1 (cli) (built: Feb  8 2019 15:43:26) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.2, Copyright (c) 1998-2018 Zend Technologies
    Interesting - I wonder if there are memory optimisations available when the data is duplicated/uniform. If you were to increment the string in your static example so that each was unique, would it increase memory usage? Commented Oct 20, 2020 at 13:26
  • @OKsure Indeed, if each is unique memory usage increases. Commented Oct 20, 2020 at 13:37
  • @OKsure That's certainly worth checking. It's strange though that in the example with unique generated values it takes up 10MB to hold 100k of 36-byte values. I checked and it seems to scale linearly, e.g. for a million values it took ~100MB and for ten million it took over a gigabyte - over 700MB more than should be necessary judging by simple calculations. Commented Oct 20, 2020 at 13:37
  • Why keeping them in memory when generators are available? Commented Oct 20, 2020 at 13:39
  • I'm imagining, cause I'm out of my wheelhouse, that the static example can be condensed to only store the indexes and a single value. As an expression, it's very small. As for linear scaling, if it's also holding the indexes then that's going to add some overhead too perhaps Commented Oct 20, 2020 at 13:42

1 Answer


Strings in PHP are immutable, which also means they can safely be shared. In the first case, you have an array with 100k elements, each referencing a different string. In the second case, you have an array with 100k elements, all referencing the same string.
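A minimal sketch of the sharing effect (the str_pad values are synthetic stand-ins for unique 36-byte strings, not real UUIDs, and the exact byte counts will vary by PHP version and platform):

```php
<?php

// Repeating the same literal bumps a refcount on one shared string,
// while unique strings each need their own allocation.

$before = memory_get_usage(true);
$shared = [];
for ($i = 0; $i < 100000; $i++) {
    $shared[] = '97c2ca84-bcfe-4618-b8a3-4d404eead37a'; // same string every time
}
$sharedBytes = memory_get_usage(true) - $before;

$before = memory_get_usage(true);
$unique = [];
for ($i = 0; $i < 100000; $i++) {
    // str_pad makes each element a distinct 36-byte string
    $unique[] = str_pad((string) $i, 36, 'x');
}
$uniqueBytes = memory_get_usage(true) - $before;

printf("shared: %d bytes, unique: %d bytes\n", $sharedBytes, $uniqueBytes);
```

On a typical build the unique case should come out several times larger, matching the 10 MB vs 4 MB difference in the question.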

For further reference, take a look at www.phpinternalsbook.com.


2 Comments

PHP strings are mutable but they use a copy-on-write mechanism (as explained here).
Yeah, after some experimentation with a function that generated random characters it turns out it's simply how PHP handles memory. It was confusing because for the amount of data I tested the memory used roughly corresponded with the simplistic formula (number of items x 36 bytes). So, the next question was: why is array usage overhead so big in PHP? The answer is here: nikic.github.io/2011/12/12/… - and some help comes from here: php.net/manual/en/class.splfixedarray.php - still 70% overhead, but better than 200%.
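The SplFixedArray suggestion from the comment above can be checked with a quick measurement. A rough sketch, again using synthetic 36-byte strings rather than real UUIDs (absolute numbers depend on PHP version and platform):

```php
<?php

// Compare per-element overhead: plain array (hash table buckets)
// vs SplFixedArray (flat block of zvals), each holding 100k unique strings.

$n = 100000;

$before = memory_get_usage();
$plain = [];
for ($i = 0; $i < $n; $i++) {
    $plain[] = str_pad((string) $i, 36, 'x');
}
$plainBytes = memory_get_usage() - $before;
unset($plain); // release before the second measurement

$before = memory_get_usage();
$fixed = new SplFixedArray($n);
for ($i = 0; $i < $n; $i++) {
    $fixed[$i] = str_pad((string) $i, 36, 'x');
}
$fixedBytes = memory_get_usage() - $before;

printf("array: %d bytes, SplFixedArray: %d bytes\n", $plainBytes, $fixedBytes);
```

SplFixedArray skips the hash-table bucket per element, so its overhead is lower, but the 36-byte string payloads themselves (plus the zend_string headers) still dominate, which is why the comment above still saw ~70% overhead.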
