
Using the package ramsey/uuid, I tried generating a large number of v4 UUIDs.

<?php

require __DIR__ . '/vendor/autoload.php';
use Ramsey\Uuid\Uuid;

$initialMemoryUsage = memory_get_usage(true) / 1024 / 1024;
$test = [];

for ($i = 0; $i < 100000; $i++) {
    $test[] = Uuid::uuid4()->toString();
}

var_dump(sprintf('Memory used: %d MB', (memory_get_usage(true) / 1024 / 1024) - $initialMemoryUsage));

This outputs: string(18) "Memory used: 10 MB"

<?php

$initialMemoryUsage = memory_get_usage(true) / 1024 / 1024;
$test = [];

for ($i = 0; $i < 100000; $i++) {
    $test[] = '97c2ca84-bcfe-4618-b8a3-4d404eead37a';
}

var_dump(sprintf('Memory used: %d MB', (memory_get_usage(true) / 1024 / 1024) - $initialMemoryUsage));

This outputs: string(17) "Memory used: 4 MB"

Just invoking the UUID generation, without storing the results, does not cause any memory increase:

for ($i = 0; $i < 100000; $i++) {
    Uuid::uuid4()->toString();
}

How come that in both cases the result is an array of 100000 string(36) elements, but the amount of memory used differs? Any ideas?

php -v

PHP 7.3.2-3+ubuntu16.04.1+deb.sury.org+1 (cli) (built: Feb  8 2019 15:43:26) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.2, Copyright (c) 1998-2018 Zend Technologies
    Interesting - I wonder if there are memory optimisations available when the data is duplicated/uniform. If you were to increment the string in your static example so that each was unique, would it increase memory usage? Commented Oct 20, 2020 at 13:26
  • @OKsure Indeed, if each is unique memory usage increases. Commented Oct 20, 2020 at 13:37
  • @OKsure That's certainly worth checking. It's strange though that in the example with unique generated values it takes up 10MB to hold 100k of 36-byte values. I checked and it seems to scale linearly, e.g. for a million values it took ~100MB and for ten million it took over a gigabyte - over 700MB more than should be necessary judging by simple calculations. Commented Oct 20, 2020 at 13:37
  • Why keeping them in memory when generators are available? Commented Oct 20, 2020 at 13:39
  • I'm imagining, cause I'm out of my wheelhouse, that the static example can be condensed to only store the indexes and a single value. As an expression, it's very small. As for linear scaling, if it's also holding the indexes then that's going to add some overhead too perhaps Commented Oct 20, 2020 at 13:42

1 Answer


Strings in PHP are immutable, which also means they can safely be shared. In the first case, you have an array with 100k elements, each referencing a different string. In the second case, you have an array with 100k elements, all referencing the same string.
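A minimal sketch of the sharing effect (the str_pad values are synthetic stand-ins for unique 36-byte strings, not real UUIDs, and the exact byte counts will vary by PHP version and platform):

```php
<?php

// Repeating the same literal bumps a refcount on one shared string,
// while unique strings each need their own allocation.

$before = memory_get_usage(true);
$shared = [];
for ($i = 0; $i < 100000; $i++) {
    $shared[] = '97c2ca84-bcfe-4618-b8a3-4d404eead37a'; // same string every time
}
$sharedBytes = memory_get_usage(true) - $before;

$before = memory_get_usage(true);
$unique = [];
for ($i = 0; $i < 100000; $i++) {
    // str_pad makes each element a distinct 36-byte string
    $unique[] = str_pad((string) $i, 36, 'x');
}
$uniqueBytes = memory_get_usage(true) - $before;

printf("shared: %d bytes, unique: %d bytes\n", $sharedBytes, $uniqueBytes);
```

On a typical build the unique case should come out several times larger, matching the 10 MB vs 4 MB difference in the question.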

For further reference, take a look at www.phpinternalsbook.com.


2 Comments

PHP strings are mutable but they use a copy-on-write mechanism (as explained here).
Yeah, after some experimentation with a function that generated random characters it turns out it's simply how PHP handles memory. It was confusing because for the amount of data I tested the memory used roughly corresponded with the simplistic formula (number of items x 36 bytes). So, the next question was: why is array usage overhead so big in PHP? The answer is here: nikic.github.io/2011/12/12/… - and some help comes from here: php.net/manual/en/class.splfixedarray.php - still 70% overhead, but better than 200%.
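The SplFixedArray suggestion from the comment above can be checked with a quick measurement. A rough sketch, again using synthetic 36-byte strings rather than real UUIDs (absolute numbers depend on PHP version and platform):

```php
<?php

// Compare per-element overhead: plain array (hash table buckets)
// vs SplFixedArray (flat block of zvals), each holding 100k unique strings.

$n = 100000;

$before = memory_get_usage();
$plain = [];
for ($i = 0; $i < $n; $i++) {
    $plain[] = str_pad((string) $i, 36, 'x');
}
$plainBytes = memory_get_usage() - $before;
unset($plain); // release before the second measurement

$before = memory_get_usage();
$fixed = new SplFixedArray($n);
for ($i = 0; $i < $n; $i++) {
    $fixed[$i] = str_pad((string) $i, 36, 'x');
}
$fixedBytes = memory_get_usage() - $before;

printf("array: %d bytes, SplFixedArray: %d bytes\n", $plainBytes, $fixedBytes);
```

SplFixedArray skips the hash-table bucket per element, so its overhead is lower, but the 36-byte string payloads themselves (plus the zend_string headers) still dominate, which is why the comment above still saw ~70% overhead.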
