@bwoebi
Member

@bwoebi bwoebi commented Nov 19, 2025

This pre-allocates a large string for use with concatenations. To benefit from this, users must take care to keep the string's refcount at 1.

Note that it is generally pointless to call str_extend("", $size) (i.e. to extend an empty string), because operations such as concatenation special-case empty strings and simply use the other operand. (This is also why there is no str_alloc($size): the allocated string would be pointless, as it would be thrown away during the concat op.)
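The refcount requirement can be illustrated with a minimal sketch. It assumes that the second argument of str_extend() is the capacity to reserve, which is an assumption on my side; since str_extend() only exists with this PR applied, the calls are guarded so the script also runs on released PHP. The function names appendUnshared() and appendShared() are hypothetical.

```php
<?php
// Sketch only. str_extend() is the function proposed in this PR and is not
// part of released PHP, so every call is guarded with function_exists().

// Keeps a single reference to the buffer, so appends may reuse its storage.
function appendUnshared(int $n): string
{
    $buf = "a";
    if (function_exists('str_extend')) {
        // Assumption: the second argument is the capacity to reserve.
        $buf = str_extend($buf, $n + 1);
    }
    for ($i = 0; $i < $n; ++$i) {
        $buf .= "a";
    }
    return $buf;
}

// Takes a second reference before appending. The first append then has to
// separate $buf from $alias, which is expected to lose the reserved capacity.
function appendShared(int $n): array
{
    $buf = "a";
    if (function_exists('str_extend')) {
        $buf = str_extend($buf, $n + 1);
    }
    $alias = $buf; // with a str_extend() result, the refcount is now 2
    for ($i = 0; $i < $n; ++$i) {
        $buf .= "a";
    }
    return [$buf, $alias];
}
```

Both functions produce the same string; the difference is only in whether the pre-allocated buffer can be reused, which is why passing the string around or storing it in a second variable before the loop is likely to defeat the optimization.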

This gives a slight performance improvement of about 8% in the general case of appending a single byte in a loop (given that zend_string_extend now uses perealloc3). In particular, zend_string_extend() will mostly hit the fast path of zend_mm_realloc_heap for huge allocations.

When using str_extend(), appending a single byte in a loop is 33% faster than the old baseline.

The tested loop is:

$str = str_extend("a", 1 << 26);
for ($i = 0; $i < 1 << 25; ++$i) {
        $str .= "a";
}

Specifically, with hyperfine (x.php is the above test script; y.php is the same script, but with "a" assigned directly instead of via str_extend()):

# hyperfine '/root/php-src-X/baseline-php -dmemory_limit=1G y.php'
Benchmark 1: /root/php-src-X/baseline-php -dmemory_limit=1G y.php
  Time (mean ± σ):     495.3 ms ±  10.2 ms    [User: 348.1 ms, System: 137.8 ms]
  Range (min … max):   478.8 ms … 510.5 ms    10 runs

# hyperfine '/root/php-src-X/sapi/cli/php -dmemory_limit=1G y.php'
Benchmark 1: /root/php-src-X/sapi/cli/php -dmemory_limit=1G y.php
  Time (mean ± σ):     456.2 ms ±   8.4 ms    [User: 298.1 ms, System: 152.5 ms]
  Range (min … max):   443.4 ms … 468.5 ms    10 runs

# hyperfine '/root/php-src-X/sapi/cli/php -dmemory_limit=1G x.php'
Benchmark 1: /root/php-src-X/sapi/cli/php -dmemory_limit=1G x.php
  Time (mean ± σ):     325.7 ms ±   7.5 ms    [User: 288.7 ms, System: 29.7 ms]
  Range (min … max):   312.3 ms … 339.1 ms    10 runs
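As a quick cross-check of the percentages quoted above, the relative changes can be recomputed from the hyperfine means. This is a small sketch; the helper name timeSavedPercent() is hypothetical, and the inputs are the mean times from the three runs.

```php
<?php
// Mean wall-clock times (ms) copied from the hyperfine output above.
$baselineMs = 495.3; // baseline-php, y.php
$patchedMs  = 456.2; // patched php, y.php (no str_extend())
$extendedMs = 325.7; // patched php, x.php (with str_extend())

// Percentage of run time saved when going from $old to $new.
function timeSavedPercent(float $old, float $new): float
{
    return ($old - $new) / $old * 100.0;
}

printf("patch alone:     %.1f%% less time\n", timeSavedPercent($baselineMs, $patchedMs));
printf("with str_extend: %.1f%% less time\n", timeSavedPercent($baselineMs, $extendedMs));
printf("speedup factor:  %.2fx\n", $baselineMs / $extendedMs);
```

On these numbers the patch alone saves about 7.9% of the run time, and the str_extend() variant saves about 34.2% (a factor of roughly 1.52), which lines up with the "about 8%" and "33% faster" figures.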

I haven't looked at improvements or overhead outside of the synthetic case, though (I'm not quite sure what public code to test against). I could observe some improvements in string processing code, however.

Ideally, this feature may eventually allow optimizations in the JIT, whereby looped string appends can be done cheaply with a bare capacity counter that is compared against the value initially passed to str_extend(), avoiding repeated string extends.

@bwoebi
Member Author

bwoebi commented Nov 19, 2025

I suppose that test is quite pathological under asan :-D

@bwoebi
Member Author

bwoebi commented Nov 21, 2025

Not sure who to ping on this - is this something that interests you @arnaud-lb ?

@arnaud-lb
Member

This looks right, but it seems fragile, as the user or internals can easily break the optimization (as you have pointed out, str_extend("", $n) doesn't work as expected).

I would prefer if we introduced a string builder class, or something like JavaScript's typed arrays (which are essentially views of memory buffers, and are thus suitable for creating a string builder or manipulating binary data).
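To make the string builder idea concrete, here is a rough userland sketch of what such an interface could look like. The class name StringBuilder and its methods are hypothetical and not an existing PHP API; a native implementation would presumably own a growable buffer directly, whereas this sketch just collects chunks and joins them once.

```php
<?php
// Hypothetical userland sketch; PHP has no built-in StringBuilder class.
final class StringBuilder
{
    /** @var string[] Appended pieces, joined lazily in toString(). */
    private array $chunks = [];

    /** Total number of bytes appended so far. */
    private int $length = 0;

    public function append(string $s): self
    {
        $this->chunks[] = $s;
        $this->length += strlen($s);
        return $this;
    }

    public function length(): int
    {
        return $this->length;
    }

    public function toString(): string
    {
        // A single implode() avoids reallocating a growing string per append.
        return implode('', $this->chunks);
    }
}

$sb = new StringBuilder();
$sb->append("foo")->append("bar");
echo $sb->toString(), "\n";
```

Because the buffer is private to the object, callers cannot accidentally raise the refcount of the underlying string, which is the fragility concern with str_extend().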


Oddly, I'm seeing smaller gains when using str_extend():

// x.php

$str = str_extend("a", 1 << 26);
for ($i = 0; $i < 1 << 25; ++$i) {
        $str .= "a";
}
// y.php

$str = "a";
for ($i = 0; $i < 1 << 25; ++$i) {
        $str .= "a";
}
; hyperfine -L x x,y --warmup 2 '/tmp/updt/sapi/cli/php -n -d opcache.enable_cli=1 {x}.php'
Benchmark 1: /tmp/updt/sapi/cli/php -n -d opcache.enable_cli=1 x.php
  Time (mean ± σ):     279.9 ms ±   3.2 ms    [User: 251.9 ms, System: 27.3 ms]
  Range (min … max):   274.9 ms … 285.1 ms    10 runs
 
Benchmark 2: /tmp/updt/sapi/cli/php -n -d opcache.enable_cli=1 y.php
  Time (mean ± σ):     278.5 ms ±   1.2 ms    [User: 251.9 ms, System: 25.8 ms]
  Range (min … max):   277.1 ms … 280.6 ms    10 runs
 
Summary
  /tmp/updt/sapi/cli/php -n -d opcache.enable_cli=1 y.php ran
    1.01 ± 0.01 times faster than /tmp/updt/sapi/cli/php -n -d opcache.enable_cli=1 x.php

However, I do get similar results when not using str_extend(), which is nice. The erealloc3() change alone may be worth merging. Do you think that we could apply the same optimization to normal realloc, when new size == old size? (I don't fully understand why not entering zend_mm_realloc_huge() has such a large impact.)

I've triggered a benchmark run: https://github.com/php/php-src/actions/runs/19572237486.


For 1-byte appends, using indexed string access seems faster:

// z.php

$str = str_repeat("\0", 1 << 26);
for ($i = 0; $i < 1 << 25; ++$i) {
  $str[$i] = "a";
}
; hyperfine -L x x,z --warmup 2 '/tmp/updt/sapi/cli/php -n -d opcache.enable_cli=1 {x}.php'
Benchmark 1: /tmp/updt/sapi/cli/php -n -d opcache.enable_cli=1 x.php
  Time (mean ± σ):     280.4 ms ±   3.6 ms    [User: 253.9 ms, System: 25.7 ms]
  Range (min … max):   276.5 ms … 285.2 ms    10 runs
 
Benchmark 2: /tmp/updt/sapi/cli/php -n -d opcache.enable_cli=1 z.php
  Time (mean ± σ):     149.5 ms ±   2.7 ms    [User: 126.0 ms, System: 23.0 ms]
  Range (min … max):   145.2 ms … 153.6 ms    19 runs
 
Summary
  /tmp/updt/sapi/cli/php -n -d opcache.enable_cli=1 z.php ran
    1.88 ± 0.04 times faster than /tmp/updt/sapi/cli/php -n -d opcache.enable_cli=1 x.php
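One caveat of the indexed approach is that the buffer stays at its pre-filled size, so the result has to be trimmed to the number of bytes actually written. A minimal sketch of that pattern follows; the function name fillByIndex() and its parameters are hypothetical.

```php
<?php
// Pre-fills a buffer of $capacity bytes, writes $count bytes by index,
// then trims the result to the written length.
function fillByIndex(int $capacity, int $count): string
{
    if ($count < 0 || $count > $capacity) {
        // Writing past the end would make PHP grow the string,
        // which defeats the point of pre-filling the buffer.
        throw new InvalidArgumentException('count must be within capacity');
    }
    $buf = str_repeat("\0", $capacity);
    for ($i = 0; $i < $count; ++$i) {
        $buf[$i] = "a"; // in-place single-byte write
    }
    // substr() copies the written prefix, which is an extra cost not
    // included in the z.php measurement above.
    return substr($buf, 0, $count);
}
```

This only works when an upper bound on the final length is known up front and each write is a single byte, so it is not a general replacement for appending.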

@github-actions

AWS x86_64 (c7i.24xl)

| Attribute | Value |
|---|---|
| Environment | aws |
| Runner | host |
| Instance type | c7i.metal-24xl (dedicated) |
| Architecture | x86_64 |
| CPU | 48 cores |
| CPU settings | disabled deeper C-states, disabled turbo boost, disabled hyper-threading |
| RAM | 188 GB |
| Kernel | 6.1.158-178.288.amzn2023.x86_64 |
| OS | Amazon Linux 2023.9.20251117 |
| GCC | 14.2.1 |
| Time | 2025-11-21 13:38:19 UTC |

Laravel 12.2.0 demo app - 100 consecutive runs, 50 warmups, 100 requests (sec)

| PHP | Min | Max | Std dev | Rel std dev % | Mean | Mean diff % | Median | Median diff % | Skew | P-value | Instr count | Memory |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PHP - baseline@9762 | 0.46590 | 0.46792 | 0.00045 | 0.10% | 0.46667 | 0.00% | 0.46660 | 0.00% | 0.625 | 0.999 | 176184615 | 44.28 MB |
| PHP - str_extend | 0.46182 | 0.47114 | 0.00093 | 0.20% | 0.46845 | 0.38% | 0.46838 | 0.38% | -3.064 | 0.000 | 176198318 | 44.28 MB |

Symfony 2.7.0 demo app - 100 consecutive runs, 50 warmups, 100 requests (sec)

| PHP | Min | Max | Std dev | Rel std dev % | Mean | Mean diff % | Median | Median diff % | Skew | P-value | Instr count | Memory |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PHP - baseline@9762 | 0.74081 | 0.75281 | 0.00146 | 0.20% | 0.74271 | 0.00% | 0.74239 | 0.00% | 3.984 | 0.999 | 290370510 | 40.50 MB |
| PHP - str_extend | 0.74345 | 0.75338 | 0.00147 | 0.20% | 0.74531 | 0.35% | 0.74497 | 0.35% | 2.232 | 0.000 | 290405702 | 40.76 MB |

Wordpress 6.2 main page - 100 consecutive runs, 20 warmups, 20 requests (sec)

| PHP | Min | Max | Std dev | Rel std dev % | Mean | Mean diff % | Median | Median diff % | Skew | P-value | Instr count | Memory |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PHP - baseline@9762 | 0.57802 | 0.58269 | 0.00078 | 0.14% | 0.57951 | 0.00% | 0.57942 | 0.00% | 1.087 | 0.999 | 1119592809 | 44.06 MB |
| PHP - str_extend | 0.57741 | 0.59080 | 0.00130 | 0.22% | 0.58086 | 0.23% | 0.58070 | 0.22% | 4.608 | 0.000 | 1119541608 | 44.00 MB |

bench.php - 100 consecutive runs, 10 warmups, 2 requests (sec)

| PHP | Min | Max | Std dev | Rel std dev % | Mean | Mean diff % | Median | Median diff % | Skew | P-value | Instr count | Memory |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PHP - baseline@9762 | 0.42560 | 0.43950 | 0.00294 | 0.69% | 0.42899 | 0.00% | 0.42822 | 0.00% | 1.832 | 0.999 | 2020586613 | 26.99 MB |
| PHP - str_extend | 0.42826 | 0.44240 | 0.00298 | 0.69% | 0.43168 | 0.63% | 0.43084 | 0.61% | 2.024 | 0.000 | 2020983024 | 26.99 MB |
