-
Notifications
You must be signed in to change notification settings - Fork 8k
Add str_extend() #20527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add str_extend() #20527
Conversation
This pre-allocates a large string, for usage with concatenations.
Users must take care to keep the refcount to 1, if they desire benefiting from this.
Note that it is generally pointless to call str_extend("", $size) (i.e. extending an empty string), given that e.g. concatenation will special case empty strings, and then use the other string.
(Which is why not a str_alloc($size), which would be pointless and thrown away during concat op.)
This has a very slight performance improvement on the general case of appending a single byte in a loop (given that zend_string_extend now uses perealloc3) of about 8%.
In particular zend_string_extend() will mostly run into the fast path of zend_mm_realloc_heap for huge allocations.
When using str_extend(), appending a single byte in a loop is 33% faster than the old baseline.
The tested loop is:
$str = str_extend("a", 1 << 26);
for ($i = 0; $i < 1 << 25; ++$i) {
$str .= "a";
}
|
I suppose that test is quite pathological under asan :-D |
|
Not sure who to ping on this - is this something that interests you @arnaud-lb ? |
|
This looks right, but this seems fragile as the user or internals can break the optimization easily (as you have pointed, I would prefer if we introduced a string builder class, or something like Javascript's typed arrays (which are essentially views of memory buffers, thus are suitable for creating a string builder or manipulating binary data). Oddly, I'm seeing smaller gains when using // x.php
$str = str_extend("a", 1 << 26);
for ($i = 0; $i < 1 << 25; ++$i) {
$str .= "a";
}// y.php
$str = "a";
for ($i = 0; $i < 1 << 25; ++$i) {
$str .= "a";
}However I do get similar results when not using I've triggered a benchmark run: https://github.com/php/php-src/actions/runs/19572237486. For 1-byte appends, using indexed string access seems faster: // z.php
$str = str_repeat("\0", 1 << 26);
for ($i = 0; $i < 1 << 25; ++$i) {
$str[$i] = "a";
} |
AWS x86_64 (c7i.24xl)
Laravel 12.2.0 demo app - 100 consecutive runs, 50 warmups, 100 requests (sec)
Symfony 2.7.0 demo app - 100 consecutive runs, 50 warmups, 100 requests (sec)
Wordpress 6.2 main page - 100 consecutive runs, 20 warmups, 20 requests (sec)
bench.php - 100 consecutive runs, 10 warmups, 2 requests (sec)
|
This pre-allocates a large string, for usage with concatenations. Users must take care to keep the refcount to 1, if they desire benefiting from this.
Note that it is generally pointless to call str_extend("", $size) (i.e. extending an empty string), given that e.g. concatenation will special case empty strings, and then use the other string. (Which is why not a str_alloc($size), which would be pointless and thrown away during concat op.)
This has a slight performance improvement on the general case of appending a single byte in a loop (given that zend_string_extend now uses perealloc3) of about 8%. In particular zend_string_extend() will mostly run into the fast path of zend_mm_realloc_heap for huge allocations.
When using str_extend(), appending a single byte in a loop is 33% faster than the old baseline.
The tested loop is:
Specifically hyperfine (x.php being the above test script and y.php being the script, but with "a" directly instead of str_extend()):
I haven't looked at improvements / overhead outside of the synthetic case though (not quite sure what public code to test against) - I could observe some improvements in string processing code though.
Ideally this feature may allow optimizations in JIT eventually whereby lightweight appending can be done with a bare capacity counter for looped string appends, compared against the value initially passed to str_extend(), avoiding repeated string extends.