[wp-trac] [WordPress Trac] #64538: memoize wp_normalize_path
WordPress Trac
noreply at wordpress.org
Thu Jan 22 00:44:39 UTC 2026
#64538: memoize wp_normalize_path
--------------------------------------+---------------------
Reporter: josephscott | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: 7.0
Component: General | Version:
Severity: normal | Resolution:
Keywords: has-patch has-unit-tests | Focuses:
--------------------------------------+---------------------
Comment (by dmsnell):
@josephscott I also noticed that `wp_is_stream()` starts with `strpos(
$path, '://' )`, which searches the entire string for the separator. it
seems like we should constrain that search, because as written the
worst-case inputs are needlessly inefficient. in fact, it looks like there
could be significant improvement in that function, and I wonder how much
of an impact it would have if you applied some optimizations there in your
test code.
{{{#!php
<?php
function wp_is_stream( $path ) {
	if ( ! is_string( $path ) || '' === $path ) {
		return false;
	}

	// `php`, `file`, `http`, `https`? will always be available, or else things would break…
	if (
		1 === strspn( $path, 'hfp', 0, 1 ) &&
		(
			str_starts_with( $path, 'http://' ) ||
			str_starts_with( $path, 'https://' ) ||
			str_starts_with( $path, 'file://' ) ||
			str_starts_with( $path, 'php://' )
		)
	) {
		return true;
	}

	// Valid protocol names must contain alphanumerics, dots (.), plusses (+), or hyphens (-) only.
	$protocol_length = strspn( $path, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.+-' );

	// The separator must directly follow the leading run of scheme characters.
	if ( 0 === $protocol_length || 0 !== substr_compare( $path, '://', $protocol_length, 3 ) ) {
		return false;
	}

	return in_array( substr( $path, 0, $protocol_length ), stream_get_wrappers(), true );
}
}}}
on the other hand, I don’t know how often we expect stream wrappers to
change. the only place in Core where I saw one in use was in a test file, and the
plugin directory mostly only shows
[https://wpdirectory.net/search/01KFHHFVQPD090H4JJ55DSK36J plugins adding
`guzzle` or `sftp`]. would it make sense to cache `stream_get_wrappers()`
instead of the paths? I’m not sure if we should generally be passing
around those paths anyway or if they are limited internally within the
`vendor`ed code.
{{{#!php
<?php
function wp_is_stream( $path ) {
	static $known_schemes = null;

	// Build a space-delimited list of the registered schemes once per request.
	if ( null === $known_schemes ) {
		$known_schemes = ' ';
		foreach ( stream_get_wrappers() as $scheme ) {
			$known_schemes .= "{$scheme} ";
		}
	}

	// Valid protocol names must contain alphanumerics, dots (.), plusses (+), or hyphens (-) only.
	$protocol_length = strspn( $path, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.+-' );

	// The separator must directly follow the leading run of scheme characters.
	if ( 0 === $protocol_length || 0 !== substr_compare( $path, '://', $protocol_length, 3 ) ) {
		return false;
	}

	$scheme = substr( $path, 0, $protocol_length );
	return str_contains( $known_schemes, " {$scheme} " );
}
}}}
By caching the stream wrappers and eliminating the array code we should
end up with an extremely fast lookup that stays within a few 32-byte cache
lines on most systems.
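to illustrate why the cached string is padded with a space on both sides, here is a minimal sketch. `scheme_is_known()` is a hypothetical helper and the scheme list is hard-coded as a stand-in for whatever `stream_get_wrappers()` returns; it also assumes `str_contains()` is available (PHP 8.0+ or a polyfill). the delimiters keep a short scheme such as `ftp` from matching inside `ftps`, and since the `strspn()` mask never admits a space, a candidate scheme cannot span two entries.

```php
<?php
// Hypothetical helper, for illustration only: checks a scheme against a
// space-delimited list shaped like the cached one described above.
function scheme_is_known( string $known_schemes, string $scheme ): bool {
	return '' !== $scheme && str_contains( $known_schemes, " {$scheme} " );
}

// Hard-coded stand-in; the real values would come from stream_get_wrappers().
$known_schemes = ' https ftps php file ';

var_dump( scheme_is_known( $known_schemes, 'https' ) ); // bool(true)
var_dump( scheme_is_known( $known_schemes, 'http' ) );  // bool(false): only "https" is listed
var_dump( scheme_is_known( $known_schemes, 'ftp' ) );   // bool(false): only "ftps" is listed

// Without the delimiters, the partial match would slip through.
var_dump( str_contains( $known_schemes, 'ftp' ) );      // bool(true)
```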
----
To summarize:
- how much are we losing performance-wise by looking for the scheme
separator at any point in the string vs. anchoring it at the front?
- how much loss comes in through the array functions?
- how much of the overhead is calling `stream_get_wrappers()` repeatedly?
if cached, that list would be no more stale than the cached `$path`
results, but it would cost considerably less memory. (PHP 8.5.2 on my
laptop shows 12 schemes with a //total// of 65 characters).
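
one rough way to start answering the first question is a micro-benchmark along these lines. this is only a sketch: `time_check()`, `separator_anywhere()`, and `separator_anchored()` are hypothetical names, the sample paths are made up, and `hrtime()` requires PHP 7.3+. note that the two checks are not equivalent, since the anchored version rejects a `://` that appears later in the path. the same harness could be pointed at the `in_array()` and `stream_get_wrappers()` variants.

```php
<?php
// Hypothetical timing helper: runs $fn over every path $rounds times and
// returns the elapsed time in nanoseconds.
function time_check( callable $fn, array $paths, int $rounds ): int {
	$start = hrtime( true );
	for ( $i = 0; $i < $rounds; $i++ ) {
		foreach ( $paths as $path ) {
			$fn( $path );
		}
	}
	return hrtime( true ) - $start;
}

// Scans the whole string for the separator.
function separator_anywhere( string $path ): bool {
	return false !== strpos( $path, '://' );
}

// Only accepts the separator directly after a leading run of scheme characters.
function separator_anchored( string $path ): bool {
	$length = strspn( $path, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.+-' );
	return $length > 0 && 0 === substr_compare( $path, '://', $length, 3 );
}

// Made-up sample inputs; a long plain path is the worst case for the full scan.
$paths = array(
	'/var/www/html/wp-content/plugins/' . str_repeat( 'nested/', 50 ) . 'file.php',
	'C:\\inetpub\\wwwroot\\wp-config.php',
	'php://memory',
	'https://example.com/image.png',
);

printf( "anywhere: %d ns\n", time_check( 'separator_anywhere', $paths, 10000 ) );
printf( "anchored: %d ns\n", time_check( 'separator_anchored', $paths, 10000 ) );
```

the numbers will vary by machine and PHP build, so they are only meaningful relative to each other on the same setup.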
--
Ticket URL: <https://core.trac.wordpress.org/ticket/64538#comment:6>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform