[wp-trac] [WordPress Trac] #64538: memoize wp_normalize_path

WordPress Trac noreply at wordpress.org
Thu Jan 22 00:44:39 UTC 2026


#64538: memoize wp_normalize_path
--------------------------------------+---------------------
 Reporter:  josephscott               |       Owner:  (none)
     Type:  defect (bug)              |      Status:  new
 Priority:  normal                    |   Milestone:  7.0
Component:  General                   |     Version:
 Severity:  normal                    |  Resolution:
 Keywords:  has-patch has-unit-tests  |     Focuses:
--------------------------------------+---------------------

Comment (by dmsnell):

 @josephscott I also noticed that `wp_is_stream()` starts with `strpos(
 $path, '://' )`, which searches the entire string for the separator. it
 seems like we should constrain where the separator may appear; as it
 stands, the worst-case inputs are needlessly inefficient here.

 in fact, it looks like there could be significant improvement in that
 function and I wonder how much of an impact it would have if you applied
 some optimizations there in your test code.

 {{{#!php
 <?php
 function wp_is_stream( $path ) {
         if ( ! is_string( $path ) || '' === $path ) {
                 return false;
         }

         // `php`, `file`, `http`, `https`? will always be available, or else things would break…
         if (
                 1 === strspn( $path, 'hfp', 0, 1 ) &&
                 (
                         str_starts_with( $path, 'http://' ) ||
                         str_starts_with( $path, 'https://' ) ||
                         str_starts_with( $path, 'file://' ) ||
                         str_starts_with( $path, 'php://' )
                 )
         ) {
                 return true;
         }

         // Valid protocol names must contain alphanumerics, dots (.), plusses (+), or hyphens (-) only.
         $protocol_length = strspn( $path, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.+-' );
         if ( 0 === $protocol_length || 0 !== substr_compare( $path, '://', $protocol_length, 3 ) ) {
                 return false;
         }

         return in_array( substr( $path, 0, $protocol_length ), stream_get_wrappers(), true );
 }
 }}}
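
 To make the anchoring concrete, here is a minimal sketch of just the
 prefix scan, pulled out into a helper. `demo_scheme_length()` is a
 hypothetical name used only for illustration; it is not a Core function.
 The point is that `strspn()` stops at the first byte that cannot be part
 of a scheme, so an ordinary filesystem path is rejected after looking at
 one or a few bytes, no matter how long the path is.

```php
<?php
// Hypothetical helper for illustration only; not part of WordPress Core.
// Returns the length of a leading "scheme://" scheme name, or 0 if the
// path does not start with one.
function demo_scheme_length( string $path ): int {
	// Bytes allowed in a scheme name: alphanumerics, '.', '+', '-'.
	$scheme_chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.+-';

	// strspn() counts leading bytes from the set and stops at the first other byte.
	$length = strspn( $path, $scheme_chars );

	// The scheme must be non-empty and immediately followed by '://'.
	if ( 0 === $length || 0 !== substr_compare( $path, '://', $length, 3 ) ) {
		return 0;
	}

	return $length;
}

var_dump( demo_scheme_length( 'php://memory' ) );                  // int(3)
var_dump( demo_scheme_length( '/var/www/html/wp-load.php' ) );     // int(0), stops at the leading '/'
var_dump( demo_scheme_length( '/tmp/redirect?to=http://x.test' ) ); // int(0), '://' is not at the front
```

 Note that the last input contains `://` further along, which an unanchored
 `strpos()` would find before discovering that the prefix is not a scheme.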


 on the other hand, I don’t know how often we expect stream wrappers to
 change. the only place in Core I found one in use was in a test file, and the
 plugin directory mostly only shows
 [https://wpdirectory.net/search/01KFHHFVQPD090H4JJ55DSK36J plugins adding
 `guzzle` or `sftp`]. would it make sense to cache `stream_get_wrappers()`
 instead of the paths? I’m not sure if we should generally be passing
 around those paths anyway or if they are limited internally within the
 `vendor`ed code.

 {{{#!php
 <?php
 function wp_is_stream( $path ) {
         static $known_schemes = null;

         if ( null === $known_schemes ) {
                 $known_schemes = ' ';
                 foreach ( stream_get_wrappers() as $scheme ) {
                         $known_schemes .= "{$scheme} ";
                 }
         }

         // Valid protocol names must contain alphanumerics, dots (.), plusses (+), or hyphens (-) only.
         $protocol_length = strspn( $path, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.+-' );
         if ( 0 === $protocol_length || 0 !== substr_compare( $path, '://', $protocol_length, 3 ) ) {
                 return false;
         }

         $scheme = substr( $path, 0, $protocol_length );

         return str_contains( $known_schemes, " {$scheme} " );
 }
 }}}

 By caching the stream wrappers and eliminating the array code we should end up
 with an extremely fast lookup that stays within a few 64-byte cache lines
 on most systems.
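
 The reason for padding every scheme with a space on both sides may not be
 obvious, so here is a small sketch using a fixed list instead of
 `stream_get_wrappers()`. The helper names are hypothetical and exist only
 for this example. Without the delimiters, `http` would match inside
 `https` and `ftp` inside `ftps`; with them, only whole names match. This
 relies on the scheme having already been validated to contain no spaces,
 and on `str_contains()` being available (PHP 8.0+, or a polyfill).

```php
<?php
// Hypothetical helpers for illustration only; not part of WordPress Core.

// Builds a space-delimited lookup string such as ' https ftps php file '.
function demo_build_scheme_index( array $schemes ): string {
	$index = ' ';
	foreach ( $schemes as $scheme ) {
		$index .= "{$scheme} ";
	}
	return $index;
}

// Whole-name lookup: the surrounding spaces prevent partial matches.
function demo_has_scheme( string $index, string $scheme ): bool {
	return str_contains( $index, " {$scheme} " );
}

$index = demo_build_scheme_index( array( 'https', 'ftps', 'php', 'file' ) );

var_dump( demo_has_scheme( $index, 'https' ) ); // bool(true)
var_dump( demo_has_scheme( $index, 'http' ) );  // bool(false), not a whole entry
var_dump( demo_has_scheme( $index, 'ftp' ) );   // bool(false), not a whole entry
var_dump( str_contains( $index, 'http' ) );     // bool(true), the undelimited check is fooled
```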

 ----

 To summarize:

  - how much are we losing performance-wise by looking for the scheme
 separator at any point in the string vs. anchoring it at the front?
  - how much loss comes in through the array functions?
  - how much of the overhead is calling `stream_get_wrappers()` repeatedly?
 if cached, that list would be no more stale than cached `$path` results
 but would cost considerably less memory. (PHP 8.5.2 on my laptop
 shows 12 schemes, with a //total// of 65 characters.)
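
 One rough way to put numbers on these three questions is a
 micro-benchmark along the following lines. This is only a sketch: all
 `demo_*` names are hypothetical, the "unanchored" baseline is written from
 the description in this comment rather than copied from Core, and the
 sample paths and round count are arbitrary. Timings will vary by machine
 and PHP build, so none are shown here.

```php
<?php
// Baseline modelled on the description above (unanchored strpos, then
// in_array). An assumption for comparison, not a copy of Core.
function demo_is_stream_unanchored( string $path ): bool {
	$pos = strpos( $path, '://' );
	if ( false === $pos ) {
		return false;
	}
	return in_array( substr( $path, 0, $pos ), stream_get_wrappers(), true );
}

// Returns the leading scheme name, or '' when the path does not start with "scheme://".
function demo_scheme_of( string $path ): string {
	$length = strspn( $path, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.+-' );
	if ( 0 === $length || 0 !== substr_compare( $path, '://', $length, 3 ) ) {
		return '';
	}
	return substr( $path, 0, $length );
}

// Anchored scan, but still asking PHP for the wrapper array on every call.
function demo_is_stream_anchored( string $path ): bool {
	$scheme = demo_scheme_of( $path );
	return '' !== $scheme && in_array( $scheme, stream_get_wrappers(), true );
}

// Anchored scan plus a cached, space-delimited wrapper list.
function demo_is_stream_cached( string $path ): bool {
	static $known = null;
	if ( null === $known ) {
		$known = ' ' . implode( ' ', stream_get_wrappers() ) . ' ';
	}
	$scheme = demo_scheme_of( $path );
	return '' !== $scheme && str_contains( $known, " {$scheme} " );
}

// Runs $fn over every path $rounds times and returns elapsed nanoseconds.
function demo_time_ns( callable $fn, array $paths, int $rounds ) {
	$start = hrtime( true );
	for ( $i = 0; $i < $rounds; $i++ ) {
		foreach ( $paths as $path ) {
			$fn( $path );
		}
	}
	return hrtime( true ) - $start;
}

// Arbitrary sample inputs, including one long path with no separator at all.
$paths = array(
	'/var/www/html/wp-content/plugins/example/example.php',
	'php://memory',
	str_repeat( 'a/', 2000 ) . 'file.php',
	'C:\\inetpub\\wwwroot\\index.php',
);

foreach ( array( 'demo_is_stream_unanchored', 'demo_is_stream_anchored', 'demo_is_stream_cached' ) as $fn ) {
	printf( "%-28s %10.2f ms\n", $fn, demo_time_ns( $fn, $paths, 20000 ) / 1e6 );
}
```

 Comparing the first two rows isolates the cost of the unanchored search,
 and comparing the last two isolates the array lookup plus the repeated
 `stream_get_wrappers()` call. The extra function call in `demo_scheme_of()`
 adds a little overhead that an inlined version would not have.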

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/64538#comment:6>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

