[wp-trac] [WordPress Trac] #55635: wp_convert_hr_to_bytes() report correct byte sizes for php.ini "shorthand" values

WordPress Trac noreply at wordpress.org
Wed Apr 27 21:43:16 UTC 2022

#55635: wp_convert_hr_to_bytes() report correct byte sizes for php.ini "shorthand"
 Reporter:  dmsnell      |      Owner:  (none)
     Type:  defect       |     Status:  new
  (bug)                  |
 Priority:  normal       |  Milestone:  Awaiting Review
Component:  Upload       |    Version:
 Severity:  normal       |   Keywords:  needs-unit-tests has-patch dev-
  Focuses:               |  feedback
 Resolves #17725

 When `wp_convert_hr_to_bytes()` was introduced in
 [https://core.trac.wordpress.org/changeset/4388/ 4388] it provided a
 simplified mechanism to parse the values returned by functions like
 `ini_get()` which represent byte sizes. The over-simplified approach has
 led to issues in that function reporting the wrong byte sizes for various
 php.ini directives, leading to confusing problems such as uploading files
 that are rejected improperly or accepted improperly.

 In this patch we're porting the parser from PHP's own source (which has
 remained stable for decades and probably can't change without major
 breakage) in order to more accurately reflect the values it uses when it
 reads those configurations.

 Unfortunately PHP doesn't offer a mechanism to read its own internal value
 for these fields and a 100% port is extremely cumbersome (at best) due to
 the different ways that PHP and C handle signed integer overflow. These
 differences should only appear when supplying discouraged/invalid values
 to the system anyway, and PHP warns that in these situations things are
 likely to break anyway.

 Over the years this function has been modified a couple of times in ways
 that this patch reverts:
  - [https://core.trac.wordpress.org/changeset/38013 38013] introduced a
 `PHP_INT_MAX` limit in a way that coerces hexadecimal and octal integer
 representations to decimal.
  - [https://core.trac.wordpress.org/changeset/35325 35325] replaced the
 hard-coded byte size with overwritable constants but if there were any
 occasion for someone to change those constants in `wp-config.php` then we
 would actually want to preserve the hard-coded values in
 `wp_convert_hr_to_bytes()` since that function refers to code inside of
 PHP, not inside of WordPress.
  - The original code from [https://core.trac.wordpress.org/changeset/4388/
 4388] looks for the presence of the suffixes //anywhere// within the value
 string and prioritizes `g` over `m` over `k` whereas PHP only looks at the
 last character in the input string (this is something that
 17725.3.diff] got right). This can cause unexpected parses, such as with
 `14gmk` when WordPress interprets it as 14GiB but PHP interprets it as

 Further we do acknowledge the mismatch between PHP's definition of
 "gigabyte"/"megabyte"/"kilobyte" being factors of 1024 apart from each
 other and the standard of being 1000. WordPress follows PHP's convention
 so this is simply noted in the function and preserved.

 This patch introduces new behaviors which might seem unexpected or wrong.
 It's important to consider that this function exists because PHP doesn't
 expose the values it parses from the php.ini directives. Therefore it's
 job in WordPress can be considered to do as best as it can to represent
 what's really happening inside of PHP; this may not match our intuition
 about what PHP should be doing. To that end the over-simplified code for
 the past 16 years has misreported many plausible-looking values like
 `100MB` (which PHP interprets as 100 bytes but WordPress thinks is 100


 In order to fully verify the updated code we have to understand PHP's
 interpretation of the php.ini directive values. One way to do this is to
 set a value, `upload_max_size` for instance, in any number of the possible
 configurable places and then make repeated uploads to see if it's
 rightfully accepted or rejected. This is cumbersome.

 An alternative approach is to compile PHP locally with added
 instrumentation; this is the approach taken in preparing this PR. The
 following patch will report three values every time a "Long" value is
 parsed from a php.ini directive: the shorthand value being parsed, the
 bound `long` value before applying the magnitude suffix, and the possibly-
 overflowed value derived from applying the possible `g`, `m`, and `k`

 diff --git a/Zend/zend_operators.c b/Zend/zend_operators.c
 index 8a0cc813..362cef76 100644
 --- a/Zend/zend_operators.c
 +++ b/Zend/zend_operators.c
 @@ -164,6 +164,9 @@ ZEND_API zend_long ZEND_FASTCALL zend_atol(const char
 *str, size_t str_len) /* {
 +       printf("zend_atol( \"%s\" ) = %lld : %lld\n", str,
 ZEND_STRTOL(str, NULL, 0), retval);
         return (zend_long) retval;
  /* }}} */

 For example, a sampling of values run through PHP produces this output.

 zend_atol( "0" ) = 0 : 0
 zend_atol( "0g" ) = 0 : 0
 zend_atol( "1g" ) = 1 : 1073741824
 zend_atol( "3G" ) = 3 : 3221225472
 zend_atol( "3mg" ) = 3 : 3221225472
 zend_atol( "3km" ) = 3 : 3145728
 zend_atol( "boat" ) = 0 : 0
 zend_atol( "-14k" ) = -14 : -14336
 zend_atol( "-14chairsg" ) = -14 : -15032385536
 zend_atol( "9223372036854775807" ) = 9223372036854775807 :
 zend_atol( "9223372036854775807g" ) = 9223372036854775807 : -1073741824
 zend_atol( "9223372036854775808" ) = 9223372036854775807 :
 zend_atol( "0xt" ) = 0 : 0
 zend_atol( "0x5teak_and_egg" ) = 5 : 5368709120

Ticket URL: <https://core.trac.wordpress.org/ticket/55635>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

More information about the wp-trac mailing list