[wp-trac] [WordPress Trac] #55635: wp_convert_hr_to_bytes() report correct byte sizes for php.ini "shorthand" values

WordPress Trac noreply at wordpress.org
Thu Apr 28 00:24:15 UTC 2022


#55635: wp_convert_hr_to_bytes() report correct byte sizes for php.ini "shorthand"
values
-------------------------------------------------+-------------------------
 Reporter:  dmsnell                              |       Owner:  (none)
     Type:  defect (bug)                         |      Status:  new
 Priority:  normal                               |   Milestone:  Awaiting
                                                 |  Review
Component:  Upload                               |     Version:
 Severity:  normal                               |  Resolution:
 Keywords:  needs-unit-tests has-patch dev-      |     Focuses:
  feedback                                       |
-------------------------------------------------+-------------------------
Description changed by SergeyBiryukov:

Old description:

> Resolves #17725
>
> When `wp_convert_hr_to_bytes()` was introduced in
> [https://core.trac.wordpress.org/changeset/4388/ 4388] it provided a
> simplified mechanism to parse the values returned by functions like
> `ini_get()` which represent byte sizes. The over-simplified approach has
> led to issues in that function reporting the wrong byte sizes for various
> php.ini directives, leading to confusing problems such as uploading files
> that are rejected improperly or accepted improperly.
>
> In this patch we're porting the parser from PHP's own source (which has
> remained stable for decades and probably can't change without major
> breakage) in order to more accurately reflect the values it uses when it
> reads those configurations.
>
> Unfortunately PHP doesn't offer a mechanism to read its own internal
> value for these fields and a 100% port is extremely cumbersome (at best)
> due to the different ways that PHP and C handle signed integer overflow.
> These differences should only appear when supplying discouraged/invalid
> values to the system anyway, and PHP warns that in these situations
> things are likely to break anyway.
>
> Over the years this function has been modified a couple of times in ways
> that this patch reverts:
>  - [https://core.trac.wordpress.org/changeset/38013 38013] introduced a
> `PHP_INT_MAX` limit in a way that coerces hexadecimal and octal integer
> representations to decimal.
>  - [https://core.trac.wordpress.org/changeset/35325 35325] replaced the
> hard-coded byte size with overwritable constants but if there were any
> occasion for someone to change those constants in `wp-config.php` then we
> would actually want to preserve the hard-coded values in
> `wp_convert_hr_to_bytes()` since that function refers to code inside of
> PHP, not inside of WordPress.
>  - The original code from
> [https://core.trac.wordpress.org/changeset/4388/ 4388] looks for the
> presence of the suffixes //anywhere// within the value string and
> prioritizes `g` over `m` over `k` whereas PHP only looks at the last
> character in the input string (this is something that
> [https://core.trac.wordpress.org/attachment/ticket/17725/17725.3.diff
> 17725.3.diff] got right). This can cause unexpected parses, such as with
> `14gmk` when WordPress interprets it as 14GiB but PHP interprets it as
> 14KiB.
>
> Further we do acknowledge the mismatch between PHP's definition of
> "gigabyte"/"megabyte"/"kilobyte" being factors of 1024 apart from each
> other and the standard of being 1000. WordPress follows PHP's convention
> so this is simply noted in the function and preserved.
>
> This patch introduces new behaviors which might seem unexpected or wrong.
> It's important to consider that this function exists because PHP doesn't
> expose the values it parses from the php.ini directives. Therefore it's
> job in WordPress can be considered to do as best as it can to represent
> what's really happening inside of PHP; this may not match our intuition
> about what PHP should be doing. To that end the over-simplified code for
> the past 16 years has misreported many plausible-looking values like
> `100MB` (which PHP interprets as 100 bytes but WordPress thinks is 100
> MiB).
>
> **Testing**
>
> In order to fully verify the updated code we have to understand PHP's
> interpretation of the php.ini directive values. One way to do this is to
> set a value, `upload_max_size` for instance, in any number of the
> possible configurable places and then make repeated uploads to see if
> it's rightfully accepted or rejected. This is cumbersome.
>
> An alternative approach is to compile PHP locally with added
> instrumentation; this is the approach taken in preparing this PR. The
> following patch will report three values every time a "Long" value is
> parsed from a php.ini directive: the shorthand value being parsed, the
> bound `long` value before applying the magnitude suffix, and the
> possibly-overflowed value derived from applying the possible `g`, `m`,
> and `k` suffixes.
>

> {{{#!diff
> diff --git a/Zend/zend_operators.c b/Zend/zend_operators.c
> index 8a0cc813..362cef76 100644
> --- a/Zend/zend_operators.c
> +++ b/Zend/zend_operators.c
> @@ -164,6 +164,9 @@ ZEND_API zend_long ZEND_FASTCALL zend_atol(const char
> *str, size_t str_len) /* {
>                                 break;
>                 }
>         }
> +
> +       printf("zend_atol( \"%s\" ) = %lld : %lld\n", str,
> ZEND_STRTOL(str, NULL, 0), retval);
> +
>         return (zend_long) retval;
>  }
>  /* }}} */
> }}}
>
> For example, a sampling of values run through PHP produces this output.
>
> {{{#!bash
> zend_atol( "0" ) = 0 : 0
> zend_atol( "0g" ) = 0 : 0
> zend_atol( "1g" ) = 1 : 1073741824
> zend_atol( "3G" ) = 3 : 3221225472
> zend_atol( "3mg" ) = 3 : 3221225472
> zend_atol( "3km" ) = 3 : 3145728
> zend_atol( "boat" ) = 0 : 0
> zend_atol( "-14k" ) = -14 : -14336
> zend_atol( "-14chairsg" ) = -14 : -15032385536
> zend_atol( "9223372036854775807" ) = 9223372036854775807 :
> 9223372036854775807
> zend_atol( "9223372036854775807g" ) = 9223372036854775807 : -1073741824
> zend_atol( "9223372036854775808" ) = 9223372036854775807 :
> 9223372036854775807
> zend_atol( "0xt" ) = 0 : 0
> zend_atol( "0x5teak_and_egg" ) = 5 : 5368709120
> }}}

New description:

 Resolves #17725

 When `wp_convert_hr_to_bytes()` was introduced in [4388] it provided a
 simplified mechanism to parse the values returned by functions like
 `ini_get()` which represent byte sizes. The over-simplified approach has
 led to issues in that function reporting the wrong byte sizes for various
 php.ini directives, leading to confusing problems such as uploading files
 that are rejected improperly or accepted improperly.

 In this patch we're porting the parser from PHP's own source (which has
 remained stable for decades and probably can't change without major
 breakage) in order to more accurately reflect the values it uses when it
 reads those configurations.

 Unfortunately PHP doesn't offer a mechanism to read its own internal value
 for these fields and a 100% port is extremely cumbersome (at best) due to
 the different ways that PHP and C handle signed integer overflow. These
 differences should only appear when supplying discouraged/invalid values
 to the system anyway, and PHP warns that in these situations things are
 likely to break anyway.

 Over the years this function has been modified a couple of times in ways
 that this patch reverts:
  - [38013] introduced a `PHP_INT_MAX` limit in a way that coerces
 hexadecimal and octal integer representations to decimal.
  - [35325] replaced the hard-coded byte size with overwritable constants
 but if there were any occasion for someone to change those constants in
 `wp-config.php` then we would actually want to preserve the hard-coded
 values in `wp_convert_hr_to_bytes()` since that function refers to code
 inside of PHP, not inside of WordPress.
  - The original code from [4388] looks for the presence of the suffixes
 //anywhere// within the value string and prioritizes `g` over `m` over `k`
 whereas PHP only looks at the last character in the input string (this is
 something that
 [https://core.trac.wordpress.org/attachment/ticket/17725/17725.3.diff
 17725.3.diff] got right). This can cause unexpected parses, such as with
 `14gmk` when WordPress interprets it as 14GiB but PHP interprets it as
 14KiB.

 Further we do acknowledge the mismatch between PHP's definition of
 "gigabyte"/"megabyte"/"kilobyte" being factors of 1024 apart from each
 other and the standard of being 1000. WordPress follows PHP's convention
 so this is simply noted in the function and preserved.

 This patch introduces new behaviors which might seem unexpected or wrong.
 It's important to consider that this function exists because PHP doesn't
 expose the values it parses from the php.ini directives. Therefore it's
 job in WordPress can be considered to do as best as it can to represent
 what's really happening inside of PHP; this may not match our intuition
 about what PHP should be doing. To that end the over-simplified code for
 the past 16 years has misreported many plausible-looking values like
 `100MB` (which PHP interprets as 100 bytes but WordPress thinks is 100
 MiB).

 **Testing**

 In order to fully verify the updated code we have to understand PHP's
 interpretation of the php.ini directive values. One way to do this is to
 set a value, `upload_max_size` for instance, in any number of the possible
 configurable places and then make repeated uploads to see if it's
 rightfully accepted or rejected. This is cumbersome.

 An alternative approach is to compile PHP locally with added
 instrumentation; this is the approach taken in preparing this PR. The
 following patch will report three values every time a "Long" value is
 parsed from a php.ini directive: the shorthand value being parsed, the
 bound `long` value before applying the magnitude suffix, and the possibly-
 overflowed value derived from applying the possible `g`, `m`, and `k`
 suffixes.


 {{{#!diff
 diff --git a/Zend/zend_operators.c b/Zend/zend_operators.c
 index 8a0cc813..362cef76 100644
 --- a/Zend/zend_operators.c
 +++ b/Zend/zend_operators.c
 @@ -164,6 +164,9 @@ ZEND_API zend_long ZEND_FASTCALL zend_atol(const char
 *str, size_t str_len) /* {
                                 break;
                 }
         }
 +
 +       printf("zend_atol( \"%s\" ) = %lld : %lld\n", str,
 ZEND_STRTOL(str, NULL, 0), retval);
 +
         return (zend_long) retval;
  }
  /* }}} */
 }}}

 For example, a sampling of values run through PHP produces this output.

 {{{#!bash
 zend_atol( "0" ) = 0 : 0
 zend_atol( "0g" ) = 0 : 0
 zend_atol( "1g" ) = 1 : 1073741824
 zend_atol( "3G" ) = 3 : 3221225472
 zend_atol( "3mg" ) = 3 : 3221225472
 zend_atol( "3km" ) = 3 : 3145728
 zend_atol( "boat" ) = 0 : 0
 zend_atol( "-14k" ) = -14 : -14336
 zend_atol( "-14chairsg" ) = -14 : -15032385536
 zend_atol( "9223372036854775807" ) = 9223372036854775807 :
 9223372036854775807
 zend_atol( "9223372036854775807g" ) = 9223372036854775807 : -1073741824
 zend_atol( "9223372036854775808" ) = 9223372036854775807 :
 9223372036854775807
 zend_atol( "0xt" ) = 0 : 0
 zend_atol( "0x5teak_and_egg" ) = 5 : 5368709120
 }}}

--

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/55635#comment:1>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list