[wp-trac] [WordPress Trac] #55635: wp_convert_hr_to_bytes() report correct byte sizes for php.ini "shorthand" values
WordPress Trac
noreply at wordpress.org
Wed Apr 27 21:43:16 UTC 2022
#55635: wp_convert_hr_to_bytes() report correct byte sizes for php.ini "shorthand"
values
-------------------------+-------------------------------------------------
Reporter: dmsnell | Owner: (none)
Type: defect | Status: new
(bug) |
Priority: normal | Milestone: Awaiting Review
Component: Upload | Version:
Severity: normal | Keywords: needs-unit-tests has-patch dev-
Focuses: | feedback
-------------------------+-------------------------------------------------
Resolves #17725
When `wp_convert_hr_to_bytes()` was introduced in
[https://core.trac.wordpress.org/changeset/4388/ 4388] it provided a
simplified mechanism to parse the values returned by functions like
`ini_get()` which represent byte sizes. The over-simplified approach has
led to issues in that function reporting the wrong byte sizes for various
php.ini directives, leading to confusing problems such as uploading files
that are rejected improperly or accepted improperly.
In this patch we're porting the parser from PHP's own source (which has
remained stable for decades and probably can't change without major
breakage) in order to more accurately reflect the values it uses when it
reads those configurations.
Unfortunately PHP doesn't offer a mechanism to read its own internal value
for these fields and a 100% port is extremely cumbersome (at best) due to
the different ways that PHP and C handle signed integer overflow. These
differences should only appear when supplying discouraged/invalid values
to the system anyway, and PHP warns that in these situations things are
likely to break anyway.
Over the years this function has been modified a couple of times in ways
that this patch reverts:
- [https://core.trac.wordpress.org/changeset/38013 38013] introduced a
`PHP_INT_MAX` limit in a way that coerces hexadecimal and octal integer
representations to decimal.
- [https://core.trac.wordpress.org/changeset/35325 35325] replaced the
hard-coded byte size with overwritable constants but if there were any
occasion for someone to change those constants in `wp-config.php` then we
would actually want to preserve the hard-coded values in
`wp_convert_hr_to_bytes()` since that function refers to code inside of
PHP, not inside of WordPress.
- The original code from [https://core.trac.wordpress.org/changeset/4388/
4388] looks for the presence of the suffixes //anywhere// within the value
string and prioritizes `g` over `m` over `k` whereas PHP only looks at the
last character in the input string (this is something that
[https://core.trac.wordpress.org/attachment/ticket/17725/17725.3.diff
17725.3.diff] got right). This can cause unexpected parses, such as with
`14gmk` when WordPress interprets it as 14GiB but PHP interprets it as
14KiB.
Further we do acknowledge the mismatch between PHP's definition of
"gigabyte"/"megabyte"/"kilobyte" being factors of 1024 apart from each
other and the standard of being 1000. WordPress follows PHP's convention
so this is simply noted in the function and preserved.
This patch introduces new behaviors which might seem unexpected or wrong.
It's important to consider that this function exists because PHP doesn't
expose the values it parses from the php.ini directives. Therefore it's
job in WordPress can be considered to do as best as it can to represent
what's really happening inside of PHP; this may not match our intuition
about what PHP should be doing. To that end the over-simplified code for
the past 16 years has misreported many plausible-looking values like
`100MB` (which PHP interprets as 100 bytes but WordPress thinks is 100
MiB).
**Testing**
In order to fully verify the updated code we have to understand PHP's
interpretation of the php.ini directive values. One way to do this is to
set a value, `upload_max_size` for instance, in any number of the possible
configurable places and then make repeated uploads to see if it's
rightfully accepted or rejected. This is cumbersome.
An alternative approach is to compile PHP locally with added
instrumentation; this is the approach taken in preparing this PR. The
following patch will report three values every time a "Long" value is
parsed from a php.ini directive: the shorthand value being parsed, the
bound `long` value before applying the magnitude suffix, and the possibly-
overflowed value derived from applying the possible `g`, `m`, and `k`
suffixes.
{{{#!diff
diff --git a/Zend/zend_operators.c b/Zend/zend_operators.c
index 8a0cc813..362cef76 100644
--- a/Zend/zend_operators.c
+++ b/Zend/zend_operators.c
@@ -164,6 +164,9 @@ ZEND_API zend_long ZEND_FASTCALL zend_atol(const char
*str, size_t str_len) /* {
break;
}
}
+
+ printf("zend_atol( \"%s\" ) = %lld : %lld\n", str,
ZEND_STRTOL(str, NULL, 0), retval);
+
return (zend_long) retval;
}
/* }}} */
}}}
For example, a sampling of values run through PHP produces this output.
{{{#!bash
zend_atol( "0" ) = 0 : 0
zend_atol( "0g" ) = 0 : 0
zend_atol( "1g" ) = 1 : 1073741824
zend_atol( "3G" ) = 3 : 3221225472
zend_atol( "3mg" ) = 3 : 3221225472
zend_atol( "3km" ) = 3 : 3145728
zend_atol( "boat" ) = 0 : 0
zend_atol( "-14k" ) = -14 : -14336
zend_atol( "-14chairsg" ) = -14 : -15032385536
zend_atol( "9223372036854775807" ) = 9223372036854775807 :
9223372036854775807
zend_atol( "9223372036854775807g" ) = 9223372036854775807 : -1073741824
zend_atol( "9223372036854775808" ) = 9223372036854775807 :
9223372036854775807
zend_atol( "0xt" ) = 0 : 0
zend_atol( "0x5teak_and_egg" ) = 5 : 5368709120
}}}
--
Ticket URL: <https://core.trac.wordpress.org/ticket/55635>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
More information about the wp-trac
mailing list