[wp-trac] [WordPress Trac] #55635: wp_convert_hr_to_bytes() report correct byte sizes for php.ini "shorthand" values
WordPress Trac
noreply at wordpress.org
Thu Apr 28 03:19:19 UTC 2022
#55635: wp_convert_hr_to_bytes() report correct byte sizes for php.ini "shorthand"
values
-------------------------------------------------+-------------------------
Reporter: dmsnell | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting
| Review
Component: Upload | Version:
Severity: normal | Resolution:
Keywords: needs-unit-tests has-patch dev- | Focuses:
feedback changes-requested |
-------------------------------------------------+-------------------------
Comment (by dmsnell):
Thank you @SergeyBiryukov; I wasn't sure if those constants could be
redefined but I figured there was a reasonable case for doing so, that
being to avoid confusion with the way PHP and WordPress use the backwards
definitions. That is, maybe you want your site to show a 400MB image as
400MB and not as 381 (because the computer will show it as 400, a phone
will show it as that, etc…).
I get that it could be rare and we can blame someone for changing this
value if their site breaks, but on the other hand I figured it was one of
those things where it's just as simple to do it right and it was adding
the abstraction that introduced the potential for a bug, why not just keep
it simple and accurate. Again, maybe we want a different function if
people don't agree on the purpose of `wp_convert_hr_to_bytes()`, but its
sole purpose is reporting how many bytes are represented by a given
php.ini directive, and that's actually what led me to discover this
discrepancy, so my patch proposed doing that.
@costdev I'm definitely not up on WordPress coding conventions but did you
see that your patch stripped away the PHPdoc information from my patch?
With those docblock style comments various IDEs and tools do provide
helpful inline documentation whereas without them that disappears. If we
aren't allowing them (of which I see literally thousands in the existing
code so I think we are) then maybe we should remove them entirely from
this patch.
If it's something more superficial, such as the linter expected them to be
multi-line comments I think it would be better to fix the tool or add the
newlines because those comments aren't that useful if they're detached
from the variables they describe.
> Trying to access array offset on value of type int when an integer is
passed. $value should be cast to string.
Is this an error from real code or possibly just from a test? We can
always add extra runtime type-checking and casting, though it seems like
the intention here is to only be called with data from `ini_get()` which
only returns string values (or `false`, which I suppose would be good to
guard against). I checked and I don't believe any code in Core should be
passing an integer so maybe there's buggy test code out there doing that
thinking that this function wants an integer?
> Test value: '128m ' (note the trailing whitespace)
> Expected result: 134217728
> Actual result: 128
As noted in the patch description and in the patch itself I expect some
existing assumptions to break because in this case we have an invalid test
and I'll need to make sure to update the test to reflect that. WordPress
has been reporting `128m ` as 128 MiB but PHP will have limited the actual
size to 128 bytes and so it will reject basically every file uploaded and
the errors will be confusing: "Your upload limit is 128 MB, your 15 KB
file could not be uploaded because it's too big."
Thanks for pointing out the style and lint issues and also the test. It's
my plan to add tests, but I couldn't get to it today.
Concerning the list of `$digits` do we have a policy for excluding the
formatting rule? I personally found it more distracting as a large multi-
line list and so am hoping we can exclude that array for the sake of not
distracting what's going on in the rest of the function.
> I absolutely hate the idea of writing PHP in the C style, but ultimately
love the reason for doing so.
I love the feedback @dd32 :) By the way the goal wasn't to write in C
style but to simply have WordPress agree with PHP for the values of these
memory sizes. When I started my code looked much more like your patch but
I had to keep unwrapping it. In my head I keep going back to the idea:
this isn't about telling PHP what to do so much as it is to properly read
the values of those configs, though I'm confused on why PHP doesn't expose
a way to read them.
> GB_IN_BYTES can technically be changed, but if someone does that they
deserve it.
Is it worth punishing someone who makes a mistake, or punishing the users
of their site for their mishap, when doing so requires adding a layer of
abstraction which can introduce a bug that isn't present in the simpler
code? Case in point, 55635.3.1.diff has two typos (`BYES` instead of
`BYTES`); granted, they are there when pulling those constants back in
after I removed them, but in older versions of PHP that would have
returned the string `MB_IN_BYES` instead of the constant or instead of
throwing an error.
> The leading whitespace checks aren't working, as it's using single-
quoted strings, so looking for a literal '\t' 2-character string
Very nice catch 🙇♂️. I did my local testing with spaces but not with
tabs yet. It's my intention to add an illustrative test suite to this to
demonstrate cases just like this.
> Isn't written in PHP coding style
I'm happy to update things to match WordPress style. Is this more a
comment about style rules or about it not reading like other PHP tends to
look? While certain aspects of this are more complicated than using
`(int)` or `intval()`, they are written because of quirks on one or both
sides of PHP/WordPress and are there to avoid knowingly misreporting
memory size values.
There's one other approach I considered which was to build the scalar
value by joining the digits as we parse them and then use PHP casting or
`intval()` to do the numeric parsing. That's still an option but I want to
avoid using `(int)` or other approaches on the input value because (as you
noted) it incorporates a different semantic for how it parses numbers. If
the goal is to be generally accurate enough for common configurations I
don't think this patch is necessary, but if the goal is to accurately
reflect the configured value then I disagree that it's good enough to be
right most of the time and wrong in a broad number of situations. I'm
particularly sensitive to this context since it could be easy to craft
particular values that don't show errors and which WordPress shows as
being reasonable but for which PHP has a very different value. The broken
test-case that failed is an example, where WordPress is off by a factor of
a million and it's the difference between allowing fairly large uploads
and practically disabling all uploads albeit with a very conflicted error
message. Also different versions of PHP result in different behaviors for
`intval()` and some of those changes are as recent as 7.1. For example,
`1e6` is //one byte//, not as //one million bytes// as `intval()` on PHP
8.1 is reporting it.
A RegExp was another approach I considered but felt was starting to stuff
too much complexity into the code and too divergent from what's going on
in PHP under the hood, and brought in more overhead than we need. At least
with this incantation there's a general match between the purpose of the
function and the algorithm taken to parse the config values. Should PHP
decide to change this for some reason it should remain fairly
straightforward to update the WordPress side and not have to rethink
abstractions or new concepts introduced to cover the style differences.
> I don't think it needs to mirror PHP's parsing for a specific config
Noted: I disagree :) Again, if we don't particularly care about the
accuracy I don't see the point in fixing a small number of additional
cases while leaving a very wide door for very misleading parses. Obviously
I too have a subjective threshold because I've left the overflow cases out
of this. Those cases seem a bit different though for these reasons:
- The PHP code involves some amount of undefined behavior and so it may
not be possible to be generally accurate in those cases; a fix on my
machine may not also fix a build with another compiler on another
architecture or system or OS.
- The PHP code issues a warning against using these incorrect values.
- Writing the PHP to simulate the integer overflow was giving me quite
the headache and I think it would involve quite a bit more complexity than
the rest of this patch, which is mostly scanning through the string
character by character in a single pass and pulling out values of
interest.
We actually know that the digit parsing won't overflow because we have a
defined behavior in `strtol()`. If there's no final `g`, `m`, or `k` value
then I believe this should be a 100% match. We should be able to easily
detect //if// an overflow should occur, but we don't have a way to respond
to that that's clearly //right// (and while it seems reasonable to return
`0` in such cases that also is more of a feeling than something that
corresponds to the reality of the system).
> Additional edit: If you want to test the PHP parsing
Nice! Obviously more helpful for smaller values than ones like
`9223372036854775807` (64-bit max integer) but doesn't require compiling
PHP.
---
Obviously I'll need to fix the bugs mentioned above. Thank you all for
sharing your feedback here. Hope you don't mind the wall of text; it was
my intention to share the context behind why I wrote what I did and the
reasoning leading my thoughts.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/55635#comment:5>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
More information about the wp-trac
mailing list