[wp-trac] [WordPress Trac] #55635: wp_convert_hr_to_bytes() report correct byte sizes for php.ini "shorthand" values

Thu Apr 28 03:19:19 UTC 2022

#55635: wp_convert_hr_to_bytes() report correct byte sizes for php.ini "shorthand"
values
-------------------------------------------------+-------------------------
 Reporter:  dmsnell                              |       Owner:  (none)
     Type:  defect (bug)                         |      Status:  new
 Priority:  normal                               |   Milestone:  Awaiting
                                                 |  Review
Component:  Upload                               |     Version:
 Severity:  normal                               |  Resolution:
 Keywords:  needs-unit-tests has-patch dev-      |     Focuses:
  feedback changes-requested                     |
-------------------------------------------------+-------------------------

Comment (by dmsnell):

 Thank you @SergeyBiryukov; I wasn't sure if those constants could be
 redefined but I figured there was a reasonable case for doing so, that
 being to avoid confusion with the way PHP and WordPress use the backwards
 definitions. That is, maybe you want your site to show a 400MB image as
 400MB and not as 381 (because the computer will show it as 400, a phone
 will show it as that, etc…).

 I get that it could be rare and we can blame someone for changing this
 value if their site breaks, but on the other hand I figured it was one of
 those things where it's just as simple to do it right and it was adding
 the abstraction that introduced the potential for a bug, why not just keep
 it simple and accurate. Again, maybe we want a different function if
 people don't agree on the purpose of `wp_convert_hr_to_bytes()`, but its
 sole purpose is reporting how many bytes are represented by a given
 php.ini directive, and that's actually what led me to discover this
 discrepancy, so my patch proposed doing that.

 @costdev I'm definitely not up on WordPress coding conventions but did you
 see that your patch stripped away the PHPdoc information from my patch?
 With those docblock style comments various IDEs and tools do provide
 helpful inline documentation whereas without them that disappears. If we
 aren't allowing them (of which I see literally thousands in the existing
 code so I think we are) then maybe we should remove them entirely from
 this patch.

 If it's something more superficial, such as the linter expected them to be
 multi-line comments I think it would be better to fix the tool or add the
 newlines because those comments aren't that useful if they're detached
 from the variables they describe.

 > Trying to access array offset on value of type int when an integer is
 passed. $value should be cast to string.

 Is this an error from real code or possibly just from a test? We can
 always add extra runtime type-checking and casting, though it seems like
 the intention here is to only be called with data from `ini_get()` which
 only returns string values (or `false`, which I suppose would be good to
 guard against). I checked and I don't believe any code in Core should be
 passing an integer so maybe there's buggy test code out there doing that
 thinking that this function wants an integer?

 > Test value: '128m ' (note the trailing whitespace)
 > Expected result: 134217728
 > Actual result: 128

 As noted in the patch description and in the patch itself I expect some
 existing assumptions to break because in this case we have an invalid test
 and I'll need to make sure to update the test to reflect that. WordPress
 has been reporting `128m ` as 128 MiB but PHP will have limited the actual
 size to 128 bytes and so it will reject basically every file uploaded and
 the errors will be confusing: "Your upload limit is 128 MB, your 15 KB
 file could not be uploaded because it's too big."

 Thanks for pointing out the style and lint issues and also the test. It's
 my plan to add tests, but I couldn't get to it today.

 Concerning the list of `$digits` do we have a policy for excluding the
 formatting rule? I personally found it more distracting as a large multi-
 line list and so am hoping we can exclude that array for the sake of not
 distracting what's going on in the rest of the function.

 > I absolutely hate the idea of writing PHP in the C style, but ultimately
 love the reason for doing so.

 I love the feedback @dd32  :) By the way the goal wasn't to write in C
 style but to simply have WordPress agree with PHP for the values of these
 memory sizes. When I started my code looked much more like your patch but
 I had to keep unwrapping it. In my head I keep going back to the idea:
 this isn't about telling PHP what to do so much as it is to properly read
 the values of those configs, though I'm confused on why PHP doesn't expose
 a way to read them.

 > GB_IN_BYTES can technically be changed, but if someone does that they
 deserve it.

 Is it worth punishing someone who makes a mistake, or punishing the users
 of their site for their mishap, when doing so requires adding a layer of
 abstraction which can introduce a bug that isn't present in the simpler
 code? Case in point, 55635.3.1.diff has two typos (`BYES` instead of
 `BYTES`); granted, they are there when pulling those constants back in
 after I removed them, but in older versions of PHP that would have
 returned the string `MB_IN_BYES` instead of the constant or instead of
 throwing an error.

 > The leading whitespace checks aren't working, as it's using single-
 quoted strings, so looking for a literal '\t' 2-character string

 Very nice catch 🙇‍♂️. I did my local testing with spaces but not with
 tabs yet. It's my intention to add an illustrative test suite to this to
 demonstrate cases just like this.

 > Isn't written in PHP coding style

 I'm happy to update things to match WordPress style. Is this more a
 comment about style rules or about it not reading like other PHP tends to
 look? While certain aspects of this are more complicated than using
 `(int)` or `intval()`, they are written because of quirks on one or both
 sides of PHP/WordPress and are there to avoid knowingly misreporting
 memory size values.

 There's one other approach I considered which was to build the scalar
 value by joining the digits as we parse them and then use PHP casting or
 `intval()` to do the numeric parsing. That's still an option but I want to
 avoid using `(int)` or other approaches on the input value because (as you
 noted) it incorporates a different semantic for how it parses numbers. If
 the goal is to be generally accurate enough for common configurations I
 don't think this patch is necessary, but if the goal is to accurately
 reflect the configured value then I disagree that it's good enough to be
 right most of the time and wrong in a broad number of situations. I'm
 particularly sensitive to this context since it could be easy to craft
 particular values that don't show errors and which WordPress shows as
 being reasonable but for which PHP has a very different value. The broken
 test-case that failed is an example, where WordPress is off by a factor of
 a million and it's the difference between allowing fairly large uploads
 and practically disabling all uploads albeit with a very conflicted error
 message. Also different versions of PHP result in different behaviors for
 `intval()` and some of those changes are as recent as 7.1. For example,
 `1e6` is //one byte//, not as //one million bytes// as `intval()` on PHP
 8.1 is reporting it.

 A RegExp was another approach I considered but felt was starting to stuff
 too much complexity into the code and too divergent from what's going on
 in PHP under the hood, and brought in more overhead than we need. At least
 with this incantation there's a general match between the purpose of the
 function and the algorithm taken to parse the config values. Should PHP
 decide to change this for some reason it should remain fairly
 straightforward to update the WordPress side and not have to rethink
 abstractions or new concepts introduced to cover the style differences.

 > I don't think it needs to mirror PHP's parsing for a specific config

 Noted: I disagree :) Again, if we don't particularly care about the
 accuracy I don't see the point in fixing a small number of additional
 cases while leaving a very wide door for very misleading parses. Obviously
 I too have a subjective threshold because I've left the overflow cases out
 of this. Those cases seem a bit different though for these reasons:

  - The PHP code involves some amount of undefined behavior and so it may
 not be possible to be generally accurate in those cases; a fix on my
 machine may not also fix a build with another compiler on another
 architecture or system or OS.
  - The PHP code issues a warning against using these incorrect values.
  - Writing the PHP to simulate the integer overflow was giving me quite
 the headache and I think it would involve quite a bit more complexity than
 the rest of this patch, which is mostly scanning through the string
 character by character in a single pass and pulling out values of
 interest.

 We actually know that the digit parsing won't overflow because we have a
 defined behavior in `strtol()`. If there's no final `g`, `m`, or `k` value
 then I believe this should be a 100% match. We should be able to easily
 detect //if// an overflow should occur, but we don't have a way to respond
 to that that's clearly //right// (and while it seems reasonable to return
 `0` in such cases that also is more of a feeling than something that
 corresponds to the reality of the system).

 > Additional edit: If you want to test the PHP parsing

 Nice! Obviously more helpful for smaller values than ones like
 `9223372036854775807` (64-bit max integer) but doesn't require compiling
 PHP.

 ---

 Obviously I'll need to fix the bugs mentioned above. Thank you all for
 sharing your feedback here. Hope you don't mind the wall of text; it was
 my intention to share the context behind why I wrote what I did and the
 reasoning leading my thoughts.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/55635#comment:5>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform