[wp-trac] [WordPress Trac] #22325: Abstract GPCS away from the superglobals
WordPress Trac
noreply at wordpress.org
Sun Sep 29 06:57:15 UTC 2024
#22325: Abstract GPCS away from the superglobals
----------------------------------------------------+---------------------
Reporter: rmccue | Owner: (none)
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Bootstrap/Load | Version:
Severity: normal | Resolution:
Keywords: has-patch 2nd-opinion needs-unit-tests | Focuses:
----------------------------------------------------+---------------------
Comment (by dmsnell):
[https://www.php.net/releases/5_4_0.php Magic quotes were removed] in
March 2012 with PHP 5.4. I thought this might be relevant here: no
supported version of WordPress runs on a version of PHP with magic quotes.
Perhaps some of the original motivation for this discussion is gone, but I
think there's still value in handling `$_GET` and `$_POST` more
explicitly. I've grown skeptical of what happens automatically and what
//doesn't// happen automatically.
I'm in favor of encapsulating access to these variables in a way that
runs in parallel to existing code (so as not to break it) while
presenting a more explicit and well-defined interface. Some things I find
surprising:
- dots and spaces in query arg names are transformed into underscores
- duplicate params not using PHP's array syntax overwrite previous copies
of the same name
- array-named args explicitly create array structure, but…
- duplicates of some subset of array names overwrite previously set
nested parameters
- names and values are accepted as byte streams, not as encoded text
- GET values are submitted as percent-encoded bytes (which are not
guaranteed to be UTF-8), but…
- POST values may be transformed to HTML character references, e.g. when
a browser is set to `latin1`
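Several of the surprises above can be reproduced with `parse_str()`, which, as far as I know, follows the same parsing rules PHP uses to populate `$_GET` (the PHP manual's own `parse_str()` example shows `My Value` arriving as `My_Value`). A small sketch; the query strings are made up for illustration:

```php
<?php
// Illustrative query strings run through PHP's native parser.

// Dots and spaces in names become underscores, so "a.b" and "a b" both
// land on the key "a_b" and the later value silently replaces the earlier.
parse_str( 'a.b=1&a b=2', $mangled );   // array( 'a_b' => '2' )

// A repeated name without [] keeps only the last value.
parse_str( 'q=one&q=two', $duplicates ); // array( 'q' => 'two' )

// With [] the values are collected into an array instead.
parse_str( 'q[]=one&q[]=two', $arrays ); // array( 'q' => array( 'one', 'two' ) )

// Values are raw bytes: %FF decodes to a single byte that is not valid UTF-8,
// and it is accepted without complaint.
parse_str( 's=%FF', $bytes );            // array( 's' => "\xFF" )
```

Nothing here warns about the collision between `a.b` and `a b`; the first value is simply gone.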
These are also the kind of "gotchas" I tend to see people struggle
with, because the basic mental model they form while getting started
paints an inaccurate picture; the reality is much more complicated. And
this comes only from examining legitimate uses, ignoring malicious
attacks.
Although a bit less magical, I find a number of other standard approaches
to query args and `POST` values simpler to reason about and teach about.
{{{#!php
<?php
// ?q=one&q=two
'one' === wp_get( 'q' );
array( 'one', 'two' ) === wp_get_all( 'q' );
null === wp_get( 'r' );
// ?q=😄&r=😄&s=%F0%9F%98%84;
'😄' === wp_get( 'q' );
'😄' === wp_get( 'r' );
'😄' === wp_get( 's' );
}}}
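For illustration, here is a rough sketch of how functions like `wp_get()` and `wp_get_all()` above could be backed by the raw query string instead of `$_GET`, so names aren't mangled and repeated names are all kept. The `example_query_*` names are hypothetical, and the raw query string is passed in explicitly to keep the sketch testable; in a request it would presumably come from `$_SERVER['QUERY_STRING']`.

```php
<?php
/**
 * Hypothetical: split a raw query string into ordered [name, value] pairs
 * without PHP's name mangling or overwriting of repeated names.
 */
function example_query_pairs( string $query ): array {
	$pairs = array();
	foreach ( explode( '&', $query ) as $part ) {
		if ( '' === $part ) {
			continue; // Skip empty segments such as the middle of "a=1&&b=2".
		}
		// Split on the first "=" only; a bare name gets an empty-string value.
		$split = explode( '=', $part, 2 );
		// urldecode() handles %XX escapes and "+" as a space.
		$pairs[] = array( urldecode( $split[0] ), urldecode( $split[1] ?? '' ) );
	}
	return $pairs;
}

/** Hypothetical: every value submitted under $name, in submission order. */
function example_query_get_all( string $query, string $name ): array {
	$values = array();
	foreach ( example_query_pairs( $query ) as [ $key, $value ] ) {
		if ( $key === $name ) {
			$values[] = $value;
		}
	}
	return $values;
}

/** Hypothetical: the first value submitted under $name, or null if absent. */
function example_query_get( string $query, string $name ): ?string {
	return example_query_get_all( $query, $name )[0] ?? null;
}

// Usage, mirroring the examples above:
$query   = 'q=one&q=two&a.b=1&s=%F0%9F%98%84';
$first   = example_query_get( $query, 'q' );     // 'one'
$all     = example_query_get_all( $query, 'q' ); // array( 'one', 'two' )
$missing = example_query_get( $query, 'r' );     // null
$dotted  = example_query_get( $query, 'a.b' );   // '1', not renamed to 'a_b'
$emoji   = example_query_get( $query, 's' );     // '😄' (bytes F0 9F 98 84)
```

Only the query string is handled here; it does not attempt to parse `POST` bodies.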
- Providing a default value in the absence of the query arg seems very
reasonable. Of course, we can use `??` now, so it's less of a big deal,
and there's no way a query param can be set to `null`, only to the string
`"null"`, which is different.
- I'm having trouble understanding the stop values, or the values to
compare against for detecting the presence of a query arg. The `get()`
function can serve the purpose of `has()`, because it can return `null`
when the arg is missing.
- We might want to consider rejecting values that are invalid UTF-8.
Non-UTF-8 data can still slip through, since text in many other encodings
can also form valid UTF-8 byte sequences, so we can't guarantee the right
decoding; but we can reject //some// invalid input.
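A minimal sketch of the UTF-8 check suggested above, using the common `preg_match( '//u', ... )` idiom: with the `u` modifier, `preg_match()` returns `false` when the subject is not valid UTF-8. The function name is hypothetical:

```php
<?php
/**
 * Hypothetical: return the value only if its bytes are valid UTF-8.
 * This cannot prove the sender meant UTF-8; it only rejects byte
 * sequences that cannot be UTF-8 at all.
 */
function example_utf8_or_null( ?string $value ): ?string {
	if ( null === $value ) {
		return null; // Missing arg stays missing.
	}
	// preg_match() returns false (not 0) for an invalid UTF-8 subject with /u.
	return false === preg_match( '//u', $value ) ? null : $value;
}
```

For example, `"caf\xE9"` (café encoded as ISO-8859-1) is rejected, but the two bytes `"\xC3\xA9"` pass whether the sender meant `é` in UTF-8 or `Ã©` in ISO-8859-1, which is exactly the limitation described above.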
It's late and I'm tired, so I'm stopping for now, but I'd like to add some
more illustrative examples. For one, names can be weird and wild. For
another, PHP's native handling of these values is wild: it makes many
params unavailable and makes it really hard for developers to get the
right values when they want them.
As a reminder, **always set `accept-charset="UTF-8"`** on your `<form>`
elements! Even if a page has `<meta charset="utf-8">`, the browser will
still send other encodings in the POST body, including HTML character
references, if the browser is set to an encoding override.
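When a form is submitted in a legacy encoding, characters that encoding can't represent are sent as decimal numeric character references such as `&#128516;`. A hypothetical helper like the one below could flag such values on the server; a match is only a hint, since someone may also have typed `&#128516;` literally.

```php
<?php
/**
 * Hypothetical: does a submitted value contain a decimal numeric character
 * reference, as produced by browsers for unencodable characters?
 */
function example_has_numeric_char_ref( string $value ): bool {
	return 1 === preg_match( '/&#[0-9]+;/', $value );
}
```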
--
Ticket URL: <https://core.trac.wordpress.org/ticket/22325#comment:52>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform