[wp-trac] [WordPress Trac] #61833: Post titles in Bulk Edit should show decoded HTML

WordPress Trac noreply at wordpress.org
Thu Aug 22 04:32:35 UTC 2024


#61833: Post titles in Bulk Edit should show decoded HTML
-------------------------------------+---------------------
 Reporter:  dmsnell                  |       Owner:  (none)
     Type:  defect (bug)             |      Status:  new
 Priority:  normal                   |   Milestone:  6.7
Component:  Quick/Bulk Edit          |     Version:
 Severity:  normal                   |  Resolution:
 Keywords:  has-patch needs-refresh  |     Focuses:
-------------------------------------+---------------------

Old description:

> It seems that when a post title contains HTML character references
> (entities) that the list of posts in the //Bulk Edit// screen shows the
> raw HTML markup in its non-decoded form. Instead, it ought to show the
> decoded form.
>
> || [[Image(https://cldup.com/RNH1LGEjg7.png)]] ||
> [[Image(https://cldup.com/cTnfJAgODG.png)]] ||
>
> The JavaScript is escaping the raw HTML, preserving the character
> references as syntax instead of decoding them as text.
>
> {{{
> [Log] {theTitle: "… is Λ"} (inline-edit-post.js,
> line 210)
> [Log] {theTitle: "x < 2 & y > 3…"} (inline-edit-
> post.js, line 210)
> }}}
>
> WordPress is double-encoding the post titles when sending them to the
> admin page, causing the display error. The raw value in the database is
> proper, e.g. `… is Λ`.
>
> [[Image(https://cldup.com/HJEjFMJ5uA.png)]]

New description:

 It seems that when a post title contains HTML character references
 (entities), the list of posts in the //Bulk Edit// screen shows the
 raw HTML markup in its non-decoded form. Instead, it ought to show the
 decoded form.

 || [[Image(https://cldup.com/Ti9hsHgHpN.mp4)]] ||
 [[Image(https://cldup.com/cTnfJAgODG.png)]] ||

 The JavaScript is escaping the raw HTML, preserving the character
 references as syntax instead of decoding them as text.

 {{{
 [Log] {theTitle: "… is Λ"} (inline-edit-post.js,
 line 210)
 [Log] {theTitle: "x < 2 & y > 3…"} (inline-edit-post.js,
 line 210)
 }}}

 WordPress is double-encoding the post titles when sending them to the
 admin page, causing the display error. The raw value in the database is
 proper, e.g. `… is Λ`.

 [[Image(https://cldup.com/HJEjFMJ5uA.png)]]
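
The effect described above can be reproduced with a minimal JavaScript sketch. This is not the code in `inline-edit-post.js`. The helper names `escapeHtml` and `decodeReferences`, the sample title, and the deliberately tiny reference table are assumptions made for illustration only.

```javascript
// Hypothetical helper: naive escaping, which treats every "&" as text.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Illustrative subset only; the real HTML named-reference table is far larger.
const NAMED_REFERENCES = { amp: '&', lt: '<', gt: '>', quot: '"' };

// Hypothetical helper: decode character references into plain text.
function decodeReferences(html) {
  return html.replace(
    /&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g,
    (match, ref) => {
      if (ref[0] === '#') {
        const isHex = ref[1] === 'x' || ref[1] === 'X';
        const codePoint = parseInt(ref.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Leave out-of-range numeric references untouched.
        return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
      }
      // Unknown names are left as-is rather than guessed.
      return Object.prototype.hasOwnProperty.call(NAMED_REFERENCES, ref)
        ? NAMED_REFERENCES[ref]
        : match;
    }
  );
}

// An already-encoded title, as it might arrive on the admin page (sample data).
const encodedTitle = 'x &lt; 2 &amp; y &gt; 3';

// Escaping again keeps the references visible as syntax.
const doubleEncoded = escapeHtml(encodedTitle);

// Decoding yields the text a reader expects to see.
const decoded = decodeReferences(encodedTitle);
```

In this sketch, escaping the already-encoded title turns each `&` into `&amp;`, so the references show up as literal markup, while decoding first recovers the intended text.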

--

Comment (by dmsnell):

 @peterwilsoncc thanks for going through the testing and teaching me
 something new about the post list.

 It took me a bit to figure out what you were referring to, so I've
 recorded what I think is the dual flow in case others are confused like I
 was.

 [https://cloudup.com/c2u1ZRAwxZ0 Screencast of Edits]

 > With this PR applied, using quick edit for a single post modifies the
 title stored in the database to remove the encoding, eg Pens & Pencils.
 Using Bulk edit doesn't modify how the data is stored.

 This seems true, but only by accident: Bulk Edit doesn't allow modifying
 post titles. It submits a request to `wp-admin/edit.php` with the changed
 bulk parameters and the list of affected post IDs in the query args,
 while Quick Edit sends all of the arguments for a specific post as POST
 variables to `wp-admin/admin-ajax.php`.

 So Quick Edit allows setting the title, and thus the title is updated.
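
To make the two flows concrete, here is a rough JavaScript sketch of the request shapes. It builds the requests without sending them. The function names, the `example.test` host, and the parameter names (`post[]`, `post_ID`, `post_title`, `_status`) are assumptions for illustration and have not been checked against what Core actually sends.

```javascript
// Bulk Edit (sketch): changed parameters and post IDs travel in the query
// string of wp-admin/edit.php. No post title is part of the request.
function buildBulkEditUrl(adminBase, postIds, changes) {
  const url = new URL('edit.php', adminBase);
  for (const id of postIds) {
    url.searchParams.append('post[]', String(id)); // assumed parameter name
  }
  for (const [name, value] of Object.entries(changes)) {
    url.searchParams.set(name, value);
  }
  return url.toString();
}

// Quick Edit (sketch): all fields for one post, including the title, travel
// as POST variables to wp-admin/admin-ajax.php.
function buildQuickEditRequest(adminBase, postId, fields) {
  const body = new URLSearchParams({ post_ID: String(postId), ...fields });
  return {
    method: 'POST',
    url: new URL('admin-ajax.php', adminBase).toString(),
    body: body.toString(),
  };
}

const adminBase = 'https://example.test/wp-admin/'; // placeholder host
const bulkUrl = buildBulkEditUrl(adminBase, [3, 5], { _status: 'draft' });
const quickRequest = buildQuickEditRequest(adminBase, 7, {
  post_title: 'Pens & Pencils',
});
```

Only the second request carries a title at all, which is why only Quick Edit can change what is stored in `post_title`.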

 > Bulk edit uses the same markup as quick edit so changing one will effect
 the other.

 I'd like to hear thoughts on this, because in my opinion it's more
 accurate to replace the `&amp;` in the database and store `&`
 instead. Call it a happy unintended side effect of this change. Still, it
 wasn't the goal of this change to modify the way the data is stored. I
 just happen to think, considering proper translation of layers and
 domains, that it's most reliable to store raw text in the `post_title`
 field and then let the display logic handle proper HTML escaping
 ([https://developer.wordpress.org/apis/security/escaping/#toc_3 escaping
 late]).

 ----

 In other words, this seems like an unintended but positive change.

 ----

 It's way more complicated 😰

 This is also unrelated to the patch, but it appears that
 `inline-edit-post.js` forces UTF-8 submission by serializing the post
 title with jQuery. This can go wrong on the backend if the site isn't
 configured for UTF-8, in which case the post title gets corrupted. Were
 it submitted via an HTML FORM element, the browser would automatically
 insert character references for the characters that the page's encoding
 doesn't support, but the jQuery code doesn't do this.
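
The difference between the two submission paths can be sketched in JavaScript. Both function names are hypothetical, and the "form" variant is a simplification that treats the page encoding as plain Latin-1 (code points up to U+00FF). Real browsers follow the WHATWG Encoding Standard, where an ISO-8859-1 label maps to windows-1252, and they encode spaces as `+`, so this only approximates the behavior.

```javascript
// Script-side serialization (sketch): always percent-encoded UTF-8,
// whatever encoding the page declares.
function serializeAsUtf8(name, value) {
  return encodeURIComponent(name) + '=' + encodeURIComponent(value);
}

// Form-like serialization for a Latin-1 page (simplified sketch): characters
// the encoding cannot represent become numeric character references.
function serializeLikeLatin1Form(name, value) {
  let encoded = '';
  for (const ch of value) {
    const codePoint = ch.codePointAt(0);
    if (codePoint < 0x80) {
      encoded += encodeURIComponent(ch); // ASCII
    } else if (codePoint <= 0xff) {
      // Single Latin-1 byte, percent-encoded.
      encoded += '%' + codePoint.toString(16).toUpperCase().padStart(2, '0');
    } else {
      // Not representable: send "&#NNN;" instead (itself percent-encoded).
      encoded += encodeURIComponent('&#' + codePoint + ';');
    }
  }
  return encodeURIComponent(name) + '=' + encoded;
}

const viaScript = serializeAsUtf8('post_title', '\u039B');       // "Λ"
const viaForm = serializeLikeLatin1Form('post_title', '\u039B'); // "Λ"
```

Here `viaScript` carries the UTF-8 bytes `%CE%9B`, which a non-UTF-8 backend would misread, while `viaForm` carries `&#923;` in percent-encoded form, which survives as a character reference.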

 ----

 There's a lot of complexity in here. I think the existing behavior "works"
 because `esc_textarea()` and Core's `esc_` family attempt to prevent
 double-encoding. That is, they don't go `&` > `&amp;` > `&amp;amp;` >
 etc…
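
As a rough illustration of what "preventing double-encoding" means, here is a JavaScript sketch of the idea. It is not Core's PHP implementation. The names `escapeNaive` and `escapeOnce` and the regular expression are assumptions, and the pattern only checks that something looks like a character reference, not that the name is valid.

```javascript
// Naive escaping: every "&" is treated as text, so repeated calls stack up.
function escapeNaive(text) {
  return text.replace(/&/g, '&amp;');
}

// Matches "&" only when it does NOT start something shaped like a reference.
const BARE_AMPERSAND =
  /&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)/g;

// Escape-once: leaves existing references alone, so it is idempotent.
function escapeOnce(text) {
  return text.replace(BARE_AMPERSAND, '&amp;');
}

const raw = 'Pens & Pencils';
const naiveTwice = escapeNaive(escapeNaive(raw)); // references pile up
const onceTwice = escapeOnce(escapeOnce(raw));    // stable after first pass
```

The trade-off in this sketch is that `escapeOnce` cannot tell an already-encoded `&amp;` from a title whose text literally contains `&amp;`, which is one reason storing raw text and escaping late is easier to reason about.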

 I think we can fix this one display issue without breaking the database,
 even though we're changing it. Given how the current code works, I don't
 think it's possible to fix the display issue without changing the
 database. Fixing the way the post title is stored opens a can of worms
 and highlights other existing problems that are probably under-reported
 because relatively few non-UTF-8 sites have non-ASCII characters in their
 post titles. The existing behavior actually does force storing the
 encoded form of the post title, which only happens to display properly
 on the rendered page because of the behavior of `esc_`.

 Do you feel strongly one way or the other about this? I'd prefer the post
 titles display as they do on render.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/61833#comment:6>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

