[wp-trac] [WordPress Trac] #53910: `sanitize_title_with_dashes` returns partial encoded values in permalink

WordPress Trac noreply at wordpress.org
Tue Aug 10 22:09:08 UTC 2021


#53910: `sanitize_title_with_dashes` returns partial encoded values in permalink
--------------------------+--------------------------------------
 Reporter:  costdev       |       Owner:  (none)
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  Permalinks    |     Version:  5.8
 Severity:  major         |  Resolution:
 Keywords:                |     Focuses:  ui, rtl, administration
--------------------------+--------------------------------------
Description changed by SergeyBiryukov:

Old description:

> Picked up quite an old bug (circa 2006!) while working on
> [https://core.trac.wordpress.org/ticket/47912 #47912].
>
> `sanitize_title_with_dashes()` does a check to see if the title
> `seems_utf8()` and subsequently url encodes it. The call to
> `utf8_uri_encode()` has a `$length` argument of `200`.
>
> If an encoded value crosses the 200 boundary, the encoded value is cut
> and the remainder isn't picked up by any of the subsequent actions taken
> by `sanitize_title_with_dashes()`.
>
> >this-very-long-title-is-to-help-demonstrate-that-partial-encoded-values-
> remain-when-you-try-to-use-sanitize-title-with-dashes-on-encoded-strings-
> trimmed-to-200-chars-instead-of-using-max-and-strlen**%e2%80%af**
>
> becomes:
>
> >this-very-long-title-is-to-help-demonstrate-that-partial-encoded-values-
> remain-when-you-try-to-use-sanitize-title-with-dashes-on-encoded-strings-
> trimmed-to-200-chars-instead-of-using-max-and-strlen**%e2**
>
> I've resolved this issue by:
> - Storing the `seems_utf8()` value
> - Changing the call to `utf8_uri_encode()`, so that the `length` argument
> is the `max()` of `strlen( $title )` and `200`
> - Trimming the `$title` to `200` at the end of the
> `sanitize_title_with_dashes` instead.

New description:

 Picked up quite an old bug (circa 2006!) while working on #47912.

 `sanitize_title_with_dashes()` checks whether the title `seems_utf8()` and,
 if so, URL-encodes it. The call to `utf8_uri_encode()` passes a `$length`
 argument of `200`.

 If a percent-encoded value straddles the 200-character boundary, it is cut
 there, and the leftover fragment isn't picked up by any of the subsequent
 actions taken by `sanitize_title_with_dashes()`.

 >this-very-long-title-is-to-help-demonstrate-that-partial-encoded-values-
 remain-when-you-try-to-use-sanitize-title-with-dashes-on-encoded-strings-
 trimmed-to-200-chars-instead-of-using-max-and-strlen**%e2%80%af**

 becomes:

 >this-very-long-title-is-to-help-demonstrate-that-partial-encoded-values-
 remain-when-you-try-to-use-sanitize-title-with-dashes-on-encoded-strings-
 trimmed-to-200-chars-instead-of-using-max-and-strlen**%e2**
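
The cut in the example above can be reproduced with a plain length limit. This is a minimal Python sketch, not WordPress code: the 197-character prefix stands in for the slug text, and a simple slice stands in for the 200-character limit, which is assumed to behave like a plain cut when the input is already percent-encoded ASCII.

```python
# Illustration only; the names below are hypothetical, not WordPress APIs.
LIMIT = 200

# 197 ASCII characters stand in for the slug text in the example above,
# followed by an already percent-encoded three-octet sequence.
title = "a" * 197 + "%e2%80%af"

# A plain cut at LIMIT keeps only the first of the three encoded octets.
truncated = title[:LIMIT]

print(len(title))       # 206
print(truncated[-3:])   # %e2
```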

 I've resolved this issue by:
 - Storing the result of `seems_utf8()`
 - Changing the call to `utf8_uri_encode()` so that the `$length` argument
 is the `max()` of `strlen( $title )` and `200`
 - Trimming `$title` to `200` at the end of
 `sanitize_title_with_dashes()` instead.
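
Why moving the trim to the end helps can be sketched as follows. This is a hedged Python illustration, not the actual patch: `cleanup()` is a hypothetical stand-in for the later steps of `sanitize_title_with_dashes()`, and it assumes one of those steps turns the encoded sequence `%e2%80%af` into a dash.

```python
# Illustration only; every name here is hypothetical, not a WordPress API.
LIMIT = 200


def cleanup(slug: str) -> str:
    """Stand-in for the later sanitizing steps (assumed behavior)."""
    # Assumption: a full "%e2%80%af" is replaced by a dash, and leading
    # or trailing dashes are then stripped.
    return slug.replace("%e2%80%af", "-").strip("-")


def trim_first(title: str) -> str:
    """Cut to LIMIT before cleanup (the ordering described as buggy)."""
    return cleanup(title[:LIMIT])


def trim_last(title: str) -> str:
    """Clean the whole string, then cut to LIMIT (the proposed ordering)."""
    return cleanup(title)[:LIMIT]


title = "a" * 197 + "%e2%80%af"

print(trim_first(title)[-3:])  # %e2  (partial sequence survives)
print(len(trim_last(title)))   # 197  (sequence removed before the cut)
```

Trimming last can still cut through an encoded sequence that the cleanup steps do not handle; the sketch only covers the case from the example above.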

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/53910#comment:1>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

