[wp-trac] [WordPress Trac] #53910: `sanitize_title_with_dashes` returns partial encoded values in permalink
WordPress Trac
noreply at wordpress.org
Tue Aug 10 22:09:08 UTC 2021
#53910: `sanitize_title_with_dashes` returns partial encoded values in permalink
--------------------------+--------------------------------------
Reporter: costdev | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Permalinks | Version: 5.8
Severity: major | Resolution:
Keywords: | Focuses: ui, rtl, administration
--------------------------+--------------------------------------
Description changed by SergeyBiryukov:
New description:
Picked up quite an old bug (circa 2006!) while working on #47912.

`sanitize_title_with_dashes()` checks whether the title `seems_utf8()` and,
if so, URL-encodes it. The call to `utf8_uri_encode()` passes a `$length`
argument of `200`, so the encoded string is capped at 200 characters.

If an encoded value crosses the 200-character boundary, it is cut partway
through, and the leftover fragment isn't picked up by any of the subsequent
actions taken by `sanitize_title_with_dashes()`. For example:

>this-very-long-title-is-to-help-demonstrate-that-partial-encoded-values-remain-when-you-try-to-use-sanitize-title-with-dashes-on-encoded-strings-trimmed-to-200-chars-instead-of-using-max-and-strlen**%e2%80%af**

becomes:

>this-very-long-title-is-to-help-demonstrate-that-partial-encoded-values-remain-when-you-try-to-use-sanitize-title-with-dashes-on-encoded-strings-trimmed-to-200-chars-instead-of-using-max-and-strlen**%e2**
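
A minimal Python sketch of the ordering problem, offered only as a model of
the PHP behaviour. It assumes the 200-character cap can be treated as a plain
slice of an already percent-encoded slug; `encode()`, `cleanup()` and the
`CLEANUP` table are hypothetical stand-ins, not WordPress APIs. With the
206-character example above, the cap keeps only `%e2`, so a later replacement
of the full `%e2%80%af` sequence never matches:

```python
# Minimal model of the ordering problem; NOT a port of WordPress's PHP code.
# Assumptions (hypothetical, for illustration only):
#   * encode() models only the $length cap of utf8_uri_encode(): the input is
#     treated as an already percent-encoded slug and simply sliced.
#   * CLEANUP stands in for the replacement steps that run later in
#     sanitize_title_with_dashes(); the real replacement table differs.

MAX_LEN = 200
CLEANUP = {"%e2%80%af": "-"}  # hypothetical: rewrite this sequence as a dash

TITLE = (
    "this-very-long-title-is-to-help-demonstrate-that-partial-encoded-values-"
    "remain-when-you-try-to-use-sanitize-title-with-dashes-on-encoded-strings-"
    "trimmed-to-200-chars-instead-of-using-max-and-strlen%e2%80%af"
)


def encode(title: str, length: int) -> str:
    """Hypothetical stand-in for the $length cap: keep at most `length` chars."""
    return title[:length]


def cleanup(slug: str) -> str:
    """Apply each replacement; str.replace only matches a complete sequence."""
    for needle, replacement in CLEANUP.items():
        slug = slug.replace(needle, replacement)
    return slug


def sanitize_truncate_first(title: str) -> str:
    """Current order: cap at 200 while encoding, then clean up."""
    return cleanup(encode(title, MAX_LEN))


print(sanitize_truncate_first(TITLE)[-10:])  # prints "-strlen%e2"
```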

I've resolved this issue by:
- Storing the result of `seems_utf8()`
- Changing the call to `utf8_uri_encode()` so that the `$length` argument
  is the `max()` of `strlen( $title )` and `200`
- Trimming `$title` to `200` characters at the end of
  `sanitize_title_with_dashes()` instead
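
A Python sketch of the reordering described in the list above, again only a
model under hypothetical assumptions: `encode()` models just the `$length`
cap, `CLEANUP` is an invented stand-in for the later replacement steps, and
the stored `seems_utf8()` result is not modelled. Passing
`max(len(title), 200)` means the cap never cuts the input, so the cleanup sees
the complete `%e2%80%af` sequence before the final trim to 200 characters:

```python
# Minimal model of the reordered steps; NOT the actual patch.
# Assumptions (hypothetical, for illustration only):
#   * encode() models only the $length cap of utf8_uri_encode().
#   * CLEANUP stands in for the later replacement steps in
#     sanitize_title_with_dashes(); the real replacement table differs.

MAX_LEN = 200
CLEANUP = {"%e2%80%af": "-"}  # hypothetical: rewrite this sequence as a dash

TITLE = (
    "this-very-long-title-is-to-help-demonstrate-that-partial-encoded-values-"
    "remain-when-you-try-to-use-sanitize-title-with-dashes-on-encoded-strings-"
    "trimmed-to-200-chars-instead-of-using-max-and-strlen%e2%80%af"
)


def encode(title: str, length: int) -> str:
    """Hypothetical stand-in for the $length cap: keep at most `length` chars."""
    return title[:length]


def cleanup(slug: str) -> str:
    """Apply each replacement; str.replace only matches a complete sequence."""
    for needle, replacement in CLEANUP.items():
        slug = slug.replace(needle, replacement)
    return slug


def sanitize_trim_last(title: str) -> str:
    """Proposed order: encode without cutting, clean up, then trim to 200."""
    # Mirrors max( strlen( $title ), 200 ): the cap can no longer cut the input.
    encoded = encode(title, max(len(title), MAX_LEN))
    # Trim only after cleanup has seen every complete sequence.
    return cleanup(encoded)[:MAX_LEN]


print(sanitize_trim_last(TITLE)[-8:])  # prints "-strlen-"
```

In this sketch the final slice could still split a `%xx` sequence that the
cleanup step leaves in place; it only illustrates why running the cleanup
before trimming fixes the case in the report.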
--
Ticket URL: <https://core.trac.wordpress.org/ticket/53910#comment:1>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform