[wp-trac] [WordPress Trac] #53910: `sanitize_title_with_dashes` returns partial encoded values in permalink

WordPress Trac noreply at wordpress.org
Tue Aug 10 23:33:28 UTC 2021


#53910: `sanitize_title_with_dashes` returns partial encoded values in permalink
-------------------------------------+-------------------------------------
 Reporter:  costdev                  |       Owner:  SergeyBiryukov
     Type:  defect (bug)             |      Status:  reviewing
 Priority:  normal                   |   Milestone:  5.9
Component:  Permalinks               |     Version:  5.8
 Severity:  major                    |  Resolution:
 Keywords:  has-patch has-unit-      |     Focuses:  ui, rtl,
  tests dev-feedback                 |  administration
-------------------------------------+-------------------------------------

Comment (by costdev):

 Replying to [comment:6 audrasjb]:
 > Aside, I feel a bit hesitant to move it to the next minor milestone. I
 don't think it could break anything, but I'm not sure it could not be
 considered as a ''breaking change'' since it changes the behavior of the
 URL sanitization…

 Absolutely, it ''is'' a ''breaking change'' in certain circumstances.
 I've also now realised that this bug doesn't just occur with encoded
 substrings like `%e2%80%93`, but with any sanitized substring, including
 `©`, that straddles the 200 character boundary.
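
 As a minimal sketch of the mechanism (not the core implementation), the
 snippet below percent-encodes a short string and cuts it at a byte limit.
 `demo_encode_and_truncate()` and `demo_trim_partial_octets()` are
 hypothetical helpers written for this illustration, and the small limits
 stand in for the 200 character boundary.

```php
<?php
// Hypothetical helper, not WordPress core: percent-encode a string and cut it
// at a fixed byte limit, the way a naive length cap would.
function demo_encode_and_truncate( string $title, int $limit ): string {
    // rawurlencode() emits uppercase hex, so lowercase it to look like a slug.
    $encoded = strtolower( rawurlencode( $title ) );
    return substr( $encoded, 0, $limit );
}

// Hypothetical helper: remove whatever the cut left dangling at the end.
function demo_trim_partial_octets( string $slug ): string {
    // 1. A cut-off triplet such as "%" or "%e".
    $slug = preg_replace( '/%[0-9a-f]?\z/i', '', $slug );
    // 2. A UTF-8 lead byte that is missing some of its continuation bytes.
    $incomplete = '/(?:%[cd][0-9a-f]'
        . '|%e[0-9a-f](?:%[89ab][0-9a-f])?'
        . '|%f[0-4](?:%[89ab][0-9a-f]){0,2})\z/i';
    return preg_replace( $incomplete, '', $slug );
}

$title = "ab\u{2013}"; // "ab" followed by an en dash (bytes e2 80 93)

echo demo_encode_and_truncate( $title, 11 ), "\n"; // ab%e2%80%93 (nothing cut)
echo demo_encode_and_truncate( $title, 8 ), "\n";  // ab%e2%80 (partial character)
echo demo_encode_and_truncate( $title, 5 ), "\n";  // ab%e2 (partial character)

echo demo_trim_partial_octets( 'ab%e2%80' ), "\n";    // ab
echo demo_trim_partial_octets( 'ab%e2%80%93' ), "\n"; // ab%e2%80%93 (left intact)
```

 The trimming step only looks at the tail of the string, so complete
 sequences earlier in the slug are never touched.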

 ----

 Before applying the patch, create a post with this title:

 > This very long title is to help demonstrate that partial encoded values
 remain when you try to use sanitize title with dashes on encoded strings
 trimmed to 200 chars instead of using max and strlen©

 It will be sanitized to:

 > this-very-long-title-is-to-help-demonstrate-that-partial-encoded-values-
 remain-when-you-try-to-use-sanitize-title-with-dashes-on-encoded-strings-
 trimmed-to-200-chars-instead-of-using-max-and-strlenco

 In your theme:

 {{{
 <?php
     if ( have_posts() ) : while ( have_posts() ) : the_post();
         // Compare the stored slug with a freshly sanitized title.
         $stored = get_post_field( 'post_name', get_the_ID() );
         $fresh  = sanitize_title_with_dashes( get_the_title() );
         if ( $stored === $fresh ) {
             echo 'Yes';
         } else {
             echo 'No';
         }
     endwhile; endif;
 ?>
 }}}

 Prior to the patch, this should output 'Yes'. After the patch, the
 previously stored slug will still have, for example, `%e2` at the end,
 while the freshly sanitized title will not, so the comparison fails and
 the output is 'No'.
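
 To illustrate why the strict comparison flips, here is a rough sketch that
 flags slugs ending in a partial encoded value. `demo_has_partial_octets()`
 is a hypothetical helper, not a WordPress API, and both example slugs are
 assumed values rather than real output from the function.

```php
<?php
// Hypothetical helper, not a WordPress API: true when a slug ends in a cut-off
// percent triplet or in a UTF-8 lead byte that lacks its continuation bytes.
function demo_has_partial_octets( string $slug ): bool {
    $pattern = '/(?:%[0-9a-f]?'
        . '|%[cd][0-9a-f]'
        . '|%e[0-9a-f](?:%[89ab][0-9a-f])?'
        . '|%f[0-4](?:%[89ab][0-9a-f]){0,2})\z/i';
    return 1 === preg_match( $pattern, $slug );
}

$stored = 'strlen%e2'; // assumed: a slug saved before the patch
$fresh  = 'strlen';    // assumed: what a patched sanitizer might return

var_dump( $stored === $fresh );                 // bool(false)
var_dump( demo_has_partial_octets( $stored ) ); // bool(true)
var_dump( demo_has_partial_octets( $fresh ) );  // bool(false)
```

 A check along these lines could help find stored values that would no
 longer match after the fix, though how to handle them is a separate
 decision.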

 ----

 While this is a very specific edge case (an old post whose title is over
 200 characters long and happens to have a sanitized substring that crosses
 the 200 character boundary), fixing `sanitize_title_with_dashes()` is
 still a ''breaking change''.

 Throw into the mix that `sanitize_title_with_dashes()` can be, and is,
 used to sanitize other strings that may be stored in the database and used
 later in comparative operations, and this is a ''breaking change'' with
 some potential reach.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/53910#comment:8>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

