[wp-trac] [WordPress Trac] #48106: Revisit post GUID sanitization on `&`

WordPress Trac noreply at wordpress.org
Mon Sep 23 08:26:37 UTC 2019


#48106: Revisit post GUID sanitization on `&`
--------------------------+-----------------------------
 Reporter:  zzxiang       |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  Post Formats  |    Version:  5.2.3
 Severity:  normal        |   Keywords:
  Focuses:                |
--------------------------+-----------------------------
 === The source code of core which needs to be revisit

 When a new post/attachment is inserted into the database, the post GUID is
 sanitized and the `&` character in GUID is converted to `&`.

 More specifically, in `wp-includes/default-filters.php`, function
 `wp_filter_kses` is added as a default `pre_post_guid` filter.

 {{{#!php
 // Save URL
 foreach ( array( 'pre_comment_author_url', 'pre_user_url', 'pre_link_url',
 'pre_link_image',
         'pre_link_rss', 'pre_post_guid' ) as $filter ) {
         add_filter( $filter, 'wp_strip_all_tags' );
         add_filter( $filter, 'esc_url_raw'       );
         add_filter( $filter, 'wp_filter_kses'    );
 }
 }}}

 Before a post GUID is saved, function `wp_filter_kses` in `wp-
 includes/kses.php` is called, and eventually function
 `wp_kses_normalize_entities` does the real conversion, so that `&` is
 converted to `&`.

 === The problem it causes

 The plugin External Media without Import (https://wordpress.org/plugins
 /external-media-without-import/) inserts external image URLs as post GUIDs
 into database so that users can add external images into their media
 libraries without actually uploading the image files to their WordPress
 servers. If the image URL contains `&`, such as

 https://pbs.twimg.com/media/D_NKa3yWkAYwZwn?name=900x900&format=png

 it is converted to

 https://pbs.twimg.com/media/D_NKa3yWkAYwZwn?name=900x900&format=png

 The result is that the image is not correctly displayed in some places,
 such as the media library page of the admin dashboard.

 There're also other plugins, such as Imposer
 (https://github.com/dirtsimple/imposer/) and Postmark
 (https://github.com/dirtsimple/postmark/), encountering the same issue.
 Imposer fixes the issue by forcing to save post GUIDs again with the
 unsanitized version. I think it is equivalent to removing `wp_filter_kses`
 from the default `pre_post_guid` filters.

 === The reason of post GUID sanitization

 Post GUID sanitization was added with a commit in 2011:
 https://github.com/WordPress/WordPress/commit/81a5f821fbfb63be6c5517d033b8e7a0a4172f07.
 The commit log message does not state why post GUIDs need to be sanitized
 on save and display. Also, the commit is so long time ago that seems that
 even the members of the core channel of WordPress Slack group can't tell
 the reason.

 At first it was thought that it is because when exporting RSS feeds, `&`
 needs to be converted due to XML specification. But I did some experiments
 and inspected the core source code, and found that in fact WordPress core
 does convert `&` to `&` while exporting RSS2 feed, even if I changed
 the `&` back to `&` in the database via MySQL client. The convertion
 is done by function `wptexturize` in `wp-includes/formatting.php`. The
 function is added as a default `the_content` filter.

 So I really don't understand why post GUIDs should be sanitized,
 especially for the `&` issue. This might be a core issue rather than a
 plugin issue. It might be fine to not add `wp_filter_kses` as a default
 `pre_post_guid`, i.e. not do the post GUID sanitization.

 This issue has also been discussed here: https://github.com/zzxiang
 /external-media-without-import/issues/17

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/48106>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list