[wp-trac] [WordPress Trac] #33121: wp_kses_attr_check fails to process html data-* attributes

WordPress Trac noreply at wordpress.org
Thu Oct 11 22:55:20 UTC 2018


#33121: wp_kses_attr_check fails to process html data-* attributes
--------------------------------------+-----------------------
 Reporter:  isoftware                 |       Owner:  (none)
     Type:  defect (bug)              |      Status:  reopened
 Priority:  normal                    |   Milestone:  5.0
Component:  Editor                    |     Version:  4.2.3
 Severity:  major                     |  Resolution:
 Keywords:  has-patch has-unit-tests  |     Focuses:
--------------------------------------+-----------------------

Comment (by peterwilsoncc):

 tl;dr in [attachment:"33121.4.diff"]:

 * Test added to ensure the prefix is followed by a hyphen
 * Test added to ensure attributes with multiple hyphens are allowed
 * Regex altered to require one or more instances of `(-[a-z0-9_]+)`, i.e.
 `'/^' . preg_quote( $prefix ) . '(-[a-z0-9_]+)+$/'`
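
 As a rough illustration of the pattern above (a standalone sketch, not
 code from the patch; the helper name `matches_prefixed_attr()` is
 hypothetical, and passing `'/'` to `preg_quote()` is my assumption so the
 delimiter is escaped too):

```php
<?php
// Hypothetical helper, for illustration only: returns true when $name is
// $prefix followed by one or more groups of "-" plus [a-z0-9_] characters.
function matches_prefixed_attr( $prefix, $name ) {
	$pattern = '/^' . preg_quote( $prefix, '/' ) . '(-[a-z0-9_]+)+$/';
	return 1 === preg_match( $pattern, strtolower( $name ) );
}

var_dump( matches_prefixed_attr( 'data', 'data-id' ) );       // bool(true)
var_dump( matches_prefixed_attr( 'data', 'data-wp-id' ) );    // bool(true)
var_dump( matches_prefixed_attr( 'data', 'data' ) );          // bool(false)
var_dump( matches_prefixed_attr( 'data', 'datafoo' ) );       // bool(false)
var_dump( matches_prefixed_attr( 'data', 'data--invalid' ) ); // bool(false)
var_dump( matches_prefixed_attr( 'data', 'data-invalid-' ) ); // bool(false)
```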

 Replying to [comment:16 azaozz]:
 > @peterwilsoncc thanks for adding the test :)

 And thanks for the review.

 > Looking at `data--invaild="gone"` and `data-also-invaild-="gone"`, it
 seems having two hyphens or a hyphen as last char of the data-* attribute
 name is valid per
 https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-*
 and https://www.w3.org/TR/REC-xml/#NT-Name. Also seems quite a few chars
 are valid there, but still thinking we should only support a-z0-9_-.

 This is true, but it does some strange things to the `element.dataset`
 property available in JavaScript, so I decided to prevent it. I've created
 a bin with an example:
 https://jsbin.com/muloxeq/edit?html,js,console,output
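
 For context, the HTML specification derives each `dataset` key by
 dropping the `data-` prefix and replacing every hyphen that is followed by
 a lowercase ASCII letter with that letter in upper case. The sketch below
 is a rough PHP port of that rule, not the contents of the bin; the
 function name `dataset_key()` is hypothetical. It hints at why the two
 attribute names in question produce awkward keys:

```php
<?php
// Hypothetical helper mirroring the HTML spec's data-* to dataset name
// conversion. Browsers do this natively; PHP is used here only to illustrate.
function dataset_key( $attr_name ) {
	// Drop the leading "data-".
	$name = substr( $attr_name, strlen( 'data-' ) );

	// Remove each "-" that precedes a lowercase ASCII letter and
	// upper-case that letter. Other hyphens are left untouched.
	return preg_replace_callback(
		'/-([a-z])/',
		function ( $m ) {
			return strtoupper( $m[1] );
		},
		$name
	);
}

echo dataset_key( 'data-wp-id' ), "\n";         // wpId
echo dataset_key( 'data--invaild' ), "\n";      // Invaild
echo dataset_key( 'data-also-invaild-' ), "\n"; // alsoInvaild-
```

 Under that rule a doubled hyphen yields a capitalised key and a trailing
 hyphen survives into the key, so the latter can only be read with bracket
 notation in JavaScript.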

 I'm happy to change this if needs be.

 > The TL;DR: don't think allowing wildcard attributes in KSES is a good
 thing. It brings us to a pretty dangerous place and at the same time
 reduces some of the existing functionality: sanitizing attribute values.

 I'm not sure this is the case if we require a hyphen after any prefix, as
 a developer won't be able to add `href-*` and bypass the checking of
 `href` itself. The regex change I mention below hardens against this.

 I may also be misunderstanding something, so please let me know if that's
 the case.

 > That would mean --somebody-- can add `on-*` or even `o-*` and allow all
 `onerror`, `onclick`, `onmouseover`, etc. attributes.

 This isn't the case, as the regex group `(-[a-z0-9_]+)` requires a hyphen
 before any further characters.

 As this is important to block, I've added a test in
 [attachment:"33121.4.diff"] to guard against it.
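
 A quick standalone check of that claim (illustrative only, not the test
 from the patch; the `on` prefix and the attribute names are just sample
 inputs, and the `'/'` argument to `preg_quote()` is my addition):

```php
<?php
// Illustration only: even if someone registered "on" as a wildcard prefix,
// the required hyphen keeps real event-handler attributes from matching.
$pattern = '/^' . preg_quote( 'on', '/' ) . '(-[a-z0-9_]+)+$/';

foreach ( array( 'onclick', 'onerror', 'onmouseover', 'on-click' ) as $name ) {
	// Prints the attribute name and whether the pattern accepts it.
	printf( "%-12s %s\n", $name, preg_match( $pattern, $name ) ? 'match' : 'no match' );
}
```

 Only `on-click`, which is not an event handler attribute, matches; the
 three real handler names do not.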

 > Also `preg_match( '/^' . preg_quote( $prefix ) . '(-[a-z0-9_]+)*$/',
 $name_low )` would mean we don't allow attribute names containing two
 hyphens like `data-wp-id` (which is somewhat common).

 This is also incorrect; I've added such an example as a test in
 [attachment:"33121.4.diff"].

 However, the zero-or-more quantifier (`*`) did allow users to add
 `data="something"`, so I've changed it in the latest patch to one or
 more (`+`).
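
 The difference between the two quantifiers can be seen in a small
 standalone comparison (a sketch, not code from the patch; the variable
 names are hypothetical):

```php
<?php
// Illustration only: compare the earlier zero-or-more pattern with the
// one-or-more pattern described for 33121.4.diff.
$zero_or_more = '/^' . preg_quote( 'data', '/' ) . '(-[a-z0-9_]+)*$/';
$one_or_more  = '/^' . preg_quote( 'data', '/' ) . '(-[a-z0-9_]+)+$/';

// Both patterns accept a name with two hyphens.
var_dump( preg_match( $zero_or_more, 'data-wp-id' ) ); // int(1)
var_dump( preg_match( $one_or_more, 'data-wp-id' ) );  // int(1)

// Only the zero-or-more pattern accepts the bare prefix.
var_dump( preg_match( $zero_or_more, 'data' ) );       // int(1)
var_dump( preg_match( $one_or_more, 'data' ) );        // int(0)
```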

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/33121#comment:17>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform