[wp-trac] [WordPress Trac] #31645: Press This: Reject relative URLs when scraping source html

WordPress Trac noreply at wordpress.org
Tue Mar 17 19:35:13 UTC 2015


#31645: Press This: Reject relative URLs when scraping source html
--------------------------+--------------------
 Reporter:  kraftbj       |       Owner:
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  4.2
Component:  Press This    |     Version:  trunk
 Severity:  normal        |  Resolution:
 Keywords:  has-patch     |     Focuses:
--------------------------+--------------------

Comment (by azaozz):

 Replying to [comment:10 stephdau]:
 > [attachment:31645.6.patch] will not work as is because `$url` goes
 > through `esc_url_raw()` before being tested, which prepends `http://` to
 > whatever value is passed to it... So `123.html` becomes
 > `http://123.html/`.

 Right, we have to run that test before `esc_url_raw()` as it will prepend
 `http://` to some relative URLs.

 Looking at 31645.7.patch:
 - `'/^[\/]{1}[^\/]+/'` is exactly the same as `'%^/[^/]+%'`, and for our
 purposes equivalent to `$url{0} === '/' && $url{1} !== '/'` (they differ
 only for a bare `/`), with the last one being much faster than any PCRE
 function.
 - This `'/^[\/]{2}[^\/]+/'` matches protocol-relative URLs, then we
 prepend the current protocol to them. Not sure this is desirable. URLs
 starting with `//` are the best choice for links, embeds, images, etc. (as
 long as the server supports both http and https). They will never trigger
 "Mixed/Insecure content" warnings. If the page is telling us to use these,
 we should :)
 - At the end we just return the whole URL passed by the user? So if there
 is an image src `../../assets/images/test.gif` we will replace the src
 with the page's URL. We should be rejecting non-root relative URLs.
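
 To make the cases above concrete, here is a minimal sketch of the
 classification, run on the raw value before any escaping.
 `pt_classify_url()` is a hypothetical helper, not something in the patches
 on this ticket or in core. It uses `$url[0]` rather than `$url{0}` since
 the curly-brace offset syntax is gone in newer PHP versions.

```php
<?php
/**
 * Hypothetical helper: classify a scraped URL by its leading characters.
 * Meant to run on the raw value, before esc_url_raw() can prepend "http://".
 */
function pt_classify_url( $url ) {
	$url = trim( (string) $url );

	if ( '' === $url ) {
		return 'empty';
	}

	// "//example.com/a.png": protocol-relative, can be kept as-is.
	if ( 0 === strncmp( $url, '//', 2 ) ) {
		return 'protocol-relative';
	}

	// "/a.png": root-relative, only needs the page's scheme and host.
	if ( '/' === $url[0] ) {
		return 'root-relative';
	}

	// "http://..." or "https://...": already absolute.
	if ( preg_match( '#^https?://#i', $url ) ) {
		return 'absolute';
	}

	// Everything else ("123.html", "../../a.gif") is a non-root relative URL.
	return 'relative';
}
```

 With this, `123.html` and `../../assets/images/test.gif` both come back as
 `relative`, so they can be rejected (or resolved) instead of being turned
 into `http://123.html/`.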

 We can try to extrapolate the absolute URL from the page's URL and a
 relative image src. We can attempt that or discard relative image sources.
 Either way, we shouldn't return a wrong src.
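
 A rough sketch of what extrapolating could look like, using only
 `parse_url()` and string functions. `pt_resolve_src()` is a hypothetical
 name, and this is deliberately simplified: it ignores `<base href>` and
 does not treat query strings or fragments in the src specially. It returns
 `false` when it cannot build a URL it trusts, so the caller can discard the
 image.

```php
<?php
/**
 * Hypothetical helper: resolve an image src against the URL of the page
 * it was scraped from. Returns the resolved URL, or false to discard.
 */
function pt_resolve_src( $src, $page_url ) {
	$src = trim( (string) $src );

	if ( '' === $src ) {
		return false;
	}

	// Absolute and protocol-relative URLs are left untouched.
	if ( preg_match( '#^https?://#i', $src ) || 0 === strncmp( $src, '//', 2 ) ) {
		return $src;
	}

	// Any other scheme (data:, javascript:, ...) is discarded.
	if ( preg_match( '#^[a-z][a-z0-9+.\-]*:#i', $src ) ) {
		return false;
	}

	$base = parse_url( $page_url );

	if ( empty( $base['scheme'] ) || empty( $base['host'] ) ) {
		return false;
	}

	$origin = $base['scheme'] . '://' . $base['host'];

	if ( isset( $base['port'] ) ) {
		$origin .= ':' . $base['port'];
	}

	// Root-relative: only the origin is needed.
	if ( '/' === $src[0] ) {
		return $origin . $src;
	}

	// Non-root relative: start from the directory of the page's path.
	$path = isset( $base['path'] ) ? $base['path'] : '/';
	$dir  = substr( $path, 0, strrpos( $path, '/' ) + 1 );

	// Collapse "." and ".." segments.
	$segments = array();

	foreach ( explode( '/', $dir . $src ) as $segment ) {
		if ( '' === $segment || '.' === $segment ) {
			continue;
		}

		if ( '..' === $segment ) {
			array_pop( $segments );
			continue;
		}

		$segments[] = $segment;
	}

	return $origin . '/' . implode( '/', $segments );
}
```

 For a page at `http://example.com/blog/2015/03/post.html`, the src
 `../../assets/images/test.gif` would resolve to
 `http://example.com/blog/assets/images/test.gif` rather than to the page's
 own URL.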

--
Ticket URL: <https://core.trac.wordpress.org/ticket/31645#comment:13>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

