[wp-trac] [WordPress Trac] #14347: URLs are not handled properly

WordPress Trac wp-trac at lists.automattic.com
Sun Jul 18 19:00:45 UTC 2010


#14347: URLs are not handled properly
--------------------------+-------------------------------------------------
 Reporter:  hakre         |       Owner:                 
     Type:  defect (bug)  |      Status:  new            
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  General       |     Version:                 
 Severity:  normal        |    Keywords:                 
--------------------------+-------------------------------------------------
 While digging into #14201, #14292 and similar tickets, I noticed that
 WordPress does not normalize URL input properly. This can lead to 404
 responses for content that is actually available, contrary to HTTP as
 specified in RFC 2616.

 Example run against current trunk to illustrate the issue:

 {{{
 # curl -I http://webroot.loc/wordpress/tag/%e4%b8%80%e6%a0%b7

 HTTP/1.1 200 OK
 Date: Sun, 18 Jul 2010 18:53:02 GMT
 Server: Apache
 X-Pingback: http://webroot.loc/wordpress/xmlrpc.php
 Content-Type: text/html; charset=UTF-8
 }}}

 Making the ''same'' request with an alternative spelling of the URL leads
 to a 404. Note that the "a" in "tag" has been percent-encoded here as %41
 (strictly, %41 decodes to an uppercase "A"; a lowercase "a" is %61):

 {{{
 # curl -I http://webroot.loc/wordpress/t%41g/%e4%b8%80%e6%a0%b7
 HTTP/1.1 404 Not Found
 Date: Sun, 18 Jul 2010 18:54:32 GMT
 Server: Apache
 Cache-Control: no-cache, must-revalidate, max-age=0
 Expires: Wed, 11 Jan 1984 05:00:00 GMT
 Pragma: no-cache
 X-Pingback: http://webroot.loc/wordpress/xmlrpc.php
 Last-Modified: Sun, 18 Jul 2010 18:54:33 GMT
 Content-Type: text/html; charset=UTF-8
 }}}

 RFC 2616 clearly addresses this in its section on URI comparison (3.2.3):

 >    Characters other than those in the "reserved" and "unsafe" sets (see
 >   RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.

 These so-called character triplets are written in uppercase by PHP's
 urlencode() and rawurlencode() functions, but mostly in lowercase inside
 WordPress (e.g. in slug generation). The hex digits can be written in
 either case, or even mixed. The RFCs introduce them in uppercase first,
 but all variants are valid, even {{{%dD}}}.

 The web application should handle both URLs the same way.
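
 To illustrate what "handling both URLs the same" could mean, here is a
 minimal sketch of a path normalizer, written in Python for brevity rather
 than in WordPress's PHP. It is not WordPress code. The function name
 normalize_path is hypothetical, and the unreserved set is taken from RFC
 3986, which is an assumption since the quote above refers to the older
 RFC 2396 sets. The sketch decodes triplets that stand for unreserved
 characters and uppercases the hex digits of all others, so equivalent
 spellings compare equal. The second URL uses %61 (lowercase "a") rather
 than the %41 from the curl run.

```python
import re

# Unreserved characters per RFC 3986 (assumption: the ticket itself cites
# the older RFC 2396 sets). These are safe to decode from their %XX form.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Matches one percent-encoded triplet, hex digits in either case.
_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_path(path: str) -> str:
    """Return a canonical spelling of a percent-encoded URL path.

    Hypothetical helper for illustration only.
    """

    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            # e.g. %61 -> "a": the encoded and literal forms are equivalent
            return char
        # Keep the triplet encoded, but use uppercase hex digits
        return "%" + match.group(1).upper()

    return _TRIPLET.sub(fix, path)


a = normalize_path("/wordpress/tag/%e4%b8%80%e6%a0%b7")
b = normalize_path("/wordpress/t%61g/%e4%b8%80%e6%a0%b7")
print(a)       # /wordpress/tag/%E4%B8%80%E6%A0%B7
print(a == b)  # True
```

 Comparing normalized paths, instead of the raw request string, makes the
 lookup independent of how the client chose to spell the URL. Whether
 slugs are then stored with uppercase or lowercase triplets is a separate
 design choice, as long as both sides of the comparison use the same form.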

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/14347>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

