[wp-trac] [WordPress Trac] #22530: garbage query strings on URLs are not sanitized or removed
WordPress Trac
noreply at wordpress.org
Wed Nov 21 15:52:50 UTC 2012
#22530: garbage query strings on URLs are not sanitized or removed
-----------------------------+--------------------------
Reporter: rawalex | Type: defect (bug)
Status: new | Priority: normal
Milestone: Awaiting Review | Component: General
Version: 3.4.2 | Severity: critical
Keywords: needs-patch |
-----------------------------+--------------------------
Here is an interesting problem I ran into: a bug / feature that appears to
be used by malicious people to make Google see your site as full of
duplicate content.
If you visit a WordPress site and add a garbage query string to the end
of the URL, that garbage gets carried forward. Example:
yourblog.here/page/2?ssdlfkjsdlkfjsdfs
When you scroll down, the "previous" and "next" links will automatically
carry that query string forward.
Normally, this would not be a big issue. However, some people appear
intent on deliberately creating these sorts of links to WordPress sites,
and Googlebot is finding those links on remote sites. Those links are
followed, and the "previous" and "next" links then perpetuate the problem
through every page on the site. If you have 1000 posts, at 10 per page,
Google has just indexed 100 duplicate content pages.
So the bug is the following:
Passed query strings need to be sanitized and junk removed - there is no
reason to pass it on. When a junk query string is passed, there should
be an HTTP 301 or 302 reply, and the user / bot should be redirected to
the proper page without the query string.
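
To illustrate the behaviour requested above, here is a minimal,
language-agnostic sketch written in Python. It is not WordPress code. The
allow-list contents ("s", "paged") and the function name
canonical_redirect are assumptions made for this example only.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical allow-list of query parameters the site understands.
# "s" (search) and "paged" are assumptions for this sketch.
ALLOWED_PARAMS = {"s", "paged"}


def canonical_redirect(url):
    """Return (status, location) if the URL carries junk parameters, else None."""
    parts = urlsplit(url)
    # keep_blank_values=True so a bare "?garbage" shows up as ("garbage", "").
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key in ALLOWED_PARAMS]
    if kept == pairs:
        # Nothing was dropped, so no redirect is needed.
        return None
    clean = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
    # 301 tells crawlers the clean URL is the permanent address.
    return 301, clean
```

With this sketch, the example URL from this ticket would be redirected to
http://yourblog.here/page/2, while a URL carrying only an allowed
parameter would be served as-is.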
Further, query strings should not be perpetuated forward through the
"previous - next" links on the pages unless they are relevant to that page
change. As an example, a valid search string might be worth moving
forward with. Other passed items may not be worth carrying forward.
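
The link-building side could look something like the following sketch,
again in Python rather than actual WordPress code. The CARRY_FORWARD set,
the pagination_link helper and the /page/N URL layout are assumptions
based on the example URL in this ticket.

```python
from urllib.parse import urlencode

# Hypothetical set of parameters worth carrying across pages.
# Only the search term "s" is kept in this sketch.
CARRY_FORWARD = {"s"}


def pagination_link(base, page, current_params):
    """Build a previous/next link that keeps only the relevant parameters."""
    kept = {key: value for key, value in current_params.items()
            if key in CARRY_FORWARD}
    url = f"{base}/page/{page}"
    if kept:
        url += "?" + urlencode(kept)
    return url
```

Here a garbage parameter on the current request is dropped from the
generated link, while a search term is preserved.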
Potentially, any unsanitized input accepted in a query is a vector for
other attacks. Having that query carry forward is a real issue. As an
example, full "select * from" queries are neither rejected nor dealt
with; they are simply perpetuated forward. No, they are not currently
causing anything to happen, but a failure to sanitize these inputs
suggests a vector for a future attack, such as an input overflow or
similar.
--
Ticket URL: <http://core.trac.wordpress.org/ticket/22530>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software