[wp-trac] [WordPress Trac] #16892: make_clickable segfault

WordPress Trac wp-trac at lists.automattic.com
Sun May 8 21:44:49 UTC 2011


#16892: make_clickable segfault
-------------------------------------------------+------------------------------
 Reporter:  westi                                |       Owner:
     Type:  defect (bug)                         |      Status:  reopened
 Priority:  normal                               |   Milestone:  Awaiting Review
Component:  Formatting                           |     Version:  3.1
 Severity:  normal                               |  Resolution:
 Keywords:  has-patch reporter-feedback          |
            needs-testing                        |
-------------------------------------------------+------------------------------
Changes (by mdawaffe):

 * keywords:  has-patch reporter-feedback => has-patch reporter-feedback
     needs-testing
 * status:  closed => reopened
 * resolution:  fixed =>
 * milestone:  3.1.1 => Awaiting Review


Comment:

 With the current code, an input of sufficient length can still cause
 segfaults.  Raising the input length needed to trigger a segfault only
 moves the threshold; it doesn't solve the problem.

 The attached patch:
  1. Fixes the ~2000 character limit for auto linking URLs.
  2. Improves the efficiency of the regex.
  3. Simplifies the regex.
  4. Passes all tests in {{{php wp-test.php -t TestMakeClickable}}}.
  5. Is 10-20% faster on "typical" inputs with no or sparse links.
  6. Is 10-20% faster on "atypical" inputs with very dense links.
  7. Is ~80% faster on malicious inputs that the current code can handle.
  8. Does not segfault on malicious inputs on which the current code
     segfaults.

 The ~2000 character limit is imposed in two parts.  First, if the input is
 larger than 10000 characters (~1500 English words), the input is broken up
 into chunks by splitting it at whitespace characters.  Chunks that can't
 be split (i.e. chunks with no whitespace) are skipped so that the regex
 doesn't have to process them.  Second, the URL's total length is limited
 by the regex.
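
 The two-part limit can be sketched roughly as follows.  This is a Python
 illustration of the idea only, not the attached PHP patch: the function
 names, the simplified URL pattern, the 2100-character chunk cutoff, and
 the choice to leave over-long URLs unlinked (rather than truncate them)
 are all assumptions made for the example.  Only the 10000-character
 threshold and the ~2000-character URL cap come from the description above.

```python
import re

MAX_INPUT = 10000  # from the comment: inputs longer than this get chunked
MAX_CHUNK = 2100   # assumption: largest whitespace-free chunk still scanned
MAX_URL = 2000     # approximate URL length cap mentioned in the comment

# Hypothetical, much-simplified URL pattern (not the WordPress regex).
# The bounded quantifier caps the match length; the trailing negative
# lookahead rejects URLs that would continue past the cap.
URL_CHARS = r'[^\s<>"\']'
URL_RE = re.compile(r'https?://%s{1,%d}(?!%s)' % (URL_CHARS, MAX_URL, URL_CHARS))


def _link(text):
    # Wrap each matched URL in an anchor tag (no HTML escaping in this sketch).
    return URL_RE.sub(r'<a href="\g<0>">\g<0></a>', text)


def make_clickable_sketch(text):
    # Part one: small inputs go straight to the regex.
    if len(text) <= MAX_INPUT:
        return _link(text)
    out = []
    # The capturing group keeps the whitespace separators, so joining the
    # pieces reproduces the input exactly apart from the added anchors.
    for piece in re.split(r'(\s+)', text):
        if len(piece) > MAX_CHUNK:
            out.append(piece)  # cannot be split further: skip the regex
        else:
            out.append(_link(piece))
    return ''.join(out)
```

 With these assumptions, a 5000-character run with no whitespace inside a
 large input is passed through untouched, so the regex never sees it, while
 ordinary URLs elsewhere in the same input are still linked.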

 I'm sure there are some edge cases that this patch treats differently than
 the current code.  I'm not sure if those edge cases come up "naturally".
 If they do, I'm not sure if they have well-defined expected behaviors.

 Needs testing, especially with non-ASCII and multibyte inputs.

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/16892#comment:21>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software