[wp-trac] [WordPress Trac] #17267: twentyeleven_url_grabber() needs work

Thu Apr 28 10:28:50 UTC 2011

#17267: twentyeleven_url_grabber() needs work
--------------------------+-------------------------
 Reporter:  nacin         |      Owner:
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  3.2
Component:  Themes        |    Version:  3.2
 Severity:  normal        |   Keywords:  needs-patch
--------------------------+-------------------------
 {{{
 /**
  * Grab the first URL from a Link post
  */
 function twentyeleven_url_grabber() {
         global $post, $posts;

         $first_url = '';

         ob_start();
         ob_end_clean();

         $output = preg_match_all('/<a.+href=[\'"]([^\'"]+)[\'"].*>/i',
 $post->post_content, $matches);

         $first_url = $matches [1] [0];

         if ( empty( $first_url ) )
                 return false;

         return $first_url;
 }
 }}}

 The regex needs some work:
  - `a.+href` would match `area href`. While that isn't much of a concern,
 it would also skip to the second link as it's greedy.
  - Likewise we should probably use a non-greedy `(.+?)` rather than
 `([\'"]+)` for what we're capturing.
  - We probably want the `s` pattern modifier there, in case there's an
 `\n` somewhere such as after `<a ` and before `href`.
  - No need for the `.*>` at the end, just grab the href and bail.

 Something like this might work: `'/<a\s[^>]+href=[\'"](.*?)[\'"]/is`.

 Also:

 No need for the `$posts` global to be called upon.

 Rather than preg_match_all, we should just use preg_match, since we want
 the first one. If `! preg_match`, then we should return false. No need to
 assign the result to `$output`.

 We should arguably use get_the_content() since this is being called in
 within a loop, rather than the raw, unfiltered data.

 Anyone know what the output buffering was designed to do?

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/17267>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software