[wp-trac] [WordPress Trac] #21914: Improve pingback page block parsing

WordPress Trac wp-trac at lists.automattic.com
Mon Sep 17 23:12:44 UTC 2012


#21914: Improve pingback page block parsing
-----------------------------+------------------------------
 Reporter:  Otto42           |       Type:  enhancement
   Status:  new              |   Priority:  normal
Milestone:  Awaiting Review  |  Component:  Pings/Trackbacks
  Version:  3.4.2            |   Severity:  normal
 Keywords:  has-patch        |
-----------------------------+------------------------------
 The code in pingback_ping that reads the remote page and parses it into
 a suitable comment for the pingback needs some improvement in how it
 splits the page into paragraphs.

 This line of code converts block-level tags into paragraph separators;
 the text is later split on those separators and each piece is handled
 independently:

 {{{
 $linea = preg_replace(
 "/ <(h1|h2|h3|h4|h5|h6|p|th|td|li|dt|dd|pre|caption|input|textarea|button|body)[^>]*>/",
 "\n\n", $linea );
 }}}

 The problem is that it only matches opening tags that are preceded by a
 space, and it doesn't take closing tags into consideration at all.

 This minor improvement produces better results, I think.

 {{{
 $linea = preg_replace(
 "/<\/*(h1|h2|h3|h4|h5|h6|p|th|td|li|dt|dd|pre|caption|input|textarea|button|body)[^>]*>/",
 "\n\n", $linea );
 }}}

 The change eliminates the required space in front of the < and also
 allows closing tags to mark the separation of paragraphs.
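
 To illustrate the difference, here is a rough sketch that counts how many
 tags each pattern matches. The tag list is copied from above; the sample
 string and the variable names ($tags, $old, $new) are made up for the
 example and are not part of the patch.

```php
<?php
// Tag list copied from the ticket; variable names are illustrative only.
$tags = 'h1|h2|h3|h4|h5|h6|p|th|td|li|dt|dd|pre|caption|input|textarea|button|body';
$old  = "/ <($tags)[^>]*>/";    // needs a space before "<", opening tags only
$new  = "/<\/*($tags)[^>]*>/";  // no space needed, closing tags match too

// Hypothetical page fragment with whitespace already collapsed to single spaces.
$linea = '<body><p>First paragraph.</p> <p>Second paragraph.</p></body>';

$old_count = preg_match_all( $old, $linea, $unused );
$new_count = preg_match_all( $new, $linea, $unused );

echo "old pattern matches: $old_count\n"; // 1 (only the second <p>, which follows a space)
echo "new pattern matches: $new_count\n"; // 6 (<body>, <p>, </p>, <p>, </p>, </body>)
```

 With the old pattern only the second <p> becomes a separator, because it
 is the only listed tag that happens to follow a space.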

 This tends to give much cleaner results, especially when the last
 paragraph in the content is the one that contains the pingback link
 (pretty commonplace). Because closing tags now separate the paragraphs,
 anything added after them (using the_content filters) no longer gets
 merged into that last paragraph. Almost all content has a </p> at the
 end of it, and filters that add to the_content area usually use divs or
 spans, not p tags.
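
 A simplified sketch of that last-paragraph case follows. It assumes the
 surrounding code strips every tag except <a> and splits on blank lines;
 the helper block_with_link(), the sample markup and the example.com URL
 are hypothetical and only stand in for the real pingback handling.

```php
<?php
$tags = 'h1|h2|h3|h4|h5|h6|p|th|td|li|dt|dd|pre|caption|input|textarea|button|body';
$old  = "/ <($tags)[^>]*>/";    // current pattern
$new  = "/<\/*($tags)[^>]*>/";  // proposed pattern

// Hypothetical helper: return the first text block that contains $url.
function block_with_link( $pattern, $html, $url ) {
    $text = preg_replace( $pattern, "\n\n", $html ); // tags become separators
    $text = strip_tags( $text, '<a>' );              // assumption: keep only links
    foreach ( explode( "\n\n", $text ) as $block ) {
        $block = trim( $block );
        if ( $block !== '' && strpos( $block, $url ) !== false ) {
            return $block;
        }
    }
    return '';
}

// Made-up page: the linking paragraph is last, and a filter has appended a
// sharing <div> right after the final </p>.
$url   = 'http://example.com/post';
$linea = '<p>Intro text.</p> <p>See <a href="http://example.com/post">this post</a>.</p>'
       . '<div class="share">Share this!</div>';

$old_block = block_with_link( $old, $linea, $url );
$new_block = block_with_link( $new, $linea, $url );

echo $old_block, "\n"; // See <a href="http://example.com/post">this post</a>.Share this!
echo $new_block, "\n"; // See <a href="http://example.com/post">this post</a>.
```

 With the old pattern the "Share this!" text from the div ends up glued to
 the linking paragraph; with the new one the closing </p> cuts it off.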

 Patch attached for trunk.

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/21914>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

