[wp-trac] [WordPress Trac] #5678: Respectfully strip newlines in some importers

Wed Jan 16 11:20:02 GMT 2008

#5678: Respectfully strip newlines in some importers
-------------------------+--------------------------------------------------
 Reporter:  jdub         |       Owner:  anonymous
     Type:  enhancement  |      Status:  new      
 Priority:  normal       |   Milestone:  2.6      
Component:  General      |     Version:           
 Severity:  normal       |    Keywords:           
-------------------------+--------------------------------------------------
 Filing this as an enhancement because it could do with some discussion and
 insight from wiser and more experienced heads before being labelled
 "defect". :-)

 I noticed while helping some users import their blogs that importers of
 HTML content (such as the RSS importer) don't tidy up superfluous newlines
 in the import format, which results in unnecessary {{{<br/>}}} elements
 after {{{wpautop()}}} filtering for display. They turn up in the editor
 too, which reinforces the problem.
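
 To make the symptom concrete, here is a minimal sketch. It assumes a
 simplified stand-in for the line-break step of {{{wpautop()}}}; the name
 {{{demo_newlines_to_br()}}} and the sample feed content are hypothetical,
 and the real filter does considerably more than this.

```php
<?php
// Hypothetical stand-in for the line-break step of wpautop(); the real
// filter also wraps paragraphs and treats block elements specially.
function demo_newlines_to_br($text) {
    // Turn each remaining newline into a <br /> followed by a newline
    return preg_replace('|(?<!<br />)\s*\n|', "<br />\n", $text);
}

// A feed item whose HTML was hard-wrapped by the exporting tool
$imported = "<p>A paragraph that was\nhard-wrapped in the feed.</p>";

// Displayed as-is, the wrap turns into a visible line break
$raw = demo_newlines_to_br($imported);

// Collapsing the newline at import time leaves nothing to convert
$tidied = demo_newlines_to_br(preg_replace('/[\n\r]+/', ' ', $imported));

echo $raw, "\n", $tidied, "\n";
```

 The first result contains a {{{<br />}}} in the middle of the sentence;
 the second does not, which is the behaviour users expected from the
 import.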

 I've adapted one of the filter functions to strip superfluous newlines,
 and changed my RSS importer to use it. The results have been warmly
 welcomed by users, who no longer have to clean up their imported blog
 content. ;-)

 {{{strip_newlines()}}} should probably go into
 {{{wp-includes/formatting.php}}}, unless a function already exists that
 serves this purpose. I couldn't find one, so I adapted this.

 Given that similar HTML block/inline-savvy string-replacement code exists
 in other formatting functions, perhaps there's an opportunity for some
 refactoring here? I feel kind of silly proposing a function that is almost
 entirely duplicated from other code in the core.
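
 As a rough illustration of what such a refactoring might look like, the
 shared tag-walking loop could live in one helper that takes a callback.
 Everything here is an assumption for discussion: the names
 {{{demo_filter_text_outside_tags()}}} and {{{demo_strip_newlines()}}} are
 hypothetical and do not exist in core.

```php
<?php
// Hypothetical shared helper: run $callback over text that sits outside
// tags and outside whitespace-sensitive elements.
function demo_filter_text_outside_tags($text, $callback) {
    $protected = array('<code', '<pre', '<kbd', '<style', '<script');
    $pieces = preg_split('/(<[^>]+>)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $skip = false;
    $output = '';
    foreach ($pieces as $piece) {
        if ($piece === '' || $piece[0] !== '<') {
            // Plain text: filter it unless we are inside a protected element
            if (!$skip) {
                $piece = call_user_func($callback, $piece);
            }
        } else {
            // A tag: protection starts at a listed opening tag and ends at
            // the next tag of any other kind (tags nested inside <pre> are
            // a known gap in this sketch)
            $skip = false;
            foreach ($protected as $prefix) {
                if (strpos($piece, $prefix) === 0) {
                    $skip = true;
                    break;
                }
            }
        }
        $output .= $piece;
    }
    return $output;
}

// With the helper in place, the newline stripper becomes a thin wrapper
function demo_strip_newlines($text) {
    return demo_filter_text_outside_tags($text, function ($chunk) {
        return preg_replace('/[\n\r]+/', ' ', $chunk);
    });
}

echo demo_strip_newlines("<p>one\ntwo</p><pre>keep\nthis</pre>");
```

 Other formatting functions that skip the same elements could then pass
 their own callback instead of repeating the loop.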

 I've used it immediately before the "Clean up content" section of
 {{{get_posts()}}} in {{{wp-admin/import/rss.php}}}, and in an Advogato
 importer that I've written (which also uses HTML as the content format).

 {{{
 function strip_newlines($text) {
         // Respectfully strip unnecessary newlines, leaving tags and the
         // contents of whitespace-sensitive elements untouched
         $textarr = preg_split("/(<[^>]+>)/Us", $text, -1,
                 PREG_SPLIT_DELIM_CAPTURE);
         $stop = count($textarr); $skip = false; $output = ''; // loop stuff
         for ($ci = 0; $ci < $stop; $ci++) {
                 $curl = $textarr[$ci];
                 if (! isset($curl[0]) || '<' != $curl[0]) { // If it's not a tag
                         if (! $skip)
                                 $curl = preg_replace('/[\n\r]+/', ' ', $curl);
                 } elseif (strpos($curl, '<code') !== false ||
                         strpos($curl, '<pre') !== false ||
                         strpos($curl, '<kbd') !== false ||
                         strpos($curl, '<style') !== false ||
                         strpos($curl, '<script') !== false) {
                         $skip = true;  // opening tag of a protected element
                 } else {
                         $skip = false; // any other tag ends the protection
                 }
                 $output .= $curl;
         }
         return $output;
 }
 }}}

 Thoughts?

-- 
Ticket URL: <http://trac.wordpress.org/ticket/5678>
WordPress Trac <http://trac.wordpress.org/>
WordPress blogging software