[wp-trac] [WordPress Trac] #56294: WordPress search finds block name in comment

WordPress Trac noreply at wordpress.org
Mon Apr 3 14:32:40 UTC 2023


#56294: WordPress search finds block name in comment
--------------------------------------+------------------------------
 Reporter:  zodiac1978                |       Owner:  (none)
     Type:  enhancement               |      Status:  closed
 Priority:  normal                    |   Milestone:  Awaiting Review
Component:  Database                  |     Version:  5.0
 Severity:  normal                    |  Resolution:  maybelater
 Keywords:  needs-patch dev-feedback  |     Focuses:  performance
--------------------------------------+------------------------------

Comment (by espiat):

 Replying to [comment:13 zodiac1978]:
 > Replying to [comment:12 l1nuxjedi]:
 > > In fact the first example here shows how to modify that DB query to
 filter all meta tags: https://mariadb.com/kb/en/regexp_replace/
 >
 > The problem with parsing HTML with RegEx is, that there are so many edge
 cases that will break it. One "<" in the content is filtering out
 everything until the next closing ">" for example ...
 >
 > See: https://stackoverflow.com/a/1732454
 >
 > And with the example from the MariaDB knowledge base:
 https://regex101.com/r/CY0zuJ/1 (just added "5 < 1" in the content).
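
 To make that edge case concrete, here is a minimal sketch in plain PHP.
 The pattern `<.+?>` is my assumption of what a naive "strip all tags"
 regex looks like; preg_replace() only stands in for the database
 function here.

 {{{#!php
 <?php
 // Assumed pattern: lazily match anything between "<" and the next ">".
 $pattern = '/<.+?>/';

 // Content with a literal "<" that is not part of a tag.
 $content = 'Is 5 < 1? <b>No</b>, never.';

 // The first match starts at the stray "<" and runs to the ">" of "<b>",
 // so the visible text "1?" is removed together with the tag.
 $stripped = preg_replace($pattern, '', $content);

 echo $stripped, "\n"; // Is 5 No, never.
 }}}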


 Okay. Then we can change the code back to:

 {{{#!php
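 <?php
 /**
  * Search post_content with quoted attribute values and HTML comments
  * (block delimiters) stripped out.
  */
 function tl_custom_search_query($search, $query)
 {
     // Check the query that was passed in, not the global one.
     if ('' === $search || !$query->is_search() || !$query->is_main_query()) {
         return $search;
     }
     global $wpdb;
     // Inner call drops "quoted" values, outer call drops <!-- ... --> comments.
     $stripped = "REGEXP_REPLACE(REGEXP_REPLACE({$wpdb->posts}.post_content, "
         . "'\"[^\"]*\"', ''), '<!--.*?-->', '')";
     // Swap the column inside core's own LIKE clauses rather than removing a
     // clause and appending the raw search term. The trailing quote stops the
     // pattern from matching inside an escaped search term.
     $pattern = '/\(' . preg_quote("{$wpdb->posts}.post_content", '/')
         . " ((?:NOT )?LIKE) '/";
     return preg_replace_callback($pattern, function ($m) use ($stripped) {
         return "({$stripped} {$m[1]} '";
     }, $search);
 }
 add_filter('posts_search', 'tl_custom_search_query', 10, 2);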
 }}}
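
 To see what the two nested replacements leave behind, here is a rough
 sketch that emulates them in plain PHP. tl_strip_like_sql() is a
 hypothetical helper, preg_replace() is only a stand-in for REGEXP_REPLACE
 (the database regex engine may differ in details), and the comment
 pattern is written with the full `<!--` and `-->` delimiters.

 {{{#!php
 <?php
 // Hypothetical helper mirroring the two nested REGEXP_REPLACE calls.
 function tl_strip_like_sql(string $content): string
 {
     // Inner call: drop every double-quoted value.
     $content = preg_replace('/"[^"]*"/', '', $content);
     // Outer call: drop HTML comments, i.e. the block delimiters.
     return preg_replace('/<!--.*?-->/', '', $content);
 }

 $content = '<!-- wp:paragraph {"align":"center"} -->'
     . '<p class="has-text-align-center">Hello</p>'
     . '<!-- /wp:paragraph -->';

 $stripped = tl_strip_like_sql($content);

 echo $stripped, "\n"; // <p class=>Hello</p>
 }}}

 The block name and the attribute values are gone, but tag names and
 attribute names are still part of the searched string. Quoted text in the
 visible content would be removed as well.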


 I ran a search on a default WordPress install (no plugins) and was a bit
 surprised at how imprecise core search currently is. If I search for "alt",
 the results include a post that contains "alt" ONLY as the HTML img
 attribute in its markup:


 {{{
 <!-- wp:paragraph -->
 <p>i am a para </p>
 <!-- /wp:paragraph -->

 <!-- wp:paragraph -->
 <p></p>
 <!-- /wp:paragraph -->

 <!-- wp:image {"id":321,"sizeSlug":"large","linkDestination":"none"} -->
 <figure class="wp-block-image size-large paragraph"><img src="https://olliedemo.local/wp-content/uploads/2023/03/about-792x1024.png" alt="" class="wp-image-321"/></figure>
 <!-- /wp:image -->
 }}}


 That shouldn't happen either.
 Is the search currently in such a bad state?
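
 As a quick check of the "alt" case, this sketch runs the markup above
 through the same two replacements (emulated with preg_replace(), which is
 an assumption about how the database behaves) and, for comparison, through
 PHP's strip_tags() to get only the visible text. The src value is
 shortened to a placeholder.

 {{{#!php
 <?php
 // Markup from the report; "about.png" is a placeholder for the real URL.
 $content = "<!-- wp:paragraph -->\n<p>i am a para </p>\n<!-- /wp:paragraph -->\n"
     . '<!-- wp:image {"id":321,"sizeSlug":"large","linkDestination":"none"} -->'
     . "\n"
     . '<figure class="wp-block-image size-large paragraph">'
     . '<img src="about.png" alt="" class="wp-image-321"/></figure>' . "\n"
     . '<!-- /wp:image -->';

 // Same two passes as the SQL expression: quoted values, then comments.
 $sqlView = preg_replace(
     '/<!--.*?-->/',
     '',
     preg_replace('/"[^"]*"/', '', $content)
 );

 // Text-only view: strip_tags() removes tags, attributes and comments.
 $textView = trim(strip_tags($content));

 // Does a search for "alt" still hit each view?
 var_dump(stripos($sqlView, 'alt') !== false);
 var_dump(stripos($textView, 'alt') !== false);
 }}}

 The attribute name sits outside the quotes, so removing quoted values
 alone does not keep "alt" out of the searched string.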

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/56294#comment:15>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

