[wp-trac] [WordPress Trac] #56294: WordPress search finds block name in comment

WordPress Trac noreply at wordpress.org
Wed Jul 27 12:36:55 UTC 2022


#56294: WordPress search finds block name in comment
-------------------------+--------------------------------------
 Reporter:  zodiac1978   |      Owner:  (none)
     Type:  enhancement  |     Status:  new
 Priority:  normal       |  Milestone:  Awaiting Review
Component:  Database     |    Version:  5.0
 Severity:  normal       |   Keywords:  needs-patch dev-feedback
  Focuses:               |
-------------------------+--------------------------------------
 There is a known issue with WP search: it matches the search term anywhere
 in `post_content` (a `LIKE '%term%'` comparison), so it also finds HTML
 tags such as `table`. Searching for the word "table" therefore returns
 every post/page that contains table markup.
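
 A minimal sketch in plain PHP (no WordPress loaded) of why a substring
 search hits markup. `stripos()` stands in for a case-insensitive
 `LIKE '%table%'` comparison; the post content and the `matches_like()`
 helper are hypothetical.

 {{{
 <?php
 // Hypothetical post content: the visible text never mentions "table".
 $post_content = '<table><tr><td>Prices for 2022</td></tr></table>';

 // Rough stand-in for LIKE '%term%': a case-insensitive substring test.
 function matches_like( string $haystack, string $term ): bool {
     return false !== stripos( $haystack, $term );
 }

 var_dump( matches_like( $post_content, 'table' ) );              // the tag name matches
 var_dump( matches_like( strip_tags( $post_content ), 'table' ) ); // visible text only
 }}}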

 The impact of this problem has been limited, so fixing it did not seem
 necessary, although there is a plugin that addresses it:
 https://wordpress.org/plugins/wp-search-ignore-html-tags/

 With the block editor (aka Gutenberg), this has changed: every block
 stores its block name in an HTML comment. For example:
 {{{
 <!-- wp:syntaxhighlighter/code {"language":"php"} -->
 }}}
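
 Because this delimiter is stored in `post_content`, the block name is just
 more text to a substring search. The sketch below extracts the name and
 attributes from the example delimiter with a simplified, hypothetical
 regular expression in plain PHP; WordPress itself uses its block parser
 (`parse_blocks()`), which handles more cases, such as self-closing blocks.

 {{{
 <?php
 // Simplified, hypothetical pattern for an opening block delimiter:
 // optional "namespace/" prefix, block name, optional JSON attributes.
 const BLOCK_OPENER = '/<!--\s+wp:([a-z][a-z0-9_-]*(?:\/[a-z][a-z0-9_-]*)?)\s+(\{.*?\})?\s*-->/s';

 $content = '<!-- wp:syntaxhighlighter/code {"language":"php"} -->';

 $block_name = null;
 $attributes = array();
 if ( preg_match( BLOCK_OPENER, $content, $m ) ) {
     $block_name = $m[1];
     // A trailing unmatched group is absent from $m, so check before decoding.
     $attributes = isset( $m[2] ) ? json_decode( $m[2], true ) : array();
 }

 echo $block_name, "\n";
 echo $attributes['language'], "\n";
 }}}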

 If I now search a tech blog for the term "syntaxhighlighter", I get every
 post/page with a code block, not only those whose text actually contains
 the word.

 With every new block, the chance of false-positive search results
 increases.

 Even core blocks cause problems: their names, such as "paragraph"
 (instead of just "p") or "image" (instead of "img"), are common words, so
 they are much more likely to produce false-positive search results.
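
 To illustrate the ambiguity, a minimal sketch in plain PHP with
 hypothetical post content compares a substring match on the raw markup
 with one on the visible text only. `visible_text()` is a hypothetical
 helper that drops HTML comments (including block delimiters) and then the
 tags; it runs in PHP after the fact, so it shows the desired result, not
 how to express it in SQL.

 {{{
 <?php
 // Hypothetical post: "paragraph" and "image" appear only in markup.
 $post_content = implode( "\n", array(
     '<!-- wp:paragraph -->',
     '<p>Holiday photos from the coast.</p>',
     '<!-- /wp:paragraph -->',
     '<!-- wp:image {"id":42} -->',
     '<figure class="wp-block-image"><img src="beach.jpg" alt=""/></figure>',
     '<!-- /wp:image -->',
 ) );

 // Remove every HTML comment (non-greedy, across newlines), then the tags.
 function visible_text( string $html ): string {
     return strip_tags( preg_replace( '/<!--.*?-->/s', '', $html ) );
 }

 foreach ( array( 'paragraph', 'image', 'coast' ) as $term ) {
     printf(
         "%-9s raw: %s, visible: %s\n",
         $term,
         false !== stripos( $post_content, $term ) ? 'yes' : 'no',
         false !== stripos( visible_text( $post_content ), $term ) ? 'yes' : 'no'
     );
 }
 }}}

 Only "coast" is found in the visible text; "paragraph" and "image" match
 the raw content solely through block delimiters and the block's CSS class.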

 There is a GitHub issue about this for the block editor:
 https://github.com/WordPress/gutenberg/issues/3739

 It was closed by @pento because this is a known WordPress issue, not
 specifically a problem of the block editor and its data format.

 @danielbachhuber commented at
 https://github.com/WordPress/gutenberg/issues/10307#issuecomment-426995580:

 > However, I don't have any great ideas for how to resolve this with
 MySQL. I'd love to hear of a solution if someone has one. Barring that,
 this probably won't be a priority to fix with WP 5.0

 After looking at the plugin linked above, I created a solution (with
 support from @kau-boy):

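 {{{
 /**
  * Modify the search query to ignore the search term in HTML comments.
  *
  * Note: a post is excluded whenever the term appears inside one of its
  * HTML comments, even if the term also appears in the visible text.
  *
  * @param string   $where The WHERE clause of the query.
  * @param WP_Query $query The WP_Query instance (passed by reference).
  *
  * @return string The modified WHERE clause.
  */
 function tl_update_search_query( $where, $query ) {
     if ( ! $query->is_search() || ! $query->is_main_query() ) {
         return $where;
     }

     global $wpdb;

     // Raw search term: get_search_query() returns it HTML-escaped by default.
     $search_query = (string) $query->get( 's' );
     if ( '' === $search_query ) {
         return $where;
     }

     // esc_like() escapes LIKE wildcards, not regex metacharacters, and does
     // no SQL escaping. Backslash-escape the MySQL regex metacharacters here;
     // prepare() then escapes the value for the SQL string literal.
     $search_query = preg_replace( '/[.\\\\+*?\\[^\\]$(){}|]/', '\\\\$0', $search_query );

     // [^>]* keeps the match inside a single comment instead of letting it
     // run from one comment to a later one (block attributes are serialized
     // with ">" encoded as \u003e).
     $where .= $wpdb->prepare(
         " AND {$wpdb->posts}.post_content NOT REGEXP %s",
         '<!--[^>]*' . $search_query . '[^>]*-->'
     );

     return $where;
 }
 add_filter( 'posts_where', 'tl_update_search_query', 10, 2 );
 }}}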
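
 To gauge which posts a `NOT REGEXP` comment condition removes, here is a
 minimal sketch that approximates it with PHP's PCRE functions. This is an
 assumption: MySQL's regex engine differs from PCRE, and whether `.` crosses
 line breaks depends on the MySQL/MariaDB version, so the hypothetical posts
 are kept on one line. `excluded_by()` and the `$narrow` variant are
 hypothetical; `$broad` mirrors the `<!--.*term.*-->` pattern.

 {{{
 <?php
 // Approximates, in PCRE, whether the comment pattern matches a post,
 // i.e. whether NOT REGEXP would drop it from the results.
 function excluded_by( string $pattern, string $content ): bool {
     return 1 === preg_match( $pattern, $content );
 }

 $term   = preg_quote( 'syntaxhighlighter', '/' );
 $broad  = '/<!--.*' . $term . '.*-->/';       // greedy gaps: can span comments
 $narrow = '/<!--[^>]*' . $term . '[^>]*-->/'; // hypothetical: stays in one comment

 // Hypothetical posts, each on a single line.
 $posts = array(
     'only in delimiter' => '<!-- wp:syntaxhighlighter/code {"language":"php"} --><pre>echo 1;</pre><!-- /wp:syntaxhighlighter/code -->',
     'only in text'      => '<!-- wp:paragraph --><p>I use syntaxhighlighter daily.</p><!-- /wp:paragraph -->',
     'in both'           => '<!-- wp:paragraph --><p>I use syntaxhighlighter.</p><!-- /wp:paragraph --><!-- wp:syntaxhighlighter/code --><pre>x</pre><!-- /wp:syntaxhighlighter/code -->',
 );

 foreach ( $posts as $label => $content ) {
     printf(
         "%-17s broad: %s, narrow: %s\n",
         $label,
         excluded_by( $broad, $content ) ? 'excluded' : 'kept',
         excluded_by( $narrow, $content ) ? 'excluded' : 'kept'
     );
 }
 }}}

 In this sketch the `.*` pattern also drops the post that mentions the term
 only in its text, because the match starts in the opening delimiter and
 ends at the closing one. The `[^>]*` variant keeps that post, but a post
 that uses the term both in its text and in a block comment is still
 excluded by either pattern.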

 Before I try to create a PR for it: would this be a possible way to solve
 this, or is `NOT REGEXP` too slow if many posts exist? I am running this
 solution on my blog, but it has little traffic and not many posts, so my
 findings may not show the big picture.

 Feedback about possible problems (and, hopefully, how to solve them) is
 much appreciated! Thanks in advance.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/56294>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

