[wp-trac] [WordPress Trac] #56294: WordPress search finds block name in comment
WordPress Trac
noreply at wordpress.org
Mon Apr 3 14:32:40 UTC 2023
#56294: WordPress search finds block name in comment
--------------------------------------+------------------------------
Reporter: zodiac1978 | Owner: (none)
Type: enhancement | Status: closed
Priority: normal | Milestone: Awaiting Review
Component: Database | Version: 5.0
Severity: normal | Resolution: maybelater
Keywords: needs-patch dev-feedback | Focuses: performance
--------------------------------------+------------------------------
Comment (by espiat):
Replying to [comment:13 zodiac1978]:
> Replying to [comment:12 l1nuxjedi]:
> > In fact the first example here shows how to modify that DB query to
> > filter all meta tags: https://mariadb.com/kb/en/regexp_replace/
>
> The problem with parsing HTML with RegEx is that there are so many edge
> cases that will break it. For example, a single "<" in the content
> filters out everything up to the next closing ">".
>
> See: https://stackoverflow.com/a/1732454
>
> And with the example from the MariaDB knowledge base:
> https://regex101.com/r/CY0zuJ/1 (just added "5 < 1" in the content).
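The failure mode described in the quoted reply can be reproduced outside the
database. This is a minimal sketch in plain PHP. It assumes a lazy
tag-stripping pattern `<.+?>` (the exact pattern in the MariaDB example may
differ) and uses made-up sample content:

```php
<?php
// Hypothetical sample content: a literal "<" in the text, followed by real tags.
$content = '<p>5 < 1 is false</p><p>alt text</p>';

// Assumed tag-stripping pattern: "<", then as few characters as possible, then ">".
$stripped = preg_replace('/<.+?>/', '', $content);

// The literal "<" is treated as the start of a tag, so the text
// "1 is false" is removed together with the closing </p>.
var_dump($stripped); // string(10) "5 alt text"
```

The visible text "1 is false" is lost, so a search for "false" would no longer
find this post.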
Okay. Then we can change the code back to:
{{{#!php
<?php
function tl_custom_search_query($search, $query)
{
    if (!is_search() || !$query->is_main_query()) {
        return $search;
    }
    global $wpdb;
    // post_content with quoted strings and HTML comments stripped out
    $stripped = "REGEXP_REPLACE(REGEXP_REPLACE({$wpdb->posts}.post_content, '\"[^\"]*\"', ''), '<!--.*?-->', '')";
    // Swap the original post_content condition for one on the stripped
    // content, reusing the LIKE value that core has already escaped.
    // Replacing in place avoids leaving a dangling OR in the clause.
    $search = preg_replace(
        "/\({$wpdb->posts}\.post_content (LIKE '[^']*')\)/",
        '(' . $stripped . ' $1)',
        $search
    );
    return $search;
}
add_filter('posts_search', 'tl_custom_search_query', 10, 2);
}}}
I ran a search on a default WordPress install (no plugins), and I was a bit
surprised at how imprecise the core search currently is. If I search for
"alt", the results include a post whose only occurrence of "alt" is the alt
attribute of an img tag in the markup:
{{{
<!-- wp:paragraph -->
<p>i am a para </p>
<!-- /wp:paragraph -->
<!-- wp:paragraph -->
<p></p>
<!-- /wp:paragraph -->
<!-- wp:image {"id":321,"sizeSlug":"large","linkDestination":"none"} -->
<figure class="wp-block-image size-large paragraph"><img src="https://olliedemo.local/wp-content/uploads/2023/03/about-792x1024.png" alt="" class="wp-image-321"/></figure>
<!-- /wp:image -->
}}}
That shouldn't happen either.
Is the search currently in such a bad state?
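
The match on "alt" follows from the search condition being applied to the raw
stored markup rather than to the visible text. A minimal sketch in plain PHP,
using `stripos()` as a rough stand-in for `post_content LIKE '%alt%'` (an
approximation: the case sensitivity of LIKE depends on the column collation)
and a shortened copy of the image block above with a placeholder src:

```php
<?php
// Shortened copy of the image block from the example; src is a placeholder.
$post_content = '<!-- wp:image {"id":321,"sizeSlug":"large"} -->'
    . '<figure class="wp-block-image size-large">'
    . '<img src="about.png" alt="" class="wp-image-321"/></figure>'
    . '<!-- /wp:image -->';
$term = 'alt';

// Substring test against the raw markup, as a LIKE '%alt%' condition would see it.
$matches_raw = stripos($post_content, $term) !== false;

// Same test against the text only; strip_tags() removes HTML tags and comments.
$matches_text = stripos(strip_tags($post_content), $term) !== false;

var_dump($matches_raw);  // bool(true)
var_dump($matches_text); // bool(false)
```

The raw markup matches only because of the attribute name, while the visible
text of the block contains nothing to match.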
--
Ticket URL: <https://core.trac.wordpress.org/ticket/56294#comment:15>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform