[wp-meta] [Making WordPress.org] #4450: Does WordPress.org Plugin Repo Elasticsearch function_score penalize plugins with fewer than one million installs?
Making WordPress.org
noreply at wordpress.org
Fri May 10 03:22:30 UTC 2019
#4450: Does WordPress.org Plugin Repo Elasticsearch function_score penalize
plugins with fewer than one million installs?
------------------------------+---------------------
Reporter: jadonn | Owner: (none)
Type: defect | Status: new
Priority: normal | Milestone:
Component: Plugin Directory | Resolution:
Keywords: |
------------------------------+---------------------
Description changed by dd32:
Old description:
> I was recently looking over the
> [https://meta.trac.wordpress.org/browser/sites/trunk/wordpress.org/public_html
> /wp-content/plugins/plugin-directory/libs/site-search/jetpack-
> search.php#L1001 source code for the Plugin Repo's Elasticsearch
> function_score query]. If I understand correctly, it seems like the query
> penalizes plugins with less than one million active installs, but the
> comments in the code suggest this should be otherwise. The filter clause
> in the Elasticsearch query applies the exponential decay scoring function
> to plugins with less-than-or-equal to 1000000 active installs. The
> exponential decay scoring function with a plugin with 500000 (five
> hundred thousand) active installs should look like this when plugging in
> all the values in accordance with
> [https://www.elastic.co/guide/en/elasticsearch/reference/current/query-
> dsl-function-score-query.html#exp-decay Elasticsearch's example]
> :
>
> custom score = e^((ln(decay)/scale) * max(0, |actual_value - origin| -
> offset))
>
> decay = 0.75
> scale = 900000
> actual_value = 500000
> origin = 1000000
> offset = 0
>
> e^((ln(0.75)/900000) * max(0, |500000 - 1000000| - 0))^ = 0.8522943134
>
> For Google Sheets:
> EXP(LN(0.75)/900000 * MAX(0, ABS(500000 - 1000000) - 0)) = 0.8522943134
>
> The resulting score is multiplied, along with other calculated factors,
> with the document relevance score Elasticsearch returns based on how well
> the search input matches the plugin text content. If my understanding of
> the exponential decay function is correct and if my math is correct, it
> appears that the resulting relevance document score for the plugin is
> going to be reduced to 85% of what it should otherwise be. This
> multiplier is not calculated or applied to plugins with more than 1000000
> active installs.
>
> If I have misunderstood this query scoring, I would be grateful to have
> my understanding and my math corrected.
New description:
I was recently looking over the
[https://meta.trac.wordpress.org/browser/sites/trunk/wordpress.org/public_html/wp-content/plugins/plugin-directory/libs/site-search/jetpack-search.php#L1001 source code for the Plugin Repo's Elasticsearch function_score query].
If I understand correctly, the query penalizes plugins with fewer than one
million active installs, but the comments in the code suggest otherwise.
The filter clause in the Elasticsearch query applies the exponential decay
scoring function to plugins with 1,000,000 or fewer active installs. For a
plugin with 500,000 (five hundred thousand) active installs, plugging the
values into the formula from
[https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#exp-decay Elasticsearch's example]
gives:
{{{
custom score = e^((ln(decay)/scale) * max(0, |actual_value - origin| - offset))

decay = 0.75
scale = 900000
actual_value = 500000
origin = 1000000
offset = 0

e^((ln(0.75)/900000) * max(0, |500000 - 1000000| - 0)) = 0.8522943134
}}}
For Google Sheets:
{{{
EXP(LN(0.75)/900000 * MAX(0, ABS(500000 - 1000000) - 0)) = 0.8522943134
}}}
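The same arithmetic can be checked outside a spreadsheet. The following is a minimal Python sketch of the formula as written in this ticket; the function name `exp_decay` and its parameter names are my own labels for illustration, not identifiers from the plugin directory code or from Elasticsearch.

```python
import math


def exp_decay(value, origin, scale, decay, offset=0.0):
    """Exponential decay multiplier, following the formula quoted in this ticket.

    The name and signature are hypothetical; only the arithmetic comes
    from the ticket.
    """
    # Distance from the origin, reduced by the offset and floored at zero.
    distance = max(0.0, abs(value - origin) - offset)
    # ln(decay) is negative for 0 < decay < 1, so the result shrinks
    # as the distance from the origin grows.
    return math.exp(math.log(decay) / scale * distance)


# Values taken from the ticket: a plugin with 500,000 active installs.
score = exp_decay(500_000, origin=1_000_000, scale=900_000, decay=0.75)
print(f"{score:.4f}")  # approximately 0.8523
```

With these inputs the result agrees with the Google Sheets value to the precision shown.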
This value is multiplied, together with the other calculated factors, into
the relevance score that Elasticsearch computes from how well the search
input matches the plugin's text content. If my understanding of the
exponential decay function is correct and if my math is correct, the
plugin's relevance score is reduced to about 85% of what it would otherwise
be. This multiplier is neither calculated nor applied for plugins with more
than 1,000,000 active installs.
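To illustrate the concern across a range of install counts, here is a rough Python sketch under my reading of the query. It assumes that a plugin which does not match the filter (more than 1,000,000 installs) keeps its relevance score unchanged, modeled here as a multiplier of 1.0. The function name `install_multiplier` and the constant names are hypothetical; the constants themselves are the values quoted in this ticket.

```python
import math

# Values quoted in the ticket.
ORIGIN = 1_000_000
SCALE = 900_000
DECAY = 0.75


def install_multiplier(active_installs):
    """Multiplier applied to the relevance score, under the ticket's reading.

    Assumption: plugins above ORIGIN do not match the filter, so the decay
    function is skipped and the score is left unchanged (multiplier 1.0).
    """
    if active_installs > ORIGIN:
        return 1.0
    distance = abs(active_installs - ORIGIN)
    return math.exp(math.log(DECAY) / SCALE * distance)


# Print the multiplier for a few install counts on either side of the cutoff.
for installs in (0, 100_000, 500_000, 1_000_000, 5_000_000):
    print(f"{installs:>9,d}  {install_multiplier(installs):.4f}")
```

If this reading is right, the multiplier is 0.75 at 100,000 installs (exactly one scale away from the origin) and rises toward 1.0 as the install count approaches 1,000,000, so every plugin below the cutoff is scaled down relative to plugins above it.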
If I have misunderstood this query scoring, I would be grateful to have my
understanding and my math corrected.
--
Ticket URL: <https://meta.trac.wordpress.org/ticket/4450#comment:3>
Making WordPress.org <https://meta.trac.wordpress.org/>