[wp-meta] [Making WordPress.org] #174: Link to generally related functions/classes
Making WordPress.org
noreply at wordpress.org
Mon Jun 26 14:46:01 UTC 2017
#174: Link to generally related functions/classes
---------------------------+-----------------------
Reporter: samuelsidler | Owner:
Type: task | Status: assigned
Priority: high | Milestone:
Component: Developer Hub | Resolution:
Keywords: has-patch |
---------------------------+-----------------------
Comment (by pbiron):
I started work a while back on trying to identify "related" references,
though I've had to put it aside recently to work on other things.
The general idea I was exploring is based on the realization that most
(though not all) function/method/class/hook names are of the form
`[Verb] [Noun]`, e.g., `(add|get|update|delete)_post_meta()`, where `add`,
`get`, `update`, and `delete` are verbs and `post_meta` is a noun.
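Here is a minimal Python sketch of the `[Verb] [Noun]` split. It assumes snake_case names and a small hypothetical verb set. `KNOWN_VERBS` and `split_verb_noun` are illustrative names and are not part of any existing plugin. `wp_insert_post()` is an example of a name that does not fit the pattern, because its first token is not a verb.

```python
from typing import Optional, Tuple

# Hypothetical verb set, for illustration only; in the approach described
# here, verbs would come from a hand-built POS lexicon.
KNOWN_VERBS = {"add", "get", "update", "delete"}


def split_verb_noun(name: str) -> Tuple[Optional[str], str]:
    """Split a snake_case name into (verb, noun).

    Returns (None, name) when the first token is not a known verb, since
    not every name follows the [Verb] [Noun] pattern.
    """
    base = name[:-2] if name.endswith("()") else name
    head, sep, rest = base.partition("_")
    if sep and head in KNOWN_VERBS:
        return head, rest
    return None, base


for fn in ("add_post_meta()", "get_post_meta()", "wp_insert_post()"):
    print(fn, "->", split_verb_noun(fn))
```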
So, on import:
1. do phrase-level parsing of function/method/class/hook names (stripping
stopwords, but with only limited stemming)
1. do "part of speech" (POS) tagging of the phrases (see
[[http://phpir.com/part-of-speech-tagging|Part Of Speech Tagging]])
1. then, the "related" references are those with the same `Noun` but a
different `Verb`
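The three steps above could be sketched as follows. This is an illustration under stated assumptions and is not the plugin's actual code:

- The stopword list, the tiny `LEXICON`, and all function names are hypothetical.
- Tagging is done per token, whereas the original may tag multi-word phrases directly.
- Stemming is omitted entirely.
- The noun is taken to be the run of noun tokens that follows a leading verb.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical stopwords and POS lexicon; the real lexicon would be
# built and refined by hand.
STOPWORDS = {"the", "a", "of", "by"}
LEXICON: Dict[str, str] = {
    "add": "VERB", "get": "VERB", "update": "VERB", "delete": "VERB",
    "post": "NOUN", "meta": "NOUN", "terms": "NOUN",
}

Tagged = List[Tuple[str, Optional[str]]]


def to_phrase(name: str) -> List[str]:
    """Step 1: split a name into lowercase tokens and drop stopwords (no stemming)."""
    base = name[:-2] if name.endswith("()") else name
    return [t for t in base.lower().split("_") if t and t not in STOPWORDS]


def tag(phrase: List[str]) -> Tagged:
    """Step 2: look up each token's POS in the lexicon (None if not yet tagged)."""
    return [(t, LEXICON.get(t)) for t in phrase]


def verb_noun(tagged: Tagged) -> Tuple[Optional[str], Optional[str]]:
    """Return (verb, noun) if the phrase is a VERB followed by one or more NOUNs."""
    if len(tagged) < 2 or tagged[0][1] != "VERB":
        return None, None
    if any(pos != "NOUN" for _, pos in tagged[1:]):
        return None, None
    return tagged[0][0], "_".join(t for t, _ in tagged[1:])


def related_index(names: List[str]) -> Dict[str, List[str]]:
    """Step 3: a name's related names share its noun but use a different verb."""
    parsed = {n: verb_noun(tag(to_phrase(n))) for n in names}
    by_noun: Dict[str, List[str]] = defaultdict(list)
    for n, (_, noun) in parsed.items():
        if noun is not None:
            by_noun[noun].append(n)
    return {
        n: [m for m in by_noun[noun] if parsed[m][0] != verb] if noun is not None else []
        for n, (verb, noun) in parsed.items()
    }


names = ["add_post_meta", "get_post_meta", "update_post_meta",
         "delete_post_meta", "get_terms", "wp_insert_post"]
index = related_index(names)
print(index["get_post_meta"])
```

With this toy data, `index["get_post_meta"]` lists the other three `*_post_meta` functions. `wp_insert_post` gets no related entries because `wp` and `insert` are not in the lexicon, which is why the lexicon has to be built out before the index becomes useful.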
I hope this technique will produce "related" references with a much
higher degree of
[[https://en.wikipedia.org/wiki/Precision_and_recall#Precision|Precision]]
than stemming alone, although recall would undoubtedly be lower.
Personally, I'd find 602 references "related" to `get_terms()` less than
useful.
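For reference, here are the standard definitions behind that trade-off. $R$ is the set of references returned as "related", and $G$ is the set a person would actually judge related. Both symbols are introduced here for illustration.

```latex
\mathrm{Precision} = \frac{|R \cap G|}{|R|}, \qquad
\mathrm{Recall} = \frac{|R \cap G|}{|G|}
```

Since $|R \cap G| \le |G|$, a list of 602 results has precision at most $|G|/602$. Unless hundreds of those references are genuinely related to `get_terms()`, most of the list is noise.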
Granted, the method I was working on requires **A LOT** of work up front
to build and refine the POS lexicon. But once that up-front work is done,
the indexing process is relatively quick (and doesn't require human
input).
I built a mostly functional plugin that provides a UI for assigning POS
tags to the phrases generated in step 1. The plugin's intended use is:
1. do an import from the sources (i.e., run `phpdoc-parser`), which
generates potential phrases for step 1 above
1. assign a POS to each phrase (the plugin provides a UI that makes this
pretty easy)
1. iterate the process, refining the POS lexicon on each iteration
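The iterate-and-refine loop above could be driven by a work queue of tokens that the lexicon doesn't cover yet. This is a minimal sketch that assumes token-level tagging against a simple lexicon dict. `untagged_tokens` and the `PREFIX` tag are hypothetical. In the plugin, the assignment step would happen through its UI rather than in code.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def untagged_tokens(phrases: Iterable[List[str]],
                    lexicon: Dict[str, str]) -> List[Tuple[str, int]]:
    """List tokens with no POS in the lexicon yet, most frequent first.

    Frequent gaps are worth tagging first, since each one unblocks
    many names at once.
    """
    counts = Counter(t for phrase in phrases for t in phrase if t not in lexicon)
    return counts.most_common()


# One iteration on toy data: find the gaps, tag them, re-check.
phrases = [["get", "post", "meta"], ["wp", "insert", "post"], ["wp", "update", "post"]]
lexicon = {"get": "VERB", "post": "NOUN", "meta": "NOUN"}
print(untagged_tokens(phrases, lexicon))  # [('wp', 2), ('insert', 1), ('update', 1)]
# A person supplies these tags via the UI ("PREFIX" is a hypothetical label).
lexicon.update({"wp": "PREFIX", "insert": "VERB", "update": "VERB"})
print(untagged_tokens(phrases, lexicon))  # []
```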
I'll try to find the time to get the plugin to the point where I can
release it and get others involved in refining the POS lexicon.
--
Ticket URL: <https://meta.trac.wordpress.org/ticket/174#comment:30>
Making WordPress.org <https://meta.trac.wordpress.org/>