[wp-meta] [Making WordPress.org] #2114: Possible abuse on popular themes list based on active installs
Making WordPress.org
noreply at wordpress.org
Tue Jul 25 11:16:09 UTC 2017
#2114: Possible abuse on popular themes list based on active installs
-----------------------------+------------------
Reporter: acosmin | Owner:
Type: defect | Status: new
Priority: low | Milestone:
Component: Theme Directory | Resolution:
Keywords: close |
-----------------------------+------------------
Comment (by dingdang):
Hello,
Since this is a different ticket about the same problems, which can be
easily solved, I am copying my proposal from the core ticket here as well.
'''Regarding the topic of this ticket: once the proposed solution is
implemented, all "active installs" counts will automatically include
only themes (all of their versions) that are published at wordpress.org,
so this will also correct past cases in the directory and in the "Popular
themes" tab. (Ex: a theme that has hijacked another theme's active
installs will lose them and will display only the number of active
installs of that specific theme and its versions at wordpress.org.)'''
https://core.trac.wordpress.org/ticket/14179#comment:44
Here is version 2. It no longer needs MD5, and the "How it works" section
contains an explanation that should be easy for anyone to understand.
'''Proposal for a solution to the “collisions” of WordPress themes.'''
'''Simplified Version 2.'''
'''Table of contents:'''
Changes compared to Version 1.
Introduction.
Formal composition of a unique ID.
API: determination of available theme updates.
Other: calculation of theme's active installs.
Benefits.
How it works.
Technical data.
Software changes.
'''Changes compared to Version 1.'''
- eliminated the need for the '''Author URI''' field
- eliminated the need to calculate MD5 hashes
- new section “How it works”
- new section “Software changes”
An analysis of the current set of themes that are “native” to
wordpress.org, including all of their versions (4876 themes, 56730
versions), led to the conclusion that only two fields are needed in the
process: the '''theme slug''' and the '''author'''. The '''author URI'''
is redundant.
As a result, the composition of the UIDs is simpler and calculating MD5
hashes is unnecessary, which further reduces the changes needed in the
system.
'''Introduction.'''
A '''collision''' is a slug match between two themes that are not related
to each other but have the same name.
'''Two main problems''' are related to these cases of collisions:
1. If a theme exists in the wordpress.org theme database and an unrelated
theme with the same slug was created by another author, the second one is
offered an “Update” and may be replaced by the theme published at
wordpress.org. This can also happen to widely distributed themes: when a
new theme with the same name is uploaded to wordpress.org, an unwanted
update can unexpectedly replace the themes of web sites published long
before that.
2. The calculation of active installations counts not only the themes in
the wordpress.org database, but also external themes and arbitrary child
themes residing in a folder with a matching name. Authors exploit this to
artificially place their new themes at the top of the list by taking the
names of popular external themes that have been distributed for a long
time.
'''The proposed techniques solve all of the problems, with very little
coding, while keeping backward compatibility, and solving the related
problems for the old themes as well, not just the newly released.'''
'''Formal composition of a unique ID.'''
1. Choose a separator that is currently not allowed in theme names.
Ex: “|”, which is used below.
2. For every theme since WordPress 3.0 (and maybe even earlier versions)
the core code already reports the following two strings:
- theme '''Slug''' (ex: nicetheme)
- theme '''Author''' (ex: John Doe). It may not be present; in that case
it is an empty string.
3. Compose the '''UID''': “slug|author”. Ex: “nicetheme|John Doe”
Since '''both fields''' are present in the themes (through style.css) and
are reported by WordPress (even by very old versions), there is no need
to add any new data to the themes (like manually added codes/hashes), nor
to change the code of the core or the API to handle such data.
'''The invention:''' The UIDs for the current themes and all of their
versions must be composed once and stored as a list in a table. For every
new theme version and every new theme upload, the UID is composed and
added to the same table if it is not already there.
As the UID contains the theme slug as a prefix, it is trivial to relate a
given UID unambiguously to its theme slug when needed, by extracting the
string that precedes the first occurrence of the separator. No other
relations need to be stored.
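
As an illustration only, the composition and the reverse lookup could be
sketched as below. The function names are hypothetical and "|" is the
example separator from the text, not a decided value.

```python
SEPARATOR = "|"  # example separator from the proposal; assumed not allowed in slugs


def compose_uid(slug, author=""):
    """Build the UID "slug|author"; a missing author is an empty string."""
    return f"{slug}{SEPARATOR}{author}"


def slug_from_uid(uid):
    """Return the part that precedes the first occurrence of the separator."""
    return uid.split(SEPARATOR, 1)[0]


uid = compose_uid("nicetheme", "John Doe")
print(uid)                 # nicetheme|John Doe
print(slug_from_uid(uid))  # nicetheme
```

Splitting only on the first separator keeps the lookup correct even if an
author string happens to contain the separator character.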
'''API: determination of available theme updates.'''
A small change (several lines of code) is needed so that themes are
identified not just by a slug but by this new UID, which is looked up in
the table of UIDs. Only if the UID is present does the algorithm continue:
it extracts the theme slug from the UID, checks as usual whether there is
a newer version and, if so, sends back an “update available” reply.
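
A minimal sketch of that check, under stated assumptions: KNOWN_UIDS and
LATEST_VERSIONS are hypothetical in-memory stand-ins for the UID table and
the existing slug-to-version data, and the version comparison is simplified
to dotted numeric versions. It does not describe the real API code.

```python
KNOWN_UIDS = {"nicetheme|John Doe", "nicetheme|Jane Roe"}  # hypothetical UID table
LATEST_VERSIONS = {"nicetheme": "2.1"}  # hypothetical slug -> latest version


def version_tuple(version):
    """Simplification: assumes purely numeric dotted versions like "2.1"."""
    return tuple(int(part) for part in version.split("."))


def check_update(slug, author, installed_version):
    """Return the newer version string, or None when no update applies."""
    uid = f"{slug}|{author}"
    if uid not in KNOWN_UIDS:
        return None  # unknown UID: external theme, ignore it
    theme_slug = uid.split("|", 1)[0]
    latest = LATEST_VERSIONS.get(theme_slug)
    if latest and version_tuple(latest) > version_tuple(installed_version):
        return latest
    return None


print(check_update("nicetheme", "John Doe", "2.0"))      # 2.1
print(check_update("nicetheme", "Someone Else", "1.0"))  # None
```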
'''Other: calculation of theme's active installs.'''
The active installations of a given theme are calculated as the sum of the
active installations of all UIDs related to that theme. This yields real
numbers, so the “Popular themes” list will be sorted by the real numbers
for the themes at wordpress.org, automatically excluding all counts that
belong to external themes (the currently wrong numbers will be corrected
to their true values).
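
The sum can be sketched as follows. The data is invented for illustration,
and the names installs_by_uid and KNOWN_UIDS are hypothetical.

```python
KNOWN_UIDS = {"nicetheme|John Doe", "nicetheme|Jane Roe"}  # hypothetical UID table

# Hypothetical per-UID install counts as reported by sites.
installs_by_uid = {
    "nicetheme|John Doe": 1200,
    "nicetheme|Jane Roe": 300,
    "nicetheme|Someone Else": 50000,  # external theme with a colliding slug
}


def active_installs(slug):
    """Sum the installs of every known UID that belongs to the given slug."""
    return sum(
        count
        for uid, count in installs_by_uid.items()
        if uid in KNOWN_UIDS and uid.split("|", 1)[0] == slug
    )


print(active_installs("nicetheme"))  # 1500
```

In this example the 50000 external installs no longer inflate the count of
the wordpress.org theme.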
'''Benefits.'''
- it is handled automatically;
- solves all the problems;
- fully backward compatible (old WP versions);
- solves the problem for the old existing themes as well;
- solves the "Active installs" count problem – active installs will count
automatically just the real active installs of the wordpress.org's theme
even for the old cases and exploits;
- theme authors don't have to do anything – no changes to style.css or
anywhere from their standpoint;
- external authors don't have to do anything to protect their themes from
being overwritten by unwanted updates – no need for a "private" tag;
- no need for changes in the core (unless for optimization);
- the check for updates at the backend (API) is almost the same, the
search is performed in a table of UIDs instead of theme slugs;
- since there is no change in the theme's structure and new fields, the
software updates related to the API and Active installations counting are
independent; can be done at different points in time;
- backward compatibility for the old versions of WordPress and old
versions of the themes w/o the need to change them which is the best part
of this proposal;
- handles well the cases where a theme is acquired by another author – the
theme will continue to catch updates;
- handles well the cases of themes distributed by an author prior to
uploading them to wordpress.org – all previous installations will
continue to catch updates from wordpress.org.
'''In simple words: implementing it the proposed way will put everything
in place as if it had been that way from the beginning of WordPress.'''
'''How it works.'''
- There are N themes "native" to wordpress.org (those that are currently
active). The UIDs are precomposed for all of their old and current
versions in the SVN, and a table with that list is created; '''only
unique values are stored, and they act like a database of fingerprints,
just as a human has 10 different fingerprints that all link to one and
the same person''';
- There are N*1.16 UIDs in total (because some themes have "evolved" and
changed their authors);
- This means that a theme is, in general, identified by more than one UID;
- Any site reporting one of these UIDs is unambiguously linked by the API
to a specific theme slug (the part that precedes the delimiter), and the
API sends back the new version as usual;
- Any external theme with the same name, however, comes with a different
UID, so the API stops at the point where this UID is found to be unknown
(not present in the table of UIDs). As a result it neither sends back
update info nor counts this as an active install.
'''Technical data.'''
Some tests were performed to support these decisions.
1. There are:
- 4876 total themes at wordpress.org;
- 56730 total different versions;
- 11.6 average versions per theme;
- 1.16 different UIDs per theme on average (a single theme has more than
one related UID if its author has changed over time);
- 5600 (approximately) generated UIDs for the current themes (the new list
to search in, instead of 4876), so the CPU time needed to process search
requests is practically unchanged.
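
The derived figures can be rechecked from the raw counts with a few lines.
This is only an arithmetic check of the numbers quoted above.

```python
themes = 4876
versions = 56730
uids_per_theme = 1.16  # average ratio quoted above

avg_versions = versions / themes
estimated_uids = themes * uids_per_theme

print(f"{avg_versions:.1f} versions per theme")  # 11.6 versions per theme
print(f"about {estimated_uids:.0f} UIDs")        # about 5656 UIDs
```

The product comes to about 5656, in line with the rounded figure of
approximately 5600.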
'''Software changes.'''
This is a guess at where in the system software changes are needed.
'''The API:'''
1. compose the UID based on slug, author
2. check in the table of native UIDs
3. if the UID is present, slug = the part that precedes the delimiter and
continue as usual
4. else, ignore that theme and continue (the same way it is ignored now
if the slug is not present in the wordpress.org database of slugs)
'''The "one time job":'''
1. for each active theme and each of its versions in the SVN
2. read its style.css and compose the UID based on slug, author
3. store the UID in the table of UIDs (only if it is not already there)
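
A rough sketch of such a job, with loud assumptions: the SVN is replaced
by a hypothetical in-memory mapping from slug to the style.css text of each
version, and the header parsing is a simplified regular expression, not the
parser WordPress itself uses.

```python
import re

# Matches an "Author:" header line in style.css (but not "Author URI:").
AUTHOR_RE = re.compile(r"^[ \t/*#@]*Author:(.*)$", re.IGNORECASE | re.MULTILINE)


def read_author(style_css):
    """Return the Author header value, or an empty string if it is absent."""
    match = AUTHOR_RE.search(style_css)
    return match.group(1).strip() if match else ""


def build_uid_table(theme_versions):
    """theme_versions: hypothetical dict of slug -> list of style.css texts."""
    uids = set()  # a set stores each UID only once
    for slug, stylesheets in theme_versions.items():
        for style_css in stylesheets:
            uids.add(f"{slug}|{read_author(style_css)}")
    return uids


svn = {
    "nicetheme": [
        "/*\nTheme Name: Nice\nAuthor: John Doe\nAuthor URI: http://example.com\n*/",
        "/*\nTheme Name: Nice\nAuthor: Jane Roe\n*/",
        "/*\nTheme Name: Nice\nAuthor: Jane Roe\n*/",
    ],
}
print(sorted(build_uid_table(svn)))
# ['nicetheme|Jane Roe', 'nicetheme|John Doe']
```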
'''On new theme/update approval:'''
1. compose the UID based on slug, author
2. store the UID in the table of UIDs (only if it is not already there)
'''The active themes counter/collector:'''
1. compose the UID based on slug, author
2. check if it is present in the table of UIDs
3. only if it is present, increase the counter for the slug, which is the
part that precedes the delimiter
4. also count, in a second table, the active installs of UIDs that are
not in the table (as it probably does now for non-existing slugs). This
makes it possible to report how many active installs a newly uploaded
theme already has, so the reviewer can investigate whether it comes from
a legitimate author who must be linked to these copies, or whether
someone uploaded someone else's theme
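
The two counters could look roughly like this. It is a sketch only: the
function name collect and the shape of the input (a list of slug/author
pairs, one per reporting site) are assumptions for illustration.

```python
from collections import Counter


def collect(reports, known_uids):
    """Count installs per slug for known UIDs, and per UID for unknown ones."""
    per_slug = Counter()  # first table: real installs of wordpress.org themes
    unknown = Counter()   # second table: installs of UIDs not in the UID table
    for slug, author in reports:
        uid = f"{slug}|{author}"
        if uid in known_uids:
            per_slug[uid.split("|", 1)[0]] += 1
        else:
            unknown[uid] += 1
    return per_slug, unknown


known = {"nicetheme|John Doe"}
reports = [
    ("nicetheme", "John Doe"),
    ("nicetheme", "John Doe"),
    ("nicetheme", "Someone Else"),
]
per_slug, unknown = collect(reports, known)
print(per_slug["nicetheme"])              # 2
print(unknown["nicetheme|Someone Else"])  # 1
```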
'''The code that reports "currently has ... active installations"'''
1. it must report not just the ">500" cases but the exact number of
installations for the exact UID match (that is, the exact combination of
slug and author); this number is available in the second table described
in step 4 of the previous section
2. to prevent abuse through theme updates: if there is an author change
(such cases are very rare) and the number of active installations of the
newly composed UID is not 0 (or close to 0, bearing in mind that there may
be test installations of that version), the update should not be
auto-approved by themetrackbot; a reviewer must manually check the author
change in style.css to avoid hijacking of an external theme's UID
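
The second rule could be expressed as a small guard. This is only a sketch:
the function name, the tolerance value of 5 (standing in for "close to 0")
and the unknown_installs mapping are all hypothetical.

```python
def needs_manual_review(new_uid, known_uids, unknown_installs, tolerance=5):
    """True when an update introduces a UID that already has installs."""
    if new_uid in known_uids:
        return False  # no author change: the UID is already in the table
    # New UID: check how many sites already report it (second table).
    return unknown_installs.get(new_uid, 0) > tolerance


known = {"nicetheme|John Doe"}
unknown_installs = {"nicetheme|Popular External Author": 40000}

print(needs_manual_review("nicetheme|John Doe", known, unknown_installs))
# False
print(needs_manual_review("nicetheme|Popular External Author", known, unknown_installs))
# True
```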
''07/22/2017
by dingdang''
--
Ticket URL: <https://meta.trac.wordpress.org/ticket/2114#comment:19>
Making WordPress.org <https://meta.trac.wordpress.org/>