[wp-meta] [Making WordPress.org] #2114: Possible abuse on popular themes list based on active installs

Making WordPress.org noreply at wordpress.org
Tue Jul 25 11:16:09 UTC 2017


#2114: Possible abuse on popular themes list based on active installs
-----------------------------+------------------
 Reporter:  acosmin          |       Owner:
     Type:  defect           |      Status:  new
 Priority:  low              |   Milestone:
Component:  Theme Directory  |  Resolution:
 Keywords:  close            |
-----------------------------+------------------

Comment (by dingdang):

 Hello,

 Since this is a separate ticket about the same problems, which can be
 easily solved, I am copying my proposal from the core ticket here as well.

 '''Regarding the topic of this ticket: once the proposed solution is
 implemented, every "active installs" count will automatically include
 only themes (all of their versions) that are published at wordpress.org,
 so it will also correct past cases in the directory and in the "Popular
 themes" tab. (Example: a theme that has hijacked another theme's active
 installs will lose them and will display only the active installs of that
 specific theme and its versions at wordpress.org.)'''

 https://core.trac.wordpress.org/ticket/14179#comment:44

 Here is version 2. It no longer needs MD5 and adds an explanation in the
 "How it works" section that should be easy for anyone to follow.


 '''Proposal for a solution to the “collisions” of WordPress themes.'''
 '''Simplified Version 2.'''

         '''Table of contents:'''
         Changes compared to Version 1.
         Introduction.
         Formal composition of a unique ID.
         API: determination of available theme updates.
         Other: calculation of theme's active installs.
         Benefits.
         How it works.
         Technical data.
         Software changes.


 '''Changes compared to Version 1.'''

 - eliminated the need for the '''Author URI''' field
 - eliminated the need to calculate MD5 hashes
 - new section “How it works”
 - new section “Software changes”

 An analysis of the current set of themes “native” to wordpress.org and
 all of their versions (4876 themes, 56730 versions) showed that only two
 fields are needed in the process: the '''theme slug''' and the
 '''author'''. The '''author URI''' is redundant.
 As a result the composition of the UIDs is simpler and MD5 hashes no
 longer have to be calculated, which further reduces the changes needed in
 the system.


 '''Introduction.'''

 A '''collision''' is a slug match between two themes that are not related
 to each other but have the same name.

 '''Two main problems''' are related to these cases of collisions:

 1. If a theme in the wordpress.org theme database and an external theme
 by another author share a slug, the external theme is offered an “Update”
 and may be replaced by the theme published at wordpress.org. This can
 also happen to widely distributed themes: once someone uploads a new
 theme with the same name to wordpress.org, an unwanted update can
 unexpectedly replace the theme of web sites published long before that.
 2. The calculation of active installations takes into account not just
 the themes from the wordpress.org database, but also external themes and
 even random child themes that reside in a folder with a matching name.
 Authors exploit this to artificially place their new themes at the top of
 the list by taking the names of popular external themes that have been
 distributed for a long time.

 '''The proposed technique solves both problems with very little coding,
 keeps backward compatibility, and fixes the same problems for the old
 themes as well, not just the newly released ones.'''


 '''Formal composition of a unique ID.'''

 1. Choose a separator that is currently not allowed in theme names.
 Example: “|”, which is used below.
 2. For every theme since WordPress 3.0 (and maybe even earlier versions)
 the core code already reports the following two strings:
 - theme '''Slug''' (ex: nicetheme)
 - theme '''Author''' (ex: John Doe). It may be absent, in which case it
 is an empty string.
 3. Compose the '''UID''': “slug|author”. Ex: “nicetheme|John Doe”

 Since both '''fields''' are present in the themes (through style.css) and
 are reported by WordPress (even by the very old versions), there is no
 need to add any new data to the themes (such as manually added
 codes/hashes), nor to change the code of the core or the API to handle
 such data.
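
 The composition rule above can be sketched in a few lines. This is only
 an illustration: the function name compose_uid and the SEPARATOR constant
 are hypothetical, and “|” is simply the example separator used in this
 proposal.

```python
# Hypothetical helper, not existing wordpress.org code.
SEPARATOR = "|"  # assumed separator; must be a character not allowed in slugs


def compose_uid(slug: str, author: str = "") -> str:
    """Return the UID "slug|author"; a missing author is an empty string."""
    if SEPARATOR in slug:
        raise ValueError("the slug must not contain the separator")
    return f"{slug}{SEPARATOR}{author}"


uid = compose_uid("nicetheme", "John Doe")
print(uid)
```

 A theme without an Author header would get the UID “nicetheme|”.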

 '''The invention:''' The UIDs for the current themes and all of their
 versions must be composed once and the resulting list stored in a table.
 For every new theme version and every new theme upload, the UID is
 composed and added to the same table if it is not already there.

 As the UID contains the theme slug as a prefix, it is trivial to relate a
 given UID unambiguously to the theme slug if needed by extracting the
 string that precedes the first occurrence of the separator. No other
 relations need to be stored.
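
 Recovering the slug from a UID could look like the sketch below. The
 function name slug_from_uid is hypothetical. Splitting on the first
 separator only means that a separator inside the author string would not
 affect the result.

```python
def slug_from_uid(uid: str, separator: str = "|") -> str:
    """Return the part of the UID that precedes the first separator."""
    slug, found, _author = uid.partition(separator)
    if not found:
        raise ValueError("not a UID: the separator is missing")
    return slug


print(slug_from_uid("nicetheme|John Doe"))
```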


 '''API: determination of available theme updates.'''

 A small update (several lines of code) is needed so that themes are
 identified not just by a slug but by the new UID, which is looked up in
 the table of UIDs. Only if the UID is present does the algorithm
 continue: it extracts the theme slug from the UID, checks as usual
 whether there is a newer version, and if so sends back an “update
 available” reply.
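
 A minimal sketch of that check, under stated assumptions: known_uids
 stands for the table of UIDs, latest_versions for the existing slug to
 version lookup, and both names (like check_update itself) are
 hypothetical. The version comparison is simplified to purely numeric
 dotted versions; the real API may compare versions differently.

```python
from typing import Dict, Optional, Set, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    # Simplification: assumes versions like "1.4" or "2.0.1" (digits and dots).
    return tuple(int(part) for part in version.split("."))


def check_update(slug: str, author: str, installed: str,
                 known_uids: Set[str],
                 latest_versions: Dict[str, str]) -> Optional[str]:
    """Return the newer version to offer, or None if no update applies."""
    uid = f"{slug}|{author}"
    if uid not in known_uids:
        return None  # unknown UID: treated as an external theme, no update
    theme_slug = uid.partition("|")[0]
    latest = latest_versions.get(theme_slug)
    if latest is not None and version_tuple(latest) > version_tuple(installed):
        return latest
    return None


known = {"nicetheme|John Doe"}
latest = {"nicetheme": "2.0"}
print(check_update("nicetheme", "John Doe", "1.4", known, latest))
print(check_update("nicetheme", "Someone Else", "1.4", known, latest))
```

 The second call concerns an external theme with the same slug; its UID is
 not in the table, so no update is offered.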


 '''Other: calculation of theme's active installs.'''

 The active installations of a given theme are calculated as the sum of
 the active installations of all the UIDs related to that theme. This
 yields accurate numbers, and the “Popular themes” list will be sorted by
 the accurate numbers for the themes at wordpress.org, automatically
 excluding all the counts related to external themes (the currently wrong
 numbers will be corrected to their true values).
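
 As an illustration, the sum could be computed as follows. The input
 installs_by_uid (reported installs per UID) and the function name are
 assumptions made for this sketch, and the numbers are invented.

```python
from collections import Counter
from typing import Dict, Set


def installs_per_theme(installs_by_uid: Dict[str, int],
                       known_uids: Set[str]) -> Dict[str, int]:
    """Sum active installs per slug, counting only UIDs in the table."""
    totals: Counter = Counter()
    for uid, count in installs_by_uid.items():
        if uid in known_uids:
            totals[uid.partition("|")[0]] += count
    return dict(totals)


reported = {
    "nicetheme|John Doe": 300,        # original author
    "nicetheme|Jane Roe": 50,         # same theme after an author change
    "nicetheme|External Shop": 9000,  # unrelated external theme
}
known = {"nicetheme|John Doe", "nicetheme|Jane Roe"}
print(installs_per_theme(reported, known))
```

 The external theme's 9000 installs are left out, so the theme at
 wordpress.org is credited with 350 installs.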


 '''Benefits.'''

 - it is handled automatically;
 - solves all the problems;
 - fully backward compatible (old WP versions);
 - solves the problem for the old existing themes as well;
 - solves the "Active installs" count problem – active installs will count
 automatically just the real active installs of the wordpress.org's theme
 even for the old cases and exploits;
 - theme authors don't have to do anything – no changes to style.css or
 anywhere from their standpoint;
 - external authors don't have to do anything to prevent their themes from
 being overwritten by unwanted updates, so no "private" tag is needed;
 - no need for changes in the core (except for optimization);
 - the check for updates at the backend (API) is almost the same, the
 search is performed in a table of UIDs instead of theme slugs;
 - since there is no change in the theme's structure and new fields, the
 software updates related to the API and Active installations counting are
 independent; can be done at different points in time;
 - backward compatibility with old versions of WordPress and old versions
 of the themes, without any need to change them, which is the best part of
 this proposal;
 - handles well the cases where a theme is acquired by another author: the
 theme will continue to receive updates;
 - handles well the cases of themes distributed by an author prior to
 uploading them to wordpress.org: all previous installations will continue
 to receive updates from wordpress.org.

 '''In simple words: implementing it the proposed way puts everything in
 place as if it had worked this way since the beginning of WordPress.'''


 '''How it works.'''

 - There are N themes "native" to wordpress.org (those that are currently
 active). The UIDs for all of their old and current versions in the SVN
 are precomposed and stored in a table; '''only unique values are stored,
 and they act like a database of fingerprints, in the way a human has 10
 different fingerprints that all link to one and the same person''';
 - There are about N*1.16 UIDs in total (because some themes have
 "evolved" and changed authors);
 - This means that a theme is, in general, identified by more than one
 UID;
 - Any site reporting one of these UIDs is unambiguously linked by the API
 to a specific theme slug (the part that precedes the delimiter), and the
 API sends back the new version as usual;
 - Any external theme with the same name comes with a different UID, so
 the API stops at the point where this UID is unknown (not present in the
 table of UIDs). As a result it neither sends back update info nor counts
 the site as an active install.


 '''Technical data.'''

 Some measurements were made to support the decisions.

 1. There are:
 - 4876 total themes at wordpress.org;
 - 56730 total different versions;
 - 11.6 average versions per theme;
 - 1.16 the average ratio of different UIDs per theme (a single theme has
 more than one related UID if its author has changed over time);
 - 5600 (approximately) generated UIDs for the current themes (the new
 list to search in, instead of 4876 slugs), so the CPU time needed to
 process search requests is practically unchanged.
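
 The figures above can be checked with a few lines of arithmetic. The
 binary-search comparison is only an assumption about how the table might
 be searched; with a hash-based lookup the table size would matter even
 less.

```python
import math

themes = 4876
versions = 56730
uids_per_theme = 1.16

avg_versions = versions / themes          # about 11.6 versions per theme
estimated_uids = themes * uids_per_theme  # about 5656 UIDs in total

# Worst-case comparisons for a binary search over a sorted list.
steps_slugs = math.ceil(math.log2(themes))
steps_uids = math.ceil(math.log2(estimated_uids))

print(round(avg_versions, 1), round(estimated_uids), steps_slugs, steps_uids)
```

 Both table sizes need the same number of binary-search steps, which is
 consistent with the claim that lookups do not become noticeably slower.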


 '''Software changes.'''

 This is a guess at where in the system software changes are needed.

 '''The API:'''

 1. compose the UID based on slug, author
 2. check in the table of native UIDs
 3. if the UID is present, slug = the part that precedes the delimiter and
 continue as usual
 4. else, ignore that theme and continue (the same way it is ignored if the
 slug is not present in wordpress' database of slugs now)

 '''The "one time job":'''

 1. for each active theme and all of its versions in the SVN
 2. read its style.css and compose the UID based on slug, author
 3. store the UID in the table of UIDs (only if it's not already there)
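
 A sketch of the one-time job, under stated assumptions: the theme
 versions are passed in as (slug, style.css text) pairs, because the way
 the SVN is traversed is not described here, and the Author header is read
 with a simplified regular expression that may differ from WordPress's own
 header parsing. All function names are hypothetical.

```python
import re
from typing import Iterable, Set, Tuple

# Simplified header match; note that "Author URI:" does not match "Author:".
AUTHOR_RE = re.compile(r"^[ \t/*#@]*Author:(.*)$", re.MULTILINE | re.IGNORECASE)


def author_from_style_css(css_text: str) -> str:
    """Return the Author header value, or an empty string if it is absent."""
    match = AUTHOR_RE.search(css_text)
    return match.group(1).strip() if match else ""


def build_uid_table(theme_versions: Iterable[Tuple[str, str]]) -> Set[str]:
    """Compose a UID for every theme version; the set keeps unique values."""
    uids: Set[str] = set()
    for slug, css_text in theme_versions:
        uids.add(f"{slug}|{author_from_style_css(css_text)}")
    return uids


css_v1 = "/*\nTheme Name: Nice Theme\nAuthor: John Doe\nVersion: 1.0\n*/"
css_v2 = "/*\nTheme Name: Nice Theme\nAuthor: John Doe\nVersion: 1.1\n*/"
print(build_uid_table([("nicetheme", css_v1), ("nicetheme", css_v2)]))
```

 Both versions yield the same UID, so the table holds a single entry for
 this theme.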

 '''On new theme/update approval:'''

 1. compose the UID based on slug, author
 2. store the UID in the table of UIDs (only if it's not already there)

 '''The active themes counter/collector:'''

 1. compose the UID based on slug, author
 2. check if it is present in the table of UIDs
 3. only if it is present, increase the counter for the slug, which is the
 part that precedes the delimiter
 4. count the active installs for non-existing UIDs in a second table as
 well (as it probably does now for non-existing slugs), so that a reviewer
 can see how many active installs a newly uploaded theme already has and
 investigate whether it comes from the legitimate author, who must be
 linked to these copies, or whether someone uploaded someone else's theme
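
 The four steps above can be sketched as follows. The shape of the input,
 one (slug, author) pair per reporting site, is an assumption, as are the
 function and variable names.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def collect_installs(reports: Iterable[Tuple[str, str]],
                     known_uids: Set[str]) -> Tuple[Counter, Counter]:
    """Return (installs per native slug, installs per unknown UID)."""
    native: Counter = Counter()   # step 3: counted per slug
    unknown: Counter = Counter()  # step 4: the second table, per UID
    for slug, author in reports:
        uid = f"{slug}|{author}"  # step 1
        if uid in known_uids:     # step 2
            native[uid.partition("|")[0]] += 1
        else:
            unknown[uid] += 1
    return native, unknown


reports = [("nicetheme", "John Doe"), ("nicetheme", "John Doe"),
           ("nicetheme", "External Shop")]
native, unknown = collect_installs(reports, {"nicetheme|John Doe"})
print(dict(native), dict(unknown))
```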

 '''The code that reports "currently has ... active installations"'''

 1. it must report not just the >500 cases but the exact number of
 installations for the exact UID match (that is, for the exact combination
 of slug and author); this number is available in the second table from
 step 4 of the previous section
 2. to prevent abuse of theme updates: if there is an author change (those
 cases are very rare) and the number of active installations of the newly
 composed UID is not 0 (or close to 0, bearing in mind that there may be
 test installations of that version), the update shouldn't be
 auto-approved by themetrackbot; instead a reviewer must manually check
 the author change in style.css to avoid hijacking of an external theme's
 UID
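
 The review rule in step 2 could be expressed roughly like this. The
 threshold of 5 for "close to 0" is an arbitrary assumption chosen for
 illustration, and unknown_installs stands for the second table kept by
 the collector; neither comes from existing code.

```python
from typing import Dict, Set


def needs_manual_review(slug: str, new_author: str,
                        known_uids: Set[str],
                        unknown_installs: Dict[str, int],
                        threshold: int = 5) -> bool:
    """True if an update introducing this UID should not be auto-approved."""
    uid = f"{slug}|{new_author}"
    if uid in known_uids:
        return False  # no author change: the UID is already in the table
    # A new UID that already has many installs may belong to an external theme.
    return unknown_installs.get(uid, 0) > threshold


known = {"nicetheme|John Doe"}
unknown_installs = {"nicetheme|Big Shop": 9000, "nicetheme|Jane Roe": 2}
print(needs_manual_review("nicetheme", "Big Shop", known, unknown_installs))
print(needs_manual_review("nicetheme", "Jane Roe", known, unknown_installs))
```

 Changing the author to "Big Shop" would claim 9000 existing installs and
 is flagged for a reviewer, while "Jane Roe" with two test installs is
 not.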

 ''07/22/2017
 by dingdang''

--
Ticket URL: <https://meta.trac.wordpress.org/ticket/2114#comment:19>
Making WordPress.org <https://meta.trac.wordpress.org/>