[wp-trac] [WordPress Trac] #23256: Mitigate Plugins SVN Exponential Storage Growth

WordPress Trac noreply at wordpress.org
Mon Jan 21 23:41:56 UTC 2013


#23256: Mitigate Plugins SVN Exponential Storage Growth
-----------------------------+--------------------------------
 Reporter:  bpetty           |       Type:  defect (bug)
   Status:  new              |   Priority:  normal
Milestone:  Awaiting Review  |  Component:  WordPress.org site
  Version:                   |   Severity:  major
 Keywords:                   |
-----------------------------+--------------------------------
 Subversion is known to be far less efficient with storage than most
 modern version control systems; however, I've noticed a serious problem
 in the plugins SVN repo that goes well beyond this. Please forgive me if
 some details are slightly off, as anything running the WP.org
 infrastructure is mostly a black box that no one outside of Automattic or
 Audrey has any insight into (and I'm lucky to have found this problem in
 the first place).

 If you take a look at the attached graph, you can see that the disk usage
 of the plugins repository is growing exponentially regardless of the rate
 at which commits come in (and that rate has always been growing too, which
 makes this worse). I am assuming the repository is using FSFS, though this
 would still be a problem if it were using BDB. Any SVN repository suffers
 from this weakness if it is used the same way the plugins SVN repo is
 being used.

 The problem is that every SVN commit stores the node IDs of every entry
 in every ancestor directory of each node that changed in that revision.
 This means that a single commit to `/myplugin/trunk/readme.txt` contains
 references to all files and directories (and their related revision) in
 the `/myplugin/trunk` directory, the references to the `branches`, `tags`,
 and `trunk` nodes in `/myplugin`, and finally references to every
 directory in the root node (`/`) which means every single plugin in the
 repository.
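
 The walk up the tree described above can be sketched as a toy model. This
 is only an illustration of the ticket's description, not Subversion's
 actual storage code; the function names, the `children` mapping, and the
 three-plugin layout are all hypothetical.

```python
from pathlib import PurePosixPath


def ancestor_dirs(changed_path):
    """Return every directory from the changed file's parent up to the root."""
    return [str(p) for p in PurePosixPath(changed_path).parents]


def entries_rewritten(children, changed_path):
    """Count the directory entries re-recorded by a commit to changed_path.

    `children` maps a directory path to the list of names it contains.
    """
    return sum(len(children[d]) for d in ancestor_dirs(changed_path))


# Hypothetical layout: three plugins in the root, one of them /myplugin.
children = {
    "/": ["myplugin", "other-a", "other-b"],
    "/myplugin": ["branches", "tags", "trunk"],
    "/myplugin/trunk": ["readme.txt", "myplugin.php"],
}

dirs = ancestor_dirs("/myplugin/trunk/readme.txt")
count = entries_rewritten(children, "/myplugin/trunk/readme.txt")
```

 In this model only the `/` term depends on how many plugins exist, which
 is why the root listing dominates as the repository gains plugins.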

 Because the root node is an ancestor of every changed node, every commit
 records the root's full listing again, and that listing grows with each
 new plugin. As a result, although the repository is somewhere around
 450GB right now, the actual data in the repo, including the full history,
 is only about 30GB. You can confirm this with a simple dump of the
 repository. The other 420GB or so is space wasted entirely on SVN
 overhead.

 If nothing is done in the next two years, the SVN repository is expected
 to double in size to about 900GB, and its performance will quickly
 degrade as the server takes longer to read revisions and the filesystem
 cache can no longer be used (which I suspect is already the case now).
 Another four years, and we could be looking at a 2TB Subversion repository
 with every single commit being required to write about 8MB to disk even if
 it's a one-line change.
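
 The projection above can be reproduced with a simple doubling model. This
 is a sketch that assumes a constant two-year doubling time, which is read
 off the 450GB to 900GB estimate and is not a measured figure; the function
 name and parameters are hypothetical.

```python
def projected_size_gb(current_gb, years, doubling_years=2.0):
    """Project repository size assuming it doubles every `doubling_years`."""
    return current_gb * 2 ** (years / doubling_years)


# Starting point taken from the ticket: roughly 450GB today.
after_two_years = projected_size_gb(450, 2)
after_four_years = projected_size_gb(450, 4)
```

 Under that assumption the model gives 900GB after two years and 1800GB
 after four, which is in the neighborhood of the 2TB figure if the four
 years are counted from today.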

 I know that any solution to this is going to take years to fully
 implement, mostly because I believe it will most likely require plugin
 SVN URLs to change during a migration at some point. However, at the
 very least, we should be heading off this problem by starting new plugin
 submissions in their own repositories rather than creating new
 directories for them in the current plugins SVN repo. This would at least
 stop the exponential growth of the plugins repository, extending its
 lifespan significantly.
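
 To illustrate why separate repositories would help, here is a rough
 comparison of the root-listing overhead in a shared repository versus a
 standalone one. The commit counts, the plugin counts, and the 50-byte
 entry size are made-up numbers chosen only to show the shape of the
 difference; they are not measurements of the plugins repo.

```python
def root_listing_overhead(root_entry_counts, bytes_per_entry):
    """Sum the bytes spent re-recording the root listing across commits.

    Each item in root_entry_counts is the number of entries in the root
    directory at the time of one commit.
    """
    return sum(count * bytes_per_entry for count in root_entry_counts)


BYTES_PER_ENTRY = 50  # assumed average size of one directory entry

# Shared repo: 1,000 commits while the root grows from 1 to 1,000 plugins.
shared = root_listing_overhead(range(1, 1001), BYTES_PER_ENTRY)

# Standalone repo: 1,000 commits; the root only ever holds
# branches, tags, and trunk.
standalone = root_listing_overhead([3] * 1000, BYTES_PER_ENTRY)
```

 In the standalone case the per-commit cost stays constant, while in the
 shared case it keeps rising with the number of plugins.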

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/23256>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

