[wp-trac] [WordPress Trac] #23256: Mitigate Plugins SVN Exponential Storage Growth
WordPress Trac
noreply at wordpress.org
Mon Jan 21 23:41:56 UTC 2013
#23256: Mitigate Plugins SVN Exponential Storage Growth
-----------------------------+--------------------------------
 Reporter:  bpetty           |       Type:  defect (bug)
   Status:  new              |   Priority:  normal
Milestone:  Awaiting Review  |  Component:  WordPress.org site
  Version:                   |   Severity:  major
 Keywords:                   |
-----------------------------+--------------------------------
We all know Subversion is terribly inefficient with storage compared to
most modern version control systems, but I've noticed a pretty serious
problem in the plugins SVN repo that goes way beyond this. Please forgive
me if some details are slightly off; anything running WP.org
infrastructure is mostly a black box that no one outside of Automattic or
Audrey has any insight into (and I'm lucky to have found this problem in
the first place).
If you take a look at the attached graph, you can see that the plugins
repository's disk usage is growing exponentially regardless of the rate at
which commits are coming in (though that rate has always been growing too,
making this worse). I am assuming the repository is using FSFS, though
this would still be a problem if it were using BDB. All SVN repositories
suffer from this weakness if used the same way the plugins SVN repo is
being used.
The problem is that every SVN commit stores the node IDs of every sibling
node of every parent node of all nodes that have changed in that revision.
This means that a single commit to `/myplugin/trunk/readme.txt` contains
references to all files and directories (and their related revision) in
the `/myplugin/trunk` directory, the references to the `branches`, `tags`,
and `trunk` nodes in `/myplugin`, and finally references to every
directory in the root node (`/`) which means every single plugin in the
repository.
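
As a rough illustration of the mechanism described above, the following
Python sketch models a commit as rewriting the directory listing of every
ancestor of the changed file, up to and including the root. It is not how
FSFS actually lays out revisions on disk. The function names, the 64-byte
cost per directory entry, and the rate at which plugins are added are all
hypothetical assumptions chosen only to show the shape of the growth.

```python
def commit_overhead_bytes(entries_per_ancestor, bytes_per_entry=64):
    """Bytes of directory listings rewritten by a single commit.

    entries_per_ancestor: number of entries in each ancestor directory of
    the changed file, e.g. [trunk entries, plugin-dir entries, root entries].
    bytes_per_entry: ASSUMED average cost of one entry (hypothetical value).
    """
    return sum(entries_per_ancestor) * bytes_per_entry


def cumulative_root_overhead(num_commits, initial_plugins,
                             commits_per_new_plugin, bytes_per_entry=64):
    """Total bytes spent re-listing the root directory over many commits.

    ASSUMPTION: one new plugin directory appears in the root every
    `commits_per_new_plugin` commits, and every commit re-lists the root.
    """
    total = 0
    plugins = initial_plugins
    for i in range(num_commits):
        if i and i % commits_per_new_plugin == 0:
            plugins += 1  # the root listing grows by one entry
        total += plugins * bytes_per_entry
    return total


if __name__ == "__main__":
    # One commit to /myplugin/trunk/readme.txt: 5 files in trunk, 3 entries
    # in /myplugin (branches, tags, trunk), 20000 plugins in the root.
    # All three counts are made-up example values.
    print(commit_overhead_bytes([5, 3, 20000]))

    # Doubling the number of commits more than doubles the root overhead,
    # because the root listing keeps getting longer.
    print(cumulative_root_overhead(100_000, 10_000, 10))
    print(cumulative_root_overhead(200_000, 10_000, 10))
```

In this model the root listing dominates the cost of every commit no
matter how small the actual change is, and the total overhead grows faster
than the number of commits because each later commit re-lists a longer
root directory.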
Since the root node is an ancestor of every changed node in every single
commit, and the list of plugins is constantly growing, each commit costs
more to store than the one before it. As a result, even though the
repository is somewhere around 450GB right now, the actual data in the
repo, including the full history, is only about 30GB. You can confirm this
with a simple dump of the repository. The other 420GB or so is space
wasted entirely on SVN overhead.
If nothing is done in the next two years, the SVN repository is expected
to double in size to about 900GB, and its performance will quickly
degrade as the server takes longer to read revisions and the filesystem
cache can no longer be used (which I suspect is already the case now).
Another four years, and we could be looking at a 2TB Subversion repository
with every single commit being required to write about 8MB to disk even if
it's a one line change.
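
The arithmetic behind these figures can be sketched in a few lines of
Python. This assumes the two-year doubling period stays constant (the
ticket only states the first doubling) and reads "another four years" as
four years from today; both are assumptions, and the function names are
hypothetical.

```python
def projected_size_gb(current_gb, years_ahead, doubling_period_years=2.0):
    """Repository size after `years_ahead` years of steady doubling.

    ASSUMPTION: growth keeps doubling every `doubling_period_years` years.
    """
    return current_gb * 2 ** (years_ahead / doubling_period_years)


def overhead_fraction(total_gb, payload_gb):
    """Share of the repository that is not actual versioned data."""
    return (total_gb - payload_gb) / total_gb


if __name__ == "__main__":
    print(projected_size_gb(450, 2))   # two years out
    print(projected_size_gb(450, 4))   # four years out, roughly 2TB
    print(round(overhead_fraction(450, 30), 3))
```

With the 450GB and 30GB figures quoted above, about 93% of the repository
is overhead under this reading.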
I know that any solution to this is going to take years to fully
implement, mostly because I believe it will most likely require plugin SVN
URLs to change during a migration at some point. However, at the very
least, we should be heading off this problem by getting new plugin
submissions started in their own repository rather than creating new
directories for them in the current plugins SVN repo. This would at least
stop the exponential growth of the plugins repository, extending its
lifespan significantly.
--
Ticket URL: <http://core.trac.wordpress.org/ticket/23256>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software