[wp-trac] [WordPress Trac] #20057: Media upload for multi-webserver setups introduces a nasty race condition that could corrupt uploaded files
WordPress Trac
wp-trac at lists.automattic.com
Sun Feb 19 02:59:28 UTC 2012
#20057: Media upload for multi-webserver setups introduces a nasty race condition
that could corrupt uploaded files
--------------------------+------------------------------
Reporter: archon810 | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Media | Version: 3.3.1
Severity: major | Resolution:
Keywords: |
--------------------------+------------------------------
Comment (by archon810):
@dd32, I think you're misunderstanding the issue. The problem is not that
a collision occurs, but that collision detection picks the new file name
based on what is on the current server's disk, and that in turn depends on
the state of the rsync between the 2 servers. If the current server is
missing a file (or 10), it may select a name that was already assigned on
server A, poisoning the integrity of the uploads.
A file with the same name will then exist on both servers but will
represent different images.
In fact, upon the next rsync, one of them will win and eat the other,
depending on the direction of the sync. This is why keeping track of file
names in the database and using the centralized database for name conflict
resolution is crucial. There's only 1 database and many servers.
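
To make the failure mode concrete, here is a minimal sketch in Python. It
is not WordPress's actual code: pick_name_from_disk is a hypothetical,
simplified stand-in for disk-based collision detection, and the two sets
stand in for each server's uploads directory between rsync runs.

```python
def pick_name_from_disk(existing, stem, ext):
    """Pick the first free name by looking only at one server's local files.

    Hypothetical stand-in for disk-based collision detection: append an
    increasing number to the stem until the name is not present locally.
    """
    candidate = f"{stem}{ext}"
    n = 0
    while candidate in existing:
        n += 1
        candidate = f"{stem}{n}{ext}"
    return candidate


# Server A already stored image.png and image1.png.
# rsync has not yet copied image1.png over to server B.
server_a = {"image.png", "image1.png"}
server_b = {"image.png"}

# A new upload named image.png arrives on each server.
name_on_a = pick_name_from_disk(server_a, "image", ".png")
name_on_b = pick_name_from_disk(server_b, "image", ".png")

# Server B reuses a name that server A has already handed out.
collides = name_on_b in server_a
print(name_on_a, name_on_b, collides)
```

Server B cannot see image1.png yet, so it assigns that name to a different
image, and the next rsync overwrites one copy with the other.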
I think this request is very reasonable and has the merits of a true HA
setup. A single separate server becomes both your single point of failure
and a scaling bottleneck (in case it alone can't handle the traffic).
Having a network filestore introduces high complexity and goes against the
KISS approach I'm trying to follow here: good old-fashioned, proven rsync,
versus a network file system or a replicating file system with a lot of
idiosyncrasies and pitfalls that could turn debugging into a nightmare. I
don't have the necessary means to set up a SAN or NAS in the colo, and
file systems like gluster are prone to getting out of sync the same way
rsync would, resulting in the same issue.
I also don't want to rely on plugins, like W3TC, to upload all files to a
push CDN, since that makes me 100% reliant on that plugin and potentially
destroys the site infrastructure if I ever need to switch it off.
So with that in mind, I first need to make sure you guys understand what
the core issue is, and then urge you to consider a solution in the core.
As for the solution, deriving the on-disk name of an uploaded file from
the same algorithm that generates the post_name column, rather than from
the files on disk, would be ideal. For example, a recent upload I am
looking at has post_name "image-png-7096" in the database, while its
on-disk name is uploads/2012/02/image75.png. If it were named
image-7096.png or image-png-7096.png, it'd solve all our problems (so long
as there's no race condition between querying the database and adding the
new entry to it when 2 files are uploaded at the same time, but I think
that's already a solved problem).
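
A rough sketch of what database-driven naming could look like, again in
Python and under loud assumptions: the uploads table, its columns and
reserve_upload_name are all hypothetical, an in-memory SQLite database
stands in for the single shared database, and an auto-increment ID
replaces WordPress's real slug algorithm. The point is only that the
unique part of the name comes from the database, not from any server's
disk.

```python
import sqlite3


def reserve_upload_name(db, stem, ext):
    """Reserve a unique file name using a database-assigned ID.

    Hypothetical helper: the row is inserted first, so the database
    serializes concurrent uploads and each one gets a distinct ID.
    """
    with db:  # commit on success, roll back on error
        cur = db.execute("INSERT INTO uploads (file_name) VALUES (NULL)")
        upload_id = cur.lastrowid
        file_name = f"{stem}-{upload_id}{ext}"
        db.execute(
            "UPDATE uploads SET file_name = ? WHERE id = ?",
            (file_name, upload_id),
        )
    return file_name


# Stand-in for the one central database that every web server talks to.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE uploads ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "file_name TEXT UNIQUE)"
)

# Two uploads of image.png, as if handled by two different web servers.
first = reserve_upload_name(db, "image", ".png")
second = reserve_upload_name(db, "image", ".png")
print(first, second)
```

Because neither name depends on which files have been synced, the two
servers cannot hand out the same name, whatever state rsync is in.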
--
Ticket URL: <http://core.trac.wordpress.org/ticket/20057#comment:3>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software