[wp-trac] [WordPress Trac] #20057: Media upload for multi-webserver setups introduces a nasty race condition that could corrupt uploaded files

WordPress Trac wp-trac at lists.automattic.com
Sun Feb 19 02:59:28 UTC 2012


#20057: Media upload for multi-webserver setups introduces a nasty race condition
that could corrupt uploaded files
--------------------------+------------------------------
 Reporter:  archon810     |       Owner:
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  Media         |     Version:  3.3.1
 Severity:  major         |  Resolution:
 Keywords:                |
--------------------------+------------------------------

Comment (by archon810):

 @dd32, I think you're misunderstanding the issue. The problem is not that
 there's a collision, but that the file name picked by the collision
 detection is based on what's on the current server's disk, which in turn
 depends on the state of the rsync between the 2 servers. So, if server B
 is missing a file (or 10), it may select a name that was already assigned
 on server A, poisoning the integrity of the uploads.

 A file with the same name will then exist on both servers but will
 represent different images.

 In fact, upon the next rsync, one of them will win and eat the other,
 depending on the direction of the sync. This is why keeping track of file
 names in the database and using the centralized database for name conflict
 resolution is crucial. There's only 1 database and many servers.
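
 To illustrate, here is a rough Python sketch of the failure mode. It is a
 simplified model of disk-based collision detection, not the actual
 WordPress code; pick_name_from_disk and the two sets standing in for each
 server's uploads directory are made up for the example.

```python
import os

def pick_name_from_disk(filename, files_on_disk):
    """Return filename, or the first 'stemN.ext' variant not in files_on_disk."""
    if filename not in files_on_disk:
        return filename
    stem, ext = os.path.splitext(filename)
    n = 1
    while f"{stem}{n}{ext}" in files_on_disk:
        n += 1
    return f"{stem}{n}{ext}"

# Server A has already stored two uploads.
server_a = {"image.png", "image1.png"}
# Server B has only received the first one so far; rsync is lagging.
server_b = {"image.png"}

# A new upload lands on server B, which only consults its own disk.
name_on_b = pick_name_from_disk("image.png", server_b)

# True: server B picked a name that server A already assigned to a different image.
collides = name_on_b in server_a
```

 Both servers now hold a file with the same name but different contents,
 and the next rsync decides which one survives.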

 I think this request is very reasonable and has the merits of a true HA
 setup. A single separate server becomes both your point of failure and
 your scaling bottleneck (in case it alone can't handle the traffic).
 Having a network filestore introduces high complexity and goes against
 the KISS approach I'm trying to use here: good old-fashioned, proven
 rsync. A network file system, or a file system that replicates, has a lot
 of idiosyncrasies and pitfalls, and debugging them could turn into a
 nightmare. I don't have the necessary means to set up a SAN or NAS in the
 colo, and file systems like gluster are prone to getting out of sync the
 same way rsync would, resulting in the same issue.

 I also don't want to rely on plugins, like W3TC, to upload all files to a
 push CDN, since that makes me 100% reliant on that plugin and potentially
 destroys the site infrastructure if I ever need to switch it off.

 So with that in mind, I first need to make sure you guys understand what
 the core issue is, and then urge you to consider a solution in the core.

 As for the solution, naming uploaded files on disk based on the same
 algorithm as the post_name column, rather than on which files exist on
 disk, would be ideal. For example, a recent upload I am looking at has
 post_name "image-png-7096" in the database while its on-disk name is
 uploads/2012/02/image75.png. If it were named image-7096.png or
 image-png-7096.png, it'd solve all our problems (so long as there's no
 race condition between querying the database and adding the new entry
 into it, in case 2 files are uploaded at the same time, but I think
 that's already a solved problem).
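
 A minimal sketch of what I mean, in Python with an in-memory SQLite
 database standing in for the single shared database. The uploads table,
 its columns, and reserve_upload_name are hypothetical and are not the
 WordPress schema or API; the point is only that the unique part of the
 name comes from the database row, not from a directory listing.

```python
import os
import sqlite3

def reserve_upload_name(db, filename):
    """Insert a row first, then derive the on-disk name from its unique id."""
    stem, ext = os.path.splitext(filename)
    with db:  # one transaction: commits on success, rolls back on error
        cur = db.execute(
            "INSERT INTO uploads (original_name) VALUES (?)", (filename,)
        )
        upload_id = cur.lastrowid
        stored_name = f"{stem}-{upload_id}{ext}"
        db.execute(
            "UPDATE uploads SET stored_name = ? WHERE id = ?",
            (stored_name, upload_id),
        )
    return stored_name

# Stand-in for the one central database that every web server talks to.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE uploads ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " original_name TEXT NOT NULL,"
    " stored_name TEXT UNIQUE)"
)

# The same file name uploaded via two different web servers.
name_from_a = reserve_upload_name(db, "image.png")
name_from_b = reserve_upload_name(db, "image.png")
```

 Because the id is handed out by the database, the two names differ no
 matter how far behind either server's rsync is.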

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/20057#comment:3>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

