[wp-trac] [WordPress Trac] #15148: Cron Storage Abstraction

WordPress Trac noreply at wordpress.org
Tue Feb 10 07:37:05 UTC 2015


#15148: Cron Storage Abstraction
------------------------------------+-----------------------------
 Reporter:  ryan                    |       Owner:
     Type:  enhancement             |      Status:  new
 Priority:  normal                  |   Milestone:  Future Release
Component:  Cron API                |     Version:
 Severity:  major                   |  Resolution:
 Keywords:  has-patch dev-feedback  |     Focuses:
------------------------------------+-----------------------------
Changes (by archon810):

 * severity:  normal => major


Comment:

 Hello gents,

 Any movement here? Let me tell you a wonderful tale that made me stay up
 till 8am while the site and all servers were on fire, all due to cron, as
 I found out eventually. It proved once again that using a single
 serialized array in wp_options is a terrible, horrible, evil idea.

 The sequence of events was as follows.

 1. On a busy site (15+ million pageviews a month), cron is used relatively
 extensively, but within reason.
 2. Heavy load leads to some db performance issues.
 3. A sneaky bug in one of our plugins results in a single cron entry being
 scheduled by probably 10% of the requests. Usually, the only side effect
 would be that this cron entry would stick around in the cron job list
 because it gets reinserted over and over.
 4. These db performance issues result in both slow reads and slow writes,
 which in turn means that, due to a race condition, the cron array gets read
 and written over and over, overwriting the jobs that are in progress, and
 potentially even ones already complete, and putting them back in. It's easy
 to see how that turns into a nightmare. But it gets worse. Btw, I have to
 mention that due to cron performance issues in the past, I've long since
 switched off the built-in wp cron and enabled a Linux cron-initiated
 wp-cron.
 5. The heightened load and constant cron overwriting grow the cron job
 list to about 450 jobs; at least, that was the state I found it in. At this
 point, just the cron array is 1.8MB, and the servers are burning way up.
 6. Did I mention how things manage to get more fun? The db server is
 actually a master in a master-slave configuration, and it's starting to
 spit out binlogs like it's nobody's business. I'm talking a gig per minute
 at its worst. Network is burning up too, but... then the master runs out
 of space after eating through 100GB of space in a matter of 2-3 hours.
 7. Oh, and additionally, the huge increase in cron array size (i.e. what
 the call to get_option() returns) ends up causing tons of OOM errors in the
 database, at which point WordPress does wonderful things like treating the
 query result as empty. At that point, the site may switch to the
 "Hiii, it's my first install, here's an install page where you can create
 an admin user" message, various plugin settings get reset, and other fun
 stuff. It's really really fun. I can't recommend enough trying this once,
 but don't forget to take some speed first.
 8. At this point replication breaks, but the slave remains online, which
 was the only saving grace and together with W3 Total Cache kept the site
 mostly up, as long as pages remained in the cache.
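
 To make the race in step 4 concrete, here is a minimal Python sketch of a
 lost update on a single serialized value. It is only an illustration: the
 `options` dict is a hypothetical stand-in for wp_options, JSON stands in for
 PHP serialization, and the job names are made up. The interleaving is
 written out by hand rather than produced by real concurrent requests.

```python
import json

# Hypothetical stand-in for the wp_options table: one key, one serialized value.
options = {"cron": json.dumps({"job_x": 1000})}

def read_cron():
    """Read and unserialize the whole cron array, as a single option would be."""
    return json.loads(options["cron"])

def write_cron(jobs):
    """Serialize and overwrite the whole cron array in one write."""
    options["cron"] = json.dumps(jobs)

# Request A (scheduling) and request B (the cron runner) both read the same state.
snapshot_a = read_cron()
snapshot_b = read_cron()

# B finishes job_x, removes it from its copy, and writes the array back.
del snapshot_b["job_x"]
write_cron(snapshot_b)

# A, still holding its stale copy, adds a new job and writes the array back.
snapshot_a["job_y"] = 2000
write_cron(snapshot_a)

final_jobs = read_cron()
print(sorted(final_jobs))  # job_x is back even though B completed and removed it
```

 Because every writer replaces the entire array, the last writer wins, and
 whatever it read earlier comes back with it. The slower the reads and
 writes, the wider the window in which this can happen.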

 It took me many hours to finally figure out what was going on, fix the cron
 bug, delete the ballooned cron job list, re-initiate replication, restore
 plugin settings from a backup, and watch the sun rise.

 Yes, there was a bug that was writing a cron entry too frequently. But if
 the cron system in WP used a dedicated table, none of this would have
 happened. It would be way more robust and not easily broken by race
 conditions.
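
 As a rough sketch of what a dedicated table could look like, the Python
 snippet below uses an in-memory SQLite database. The table name `cron_jobs`,
 its columns, and the helper functions are all hypothetical and are not taken
 from any patch on this ticket. It assumes the duplicate scheduling requests
 use the same hook and timestamp, so that a unique key can collapse them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per scheduled job; names are illustrative only.
conn.execute(
    "CREATE TABLE cron_jobs ("
    " hook TEXT NOT NULL,"
    " run_at INTEGER NOT NULL,"
    " UNIQUE (hook, run_at))"
)

def schedule(hook, run_at):
    """Insert one job row; a duplicate (hook, run_at) is silently ignored."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO cron_jobs (hook, run_at) VALUES (?, ?)",
            (hook, run_at),
        )

def complete(hook, run_at):
    """Delete only the finished job's row; other rows are untouched."""
    with conn:
        conn.execute(
            "DELETE FROM cron_jobs WHERE hook = ? AND run_at = ?",
            (hook, run_at),
        )

def pending():
    """Return all scheduled jobs as (hook, run_at) tuples, oldest first."""
    rows = conn.execute("SELECT hook, run_at FROM cron_jobs ORDER BY run_at, hook")
    return rows.fetchall()

# The buggy plugin schedules the same entry on many requests: still one row.
for _ in range(100):
    schedule("job_x", 1000)

# The runner completes job_x while another request schedules job_y.
complete("job_x", 1000)
schedule("job_y", 2000)

remaining = pending()
print(remaining)
```

 Each write touches a single small row instead of rewriting the whole job
 list, so a slow writer cannot put a completed job back, and the amount
 written per change no longer grows with the size of the list. `INSERT OR
 IGNORE` is SQLite syntax; a MySQL version would need the equivalent
 `INSERT IGNORE`.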

 I sincerely hope this ticket gets some action for a release this year. It
 would seriously make my day/month/year.

 Thanks for reading.

--
Ticket URL: <https://core.trac.wordpress.org/ticket/15148#comment:9>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

