[wp-trac] [WordPress Trac] #15148: Cron Storage Abstraction
WordPress Trac
noreply at wordpress.org
Tue Feb 10 07:37:05 UTC 2015
#15148: Cron Storage Abstraction
------------------------------------+-----------------------------
 Reporter:  ryan                    |      Owner:
     Type:  enhancement             |     Status:  new
 Priority:  normal                  |  Milestone:  Future Release
Component:  Cron API                |    Version:
 Severity:  major                   | Resolution:
 Keywords:  has-patch dev-feedback  |    Focuses:
------------------------------------+-----------------------------
Changes (by archon810):
* severity: normal => major
Comment:
Hello gents,
Any movement here? Let me tell you a wonderful tale that made me stay up
till 8am while the site and all servers were on fire, all due to cron, as
I found out eventually. It proved once again that using a single
serialized array in wp_options is a terrible, horrible, evil idea.
The sequence of events was as follows.
1. On a busy site (15+ million pageviews a month), cron is used relatively
extensively, but within reason.
2. Heavy load leads to some db performance issues.
3. A sneaky bug in one of our plugins results in a single cron entry being
scheduled by probably 10% of the requests. Usually, the only side effect
would be that this cron entry would stick around in the cron job list
because it gets reinserted over and over.
4. These db performance issues result in both slow writes and slow reads,
which in turn means that, due to a race condition, the cron array gets read
and written over and over, overwriting the entries for jobs that are in
progress, and potentially even already complete, and putting them back in.
It's easy to see how that turns into a nightmare. But it gets worse. Btw, I
have to mention that due to cron performance issues in the past, I've long
since switched off the built-in wp cron and enabled a Linux cron-initiated
wp-cron.
5. The heightened load and constant cron overwriting grow the cron job
list to about 450 jobs (at least, that was the state I found it in). At this
point, the cron array alone is 1.8MB, and the servers are burning way up.
6. Did I mention how things manage to get more fun? The db server is
actually a master in a master-slave configuration, and it's starting to
spit out binlogs like it's nobody's business. I'm talking a gig per minute
at its worst. The network is burning up too, but... then the master runs out
of disk space after eating through 100GB in a matter of 2-3 hours.
7. Oh, and additionally, the huge increase in cron array size (i.e. the
result of the call to get_option()) ends up causing tons of OOM errors in
the database, at which point WordPress does wonderful things like treating
the query result as empty. At that point, the site may switch to the
"Hiii, it's my first install, here's an install page where you can create
an admin user" message, various plugin settings get reset, and other fun
stuff. It's really really fun. I can't recommend enough trying this once,
but don't forget to take some speed first.
8. At this point replication breaks, but the slave remains online, which
was the only saving grace and together with W3 Total Cache kept the site
mostly up, as long as pages remained in the cache.
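The race in step 4 can be sketched outside WordPress. The snippet below is a minimal simulation, not WordPress code: `options`, `read_cron`, and `write_cron` are hypothetical stand-ins for a single serialized value in wp_options, and the job names are made up. It replays one interleaving in which a completed job is put back by a request that wrote a stale copy of the whole array.

```python
import json

# Hypothetical stand-in for the one wp_options row holding every cron job.
options = {"cron": json.dumps({"send_digest": 100})}

def read_cron():
    """Load and decode the whole job list, as every reader must."""
    return json.loads(options["cron"])

def write_cron(jobs):
    """Encode and overwrite the whole job list in one write."""
    options["cron"] = json.dumps(jobs)

# Both requests read the same snapshot before either one writes.
runner_copy = read_cron()
scheduler_copy = read_cron()

# The runner finishes send_digest, removes it, and writes back.
runner_copy.pop("send_digest")
write_cron(runner_copy)

# The scheduler adds a job to its stale copy and overwrites the row.
scheduler_copy["sync_feed"] = 200
write_cron(scheduler_copy)

final_jobs = read_cron()
print(final_jobs)  # send_digest is back even though it already ran
```

With slow reads and writes, the window between the read and the write widens, so this interleaving becomes more likely on every request.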
It took me many hours to finally figure out what was going on, fix the cron
bug, delete the ballooned cron job list, re-initiate replication, restore
plugin settings from a backup, and watch the sun rise.
Yes, there was a bug that was writing a cron entry too frequently. But if
the cron system in WP used a dedicated table, none of this would have
happened. It would be way more robust and not easily broken by race
conditions.
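As a rough illustration of the dedicated-table argument, here is a sketch using Python's built-in sqlite3 module. The `cron_jobs` table, its columns, and the `schedule`/`complete` helpers are assumptions made for this example; the ticket does not specify a schema, and a real patch would target MySQL (where the equivalent of `INSERT OR IGNORE` is `INSERT IGNORE`). The point is only that each job is its own row, so a duplicate schedule call is a no-op and removing one job never rewrites the others.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per scheduled job, unique on (hook, run_at).
conn.execute(
    "CREATE TABLE cron_jobs ("
    " hook TEXT NOT NULL,"
    " run_at INTEGER NOT NULL,"
    " PRIMARY KEY (hook, run_at))"
)

def schedule(hook, run_at):
    """Add a job; scheduling the same (hook, run_at) again does nothing."""
    conn.execute(
        "INSERT OR IGNORE INTO cron_jobs (hook, run_at) VALUES (?, ?)",
        (hook, run_at),
    )

def complete(hook, run_at):
    """Remove one finished job without touching any other row."""
    conn.execute(
        "DELETE FROM cron_jobs WHERE hook = ? AND run_at = ?",
        (hook, run_at),
    )

schedule("send_digest", 100)

# A buggy plugin schedules the same entry on many requests.
for _ in range(1000):
    schedule("buggy_hook", 500)

# A runner completes one job while another request schedules a new one.
complete("send_digest", 100)
schedule("sync_feed", 200)

remaining = conn.execute(
    "SELECT hook, run_at FROM cron_jobs ORDER BY run_at"
).fetchall()
print(remaining)
```

The table stays at two rows no matter how often the buggy hook is rescheduled, and the completed job cannot be resurrected by a stale write because no request ever writes the full list.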
I sincerely hope this ticket gets some action for a release this year. It
would seriously make my day/month/year.
Thanks for reading.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/15148#comment:9>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform