[wp-trac] [WordPress Trac] #7068: cron improvement
WordPress Trac
wp-trac at lists.automattic.com
Fri May 30 17:02:57 GMT 2008
#7068: cron improvement
---------------------+------------------------------------------------------
Reporter: hailin | Owner: anonymous
Type: defect | Status: new
Priority: high | Milestone: 2.7
Component: General | Version:
Severity: normal | Keywords:
---------------------+------------------------------------------------------
There are several key issues with the current cron implementation.
1. cron is not atomic.
Every page load calls wp_cron(), which checks the first timestamp in the
cron array. If it has expired, it calls spawn_cron, which calls
wp-cron.php to fire the jobs.
This causes serious concurrency problems on a large system with hundreds
of servers, where millions of page views are generated every day.
The current method to address this issue is in wp-cron.php:
    if ( get_option('doing_cron') > $local_time )
        exit;
    update_option('doing_cron', $local_time + 30);
However, this check does not solve the problems that result from
concurrency.
Example:
On a busy site, in the particular second when the first cron timestamp
expires, there are 10 blog page loads on 10 different servers.
Suppose process #1 on server #1 goes first, yet before it has reached
update_option('doing_cron', $local_time + 30), process #2 on server #2
begins the sequence too.
Since 'doing_cron' is still being updated by process #1, or the updated
value has not taken effect yet (due to db or cache delays, usually
several milliseconds or longer), process #2 will also pass the
if ( get_option('doing_cron') > $local_time ) check and call
update_option('doing_cron', $local_time + 30). So both processes will
proceed to fire the cron job.
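
The example above can be modeled with a small simulation. This is only a
sketch in Python, not WordPress code: DelayedOptionStore, try_run_cron and
simulate are hypothetical names, and the "delay" is modeled by holding
writes back until flush() is called.

```python
class DelayedOptionStore:
    """Hypothetical option store whose writes become visible only after flush().

    This stands in for the db or cache delay described in the ticket.
    """

    def __init__(self):
        self.visible = {"doing_cron": 0}
        self.pending = {}

    def get_option(self, name):
        return self.visible[name]

    def update_option(self, name, value):
        # The write is accepted but not yet visible to other readers.
        self.pending[name] = value

    def flush(self):
        self.visible.update(self.pending)
        self.pending.clear()


def try_run_cron(store, local_time):
    """Mirror of the wp-cron.php check: return True if this process fires cron."""
    if store.get_option("doing_cron") > local_time:
        return False
    store.update_option("doing_cron", local_time + 30)
    return True


def simulate(num_processes, local_time=1000):
    store = DelayedOptionStore()
    # All processes arrive within the delay window, before any write is visible.
    results = [try_run_cron(store, local_time) for _ in range(num_processes)]
    store.flush()
    # A process arriving after the write has landed is correctly turned away.
    late_arrival_ran = try_run_cron(store, local_time)
    return results.count(True), late_arrival_ran


fired, late_arrival_ran = simulate(10)
print(fired, late_arrival_ran)
```

In this model every process that arrives inside the delay window fires the
cron job, and the check only starts working once the write is visible.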
I’ve observed that on a popular blog on a busy production site, ANY cron
job was executed 5-7 times! That may be OK for the publish_future_post
operation, but may not be good for other cron tasks.
An ideal solution is to guarantee that every cron job is executed once and
only once.
I can envision storing all cron jobs in a central table, with a daemon
processing them on ONE PARTICULAR server. Yet this approach may not be
flexible enough, as it may not handle blog-specific jobs well.
A practical solution is to make the cron operation as atomic as possible,
knowing that we can never make it truly atomic as there will be database
and cross-data center communication delays.
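
One way to picture "as atomic as possible" is to fold the check and the
update into a single step, so no other process can get in between them.
The sketch below is in Python and uses a thread lock purely as a
stand-in. AtomicOptionStore, claim_if_expired and race are hypothetical
names. On a real multi-server site the atomicity would have to come from
the shared store itself, for instance a single conditional UPDATE in the
database. That is an assumption, not something this ticket proposes.

```python
import threading


class AtomicOptionStore:
    """Hypothetical store where check and update happen as one step."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {"doing_cron": 0}

    def claim_if_expired(self, name, local_time, hold_seconds=30):
        # Check and update under one lock: no other caller can read the old
        # value between our comparison and our write.
        with self._lock:
            if self._values[name] > local_time:
                return False
            self._values[name] = local_time + hold_seconds
            return True


def race(num_workers, local_time=1000):
    """Start num_workers threads at once and return the ids that won the claim."""
    store = AtomicOptionStore()
    barrier = threading.Barrier(num_workers)
    winners = []

    def worker(worker_id):
        barrier.wait()  # release all workers at the same moment
        if store.claim_if_expired("doing_cron", local_time):
            winners.append(worker_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners


print(len(race(10)))
```

Only one worker wins the claim no matter how the threads interleave. The
database and cross-data center delays mentioned above are exactly what
this in-process model leaves out.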
2. Server timers are not always correct
Because the cron job condition is tested on every blog page load on every
server, any server with a bad clock can ruin the cron jobs, causing
future posts to be published early or never published at all.
We can build in some protection mechanism to guard against this.
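
The ticket does not say what the protection mechanism would be. One
possible guard, sketched here in Python, is to refuse to spawn cron on a
server whose clock disagrees with a reference time by more than some
tolerance. Everything in it is an assumption for illustration: the
function names, the 60 second tolerance, and above all the existence of
a reference_time (for example the database server's clock).

```python
def clock_is_trustworthy(local_time, reference_time, max_skew_seconds=60):
    """Return True if the local clock is within tolerance of the reference."""
    return abs(local_time - reference_time) <= max_skew_seconds


def should_spawn_cron(local_time, reference_time, next_job_time, max_skew_seconds=60):
    """Decide whether this server may fire cron for the next scheduled job.

    A server with a badly skewed clock stays out of the way and leaves the
    job to servers whose clocks agree with the reference.
    """
    if not clock_is_trustworthy(local_time, reference_time, max_skew_seconds):
        return False
    return next_job_time <= local_time


# Healthy clock, job is due: fire.
print(should_spawn_cron(local_time=1000, reference_time=1000, next_job_time=990))
# Clock runs far ahead and thinks a future job is due: do not fire.
print(should_spawn_cron(local_time=5000, reference_time=1000, next_job_time=4000))
```

This keeps a fast clock from publishing future posts early. It does not
help if every server's clock is wrong, or if no trusted reference exists.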
3. Minor issue
Calling time() in multiple places in the cron operation chain can be
tricky on a busy server, as each call can return a different value if the
server is overloaded. Capturing the timestamp once at the cron entry
point and passing it down the chain is logically sounder.
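
A minimal sketch of that idea in Python: the clock is read once at the
entry point and the same value is handed to every later step. The names
run_cron and due_jobs, and the timestamp-to-hook dictionary, are
hypothetical and only loosely modeled on the cron array.

```python
import time


def due_jobs(cron_array, local_time):
    """Return hooks whose timestamp has expired, judged by the passed-in time."""
    return [hook for timestamp, hook in sorted(cron_array.items())
            if timestamp <= local_time]


def run_cron(cron_array, local_time=None):
    """Entry point: read the clock once, then pass the value down the chain."""
    if local_time is None:
        local_time = int(time.time())  # the only clock read in the chain
    due = due_jobs(cron_array, local_time)
    # The lock expiry is derived from the same timestamp, not a second time() call.
    lock_expiry = local_time + 30
    return {"local_time": local_time, "due": due, "lock_expiry": lock_expiry}


result = run_cron({100: "hook_a", 200: "hook_b", 300: "hook_c"}, local_time=200)
print(result)
```

Because every step sees the same local_time, the due-job check and the
lock expiry cannot disagree, however long the chain takes to run.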
4. Lack of a central standard time source
Server clock drift caused by power outages, etc. poses a fundamental
challenge. Software cannot prevent hardware failure, and can only do so
much to adapt to those failure cases.
--
Ticket URL: <http://trac.wordpress.org/ticket/7068>
WordPress Trac <http://trac.wordpress.org/>
WordPress blogging software