[wp-trac] [WordPress Trac] #8384: Improving WP hierarchical data
structure
WordPress Trac
wp-trac at lists.automattic.com
Thu Nov 27 01:51:57 GMT 2008
#8384: Improving WP hierarchical data structure
-------------------------+--------------------------------------------------
Reporter: hailin | Owner: anonymous
Type: enhancement | Status: new
Priority: high | Milestone: 2.8
Component: General | Version:
Severity: normal | Keywords:
-------------------------+--------------------------------------------------
Currently WordPress uses simple Adjacency List Model to manage
hierarchical data, including pages, categories, terms.
As the size of data sets grow larger and larger, we run into inherent
limitations imposed by our current data structure.
For example:
When there are 5500 pages, in order to list a particular subset of pages,
we have to retrieve all of them from the db (step 1), and enumerate
through each one to construct a sub-tree (step 2).
We did some algorithmic improvement before to improves part (#2) by
reducing the complexity from O(n^2) to O(n) in
function page_rows($pages, $pagenum = 1, $per_page = 20)
it reduced the processing time in step 2 from 20 sec to 0.3 sec in a case
when there are ~2000 pages.
However, step 1 is still a major hurdle, because limitations of our
current pages data structure.
In the case of one blog with 5500+ pages, it takes 33 seconds to display
wp-admin/edit-pages.php. In particular, 17 seconds are spent in
update_post_caches(), and 11 seconds are spent in
apply_filters('the_posts', $this->posts), simply because we are operating
over all 5500+ pages! And because the we only store rudimentary parent-
child information in the table, we had to query the whole 5500 pages.
If we improve the data structure and store the pages in efficient
hierarchical order in db, then for every operation, we could retrieve only
the ones we want, eg, 30 pages at a time. This can bring down the page
load time from 33 seconds to sub-second, substantially improving user
experience.
The above is just one example, the same can be applied to any other cases
involving
hierarchical data sets.
The potential change will have a lot of ripple effects, as it may affect a
lot of other functions or even maybe themes, which depend on the existing
behavior. So we should proceed cautiously and pay great attention to
backward compatibility, etc.
We could consider:
Modified Preorder Tree Traversal Algorithm
http://www.sitepoint.com/article/hierarchical-data-database/
Or the ones recommended in the classical book:
http://www.amazon.com/Hierarchies-Smarties-Kaufmann-Management-
Systems/dp/1558609202/ref=pd_bxgy_b_img_b
Besides, WordPress is evolving into a CMS, and that mandates us having a
better foundation on which to manipulate various data formats. Such a
solid, efficient, elegant, robust data structure will serve as a strong
foundation for us to evolve well into the future.
--
Ticket URL: <http://trac.wordpress.org/ticket/8384>
WordPress Trac <http://trac.wordpress.org/>
WordPress blogging software
More information about the wp-trac
mailing list