[wp-trac] [WordPress Trac] #8384: Improving WP hierarchical data structure

WordPress Trac wp-trac at lists.automattic.com
Thu Nov 27 01:51:57 GMT 2008


#8384: Improving  WP hierarchical data structure
-------------------------+--------------------------------------------------
 Reporter:  hailin       |       Owner:  anonymous
     Type:  enhancement  |      Status:  new      
 Priority:  high         |   Milestone:  2.8      
Component:  General      |     Version:           
 Severity:  normal       |    Keywords:           
-------------------------+--------------------------------------------------
 Currently WordPress uses simple Adjacency List Model to manage
 hierarchical data, including pages, categories, terms.

 As the size of data sets grow larger and larger, we run into inherent
 limitations imposed by our current data structure.

 For example:

 When there are 5500 pages, in order to list a particular subset of pages,
 we have to retrieve all of them from the db (step 1), and enumerate
 through each one to construct a sub-tree (step 2).

 We did some algorithmic improvement before to improves part (#2) by
 reducing the complexity from  O(n^2) to O(n) in
 function page_rows($pages, $pagenum = 1, $per_page = 20)
  it reduced the processing time in step 2 from 20 sec to 0.3 sec in a case
 when there are ~2000 pages.

 However, step 1 is still a major hurdle, because limitations of our
 current pages data structure.

 In the case of  one blog with 5500+ pages, it takes 33 seconds to display
 wp-admin/edit-pages.php. In particular, 17 seconds are spent in
 update_post_caches(), and 11 seconds are spent in
 apply_filters('the_posts', $this->posts), simply because we are operating
 over all 5500+ pages!  And because the we only store rudimentary parent-
 child information in the table, we had to query the whole 5500 pages.

 If we improve the data structure and store the pages in efficient
 hierarchical order in db, then for every operation, we could retrieve only
 the ones we want, eg, 30 pages at a time. This can bring down the page
 load time from 33 seconds to sub-second, substantially improving user
 experience.

 The above is just one example, the same can be applied to any other cases
 involving
 hierarchical data sets.

 The potential change will have a lot of ripple effects, as it may affect a
 lot of other functions or even maybe themes, which depend on the existing
 behavior. So we should proceed cautiously and pay great attention to
 backward compatibility, etc.

 We could consider:
 Modified Preorder Tree Traversal Algorithm
 http://www.sitepoint.com/article/hierarchical-data-database/

 Or the ones recommended in the classical book:
  http://www.amazon.com/Hierarchies-Smarties-Kaufmann-Management-
 Systems/dp/1558609202/ref=pd_bxgy_b_img_b

 Besides, WordPress is evolving into a CMS, and that mandates us having a
 better foundation on which to manipulate various data formats.  Such a
 solid, efficient, elegant, robust data structure will serve as a strong
 foundation for us to evolve well into the future.

-- 
Ticket URL: <http://trac.wordpress.org/ticket/8384>
WordPress Trac <http://trac.wordpress.org/>
WordPress blogging software


More information about the wp-trac mailing list