[wp-trac] [WordPress Trac] #56598: Performance optimization in _update_post_term_count()

Fri Sep 1 18:24:05 UTC 2023

#56598: Performance optimization in _update_post_term_count()
-------------------------------------------+--------------------------
 Reporter:  dimitrisv                      |       Owner:  (none)
     Type:  enhancement                    |      Status:  new
 Priority:  normal                         |   Milestone:  6.4
Component:  Taxonomy                       |     Version:  6.0.2
 Severity:  normal                         |  Resolution:
 Keywords:  needs-patch reporter-feedback  |     Focuses:  performance
-------------------------------------------+--------------------------

Comment (by dimitrisv):

 Hello,

 I am sorry for the delay - my apologies.

 When I started this thread I was dealing with a WP installation of a news
 site, with over 70million rows in their post-meta table. Since then I have
 delivered the project I have only a backup of a "mere" 20million rows in
 postmeta, 350k in posts.

 It did make a lot of difference especially, given that the default indexes
 were inadequate. Theoretically speaking it is the right thing to do.

 Here is what chat GPT said about the comment by @peterwilsoncc

 {{{
 Based on your description, it seems you're discussing a database-related
 query optimization topic.

 In SQL, `count(*)` and `count(field)` are two different ways to count rows
 based on certain criteria:

 1. `count(*)`: This counts all rows irrespective of whether they have null
 values or not. It's used when you just want to know the number of rows in
 a result set.

 2. `count(field)`: This counts only the rows where the specified `field`
 has a non-null value.

 Factors that might influence the cost between `count(*)` and
 `count(field)`:

 1. **Table Structure**: If a table doesn't have any nullable columns, then
 `count(*)` and `count(any_field)` would theoretically have similar costs.

 2. **Indexes**: If a field/column you're using in `count(field)` is
 indexed, the database might retrieve the count more efficiently compared
 to counting all rows with `count(*)`.

 3. **Database Optimization**: Modern databases might have optimizations in
 place for certain operations. For instance, some databases might have a
 quick way to determine the row count for a table without scanning all
 rows.

 4. **Storage Engine**: Some database storage engines might have
 optimizations that make one type of count faster than the other.

 How to analyze the costs:
 For example, in MySQL, you can prefix your query with `EXPLAIN` to see an
 execution plan:

 ```sql
 EXPLAIN SELECT count(*) FROM your_table WHERE some_condition;
 EXPLAIN SELECT count(field) FROM your_table WHERE some_condition;
 ```

 ...

 It's worth noting that the actual difference in execution cost between
 `count(*)` and `count(field)` may vary depending on the specific database,
 the data distribution, and other factors. So it's always a good idea to
 test in your own environment and use tools like EXPLAIN to understand
 what's going on under the hood.

 }}}

 What might have made the difference so noticeable in this case is the
 usage of many Advanced Custom Fields (with a lot of postmeta entries -
 most are empty or null).

 Since then I have migrated them to pods (using "real" table storage
 outside postmeta) so the difference is not pronounced.

 I hope this helps.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/56598#comment:12>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform