[wp-trac] [WordPress Trac] #61545: HTML API: Performance Improvements for 6.7

WordPress Trac noreply at wordpress.org
Mon Jul 1 21:41:33 UTC 2024


#61545: HTML API: Performance Improvements for 6.7
--------------------------------------+--------------------------
 Reporter:  dmsnell                   |       Owner:  (none)
     Type:  enhancement               |      Status:  new
 Priority:  normal                    |   Milestone:  6.7
Component:  HTML API                  |     Version:  trunk
 Severity:  normal                    |  Resolution:
 Keywords:  has-patch has-unit-tests  |     Focuses:  performance
--------------------------------------+--------------------------
Description changed by dmsnell:

Old description:

> The HTML API is already efficient, but it can (possibly) be better.
>
> There are multiple ways to potentially improve the performance of the Tag
> Processor, HTML Processor, and HTML Decoder. This ticket is a tracking
> ticket for those experiments and changes.

New description:

 The HTML API is already efficient, but it can (possibly) be better.

 There are multiple ways to potentially improve the performance of the Tag
 Processor, HTML Processor, and HTML Decoder. This ticket is a tracking
 ticket for those experiments and changes.

 == Experiments

 === Optimize low-level details:
 [https://github.com/WordPress/wordpress-develop/pull/6890 #6890]

 **Hypothesis**: by auditing various low-level functions and adding a few
 micro-optimizations, the HTML Tag Processor will scan faster.

 **Testing results**: This change seems to markedly improve scanning
 times, but since the patch incorporates several changes it's unclear
 whether any specific one was dominant. Of interest are a few places in the
 hot path where a branch was removed.

 Token scanning improves by between 3.5% and 7.5% on average. A small tail
 of documents is slower, and a long tail is much faster, some by more than
 15%. It's unclear what exactly drives the performance behavior, but it's
 complicated and document-dependent.

 **Conclusion**:

         - Merge this patch.
         - Continue trying to build a model for what directs the
 performance behaviors.

 === Replace the attribute associative array with a simple list.
 [https://github.com/WordPress/wordpress-develop/pull/5774 #5774]

 **Hypothesis**: by replacing the associative array and hash lookup of
 parsed attributes with a numeric array, the Tag Processor will perform
 less work and run faster. The savings should come from skipping the hash
 lookup into the associative array of known attributes and the comparable
 lookup for detecting duplicates. In HTML, the first attribute is the real
 one, so it's okay to track a location for every duplicate attribute and
 simply find the //first// parsed one as the real attribute.
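
 The difference between the two bookkeeping strategies can be sketched as
 follows. This is a language-neutral illustration in Python, not code from
 the PR or from WordPress core. The function names are hypothetical, and it
 stores attribute values directly where the Tag Processor would track
 locations in the document.

```python
# Hypothetical sketch; names and data shapes are assumptions for illustration.

def parse_attributes_assoc(pairs):
    """Associative approach: one hash lookup per attribute while parsing,
    so that duplicates can be detected and set aside."""
    attributes = {}
    duplicates = []
    for name, value in pairs:
        key = name.lower()  # HTML attribute names are ASCII case-insensitive
        if key in attributes:
            duplicates.append((name, value))
        else:
            attributes[key] = value
    return attributes, duplicates


def parse_attributes_list(pairs):
    """List approach: append every attribute in document order,
    with no lookup and no duplicate check while parsing."""
    return [(name.lower(), value) for name, value in pairs]


def get_attribute(attribute_list, name):
    """Linear scan on demand; the first match wins, which matches the
    rule that the first duplicate is the real attribute."""
    wanted = name.lower()
    for key, value in attribute_list:
        if key == wanted:
            return value
    return None


if __name__ == "__main__":
    pairs = [("class", "a"), ("ID", "main"), ("class", "b")]
    print(get_attribute(parse_attributes_list(pairs), "class"))
```

 The list version moves the lookup cost from parse time to read time, so it
 should only help when most scanned tags never have their attributes
 queried. Whether that holds in practice is what the benchmarks need to
 show.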

 **Testing results**: TBD.

 **Conclusion**:

         - Need to update the PR to rebase against the current `trunk`.
         - Need to run benchmarks against one of the large datasets, like
 the `top100` list.

--

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/61545#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

