Optimize DOM HTML serialization for UTF-8 #16376

Merged: 2 commits, merged on Oct 22, 2024

Commits on Oct 11, 2024

  1. d345c7e
  2. Add fast path for UTF-8 HTML serialization

    This patch adds a fast path to HTML serialization for the case where the
    output encoding is UTF-8. Because the DOM internally represents all
    strings as UTF-8, no transcoding is needed: the serializer only has to
    validate the bytes.
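
    A minimal sketch of the idea in C, under stated assumptions: `utf8_is_valid`,
    `out_buf`, and `serialize_text` are hypothetical names made up for
    illustration, not php-src functions, and returning an error code for
    invalid input is an assumption (the patch text does not say how invalid
    bytes are handled). When the target encoding is UTF-8, the stored bytes
    are validated and copied unchanged; any other encoding would take a
    transcoding slow path, which is not shown.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical helper (not php-src API): true if s[0..len) is well-formed
     * UTF-8. Rejects overlong forms, UTF-16 surrogates and code points above
     * U+10FFFF. */
    static bool utf8_is_valid(const unsigned char *s, size_t len)
    {
        size_t i = 0;
        while (i < len) {
            unsigned char c = s[i];
            if (c < 0x80) { /* ASCII: a single byte */
                i++;
                continue;
            }
            size_t n;
            unsigned int cp;
            if ((c & 0xE0) == 0xC0)      { n = 2; cp = c & 0x1F; }
            else if ((c & 0xF0) == 0xE0) { n = 3; cp = c & 0x0F; }
            else if ((c & 0xF8) == 0xF0) { n = 4; cp = c & 0x07; }
            else return false;             /* stray continuation or bad lead byte */
            if (n > len - i) return false; /* truncated sequence */
            for (size_t k = 1; k < n; k++) {
                if ((s[i + k] & 0xC0) != 0x80) return false;
                cp = (cp << 6) | (s[i + k] & 0x3F);
            }
            if ((n == 2 && cp < 0x80) || (n == 3 && cp < 0x800) ||
                (n == 4 && cp < 0x10000)) return false;    /* overlong form */
            if (cp >= 0xD800 && cp <= 0xDFFF) return false; /* surrogate */
            if (cp > 0x10FFFF) return false;                /* out of range */
            i += n;
        }
        return true;
    }

    /* Hypothetical fixed-size sink standing in for the serializer's output. */
    typedef struct {
        char data[256];
        size_t len;
    } out_buf;

    /* Returns 0 on success, -1 on invalid UTF-8 or a full buffer, and -2 when
     * the (not shown) transcoding slow path would be needed. */
    static int serialize_text(out_buf *out, const char *text, size_t len,
                              const char *encoding)
    {
        assert(out != NULL && text != NULL && encoding != NULL);
        if (strcmp(encoding, "UTF-8") == 0) {
            /* Fast path: the DOM already stores UTF-8, so validate and copy. */
            if (!utf8_is_valid((const unsigned char *)text, len)) return -1;
            if (len > sizeof out->data - out->len) return -1;
            memcpy(out->data + out->len, text, len);
            out->len += len;
            return 0;
        }
        /* Slow path: transcode to the target encoding (omitted here). */
        return -2;
    }
    ```

    The fast path is one linear validation pass plus a bulk copy, with no
    per-character decode/re-encode step, which is plausibly where the time
    is saved compared with going through a general-purpose encoder.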
    
    Tested on the English Wikipedia home page on an i7-4790, comparing the
    patched build (`./sapi/cli/php`) against the old build (`./sapi/cli/php_old`):
    ```
    Benchmark 1: ./sapi/cli/php x.php
      Time (mean ± σ):     516.0 ms ±   6.4 ms    [User: 511.2 ms, System: 3.5 ms]
      Range (min … max):   506.0 ms … 527.1 ms    10 runs
    
    Benchmark 2: ./sapi/cli/php_old x.php
      Time (mean ± σ):     682.8 ms ±   6.5 ms    [User: 676.8 ms, System: 3.8 ms]
      Range (min … max):   675.8 ms … 695.6 ms    10 runs
    
    Summary
      ./sapi/cli/php x.php ran
        1.32 ± 0.02 times faster than ./sapi/cli/php_old x.php
    ```
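
    As a quick check on the summary line: the speedup is the ratio of the two
    mean times (old build over patched build). The ± term below assumes
    first-order error propagation for a ratio of independent means; that is an
    assumption about how the reported uncertainty is derived, though it
    reproduces the ± 0.02 shown above.

    $$
    S = \frac{682.8}{516.0} \approx 1.323, \qquad
    \sigma_S \approx S\sqrt{\left(\frac{6.5}{682.8}\right)^2 + \left(\frac{6.4}{516.0}\right)^2}
    \approx 1.323 \times 0.0156 \approx 0.02
    $$

    Equivalently, the patched build needs $1 - 516.0/682.8 \approx 0.244$,
    i.e. about 24% less wall time.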
    
    (If you're interested: the same test takes over a second on my machine
    with the old DOMDocument class.)
    
    Future optimizations are certainly possible, but let's start here.
    nielsdos committed Oct 11, 2024 (6b77cf5)