Scanning HTML at Tens of Gigabytes Per Second on ARM Processors

Background: Modern processors feature Single Instruction, Multiple Data (SIMD) instructions capable of processing 16 bytes or more simultaneously, enabling significant performance enhancements in data-intensive tasks. Two major Web browser engines (WebKit and Blink) have adopted SIMD algorithms for parsing HTML.

Objective: This study reviews recent advances in utilizing SIMD instructions to accelerate HTML parsing through vectorized classification techniques.

Methods: We compare these HTML parsing techniques with a faster alternative. Performance is benchmarked against traditional methods on recent ARM processors.

Results: Our measurements demonstrate a 20-fold performance improvement in HTML scanning using SIMD-based approaches compared to conventional parsing methods on modern ARM architectures.

Conclusion: These findings underscore the transformative potential of SIMD-based algorithms in optimizing Web browser performance, offering substantial speedups for processing Internet formats and HTML parsing.
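To make "vectorized classification" concrete, here is a minimal sketch of one common SIMD classification idiom: nibble-based table lookups. It is not claimed to be the exact method or tables from this paper. Each byte is split into its low and high 4-bit halves, each half indexes a 16-entry table, and the byte belongs to the target set when the AND of the two lookups is non-zero. The 16 lanes of a 128-bit register are emulated with a plain loop so the sketch runs anywhere; on AArch64, each lookup over a whole register would be one `vqtbl1q_u8` instruction. The helper names (`build_tables`, `classify_block`, `find_first`) are hypothetical, and the target set (`<`, `&`, `\r`, NUL, bytes that can interrupt a run of plain text in HTML) is an illustrative assumption.

```python
from __future__ import annotations

# Hypothetical sketch: nibble-based vectorized classification, emulated in
# pure Python. Each 16-byte block stands in for one 128-bit NEON register.

LANES = 16  # bytes per 128-bit vector register


def build_tables(targets: bytes) -> tuple[list[int], list[int]]:
    """Build two 16-entry tables so that byte b is a target iff
    lo[b & 0x0F] & hi[b >> 4] is non-zero. Each target gets its own
    bit, so this simple scheme supports at most 8 target bytes."""
    if len(targets) > 8:
        raise ValueError("at most 8 target bytes with one bit each")
    lo = [0] * 16
    hi = [0] * 16
    for k, b in enumerate(targets):
        lo[b & 0x0F] |= 1 << k  # low-nibble table marks target k
        hi[b >> 4] |= 1 << k    # high-nibble table marks the same bit
    return lo, hi


def classify_block(block: bytes, lo: list[int], hi: list[int]) -> int:
    """Return a bitmask with bit i set when block[i] is a target byte.
    The loop emulates vector lanes; on AArch64 each table lookup over all
    lanes would be one vqtbl1q_u8, followed by one vector AND."""
    mask = 0
    for i, b in enumerate(block):
        if lo[b & 0x0F] & hi[b >> 4]:
            mask |= 1 << i
    return mask


def find_first(data: bytes, lo: list[int], hi: list[int]) -> int:
    """Return the index of the first target byte in data, or len(data)."""
    for start in range(0, len(data), LANES):
        # The final slice may be shorter than LANES; a real kernel would
        # handle that tail separately (e.g., with padding or scalar code).
        mask = classify_block(data[start:start + LANES], lo, hi)
        if mask:
            # Position of the lowest set bit = first match in this block.
            return start + (mask & -mask).bit_length() - 1
    return len(data)


# Assumed target set: bytes that end a plain-text run in HTML.
lo_table, hi_table = build_tables(b"<&\r\x00")
html = b"Plain text runs are skipped quickly: a &amp; b <p>"
print(find_first(html, lo_table, hi_table))  # 39, the position of '&'
```

Because each target owns a distinct bit in both tables, a byte matches only when both of its nibbles point at the same target, so the classification is exact for up to eight target bytes while costing two lookups and one AND per block. A production kernel would typically first test whether any lane of the result is non-zero (for example with a horizontal maximum such as `vmaxvq_u8` on AArch64) and only then locate the first match.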

Keywords: text parsing; vectorization; web performance

Daniel Lemire


Data Science Research Center, Université du Québec (TELUQ), Montreal, Quebec, Canada

2025

Software: Practice and Experience

ISSN: 0038-0644
Year, volume (issue): 2025, 55(7)