HTTP Archive uses Wappalyzer to track the state of the web
The HTTP Archive tracks how the web is built. It provides historical data to quantitatively illustrate how the web is evolving over time. The project is open-source and backed by technology giants such as Google, Mozilla and Akamai.
The initiative is part of the Internet Archive, a nonprofit organisation that provides free public access to digitised materials, including millions of books, software applications and websites. Its web archive, the Wayback Machine, contains hundreds of billions of historical web pages.
The HTTP Archive evaluates the composition of millions of web pages on a monthly basis and makes its terabytes of metadata available for analysis on BigQuery, a petabyte-scale data warehouse by Google.
In one example, Paul Calvano, web performance architect at Akamai, demonstrates how the dataset can be used, with the help of Wappalyzer, to analyse CPU times across JavaScript frameworks.
Web Almanac
Once a year, the HTTP Archive produces a comprehensive 'state of the web' report called the Web Almanac, backed by real data and trusted web experts. Its many chapters span aspects of page content, user experience, publishing, and distribution. While the Web Almanac is primarily aimed at developers and web architects, it's full of helpful stats for content managers, marketers and publishers. It's an invaluable resource for understanding how the web is trending.
Notable insights in 2019 include:
- jQuery is used on 85% of web pages
- 10% of pages are on an ecommerce platform
- 40% of pages are powered by a CMS
- Only 20% of sites serve HTML content via a CDN
- At the 90th percentile, 91% of bytes on pages are from media
In addition to tools such as Google's Lighthouse and Chrome UX Report, the HTTP Archive adopted Wappalyzer to segment its reports, breaking down the aggregate analysis of how the web is doing by technology. The Web Almanac project uses Wappalyzer extensively to compare the performance of all technologies within a category, such as CMS and ecommerce.
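To make the idea of segmenting by technology concrete, here is a minimal sketch in Python. It assumes each page has already been tagged with a category and a technology name, and it groups a page-level metric by technology within one category. The sample rows, the page-weight metric and the function name `median_by_technology` are all hypothetical, chosen only for illustration; they are not HTTP Archive data or code.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample rows: (page URL, category, technology, page weight in KB).
# The values are made up for illustration and are not HTTP Archive data.
DETECTIONS = [
    ("https://a.example", "CMS", "WordPress", 2100),
    ("https://b.example", "CMS", "WordPress", 1900),
    ("https://c.example", "CMS", "Drupal", 1500),
    ("https://d.example", "Ecommerce", "Shopify", 2400),
    ("https://e.example", "CMS", "Drupal", 1700),
    ("https://f.example", "CMS", "WordPress", 2300),
]


def median_by_technology(rows, category):
    """Group a page-level metric by technology within one category
    and return the median of each group."""
    groups = defaultdict(list)
    for _url, row_category, technology, metric in rows:
        if row_category == category:
            groups[technology].append(metric)
    return {tech: median(values) for tech, values in groups.items()}


cms_medians = median_by_technology(DETECTIONS, "CMS")
print(cms_medians)
```

The same grouping could be applied to any per-page metric, such as a Lighthouse score, which is broadly how a category-level comparison between technologies can be built up.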
The Web Almanac is the brainchild of Rick Viscomi, a software engineer and web performance evangelist at Google, who noted that Wappalyzer is well aligned with the project's principles of web transparency because of its open and community-driven approach to software development.
WebPageTest
WebPageTest, created by Catchpoint software engineer Patrick Meenan, is a prominent web performance testing tool and the backbone of HTTP Archive. It makes use of Wappalyzer's profiling methods to analyse web pages and create security and performance reports.
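As a rough illustration of what profiling a page can involve, the sketch below matches regular-expression fingerprints against a page's HTML and response headers. It is a simplified assumption about how pattern-based detection works in general, not Wappalyzer's or WebPageTest's actual implementation; the `FINGERPRINTS` table and the `detect` function are hypothetical.

```python
import re

# Hypothetical fingerprints: patterns matched against HTML or response headers.
# These are illustrative only and are not Wappalyzer's actual rules.
FINGERPRINTS = {
    "jQuery": {"html": re.compile(r"<script[^>]+jquery[^>]*\.js", re.I)},
    "WordPress": {"html": re.compile(r"/wp-content/", re.I)},
    "nginx": {"headers": {"server": re.compile(r"nginx", re.I)}},
}


def detect(html, headers):
    """Return the sorted names of technologies whose patterns match."""
    # Header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in headers.items()}
    found = set()
    for name, rules in FINGERPRINTS.items():
        html_pattern = rules.get("html")
        if html_pattern and html_pattern.search(html):
            found.add(name)
        for header, pattern in rules.get("headers", {}).items():
            if header in headers and pattern.search(headers[header]):
                found.add(name)
    return sorted(found)


page_html = '<script src="/wp-content/themes/x/jquery.min.js"></script>'
page_headers = {"Server": "nginx/1.17"}
print(detect(page_html, page_headers))
```

Real fingerprints cover many more signals, but the principle of mapping observable patterns to named technologies is the same.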
Conclusion
Successful societies and institutions recognise the need to record their history, providing a way to review the past, find explanations for current behaviour, and spot emerging trends. The HTTP Archive provides a permanent record of how digitised content is constructed and served.
Wappalyzer is proud to play a part and continues to work directly with core maintainers of the HTTP Archive project to add and improve technology fingerprints and methods of inspection.