Until recently, I just assumed you could put anything equivalent to an HTTP header in an http-equiv
meta tag, and browsers would treat it like the header itself. Maybe you thought the same thing—why wouldn’t you, with a name like that.
But as it turns out, there are actually very few standard values that you can set here. And some values don’t even behave the same way as their header equivalents! What’s going on here and how are we supposed to use this thing?
Let’s take this as an example:
<meta
http-equiv="X-UA-Compatible"
content="IE=edge">
Code language: HTML, XML (xml)
SERIOUSLY, WHAT DOES THIS DO? Why is it that if you load up any three random websites, one of them is bound to have this? And what does Internet Explorer have to do with anything anymore?
I could go on:
<meta
http-equiv="content-type"
content="text/html; charset=UTF-8">
Code language: HTML, XML (xml)
Is this even necessary? It sure looks important—I wouldn’t want my web page to not be parsed as text/html
.
Look, I know http-equiv
meta tags of all things are not what most people get too worried about. It’s easy to copy-paste boilerplate markup from one project to the next because of some unquestioned folklore about what meta tags all HTML documents need. And if it works, it works, right?
Sure, but I’d argue that having a deeper understanding of what our code does and how to use it properly and effectively makes us all better developers. We can save ourselves the trouble of reaching for the wrong tool at first, only to find out later after burning time on debugging that maybe the http-equiv
meta tag doesn’t do what we thought it does after all.
After a lot of researching and testing, I think I’m finally starting to get it. In this post I’ll share what the HTML spec says about http-equiv
, how sites are actually using it in the wild, and argue why you probably* don’t need http-equiv
meta tags.
👉 If you’d like to skip right to the takeaways, I’ve put together a cheatsheet with all of my http-equiv
keyword recommendations.
*Unless…
I’ll start by giving my best arguments for needing http-equiv
. I can break it down into two use cases: the response headers are hard or impossible to configure, and there might be tags added at runtime.
The first argument is about simplicity. If you’re deploying a static site somewhere like GitHub Pages, you don’t have control over the server or its response headers. If you need to set a header, your only choice is to use http-equiv
or to migrate your site somewhere else.
The other argument is more about flexibility. You might not know what you need until the page is already running on the client. Maybe a third party needs to add the http-equiv
meta tag for some feature to work.
These reasons don’t apply equally to all http-equiv
use cases, though. For example, some use cases unlock features that require server-side logic to work anyway, while others are only applicable when parsed directly from the static HTML.
You really need to understand what each value does in order to be sure that you’re using http-equiv
correctly. So let’s go back and see where it all started and how it’s supposed to be used today.
A brief history of http-equiv
In 1994, Roy Fielding proposed a new HTML element:
HTTP-EQUIV
This attribute binds the element to an HTTP response header. It means that if you know the semantics of the HTTP response header named by this attribute, then you can process the contents based on a well-defined syntactic mapping, whether or not your DTD tells you anything about it. HTTP header names are not case sensitive. If not present, the attribute
NAME
should be used to identify this metainformation and it should not be used within an HTTP response header.…
HTTP servers can read the content of the document
HEAD
to generate response headers corresponding to any elements defining a value for the attributeHTTP-EQUIV
. This provides document authors a mechanism (not necessarily the preferred one) for identifying information which should be included in the response headers for an HTTP request.…
One example of an inappropriate usage for the
META
element is to use it to define information that should be associated with an already existing HTML element, e.g.
<meta name="Title" content="The Etymology of Dunsel">
Code language: HTML, XML (xml)A second example of inappropriate usage is to name an
https://www.w3.org/MarkUp/html-spec/Elements/META.htmlHTTP-EQUIV
equal to a response header that should normally only be generated by the HTTP server. Example names that are inappropriate includeServer
,Date
, andLast-Modified
—the exact list of inappropriate names is dependent on the particular server implementation. It is recommended that servers ignore anyMETA
elements which specify http-equivalents which are equal (case-insensitively) to their own reserved response headers.
This is useful context to understand what the original intent of http-equiv
was and was not. It wasn’t meant to replace more semantic HTML elements like title
. It also wasn’t for HTTP headers that would have otherwise been more appropriately set by the server.
Unlike their actual usage today, http-equiv
meta tags were initially intended to be read by the server so that it can set the corresponding response headers. Nowadays though, they’re read by the user agent to parse and handle the document accordingly. The HTML spec calls these pragma directives.
Today, rather than permissively supporting any and all HTTP headers, the only standard keywords (specced values of the http-equiv
attribute) are, in their entirety:
Keyword | Standard | Conforming |
---|---|---|
content-language | ✅ | ❌ |
content-type | ✅ | ✅ |
default-style | ✅ | ✅ |
refresh | ✅ | ✅ |
set-cookie | ✅ | ❌ |
x-ua-compatible | ✅ | ✅ |
content-security-policy | ✅ | ✅ |
http-equiv
keywords, according to the HTML spec. (Source)That’s a pretty short list! Not only that, but two of them are actually non-conforming, meaning that using them is actively discouraged or even completely ignored by the browser. That leaves us with only five conforming http-equiv
keywords.
So, given that we’re all disciplined web developers, you wouldn’t expect to find anything improper in what people actually use this for, right?
Let’s look at the data.
For the rest of this post, I’ll be sharing stats from the June 2023 crawl of the public HTTP Archive dataset. Jump to the Methodology section for the queries and more info on the results.
Fun fact: the title
used in Fielding’s example is “The Etymology of Dunsel”. Dunsel is a fictional word from the Star Trek universe meaning useless, superfluous, or unnecessary. It’s an ominously fitting description for a lot of today’s http-equiv
usage, as you’ll see in the results below.
http-equiv
adoption
Of the 17,389,897 websites in HTTP Archive’s June 2023 crawl, 11,722,086—67%— of them contain an http-equiv
meta tag. That’s a huge proportion of the web, on par with a behemoth third party resource like Google Analytics.
|
http-equiv
meta tag (red). Each pixel represents 100 websites.Let’s dig deeper and see what the most popular http-equiv
keywords are:
Rank | Keyword | Sites | Standard | Conforming |
---|---|---|---|---|
1 | x-ua-compatible | 6,469,282 | ✅ | ✅ |
2 | content-type | 4,570,525 | ✅ | ✅ |
3 | origin-trial | 4,136,699 | ❌ | ❌ |
4 | content-language | 487,072 | ✅ | ❌ |
5 | cache-control | 441,570 | ❌ | ❌ |
6 | etag | 438,906 | ❌ | ❌ |
7 | x-wix-published-version | 438,666 | ❌ | ❌ |
8 | x-wix-application-instance-id | 438,665 | ❌ | ❌ |
9 | x-wix-meta-site-id | 438,665 | ❌ | ❌ |
10 | pragma | 390,770 | ❌ | ❌ |
11 | expires | 387,003 | ❌ | ❌ |
12 | accept-ch | 242,017 | ❌ | ❌ |
13 | content-style-type | 237,535 | ❌ | ❌ |
14 | x-dns-prefetch-control | 201,511 | ❌ | ❌ |
15 | content-script-type | 197,005 | ❌ | ❌ |
16 | imagetoolbar | 137,625 | ❌ | ❌ |
17 | content-security-policy | 100,552 | ✅ | ✅ |
18 | cleartype | 99,083 | ❌ | ❌ |
19 | refresh | 35,880 | ✅ | ✅ |
20 | keywords | 30,526 | ❌ | ❌ |
21 | last-modified | 19,515 | ❌ | ❌ |
22 | page-enter | 13,937 | ❌ | ❌ |
23 | x-xrds-location | 13,698 | ❌ | ❌ |
24 | description | 12,848 | ❌ | ❌ |
25 | msthemecompatible | 12,797 | ❌ | ❌ |
26 | encoding | 11,908 | ❌ | ❌ |
27 | x-rim-auto-match | 10,837 | ❌ | ❌ |
28 | reply-to | 9,788 | ❌ | ❌ |
29 | language | 9,292 | ❌ | ❌ |
30 | x-frame-options | 7,173 | ❌ | ❌ |
31 | content-location | 7,128 | ❌ | ❌ |
32 | copyright | 6,708 | ❌ | ❌ |
33 | window-target | 5,494 | ❌ | ❌ |
34 | page-exit | 5,335 | ❌ | ❌ |
35 | title | 5,050 | ❌ | ❌ |
36 | x-ua-compatiable | 4,867 | ❌ | ❌ |
37 | pics-label | 4,223 | ❌ | ❌ |
38 | screenorientation | 3,302 | ❌ | ❌ |
39 | mobile-agent | 3,255 | ❌ | ❌ |
40 | audience | 2,863 | ❌ | ❌ |
41 | access-control-allow-origin | 2,693 | ❌ | ❌ |
42 | cache | 2,632 | ❌ | ❌ |
43 | author | 2,535 | ❌ | ❌ |
44 | dc.description | 2,031 | ❌ | ❌ |
45 | robots | 1,757 | ❌ | ❌ |
46 | distribution | 1,592 | ❌ | ❌ |
47 | p3p | 1,553 | ❌ | ❌ |
48 | vary | 1,449 | ❌ | ❌ |
49 | x-webkit-csp | 1,426 | ❌ | ❌ |
50 | revisit-after | 1,335 | ❌ | ❌ |
51 | default-style | 1183 | ✅ | ✅ |
52 | charset | 1124 | ❌ | ❌ |
53 | x-xss-protection | 1017 | ❌ | ❌ |
54 | vw96.object type | 969 | ❌ | ❌ |
55 | x-content-type-options | 951 | ❌ | ❌ |
56 | no-cache | 931 | ❌ | ❌ |
57 | resource-type | 886 | ❌ | ❌ |
58 | referrer-policy | 830 | ❌ | ❌ |
59 | ”content-language” | 808 | ❌ | ❌ |
60 | cache-directive: no-cache | 765 | ❌ | ❌ |
61 | pragma-directive: no-cache | 765 | ❌ | ❌ |
62 | x-content-security-policy | 726 | ❌ | ❌ |
63 | generator | 718 | ❌ | ❌ |
64 | pragram | 706 | ❌ | ❌ |
65 | date | 602 | ❌ | ❌ |
66 | lang | 553 | ❌ | ❌ |
67 | content-encoding | 534 | ❌ | ❌ |
68 | [no value] | 530 | ❌ | ❌ |
69 | x-pjax-version | 513 | ❌ | ❌ |
70 | ”content-type” | 511 | ❌ | ❌ |
71 | strict-transport-security | 505 | ❌ | ❌ |
72 | delegate-ch | 492 | ❌ | ❌ |
73 | apple-mobile-web-app-capable | 475 | ❌ | ❌ |
74 | site-enter | 471 | ❌ | ❌ |
75 | creation-date | 469 | ❌ | ❌ |
76 | content-type-script | 451 | ❌ | ❌ |
77 | expire | 436 | ❌ | ❌ |
78 | keyword | 428 | ❌ | ❌ |
79 | ”x-ua-compatible” | 427 | ❌ | ❌ |
80 | “expires” | 424 | ❌ | ❌ |
81 | onion-location | 416 | ❌ | ❌ |
82 | “cache-control” | 413 | ❌ | ❌ |
83 | set-cookie | 406 | ✅ | ❌ |
84 | “pragma” | 398 | ❌ | ❌ |
85 | rating | 398 | ❌ | ❌ |
86 | x-pjax-js-version | 369 | ❌ | ❌ |
87 | x-pjax-csp-version | 369 | ❌ | ❌ |
88 | x-pjax-css-version | 369 | ❌ | ❌ |
89 | ”cache-control” | 360 | ❌ | ❌ |
90 | accept-encoding | 357 | ❌ | ❌ |
91 | permissions-policy | 356 | ❌ | ❌ |
92 | x-clacks-overhead | 346 | ❌ | ❌ |
93 | theme-color | 336 | ❌ | ❌ |
94 | x-ua-textlayoutmetrics | 333 | ❌ | ❌ |
95 | classification | 328 | ❌ | ❌ |
96 | site-exit | 299 | ❌ | ❌ |
97 | pragma-directive | 294 | ❌ | ❌ |
98 | format-detection | 279 | ❌ | ❌ |
99 | cache-directive | 277 | ❌ | ❌ |
100 | x-creyle-projectid | 274 | ❌ | ❌ |
… | 274 | ❌ | ❌ | |
137 | content-security-policy-report-only | 114 | ❌ | ❌ |
http-equiv
values.(HTTP Archive, June 2023. View full results. View query.)
The emoji in the last two columns indicate whether the value is either standard or conforming. It’s easy to see at a glance that there are a lot of non-standard values that are in use on thousands of websites.
The two most popular values do happen to be standard and conforming. But are they being used correctly? And are they actually necessary?
Let’s explore a few of the most interesting results.
Obsolete keywords
Rank | Keyword | Sites | Standard |
---|---|---|---|
1 |
| 6,469,282 | ✅ |
13 | content-style-type | 237,535 | ❌ |
15 | content-script-type | 197,005 | ❌ |
16 | imagetoolbar | 137,625 | ❌ |
18 | cleartype | 99,083 | ❌ |
22 | page-enter | 13,937 | ❌ |
25 | msthemecompatible | 12,797 | ❌ |
30 | x-frame-options | 7,173 | ❌ |
http-equiv
.(HTTP Archive, June 2023)
Many of the top http-equiv
keywords are bygone features of the Internet Explorer era. Official support for Internet Explorer, whose latest major version (IE 11) was released in October 2013, ended in June 2022.
As of July 2023, IE adoption is at an all-time low. According to Statcounter, 0.2% of web traffic comes from users on IE.
So the question is, if Microsoft won’t even support IE users, why should you?
x-ua-compatible
The most popular keyword is x-ua-compatible
. It’s standard, it conforms, but the spec is quite clear that it should have no effect in modern browsers:
In practice, this pragma encourages Internet Explorer to more closely follow the specifications.
For
meta
elements with anhttp-equiv
attribute in theX-UA-Compatible
state, thecontent
attribute must have a value that is an ASCII case-insensitive match for the string “IE=edge
“.User agents are required to ignore this pragma.
WHATWG HTML spec
The spec also requires that the value be exactly IE=edge
, so let’s see if the sites abide:
Value | Sites | Standard |
---|---|---|
ie=edge | 5,047,117 | ✅ |
ie=edge,chrome=1 | 1,258,286 | ❌ |
ie=emulateie7 | 27,752 | ❌ |
ie=7;ie=9;ie=10;ie=11 | 23,192 | ❌ |
ie=10 | 22,676 | ❌ |
ie=9 | 21,842 | ❌ |
chrome=1 | 18,033 | ❌ |
ie=9,chrome=1 | 12,401 | ❌ |
ie=9;ie=8;ie=7;ie=edge | 12,045 | ❌ |
ie=11 | 10,671 | ❌ |
ie=8 | 9,953 | ❌ |
ie=edge;chrome=1 | 7,666 | ❌ |
content
values for x-ua-compatible
.(HTTP Archive, June 2023. View query.)
The 11 content
values listed above make up 99% of the x-ua-compatible
usage. The most popular one is ie=edge
, which is the only standard value.
What’s the point, though? Are you testing your website in IE? Are you gracefully degrading down to decades-old HTML, CSS, and JavaScript? Is your website even remotely presentable to the 1 out of every 500 users on IE? No, your site is almost certainly not IE-compatible, and this meta tag isn’t a magic wand to make it so.
Modern web pages should not need x-ua-compatible
.
For a more detailed history of x-ua-compatible
, check out Almost (Standards) Doesn’t Count by Jay Hoffmann.
content-style-type
, content-script-type
Everyone knows <script>
means JavaScript and <style>
means CSS. But that wasn’t always the case.
Back in the day when the W3C’s HTML spec ruled the land, they wrote of the necessity to “specify the style sheet language of style information associated with an HTML document.” And similarly, everyone should “specify the default scripting language for all scripts in a document.”
Today though, none of this is necessary. You can write <script>
and all modern browsers knows you mean JavaScript.
You don’t need content-style-type
or content-script-type
in 2023.
x-frame-options
The thirtieth most used http-equiv
keyword is x-frame-options
, used by 7k sites. A site might set this keyword if they wanted to prevent their page from being surreptitiously embedded in a malicious page, for example for clickjacking purposes.
This one is obsoleted by the frame-ancestors
directive of the Content Security Policy API (CSP), which provides much more flexibility for security controls. The CSP directive is supported by all modern browsers, so there’s really no reason to continue using x-frame-options
.
Use CSP instead of x-frame-options
.
content-type
The next most popular keyword is content-type
, which, not to be confused with the script- and style-specific keywords above, is an alias for the charset
meta tag.
content-type
is used to declare the document’s character encoding and it’s used by about 4.6 million websites.
Note that the spec requires that pages must not contain both a http-equiv=content-type
meta tag and a charset
meta tag.
<meta
http-equiv="content-type"
content="text/html; charset=utf-8">
<meta
charset="utf-8">
Code language: HTML, XML (xml)
In other words, these two elements do exactly the same thing, but it’s invalid to have both of them.
Content-Type: text/html; charset=utf-8
Code language: HTTP (http)
Don’t forget about the Content-Type
HTTP header, which is yet another way to declare the character encoding of the document (among other things). The spec is unclear if it’s also invalid to have both the header and the http-equiv
versions, but I’m guessing it’s discouraged.
And there’s one other idiosyncrasy of the charset
and content-type
meta tags, which is that they must be included in the first 1024 bytes of the document. For example, this is one of the reasons why capo.js assigns the highest possible weight to charset
meta tags, otherwise a late-discovered character encoding could screw up the parsing, causing the browser to have to start over.
Content-Type | charset | http-equiv | Sites | Valid |
---|---|---|---|---|
✔️ | ✔️ | ✔️ | 801,404 | ❌ |
✔️ | ✔️ | 11,382,946 | ❓ | |
✔️ | ✔️ | 3,115,266 | ❓ | |
✔️ | 727,032 | ✅ | ||
✔️ | ✔️ | 98,656 | ❌ | |
✔️ | 1,228,396 | ✅ | ||
✔️ | 731,908 | ✅ | ||
223,932 | ❌ | |||
16,026,648 | 13,511,402 | 4,747,234 | 18,309,540 |
(HTTP Archive, June 2023. View query.)
A few things surprised me about these results:
- Despite so many pages using the
http-equiv=content-type
keyword, it’s a fraction of the usage that its alternatives get:Content-Type
is 3.4x more popular, andcharset
is 2.8x more popular. - Most pages (79%) are in this ambiguous validity zone denoted by the ❓, having both a
Content-Type
header and either of the meta character encoding declarations. - 1 in 20 pages are clearly invalid and redundantly declare both the
charset
andhttp-equiv
, or nothing at all.
Given the popularity of the alternatives, and the validation risks associated with the http-equiv
approach—again, I have to ask—what’s the point? For the relatively few sites (732k or 4%) that wouldn’t have otherwise declared a content encoding, they’d be better off going with the alternatives.
The Content-Type
header avoids the requirement for meta tags to be in the first 1024 bytes, and the charset
declaration is much more concise than http-equiv
. There’s no real advantage to using the content-type
keyword.
All HTML pages should set a character encoding. Prefer the Content-Type
HTTP header first, otherwise use the charset
meta tag in the first 1024 bytes.
origin-trial
The third most popular value is used by 4.1 million pages and it’s technically non-standard according to the spec: origin-trial
. I recently wrote a post called Origin trials and tribulations, which explores how they’re—often incorrectly—used to enable experimental web platform features. I recommend checking that out for a closer look at the stats behind individual origin trials.
The way they usually work is for a developer to sign up to use a particular experimental feature directly with a browser like Chrome, Edge, or Firefox. The browser gives the developer a token, which they serve from their website as a way to instruct the browser to enable that feature. The token can be served in one of two ways:
- As an HTTP header:
Origin-Trial: [token]
Code language: HTTP (http)
- As a meta tag:
<meta
http-equiv="origin-trial"
content="[token]">
Code language: HTML, XML (xml)
Firefox began supporting origin trials in early 2022. Until then, only Chromium browsers had supported it. And for that reason, the WHATWG hadn’t considered adding it to the HTML spec. However, while researching this post and learning about the status of the origin-trial
value, I left a comment on the spec issue recommending that they reconsider it. Now, having the support of at least two implementers, it meets the criteria and it sounds like they’re open to adding it.
Standardizing origin-trial
is a good idea. Unlike the previous http-equiv
keywords that we’ve looked at so far, in some cases it’s actually necessary to declare the origin trial token in the markup as opposed to the HTTP header equivalent. Third party origin trials can only be declared by dynamically injecting the meta tag into the main document. And, as discussed in my last blog post, third parties are responsible for 99% of origin trial usage.
Use either the Origin-Trial
HTTP header or the origin-trial
meta tag, whichever is more convenient. You must use the meta tag if you’re injecting a third party token into a page.
Cache headers
The http-equiv
keywords ranked 5, 6, 10, 11, and 21 are in a category of cache control headers.
Rank | Keyword | Sites | Standard |
---|---|---|---|
5 |
| 441,570 | ❌ |
6 | etag | 438,906 | ❌ |
10 | pragma | 390,770 | ❌ |
11 | expires | 387,003 | ❌ |
21 | last-modified | 19,515 | ❌ |
http-equiv
.(HTTP Archive, June 2023)
I’m going to skip over describing what each header does. See Prevent unnecessary network requests with the HTTP Cache for a broader overview of caching headers and strategies.
Recall that Fielding specifically called out last-modified
as an example of what not to do. Servers are better able to tell when a file was last modified or what the current time is, and developers should definitely not be hard-coding those things in their HTML.
We’ve already established that the original intent of http-equiv
was for servers to read the HTML and respond with the corresponding headers, so what’s wrong with declaring something like a cache-control
policy in the HTML? Other than being non-standard, I’m not even sure if servers actually behave like that or if they ever did. It wouldn’t be a great idea to have to parse the HTML on the server, due to the complexity and performance. Also, if there are proxies between the origin server and the client, they too would need to support that behavior.
We’ve just seen an example of a keyword that is non-standard, yet browsers support it anyway: origin-trial
. So maybe browsers support caching headers too? Spoiler: they don’t.
Amusingly, even the all-knowing AI gets this wrong:
None of these are standard http-equiv
keywords and as far as I know all modern browsers ignore them.
Use HTTP headers for cache directives, not http-equiv
.
Wix metadata
The keywords ranked 7–9 are all prefixed with x-wix
. And given that all three keywords are found on about 439k sites within a margin of ±1 site, I think it’s a safe bet that these are all set by the Wix CMS.
Rank | Keyword | Sites |
---|---|---|
7 | x-wix-published-version | 438,666 |
8 | x-wix-application-instance-id | 438,665 |
9 | x-wix-meta-site-id | 438,665 |
http-equiv
.(HTTP Archive, June 2023)
Just to be sure, the number of Wix websites in the dataset is very close, at 445,164 sites (view query), and the HTML on the Wix blog confirms this theory:
<meta
http-equiv="X-Wix-Meta-Site-Id"
content="058b1f4f-09cf-426f-bb00-cec48b9da4b0">
<meta
http-equiv="X-Wix-Application-Instance-Id"
content="7dbdad6e-27ef-44ac-8270-48f414db3dc8">
<meta
http-equiv="X-Wix-Published-Version"
content="3834"/>
<meta
http-equiv="etag"
content="bug"/>
<meta
http-equiv="X-UA-Compatible"
content="IE=edge">
Code language: HTML, XML (xml)
As an added bonus, it even looks like Wix is responsible for 99.9% of the high etag
usage and some (6.8%) of the x-ua-compatible
usage.
So, is this “valid” HTML?
No, definitely not.
Is there any harm to it? Well, the default behavior for a browser that doesn’t recognize an http-equiv
value is to ignore it, so no these are harmless.
But isn’t there a more semantic way to set metadata like these? Yes! It’s the <meta name=generator>
tag. Wix pages already have one of these, and it looks like this:
<meta
name="generator"
content="Wix.com Website Builder">
Code language: HTML, XML (xml)
There’s nothing wrong with having multiple generator
tags so it’d be more appropriate for Wix to use those instead of http-equiv
. That said, it’s needless work and there’s no real benefit to making this switch other than technical correctness, but hey that counts for something!
Use generator
meta tags for page metadata, not http-equiv
.
content-language
The eighth most popular http-equiv
keyword is content-language
, found on 487k sites. The HTML spec considers this keyword to be non-conforming and it recommends using the lang
attribute instead.
The important thing to know about the lang
attribute is that it can influence the UI of the page: fonts, pronunciation, dictionaries, date pickers, etc. For accessibility, the WCAG explicitly requires that all pages declare the document language, primarily using the lang
attribute. Setting the lang
attribute is one of those core web dev best practices, and you’ll find it baked into projects like HTML5 Boilerplate and create-react-app.
Curiously, the spec also includes this note:
This pragma is almost, but not quite, entirely unlike the HTTP
Content-Language
header of the same name.
As best I can tell, the semantic distinction between these two values is that the Content-Language
header indicates what language the reader is expected to speak, while the content-language
pragma indicates what language the page is written in.
Clients can also use the Accept-Language
header to politely ask for the content in their preferred language(s). The server will ideally take the client’s preferences into consideration when serving the response, as indicated by the Content-Language
header.
So to boil it down:
- If your web page is meant to be read by everyone, you should omit the
Content-Language
header. - If you serve multiple translations of the same resource, you should serve it with the appropriate
Content-Language
header based on the client’sAccept-Language
preference. - You should always set
<html lang>
to whatever language the document is written in. For example, this page is set to<html lang="en-US">
.
The specification for the lang
attribute describes the order of precedence for all three of these directives on a given HTML node:
- The
lang
attribute on the nearest ancestor - The value set by the
http-equiv
meta tag - The value set by HTTP headers
The spec doesn’t explicitly say that the language falls back to the Content-Language
header, just a “higher-level protocol” like HTTP. I’m going to assume that means the Content-Language
header as that’s really the entire point of the http-equiv
attribute.
We know how frequently the http-equiv
keyword is used, so let’s compare that with all of the other ways to set the content language:
Content-Language | lang | http-equiv | Sites | Valid |
---|---|---|---|---|
✔️ | ✔️ | ✔️ | 20,730 | ❓ |
✔️ | ✔️ | 1,573,439 | ✅ | |
✔️ | ✔️ | 7,009 | ❌ | |
✔️ | 107,572 | ❌ | ||
✔️ | ✔️ | 300,890 | ❓ | |
✔️ | 12,913,992 | ✅ | ||
✔️ | 172,239 | ❌ | ||
2,910,087 | ❌ | |||
1,708,750 | 14,809,051 | 500,868 | 18,005,958 |
(HTTP Archive, June 2023. View query.)
Most sites (14.8 million) opt to declare the content language by setting the lang
attribute. And the most popular combination of the three is the lang
attribute alone (12.9 million).
The second most popular combination of directives is to set no directives at all, as seen on 2.9 million sites. While it’s fine to omit the Content-Language
header, all documents should set a lang
attribute at a minimum. So this case is clearly invalid, but unlike the content-type
keyword, the spec isn’t as clear if it’s invalid for both lang
and content-language
to be set on the same document.
Don’t use http-equiv=content-language
. The spec recommends using the lang
attribute instead. If you need to support a resource in multiple languages, negotiate using the Accept-Language
request header and the Content-Language
response header to serve it in the client’s preferred language.
Client hints
The keywords ranked 12 and 72 are accept-ch
and delegate-ch
. They’re part of the Client Hints API available in Chromium-based browsers. You can learn more about the performance and privacy benefits of client hints.
This API allows browsers to provide servers with specific, opt-in information about the client like device capabilities or network conditions, which servers can use to adapt their content delivery, with headers like accept-ch
and delegate-ch
controlling the negotiation and delegation of these hints, respectively.
Rank | Keyword | Sites |
---|---|---|
12 | accept-ch | 242,017 |
72 | delegate-ch | 492 |
http-equiv
.(HTTP Archive, June 2023)
While these http-equiv
keywords are not standard as far as the HTML spec is concerned, the Chromium engine alone does support them.
As I was browsing through the list of sites that use accept-ch
, I noticed that a lot of them are hosted on the Squarespace domain. Given that this keyword depends entirely on server support, it makes sense that CMS hosts would be among its biggest power users. It turns out that sites hosted by Squarespace account for 97% of accept-ch
usage, specifically its http-equiv
adoption.
So how are these (mostly Squarespace) sites actually using it? What policy directives are they promoting to clients?
Directive | Type | Sites | Valid |
---|---|---|---|
sec-ch-ua-platform-version | user agent | 235,454 | ✅ |
sec-ch-ua-model | user agent | 235,442 | ✅ |
dpr | device | 6,073 | ✅ |
width | device | 5,959 | ✅ |
viewport-width | device | 5,724 | ✅ |
device-memory | device | 511 | ✅ |
downlink | network | 417 | ✅ |
save-data | network | 295 | ✅ |
ect | network | 289 | ✅ |
rtt | network | 228 | ✅ |
sec-ch-ua-platform | user agent | 139 | ✅ |
sec-ch-ua | user agent | 120 | ✅ |
sec-ch-ua-mobile | user agent | 111 | ✅ |
sec-ch-ua-full-version-list | user agent | 107 | ✅ |
sec-ch-ua-arch | user agent | 15 | ✅ |
sec-ch-ua-bitness | user agent | 10 | ✅ |
sec-ch-ua-full-version | user agent | 9 | ✅ |
sec-ch-prefers-color-scheme | media | 4 | ✅ |
sec-ch-prefers-contrast | media | 1 | ✅ |
sec-ch-prefers-reduced-motion | media | 1 | ✅ |
sec-ch-prefers-reduced-transparency | media | 1 | ✅ |
sec-ch-forced-colors | media | 1 | ✅ |
sec-ch-prefers-reduced-data | media | 0 | ✅ |
accept-ch
directives.(HTTP Archive, June 2023. View query.)
Note that “valid” is used loosely here to mean that they’re accepted by browsers that support client hints. They’re not necessarily supported from the HTML spec’s point of view. Also, keep in mind that these stats don’t take HTTP header adoption into account—these are only the sites that set it in http-equiv
specifically. Feel free to remix the query if you’re interested in HTTP header adoption.
It seems like the 97% of Squarespace accept-ch
usage is for two things: sec-ch-ua-platform-version
and sec-ch-ua-model
. These are part of the User-Agent Client Hints expansion pack that provide more secure access to the UA info. The rest of the UA directives are used much less frequently.
The second most popular pack of directives is the Device Client Hints, including: dpr
, width
, viewport-width
, and device-memory
. The first three are used on about 6k sites while device-memory
is used almost as much as the third most popular pack, Network Client Hints. These include downlink
, save-data
, ect
, and rtt
.
The usage patterns of device and network hints kind of makes sense if you think about the way these would be used. The three most popular device hints are all the visual ones that would be useful for responsive design. The network hints plus device-memory
are useful for performance, either as a diagnostic for analytics or as a predicate for serving lighter-weight content.
The least popular category are the user preference media features. The most popular directive in this category is sec-ch-prefers-color-scheme
, which gives websites the ability to serve pages in dark mode if that’s what the user prefers. I think it’s cool that this doesn’t have to be “figured out” by CSS on the client side, but I’m interested to see some real-world examples of it to understand how much more performance or simplicity it provides.
So now that we’ve seen how the http-equiv
keywords are used, are they even needed at all? The purpose of client hints is for the client to give the server information that it can use to deliver a better experience. For that to work, the server would need to process the client’s request header detailing its hints and serve the alternate version of the page. So it seems to me that any server capable of handling client hints ought to be capable of setting these as response headers as opposed to http-equiv
meta tags.
Maybe there are some valid reasons to dynamically inject these meta tags into the page? Well, as a security restriction, the WICG draft specifically calls out delegate-ch
as being ineffective when injected by JavaScript. It’s possible there are use cases for injecting accept-ch
but none come to mind.
If your server can handle client hints, it can declare its support using HTTP headers rather than relying on http-equiv
.
x-dns-prefetch-control
x-dns-prefetch-control
The fourteenth most popular http-equiv
keyword is x-dns-prefetch-control
, which is not yet standardized. It’s used by about 200k sites and it serves one and only one purpose: turn off the browser’s default behavior to speculatively prefetch the DNS record of URLs that the client is likely to need soon.
This behavior is very good for performance, especially for mobile users. It’s similar to the developer-controlled way of doing DNS prefetching, using another header tag:
<link
rel="dns-prefetch"
href="https://www.example.com">
Code language: HTML, XML (xml)
The key difference is that the browser will perform additional prefetches automatically to further improve the user’s loading performance.
There might be privacy concerns with prefetching domains that the user hasn’t actually indicated an intention to visit yet. And there might also be some performance concerns with resolving domain names that never get visited, for example maybe for a site that includes lots of links that users rarely follow. If a site owner chooses, they can disable this behavior by setting x-dns-prefetch-control
to off
.
It doesn’t make sense to set it to anything else, because the default behavior is for it to be on. Can you guess what the data says?
Value | Sites | Valid |
---|---|---|
on | 199,095 | ✅ |
off | 1,688 | ✅ |
[null] | 736 | ❌ |
TRUE | 2 | ❌ |
[empty string] | 1 | ❌ |
yes | 1 | ❌ |
text/html;charset=utf-8 | 1 | ❌ |
no | 1 | ❌ |
ie=edge,chrome=1 | 1 | ❌ |
x-dns-prefetch-control
.(HTTP Archive, June 2023. View query.)
Note that “valid” is used loosely here to mean that the values are accepted by browsers that support x-dns-prefetch-control
.
99% of sites that set this keyword are doing so unnecessarily by setting the value to on
. Again, that’s the default!
1% (1,688) sites are actually disabling DNS prefetching.
The rest of the sites are passing garbage values.
Do sites need this keyword? The vast majority of them certainly don’t. Even the few that do intentionally disable the feature could assumedly set the corresponding HTTP response header.
Only use x-dns-prefetch-control
if you have security or performance concerns with built-in DNS prefetching, in which case be sure to set the value to off
.
content-security-policy
Skipping to number 17, the content-security-policy
(CSP) meta tag is found on 101k sites. CSP helps to lock down all of the ways external content can be added to a page, which can make it vulnerable to attacks like cross-site scripting.
CSP is great and everyone should use a policy that works for them. The CSP meta tag is standard, it’s conforming, but I would strongly discourage anyone from using it.
On a good day, placing a script tag before a meta CSP would disable Chrome’s preload scanner. Alas, today is not a good day. While researching how all of this works, it turns out that I discovered a bug in Chrome that disables the preload scanner for all meta CSPs everywhere.
Even after the bug gets fixed, why risk it? The convenience of a meta tag is not worth the liability of losing one of the best performance optimizations that you can get for free. The Content-Security-Policy
HTTP header is a much safer option that behaves exactly the same way, without the performance risks.
Don’t use content-security-policy
. Use the HTTP header instead.
content-security-policy-report-only
content-security-policy-report-only
I know this is a top 100 list, but I had to tack number 137 on the end just to show how often this related value is used. Not only is the content-security-policy-report-only
value non-standard, but Chromium browsers will actually warn you if you try to use it. I guess that’s been helpful to drive down adoption; it’s set on only 114 websites.
This is a nice reminder that http-equiv
is not for arbitrary HTTP headers, despite what you might think. Even though the CSP header is a perfectly standard value for http-equiv
, its friend the CSP Report-Only header is unsupported. Go figure!
The CSP
—MDNreport-to
directive should be used with this header, otherwise this header will be an expensive no-op machine.
After taking a closer look at how sites are trying (and failing) to use this header, it actually looks like in every single case they mixed it up for the CSP header. There’s one thing that distinguishes this from the CSP header, and that’s the report-to
directive. When it’s set properly as an HTTP header, the reporting API will check resources against the enclosed policy and report violations to a given URL. “Otherwise,” as MDN brilliantly puts it, “this header will be an expensive no-op machine.” All 114 of the sites that use the –report-only
meta tag omit the necessary report-to
directive.
Don’t use content-security-policy-report-only
. Use the HTTP header instead.
refresh
The next value on the list is refresh
, found on only 28k sites. Get this—it refreshes the page 🤯
An example of this that you might be familiar with is the WebPageTest loading screen as a page is being tested:
<noscript>
<meta
http-equiv="refresh"
content="30">
</noscript>
Code language: HTML, XML (xml)
The noscript
wrapper means that users with JavaScript enabled would get a less disruptive progress update UX. For all others with JavaScript disabled, this ancient directive is the only other way I can think of to get the page to automatically refresh on an interval.
And that’s the big downside to refresh
: it’s disruptive. So much so that its redirection powers (refreshing to a different URL, technically) are discouraged by accessibility groups.
Here are some fun facts about how refresh
is used in the wild:
- 31% of sites that use
refresh
are using it for redirection. - The most popular redirect timeout is 5 seconds, used by 18% of sites that redirect.
- The most popular refresh timeout is 1800 seconds (30 minutes), used by 13% of sites that refresh.
- 4% of sites that use
refresh
don’t set a valid timeout value at all. - The largest timeout is 30,000,000,000,000,000,000,000,000,000,000 seconds.
To put that in perspective: you could visit this website and buy a Powerball ticket for every second you wait. And every time you hit the jackpot, you fill out a March Madness bracket. By the time the page refreshes, you will have correctly predicted 11,131 brackets.
There are much better alternatives for redirection than using the refresh
directive, like the HTTP 3xx status codes, as recommended by the Google Search docs and WCAG. And unless you really need to fall back to the primitive behavior like in WebPageTest’s case, an asynchronous JavaScript solution would be much less disruptive.
Only use refresh
when you really need to reload the current page and there are no other, less disruptive options available.
default-style
Skipping ahead again, we have default-style
at number 50, found on just 1k sites.
According to the CSSOM spec:
A preferred CSS style sheet set name is a concept to determine which CSS style sheets need to have their disabled flag unset. Initially its value is the empty string.
How does a stylesheet get disabled? That’s a side-effect of alternative stylesheets. If you set the rel
attribute of your link
tag to "alternate stylesheet"
, it’s disabled by default.
So how does a stylesheet get re-enabled? default-style
, for one! You can kind of think of it like the way label
and input
elements relate to each other by way of the for
/id
attributes:
<label for="name">
What's your name?
</label>
<input id="name">
Code language: HTML, XML (xml)
The way to indicate a preferred stylesheet is to set the meta tag’s content
attribute to the value of the stylesheet’s title
attribute:
<meta
http-equiv="default-style"
content="green">
<link
rel="alternate stylesheet"
title="green"
href="green.css">
<link
rel="alternate stylesheet"
title="red"
href="red.css">
Code language: HTML, XML (xml)
If you’d like to try it out for yourself, I built a little demo.
As you can see, this keyword only works so long as the content
attribute refers to a valid title
attribute value. How often do you reckon that happens?
Value | Pages | Valid |
---|---|---|
text/css | 176 | ❌ |
au normal contrast | 109 | ✅ |
styles_portal | 54 | ✅ |
ie=edge | 53 | ❌ |
default | 36 | ✅ |
text/javascript | 26 | ❌ |
main_style | 23 | ✅ |
text/html;charset=utf-8 | 14 | ❌ |
style.css | 13 | ❌ |
toppage | 8 | ✅ |
(HTTP Archive, June 2023. View query.)
By my count, 31% of the values set in the default-style
tag are invalid. For example, take the most popular value: text/css
. That’s a perfectly valid CSS content type, but I highly doubt someone set the title
attribute of their stylesheet to it.
The next one that jumps out at me is ie=edge
. Look familiar? That’s the top value of the x-ua-compatible
pragma. Not valid here.
<meta
http-equiv="default-style"
content="the document's preferred stylesheet">
Code language: HTML, XML (xml)
Two sites took the spec a bit too literally 🤣
Update: this code seems to have been lifted directly from W3Schools!
There are a couple of other content types on the list: text/javascript
and text/html;charset=utf-8
. Why would anyone set the title
attribute of a stylesheet to the MIME types for JavaScript and HTML? Nope, not valid either.
The last one that caught my eye is style.css
. It seems they mistakenly set the value to the href
of the stylesheet, not the title
. So close. The intent was there, but not valid.
This might be unique to Chrome’s implementation, but when I test this out, I see a flash of unstyled content. The page renders with the default styles (black text), then the preferred styles (red text) kick in shortly after. It’s not a great user experience.
This is a tiny demo, so I can only imagine the flash being even more jarring on sites that use this for real.
So, is it worth it? I don’t think so. I just don’t see enough value added by this feature that you couldn’t get with modern CSS anyway. For example, MDN recommends using @media
features instead.
Use modern CSS instead of default-style
for alternative stylesheets.
set-cookie
The last value of interest in this list is set-cookie
. It’s standard—it’s in the spec—but it’s non-conforming. A mere 317 sites use it.
The spec doesn’t have too much to say about it:
This pragma is non-conforming and has no effect.
User agents are required to ignore this pragma.
That’s it.
If you need to set cookies, only use the HTTP response header.
Chromium behavior
Šime Vidas responded to a related tweet of mine about standard http-equiv
keywords, asking what non-standard keywords browsers do support.
To answer to Šime’s question, I had to trawl through the Chromium source. Here are the implemented keywords that I found:
Keyword | Supported | Standard |
---|---|---|
default-style | ✅ | ✅ |
refresh | ✅ | ✅ |
set-cookie | ❌ | ✅ |
content-language | ✅ | ✅ |
x-dns-prefetch-control | ✅ | ❌ |
x-frame-options | ❌ | ❌ |
accept-ch | ✅ | ❌ |
delegate-ch | ✅ | ❌ |
content-security-policy | ✅ | ✅ |
content-security-policy-report-only | ❌ | ❌ |
origin-trial | ✅ | ❌ |
content-type (source) | ✅ | ✅ |
http-equiv
keywords implemented by Chromium browsers and their levels of support and standardization. (Source)So the supported, non-standard keywords are: x-dns-prefetch-control
, accept-ch
, delegate-ch
, and origin-trial
.
It’s interesting to see that some keywords are implemented, but only to warn developers when found:
set-cookie
triggers an errorx-frame-options
triggers an errorcontent-security-policy-report-only
logs a friendlier message
Chromium is not the only engine, and other browsers may handle http-equiv
keywords differently. If you’d like to contribute keyword support for other browsers, please reach out in the comments, and I’d be happy to include it here.
Cheatsheet
If you take away one thing from this post, have it be this cheatsheet with my condensed recommendations for each keyword. You can refer to this list if you’re ever unsure whether you need a given http-equiv
meta tag.
Keyword | Recommendation |
---|---|
accept-ch | ❌ Use the Accept-CH HTTP header instead |
cache-control | ❌ Use the Cache-Control HTTP header instead |
cleartype | ❌ You don’t need it |
content-language | ❌ Use the lang attribute instead |
content-security-policy | ❌ Use the Content-Security-Policy HTTP header instead |
content-security-policy-report-only | ❌ Use the Content-Security-Policy-Report-Only HTTP header instead |
content-script-type | ❌ You don’t need it |
content-style-type | ❌ You don’t need it |
content-type | ❌ Use the Content-Type HTTP header instead, or the charset meta tag in the first 1024 bytes |
default-style | ❌ Use modern CSS instead |
delegate-ch | ❌ Use the Delegate-CH HTTP header instead |
etag | ❌ Use the ETag HTTP header instead |
expires | ❌ Use the Expires HTTP header instead |
imagetoolbar | ❌ You don’t need it |
last-modified | ❌ Use the Last-Modified HTTP header instead |
msthemecompatible | ❌ You don’t need it |
origin-trial | ✅ Prefer the HTTP header if you can, otherwise the meta tag is fine |
page-enter | ❌ You don’t need it |
pragma | ❌ Use the Cache-Control HTTP header instead |
refresh | ❌ Use HTTP 3xx for redirects ✅ Use it for reloads as a noscript fallback |
set-cookie | ❌ Use the Set-Cookie HTTP header instead |
x-dns-prefetch-control | ✅ Use it if you have legitimate security or performance concerns |
x-frame-options | ❌ Use the Content-Security-Policy HTTP header instead |
x-ua-compatible | ❌ You don’t need it |
x-wix-application-instance-id | ❌ Use generator meta tags instead |
x-wix-meta-site-id | ❌ Use generator meta tags instead |
x-wix-published-version | ❌ Use generator meta tags instead |
http-equiv
keywords explored in this post and my recommended actions.If you don’t see the keyword you’re looking for in this list, chances are you’re not gonna need it. You’re almost always better off setting the HTTP header directly where possible. But just to be sure, test it out in a modern browser. You can also check the HTML spec—it’s a rapidly evolving living standard—or your favorite web developer documentation site for more info.
Based on all of my reading of the spec, analysis of the data, and interpretation of the Chromium source code, it’s clear to me that there’s a lot of unnecessary usage of the http-equiv
meta tag. I hope you’re convinced that you probably don’t need most of these tags anymore, and you can use this new knowledge to write cleaner, more modern HTML.
Please reach out to me in the comments if there’s anything in this post that I can improve. I’m eager to continue building my understanding of how this all works and I’d be happy to update this post accordingly.
Appendix: Methodology
For all queries, I used the June 2023 crawl of the public HTTP Archive dataset. The queries do not distinguish between client type or root/secondary pages. For example, if http-equiv
is used only on a site’s mobile secondary page, I count that site as using http-equiv
. If a site uses it on all four combinations of desktop/mobile and root/secondary pages, the site is counted once towards the overall stats.
Popular website builders like the WordPress CMS make up about a third of the dataset and have a disproportionate effect on the stats. This is ok, as I’m trying to measure adoption across the whole web, regardless of whether the site owner added the tags themselves or their CMS did it.
Warning: these queries process between 6 and 14 TB each. Run at your own expense.
Querying http-equiv
adoption
Show query
WITH meta AS (
SELECT
root_page,
LOWER(JSON_VALUE(meta, '$.http-equiv')) AS http_equiv
FROM
`httparchive.all.pages`
LEFT JOIN
UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
WHERE
date = '2023-06-01'
)
SELECT
COUNT(DISTINCT IF(http_equiv IS NOT NULL, root_page, NULL)) AS http_equiv,
COUNT(DISTINCT root_page) AS total,
COUNT(DISTINCT IF(http_equiv IS NOT NULL, root_page, NULL)) / COUNT(DISTINCT root_page) AS pct
FROM
meta
Code language: SQL (Structured Query Language) (sql)
Querying the top 100 http-equiv
values
Show query
CREATE TEMP FUNCTION IS_VALID(value STRING) RETURNS BOOL AS (
value IN (
'content-language',
'content-type',
'default-style',
'refresh',
'set-cookie',
'x-ua-compatible',
'content-security-policy'
)
);
CREATE TEMP FUNCTION IS_CONFORMING(value STRING) RETURNS BOOL AS (
value IN (
'content-type',
'default-style',
'refresh',
'x-ua-compatible',
'content-security-policy'
)
);
WITH meta AS (
SELECT
root_page,
LOWER(JSON_VALUE(meta, '$.http-equiv')) AS http_equiv
FROM
`httparchive.all.pages`,
UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
WHERE
date = '2023-06-01'
)
SELECT
ROW_NUMBER() OVER (ORDER BY COUNT(DISTINCT root_page) DESC) AS rank,
http_equiv AS value,
COUNT(DISTINCT root_page) AS sites,
IF(IS_VALID(http_equiv), '✅', '❌') AS valid,
IF(IS_CONFORMING(http_equiv), '✅', '❌') AS conforming
FROM
meta
WHERE
http_equiv IS NOT NULL
GROUP BY
http_equiv
ORDER BY
sites DESC
LIMIT
100
Code language: SQL (Structured Query Language) (sql)
Querying x-ua-compatible
usage
Show query
WITH meta AS (
SELECT
root_page,
LOWER(JSON_VALUE(meta, '$.http-equiv')) AS http_equiv,
REGEXP_REPLACE(LOWER(JSON_VALUE(meta, '$.content')), r'\s', '') AS content
FROM
`httparchive.all.pages`,
UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
WHERE
date = '2023-06-01'
)
SELECT
content,
COUNT(DISTINCT root_page) AS sites
FROM
meta
WHERE
http_equiv = 'x-ua-compatible'
GROUP BY
content
ORDER BY
sites DESC
Code language: SQL (Structured Query Language) (sql)
Querying content-type
usage
Show query
CREATE TEMP FUNCTION IS_VALID(header STRING, charset STRING, http_equiv STRING) RETURNS STRING AS (
CASE
WHEN charset IS NOT NULL AND http_equiv IS NOT NULL THEN '❌'
WHEN header IS NULL AND charset IS NULL AND http_equiv IS NULL THEN '❌'
WHEN header IS NOT NULL AND charset IS NULL and http_equiv IS NULL THEN '✅'
WHEN header IS NULL AND charset IS NOT NULL and http_equiv IS NULL THEN '✅'
WHEN header IS NULL AND charset IS NULL and http_equiv IS NOT NULL THEN '✅'
ELSE '❓'
END
);
WITH all_sites AS (
SELECT
rank,
root_page,
page
FROM
`httparchive.all.pages`
WHERE
date = '2023-06-01'
),
meta_charset AS (
SELECT
page,
LOWER(JSON_VALUE(meta, '$.charset')) AS charset
FROM
`httparchive.all.pages`,
UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
WHERE
date = '2023-06-01' AND
LOWER(JSON_VALUE(meta, '$.charset')) IS NOT NULL
),
meta_content_type AS (
SELECT
page,
LOWER(JSON_VALUE(meta, '$.content')) AS content_type
FROM
`httparchive.all.pages`,
UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
WHERE
date = '2023-06-01' AND
LOWER(JSON_VALUE(meta, '$.http-equiv')) = 'content-type' AND
LOWER(JSON_VALUE(meta, '$.content')) IS NOT NULL
),
header AS (
SELECT
page,
LOWER(REGEXP_EXTRACT(header.value, r'(?i)charset=([^;\s]*)')) AS http_content_type
FROM
`httparchive.all.requests`,
UNNEST(response_headers) AS header
WHERE
date = '2023-06-01' AND
is_main_document AND
LOWER(header.name) = 'content-type' AND
REGEXP_CONTAINS(header.value, r'(?i)charset=([^;\s]*)')
)
SELECT
IF(http_content_type IS NOT NULL, '✔️', '') AS has_http_header,
IF(charset IS NOT NULL, '✔️', '') AS has_meta_charset,
IF(content_type IS NOT NULL, '✔️', '') AS has_http_equiv,
COUNT(DISTINCT root_page) AS sites,
IS_VALID(http_content_type, charset, content_type) AS valid
FROM
all_sites
LEFT JOIN
meta_charset
USING
(page)
FULL OUTER JOIN
meta_content_type
USING
(page)
FULL OUTER JOIN
header
USING
(page)
GROUP BY
has_http_header,
has_meta_charset,
has_http_equiv,
valid
ORDER BY
has_http_header DESC,
has_meta_charset DESC,
has_http_equiv DESC
Code language: SQL (Structured Query Language) (sql)
Querying Wix adoption
Show query
SELECT
COUNT(DISTINCT root_page) AS wix_sites
FROM
`httparchive.all.pages`,
UNNEST(technologies) AS t
WHERE
date = '2023-06-01' AND
t.technology = 'Wix'
Code language: SQL (Structured Query Language) (sql)
Querying content-language
usage
Show query
CREATE TEMP FUNCTION IS_VALID(header STRING, lang STRING, http_equiv STRING) RETURNS STRING AS (
CASE
WHEN lang IS NOT NULL AND http_equiv IS NULL THEN '✅'
WHEN lang IS NULL THEN '❌'
ELSE '❓'
END
);
WITH all_sites AS (
SELECT
root_page,
page
FROM
`httparchive.all.pages`
WHERE
date = '2023-06-01'
),
html_lang AS (
SELECT
page,
LOWER(REGEXP_EXTRACT(response_body, r'(?i)<html[^>]*lang=[\'"]?([^\s\'"])')) AS lang
FROM
`httparchive.all.requests`
WHERE
date = '2023-06-01' AND
is_main_document AND
REGEXP_CONTAINS(response_body, r'(?i)<html[^>]*lang=[\'"]?([^\s\'"])')
),
meta_content_language AS (
SELECT
page,
LOWER(JSON_VALUE(meta, '$.content')) AS content_language
FROM
`httparchive.all.pages`,
UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
WHERE
date = '2023-06-01' AND
LOWER(JSON_VALUE(meta, '$.http-equiv')) = 'content-language' AND
LOWER(JSON_VALUE(meta, '$.content')) IS NOT NULL
),
header AS (
SELECT
page,
LOWER(header.value) AS http_content_language
FROM
`httparchive.all.requests`,
UNNEST(response_headers) AS header
WHERE
date = '2023-06-01' AND
is_main_document AND
LOWER(header.name) = 'content-language'
)
SELECT
IF(http_content_language IS NOT NULL, '✔️', '') AS has_http_header,
IF(lang IS NOT NULL, '✔️', '') AS has_html_lang,
IF(content_language IS NOT NULL, '✔️', '') AS has_http_equiv,
COUNT(DISTINCT root_page) AS sites,
IS_VALID(http_content_language, lang, content_language) AS valid
FROM
all_sites
LEFT JOIN
html_lang
USING
(page)
FULL OUTER JOIN
meta_content_language
USING
(page)
FULL OUTER JOIN
header
USING
(page)
GROUP BY
has_http_header,
has_html_lang,
has_http_equiv,
valid
ORDER BY
has_http_header DESC,
has_html_lang DESC,
has_http_equiv DESC
Code language: SQL (Structured Query Language) (sql)
Querying content-security-policy-report-only
usage
Show query
WITH meta AS (
SELECT
rank,
page,
LOWER(JSON_VALUE(meta, '$.http-equiv')) AS http_equiv,
REGEXP_REPLACE(LOWER(JSON_VALUE(meta, '$.content')), r'\s', '') AS content
FROM
`httparchive.all.pages`,
UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
WHERE
date = '2023-06-01'
)
SELECT
rank,
page,
content
FROM
meta
WHERE
http_equiv = 'content-security-policy-report-only'
ORDER BY
rank
Code language: SQL (Structured Query Language) (sql)
Querying refresh
usage
Show query
WITH meta AS (
SELECT
root_page,
LOWER(JSON_VALUE(meta, '$.http-equiv')) AS http_equiv,
REGEXP_REPLACE(LOWER(JSON_VALUE(meta, '$.content')), r'\s', '') AS content
FROM
`httparchive.all.pages`,
UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
WHERE
date = '2023-06-01'
)
SELECT
content,
COUNT(DISTINCT root_page) AS sites
FROM
meta
WHERE
http_equiv = 'refresh'
GROUP BY
content
ORDER BY
sites DESC
Code language: SQL (Structured Query Language) (sql)
Querying default-style
usage
Show query
WITH meta AS (
SELECT
rank,
root_page,
LOWER(JSON_VALUE(meta, '$.http-equiv')) AS http_equiv,
REGEXP_REPLACE(LOWER(JSON_VALUE(meta, '$.content')), r';\s+', ';') AS content
FROM
`httparchive.all.pages`,
UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
WHERE
date = '2023-06-01'
)
SELECT DISTINCT
rank,
root_page,
content
FROM
meta
WHERE
http_equiv = 'default-style'
ORDER BY
rank
Code language: SQL (Structured Query Language) (sql)
Querying Squarespace accept-ch
adoption
Show query
WITH accept_ch AS (
SELECT
root_page
FROM
`httparchive.scratchspace.http_equiv`
WHERE
http_equiv = 'accept-ch'
),
ss AS (
SELECT
root_page
FROM
`httparchive.all.pages`,
UNNEST(technologies) AS t
WHERE
date = '2023-06-01' AND
t.technology = 'Squarespace'
)
SELECT
COUNT(DISTINCT IF(ss.root_page IS NOT NULL, root_page, NULL)) / COUNT(DISTINCT root_page) AS pct_ss
FROM
accept_ch
LEFT JOIN
ss
USING
(root_page)
Code language: SQL (Structured Query Language) (sql)
Querying accept-ch
usage
Show query
SELECT
TRIM(directive) AS directive,
COUNT(DISTINCT root_page) AS sites
FROM
`httparchive.scratchspace.http_equiv`,
UNNEST(SPLIT(content)) AS directive
WHERE
http_equiv = 'accept-ch'
GROUP BY
directive
ORDER BY
sites DESC
Code language: SQL (Structured Query Language) (sql)
Querying x-dns-prefetch-control
usage
Show query
SELECT
content,
COUNT(DISTINCT root_page) AS sites
FROM
`httparchive.scratchspace.http_equiv`
WHERE
http_equiv = 'x-dns-prefetch-control'
GROUP BY
content
ORDER BY
sites DESC
Code language: SQL (Structured Query Language) (sql)
Leave a Reply