HTML tag naming rules

Question:

The question may sound strange, so let me explain.

You can write tags that are defined by the standard, for example: <html>, <head>, <body>, <div>, <table>, <span>, etc.

In principle, you can also write your own tags that are not defined by the standard, for example: <screen>, <display>, <place>, etc.

However, not every character is allowed in "our" tags. For example, <0car>, <-car> and <машина> are not treated as tags: the browser turns the opening tag into the text "<your_tag>" and the closing tag into the comment <!--your_tag-->.

Going back to tags that are accepted correctly: you can also write <screen100>, <car-> or <car-моя>. There are no errors, and the browser shows them as elements in its DOM tree.

I would like to find the rules that explain the behavior in my examples.


The resource I searched was https://html.spec.whatwg.org

However, it is not clear to me why the second paragraph of https://html.spec.whatwg.org/multipage/syntax.html#start-tags does not fully describe what the first character must be and what the subsequent ones may be. That paragraph contains a link that leads to a paragraph that reads:

Tags contain a tag name, giving the element's name. HTML elements all have names that only use ASCII alphanumerics. In the HTML syntax, tag names, even those for foreign elements, may be written with any mix of lower- and uppercase letters that, when converted to all-lowercase, matches the element's tag name; tag names are case-insensitive.


But as you may recall, my examples contain more than just ASCII alphanumerics. How should this be understood?

Answer:

The specs say the following:

  • The first character of the tag name must be an ASCII letter; anything else is a parse error. The error behavior is also described: the "<" is emitted as plain text. That is why <машина> is not recognized as a tag.
  • The subsequent characters of the tag name can be anything except the few that end the name: whitespace, the slash / and > . No other special restrictions are imposed on them. That is why <car-моя> is processed without errors.
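
The two rules above can be sketched in a few lines of Python. This is a simplified illustration, not the parser's actual algorithm: the function name scan_start_tag_name is hypothetical, the terminator set is my reading of the spec's "tag name" state, and attributes, end tags, comments, NULL and end-of-file handling are all ignored.

```python
import string
from typing import Optional

# Characters that end a tag name (assumption: tab, LF, FF, space, "/" and ">").
TAG_NAME_TERMINATORS = set("\t\n\f />")


def scan_start_tag_name(markup: str) -> Optional[str]:
    """Return the lowercased tag name if `markup` opens a start tag, else None."""
    if len(markup) < 2 or markup[0] != "<":
        return None
    # Rule 1: the first character must be an ASCII letter.
    # Otherwise the "<" is treated as text and no tag is created.
    if markup[1] not in string.ascii_letters:
        return None
    name_chars = []
    # Rule 2: every following character is accepted until a terminator appears.
    for ch in markup[1:]:
        if ch in TAG_NAME_TERMINATORS:
            break
        # Only ASCII uppercase letters are lowercased; other characters are kept.
        name_chars.append(ch.lower() if ch in string.ascii_uppercase else ch)
    return "".join(name_chars)
```

With this sketch, "<screen100>", "<car->" and "<car-моя>" each yield a tag name, while "<0car>", "<-car>" and "<машина>" yield None, which matches the behavior described in the question.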

Further on, in the definition of HTML elements, we read that their names consist only of ASCII letters and digits. The same paragraph also mentions other kinds of elements which, as the text suggests, follow their own rules. Quote:

In the HTML syntax, tag names, even those for foreign elements, may be written with any mix of lower- and uppercase letters that, when converted to all-lowercase, matches the element's tag name; tag names are case-insensitive.

Note the use of "may", which grants permission, rather than "shall" or "should". One kind of these other elements is custom elements. For them there is a more specific description of the allowed characters:

 "-" | "." | [0-9] | "_" | [a-z] | #xB7 | [#xC0-#xD6] | [#xD8-#xF6] 
| [#xF8-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x203F-#x2040] 
| [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] 
| [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

The Cyrillic alphabet occupies the range U+0400 to U+04FF, which falls inside the allowed range [#x37F-#x1FFF].

Therefore, yes: you can use Cyrillic anywhere in a tag name except as the first character. The first character of a tag name must strictly be in the range [a-z].
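
To check a character against the ranges quoted above, a small lookup is enough. The following is a minimal sketch: the names PCEN_RANGES and is_pcen_char are hypothetical, and the table is copied from the quoted production rather than taken from any library.

```python
# Code point ranges copied from the character production quoted above.
PCEN_RANGES = [
    (0x2D, 0x2D),  # "-"
    (0x2E, 0x2E),  # "."
    (0x30, 0x39),  # [0-9]
    (0x5F, 0x5F),  # "_"
    (0x61, 0x7A),  # [a-z]
    (0xB7, 0xB7),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x203F, 0x2040), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]


def is_pcen_char(ch: str) -> bool:
    """Return True if the single character `ch` falls in one of the quoted ranges."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in PCEN_RANGES)
```

Every character of "моя" passes this check, as do "α" and "😍", while an uppercase "A" or a space does not.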

To remove any remaining doubt, the specification gives an example:

Apart from these restrictions, a large variety of names is allowed, to give maximum flexibility for use cases like <math-α> or <emotion-😍>.

This quote states clearly and unambiguously that you can use tags like <math-α> or <emotion-😍>.
