javascript – XSS attacks: What should be used instead of innerHTML / insertAdjacentHTML?

Question:

In which cases is it safe to add HTML as a string, and when should this approach be avoided? And what should be used instead?

Answer:

Historically, template engines worked with markup: they added some data to it and then inserted the result through innerHTML. This was done because parsing HTML in the browser used to be faster than creating elements from JS. I can't say for sure how the two compare in current browsers, but modern frameworks (for example, React and Angular) use createElement. And if you think about it, which should be faster: parsing the markup and then creating the elements, or creating the elements right away? If the obvious answer turns out to be wrong, that is clear room for browser optimization.

In any case, the first thing to consider is correctness, not speed.

Things to think about when working with markup:

  • Is this data supposed to contain markup?
  • Is the data trusted, or can it come from users (including via the URL, from external sources, etc.)?
  • How will changing the markup affect the scripts running on the page?
  • Are we sure that parsing the markup will produce the DOM tree we expect?

Markup expected?

In most cases, the answer to this question is no.

What happens to text if it is output as markup? With ordinary words, nothing. But if special characters happen to appear, they will disappear. For example, we want to display the inequality a<b, but the browser will swallow <b as the start of a tag, and the result will be wrong.

Suppose that a<b, then ...

And this happens even with neutral text that was never meant to harm the site.

User data

What the user enters must not turn into HTML markup without additional processing. There are two places where this can go wrong: on the server, when composing the page code, and on the client, when inserting data into the markup.

On the server, the scope for injection is huge: an attacker can simply write <script>alert(1)</script>, close a couple of extra tags and break the markup of the entire page, try to comment out part of the page, or use CSS to position a link to a phishing site over the logo that normally leads to the main page.

If a script inserts the data into innerHTML, the markup cannot go beyond the bounds of the corresponding element (although the <style> tag still works), and placing a script is a little harder: <img src="/no" onerror="alert(1)"> . But essentially all the same attacks remain possible.

document.querySelector('main').innerHTML = '<img src="/no" onerror="console.log(1)">'
<main></main>

Breaking scripts?

What happens when we change the markup via innerHTML ? All of the markup is parsed anew and new HTML elements are created, even for the part that has not changed. This is obviously inefficient, but there are bigger problems. If scripts attached handlers to some elements, then after the markup is updated the handlers stay attached to the old elements, which are no longer in the DOM tree. Therefore, when adding markup, you should choose insertAdjacentHTML over innerHTML += :

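// innerHTML += re-parses the whole paragraph: the button is recreated and loses this handler
document.getElementById('ih').addEventListener('click', e => {
  e.target.parentElement.innerHTML += "<i></i>"
})

// insertAdjacentHTML parses only the new markup; existing elements keep their handlers
document.getElementById('ia').addEventListener('click', e => {
  e.target.parentElement.insertAdjacentHTML('beforeend', "<i></i>")
})

// appendChild inserts a ready-made element without parsing any markup
document.getElementById('ac').addEventListener('click', e => {
  e.target.parentElement.appendChild(document.createElement('i'))
})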
i {
  display: inline-block;
  background: silver;
  height: 1em;
  width: 1em;
  border-radius: 50%;
  margin-left: 4px;
  vertical-align: middle;
}
<p><button id="ih">innerHTML</button></p>
<p><button id="ia">insertAdjacentHTML</button></p>
<p><button id="ac">appendChild</button></p>

Breaking the markup?

Assigning to innerHTML or manipulating a DOM element does not affect anything outside that element. However, if we build something tricky so that the resulting tree is not valid HTML, then after re-parsing we can be very surprised by the result:

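var oldP = document.querySelector('main p')

// Build an invalid tree through the DOM API: a <p> nested inside another <p>
var newP = document.createElement('p')
newP.textContent = "456"
oldP.appendChild(newP)

document.querySelector('button').addEventListener('click', e => {
  var main = document.querySelector('main')

  console.log(main.innerHTML) // nested paragraphs, as built above
  // Serialize and re-parse: the parser cannot nest <p> inside <p>, so the tree changes
  main.innerHTML = main.innerHTML
  console.log(main.innerHTML) // no longer the same markup
})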
p p { color: blue; }
p + p { color: red; }
<main><p>123</p></main>
<button>Ooops!</button>

When to use markup?

  • We know that we are adding markup, and that it comes from a trusted source.
  • If there is a reason to insert data into the markup, it must be escaped appropriately so that it remains a string and does not become markup with elements.
  • If we want to add markup to an element, we should prefer appending markup to a complete overwrite.
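
The escaping from the second point can be sketched roughly as follows. This is a minimal illustration, not a complete sanitizer: escapeHtml and html are hypothetical helper names, not part of any library, and the sketch only covers data placed in element content or in quoted attribute values.

```javascript
// Hypothetical helper: replaces the characters that are special in HTML
// so that the data is still plain text after the markup is parsed.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
}

function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, ch => HTML_ESCAPES[ch])
}

// Hypothetical tagged template: the literal parts are trusted markup,
// every interpolated value is treated as data and escaped.
function html(strings, ...values) {
  return strings.reduce(
    (result, part, i) => result + escapeHtml(values[i - 1]) + part
  )
}

const userInput = '<img src="/no" onerror="alert(1)">'
const markup = html`<p class="comment">${userInput}</p>`
console.log(markup)
```

With a helper like this, the trusted template and the untrusted data stay visibly separate in the code, which makes it harder to forget the escaping step.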

textContent and innerText

The textContent property makes it easy to insert arbitrary text into an element and lets the browser take care of the escaping. If we just need to insert text, whatever the user writes there, this is ideal:

document.querySelector('main').textContent = '<img src="/no" onerror="console.log(1)">'
<main></main>

As for the innerText property, it should almost never be used. When written, it behaves almost the same as textContent (except that line breaks in the string become <br> elements), but in some cases it is ten times slower. When read, it returns not the whole text but only the visible part; this can be used if necessary, but it is extremely rarely needed. It is nevertheless part of the standard.

var main = document.querySelector('main')

console.log(main.textContent)
console.log(main.innerText)
<main>
  <style>p { color: blue; }</style>
  <p>123</p>
  <p hidden>123</p>
  And a bit of
  text
</main>

When to work with elements?

In my opinion, almost always, except in the rare cases when you really need to work with markup. The browser itself takes care of escaping when textContent is used, and elements that are created and inserted land exactly where we want them and do not spoil the surrounding content. It is almost impossible to slip up by accidentally closing a tag in the wrong place.
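
As a rough sketch of this approach, the hypothetical function below builds a comment entirely from elements and text. The names renderComment, author and text are assumptions made for the illustration; the document object is passed in as a parameter only so that the function is easy to try outside a page.

```javascript
// Hypothetical example: build <p><b>author</b>: text</p> without parsing any markup.
// `doc` is the page's `document`, passed in so the function has no hidden dependency.
function renderComment(doc, comment) {
  const p = doc.createElement('p')

  const author = doc.createElement('b')
  author.textContent = comment.author // stored as text, never parsed as HTML

  p.appendChild(author)
  p.appendChild(doc.createTextNode(': ' + comment.text))
  return p
}

// In a browser the result can be inserted wherever it is needed:
if (typeof document !== 'undefined') {
  const node = renderComment(document, {
    author: 'guest',
    text: '<img src="/no" onerror="console.log(1)">',
  })
  document.body.appendChild(node) // the "attack" is displayed as plain text
}
```

No escaping function is needed here, because none of the data ever passes through the HTML parser.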

What is almost certainly a bug?

  • Putting user or untrusted data into the markup.
  • Reading something from textContent and then writing it to innerHTML. The rule is simple: write the data back through the same property you read it from. If converting between text and markup happens as a side effect (rather than as an intentional, understood action), then at best it breaks the display of the data, and at worst it creates a vulnerability on the site. In general, all manipulation of markup as a string must be done carefully.
  • Using += on innerHTML .

It is also undesirable to work too often with DOM elements that are already in the document. Inserting an element into the document only after you have finished working on it can improve performance significantly. If you need to insert a group of elements, you can use a document fragment.
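
A minimal sketch of the fragment approach, assuming a list of strings that should become list items. The function name buildItems and the ul selector are hypothetical.

```javascript
// Hypothetical example: prepare all items off-document, then insert them in one step.
function buildItems(doc, texts) {
  const fragment = doc.createDocumentFragment()
  for (const text of texts) {
    const li = doc.createElement('li')
    li.textContent = text
    fragment.appendChild(li) // no effect on the live document yet
  }
  return fragment
}

if (typeof document !== 'undefined') {
  const list = document.querySelector('ul') // assumed to exist on the page
  if (list) {
    // Appending a fragment moves all of its children into the list in a single operation.
    list.appendChild(buildItems(document, ['one', 'a<b', 'three']))
  }
}
```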

I just need to tweak a page

Sometimes code is written not for your own site but, say, to bring a completely foreign page into a printable form. Which rules should still be followed, and which can be relaxed?

Here you can worry less about performance and about breaking scripts, but wrapping text in markup as a string can still break data in unexpected places and should still be avoided. Scripts from <script> tags inserted this way will not be executed, but inline event handlers can sometimes surprise you, although modern sites usually do not have them.
