javascript – Convert string with HTML tags into an array

Question:

Consider the following string:

var texto = 'Esse é um texto de <span class="red"> teste </span>';

I need to split the string into an array of words, using a space as the separator, that is:

var palavras = texto.split(" ");

The problem is that the text contains HTML, so the tag itself is split on its spaces and the resulting array is:

palavras[0] = 'Esse';
palavras[1] = 'é';
palavras[2] = 'um';
palavras[3] = 'texto';
palavras[4] = 'de';
palavras[5] = '<span';
palavras[6] = 'class="red">';
palavras[7] = 'teste';
palavras[8] = '</span>';

But I need the resulting array to be the following:

palavras[0] = 'Esse';
palavras[1] = 'é';
palavras[2] = 'um';
palavras[3] = 'texto';
palavras[4] = 'de';
palavras[5] = '<span class="red"> teste </span>';

How can I do this in JavaScript?

Answer:

You can use DOMParser to parse HTML text. From there, just manipulate the HTML to get the elements you need:

// parse the HTML fragment
var texto = 'Esse é um texto de <span class="red"> teste </span>';
var parser = new DOMParser();
// creates a full document with html, head, body, etc.
var htmlDoc = parser.parseFromString(texto, "text/html");
// get the body of the parsed document
var body = htmlDoc.querySelector('body');
// get the span element
var span = body.querySelector('span');
// remove the span so that only the text remains
body.removeChild(span);
// split the remaining text into an array
var palavras = body.innerHTML.trim().split(' ');
// append the span to the array
palavras.push(span.outerHTML);
console.log(palavras);

The code is very specific to the text you provided. If you have other tags in other positions, it will need to be adjusted accordingly.
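As a rough idea of what such an adjustment could look like, here is a minimal sketch that walks the child nodes of the parsed body in order, splitting text nodes into words and keeping each element whole. The helper names nodesToWords and htmlToWords are hypothetical (they are not part of any library), and the sketch assumes only top-level tags need to be kept intact and that a browser DOMParser is available.

```javascript
// Node type constants as defined by the DOM specification.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Hypothetical helper: converts a list of DOM child nodes into an array.
// Text nodes are split on whitespace; element nodes are kept as one item.
// Any other node type (comments, for example) is ignored.
function nodesToWords(nodes) {
    var palavras = [];
    Array.prototype.forEach.call(nodes, function (node) {
        if (node.nodeType === TEXT_NODE) {
            // filter(Boolean) drops the empty strings produced by leading,
            // trailing, or repeated whitespace
            var words = node.nodeValue.split(/\s+/).filter(Boolean);
            palavras = palavras.concat(words);
        } else if (node.nodeType === ELEMENT_NODE) {
            palavras.push(node.outerHTML);
        }
    });
    return palavras;
}

// Hypothetical wrapper (browser only): parses the string and delegates
// to nodesToWords. It assumes DOMParser exists in the environment.
function htmlToWords(texto) {
    var doc = new DOMParser().parseFromString(texto, 'text/html');
    return nodesToWords(doc.body.childNodes);
}
```

In a browser, calling htmlToWords(texto) with the string from the question should give the six-item array asked for, and the same call keeps working when the span sits in the middle of the text or when there are several tags.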


You can also use jQuery's parseHTML function. The idea is the same: parse and extract the elements you need.

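var texto = 'Esse é um texto de <span class="red"> teste </span>';
// $.parseHTML returns an array of DOM nodes (here: a text node and a span)
var html = $.parseHTML(texto);

var palavras;
$.each(html, function (i, el) {
    if (el.nodeName === '#text') {
        // text node: split its content into words
        palavras = el.nodeValue.trim().split(' ');
    } else if (el.nodeName === 'SPAN') {
        // span element: keep it whole as a single array item
        palavras.push(el.outerHTML);
    }
});

console.log(palavras);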
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

Again, the code is very specific to your case, as it expects text followed by a span. Adapt it for other cases if necessary.


Regex, despite being very cool, is not always the best solution, especially for parsing HTML. If a dedicated parser already exists for the type of data you are handling, it is preferable to use it.
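To illustrate the point, here is a small sketch of a regex-based attempt. The function name splitWithRegex and the pattern are assumptions made up for this example, not a recommended approach. It handles the string from the question, but breaks as soon as tags are nested, which a real parser deals with without any extra effort.

```javascript
// Hypothetical regex approach: match either a whole <tag ...>...</tag> pair
// or a run of non-whitespace characters.
function splitWithRegex(texto) {
    var pattern = /<(\w+)[^>]*>.*?<\/\1>|\S+/g;
    // match() returns null when nothing matches, so fall back to []
    return texto.match(pattern) || [];
}

// Works for the simple case from the question:
console.log(splitWithRegex('Esse é um texto de <span class="red"> teste </span>'));
// → ['Esse', 'é', 'um', 'texto', 'de', '<span class="red"> teste </span>']

// Fails with nested tags: the lazy match stops at the first </span>,
// so the outer span is cut in two.
console.log(splitWithRegex('um <span>a <span>b</span> c</span> fim'));
// → ['um', '<span>a <span>b</span>', 'c</span>', 'fim']
```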
