python – bs4: How to wrap incomplete HTML in "html" and "body" tags?

Question:

Hi, I came across incomplete HTML documents in which the "html" and "body" tags are missing.

Here's the code I implemented:

import bs4

content='''
<head>
 <title>
  my page
 </title>
</head>
  <table border="0" cellpadding="0" cellspacing="0">
   <tr>
    <td>
     <p>
      <img alt="Brastra.gif (4376 bytes)" height="82" src="../../DNN/Brastra.gif"/>
     </p>
    </td>
    <td>
     <p>
      <strong>
       Titulo 1
       <br/>
       Titulo 2
       <br/>
       Titulo 3
      </strong>
     </p>
    </td>
   </tr>
  </table>
 <small>
  <strong>
   <a href="http://example.com/">
    Link.
   </a>
  </strong>
 </small>
<p>
  <a href="http://example.com/">I linked to <i>example.com</i></a>
</p>
<p>#1</p>
<p>#2</p>
'''

soup = bs4.BeautifulSoup(content, 'html.parser')

I tried the snippet below, which raises an error:

tag = soup.new_tag('html')
tag.wrap(soup)

ValueError: Cannot replace one element with another when the element to
be replaced is not part of a tree.

I also tried this other approach, which scrambles the order of the tags:

for item in soup.find_all():
    tag.append(item.extract())
soup = tag

<body>
 <head>
 </head>
 <title>
  my page
 </title>
 <div>
 </div>
 <center>
 </center>
 <table border="0" cellpadding="0" cellspacing="0">
 </table>
 <tr>
 </tr>
 <td>
 </td>

How can I use bs4 to wrap this code in 'html' and 'body' tags?

Answer:

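For this you will need the html5lib parser, which builds the tree the way a browser does and adds the missing "html", "head" and "body" elements.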

pip install html5lib

I tried it on my console and this was the result:

In [2]: import bs4

In [3]: content = '''
<head>
 <title>
  my page
 </title>
</head>
  <table border="0" cellpadding="0" cellspacing="0">
   <tr>
    <td>
     <p>
      <img alt="Brastra.gif (4376 bytes)" height="82" src="../../DNN/Brastra.gif"/>
     </p>
    </td>
    <td>
     <p>
      <strong>
       Titulo 1
       <br/>
       Titulo 2
       <br/>
       Titulo 3
      </strong>
     </p>
    </td>
   </tr>
  </table>
 <small>
  <strong>
   <a href="http://example.com/">
    Link.
   </a>
  </strong>
 </small>
<p>
  <a href="http://example.com/">I linked to <i>example.com</i></a>
</p>
<p>#1</p>
<p>#2</p>
'''

In [4]: soup = bs4.BeautifulSoup(content, 'html5lib')

In [5]: soup
Out[5]: 
<html><head>
 <title>
  my page
 </title>
</head>
  <body><table border="0" cellpadding="0" cellspacing="0">
   <tbody><tr>
    <td>
     <p>
      <img alt="Brastra.gif (4376 bytes)" height="82" src="../../DNN/Brastra.gif"/>
     </p>
    </td>
    <td>
     <p>
      <strong>
       Titulo 1
       <br/>
       Titulo 2
       <br/>
       Titulo 3
      </strong>
     </p>
    </td>
   </tr>
  </tbody></table>
 <small>
  <strong>
   <a href="http://example.com/">
    Link.
   </a>
  </strong>
 </small>
<p>
  <a href="http://example.com/">I linked to <i>example.com</i></a>
</p>
<p>#1</p>
<p>#2</p>
</body></html>
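
If installing html5lib is not an option, something along these lines might also work with the built-in 'html.parser'. This is only a rough sketch, not tested against the full page from the question: it assumes the "head" element (if any) sits at the top level of the fragment, and it uses a shortened stand-in for the content. The idea is to move only the top-level nodes (soup.contents) rather than every descendant returned by find_all(), so the nesting and order are preserved.

```python
import bs4

# Shortened stand-in for the fragment in the question (an assumption for brevity)
content = '<head><title>my page</title></head><p>#1</p><p>#2</p>'
soup = bs4.BeautifulSoup(content, 'html.parser')

html = soup.new_tag('html')
body = soup.new_tag('body')

# Iterate over a copy of the top-level nodes only; nested tags travel
# along with their parents, so the tree is not flattened.
for node in list(soup.contents):
    node.extract()
    if isinstance(node, bs4.Tag) and node.name == 'head':
        html.append(node)   # keep <head> directly under <html>
    else:
        body.append(node)   # everything else goes into <body>

html.append(body)
soup.append(html)           # attach the new root to the (now empty) soup

print(soup)
```

Unlike html5lib, this does not repair anything else in the markup (for example, no "tbody" is inserted into tables); it only adds the two wrapper tags.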