C# – Format a string after converting it from HTML

Question:

I wrote code that converts all the HTML into a string; however, when I do that, the result comes out like this:

<div class=\"page\">\r\n<div class=\"bloco\">\r\n   <table id=\"canhoto\">\r\n

I can already remove the \r\n characters, but now I need a way to remove the backslashes, for example the ones in the div's class attribute. I'd like it to read class="page", but every attribute looks like class=\"page\". How can I handle this so the string comes out correctly?
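Something worth checking first: the Visual Studio debugger (hover tooltips, Watch and Locals windows) usually displays strings in their escaped C# literal form, so class=\"page\" and \r\n there may simply be how an ordinary string containing quotes and line breaks is shown. Printing the string with Console.WriteLine shows the real characters. If the backslashes really are part of the text (for example, because something upstream escaped it), a plain Replace is enough. The sketch below is a minimal illustration under that assumption; RemoveLiteralEscapes is a hypothetical helper name, not part of any library.

```csharp
using System;

public static class EscapeCleanup
{
    // Hypothetical helper: only useful if the string REALLY contains the
    // two-character sequences \" and the four-character sequence \r\n.
    public static string RemoveLiteralEscapes(string text)
    {
        return text
            .Replace("\\r\\n", "")   // literal backslash-r-backslash-n -> removed
            .Replace("\\\"", "\"");  // literal backslash-quote -> plain quote
    }

    public static void Main()
    {
        // A normal string; the debugger would display it as "<div class=\"page\">"
        string normal = "<div class=\"page\">";
        Console.WriteLine(normal);  // <div class="page">

        // A string that actually contains backslash characters
        string escaped = "<div class=\\\"page\\\">\\r\\n<div class=\\\"bloco\\\">";
        Console.WriteLine(RemoveLiteralEscapes(escaped));  // <div class="page"><div class="bloco">
    }
}
```

If the first WriteLine already prints class="page" for your own HTMLemString, there is nothing to remove and the backslashes are only a display artifact.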

string HTMLemString = RenderizaHtmlComoString("~/Views/Item/Item.cshtml", id);
var regex = new Regex("(\\<script(.+?)\\</script\\>)|(\\<style(.+?)\\</style\\>)|(<link[^>]*>)",
    RegexOptions.Singleline | RegexOptions.IgnoreCase);
HTMLemString = regex.Replace(HTMLemString, "");
HTMLemString = HTMLemString.Replace("\0", "");

That is the part where I process the string.
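The cleanup step above can be wrapped in a small, self-contained function, which also makes it easy to print the result and see the actual characters. This is a sketch, not the asker's exact code: the class and method names are hypothetical, the regex is the one from the question written as a verbatim string, and replacing real line breaks with a space (rather than deleting them) is an assumption chosen so that words on adjacent lines are not glued together.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlCleanup
{
    // Same pattern as in the question, as a verbatim string:
    // removes <script>...</script>, <style>...</style> and <link ...> tags.
    private static readonly Regex UnwantedTags = new Regex(
        @"(<script(.+?)</script>)|(<style(.+?)</style>)|(<link[^>]*>)",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static string Clean(string html)
    {
        string result = UnwantedTags.Replace(html, "");
        result = result.Replace("\0", "");     // stray NUL characters
        result = result.Replace("\r\n", " ");  // real CR/LF line breaks; a space keeps words apart
        return result;
    }

    public static void Main()
    {
        string html = "<link rel=\"stylesheet\" href=\"a.css\">\r\n"
                    + "<div class=\"page\"><script>alert(1)</script></div>";
        // Console output shows the real characters: class="page", no backslashes.
        Console.WriteLine(Clean(html));
    }
}
```

Note that "\r\n" in C# source is the real carriage-return/line-feed pair, not the four visible characters, so this Replace removes actual line breaks.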

string CSSdocumento = CSSemString();
byte[] bytes;

using (var ms = new MemoryStream())
{
    using (var doc = new Document())
    {
        using (var writer = PdfWriter.GetInstance(doc, ms))
        {
            doc.Open();
            // '@' here is a verbatim identifier prefix, not a verbatim string:
            // it does not change the contents of either string.
            var HTMLconversão = @HTMLemString;
            var CSSconversão = @CSSdocumento;

            using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(CSSconversão)))
            {
                using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(HTMLconversão)))
                {
                    // XMLWorker parses the (X)HTML and CSS streams into the open document
                    iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
                }
            }

            doc.Close();
        }
    }

    // Copy the PDF bytes out once the document has been closed
    bytes = ms.ToArray();
}

var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "teste.pdf");
System.IO.File.WriteAllBytes(testFile, bytes);

And above is the code where I generate the PDF.
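To confirm what the PDF code actually hands to the parser, you can decode the same UTF-8 bytes that go into the MemoryStream and inspect them. This is a hedged sketch with a hypothetical class and method name; it does not call iTextSharp, it only reproduces the Encoding.UTF8.GetBytes plus MemoryStream step and shows that a string containing ordinary quotes reaches the stream without any backslashes.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfInputCheck
{
    // Decodes the same bytes that would be passed to ParseXHtml,
    // so you can see exactly what the parser receives.
    public static string RoundTrip(string html)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(html);
        using (var ms = new MemoryStream(bytes))
        using (var reader = new StreamReader(ms, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        string HTMLemString = "<div class=\"page\"></div>";
        var HTMLconversao = @HTMLemString;  // '@' is only a verbatim identifier: same string
        string received = RoundTrip(HTMLconversao);

        Console.WriteLine(received);                 // <div class="page"></div>
        Console.WriteLine(received.Contains("\\"));  // False: no backslash reaches the parser
    }
}
```

If the second line prints True for your real HTMLemString, the backslashes are genuinely in the text and need to be removed before parsing; if it prints False, they were only in the debugger's display.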

Answer:

https://stackoverflow.com/questions/2822843/itextsharp-html-to-pdf

From what I saw, there seems to be a bug with this. There is a workaround in that answer, which parses the HTML with HTMLWorker instead:

using (var fs = new FileStream("c:\\my.pdf", FileMode.Create))
using (var document = new Document())
{
    PdfWriter.GetInstance(document, fs);
    document.Open();

    // Download the HTML to convert
    string htmlText;
    using (var wc = new WebClient())
    {
        htmlText = wc.DownloadString("http://localhost:59500/my.html");
    }
    // Echo the raw HTML to the HTTP response (useful for checking what was downloaded)
    Response.Write(htmlText);

    // HTMLWorker turns the HTML into a list of iTextSharp elements
    List<IElement> htmlarraylist = HTMLWorker.ParseToList(new StringReader(htmlText), null);
    foreach (IElement element in htmlarraylist)
    {
        document.Add(element);
    }

    document.Close();
}