C# – Format a string after it is converted from HTML


I wrote code that renders all the HTML into a string, but the resulting string looks like this:

<div class=\"page\">\r\n<div class=\"bloco\">\r\n   <table id=\"canhoto\">\r\n

I can already remove the \r\n characters, but now I need a way to remove the backslashes that appear, for example, in the div's class attribute. I want class="page", but every attribute shows up as class=\"page\". How can I process the string so the backslashes are gone and the attributes come out correctly?
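One thing worth checking first, assuming the string above was read from the Visual Studio debugger or a watch window: those views show strings in escaped form, so \" there can stand for a plain " in the actual value. The sketch below prints a string and tests it for real backslashes. The class and method names (EscapeCheck, HasLiteralBackslash, UnescapeQuotes) are made up for illustration, and UnescapeQuotes is only a fallback for the case where the backslashes really are in the text.

```csharp
using System;

public static class EscapeCheck
{
    // Returns true if the text contains at least one literal backslash character.
    public static bool HasLiteralBackslash(string text)
    {
        return text.IndexOf('\\') >= 0;
    }

    // Fallback for text that really contains the two-character sequence \" :
    // replaces each backslash-quote pair with a plain quote.
    public static string UnescapeQuotes(string text)
    {
        return text.Replace("\\\"", "\"");
    }

    public static void Main()
    {
        // The quotes are escaped in source code, but the string itself holds no backslash.
        string html = "<div class=\"page\">";
        Console.WriteLine(html);                          // <div class="page">
        Console.WriteLine(HasLiteralBackslash(html));     // False

        // A verbatim string keeps the backslash, so this value really contains \"
        string escaped = @"<div class=\""page\"">";
        Console.WriteLine(HasLiteralBackslash(escaped));  // True
        Console.WriteLine(UnescapeQuotes(escaped));       // <div class="page">
    }
}
```

If HasLiteralBackslash returns false for the rendered HTML, there is nothing to strip and the string can be passed on as it is.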

string HTMLemString = RenderizaHtmlComoString("~/Views/Item/Item.cshtml", id);
var regex = new Regex("(\\<script(.+?)\\</script\\>)|(\\<style(.+?)\\</style\\>)|(<link[^>]*>)",
    RegexOptions.Singleline | RegexOptions.IgnoreCase);
HTMLemString = regex.Replace(HTMLemString, "");
HTMLemString = HTMLemString.Replace("\0", "");

This is the part where I process the HTML string.
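To see what that cleanup step removes, here is a small self-contained sketch that applies the same pattern to a made-up HTML sample. The HtmlCleaner class, its Clean method, and the sample markup are hypothetical. The \r\n removal is also an assumption, since the question does not show how that part is done.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlCleaner
{
    // Same pattern as in the question, written as a verbatim string so that
    // no C# escaping is needed; '<' and '>' need no escaping in a .NET regex.
    private static readonly Regex Unwanted = new Regex(
        @"(<script(.+?)</script>)|(<style(.+?)</style>)|(<link[^>]*>)",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static string Clean(string html)
    {
        string result = Unwanted.Replace(html, "");  // drop script, style and link elements
        result = result.Replace("\r\n", "");         // assumed way of removing line breaks
        result = result.Replace("\0", "");           // drop NUL characters, as in the question
        return result;
    }

    public static void Main()
    {
        string sample =
            "<link rel=\"stylesheet\" href=\"a.css\">\r\n" +
            "<style>p{}</style><div class=\"page\">\r\n" +
            "<script>var x;</script>text</div>";

        Console.WriteLine(Clean(sample));  // <div class="page">text</div>
    }
}
```

Printing the cleaned string with Console.WriteLine shows the real characters, so the attribute comes out as class="page" with no backslashes.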

string CSSdocumento = CSSemString();
byte[] bytes;

using (var ms = new MemoryStream())
{
    using (var doc = new Document())
    using (var writer = PdfWriter.GetInstance(doc, ms))
    {
        // iTextSharp needs the document opened before content is written to it.
        doc.Open();

        // A leading @ on a variable name does not alter the string's contents.
        var HTMLconversão = HTMLemString;
        var CSSconversão = CSSdocumento;

        using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(CSSconversão)))
        using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(HTMLconversão)))
        {
            iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
        }
    } // disposing doc closes it and flushes the PDF into ms

    bytes = ms.ToArray();
}

// Write the generated PDF to the desktop for inspection.
var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "teste.pdf");
System.IO.File.WriteAllBytes(testFile, bytes);

And above is the code where I generate the PDF.



From what I saw, there seems to be a bug with this… there is a solution in this answer:

 Document document = new Document();
        PdfWriter.GetInstance(document, new FileStream("c:\\my.pdf", FileMode.Create));
        WebClient wc = new WebClient();
        string htmlText = wc.DownloadString("http://localhost:59500/my.html");
        List<IElement> htmlarraylist = HTMLWorker.ParseToList(new StringReader(htmlText), null);
        for (int k = 0; k < htmlarraylist.Count; k++)
