I have an HTML document that, after being parsed, contains only formatted text. I was wondering whether it is possible to get its text the way I would by selecting it with the mouse, copying it, and pasting it into a new text document.
I know that this is possible with Microsoft.Office.Interop, where I have the .ActiveSelection property that selects the content of the open Word document.
I need to find a way to load the HTML somehow (maybe in a browser object), then copy all of its content and assign it to a string.
    var doc = new HtmlAgilityPack.HtmlDocument();
    // Read the file as Windows-1251 text.
    var documentText = File.ReadAllText("myhtmlfile.html", Encoding.GetEncoding(1251));
    documentText = this.PerformSomeChangesOverDocument(documentText);
    doc.LoadHtml(documentText);

    // Convert the parsed HTML to plain text.
    var stringWriter = new StringWriter();
    AgilityPackEntities.AgilityPack.ConvertTo(doc.DocumentNode, stringWriter);
    stringWriter.Flush();

    // Remove the <title> text from the output, if the document has a title.
    var titleNode = doc.DocumentNode.SelectNodes("//title");
    if (titleNode != null)
    {
        var titleToBeRemoved = titleNode[0].InnerText;
        document.DocumentContent = stringWriter.ToString().Replace(titleToBeRemoved, string.Empty);
    }
    else
    {
        document.DocumentContent = stringWriter.ToString();
    }
After that I return the document object. The problem is that the string is not always formatted the way I want it to be.
You should be able to just use StreamReader and, as you read each line, write it out using StreamWriter.
Something like this will read until the end of your file and save it to a new one. If you need to do extra logic on the file, I have inserted a comment to show where to do all that.
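A minimal sketch of that read-and-write loop might look like the following. The class name `FileCopyExample`, the method name `CopyLines`, and its parameters are made up for illustration; the encoding is passed in so you can use whatever your file needs.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileCopyExample
{
    // Hypothetical helper: copies inputPath to outputPath one line at a time.
    public static void CopyLines(string inputPath, string outputPath, Encoding encoding)
    {
        using (var reader = new StreamReader(inputPath, encoding))
        using (var writer = new StreamWriter(outputPath, false, encoding))
        {
            string line;
            // ReadLine returns null once the end of the file is reached.
            while ((line = reader.ReadLine()) != null)
            {
                // Do any extra per-line logic here before writing the line out.
                writer.WriteLine(line);
            }
        }
    }
}
```

If you read the file with `Encoding.GetEncoding(1251)` on .NET Core or .NET 5+, you would first need to call `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`; on .NET Framework the code page is available without that step.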
You can also save it to a string and then do whatever you want with it. You can use newlines to keep the same format.
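As a rough sketch of the string variant, assuming the same line-by-line loop: each line is appended to a `StringBuilder` followed by a newline, so the line structure of the file is kept. `FileToStringExample` and `ReadLinesToString` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileToStringExample
{
    // Hypothetical helper: reads the whole file into one string, line by line.
    public static string ReadLinesToString(string inputPath, Encoding encoding)
    {
        var builder = new StringBuilder();
        using (var reader = new StreamReader(inputPath, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // AppendLine adds Environment.NewLine after each line,
                // which keeps the original line breaks in the result.
                builder.AppendLine(line);
            }
        }
        return builder.ToString();
    }
}
```

If no per-line processing is needed, `File.ReadAllText(path, encoding)` gives you the same content in a single call.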
EDIT: The below will take into account your tags.
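One possible way to handle the tags inside the same loop, as a sketch only: strip anything that looks like a tag with a regular expression and decode HTML entities. This is an assumption about what "taking the tags into account" should mean here, and it is crude: it does not handle tags that span several lines, and it keeps the text inside `<script>` and `<style>` elements. The names `TagStrippingExample`, `StripTags`, and `ReadTextWithoutTags` are hypothetical.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class TagStrippingExample
{
    // Matches a single tag such as <p>, </b> or <a href="...">.
    private static readonly Regex TagPattern = new Regex("<[^>]+>");

    // Removes tags from one line and turns entities like &amp; back into characters.
    public static string StripTags(string line)
    {
        string withoutTags = TagPattern.Replace(line, string.Empty);
        return WebUtility.HtmlDecode(withoutTags);
    }

    // Reads the file line by line and returns its text with the tags removed.
    public static string ReadTextWithoutTags(string inputPath, Encoding encoding)
    {
        var builder = new StringBuilder();
        using (var reader = new StreamReader(inputPath, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                builder.AppendLine(StripTags(line));
            }
        }
        return builder.ToString();
    }
}
```

For anything beyond simple markup, a real parser such as the HtmlAgilityPack you already use is likely to be more reliable than a regular expression.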