Can I programatically copy my html selection

621 views Asked by At

I have a html document that after being parsed contains only formatted text.I was wondering if it is possible to get its text like I would do if I was mouse-selecting it + copy + paste in new Text Document?

I know that this is possible in Microsoft.Office.Interop where I have .ActiveSelection property that selects the content of the open Word.

I need to find a way to load the html somehowe(maybe in a browser object) and then copy all of its content and assign it to a string.

var doc = new HtmlAgilityPack.HtmlDocument();
var documetText = File.ReadAllText(myhtmlfile.html, Encoding.GetEncoding(1251));
documetText = this.PerformSomeChangesOverDocument(documetText);
doc.LoadHtml(documetText);
var stringWriter = new StringWriter();
AgilityPackEntities.AgilityPack.ConvertTo(doc.DocumentNode, stringWriter);
stringWriter.Flush();
var titleNode = doc.DocumentNode.SelectNodes("//title");
if (titleNode != null)
{
    var titleToBeRemoved = titleNode[0].InnerText;
    document.DocumentContent = stringWriter.ToString().Replace(titleToBeRemoved, string.Empty);
}
else
{
    document.DocumentContent = stringWriter.ToString();
}

and then I return the document object.The problem is that the string is not always formatted as I want it to be

1

There are 1 answers

11
sealz On

You should be able to just use StreamReader and as you read each line just write it out using StreamWriter

Something like this will readuntil the end of your file and save it to a new one. If you need to do extra logic in the file I have a comment inserted to let you know where to do all that.

private void button4_Click(object sender, EventArgs e)
        {
            System.IO.StreamWriter writer = new System.IO.StreamWriter("C:\\XXX\\XXX\\XXX\\test2.html");
            String line;
            using (System.IO.StreamReader reader = new System.IO.StreamReader("C:\\XXX\\XXX\\XXX\\test.html"))
            {
                //Do until the end
                while ((line = reader.ReadLine()) != null) {
                //You can insert extra logic here if you need to omit lines or change them
                writer.WriteLine(line);
                }
                //All done, close the reader
                reader.Close();
            }
            //Flush and close the writer
            writer.Flush();
            writer.Close();

        }

You can also save it to a string then just do whatever you want to with it. You can use new lines to keep the same format.

EDIT The below will tke into account your tags

  private void button4_Click(object sender, EventArgs e)
        {
            String line;
            String filetext = null;
            int count = 0;
            using (System.IO.StreamReader reader = new System.IO.StreamReader("C:\\XXXX\\XXXX\\XXXX\\test.html"))
            {
              while ((line = reader.ReadLine()) != null) { 
                if (count == 0) {
                    //No newline since its start
                    if (line.StartsWith("<")) {
                        //skip this it is formatted stuff
                    }
                    else {
                    filetext = filetext + line; 
                    }
                    }
                else {
                    if (line.StartsWith("<"))
                    {
                        //skip this it is formatted stuff
                    }
                    else
                    {
                        filetext = filetext + "\n" + line;
                    }
                }
                count++;                           
           }                
            Trace.WriteLine(filetext);                  
            reader.Close();
            }          
        }