Determine character index in HTML source given a DOMRange from a WebKit selection


I'm attempting to map a DOMRange (representing a user selection in a Cocoa WebView) back to the original HTML source currently rendered in that view, to build something like Dreamweaver's split code/design editor:

[Image: Dreamweaver code/design split view]

My first idea was to take the DOMRange object's startContainer and startOffset and walk up the DOM tree from there, accumulating the overall character offset until I reach the body tag.
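To make the walk-up idea concrete, here is a minimal sketch in Python rather than Cocoa. The `Node` class, `serialize`, and `offset_in_root` are hypothetical stand-ins for the real DOM, not WebKit API, and the example uses a span instead of a div inside the paragraph so the toy markup stays valid. It yields an offset into the re-serialized tree, which only matches the original source when the two are identical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Toy DOM node: an element when `tag` is set, otherwise a text node."""
    tag: Optional[str] = None
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def append(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def serialize(node: Node) -> str:
    """Serialize the subtree the way outerHTML would (no attributes here)."""
    if node.tag is None:
        return node.text
    inner = "".join(serialize(c) for c in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


def offset_in_root(container: Node, offset: int) -> int:
    """Index, within serialize(root), of `offset` characters into `container`."""
    total = offset
    node = container
    while node.parent is not None:
        parent = node.parent
        # Add the serialized length of every sibling that precedes `node`.
        for sibling in parent.children:
            if sibling is node:
                break
            total += len(serialize(sibling))
        # Add the parent's own opening tag, then continue one level up.
        total += len(f"<{parent.tag}>")
        node = parent
    return total


# Build <body><p>some<span>target</span>text</p></body>
body = Node(tag="body")
p = body.append(Node(tag="p"))
p.append(Node(text="some"))
span = p.append(Node(tag="span"))
target = span.append(Node(text="target"))
p.append(Node(text="text"))

html = serialize(body)
index = offset_in_root(target, 0)
print(index, html[index:index + len("target")])
```

With a real DOMRange, the same loop would need the actual serialized length of each preceding sibling, including attributes, comments, and entity escaping.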

Unfortunately this task presents some problems:

  1. Clearly the document's outerHTML will differ from the original HTML source if the DOM was manipulated via JavaScript or if the parser had to clean up malformed tags.
  2. I can't figure out how to get the character offset of a node within its parent's content (e.g., the 4 characters that precede target in <p>some<div>target</div>text</p>), and normalize doesn't seem to make this any easier.
  3. Accounting for some of the problems in #1, or just going from HTML source to WebView, will probably require parsing the HTML separately and then correlating the two DOM trees.

One ray of hope is that HTML5 specifies a standard parsing algorithm for dealing with invalid HTML (which WebKit has since adopted), so in theory it should be possible to use an off-the-shelf HTML5 parser to generate the same tree as WebKit, right?
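As a rough illustration of the "parse separately and remember source positions" idea, here is a sketch using Python's standard `html.parser`. Note the assumptions: `html.parser` is a tolerant tokenizer, not an HTML5-compliant tree builder, so it will not necessarily agree with WebKit on malformed input, and `TextOffsetRecorder` and `text_offsets` are hypothetical names made up for this example. It only shows that a parser which reports token positions lets you recover the source index of each text run.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class TextOffsetRecorder(HTMLParser):
    """Record the absolute source offset of every text run in the input."""

    def __init__(self, source: str) -> None:
        # Keep character references unconverted so each recorded run
        # is an exact substring of the source.
        super().__init__(convert_charrefs=False)
        # Absolute index at which each source line starts.
        self.line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.text_runs: List[Tuple[int, str]] = []

    def handle_data(self, data: str) -> None:
        # getpos() returns (1-based line, 0-based column) of the current token.
        line, col = self.getpos()
        self.text_runs.append((self.line_starts[line - 1] + col, data))


def text_offsets(source: str) -> List[Tuple[int, str]]:
    """Return (absolute offset, text) pairs for all text runs in `source`."""
    recorder = TextOffsetRecorder(source)
    recorder.feed(source)
    recorder.close()
    return recorder.text_runs


source = "<p>some<span>target</span>text</p>\n<p>more</p>"
runs = text_offsets(source)
for offset, text in runs:
    print(offset, repr(text))
```

Correlating these runs with the text nodes in the WebView is still a separate step; one option is to match them up in document order and fall back to string comparison when the counts differ.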

This is the most similar existing question I could find, but it's for a slightly different problem:
Getting source HTML from a WebView in Cocoa


There is 1 answer

coffeetocode

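The JavaScript half of your problem #1 is actually not so bad; you can just turn off JS interpretation.

In Qt's WebKit port, look at QWebSettings::JavascriptEnabled, or just drop this in before you load any HTML: QWebSettings::globalSettings()->setAttribute(QWebSettings::JavascriptEnabled, false); In a Cocoa WebView, the equivalent should be the JavaScript setting on the view's WebPreferences, e.g. [[webView preferences] setJavaScriptEnabled:NO];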

That should leave your DOM un-mangled by JS. Good luck!