I'm attempting to remove a line of JavaScript from the DOM in IE 11 using a BHO (Browser Helper Object, an Internet Explorer add-on).
This is so badly documented that it's hard to see the best way forward.
I've managed to write the BHO in C++ with ATL/COM and it's working fine, but I can't quite work out the best way to actually remove or replace text in the body and then inject the changes back into the page.
And to be honest, I haven't got the time to read this 1000-page, out-of-date COM book :-).
This is what I currently have for the OnDocumentComplete event:
    void STDMETHODCALLTYPE CMyFooBHO::OnDocumentComplete(IDispatch *pDisp, VARIANT *pvarURL)
    {
        // Skip about:blank. Only read bstrVal when the VARIANT really holds a string.
        if (pvarURL && pvarURL->vt == VT_BSTR && pvarURL->bstrVal &&
            _wcsicmp(pvarURL->bstrVal, ABOUT_BLANK) == 0)
        {
            return;
        }
        HRESULT hr = S_OK;
        // Query for the IWebBrowser2 interface.
        CComQIPtr<IWebBrowser2> spTempWebBrowser = pDisp;
        // Is this event associated with the top-level browser?
        if (spTempWebBrowser && m_spWebBrowser && m_spWebBrowser.IsEqualObject(spTempWebBrowser))
        {
            // Get the current document object from the browser.
            CComPtr<IDispatch> spDispDoc;
            hr = m_spWebBrowser->get_Document(&spDispDoc);
            if (SUCCEEDED(hr))
            {
                // The document is only useful to us if it exposes IHTMLDocument2,
                // so query for that interface (through smart pointers).
                CComQIPtr<IHTMLDocument2, &IID_IHTMLDocument2> spHTML;
                spHTML = spDispDoc;
                // Extract the source of the document if it is HTML.
                if (spHTML)
                {
                    // Get the BODY object. It is a local, so no m_ prefix.
                    CComPtr<IHTMLElement> spBody;
                    hr = spHTML->get_body(&spBody);
                    if (SUCCEEDED(hr) && spBody)
                    {
                        // Get the HTML text. CComBSTR frees the string on scope exit;
                        // a raw BSTR here would leak.
                        CComBSTR bstrHTMLText;
                        hr = spBody->get_outerHTML(&bstrHTMLText);
                        if (SUCCEEDED(hr))
                        {
                            // bstrHTMLText now contains the <body> ...whatever... </body> of the html page.
                            // ******** HERE ********
                            // What I want to do here is replace some text contained in bstrHTMLText
                            // i.e. Replace "ABC" with "DEF" if it exists in bstrHTMLText.
                            // Then replace the body of the original page with the edited bstrHTMLText.
                            // My actual goal is to remove one line of javascript.
                        }
                    }
                }
            }
        }
    }
Feel free to suggest any improvements to the existing code.
This doesn't follow the normal (recommended) way of doing it.
If no better answers are forthcoming then I guess it's the best answer and I will mark it as such.
I would love to hear any comments or updates that improve on it, or to see a working example that is better.
This is for IE 11, compiled using C++ ATL/COM in Visual Studio 2015.
I have tried iterating the DOM and changing it, along with just about every other very badly documented variation.
There never seems to be an issue reading the HTML, i.e. get_innerText, get_innerHTML and get_outerHTML in their various forms, but the put_* calls mostly never seem to work. Why? Nobody seems to be able to say, or to give me a working example that does.
What I did find is that calling get_body, then get_innerHTML, then put_innerHTML does seem to work.
Having found this, I simply wrote a function to search and replace inside a CComBSTR.
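For illustration, here is a minimal sketch of that kind of search and replace. It is not the actual function from the BHO: it works on a std::wstring instead of a CComBSTR so that it stands alone, and the name ReplaceAll is a hypothetical one chosen for this example.

```cpp
#include <string>

// Hypothetical helper (not the original BHO code): returns a copy of `text`
// with every occurrence of `from` replaced by `to`.
std::wstring ReplaceAll(std::wstring text, const std::wstring& from, const std::wstring& to)
{
    if (from.empty())
        return text;            // nothing to search for; also avoids an endless loop

    std::wstring::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::wstring::npos)
    {
        text.replace(pos, from.length(), to);
        pos += to.length();     // continue after the inserted text so it is not rescanned
    }
    return text;
}
```

In the BHO, the text returned by get_innerHTML would be copied into a std::wstring, passed through a helper like this, and the result wrapped in a new CComBSTR before calling put_innerHTML. That glue code is left out here.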
This works for me, but I suppose you could take the body's inner HTML as returned and run some other DOM manipulation code on it (not the built-in stuff) if your requirements are different.
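As one example of a different requirement, the sketch below drops whole lines that contain a given snippet rather than replacing a substring. It assumes the script statement still sits on its own line in the text returned by get_innerHTML, which has not been verified for IE 11. RemoveLinesContaining is a hypothetical name, and the code uses std::wstring so it can be tried outside the BHO.

```cpp
#include <string>

// Hypothetical helper: returns `text` without any line that contains `marker`.
std::wstring RemoveLinesContaining(const std::wstring& text, const std::wstring& marker)
{
    if (marker.empty())
        return text;            // an empty marker would match every line

    std::wstring result;
    std::wstring::size_type start = 0;
    while (start < text.length())
    {
        // A line runs up to and including its '\n', or to the end of the text.
        std::wstring::size_type end = text.find(L'\n', start);
        end = (end == std::wstring::npos) ? text.length() : end + 1;

        const std::wstring line = text.substr(start, end - start);
        if (line.find(marker) == std::wstring::npos)
            result += line;     // keep lines that do not mention the marker
        start = end;
    }
    return result;
}
```

Called with the marker `alert("Hello")`, this would remove that one line and leave the rest of the markup untouched, provided the assumption about line breaks holds.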
The main advantage of doing things this way is that it doesn't rely on c**p undocumented code that seems to work in some mystical way only when MS wanted it to.
This is the test HTML page. I'm trying to remove the alert("Hello") call that is executed when the page finishes loading.