Internet Explorer 11: C++ ATL COM Browser Helper Object (add-on) to replace text in the DOM


I'm attempting to remove a line of JavaScript from the DOM in IE 11 using a BHO (Internet Explorer add-on).

This is so badly documented that it's hard to see the best way forward.

I've managed to write the BHO in C++ ATL/COM and it's working fine, but I can't quite work out the best way to actually remove or replace text in the body and then inject the changes back into the page.

And being honest, I haven't got the time to read this 1000-page, out-of-date COM book :-).

This is what I have currently for the OnDocumentComplete event:

void STDMETHODCALLTYPE CMyFooBHO::OnDocumentComplete(IDispatch *pDisp, VARIANT *pvarURL)
{
    BSTR bstrURL = pvarURL->bstrVal;

    if (_wcsicmp(bstrURL, ABOUT_BLANK) == 0)
    {
        return;
    }

    HRESULT hr = S_OK;

    // Query for the IWebBrowser2 interface.
    CComQIPtr<IWebBrowser2> spTempWebBrowser = pDisp;

    // Is this event associated with the top-level browser?
    if (spTempWebBrowser && m_spWebBrowser && m_spWebBrowser.IsEqualObject(spTempWebBrowser))
    {
        // Get the current document object from browser.
        CComPtr<IDispatch> spDispDoc;
        hr = m_spWebBrowser->get_Document(&spDispDoc);

        if (SUCCEEDED(hr))
        {
            // Verify that what we get is a pointer to a IHTMLDocument2 interface. 
            // To be sure, let's query for the IHTMLDocument2 interface (through smart pointers).

            CComQIPtr<IHTMLDocument2, &IID_IHTMLDocument2> spHTML;
            spHTML = spDispDoc;

            // Extract the source of the document if it's HTML.
            if (spHTML)
            {
                // Get the BODY object.
                CComPtr<IHTMLElement> m_pBody;
                hr = spHTML->get_body(&m_pBody);

                if (SUCCEEDED(hr))
                {
                    // Get the HTML text.
                    BSTR bstrHTMLText;
                    hr = m_pBody->get_outerHTML(&bstrHTMLText);

                    if (SUCCEEDED(hr))
                    {
                        // bstrHTMLText now contains the <body> ...whatever... </body> of the html page.

                        // ******** HERE ********

                        // What I want to do here is replace some text contained in bstrHTMLText  
                        // i.e. Replace "ABC" with "DEF" if it exists in bstrHTMLText.

                        // Then replace the body of the original page with the edited bstrHTMLText.

                        // My actual goal is to remove one line of javascript.

                    }
                }
            }
        }
    }
}

Feel free to suggest any improvements to the existing code.


There are 2 answers

0
William Humphreys On BEST ANSWER

This doesn't follow the normal way of doing it, i.e. the way it should be done.

If no better answers are forthcoming then I guess it's the best answer and I will mark it as such.

I would love to hear any comments or updates that either improve it or show me a better working example.

This is for IE 11, compiled using C++ ATL/COM in Visual Studio 2015.

I have tried iterating the DOM and changing it, along with just about every other badly documented variation.

There never seems to be an issue reading the HTML, i.e. get_innerText, get_innerHTML and get_outerHTML in their various forms, but put_*** mostly never seems to work. Why? Nobody seems to be able to say, nor give me a working example that does.

What I did find is that get_body > get_innerHTML > put_innerHTML does seem to work.

Having found this, I simply wrote a function to search and replace inside a CComBSTR.

This works for me, but if your requirements are different I suppose you could take what is returned as the body's inner HTML and run some other DOM manipulation code on it (not the built-in stuff).

The main advantage of this way of doing things is that it doesn't rely on c**p undocumented code that seems to work in some mystical way only when MS wanted it to.


This is the test HTML page. I'm trying to remove the alert("hello") that is executed when the page finishes loading.

<!doctype html>

  <head>
    <title>Site</title>

    <meta http-equiv="cache-control" content="max-age=0" />
    <meta http-equiv="cache-control" content="no-cache" />
    <meta http-equiv="expires" content="0" />
    <meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT" />
    <meta http-equiv="pragma" content="no-cache" />

  </head>

  <body>

    <div>If a dialog with hello appears then the BHO failed</div>


    <script type="text/javascript">

      window.onload = function(){
        window.document.body.onload = foo; 
      };

      function foo()
      {
          alert("hello");
      }

    </script>

  </body>
</html>

// FooBHO.h : Declaration of the CFooBHO

#pragma once
#include "resource.h"       // main symbols

#include "FooIEAddOn_i.h"

#include <shlguid.h>        // IID_IWebBrowser2, DIID_DWebBrowserEvents2, etc.

#include <exdispid.h>       // DISPID_DOCUMENTCOMPLETE, etc.

#include <mshtml.h>         // DOM interfaces

#include <string> 

#if defined(_WIN32_WCE) && !defined(_CE_DCOM) && !defined(_CE_ALLOW_SINGLE_THREADED_OBJECTS_IN_MTA)
#error "Single-threaded COM objects are not properly supported on Windows CE platform, such as the Windows Mobile platforms that do not include full DCOM support. Define _CE_ALLOW_SINGLE_THREADED_OBJECTS_IN_MTA to force ATL to support creating single-thread COM object's and allow use of it's single-threaded COM object implementations. The threading model in your rgs file was set to 'Free' as that is the only threading model supported in non DCOM Windows CE platforms."
#endif

#define DISPID_DOCUMENTRELOAD 282

using namespace ATL;

using namespace std;

// CFooBHO
class ATL_NO_VTABLE CFooBHO : public CComObjectRootEx<CComSingleThreadModel>,
                                        public CComCoClass<CFooBHO, &CLSID_FooBHO>,
                                        public IObjectWithSiteImpl<CFooBHO>,
                                        public IDispatchImpl<IFooBHO, &IID_IFooBHO, &LIBID_FooIEAddOnLib, /*wMajor =*/ 1, /*wMinor =*/ 0>,
                                        public IDispEventImpl<1, CFooBHO, &DIID_DWebBrowserEvents2, &LIBID_SHDocVw, 1, 1>
{
    public:

        // m_fAdvised is read in SetSite(), so it must start out in a known state.
        CFooBHO() : m_fAdvised(FALSE)
        {
        }

        // The STDMETHOD macro is an ATL convention that marks the method as virtual and ensures that it has the right calling convention for the public
        // COM interface. It helps to demarcate COM interfaces from other public methods that may exist on the class. The STDMETHODIMP macro is likewise used
        // when implementing the member method.

        STDMETHOD(SetSite)(IUnknown *pUnkSite);

        DECLARE_REGISTRY_RESOURCEID(IDR_FooBHO)

        DECLARE_NOT_AGGREGATABLE(CFooBHO)

        BEGIN_COM_MAP(CFooBHO)
            COM_INTERFACE_ENTRY(IFooBHO)
            COM_INTERFACE_ENTRY(IDispatch)
            COM_INTERFACE_ENTRY(IObjectWithSite)
        END_COM_MAP()

        DECLARE_PROTECT_FINAL_CONSTRUCT()

        BEGIN_SINK_MAP(CFooBHO)
            SINK_ENTRY_EX(1, DIID_DWebBrowserEvents2, DISPID_DOCUMENTCOMPLETE, OnDocumentComplete)
        END_SINK_MAP()

        void STDMETHODCALLTYPE OnDocumentComplete(IDispatch *pDisp, VARIANT *pvarURL);

        HRESULT FinalConstruct()
        {
            return S_OK;
        }

        void FinalRelease()
        {
        }

    private:

        CComPtr<IWebBrowser2>  m_spWebBrowser;

        BOOL m_fAdvised;

        static const wchar_t* ABOUT_BLANK;

        // No class qualifier on an in-class declaration (MSVC tolerates it, standard C++ does not).
        void ReplaceInCComBSTR(CComBSTR &strInput, const wstring &strOld, const wstring &strNew);
};

OBJECT_ENTRY_AUTO(__uuidof(FooBHO), CFooBHO)

// FooBHO.cpp : Implementation of CFooBHO

#include "stdafx.h"
#include "FooBHO.h"
#include "Strsafe.h"

const wchar_t* CFooBHO::ABOUT_BLANK = L"about:blank";

// The SetSite() method is where the BHO is initialized and where you would perform all the tasks that happen only 
// once. When you navigate to a URL with Internet Explorer, you should wait for a couple of events to make sure the
// required document has been completely downloaded and then initialized. Only at this point can you safely access 
// its content through the exposed object model, if any.

STDMETHODIMP CFooBHO::SetSite(IUnknown* pUnkSite)
{
    if (pUnkSite != NULL)
    {
        // Cache the pointer to IWebBrowser2.
        HRESULT hr = pUnkSite->QueryInterface(IID_IWebBrowser2, (void **)&m_spWebBrowser);

        if (SUCCEEDED(hr))
        {
            // Register to sink events from DWebBrowserEvents2.
            hr = DispEventAdvise(m_spWebBrowser);
            if (SUCCEEDED(hr))
            {
                m_fAdvised = TRUE;
            }
        }
    }
    else
    {
        // Unregister event sink.
        if (m_fAdvised)
        {
            DispEventUnadvise(m_spWebBrowser);
            m_fAdvised = FALSE;
        }

        // Release cached pointers and other resources here.
        m_spWebBrowser.Release();
    }

    // Call base class implementation.
    return IObjectWithSiteImpl<CFooBHO>::SetSite(pUnkSite);
}

void STDMETHODCALLTYPE CFooBHO::OnDocumentComplete(IDispatch *pDisp, VARIANT *pvarURL)
{
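    // The URL arrives as a VARIANT; make sure it really holds a string before using it.
    if (pvarURL == NULL || pvarURL->vt != VT_BSTR || pvarURL->bstrVal == NULL)
    {
        return;
    }

    BSTR bstrURL = pvarURL->bstrVal;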

    // Test for any specific URL here. 
    // Currently we are ignoring ABOUT:BLANK but allowing everything else.

    if (_wcsicmp(bstrURL, ABOUT_BLANK) == 0)
    {
        return;
    }

    HRESULT hr = S_OK;

    // Query for the IWebBrowser2 interface.
    CComQIPtr<IWebBrowser2> spTempWebBrowser = pDisp;

    // Is this event associated with the top-level browser?
    if (spTempWebBrowser && m_spWebBrowser && m_spWebBrowser.IsEqualObject(spTempWebBrowser))
    {
        // Get the current document object from browser.
        CComPtr<IDispatch> spDispDoc;

        if (SUCCEEDED(m_spWebBrowser->get_Document(&spDispDoc)))
        {
            // Verify that what we get is a pointer to a IHTMLDocument2 interface. 
            // To be sure, let's query for the IHTMLDocument2 interface (through smart pointers).

            CComQIPtr<IHTMLDocument2, &IID_IHTMLDocument2> spHTMLDocument2 = spDispDoc;

            // Extract the source of the document if it's HTML.
            if (spHTMLDocument2)
            {
                // Get the BODY object.
                CComPtr<IHTMLElement> spBody;

                if (SUCCEEDED(spHTMLDocument2->get_body(&spBody)))
                {
                    // Get the Body HTML text.
                    CComBSTR bstrBodyHTMLText;

                    if (SUCCEEDED(spBody->get_innerHTML(&bstrBodyHTMLText)))
                    {
                        ReplaceInCComBSTR(bstrBodyHTMLText, L"alert(\"hello\");", L"");

                        spBody->put_innerHTML(bstrBodyHTMLText);
                    }
                }
            }
        }
    }
}

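void CFooBHO::ReplaceInCComBSTR(CComBSTR &bstrInput, const wstring &strOld, const wstring &strNew)
{
    // Nothing to do for an empty BSTR or an empty search string
    // (an empty search string would otherwise match forever).
    if (bstrInput.m_str == NULL || strOld.empty())
    {
        return;
    }

    wstring strOutput(bstrInput.m_str, bstrInput.Length());

    size_t iPos = 0;

    while ((iPos = strOutput.find(strOld, iPos)) != wstring::npos)
    {
        strOutput.replace(iPos, strOld.length(), strNew);

        // Continue after the inserted text so the replacement is never rescanned.
        iPos += strNew.length();
    }

    // Find and replace is complete; now update the CComBSTR.
    // The assignment operator frees the old string and allocates the new one.
    bstrInput = strOutput.c_str();
}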
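
For anyone who wants to try the search-and-replace step outside a BHO project, here is a minimal sketch of the same idea using only std::wstring. ReplaceAll is a hypothetical name, not part of the code above. It assumes the search should resume after the inserted text, which keeps the loop from looping forever when the replacement itself contains the search string.

```cpp
#include <string>

// Hypothetical helper: replaces every occurrence of strOld in strText with strNew.
std::wstring ReplaceAll(std::wstring strText, const std::wstring &strOld, const std::wstring &strNew)
{
    // An empty search string matches at every position, so bail out early.
    if (strOld.empty())
    {
        return strText;
    }

    std::wstring::size_type iPos = 0;

    while ((iPos = strText.find(strOld, iPos)) != std::wstring::npos)
    {
        strText.replace(iPos, strOld.length(), strNew);

        // Resume after the inserted text so it is never matched again.
        iPos += strNew.length();
    }

    return strText;
}
```

In the BHO, the body HTML read through get_innerHTML could be copied into a wstring, passed through a helper like this, and assigned back to the CComBSTR before calling put_innerHTML.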
9
manuell On

Here is another way of doing it, in JavaScript.

var oCollection = document.getElementsByTagName("script");
var nColCount = oCollection.length;
var nIndex;
for ( nIndex = 0; nIndex < nColCount; ++nIndex ) {
    var oScript = oCollection[ nIndex ];
    var strScriptText = oScript.innerHTML;
    if ( strScriptText.indexOf( "alert(\"hello\");" ) != -1 ) {
        var strNewText = strScriptText.replace( "alert(\"hello\");", "" );
        var oNewScript = document.createElement("script");
        oNewScript.type = "text\/javascript";
        oNewScript.text = strNewText;
        document.getElementsByTagName("head")[0].appendChild(oNewScript);
        console.log ("DONE!");
    }
}

You can execute that JS with IHTMLWindow2::execScript.

You will have to store the code above in a CComBSTR. Beware of escaping.

Example:

CComBSTR bstrScript = L"var strWithOneDoubleQuote = \"\\\"\";"; // -:)
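
One way to sidestep most of the escaping is a C++11 raw string literal, which Visual Studio 2015 supports. The sketch below is only an illustration: the function names and the short script are hypothetical, and it uses std::wstring so it compiles without ATL. The resulting text could then be wrapped in a CComBSTR and handed to execScript as described above.

```cpp
#include <string>

// Hypothetical helper: returns script text built with a raw string literal.
// Everything between LR"js( and )js" is taken verbatim, so the JavaScript
// is written exactly as it would appear in a .js file.
std::wstring BuildExampleScript()
{
    return LR"js(
var strTarget = "alert(\"hello\");";
var oScripts = document.getElementsByTagName("script");
console.log("Looking for " + strTarget + " in " + oScripts.length + " script blocks");
)js";
}

// The escaped literal from the example above...
std::wstring EscapedQuoteExample()
{
    return L"var strWithOneDoubleQuote = \"\\\"\";";
}

// ...and the same text written as a raw string literal.
std::wstring RawQuoteExample()
{
    return LR"js(var strWithOneDoubleQuote = "\"";)js";
}
```

The only restriction is that the closing sequence, here )js", must not occur inside the script itself; pick a different delimiter if it does.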