Web Text Grabber in Swift

I have been trying to get the text content of any web page via:

func getTextContentFromUrl (url: URL) -> String? {
    var content = ""
    do {
       content = try String(contentsOf: url)
    } catch {
       return nil
    }
    return content
}

It works fine if the web page contains text inside the html/body tags, but not if the page is rendered entirely by JavaScript, e.g.: https://twitter.com/search?q=tesla&src=typed_query

I know about Swifter, but I cannot program against potentially hundreds of APIs to access every website: twitter, facebook, linkedin, quora, amazon etc. Obviously, a WKWebView knows how to display and print its text, so I tried to get the text content from a WKWebView:

(1) Unfortunately, the following method always returns "" even though I call it from webView(_ webView: WKWebView, didFinish navigation: WKNavigation!):

func getTextContentFromWebView () -> String {
    var content = ""
    myWKWebView.evaluateJavaScript("document.documentElement") { (string, error) in
        if string != nil {
            content = string as! String
        }
    }
    return content
}

I tried variants of this code published on the web, such as "document.body.textContent", "document.body.innerText", "document.body.outerHTML", "document.body.innerHTML", but the method always returns ""...

(2) I have also tried to use the clipboard to get the text content (myWKWebView.selectAll(), myWKWebView.copy()), but myWKWebView.copy() always throws an exception (even though this method is supposed to work for any NSView, as Apple's documentation states):

2020-03-13 15:21:26.251341+0100 Text Miner[7313:603242] -[WKWebView copyWithZone:]: unrecognized selector sent to instance 0x101b815c0

If anyone can manually copy, paste, and print the text content of any web page in any web browser, regardless of whether that content comes from HTML or JavaScript, there should be a generic, easy, and documented way to grab text from a WKWebView, shouldn't there?

There is 1 answer

silberz On

I figured out that:

  • my mistake in the first problem was that myWKWebView.evaluateJavaScript is an asynchronous function, i.e. getTextContentFromWebView returns right away with content="" (the completion handler has not yet had time to set this variable). The solution is to process the result inside the completion handler passed to evaluateJavaScript, instead of returning it from the method.

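  • WKWebView does accept a copy() call but does not implement it (the logged selector, copyWithZone:, shows the call reaching NSObject's object-copying machinery rather than a clipboard action): it is up to developers to implement it. I read somewhere that this is done via a JavaScript-Swift interface...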

Anyway, the first solution works for me.
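
For illustration, the first point could look roughly like the sketch below. It is only a minimal sketch, not the exact code used: the helper textContent(fromJavaScriptResult:error:) and the extension method grabText(completion:) are hypothetical names introduced here, and "document.body.innerText" is just one of the expressions mentioned in the question. The point is that the text is handed to a callback once the JavaScript has run, rather than returned from the function.

```swift
import Foundation
#if canImport(WebKit)
import WebKit
#endif

/// Hypothetical helper (not part of WebKit): turns the raw values passed to an
/// evaluateJavaScript completion handler into optional text.
/// Returns nil if the evaluation failed or the result is not a string.
func textContent(fromJavaScriptResult result: Any?, error: Error?) -> String? {
    guard error == nil else { return nil }
    return result as? String
}

#if canImport(WebKit)
extension WKWebView {
    /// Hypothetical convenience method: asks the page for its visible text and
    /// delivers it asynchronously. Nothing is returned directly, because the
    /// JavaScript result does not exist yet when this method returns.
    func grabText(completion: @escaping (String?) -> Void) {
        evaluateJavaScript("document.body.innerText") { result, error in
            // This closure runs later, once the script has been evaluated.
            completion(textContent(fromJavaScriptResult: result, error: error))
        }
    }
}
#endif
```

Assuming this is called from webView(_:didFinish:), usage would be along the lines of myWKWebView.grabText { text in print(text ?? "") }, with all further processing of the text done inside that closure. Pages that keep loading content through JavaScript after didFinish may still yield incomplete text at that moment.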