AppleScript won't go beyond top level


I've found a script that downloads information about cryptocurrencies so that I can bring it into a Numbers spreadsheet using AppleScript. This is the script:

set mySheetName to "Coin Prices"
set myTableName to "Coin Prices"
set tgtCell to "A2"

set theHtml to do shell script "curl -s " & quoted form of "https://www.worldcoinindex.com"
set text item delimiters to {"<tbody>", "</tbody>"}
set tableContents to theHtml's text item 2 # item 2 is the body of the price table
set text item delimiters to {"<h2>"} # site uses new h2 for each currency
set tableChunks to tableContents's text items 2 thru -1
set pasteStr to ""
repeat with aChunk in tableChunks
    set text item delimiters to "><span>$ </span><span class=\"span\">"
    tell aChunk's text item 1 to set {theSymbol, thePrice} to {first word, last word}
    set pasteStr to pasteStr & theSymbol & tab & thePrice & return
end repeat
set the clipboard to pasteStr


tell application "Numbers"
    tell front document
        tell sheet mySheetName to tell table myTableName
            
            activate
            set selection range to range tgtCell
            delay 0.3
            tell application "System Events" to keystroke "v" using {option down, shift down, command down}
        end tell
    end tell
end tell

It works perfectly until I change the curl line to this:

set theHtml to do shell script "curl -s " & quoted form of "https://www.worldcoinindex.com/watchlist"

I've checked the page source and it looks exactly the same, but I get a long error dialog that mentions item 2. I won't paste all of it because the dialog contains the entire source code of the web page. The error begins like this, though:

Can’t get text item 2 of

From there it's the source code.

Why does this script work on the base of the URL and not subdirectories of the URL?

Thanks for your help folks.


There are 2 answers

12
user3439894 On BEST ANSWER

As I do not have an account to access https://www.worldcoinindex.com/watchlist and see its source code while logged in, I'll take your word that it has the <tbody> and </tbody> tags and offer you an alternative to using curl.

Assuming you are using Safari and are logged in at the target URL and the page is fully loaded, you can use the following example AppleScript code to get the data you seek.

Add the following to the top of your existing AppleScript script while commenting out the set theHtml to do shell script ... line of code.

tell application "Safari" to ¬
    set theHtml to do JavaScript ¬
        "document.getElementById('myTable').innerHTML;" in document 1

Note that myTable in the JavaScript command comes from the table on the main domain and may need to be adjusted for the Watchlist.

Look in the page source for something like:

<table id="myTable" class= ... >
<thead>

You can also use e.g.:

"document.getElementsByClassName('...')[0].innerHTML;" in document 1

Replace the ... with the value of the class= attribute shown in the page source.
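
Both lookups return nothing when the id or class is not present on the page, and reading innerHTML from that empty result throws an error in the page. A minimal sketch of a guarded lookup follows. The helper name tableHtml and its doc parameter (standing in for the page's document) are hypothetical and not part of the site or of Safari.

```javascript
// Hypothetical helper: return the inner HTML of the table, or "" if it is absent.
// `doc` is the page's document object; `id` and `className` come from the page source.
function tableHtml(doc, id, className) {
  // Try the id first, then fall back to the first element with the class.
  const byId = id ? doc.getElementById(id) : null;
  const byClass = className ? doc.getElementsByClassName(className)[0] : null;
  const el = byId || byClass;
  // A missing element yields an empty string instead of a TypeError.
  return el ? el.innerHTML : "";
}
```

To use it from AppleScript, the function text followed by a call such as tableHtml(document, 'myTable', null); would be passed as the do JavaScript string. An empty result then suggests that the id or class needs adjusting for the page being read.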



Update:

Here is a version of the example AppleScript code that opens a new Safari document at the target URL, then dynamically creates a Numbers document in the background and brings it frontmost once complete. No curl, and no parsing HTML in a manner it really shouldn't be parsed in the first place. No clipboard or pasting into Numbers.

Note that the Safari window can stay in the background once its id has been ascertained, which only takes a moment after the window first appears. You can then set focus elsewhere while the script runs. The new Safari window is in fact already in the background when it is created, because Safari has not been told to activate.

I created a login for the site and added the top three coins to my watchlist, and this screenshot is of the dynamically created Numbers document. It will do the same for the main URL as well if that's what you set theURL to in the example AppleScript code.

Note that as currently coded, if you are not logged in, the script notifies you and aborts. I hope to update the code a bit later to handle not being logged in by logging in dynamically, but that is for the next iteration of the example AppleScript code.

[Screenshot: the dynamically created Numbers document]

Example AppleScript code:

property theURL : "https://www.worldcoinindex.com/watchlist"
-- property theURL : "https://www.worldcoinindex.com"

property myNumbersSheetName : "Coin Prices"
property myNumbersTableName : "Coin Prices"


--  # Do not modify code below unless necessary.

property winID : missing value
property itemCount : missing value
property loginStatus : missing value
property thisNumbersDocument : missing value
property theNumbersDocumentName : missing value
property theSafariDocumentName : missing value

--  # Create a new Safari document to the target URL.
--  # Get the id of the newly created window.
--  # Wait for the page to finish loading.
--  # Get the name of the newly created document.
--  # Get Login status and if not already logged in,
--  # notify user and abort the running of the script.
--  # Get the count of ticker symbols for Numbers.

tell application "Safari"
    make new document with properties {URL:theURL}
    set winID to id of window 1
    my waitForSafariPageToFinishLoading()
    set theSafariDocumentName to name of window id winID
    tell document theSafariDocumentName
        set loginStatus to ¬
            do JavaScript ¬
                "document.getElementsByClassName('logout-nav-container')[0].innerHTML;"
        if loginStatus contains "Login" then
            display dialog ¬
                "You are not logged in!   Please login, " & ¬
                "then run script again..." buttons {"OK"} ¬
                default button 1 with title ¬
                "Login Required To Run This Script"
            return
        else
            set itemCount to my getTickerSymbolCount()
        end if
    end tell
end tell


--  # Create a new document in Numbers in the background.
--  # Create two columns and one row more than the number of
--  # ticker symbols on the page. Set the column header names.

tell application "Numbers"
    set columnCount to 2
    set rowCount to itemCount + 1
    set thisNumbersDocument to make new document
    set theNumbersDocumentName to the name of thisNumbersDocument
    tell thisNumbersDocument
        delete every table of every sheet
        tell active sheet to set its name to myNumbersSheetName
        tell sheet myNumbersSheetName
            set thisTable to ¬
                make new table with properties ¬
                    {name:myNumbersTableName ¬
                        , column count:columnCount ¬
                        , row count:rowCount}
            tell thisTable
                set value of cell "A1" to "Ticker Symbol"
                set value of cell "B1" to "Last Price"
            end tell
        end tell
    end tell
end tell


--  # Get the 'Ticker Symbol' and 'Last Price' for 
--  # the number of symbols on the page, setting their 
--  # values to the target cells in the Numbers document.

tell application "Safari"
    tell document theSafariDocumentName
        set n to 2
        repeat with i from 0 to itemCount - 1
            set |Ticker Symbol| to ¬
                first paragraph of ¬
                (do JavaScript ¬
                    "document.getElementsByClassName('ticker')[" & i & "].innerText;")
            my addToNumbersTable("A", n, |Ticker Symbol|)
            set |Last Price| to ¬
                (do JavaScript ¬
                    "document.getElementsByClassName('number pricekoers lastprice')[" & i & "].innerText;")
            my addToNumbersTable("B", n, |Last Price|)
            set n to n + 1
        end repeat
    end tell
end tell


-- # Set focus to cell A1 and bring 
-- # the Numbers document frontmost.

tell application "Numbers"
    --  # Set focus to cell A1.
    tell table myNumbersTableName of ¬
        sheet myNumbersSheetName of ¬
        document theNumbersDocumentName to ¬
        set selection range to range "A1"
    activate
end tell



--  ##  Handlers  ##

to getTickerSymbolCount()
    tell application "Safari" to ¬
        tell document ¬
            theSafariDocumentName to ¬
            return ¬
                (do JavaScript ¬
                    "document.getElementsByClassName('ticker').length;") ¬
                    as integer
end getTickerSymbolCount

to addToNumbersTable(c, n, v)
    --  # Sets the value of the target cell.    
    tell application "Numbers" to ¬
        tell table myNumbersTableName of ¬
            sheet myNumbersSheetName of ¬
            document theNumbersDocumentName to ¬
            set value of cell (c & n) to v
end addToNumbersTable

on waitForSafariPageToFinishLoading()
    --  # Wait for page to finish loading in Safari.
    --  # This works in macOS Catalina (10.15.7) and 
    --  # macOS Big Sur (11.4) and may need adjusting for
    --  # updated versions of Safari in these versions of 
    --  # macOS, or other versions of macOS past or future.
    tell application "System Events" to repeat until ¬
        exists (buttons of groups of toolbar 1 of window 1 of ¬
            process "Safari" whose name = "Reload this page")
        delay 0.5
    end repeat
end waitForSafariPageToFinishLoading


Note: The example AppleScript code is just that, an example. Apart from any error handling already included, it does not contain additional error handling as may be appropriate. The onus is upon the user to add any error handling as may be appropriate, needed or wanted. Have a look at the try statement and error statement in the AppleScript Language Guide. See also Working with Errors. Additionally, the use of the delay command may be necessary between events where appropriate, e.g. delay 0.5, with the value of the delay set appropriately.
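
The repeat loop in the example code calls do JavaScript twice per coin. As a possible variation, all pairs could be gathered in one call and returned as tab-separated text. This is only a sketch: the function name collectRows and its doc parameter are hypothetical, the class names are the ones used in the example code, and it has not been run against the live site.

```javascript
// Hypothetical helper: gather every ticker/price pair in a single pass.
// `doc` is the page's document; the class names are taken from the example code.
function collectRows(doc) {
  const tickers = doc.getElementsByClassName("ticker");
  const prices = doc.getElementsByClassName("number pricekoers lastprice");
  // Use the shorter collection so a mismatch cannot index past the end.
  const count = Math.min(tickers.length, prices.length);
  const rows = [];
  for (let i = 0; i < count; i++) {
    // Keep only the first line of the ticker text, roughly what
    // "first paragraph of" does in the AppleScript. Trimming is an addition.
    const symbol = tickers[i].innerText.split(/\r?\n/)[0].trim();
    const price = prices[i].innerText.trim();
    rows.push(symbol + "\t" + price);
  }
  // One tab-separated line per coin.
  return rows.join("\n");
}
```

In Safari the function text followed by collectRows(document); would be sent as one do JavaScript string. The returned text could then be split into paragraphs and tab-delimited text items on the AppleScript side before the cells are set.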

4
Ted Wrigley On

The short answer is that the main page contains an explicit HTML table, while the watchlist page seems to be a structured series of div elements generated by JavaScript and made to look like a table. There is no 'tbody' element on the watchlist page because there is no table there. With those delimiters, text items splits the first page into three parts (the second of which is the one you want). It doesn't split the watchlist page at all, so you get a list with a single item containing all of the HTML. When you ask a list of one item for its second item, you get your error.

You're going to have to examine the html of the second page and figure out how to split the text to extract the information you want.
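
The failure described above can be reproduced with any split-on-delimiter routine. Here is a minimal sketch in JavaScript, where split stands in for AppleScript's text item delimiters. It is an analogy rather than AppleScript's own implementation, and the name extractTbody is hypothetical.

```javascript
// Hypothetical helper: return the text between <tbody> and </tbody>, or null.
function extractTbody(html) {
  // Split on either tag, like text item delimiters {"<tbody>", "</tbody>"}.
  const parts = html.split(/<\/?tbody>/);
  // With no tbody tags the split yields a single part (the whole input),
  // so there is no second item to take.
  return parts.length >= 3 ? parts[1] : null;
}
```

Checking the number of parts before taking the second one turns the long error dialog into a condition the script can test and report on.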