Pulling URL with =VLOOKUP from the table

174 views Asked by At

Trying to get url from this link https://www.atanet.org/onlinedirectories/tsd_view.php?id=3856

enter image description here

I use the following formula: =VLOOKUP("Website",ImportXML(A1, "(//table[@id='tableTSDContent']//tr)"),2,0) But unfortunately, it does not pull out the url. I would really appreciate it if you could help me extract the url in question.

1

There are 1 answers

0
General Grievance On BEST ANSWER

I tried using the APIPheny add on to import the data. After the <h2>Online Directories Listing</h2>, I saw a cell that said "Google bot blocked" or something to that effect.

I then went to the site's robots.txt file (https://www.atanet.org/robots.txt), which says:

User-agent: *
Disallow: /onlinedirectories/tsd_view.php*
Disallow: /onlinedirectories/tsd_search.php*

Disallow: /onlinedirectories/tsd_listings/tsd_view.fpl*
Disallow: /onlinedirectories/tsd_listings/tsd_search.fpl*
Disallow: http://www.atanet.org/bin/mpg.pl/28644.html

Disallow: /onlinedirectories/tsd_corp_listings/*

Disallow: /bin
Disallow: /division_calendar


User-agent: Googlebot
Disallow: /onlinedirectories/tsd_view.php*
Disallow: /onlinedirectories/tsd_search.php*

Disallow: /onlinedirectories/tsd_listings/tsd_view.fpl*
Disallow: /onlinedirectories/tsd_listings/tsd_search.fpl*

Disallow: /*division_calendar*

Disallow: /*bin*

Disallow: http://www.atanet.org/bin/mpg.pl/28644.html

User-agent: ITABot
Disallow: /onlinedirectories

I also think this means that the Google Sheets user agent is the same as the same as the Search Engine (Googlebot). If this is the case, then with Google Sheets, you're out of luck here because the tsd_view.php you want is disallowed. Likely, this was put there because they didn't want Google (or other search engines, for that matter) to index people's contact information. Of course, if you're a malicious webcrawler, you could ignore the robots.txt, but Googlebot is a nice bot.