I want to parse geographic pages (i.e. landmarks, places of interest) on Wikipedia and return a JSON file that contains only the page title and the GIS coordinates scraped from the page(s).
For example, take the page: https://en.wikipedia.org/wiki/The_Sanctuary
Calling the API with https://en.wikipedia.org/w/api.php?action=query&titles=The%20Sanctuary&prop=revisions&rvprop=content&format=json returns the entire page content.
However, I just want to return the following elements:
"title":"The Sanctuary" coord|51.41000|N|1.83173|W
Please can anyone advise how to correctly structure the web service call?
This is my first attempt at scraping content from pages, so any guidance is greatly appreciated.
The rule of thumb for scraping is to not do it. Many things are available in the API (use the API sandbox to discover them). For most other interesting data, someone has already written a library.
In this case, action=query&titles=The_Sanctuary&prop=coordinates will get you what you want:
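As a rough sketch of how that call could be wrapped in Python using only the standard library: the function names (`build_url`, `extract_coordinates`, `fetch_coordinates`) and the User-Agent string are made up for illustration, and the parsing assumes the response nests pages under `query.pages`, each with a `coordinates` list of `lat`/`lon` entries. Check the real output in the API sandbox before relying on it.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_url(titles):
    """Build the query URL for one or more page titles."""
    params = {
        "action": "query",
        "prop": "coordinates",
        "titles": "|".join(titles),  # the API accepts several titles separated by "|"
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_coordinates(response):
    """Reduce a decoded API response to a list of title/lat/lon dicts.

    Assumes the shape {"query": {"pages": {<pageid>: {"title": ...,
    "coordinates": [{"lat": ..., "lon": ...}]}}}}.
    Pages without coordinates are skipped.
    """
    results = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        coords = page.get("coordinates")
        if not coords:
            continue
        first = coords[0]  # take the first coordinate entry listed for the page
        results.append(
            {"title": page["title"], "lat": first["lat"], "lon": first["lon"]}
        )
    return results


def fetch_coordinates(titles):
    """Call the API over the network and return the reduced result."""
    request = urllib.request.Request(
        build_url(titles),
        # Placeholder User-Agent; replace it with something identifying your tool.
        headers={"User-Agent": "coord-example/0.1 (placeholder)"},
    )
    with urllib.request.urlopen(request) as resp:
        return extract_coordinates(json.load(resp))
```

Calling `json.dumps(fetch_coordinates(["The Sanctuary"]))` should then give a JSON string holding just the title and the coordinates. Note that the API reports signed decimal degrees, so a west longitude comes back as a negative `lon` rather than as the `|W` marker from the wikitext.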