Return specific data from a Wikipedia Page using API

Question

79 views. Asked by Jon295087 on 21 December 2016 at 16:27.

I want to parse geographic pages (e.g. landmarks, places of interest) on Wikipedia and return a JSON file that contains only the page title and the GIS coordinates scraped from the page(s).

So for example, looking at the page: https://en.wikipedia.org/wiki/The_Sanctuary

Calling the API with https://en.wikipedia.org/w/api.php?action=query&titles=The%20Sanctuary&prop=revisions&rvprop=content&format=json returns all of the page content.

However, I just want to return the following elements:

"title":"The Sanctuary" coord|51.41000|N|1.83173|W

Can anyone advise how to structure the web service call correctly?

This is my first attempt at scraping content from pages, so any guidance would be greatly appreciated.

There is 1 answer

**Tgr** · Accepted Answer · 2016-12-22T09:27:01+00:00

The rule of thumb for scraping is: don't do it. Many things are already available through the API (use the API sandbox to discover them). For most other interesting data, someone has already written a library.

In this case, action=query&titles=The_Sanctuary&prop=coordinates will get you what you want:

{
    "batchcomplete": "",
    "query": {
        "pages": {
            "788970": {
                "pageid": 788970,
                "ns": 0,
                "title": "The Sanctuary",
                "coordinates": [
                    {
                        "lat": 51.41,
                        "lon": -1.83173,
                        "primary": "",
                        "globe": "earth"
                    }
                ]
            }
        }
    }
}
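
As a rough illustration of how that query could be used from a script, here is a minimal Python sketch using only the standard library. The function names (`build_query_url`, `extract_title_and_coords`, `fetch_title_and_coords`) and the User-Agent string are my own placeholders, not part of the API; the parsing assumes the response has the same shape as the JSON shown above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title):
    """Build a prop=coordinates query URL for a single page title."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "coordinates",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def extract_title_and_coords(response):
    """Reduce an API response (already parsed to a dict) to title/lat/lon records."""
    results = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        # Pages without coordinates have no "coordinates" key, so they are skipped.
        for coord in page.get("coordinates", []):
            results.append(
                {"title": page["title"], "lat": coord["lat"], "lon": coord["lon"]}
            )
    return results


def fetch_title_and_coords(title):
    """Call the live API (needs network access) and return the reduced records."""
    # Placeholder User-Agent: replace it with something identifying your own script.
    request = Request(
        build_query_url(title),
        headers={"User-Agent": "coords-example/0.1 (example@example.org)"},
    )
    with urlopen(request) as reply:
        return extract_title_and_coords(json.load(reply))


if __name__ == "__main__":
    print(json.dumps(fetch_title_and_coords("The Sanctuary"), indent=2))
```

A negative `lon` means west, so `-1.83173` corresponds to the `1.83173|W` in the page's coord template. Several titles can be requested in one call by separating them with `|` in the `titles` parameter.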
