How do I parse a Wiktionary API response?


I can't find any online resources that show how to parse a Wiktionary API response like this one:

{
    "query": {
        "pages": {
            "40915": {
                "pageid": 40915,
                "ns": 0,
                "title": "reluctant",
                "revisions": [
                    {
                        "contentformat": "text/x-wiki",
                        "contentmodel": "wikitext",
                        "*": "==English==\n\n===Etymology===\nFrom {{etyl|la|en}} {{term|lang=la|reluctans}}, present participle of {{term|reluctare}}, {{term|reluctari||to struggle against, oppose, resist}}, from {{term|re-||back}} + {{term|luctari||to struggle}}.\n\n===Pronunciation===\n* {{IPA|/ɹɪˈlʌktənt/}}\n* {{audio|en-us-reluctant.ogg|Audio (US)}}\n\n===Adjective===\n{{en-adj}}\n\n# {{context|now|_|rare|lang=en}} [[opposing|Opposing]]; offering [[resistance]] (to).\n#* '''1819''', Lord Byron, ''Don Juan'', II.108:\n#*: There, breathless, with his digging nails he clung / Fast to the sand, lest the returning wave, / From whose '''reluctant''' roar his life he wrung, / Should suck him back to her insatiate grave [...].\n#* '''2008''', Kern Alexander et al., ''The World Trade Organization and Trade in Services'', p. 222:\n#*: They are '''reluctant''' to the inclusion of a necessity test, especially of a horizontal nature, and emphasize, instead, the importance of procedural disciplines [...].\n# Not [[wanting]] to take some [[action]]; [[unwilling]].\n#: ''She was '''reluctant''' to lend him the money''\n\n====Synonyms====\n* [[unwilling]], [[disinclined]]\n\n====Translations====\n{{trans-top|not wanting to take some action}}\n* Chinese: \n*: Mandarin: {{t|cmn|不情願|sc=Hani}}, {{t+|cmn|不情愿|tr=bùqíngyuàn|sc=Hani}}\n* Czech: {{t|cs|neochotný}}, {{t|cs|zdráhající}} se\n* Dutch: {{t+|nl|aarzelend}}\n* Finnish: {{t+|fi|haluton}}, {{t+|fi|vastahakoinen}}\n* French: {{t+|fr|réservé}},  {{t+|fr|réfractaire}},  {{t+|fr|rétif}}\n* German: {{t|de|zögernd}}\n* Hungarian: {{t|hu|kelletlen}}\n* Indonesian: {{t+|id|enggan}}\n* Interlingua: [[reluctante]]\n* Italian: {{t+|it|riluttante}}\n{{trans-mid}}\n* Latin: {{t|la|invītus}}\n* Manx: {{t|gv|neuarryltagh}}, {{t|gv|neuwooiagh}}\n* Maori: {{t|mi|whakawhēuaua}}, {{t|mi|manauhea}}\n* Polish: [[niechętny]]\n* Romanian: reticent, precaut, {{t|ro|prevăzător}}\n* Russian: {{t+|ru|неохотный|tr=neoxótnyj}}\n* Scots: {{t|sco|sweer}}, 
{{t|sco|sweirt}}, {{t|sco|laith}}\n* Scottish Gaelic: {{t|gd|aindeònach}}, {{t|gd|leisg}}\n* Spanish: {{t+|es|renuente}}, {{t|es|reacio}}\n* Swedish: {{t|sv|motvillig}}\n{{trans-bottom}}\n\n====Related terms====\n* [[reluctance]]\n* [[reluctantly]]\n\n===External links===\n* {{R:Webster 1913}}\n* {{R:Century 1911}}\n* {{R:OneLook}}\n\n[[ca:reluctant]]\n[[cy:reluctant]]\n[[et:reluctant]]\n[[el:reluctant]]\n[[es:reluctant]]\n[[fr:reluctant]]\n[[ko:reluctant]]\n[[io:reluctant]]\n[[kn:reluctant]]\n[[ku:reluctant]]\n[[hu:reluctant]]\n[[mg:reluctant]]\n[[ml:reluctant]]\n[[my:reluctant]]\n[[nl:reluctant]]\n[[pl:reluctant]]\n[[pt:reluctant]]\n[[simple:reluctant]]\n[[fi:reluctant]]\n[[sv:reluctant]]\n[[ta:reluctant]]\n[[te:reluctant]]\n[[th:reluctant]]\n[[vi:reluctant]]\n[[zh:reluctant]]"
                    }
                ]
            }
        }
    }
}

All I want is the English definition, but the response packs everything about the word into a single wikitext string under the "*" key, so the definition cannot be looked up on its own.

  1. Can the API return the entry as structured JSON, where the English definition is its own key?
  2. Would I have to resort to a regex pattern to do this, and how might that look?
  3. Lastly, why would the API designers return data like this? I want to judge and say they have no idea what they're doing, but surely there must be a reason.

There is 1 answer

neuronet On

Use the extracts property (prop=extracts) to get an HTML version of the page instead of raw wikitext:

https://en.wiktionary.org/w/api.php?titles=cloud&action=query&prop=extracts&format=json
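
As a minimal sketch of how that request might be built and read from Python: the function names below (build_extracts_url, get_extract, fetch_extract) and the User-Agent string are made up for illustration. The code assumes the response keeps the query → pages → <pageid> nesting shown in the question, with the text under an "extract" key. Passing explaintext asks for plain text rather than HTML. The extract still covers the whole page, so the English section would need to be isolated afterwards.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wiktionary.org/w/api.php"


def build_extracts_url(title, plain_text=False):
    """Build a prop=extracts query URL for a single page title."""
    params = {
        "action": "query",
        "prop": "extracts",
        "titles": title,
        "format": "json",
    }
    if plain_text:
        # explaintext requests plain text instead of limited HTML.
        params["explaintext"] = 1
    return API_URL + "?" + urllib.parse.urlencode(params)


def get_extract(response):
    """Return the extract of the first page that has one, else None.

    Assumes the query -> pages -> <pageid> nesting seen in the question.
    """
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        if "extract" in page:
            return page["extract"]
    return None


def fetch_extract(title, plain_text=False):
    """Fetch a page over HTTP and return its extract (hypothetical helper)."""
    url = build_extracts_url(title, plain_text)
    # Placeholder User-Agent; replace with one that identifies your client.
    request = urllib.request.Request(
        url, headers={"User-Agent": "wiktionary-extract-example/0.1"}
    )
    with urllib.request.urlopen(request) as reply:
        return get_extract(json.load(reply))


if __name__ == "__main__":
    # Performs a live network request when run as a script.
    print(fetch_extract("cloud", plain_text=True))
```

Keeping get_extract separate from the network call makes it possible to try the parsing step on a saved response without hitting the API.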