Is there a (possibly directional) notion or implementation of distance between Wikipedia categories/pages?
For example consider: A) "Saint Louis University" B) "university"
Clearly, A is a type of B. How can you extract this from Wikipedia? If you extract all the categories connected to A, you get:
Category:1818 establishments in Missouri Territory
Category:Articles containing Latin-language text
Category:Association of Catholic Colleges and Universities
Category:Commons category with local link same as on Wikidata
Category:Coordinates on Wikidata
Category:Educational institutions established in 1818
Category:Instances of Infobox university using image size
Category:Jesuit universities and colleges in the United States
Category:Roman Catholic Archdiocese of St. Louis
Category:Roman Catholic universities and colleges in Missouri
None of these connects directly to B (https://en.wikipedia.org/wiki/University). But if you look further up the category graph, you should be able to find a path between A and B, possibly several hops long. What are the popular ways of accomplishing this?
If you have the entire Wikipedia category taxonomy, you can compute the distance between two categories as the length of the shortest path between them. If one category is an ancestor of the other, this is straightforward.
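To make the ancestor case concrete, here is a minimal sketch. It assumes the taxonomy is already loaded as a child-to-parents mapping; the `PARENTS` dictionary and the function name `hops_to_ancestor` are made up for illustration, and the edges are hand-written, not taken from a real Wikipedia dump.

```python
from collections import deque

# Hypothetical child -> parents edges, hand-written for illustration only.
PARENTS = {
    "Jesuit universities and colleges in the United States": [
        "Jesuit universities and colleges",
        "Universities and colleges in the United States",
    ],
    "Jesuit universities and colleges": ["Catholic universities and colleges"],
    "Catholic universities and colleges": ["Universities and colleges"],
    "Universities and colleges in the United States": ["Universities and colleges"],
    "Universities and colleges": ["Educational institutions"],
}


def hops_to_ancestor(start, target, parents):
    """Breadth-first search along parent links only.

    Returns the fewest upward hops from start to target,
    or None if target is not an ancestor of start.
    """
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return None
```

Because the search only follows parent links, the result is directional: it finds a path from a specific category up to a more general one, but returns `None` in the opposite direction.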
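If neither category is an ancestor of the other, find their Least Common Subsumer (LCS): the most specific category that is an ancestor of both.
Then compute the distance between them via the LCS, for example as the path length from the first category up to the LCS plus the path length from the second category up to the LCS.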
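The LCS-based distance can be sketched as follows. This is only one reasonable reading: in a graph where a category can have several parents, "most specific" is ambiguous, so the sketch assumes the LCS is the common ancestor with the smallest summed hop count. The `TAXONOMY` edges and the function names are hypothetical.

```python
from collections import deque

# Hypothetical child -> parents edges, hand-written for illustration only.
TAXONOMY = {
    "Jesuit universities": ["Catholic universities"],
    "Catholic universities": ["Universities"],
    "Public universities": ["Universities"],
    "Universities": ["Educational institutions"],
    "High schools": ["Educational institutions"],
}


def ancestor_depths(node, parents):
    """Map every ancestor of node (and node itself, at 0) to its fewest upward hops."""
    depths = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in depths:
                depths[parent] = depths[current] + 1
                queue.append(parent)
    return depths


def lcs_distance(a, b, parents):
    """Return (lcs, distance), or None if a and b share no ancestor.

    Assumption: the LCS is the common ancestor minimizing hops(a) + hops(b);
    ties are broken alphabetically so the result is deterministic.
    """
    depths_a = ancestor_depths(a, parents)
    depths_b = ancestor_depths(b, parents)
    common = set(depths_a) & set(depths_b)
    if not common:
        return None
    lcs = min(common, key=lambda c: (depths_a[c] + depths_b[c], c))
    return lcs, depths_a[lcs] + depths_b[lcs]
```

With the toy data, `lcs_distance("Jesuit universities", "Public universities", TAXONOMY)` picks "Universities" as the LCS with a distance of 3 (two hops up from one side, one from the other). When one category is an ancestor of the other, the ancestor itself is the LCS, so the same function covers that case too.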
I encourage you to go through similarity measures, where you will find state-of-the-art techniques for computing semantic similarity between words.
Resource: My project on extracting Wikipedia categories and concepts might help you.
Compute semantic similarity between words using WordNet. WordNet organizes English words in a hierarchical fashion. See this WordNet Similarity for Java demo. It uses eight different state-of-the-art techniques to compute semantic similarity between words.