Wikipedia extraction with Beautiful Soup

Asked by jdv12 on 21 June 2015 at 13:31

Hey, so I am just beginning to learn how to use Beautiful Soup, and I am having trouble pulling the right HTML tags out of a Wikipedia page.

I am trying to pull out the individual subcategories from the Subcategories section of https://en.wikipedia.org/wiki/Category:Furniture

However, I can't seem to figure out how to do it through all of the embedded links. I have managed to extract the page links pretty simply with:

pg_links = soup.find("div", {"id": "mw-pages"})
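
For context, here is a minimal sketch of that page-link step. The `SAMPLE_HTML` string is a made-up stand-in for the "mw-pages" block (the real page markup may differ), and `page_links` is just an illustrative name:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page-links block; the real markup may differ.
SAMPLE_HTML = """
<div id="mw-pages">
  <ul>
    <li><a href="/wiki/Furniture">Furniture</a></li>
    <li><a href="/wiki/Bench_(furniture)">Bench (furniture)</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
pg_links = soup.find("div", {"id": "mw-pages"})

# Collect (link text, href) for every anchor inside that div.
page_links = [
    (a.get_text(strip=True), a["href"])
    for a in pg_links.find_all("a", href=True)
]
print(page_links)
```

Since an id is normally unique on a page, a single `find` call is enough to grab the whole container here.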

However, when I try similar code to get the subcategories:

sub_cats = soup.find("div", {"class": "CategoryTreeSection"})

I only get a portion of the output I want, and when I try to narrow the scope:

sub_cats = soup.find("li", {"class": "CategoryTreeSection"})

I don't get anything at all. Any insight into this issue would be appreciated.
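
One possible explanation, sketched below under assumptions: `find` only ever returns the first match, while `find_all` returns every match, which could be why only a portion comes back. The `li` query may return nothing because the `CategoryTreeSection` class sits on a `div`, not on the `li` that wraps it. The `SAMPLE_HTML` here is a simplified guess at the subcategory markup (including the `mw-subcategories` id and `CategoryTreeLabel` class), not a copy of the live page:

```python
from bs4 import BeautifulSoup

# Simplified, assumed shape of the subcategories block; the live page may differ.
SAMPLE_HTML = """
<div id="mw-subcategories">
  <ul>
    <li>
      <div class="CategoryTreeSection">
        <div class="CategoryTreeItem">
          <a class="CategoryTreeLabel" href="/wiki/Category:Beds">Beds</a>
        </div>
      </div>
    </li>
    <li>
      <div class="CategoryTreeSection">
        <div class="CategoryTreeItem">
          <a class="CategoryTreeLabel" href="/wiki/Category:Chairs">Chairs</a>
        </div>
      </div>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() stops at the first matching div, so only one section comes back.
first_only = soup.find("div", {"class": "CategoryTreeSection"})

# find_all() returns a list with every matching div.
all_sections = soup.find_all("div", {"class": "CategoryTreeSection"})

# No <li> carries that class in this markup, so this is None.
no_match = soup.find("li", {"class": "CategoryTreeSection"})

# Start from the container and collect the label of each subcategory link.
container = soup.find("div", {"id": "mw-subcategories"})
sub_cats = [
    a.get_text(strip=True)
    for a in container.find_all("a", class_="CategoryTreeLabel")
]
print(sub_cats)
```

Scoping the search to the container first and then calling `find_all` on it keeps links from other parts of the page out of the result.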

Again, here is the link to the wiki page I'm trying to pull from: https://en.wikipedia.org/wiki/Category:Furniture
