'NoneType' object has no attribute 'attrs' error in Python


I am trying to scrape the site https://stackoverflow.com/questions/tagged/docusignapi to get the vote count, answers and views for each question. However, I am getting None values for a few questions and am not sure how to handle them. Below is the code I am running in a Jupyter notebook. I am not able to print answers and views.

from bs4 import BeautifulSoup
import requests

url= "https://stackoverflow.com/questions/tagged/docusignapi"
page = requests.get(url)

soup = BeautifulSoup(page.text, "html.parser")
questions = soup.select(".s-post-summary")

questions_data = {
    "questions": []
}
for que in questions:
    q = que.select_one('.s-link').getText()
    vote_count = que.select_one('.s-post-summary--stats-item-number').text
    answers = que.select_one('.s-post-summary--stats-item.has-answers').attrs['title']
    views = que.select_one('.s-post-summary--stats-item ')
    print(answers)
   

I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-70-b2414da81e70> in <module>
      6     q = que.select_one('.s-link').getText()
      7     vote_count = que.select_one('.s-post-summary--stats-item-number').text
----> 8     answers = que.select_one('.s-post-summary--stats-item.has-answers').attrs['title']
      9     views = que.select_one('.s-post-summary--stats-item ')
     10 print(answers)

AttributeError: 'NoneType' object has no attribute 'attrs'

There are 2 answers

Answer by Andrej Kesely

You can check if the selected tag isn't None before accessing the title attribute:

import requests
from bs4 import BeautifulSoup

url = "https://stackoverflow.com/questions/tagged/docusignapi"
page = requests.get(url)

soup = BeautifulSoup(page.text, "html.parser")
questions = soup.select(".s-post-summary")

questions_data = {"questions": []}
for que in questions:
    q = que.select_one(".s-link").getText()
    vote_count = que.select_one(".s-post-summary--stats-item-number").text
    # select_one() returns None when nothing matches, e.g. for unanswered questions
    answers = que.select_one(".s-post-summary--stats-item.has-answers")
    answers = answers["title"] if answers else "No answer"
    # Note: this selector matches the first stats item, which is the vote count (see the output below)
    views = que.select_one(".s-post-summary--stats-item").get_text(strip=True, separator=" ")
    print(answers, views)

Prints:

...

one of the answers was accepted as the correct answer 0 votes
1 answer 0 votes
2 answers 1 vote
1 answer 0 votes
1 answer -1 votes
1 answer 0 votes
1 answer -1 votes
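
The same None check can be wrapped in a small helper so that each lookup stays on one line. This is only a sketch: attr_or_default is a hypothetical helper name, and the inline HTML is a simplified, assumed stand-in for the real page markup so that the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup (an assumption, not the real page):
# the second summary has no ".has-answers" element.
HTML = """
<div class="s-post-summary">
  <div class="s-post-summary--stats-item" title="Score of 0">0 votes</div>
  <div class="s-post-summary--stats-item has-answers" title="1 answer">1 answer</div>
</div>
<div class="s-post-summary">
  <div class="s-post-summary--stats-item" title="Score of 2">2 votes</div>
  <div class="s-post-summary--stats-item" title="0 answers">0 answers</div>
</div>
"""


def attr_or_default(parent, selector, attr, default=None):
    """Return the attribute of the first match, or default if the tag or attribute is missing."""
    tag = parent.select_one(selector)  # None when the selector matches nothing
    if tag is None:
        return default
    return tag.get(attr, default)  # Tag.get() avoids a KeyError for a missing attribute


soup = BeautifulSoup(HTML, "html.parser")
titles = [
    attr_or_default(que, ".s-post-summary--stats-item.has-answers", "title", "No answer")
    for que in soup.select(".s-post-summary")
]
print(titles)  # ['1 answer', 'No answer']
```

Returning a default instead of raising keeps the loop running over every question, at the cost of hiding markup changes; logging the misses may be preferable in a longer-running scraper.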
Answer by Unmitigated

Not all the questions have answers, so you can't rely on the has-answers class being present. However, you can use :nth-of-type(2) to always get the second statistics item.

answers = que.select_one('.s-post-summary--stats-item:nth-of-type(2)').attrs['title']

For more detailed information, read both the title and the text:

stat = que.select_one('.s-post-summary--stats-item:nth-of-type(2)')
print(stat.attrs['title'] + '; ' + stat.getText(strip=True, separator=' '))
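
As a rough illustration of the positional approach, the sketch below reads the second and third statistics items by position. The inline HTML is a simplified, assumed version of the page markup, the helper name stat_text is hypothetical, and treating the third item as the view count is an assumption about the page layout rather than something stated above.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup: three sibling stats items inside one container.
HTML = """
<div class="s-post-summary">
  <div class="s-post-summary--stats">
    <div class="s-post-summary--stats-item" title="Score of 0"><span>0</span> <span>votes</span></div>
    <div class="s-post-summary--stats-item" title="0 answers"><span>0</span> <span>answers</span></div>
    <div class="s-post-summary--stats-item" title="12 views"><span>12</span> <span>views</span></div>
  </div>
</div>
"""


def stat_text(que, position):
    """Return the text of the stats item at the given 1-based position, or None if absent."""
    tag = que.select_one(f".s-post-summary--stats-item:nth-of-type({position})")
    return tag.get_text(strip=True, separator=" ") if tag else None


soup = BeautifulSoup(HTML, "html.parser")
que = soup.select_one(".s-post-summary")

answers = stat_text(que, 2)  # second item, matched even without the has-answers class
views = stat_text(que, 3)    # third item, assumed here to hold the view count
missing = stat_text(que, 4)  # no fourth item, so the helper returns None

print(answers, views, missing)  # 0 answers 12 views None
```

Note that :nth-of-type counts sibling elements of the same tag name, so this only works while the stats items are consecutive elements of the same type within their parent.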