Filter h1 HTML With BeautifulSoup


I'm writing the code below:

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import requests
import time

browser = webdriver.Firefox(executable_path=r'C:/path/geckodriver.exe')
browser.get('https://brainly.com.br/app/ask?entry=hero&q=15+12')
WebDriverWait(browser, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'a[href*="/tarefa"]')))
html = browser.page_source
#html = browser.execute_script("return document.documentElement.outerHTML")
p = []
soup = BeautifulSoup(html, 'html.parser')
for link in soup.select('div.sg-actions-list__hole > a[href*="/tarefa"]'):
    ref = link.get('href')
    rt = 'https://brainly.com.br' + str(ref)
    p.append(rt)

g = []
for url in p:
    r = requests.get(url).text
    time.sleep(10)

rs = BeautifulSoup(r, 'html.parser')
df = rs.select('div > h1')
print(df)

HTML I need to filter:

<div class="brn-content-image">
<h1 class="sg-text sg-text--large sg-text--regular">
Tinha 12 cruzeiros e gastei 15 cruzeiros esta situação pode ser representada por a 15 - 12 B 12 + 15 C 12 - 11 e D - 12 + 15
</h1>
</div> 

and

<div class="brn-content-image">
<h1 class="sg-text sg-text--large sg-text--regular">
[(-15)12]12:(-15)142?
</h1>
</div> 

In the variable r I am storing the source code of 8 URLs, i.e. 8 pages of HTML, and I am not sure whether that is the reason I cannot filter it.

The code I use to filter the HTML:

rs= BeautifulSoup(r,'html.parser')
df = rs.select('div > h1')
print(df)

What mistake am I making?

There are 2 answers

Best answer, by αԋɱҽԃ αмєяιcαη:

Instead of using Selenium, you can call the API directly. The page fills in its results with JavaScript through an XHR request, and you can send that same request yourself.

import requests

data = [{"operationName": "SearchQuery", "variables": {"query": "15 12", "after": None, "first": 10},
         "query": "query SearchQuery($query: String!, $first: Int!, $after: ID) {\n  questionSearch(query: $query, first: $first, after: $after) {\n    count\n    edges {\n      node {\n        id\n        databaseId\n        author {\n          id\n          databaseId\n          isDeleted\n          nick\n          avatar {\n            thumbnailUrl\n            __typename\n          }\n          rank {\n            name\n            __typename\n          }\n          __typename\n        }\n        content\n        answers {\n          nodes {\n            thanksCount\n            ratesCount\n            rating\n            __typename\n          }\n          hasVerified\n          __typename\n        }\n        __typename\n      }\n      highlight {\n        contentFragments\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n"}]
r = requests.post("https://brainly.com.br/graphql/pt", json=data).json()


for item in r[0]['data']['questionSearch']['edges']:
    print(item['node']['content'])

Output:

tinha 12 reais e gastei 15 reais essa situação pode ser representada por 15-12   12+15   12-15  -12+15
tinha 12 reais e gastei 15 reais,essa situação pode ser representada por:                                                a)15-12 b)12+15 c)12-15 d)12+15
tinha 12 reais e gastei 15 reais. esta situação pode ser representada por:<br />
a) 15 - 12<br />
b) 12 - 15<br />
c) 12 + 15<br />
d) - 12 + 15
[(-15)12]12:(-15)142?
Tinha r$ 15 e gastei r$ 12 essa situação pode ser representada por .<br />
A)+15 - 12<br />
B) +12 + 15<br />
C + 12 - 15<br />
D -12+15<br />
Me ajude a para amanhã obrigado <br />
Tinha 12 cruzeiros e gastei 15 cruzeiros esta situação pode ser representada por a 15 - 12 B 12 + 15 C 12 - 11 e D - 12 + 15
Complete as lacunas com um número inteiro, de modo que as igualdades sejam verdadeiras. a) -15 + ___ = 12. <br />
b) 15 + __ = 12. <br />
c) -15 + __ = -12. <br />
d) 15 + ___ = -12
Mediana de 10-10-11-12-12-12-12-12-13-13-13-15-15-15-15-16-16-18-20-20 ? 
A Figura Abaixo representa um terreno com suas dimensões:15+15+12+12+12+12
15 14 12 13 14 14 15 10 10 12 13 10 15 10 12 12 12 14 15 12 Determine a Moda Dessa distribuição de frequência?
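
Some of the content strings in the output above still carry `<br />` tags. If plain text is wanted, something like the following might work. It is only a sketch using the standard library; `clean_content` is a hypothetical helper name, not part of the API or of the answer's code.

```python
import html
import re

# Matches <br>, <br/> and <br /> plus any whitespace that follows the tag.
BR_TAG = re.compile(r"<br\s*/?>\s*", flags=re.IGNORECASE)


def clean_content(raw: str) -> str:
    """Turn <br /> tags into newlines and decode HTML entities (hypothetical helper)."""
    text = BR_TAG.sub("\n", raw)
    text = html.unescape(text)
    # Trim each line and drop the empty ones left behind by trailing tags.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

It could then be used in the loop as `print(clean_content(item['node']['content']))`.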

Update, to print the links instead:

for item in r[0]['data']['questionSearch']['edges']:
    print(f"https://brainly.com.br/tarefa/{item['node']['databaseId']}")

Output:

https://brainly.com.br/tarefa/4726592
https://brainly.com.br/tarefa/2645408
https://brainly.com.br/tarefa/1610749
https://brainly.com.br/tarefa/6008293
https://brainly.com.br/tarefa/13872231
https://brainly.com.br/tarefa/22541768
https://brainly.com.br/tarefa/4531702
https://brainly.com.br/tarefa/23553975
https://brainly.com.br/tarefa/7061395
https://brainly.com.br/tarefa/13037193
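
Both loops above walk the same `edges` list, so the link and the text of each question can be collected in one pass. A minimal sketch, assuming the response has the shape used above; `extract_results` is a hypothetical name and the sample response in the usage note is made up for illustration.

```python
BASE_URL = "https://brainly.com.br/tarefa/"


def extract_results(response):
    """Return (url, content) pairs from the parsed JSON response (hypothetical helper).

    Assumes the layout used in the answer:
    response[0]['data']['questionSearch']['edges'][i]['node'].
    """
    edges = response[0]["data"]["questionSearch"]["edges"]
    results = []
    for edge in edges:
        node = edge["node"]
        results.append((f"{BASE_URL}{node['databaseId']}", node["content"]))
    return results
```

With the `r` from the answer, `for url, content in extract_results(r): print(url, content)` would print each pair on one line.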

Answer by GiovaniSalazar:

Try this:

from bs4 import BeautifulSoup

html = """
<div class="brn-content-image">
<h1 class="sg-text sg-text--large sg-text--regular">
Tinha 12 cruzeiros e gastei 15 cruzeiros esta situação pode ser representada por a 15 - 12 B 12 + 15 C 12 - 11 e D - 12 + 15
</h1>
</div>
<div class="brn-content-image">
<h1 class="sg-text sg-text--large sg-text--regular">
[(-15)12]12:(-15)142?
</h1>
</div> 
       """
soup = BeautifulSoup(html,'html.parser')

for n in soup.find_all('div', attrs={'class': 'brn-content-image'}):
    print(n.find('h1').text)
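
Note that in the question's loop, `r` is overwritten on every iteration, so the parsing after the loop only ever sees the last page, not all 8. One way around that is to parse each page separately. The sketch below assumes each fetched page really contains the `div.brn-content-image > h1` markup shown in the question (a page built by JavaScript may not); `question_titles` is a hypothetical name.

```python
from bs4 import BeautifulSoup


def question_titles(pages):
    """Return the stripped <h1> text of each HTML page in `pages` (hypothetical helper)."""
    titles = []
    for page_html in pages:
        # Parse every page on its own instead of only the last one fetched.
        soup = BeautifulSoup(page_html, "html.parser")
        h1 = soup.select_one("div.brn-content-image > h1")
        if h1 is not None:  # skip pages where the heading is missing
            titles.append(h1.get_text(strip=True))
    return titles
```

In the question's code this would mean collecting the pages first, e.g. `pages.append(requests.get(url).text)` inside the loop, and then calling `question_titles(pages)`.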