I am making a web scraper.
I query Google Search, take the link to each result page, and then get the contents of that page's <title> tag.
The problem is that I get strings like "P\xe1gina N\xe3o Encontrada!" where I expect "Página Não Encontrada!".
I tried to decode the string as Latin-1 and then encode it as UTF-8, but it did not work.
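For context, here is a minimal sketch (assuming Python 3) of what the \xe1 and \xe3 escapes mean. They are escape sequences for the code points of "á" and "ã", so the escaped spelling and the readable one are the same string. The escaped form usually only appears when a string is displayed through an escaping representation. Here ascii() is used to reproduce it. Python 2's repr() of a unicode string behaved the same way, and that is likely what you see when printing a container such as the list find_all() returns. The variable names are illustrative only.

```python
# "\xe1" and "\xe3" are escape sequences for U+00E1 ("á") and U+00E3 ("ã"),
# so in Python 3 both spellings below denote the same string.
escaped = "P\xe1gina N\xe3o Encontrada!"
readable = "Página Não Encontrada!"
print(escaped == readable)  # True

# ascii() escapes non-ASCII characters the way Python 2's repr() of a unicode
# string did, which reproduces the form shown in the question.
print(ascii(readable))      # 'P\xe1gina N\xe3o Encontrada!'
```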
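import requests
from bs4 import BeautifulSoup

r2 = requests.get(item_str)  # item_str: the link to a Google result page
texto_pagina = r2.text
soup_item = BeautifulSoup(texto_pagina, "html.parser")
empresa = soup_item.find_all("title")  # find_all() returns a list of tags
empresa_str = str(empresa)  # assumption: how empresa_str was built is not shown
# Attempted conversion that did not work:
print(empresa_str.decode('latin1').encode('utf8'))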
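A minimal sketch of one common fix, assuming Python 3 and the bs4 package. You give BeautifulSoup the raw bytes (r2.content) instead of r2.text, so it can work out the encoding itself, honouring a <meta charset> declaration if the page has one. You then read the text of a single tag instead of converting the whole find_all() list to a string. page_title() and sample are hypothetical names, and sample is an offline stand-in for a downloaded page.

```python
from bs4 import BeautifulSoup


def page_title(raw_html: bytes) -> str:
    """Return the text of the first <title> tag in raw_html, or "" if none.

    Passing bytes (e.g. r2.content) lets BeautifulSoup detect the encoding,
    using a <meta charset> declaration when the page provides one.
    """
    soup = BeautifulSoup(raw_html, "html.parser")
    title = soup.find("title")  # a single Tag (or None), not a list like find_all()
    return title.get_text(strip=True) if title is not None else ""


# Offline stand-in for r2.content: a Latin-1 page that declares its charset.
sample = (
    '<html><head><meta charset="iso-8859-1">'
    "<title>Página Não Encontrada!</title></head><body></body></html>"
).encode("latin-1")

print(page_title(sample))  # Página Não Encontrada!
```

In the scraper above, this would be called as page_title(r2.content).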
Can you help me, please? Thanks!
You can change the retrieved text variable to something like:
After printing the string, it seemed to work just fine for me.
Edit:
Instead of adding .encode('utf8'), have you tried just using empresa_str.decode('latin1')? As in:
print(empresa_str.decode('latin1'))
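To illustrate the suggestion, here is a minimal sketch. It assumes Python 3, where decode() exists only on bytes and encode() only on str. raw is a hypothetical stand-in for Latin-1 encoded title bytes. Decoding alone yields readable text. Calling .encode('utf8') afterwards turns it back into bytes, and printing bytes shows \x escapes again. That may be why the decode-then-encode attempt looked as if it had not worked.

```python
# Hypothetical input: the title as raw Latin-1 bytes.
raw = b"P\xe1gina N\xe3o Encontrada!"

# bytes -> str: decoding is all that is needed to get readable text.
text = raw.decode("latin-1")
print(text)        # Página Não Encontrada!

# str -> bytes: encoding again produces bytes, which print with \x escapes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'P\xc3\xa1gina N\xc3\xa3o Encontrada!'
```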