How to use lxml for web scraping?

Question

Asked by Raunanza on 22 October 2020 at 06:50 (528 views)

I want to write a Python script that fetches my current reputation on Stack Overflow from my profile page: https://stackoverflow.com/users/14483205/raunanza?tab=profile

This is the code I have written so far:

from lxml import html
import requests

# Download the profile page and parse the raw bytes into an lxml element tree
page = requests.get('https://stackoverflow.com/users/14483205/raunanza?tab=profile')
tree = html.fromstring(page.content)

Now, how do I get my reputation out of tree? (I can't understand how to use XPath, even after googling it.)
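A minimal offline sketch of what the XPath step could look like. The HTML below is a made-up stand-in for page.content (the reputation class name is hypothetical, not Stack Overflow's real markup), so the example runs without a network request:

```python
from lxml import html

# Hypothetical markup standing in for page.content; not Stack Overflow's real HTML.
page_content = b"""
<html><body>
  <div id="profile">
    <span class="name">Raunanza</span>
    <div class="reputation">3</div>
  </div>
</body></html>
"""

tree = html.fromstring(page_content)

# '//' searches the whole document, [@class="..."] filters on an attribute value,
# and text() selects the element's direct text node.
matches = tree.xpath('//div[@class="reputation"]/text()')
print(matches)  # xpath() returns a list, even when only one node matches

# Take the first match (if any) and trim surrounding whitespace.
reputation = matches[0].strip() if matches else None
print(reputation)
```

The same tree.xpath(...) call works on the tree built from the real page once you know which element holds the number.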

There are 3 answers

Tasnuva Leeya On 22 October 2020 at 07:02

A simple solution using BeautifulSoup with the lxml parser:

from bs4 import BeautifulSoup
import requests

page = requests.get('https://stackoverflow.com/users/14483205/raunanza?tab=profile').text
tree = BeautifulSoup(page, 'lxml')  # BeautifulSoup, using lxml as the underlying parser
# Class names as they appeared on the profile page when this was answered
name = tree.find("div", {'class': 'grid--cell fw-bold'}).text.strip()
reputation = tree.find("div", {'class': 'grid--cell fs-title fc-dark'}).text.strip()
print("Stack Overflow reputation of {} is: {}".format(name, reputation))
# output: Stack Overflow reputation of Raunanza is: 3
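If you would rather stay with plain lxml, the class-based lookup from this answer can also be written in XPath. Note that @class="..." only matches the exact attribute string, so the sketch below uses the common concat/normalize-space idiom to match a single class token. The fragment is made up and reuses the answer's class names (the live page markup may have changed since 2020); first_text_with_class is a hypothetical helper:

```python
from lxml import html

# Made-up fragment reusing the class names from the answer; the live page may differ.
fragment = """
<div>
  <div class="grid--cell fw-bold">Raunanza </div>
  <div class="grid--cell fs-title fc-dark">3</div>
</div>
"""
root = html.fromstring(fragment)


def first_text_with_class(tree, class_name):
    """Hypothetical helper: stripped text of the first element whose class list contains class_name."""
    # Padding @class with spaces makes " fw-bold " match the whole token only,
    # not a longer class name such as "fw-bolder". $token is an XPath variable
    # passed as a keyword argument to xpath().
    matches = tree.xpath(
        '//*[contains(concat(" ", normalize-space(@class), " "), $token)]',
        token=" {} ".format(class_name),
    )
    return matches[0].text_content().strip() if matches else None


name = first_text_with_class(root, "fw-bold")
reputation = first_text_with_class(root, "fs-title")
print("Stack Overflow reputation of {} is: {}".format(name, reputation))
```

Passing the class name as an XPath variable rather than formatting it into the expression avoids quoting problems if the value ever contains a quote character.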

mulaixi On 22 October 2020 at 07:06

If you don't mind using BeautifulSoup, you can extract the text directly from the tag that contains your reputation. Of course, you need to inspect the page structure first to find that tag.

from bs4 import BeautifulSoup
import requests

page = requests.get('https://stackoverflow.com/users/14483205/raunanza?tab=profile')
soup = BeautifulSoup(page.content, features='lxml')

# Print the text of every <strong> tag whose class attribute is exactly "ml6 fc-medium"
for tag in soup.find_all('strong', {'class': 'ml6 fc-medium'}):
    print(tag.text)
# this prints 3
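For comparison, a sketch of the same find_all loop in lxml without BeautifulSoup. The markup is made up; the second <strong> gets a child element to show that lxml's .text returns only the text before the first child, while .text_content() collects all descendant text (closer to what BeautifulSoup's .text returns):

```python
from lxml import html

# Made-up markup; the second <strong> nests a <span> to show the .text difference.
doc = html.fromstring("""
<div>
  <strong class="ml6 fc-medium">3</strong>
  <strong class="ml6 fc-medium">1,234 <span>reached</span></strong>
</div>
""")

# Like soup.find_all(...), xpath() returns every matching element in a list.
# @class="..." compares the whole attribute string, so class order matters.
results = [
    (tag.text, tag.text_content())
    for tag in doc.xpath('//strong[@class="ml6 fc-medium"]')
]

for direct_text, all_text in results:
    print(repr(direct_text), repr(all_text))
```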

Ananth On 22 October 2020 at 07:18 (accepted answer)

You only need a small change to your code: query the tree with an XPath expression. Below is the code:

from lxml import html
import requests

page = requests.get('https://stackoverflow.com/users/14483205/raunanza?tab=profile')
tree = html.fromstring(page.content)
# xpath() returns a list of the matching text nodes, not a single string
title = tree.xpath('//*[@id="avatar-card"]/div[2]/div/div[1]/text()')
print(''.join(title).strip())  # prints 3
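The XPath in this answer yields a list of strings, and an empty list once the page layout changes, at which point indexing with [0] would raise IndexError. A small helper, sketched here against a hypothetical minimal reconstruction of the avatar-card structure the expression assumes (not the real page), makes the no-match case explicit; first_text is an illustrative name:

```python
from lxml import html

# Hypothetical minimal markup shaped to match the answer's XPath; not the real page.
doc = html.fromstring("""
<div id="avatar-card">
  <div>avatar image</div>
  <div><div><div>3</div><div>reputation</div></div></div>
</div>
""")


def first_text(tree, expr):
    """Return the first text result of an XPath expression, stripped, or None if nothing matched."""
    results = tree.xpath(expr)
    return results[0].strip() if results else None


print(first_text(doc, '//*[@id="avatar-card"]/div[2]/div/div[1]/text()'))
# A path that no longer matches gives None instead of raising IndexError.
print(first_text(doc, '//*[@id="avatar-card"]/div[3]/div/div[1]/text()'))
```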

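You can get an element's XPath from Chrome DevTools: right-click the element on the page and choose Inspect, then right-click the highlighted node in the Elements panel and select Copy > Copy XPath.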
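Chrome's Copy XPath usually produces a path anchored on the nearest id plus positional indices, like the one in this answer. Positions are 1-based and counted among same-named siblings, so one extra element in the layout can shift the match. A sketch with made-up markup (the card id and rep class are hypothetical) comparing a positional path with one tied to an attribute:

```python
from lxml import html

# Hypothetical markup before and after a small layout change (a banner div is
# inserted at the top of the card). The id "card" and class "rep" are made up.
before = '<div id="card"><div class="rep">3</div><div>badges</div></div>'
after = '<div id="card"><div>new banner</div><div class="rep">3</div><div>badges</div></div>'

positional = '//*[@id="card"]/div[1]/text()'           # the style DevTools copies
anchored = '//*[@id="card"]/div[@class="rep"]/text()'  # tied to an attribute instead

results = {}
for label, markup in (("before", before), ("after", after)):
    tree = html.fromstring(markup)
    results[label] = (tree.xpath(positional), tree.xpath(anchored))
    print(label, *results[label])
```

After the change, the positional path silently selects the banner text, while the attribute-based path still finds the number. Attribute-based paths are not immune to redesigns either, but they tend to survive small layout shifts.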

To learn more about XPath, see: https://www.w3schools.com/xml/xpath_examples.asp
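A few XPath idioms beyond text() that come up often when scraping, sketched against made-up markup (the link targets are hypothetical): selecting attribute values with @, filtering with contains(), and functions such as normalize-space() and count(), which lxml returns as a single string or number instead of a list:

```python
from lxml import html

# Made-up profile-like markup for illustration only.
doc = html.fromstring("""
<div id="profile">
  <a href="/users/14483205/raunanza">Raunanza</a>
  <div class="rep">
      3
  </div>
  <a href="/help/reputation">What is reputation?</a>
</div>
""")

links = doc.xpath('//a/@href')                                   # list of attribute values
user_name = doc.xpath('//a[contains(@href, "/users/")]/text()')  # predicate on part of an attribute
rep = doc.xpath('normalize-space(//div[@class="rep"])')          # one string, whitespace collapsed
link_count = doc.xpath('count(//a)')                             # numeric results come back as float

print(links, user_name, rep, link_count)
```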
