Can't extract data using beautifulSoup for javascript?

Question

144 views Asked by Ka Vui At 08 December 2024 at 04:18

Hi guys, I was trying to extract data from https://newslab.malaysiakini.com/covid-19/en with the following code:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://newslab.malaysiakini.com/covid-19/en")

soup = BeautifulSoup(page.content, 'html.parser')

# These are CSS class names, not an id, so match them with a CSS selector.
option_tags = soup.select_one(".uk-grid.uk-grid-small.uk-width-auto.uk-flex.uk-flex-middle.uk-flex-center")

patient_items = option_tags.find_all(class_="patient")

first = patient_items[0]
print(first.prettify())

I can't extract the result. It seems that html.parser does not get the data that I see in the Chrome console. Can anyone help with this?

There is 1 answer

**Rusty Robot** · Answer 1 · 2020-03-16T02:09:21+00:00

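The site makes a lot of requests after the initial request to https://newslab.malaysiakini.com/covid-19/en, and the page content is filled in from those responses by JavaScript. BeautifulSoup only parses the HTML returned by the first request, so the data is not there. These additional requests may have what you're looking for.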
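One way to check this for yourself is to count the elements in the raw HTML, before any JavaScript has run. The snippet below is only a rough sketch: `count_by_class` and `fetch_html` are helper names made up for illustration, and the "patient" class is taken from the code in the question.

```python
from bs4 import BeautifulSoup


def count_by_class(html, class_name):
    """Count elements in static HTML that carry the given CSS class."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all(class_=class_name))


def fetch_html(url):
    """Download the raw HTML as the server sends it (no JavaScript is run)."""
    import requests  # imported here so count_by_class also works offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    html = fetch_html("https://newslab.malaysiakini.com/covid-19/en")
    # A count of 0 suggests the "patient" elements are added later by JavaScript.
    print(count_by_class(html, "patient"))
```

If the count is 0 while the browser's inspector shows the elements, the data arrives through one of the later requests, which you can find in the Network tab of the browser's developer tools.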

The link below probably has all the information you are looking for except the GPS coordinates. The location is harder to get: it appears to be compiled into some JavaScript and data tags.

https://m5.malaysiakini.com/en/tag/covid-19?alt=json returns, in JSON format, all the stories shown on the Google map/list. For example:

{
    "title": "Tabligh particiapants: Foreigners the cause of Covid-19 spread, not fair to blame locals",
    "sid": 514832,
    "image_feat": ["https://i.newscdn.net/publisher-c1a3f893382d2b2f8a9aa22a654d9c97/2020/03/9b6ba685820341c1cfc4f7d7faff7ba0.jpg"],
    "image_feat_single": "https://i.newscdn.net/publisher-c1a3f893382d2b2f8a9aa22a654d9c97/2020/03/9b6ba685820341c1cfc4f7d7faff7ba0.jpg",
    "summary": "<p>Most of us went to the hospital for testing as soon we were given the directive, says a participant.</p>",
    "author": "",
    "author_array": [],
    "author_display": "no",
    "date_pub": 1584321043,
    "date_pub2": "1584321043000",
    "date_pubh": "2020-03-16 09:10:43+08:00",
    "category": "news",
    "comment_count": 0,
    "tags": ["health", "coronavirus", "covid-19", "tabligh gathering", "infection"],
    "free": false,
    "redirect": "",
    "date_modh": "2020-03-16 09:10:43+08:00"
}
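Reading that endpoint from Python could look roughly like the sketch below. The field names (`sid`, `title`, `date_pub`, `tags`) come from the sample object above, but the top-level layout of the response is not shown here, so the code walks the whole decoded JSON rather than assuming a particular key. `iter_stories`, `summarize` and `fetch_stories` are hypothetical helper names, and the URL may no longer respond the same way.

```python
from datetime import datetime, timezone

# URL taken from the answer; whether it still responds is not guaranteed.
FEED_URL = "https://m5.malaysiakini.com/en/tag/covid-19?alt=json"


def iter_stories(node):
    """Yield every dict that looks like a story (has 'sid' and 'title').

    The top-level structure of the feed is an unknown here, so this
    searches nested dicts and lists instead of indexing a fixed key.
    """
    if isinstance(node, dict):
        if "sid" in node and "title" in node:
            yield node
        else:
            for value in node.values():
                yield from iter_stories(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_stories(item)


def summarize(story):
    """Reduce a story dict to a few fields, converting date_pub to UTC."""
    published = datetime.fromtimestamp(story["date_pub"], tz=timezone.utc)
    return {
        "sid": story["sid"],
        "title": story["title"],
        "published_utc": published.isoformat(),
        "tags": story.get("tags", []),
    }


def fetch_stories(url=FEED_URL):
    """Download the feed and return a list of summarized stories."""
    import requests  # imported here so the parsing helpers work offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return [summarize(s) for s in iter_stories(response.json())]


if __name__ == "__main__":
    for story in fetch_stories():
        print(story["published_utc"], story["title"])
```

No HTML parser is needed for this part, since `response.json()` already gives you plain Python dicts and lists.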
