Can't extract data from a JavaScript-rendered page using BeautifulSoup?


Hi guys, I was trying to extract data from https://newslab.malaysiakini.com/covid-19/en with the following code:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://newslab.malaysiakini.com/covid-19/en")

soup = BeautifulSoup(page.content, 'html.parser')

# These are CSS class names, not an id, so match them with a class selector
option_tags = soup.select_one(".uk-grid.uk-grid-small.uk-width-auto.uk-flex.uk-flex-middle.uk-flex-center")

patient_items = option_tags.find_all(class_="patient")

first = patient_items[0]
print(first.prettify())

I can't extract the result. It seems like html.parser cannot get the data that I see in the Chrome console. Can anyone help with this?


There is 1 answer

Rusty Robot

The site makes a lot of requests after the initial request to https://newslab.malaysiakini.com/covid-19/en, so the data you see in the browser is not in the HTML that requests downloads. These additional requests may have what you're looking for.

The link below probably has all the information you are looking for except the GPS coordinates. The location is more difficult: the coordinates appear to be compiled into some JavaScript and data tags.

https://m5.malaysiakini.com/en/tag/covid-19?alt=json returns, in JSON format, all the stories on the Google map/list. For example:

{
    "title": "Tabligh particiapants: Foreigners the cause of Covid-19 spread, not fair to blame locals",
    "sid": 514832,
    "image_feat": ["https://i.newscdn.net/publisher-c1a3f893382d2b2f8a9aa22a654d9c97/2020/03/9b6ba685820341c1cfc4f7d7faff7ba0.jpg"],
    "image_feat_single": "https://i.newscdn.net/publisher-c1a3f893382d2b2f8a9aa22a654d9c97/2020/03/9b6ba685820341c1cfc4f7d7faff7ba0.jpg",
    "summary": "<p>Most of us went to the hospital for testing as soon we were given the directive, says a participant.</p>",
    "author": "",
    "author_array": [],
    "author_display": "no",
    "date_pub": 1584321043,
    "date_pub2": "1584321043000",
    "date_pubh": "2020-03-16 09:10:43+08:00",
    "category": "news",
    "comment_count": 0,
    "tags": ["health", "coronavirus", "covid-19", "tabligh gathering", "infection"],
    "free": false,
    "redirect": "",
    "date_modh": "2020-03-16 09:10:43+08:00"
}
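
As a rough sketch of how you might read that endpoint directly instead of parsing the rendered page: the code below fetches the JSON and pulls out a few fields from each story. I don't know the top-level layout of the response, so iter_stories (a name I made up) just walks whatever comes back and picks out any object that has both "title" and "sid", like the example above. It also assumes date_pub is a Unix timestamp in seconds, which matches the date_pubh value shown. Treat it as a starting point, not a tested scraper for this site.

```python
from datetime import datetime, timezone

FEED_URL = "https://m5.malaysiakini.com/en/tag/covid-19?alt=json"


def fetch_feed(url=FEED_URL):
    """Download the feed and decode the response body as JSON."""
    import requests  # imported here so the helpers below work without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


def iter_stories(node):
    """Yield every dict that looks like a story (has 'title' and 'sid').

    The overall shape of the feed is an unknown here, so this walks
    nested dicts and lists instead of assuming a particular key.
    """
    if isinstance(node, dict):
        if "title" in node and "sid" in node:
            yield node
        else:
            for value in node.values():
                yield from iter_stories(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_stories(item)


def summarize(story):
    """Keep a few fields; date_pub is assumed to be seconds since the epoch."""
    published = datetime.fromtimestamp(story["date_pub"], tz=timezone.utc)
    return {
        "sid": story["sid"],
        "title": story["title"],
        "published_utc": published.isoformat(),
        "tags": story.get("tags", []),
    }
```

To use it, loop over iter_stories(fetch_feed()) and call summarize() on each story. If the site changes the response format, only iter_stories and summarize should need adjusting.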