How to get a list of page objects using Jsoup


I've been working with Jsoup for a long time and have managed to scrape several sites with it, but there is one I can't get to work at all. I've tried passing almost every possible id as the selector, and I still can't get the result list back.

My code structure is very simple:

import org.jsoup.Jsoup
import org.jsoup.nodes.Document

fun execute(beer: String, record: Int = 100): WebsiteEntity {
    val products = arrayListOf<ProductEntity>()

    // Fetch the search results page; return an empty result if the request fails.
    val document: Document = try {
        Jsoup.connect("https://mercado.carrefour.com.br/s?q=heineken&sort=score_desc&page=0").get()
    } catch (e: Exception) {
        return getWebsiteEntity(products)
    }

    // No space between the classes: match one element that carries both of them.
    for (item in document.select(".grid.grid-cols-2")) {
        products.add(
            ProductEntity(
                name = "",
                price = "",
                partner = "",
                url = "",
                image = "",
                percent = "",
                websiteName = ""
            )
        )
    }
    return getWebsiteEntity(products)
}

I tried to select the grid element by its class attribute:

"grid grid-cols-2 xl:grid-cols-5 md:grid-cols-4"

but the selection doesn't return anything.

(screenshot: evidence)

There is 1 answer

Answered by spikehd

Does the website use JavaScript to populate the page? If so, Jsoup won't work for you. From Baeldung:

Bear in mind that jsoup interprets HTML only — it does not interpret JavaScript. Therefore changes to the DOM that would normally take place after page loads in a JavaScript-enabled browser will not be seen in jsoup.
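To see that effect in isolation, here is a minimal sketch. The HTML string is hypothetical (it is not the real Carrefour markup) and simply assumes the grid is served empty and filled in by a script afterwards. The names staticHtml and countGridItems are made up for this example.

```kotlin
import org.jsoup.Jsoup

// Hypothetical markup: the grid is empty in the served HTML and would only be
// filled by the script in a JavaScript-enabled browser.
val staticHtml = """
    <html><body>
      <ul class="grid grid-cols-2 xl:grid-cols-5 md:grid-cols-4"></ul>
      <script>
        document.querySelector("ul.grid").innerHTML = "<li>Heineken</li>";
      </script>
    </body></html>
""".trimIndent()

// Counts the <li> children of the element that has both classes.
// ".grid.grid-cols-2" (no space) matches a single element carrying both classes;
// ".grid .grid-cols-2" (with a space) would look for a descendant instead.
fun countGridItems(html: String): Int =
    Jsoup.parse(html).select(".grid.grid-cols-2 > li").size

fun main() {
    // jsoup parses the script as text and never executes it, so the grid stays empty.
    println(countGridItems(staticHtml)) // prints 0
}
```

If the same check against the real page also comes back empty while the browser shows products, the items are most likely added by JavaScript after the page loads.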

As for alternatives that do run a full browser engine, the one people seem to recommend most for Java these days is Playwright, though I've never used it myself.
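A rough sketch of what that could look like from Kotlin, using Playwright for Java (the com.microsoft.playwright:playwright artifact) to render the page and then handing the resulting HTML to jsoup. The function name fetchRendered and the grid selector are assumptions for illustration; the selector has not been verified against the live site.

```kotlin
import com.microsoft.playwright.Playwright
import org.jsoup.Jsoup
import org.jsoup.nodes.Document

// Renders the page in headless Chromium and returns the resulting DOM as a jsoup Document.
// gridSelector is an assumption about the rendered markup, not a confirmed selector.
fun fetchRendered(url: String, gridSelector: String = ".grid.grid-cols-2"): Document =
    Playwright.create().use { playwright ->
        playwright.chromium().launch().use { browser ->
            val page = browser.newPage()
            page.navigate(url)
            // Wait until the grid is present and visible; throws if it never appears.
            page.waitForSelector(gridSelector)
            // page.content() returns the serialized HTML after scripts have run.
            Jsoup.parse(page.content(), url)
        }
    }

fun main() {
    val document = fetchRendered("https://mercado.carrefour.com.br/s?q=heineken&sort=score_desc&page=0")
    // Assumes the product cards are direct children of the grid element.
    println(document.select(".grid.grid-cols-2 > *").size)
}
```

Parsing the rendered HTML with jsoup means the existing select(...) logic can stay as it is; only the fetching step changes. Playwright needs its browser binaries installed before the first run.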