V-lang: How to send +2500 HTTP requests per second?

I am planning to write my scraper in V, and I need to send an estimated ~2500 requests per second, but I can't figure out what I am doing wrong. It should be sending requests concurrently, but right now it is dead slow. It feels like I'm doing something really wrong, but I can't figure out what.

import net.http
import sync
import time

fn send_request(mut wg sync.WaitGroup) ?string {
    start := time.ticks()
    data := http.get('https://google.com')?
    finish := time.ticks()
    println('Finish getting time ${finish - start} ms')
    wg.done()
    return data.text
}



fn main() {
    mut wg := sync.new_waitgroup()
    for i := 0; i < 50; i++ {
        wg.add(1)
        go send_request(mut wg)
    }
    wg.wait()
}

Output:

...
Finish getting time 2157 ms
Finish getting time 2173 ms
Finish getting time 2174 ms
Finish getting time 2200 ms
Finish getting time 2225 ms
Finish getting time 2380 ms
Finish getting time 2678 ms
Finish getting time 2770 ms

V Version: 0.1.29

System: Ubuntu 20.04

There are 3 answers

Major (best answer):

You're not doing anything wrong. I'm getting similar results in multiple languages, tried multiple ways. Many sites run rate-limiting software that prevents repeated reads like this; that's what you're running up against.

You could try using channels now that they're in the language, but you'll still run up against the rate limiter.
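
For reference, here is a minimal sketch of the channel-based approach, written in current V syntax (where spawn has replaced go for starting threads). A fixed set of worker threads pulls URLs from one channel and reports timings on another, so at most nworkers requests are in flight at once. The worker count, channel capacities, and names here are illustrative assumptions, not part of the original answer:

import net.http
import time

// Each worker keeps pulling URLs until the channel is closed and drained.
fn worker(urls_ch chan string, results_ch chan i64) {
    for {
        url := <-urls_ch or { break } // channel closed and empty: stop this worker
        start := time.ticks()
        resp := http.get(url) or {
            results_ch <- i64(-1) // signal a failed request
            continue
        }
        elapsed := time.ticks() - start
        println('status ${resp.status_code} in ${elapsed} ms')
        results_ch <- elapsed
    }
}

fn main() {
    urls_ch := chan string{cap: 100}
    results_ch := chan i64{cap: 100}
    nworkers := 8 // illustrative; tune to your target rate
    for _ in 0 .. nworkers {
        spawn worker(urls_ch, results_ch)
    }
    n := 50
    for _ in 0 .. n {
        urls_ch <- 'https://google.com'
    }
    urls_ch.close()
    mut ok := 0
    for _ in 0 .. n {
        elapsed := <-results_ch or { break }
        if elapsed >= 0 {
            ok++
        }
    }
    println('${ok}/${n} requests finished successfully')
}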

x3-:

The best way to send that many GET requests is to use what is called a HEAD request. It relies on the status code rather than a response body, since a HEAD request doesn't return one, which is what makes the HTTP requests faster.
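
For illustration, a minimal sketch of that idea in V, assuming your V version exposes http.head (otherwise http.fetch with method: .head should be equivalent):

import net.http

fn main() {
    // A HEAD request returns only the status line and headers, no body,
    // so there is less data to transfer and parse per request.
    resp := http.head('https://google.com') or {
        eprintln('HEAD request failed: ${err}')
        return
    }
    println('status: ${resp.status_code}')
}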

tenxsoydev:

Having worked a lot with V's concurrency in recent weeks, the best way I've found to do this is with a pool processor.

A snippet from v/examples:

import net.http
import json
import sync.pool

struct Story {
    title string
    url   string
}

fn worker_fetch(mut p pool.PoolProcessor, cursor int, worker_id int) voidptr {
    id := p.get_item[int](cursor)
    resp := http.get('https://hacker-news.firebaseio.com/v0/item/${id}.json') or {
        println('failed to fetch data from /v0/item/${id}.json')
        return pool.no_result
    }
    story := json.decode(Story, resp.body) or {
        println('failed to decode a story')
        return pool.no_result
    }
    println('# ${cursor}) ${story.title} | ${story.url}')
    return pool.no_result
}

// Fetches top HN stories in parallel, depending on how many cores you have
fn main() {
    resp := http.get('https://hacker-news.firebaseio.com/v0/topstories.json') or {
        println('failed to fetch data from /v0/topstories.json')
        return
    }
    ids := json.decode([]int, resp.body) or {
        println('failed to decode topstories.json')
        return
    }#[0..10] // `#[0..10]`: take only the first 10 story ids
    mut fetcher_pool := pool.new_pool_processor(
        callback: worker_fetch
    )
    // Note: if you do not call set_max_jobs, the pool will try to use an optimal
    // number of threads, one per each core in your system, which in most
    // cases is what you want anyway... You can override the automatic choice
    // by setting the VJOBS environment variable too.
    // fetcher_pool.set_max_jobs( 4 )
    fetcher_pool.work_on_items(ids)
}

src: https://github.com/vlang/v/blob/master/examples/news_fetcher.v

docs: https://modules.vosca.dev/standard_library/sync/pool.html