How to efficiently test some HTTP proxies for accessing a specific domain?


I need an efficient way to test a batch of free online HTTP proxies and determine which ones can access a specific website.

Since proxy testing involves a lot of waiting, I opted to redesign my code for asynchronous testing and explored the httpx and aiohttp packages. However, I encountered unexpected behaviors, which makes me question whether my current code is the best fit for my purpose.

Below is the output of the three methods I used:

  • one using the requests package for synchronous testing,
  • and the other two for asynchronous testing.

As you can see, there are several errors, and the time taken to complete each request varies significantly. Interestingly, the requests method returned an HTTP 200 status for four links, the httpx method for five, and the aiohttp method for none, which is unexpected considering they are supposed to perform the same task. This raises doubts about how I implemented them.

Additionally, in the httpx method, one proxy took an inexplicably long time, 13,480.64 seconds, even though I set the timeout to 60 seconds. (I should mention that during this test I put my PC into sleep mode when I noticed it was taking too long; when I returned later, the process was still running.)
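
I'm also wondering whether I should enforce a hard ceiling myself, e.g. by wrapping each check (such as the is_alive_httpx coroutine shown further down) in asyncio.wait_for. Below is a rough, untested sketch of that idea; the wrapper name and the 90-second cap are placeholders of my own:

# Rough sketch: cancel a proxy check outright once a hard deadline passes,
# independently of the library's own timeout handling.
async def check_with_hard_cap(index: int, proxy: str, cap: float = 90.0) -> None:
    try:
        await asyncio.wait_for(is_alive_httpx(index, proxy), timeout=cap)
    except asyncio.TimeoutError:
        print(f"{index:>2}) --> {proxy:30} --> HardTimeout ({cap:.2f}s)")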

Can anyone please tell me what I'm doing wrong here and how I could improve it?

 1) --> 185.XXX.XX.XX:80     --> ProxyError      (4.96s)
 2) --> 38.XX.XXX.XXX:443    --> HTTP (200)      (2.50s)
 3) --> 162.XXX.XX.XXX:80    --> HTTP (200)      (20.92s)
 4) --> 18.XXX.XXX.XXX:8080  --> HTTP (200)      (0.61s)
 5) --> 31.XX.XX.XX:50687    --> ConnectionError (7.88s)
 6) --> 177.XX.XXX.XXX:80    --> ProxyError      (21.07s)
 7) --> 8.XXX.XXX.X:4153     --> HTTP (200)      (4.96s)
 8) --> 146.XX.XXX.XXX:12334 --> ProxyError      (21.05s)
 9) --> 67.XX.XXX.XXX:33081  --> ProxyError      (3.03s)
10) --> 37.XXX.XX.XX:80      --> ReadTimeout     (60.16s)
Testing 10 proxies with "requests" took 147.16 seconds.


 4) --> 18.XXX.XXX.XXX:8080  --> HTTP (200)          (16.09s)
 2) --> 38.XX.XXX.XXX:443    --> HTTP (200)          (22.11s)
 7) --> 8.XXX.XXX.X:4153     --> HTTP (200)          (12.96s)
 1) --> 185.XXX.XX.XX:80     --> RemoteProtocolError (24.83s)
 9) --> 67.XX.XXX.XXX:33081  --> ConnectError        (6.02s)
 3) --> 162.XXX.XX.XXX:80    --> HTTP (200)          (22.48s)
 6) --> 177.XX.XXX.XXX:80    --> HTTP (200)          (26.96s)
 5) --> 31.XX.XX.XX:50687    --> ConnectError        (34.50s)
 8) --> 146.XX.XXX.XXX:12334 --> ConnectError        (27.01s)
10) --> 37.XXX.XX.XX:80      --> ReadError           (13480.64s)
Testing 10 proxies with "httpx" took 13507.80 seconds.


 1) --> 185.XXX.XX.XX:80     --> ClientProxyConnectionError  (1.30s)
 2) --> 38.XX.XXX.XXX:443    --> ClientProxyConnectionError  (0.67s)
 3) --> 162.XXX.XX.XXX:80    --> ClientProxyConnectionError  (0.77s)
 4) --> 18.XXX.XXX.XXX:8080  --> ClientProxyConnectionError  (0.83s)
 5) --> 31.XX.XX.XX:50687    --> ClientProxyConnectionError  (0.85s)
 6) --> 177.XX.XXX.XXX:80    --> ClientProxyConnectionError  (0.91s)
 7) --> 8.XXX.XXX.X:4153     --> ClientProxyConnectionError  (0.94s)
 8) --> 146.XX.XXX.XXX:12334 --> ClientProxyConnectionError  (1.03s)
 9) --> 67.XX.XXX.XXX:33081  --> ClientProxyConnectionError  (1.05s)
10) --> 37.XXX.XX.XX:80      --> ClientProxyConnectionError  (0.62s)
Testing 10 proxies with "aiohttp" took 2.42 seconds.

Here's the code I used:

I started by downloading the proxies from this GitHub repository:

import random
import tempfile
import os
import requests
import time
import asyncio
import httpx
import aiohttp

TIMEOUT: int = 60
DEFAULT_DOMAIN: str = "www.desired.domain.com"
PROXIES_URL: str = "https://raw.githubusercontent.com/TheSpeedX/SOCKS-List/master/http.txt"
PROXIES_PATH: str = os.path.join(tempfile.gettempdir(), "httpProxies.txt")
HEADERS: dict = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "en,ar;q=0.9,fr;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "dnt": "1",
    "referer": "https://www.google.com/",
    "sec-ch-ua": '"Microsoft Edge";v="123", "Not:A-Brand";v="8", "Chromium";v="123"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "cross-site",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Edg/123.0.0.0",
    "Connection": "close",  # "keep-alive",
}

def get_proxies() -> list[str]:
    # Reuse the cached list if it exists; otherwise download and cache it.
    if os.path.exists(PROXIES_PATH):
        with open(file=PROXIES_PATH, mode="r") as file:
            return file.read().splitlines()
    response = requests.get(url=PROXIES_URL)
    if response.status_code == 200:
        with open(file=PROXIES_PATH, mode="w") as file:
            file.write(response.text)
        # splitlines() avoids the trailing empty entry that split("\n") can leave.
        return response.text.splitlines()
    return []

Below is the method I used to test those proxies sequentially:

def sequential_test(proxies_list: list[str]):
    if proxies_list:
        with requests.Session() as session:
            session.headers = HEADERS
            for i, proxy in enumerate(proxies_list, 1):
                session.proxies = {"http": f"http://{proxy}"}
                try:
                    color = "\033[91m"
                    start = time.perf_counter()
                    response = session.get(url=f"http://{DEFAULT_DOMAIN}", timeout=TIMEOUT)
                    status = f"HTTP ({response.status_code})"
                    if response.status_code == 200:
                        color = "\033[92m"
                except Exception as exception:  # requests.RequestException
                    status = type(exception).__name__
                print(f"{i:>2}) --> {color+proxy:30}\033[0m --> {status:20}\t({time.perf_counter()-start:.2f}s)")

The following is the code I used to test whether a proxy works with the desired website, using httpx and aiohttp respectively:

async def is_alive_httpx(index: int, proxy: str, domain: str = DEFAULT_DOMAIN) -> None:
    # Route plain-HTTP traffic through the proxy under test.
    proxy_mounts = {"http://": httpx.AsyncHTTPTransport(proxy=f"http://{proxy}")}
    async with httpx.AsyncClient(
        mounts=proxy_mounts,
        timeout=TIMEOUT,
        headers=HEADERS,
        follow_redirects=True
    ) as session:
        try:
            color = "\033[91m"
            start = time.perf_counter()
            # Use session.get() so the client's default headers are actually
            # applied; a manually built httpx.Request doesn't inherit them.
            response = await session.get(url=f"http://{domain}")
            status = f"HTTP ({response.status_code})"
            if response.status_code == 200:
                color = "\033[92m"
        except Exception as exception:  # httpx.HTTPError
            status = type(exception).__name__
        print(f"{index:>2}) --> {color+proxy:30}\033[0m --> {status:20}\t({time.perf_counter()-start:.2f}s)")

async def is_alive_aiohttp(index: int, proxy: str, domain: str = DEFAULT_DOMAIN) -> None:
    # Bind these before the try block so the final print can never hit an
    # unbound local if session creation itself fails.
    color = "\033[91m"
    start = time.perf_counter()
    try:
        async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=TIMEOUT), headers=HEADERS, trust_env=True,
                                         connector=aiohttp.TCPConnector(force_close=True, limit_per_host=5)) as client:
            # aiohttp takes the proxy per request, not per session.
            response = await client.get(url=f"http://{domain}", proxy=f"http://{proxy}")
            status = f"HTTP ({response.status})"
            if response.status == 200:
                color = "\033[92m"
    except Exception as exception:  # aiohttp.ClientError
        status = type(exception).__name__
        print(status + ":", exception)
    # No manual client.close() needed: "async with" already closed the session.
    print(f"{index:>2}) --> {color+proxy:30}\033[0m --> {status:26}\t({time.perf_counter()-start:.2f}s)")
    await asyncio.sleep(0.3)

Below is the remainder of the code. You can run it directly by copying it into your environment (just ensure you have the required packages installed):

async def test_proxies(proxies_list: list[str], func):
    if proxies_list:
        # Schedule one task per proxy and wait for all of them to finish.
        await asyncio.gather(*(func(i, proxy) for i, proxy in enumerate(proxies_list, 1)))


def main():
    proxies = random.sample(get_proxies(), 10)  # get_proxies()[:10]

    start = time.perf_counter()
    sequential_test(proxies)
    print(f'\nTesting {len(proxies)} proxies with "requests" took {time.perf_counter()-start:.2f} seconds.\n')

    start = time.perf_counter()
    asyncio.run(test_proxies(proxies, is_alive_httpx))
    print(f'\nTesting {len(proxies)} proxies with "httpx" took {time.perf_counter()-start:.2f} seconds.\n')

    start = time.perf_counter()
    asyncio.run(test_proxies(proxies, is_alive_aiohttp))
    print(f'\nTesting {len(proxies)} proxies with "aiohttp" took {time.perf_counter()-start:.2f} seconds.\n')


if __name__ == "__main__":
    main()
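
One idea I've been considering for making this scale beyond a handful of proxies is to bound the concurrency with a semaphore rather than firing every request at once; a rough sketch of what I mean (the limit of 20 is an arbitrary guess on my part):

async def test_proxies_bounded(proxies_list: list[str], func, limit: int = 20):
    # Let at most `limit` proxy checks run concurrently; the rest wait
    # their turn instead of all opening connections at the same time.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(index: int, proxy: str) -> None:
        async with semaphore:
            await func(index, proxy)

    await asyncio.gather(*(bounded(i, p) for i, p in enumerate(proxies_list, 1)))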

Here are some of the errors I frequently encounter when using aiohttp, for example:

  • ClientProxyConnectionError: Cannot connect to host ssl:default
    • [The semaphore timeout period has expired]
    • [The remote computer refused the network connection]
  • ClientResponse:
    • [409 Conflict]
    • [407 Proxy Authentication Required]
  • ClientOSError:
    • [WinError 64] The specified network name is no longer available
    • [WinError 1236] The network connection was aborted by the local system
  • ServerDisconnectedError: Server disconnected.

1 Answer

Answered by Deadbeef Development:

To diagnose the errors from your aiohttp code, it's important to print the full exception details, not just the exception's class name:

print(exception)

With that in place, you get detailed information about what actually went wrong instead of an opaque ClientProxyConnectionError.
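
For instance, the except block in is_alive_aiohttp could delegate to a small helper like the sketch below (describe_exception is my own name, not part of aiohttp):

import traceback

def describe_exception(exception: BaseException) -> None:
    # Show the class, the full message, and any chained low-level cause.
    print(f"{type(exception).__name__}: {exception!r}")
    if exception.__cause__ is not None:
        print(f"  caused by: {exception.__cause__!r}")
    # Dump the complete traceback while debugging.
    traceback.print_exception(type(exception), exception, exception.__traceback__)

The chained __cause__ is often where the underlying OSError ends up, which tells you whether the proxy refused the connection, timed out, or reset it.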