I need an efficient way to test a batch of free online HTTP proxies and determine which ones can access a specific website.
Since that testing involves a lot of waiting on network I/O, I decided to redesign my code for asynchronous testing and explored the httpx and aiohttp packages. However, I ran into some unexpected behavior, which makes me question whether my current code is the right fit for my purpose.
Below is the output of the code for three methods I used:
- one using the requests package for synchronous testing,
- and the other two for asynchronous testing.
As you can see, there are several errors, and the time taken to complete each request varies significantly. Interestingly, the requests method returned an HTTP 200 status for four links and the httpx method for five, while the aiohttp method returned no successes at all, which is unexpected given that all three are supposed to perform the same task. This makes me doubt my implementations.
Additionally, in the httpx method, one proxy took an inexplicably long time even though I set the timeout to 60 seconds: 13,480.64 seconds. (I should mention that I put my PC into sleep mode during this test when I noticed it was taking too long; when I returned later, I found the process hadn't stopped and was still running.)
Can anyone please tell me what I'm doing wrong here and how I could improve it?
1) --> 185.XXX.XX.XX:80 --> ProxyError (4.96s)
2) --> 38.XX.XXX.XXX:443 --> HTTP (200) (2.50s)
3) --> 162.XXX.XX.XXX:80 --> HTTP (200) (20.92s)
4) --> 18.XXX.XXX.XXX:8080 --> HTTP (200) (0.61s)
5) --> 31.XX.XX.XX:50687 --> ConnectionError (7.88s)
6) --> 177.XX.XXX.XXX:80 --> ProxyError (21.07s)
7) --> 8.XXX.XXX.X:4153 --> HTTP (200) (4.96s)
8) --> 146.XX.XXX.XXX:12334 --> ProxyError (21.05s)
9) --> 67.XX.XXX.XXX:33081 --> ProxyError (3.03s)
10) --> 37.XXX.XX.XX:80 --> ReadTimeout (60.16s)
Testing 10 proxies with "requests" took 147.16 seconds.
4) --> 18.XXX.XXX.XXX:8080 --> HTTP (200) (16.09s)
2) --> 38.XX.XXX.XXX:443 --> HTTP (200) (22.11s)
7) --> 8.XXX.XXX.X:4153 --> HTTP (200) (12.96s)
1) --> 185.XXX.XX.XX:80 --> RemoteProtocolError (24.83s)
9) --> 67.XX.XXX.XXX:33081 --> ConnectError (6.02s)
3) --> 162.XXX.XX.XXX:80 --> HTTP (200) (22.48s)
6) --> 177.XX.XXX.XXX:80 --> HTTP (200) (26.96s)
5) --> 31.XX.XX.XX:50687 --> ConnectError (34.50s)
8) --> 146.XX.XXX.XXX:12334 --> ConnectError (27.01s)
10) --> 37.XXX.XX.XX:80 --> ReadError (13480.64s)
Testing 10 proxies with "httpx" took 13507.80 seconds.
1) --> 185.XXX.XX.XX:80 --> ClientProxyConnectionError (1.30s)
2) --> 38.XX.XXX.XXX:443 --> ClientProxyConnectionError (0.67s)
3) --> 162.XXX.XX.XXX:80 --> ClientProxyConnectionError (0.77s)
4) --> 18.XXX.XXX.XXX:8080 --> ClientProxyConnectionError (0.83s)
5) --> 31.XX.XX.XX:50687 --> ClientProxyConnectionError (0.85s)
6) --> 177.XX.XXX.XXX:80 --> ClientProxyConnectionError (0.91s)
7) --> 8.XXX.XXX.X:4153 --> ClientProxyConnectionError (0.94s)
8) --> 146.XX.XXX.XXX:12334 --> ClientProxyConnectionError (1.03s)
9) --> 67.XX.XXX.XXX:33081 --> ClientProxyConnectionError (1.05s)
10) --> 37.XXX.XX.XX:80 --> ClientProxyConnectionError (0.62s)
Testing 10 proxies with "aiohttp" took 2.42 seconds.
Here's the code I used:
I started by downloading the proxies from this GitHub repository:
import asyncio
import os
import random
import tempfile
import time

import aiohttp
import httpx
import requests
TIMEOUT: int = 60
DEFAULT_DOMAIN: str = r"www.desired.domain.com"
PROXIES_URL: str = "https://raw.githubusercontent.com/TheSpeedX/SOCKS-List/master/http.txt"
PROXIES_PATH: str = os.path.join(tempfile.gettempdir(), "httpProxies.txt")
HEADERS: dict = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "en,ar;q=0.9,fr;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "dnt": "1",
    "referer": "https://www.google.com/",
    "sec-ch-ua": '"Microsoft Edge";v="123", "Not:A-Brand";v="8", "Chromium";v="123"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "cross-site",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Edg/123.0.0.0",
    "Connection": "close",  # or "keep-alive"
}
def get_proxies() -> list[str]:
    proxies: list[str] = []
    if os.path.exists(PROXIES_PATH):
        # The "with" block closes the file automatically.
        with open(file=PROXIES_PATH, mode="r") as file:
            proxies = file.read().splitlines()
    else:
        # A timeout keeps this bootstrap request from hanging indefinitely.
        response = requests.get(url=PROXIES_URL, timeout=TIMEOUT)
        if response.status_code == 200:
            with open(file=PROXIES_PATH, mode="w") as file:
                file.write(response.text)
            proxies = response.text.splitlines()
    return proxies
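As a side note, the downloaded list may contain blank lines or duplicates; a small cleanup helper like the one below (my own hypothetical addition, not part of the script above) could make the tests more predictable:

```python
def clean_proxies(proxies: list[str]) -> list[str]:
    # Strip whitespace, drop empty lines, and de-duplicate while
    # preserving the original order of the list.
    seen: set[str] = set()
    cleaned: list[str] = []
    for proxy in proxies:
        proxy = proxy.strip()
        if proxy and proxy not in seen:
            seen.add(proxy)
            cleaned.append(proxy)
    return cleaned
```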
Below is the method I used to test those proxies sequentially:
def sequential_test(proxies_list: list[str]):
    if proxies_list:
        with requests.Session() as session:
            session.headers = HEADERS
            for i, proxy in enumerate(proxies_list, 1):
                session.proxies = {"http": f"http://{proxy}"}
                try:
                    color = "\033[91m"
                    start = time.perf_counter()
                    response = session.get(url=f"http://{DEFAULT_DOMAIN}", timeout=TIMEOUT)
                    status = f"HTTP ({response.status_code})"
                    if response.status_code == 200:
                        color = "\033[92m"
                except Exception as exception:  # requests.RequestException
                    status = type(exception).__name__
                print(f"{i:>2}) --> {color+proxy:30}\033[0m --> {status:20}\t({time.perf_counter()-start:.2f}s)")
The following are the coroutines I used to test whether a proxy works with the desired website, using httpx and aiohttp respectively:
async def is_alive_httpx(index: int, proxy: str, domain: str = DEFAULT_DOMAIN) -> None:
    proxy_mounts = {"http://": httpx.AsyncHTTPTransport(proxy=f"http://{proxy}")}
    async with httpx.AsyncClient(
        mounts=proxy_mounts,
        timeout=TIMEOUT,
        headers=HEADERS,
        follow_redirects=True,
    ) as session:
        try:
            color = "\033[91m"
            start = time.perf_counter()
            # session.get() applies the client's headers, timeout, and
            # redirect policy to the request.
            response = await session.get(url=f"http://{domain}")
            status = f"HTTP ({response.status_code})"
            if response.status_code == 200:
                color = "\033[92m"
        except Exception as exception:  # httpx.HTTPError
            status = type(exception).__name__
        print(f"{index:>2}) --> {color+proxy:30}\033[0m --> {status:20}\t({time.perf_counter()-start:.2f}s)")
async def is_alive_aiohttp(index: int, proxy: str, domain: str = DEFAULT_DOMAIN) -> None:
    color = "\033[91m"
    start = time.perf_counter()
    try:
        # The session is closed automatically by the async context manager,
        # even when it raises, so no separate client.close() is needed.
        async with aiohttp.ClientSession(
            timeout=aiohttp.ClientTimeout(total=TIMEOUT),
            headers=HEADERS,
            trust_env=True,  # also honor HTTP(S)_PROXY environment variables
            connector=aiohttp.TCPConnector(force_close=True, limit_per_host=5),
        ) as client:
            response = await client.get(url=f"http://{domain}", proxy=f"http://{proxy}")
            status = f"HTTP ({response.status})"
            if response.status == 200:
                color = "\033[92m"
    except Exception as exception:  # aiohttp.ClientError
        status = type(exception).__name__
        print(status + ":", exception)  # full details, not just the class name
    print(f"{index:>2}) --> {color+proxy:30}\033[0m --> {status:26}\t({time.perf_counter()-start:.2f}s)")
    await asyncio.sleep(0.3)
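Separately, `asyncio.gather` in my code launches every proxy check at once; with longer lists I would probably bound the concurrency, e.g. with a semaphore. A sketch, using the same `func` signature as my coroutines:

```python
async def test_proxies_bounded(proxies_list: list[str], func, limit: int = 20) -> None:
    # At most `limit` proxy checks run concurrently; the rest wait their
    # turn on the semaphore instead of all connecting at the same time.
    semaphore = asyncio.Semaphore(limit)

    async def guarded(index: int, proxy: str) -> None:
        async with semaphore:
            await func(index, proxy)

    await asyncio.gather(*[guarded(i, p) for i, p in enumerate(proxies_list, 1)])
```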
Below is the remainder of the code. You can run the whole script directly by copying it into your environment (just ensure you have the required packages installed):
async def test_proxies(proxies_list: list[str], func):
    if proxies_list:
        await asyncio.gather(*[func(i, proxy) for i, proxy in enumerate(proxies_list, 1)])

def main():
    proxies = random.sample(get_proxies(), 10)  # or: get_proxies()[:10]
    start = time.perf_counter()
    sequential_test(proxies)
    print(f'\nTesting {len(proxies)} proxies with "requests" took {time.perf_counter()-start:.2f} seconds.\n')
    start = time.perf_counter()
    asyncio.run(test_proxies(proxies, is_alive_httpx))
    print(f'\nTesting {len(proxies)} proxies with "httpx" took {time.perf_counter()-start:.2f} seconds.\n')
    start = time.perf_counter()
    asyncio.run(test_proxies(proxies, is_alive_aiohttp))
    print(f'\nTesting {len(proxies)} proxies with "aiohttp" took {time.perf_counter()-start:.2f} seconds.\n')

if __name__ == "__main__":
    main()
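One caveat I only realized afterwards (just a guess on my part): all three runs reuse the same proxy sample, and free proxies expire quickly, so by the time the aiohttp test ran, hours after the httpx run stalled, the whole sample may simply have gone dead. If that guess is right, re-downloading a fresh sample before each run would make the comparison fairer; `fresh_proxies` below is a hypothetical helper built on my `get_proxies`:

```python
def fresh_proxies(count: int = 10) -> list[str]:
    # Hypothetical helper: free proxies go stale within minutes, so delete
    # the cached file to force get_proxies() to re-download the list.
    if os.path.exists(PROXIES_PATH):
        os.remove(PROXIES_PATH)
    return random.sample(get_proxies(), count)
```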
Here are some of the errors I frequently encounter when using aiohttp:
- ClientProxyConnectionError: Cannot connect to host ssl:default
  - [The semaphore timeout period has expired]
  - [The remote computer refused the network connection]
- ClientResponse:
  - [409 Conflict]
  - [407 Proxy Authentication Required]
- ClientOSError:
  - [WinError 64] The specified network name is no longer available
  - [WinError 1236] The network connection was aborted by the local system
- ServerDisconnectedError: Server disconnected.
To help diagnose these aiohttp failures, the function above prints the full exception details, not just the exception class name.
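When the message alone isn't enough, `repr()` plus the `traceback` module shows both the exception's arguments and where inside the library it was raised. A generic snippet (`risky_call` is a placeholder for the failing request):

```python
import traceback

try:
    risky_call()  # placeholder for the request that fails
except Exception as exception:
    # repr() includes the exception class and its arguments; the traceback
    # shows exactly where inside the library the exception was raised.
    print(f"{type(exception).__name__}: {exception!r}")
    traceback.print_exc()
```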