seaborn.load_dataset results in URLError: <urlopen error [WinError 10060]

df = sns.load_dataset("tips") 

I am trying to load a dataset using seaborn, which results in the following URLError:

TimeoutError                              Traceback (most recent call last)
File ~\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py:1348, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
   1347 try:
-> 1348     h.request(req.get_method(), req.selector, req.data, headers,
   1349               encode_chunked=req.has_header('Transfer-encoding'))
   1350 except OSError as err: # timeout error

File ~\AppData\Local\Programs\Python\Python311\Lib\http\client.py:1282, in HTTPConnection.request(self, method, url, body, headers, encode_chunked)
   1281 """Send a complete request to the server."""
-> 1282 self._send_request(method, url, body, headers, encode_chunked)



File ~\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py:241, in urlretrieve(url, filename, reporthook, data)
    224 """
    225 Retrieve a URL into a temporary location on disk.
    226 
   (...)
    237 data file as well as the resulting HTTPMessage object.
    238 """
    239 url_type, path = _splittype(url)
--> 241 with contextlib.closing(urlopen(url, data)) as fp:
    242     headers = fp.info()
    244     # Just return the local path and the "headers" for file://
    245     # URLs. No sense in performing a copy unless requested.



URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>

I tried changing the internet connection and also unchecking the proxy server in the LAN settings.


There are 3 answers

Radha Raman Jha On

I found this to be an issue with my ISP (Jio, in India), which was unable to reach the host https://raw.githubusercontent.com/.

You can follow these steps to check whether you are hitting the same issue:

  1. Verify that you are connected and that a connection to GitHub succeeds:

     PS> ssh -T git@github.com

     A success message looks something like: Hi RadhaRamanJha! You've successfully authenticated, but GitHub does not provide shell access.

  2. The seaborn.load_dataset documentation states that seaborn loads its datasets from the GitHub repository "https://github.com/mwaskom/seaborn-data" (for example diamonds.csv). The raw file is downloaded from there when seaborn.load_dataset runs.

  3. While I was able to see the file on github.com, I could not open or download the raw version of any file in the dataset repository. I then checked whether my system could establish a connection to raw.githubusercontent.com. It could not:

PS> ping raw.githubusercontent.com

Pinging raw.githubusercontent.com [2405:200:1607:2820:41::36] with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 2405:200:1607:2820:41::36:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
  4. Searching around, I found that my ISP, 'JIO Internet', had this specific issue of not being able to connect to raw.githubusercontent.com. I switched to the network of a different ISP (an Airtel hotspot), after which I could ping raw.githubusercontent.com and access the raw file. seaborn was then also able to download the file, and the seaborn.load_dataset call in my code succeeded.
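The ping test in step 3 can also be run from Python, which is useful when ping is filtered but HTTPS traffic is what matters. This is a minimal sketch, assuming that opening a plain TCP connection to port 443 is a fair stand-in for what seaborn needs. The helper name can_connect is hypothetical.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (e.g. WinError 10060), refusals, and DNS failures.
        return False
```

If can_connect("github.com") returns True while can_connect("raw.githubusercontent.com") returns False, the problem is likely the route to the raw-file host rather than seaborn itself.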
dzhu_man_dzhi On
  • Error 10060 means the connection attempt to the remote host timed out. There can be several causes, for example a proxy or firewall, or an ISP that cannot reach the host.
  • Try downloading the data with an alternate method:
    • directly with pandas
    • with requests, then load the result into pandas
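# Option 1: read the CSV directly with pandas
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv')

# Option 2: download with requests, then load the text into pandas
import io

import pandas as pd
import requests

t = requests.get('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv').text

df = pd.read_csv(io.StringIO(t))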
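Both options still contact raw.githubusercontent.com, so they can fail in the same way if that host is unreachable. Here is a small sketch, using only the standard library, of fetching with an explicit timeout so that a blocked connection fails quickly with a readable message. fetch_text is a hypothetical helper, and the 10-second default is an arbitrary assumption.

```python
import urllib.request


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download url and return its body as text, failing fast on network errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except OSError as err:
        # URLError and TimeoutError are both OSError subclasses.
        raise RuntimeError(f"Could not download {url}: {err}") from err
```

The returned text can be passed to pandas the same way as in the requests approach: pd.read_csv(io.StringIO(fetch_text(url))).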
VJ aka Vijay On

Check the path of the data cache using:

sns.get_data_home()

which returns something like 'C:\\Users\\xyz\\AppData\\Local\\seaborn\\seaborn\\Cache'.

Then download a copy of the dataset from "https://github.com/mwaskom/seaborn-data" into that cache directory. This should resolve the problem.

This method resolved a ConnectionResetError.
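The manual copy can be scripted once the CSV has been obtained some other way, for example on a different network. This is a sketch under the assumption that seaborn looks for a file named after the dataset (such as tips.csv) inside the directory returned by sns.get_data_home(). install_dataset is a hypothetical helper, not part of seaborn.

```python
import shutil
from pathlib import Path


def install_dataset(src_csv: str, data_home: str) -> Path:
    """Copy a downloaded CSV into the cache directory, keeping its file name."""
    cache_dir = Path(data_home)
    cache_dir.mkdir(parents=True, exist_ok=True)  # create the cache if missing
    dest = cache_dir / Path(src_csv).name  # e.g. tips.csv
    shutil.copyfile(src_csv, dest)
    return dest
```

For example, install_dataset("tips.csv", sns.get_data_home()) followed by sns.load_dataset("tips") should then read the cached copy instead of downloading it.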