pd.read_excel crashes (hangs) on certain files


First of all, I've never had pandas crash (freeze or loop infinitely) on me before. Second, it's not the files: they were being read fine before.

While doing a bit of research, I stumbled upon this issue, where the cause is traced back to pd._libs.cp36. Here is another similar one.

I looked in my own pd._libs and found various files like algos.cp38-win.... A couple of things came to mind. First, I had upgraded to Python 3.8 (the environment is called work38, by the way). But trying a different environment didn't work either.
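The cp38 in those file names is the CPython 3.8 tag that compiled extension modules carry, so each environment is expected to hold its own pandas build. A minimal sketch for checking which interpreter an environment runs and where it would import pandas from, without actually importing pandas; describe_environment is a hypothetical helper, not part of any library:

```python
import importlib.machinery
import importlib.util
import sys


def describe_environment():
    """Report the running interpreter and where pandas would be imported from.

    Hypothetical helper: run it once inside each conda environment
    (e.g. work38) and compare the results.
    """
    spec = importlib.util.find_spec("pandas")  # locates the package without importing it
    return {
        "python_version": sys.version.split()[0],
        "executable": sys.executable,
        # Suffixes this interpreter accepts for compiled extension modules,
        # e.g. a ".cp38-win_amd64.pyd" entry on 64-bit Windows with Python 3.8.
        "extension_suffixes": list(importlib.machinery.EXTENSION_SUFFIXES),
        "pandas_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If two environments report pandas locations in different folders, they do not share one pandas install, which makes it less likely that a single install broke pandas everywhere.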

The only other change is that I installed fbprophet. To install fbprophet I needed pystan, and to install pystan I had to run this command, as per their docs: conda install libpython m2w64-toolchain -c msys2

There are many guides to installing pystan that encourage you to install packages in a particular order (first pystan, then numpy, cython, pandas, etc.). I don't know if there's a reason for this.

In any case, my theory is that the command above f***ed up my whole Anaconda installation with some C compilers, and now pandas is broken in all environments, even after pip uninstall and pip install --no-cache-dir pandas.

Two questions: First, do you know what's happening here? Could you explain it to me? Second, any idea how I can repair this? Or must I uninstall Anaconda and reinstall everything (and then, of course, pip install -r requirements.txt)?

Edit: Maybe the C compiler stuff is unrelated. I just let read_excel run for a painfully long time, and it returned a DataFrame with 65,000 rows and 250 columns. When I convert the xlsx to csv (with a CLI script), the csv contains a lot of empty rows and columns.
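If those extra rows and columns come back as NaN or blank strings, they can at least be dropped after reading. A minimal sketch, assuming a "blank" cell shows up as NaN, an empty string, or a whitespace-only string; trim_empty is a hypothetical helper, not a pandas function:

```python
import numpy as np
import pandas as pd


def _is_blank_str(value):
    """True for strings that are empty or contain only whitespace."""
    return isinstance(value, str) and not value.strip()


def trim_empty(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows and columns in which every cell is NaN or a blank string."""
    # Cell-wise mask: missing values or blank strings count as empty.
    blank = df.isna() | df.apply(lambda col: col.map(_is_blank_str))
    keep_rows = ~blank.all(axis=1)
    keep_cols = ~blank.all(axis=0)
    return df.loc[keep_rows, keep_cols]


if __name__ == "__main__":
    # Hypothetical frame imitating a padded read: data in the top-left,
    # NaN and blank strings everywhere else.
    raw = pd.DataFrame(
        {
            "a": [1, 2, np.nan, np.nan],
            "b": ["x", "y", " ", np.nan],
            "Unnamed: 2": [np.nan, "", np.nan, np.nan],
        }
    )
    print(trim_empty(raw).shape)  # (2, 2)
```

This only cleans up the result; it does not make the read itself faster. If the real extent is known, pd.read_excel also accepts usecols (e.g. "A:T") and nrows (e.g. 250) to limit what it keeps; whether that also avoids scanning the padded cells depends on the engine and pandas version.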

TL;DR: I have an xlsx file with 250 rows and ~20 columns, but apparently the empty cells aren't empty?
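One plausible explanation, not confirmed here, is that cells outside the data were formatted (or otherwise touched), so the file stores them and its declared used range reaches far beyond the real data; a reader may then have to walk all of those cells. A minimal sketch for checking this with openpyxl, assuming the workbook opens in read-only mode, where max_row and max_column come from the dimension the file itself declares. reported_extent and make_demo_file are hypothetical helpers, and the demo file only imitates the situation:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font


def reported_extent(path):
    """Return the (max_row, max_column) that the first sheet declares."""
    # Read-only mode takes the size from the sheet's stored dimension
    # instead of loading every cell, so it stays fast on padded files.
    wb = load_workbook(path, read_only=True)
    try:
        ws = wb.worksheets[0]
        return ws.max_row, ws.max_column
    finally:
        wb.close()  # read-only workbooks keep the file open until closed


def make_demo_file(path):
    """Write a 3x2 block of data plus one formatted but empty cell at Z500."""
    wb = Workbook()
    ws = wb.active
    for r in range(1, 4):
        for c in range(1, 3):
            ws.cell(row=r, column=c, value=r * c)
    ws["Z500"].font = Font(bold=True)  # style only, no value
    wb.save(path)


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
    make_demo_file(demo)
    print(reported_extent(demo))  # (500, 26): the styled cell widens the range
```

On the real file, a result near (65000, 250) rather than roughly (250, 20) would support this explanation. Deleting the unused rows and columns in Excel and saving again may shrink the used range; it is worth trying on a copy first.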
