Adding pandas dependencies after kedro new


I began a new project with kedro new without adding the files from the iris example. The original requirements.txt looked like:

black==v19.10b0
flake8>=3.7.9, <4.0
ipython~=7.0
isort>=4.3.21, <5.0
jupyter~=1.0
jupyter_client>=5.1, < 7.0
jupyterlab==0.31.1
kedro==0.16.6
nbstripout==0.3.3
pytest-cov~=2.5
pytest-mock>=1.7.1, <2.0
pytest~=5.0
wheel==0.32.2

I then ran kedro install to install the packages, generating requirements.in and requirements.txt. I now want to install the necessary dependencies for working with pandas and csv files. I tried updating the requirements.in with the line: kedro[pandas]==0.16.6 and then executing kedro install --build-reqs. However, that line fails with the error:

Could not find a version that matches pyarrow<1.0.0,<2.0dev,>=0.12.0,>=1.0.0 (from kedro[pandas]==0.16.6->-r /lrlhps/data/busanalytics/Guilherme/Projects/kedro-environment/spaceflights/src/requirements.in (line 8))
Tried: 0.9.0, 0.10.0, 0.11.0, 0.11.1, 0.12.0, 0.12.1, 0.13.0, 0.14.0, 0.15.1, 0.16.0, 0.16.0, 0.16.0, 0.16.0, 0.17.0, 0.17.0, 0.17.0, 0.17.0, 0.17.1, 0.17.1, 0.17.1, 0.17.1, 1.0.0, 1.0.0, 1.0.0, 1.0.0, 1.0.1, 1.0.1, 1.0.1, 1.0.1, 2.0.0, 2.0.0, 2.0.0
There are incompatible versions in the resolved dependencies:
  pyarrow<2.0dev,>=1.0.0 (from google-cloud-bigquery[bqstorage,pandas]==2.2.0->pandas-gbq==0.14.0->kedro[pandas]==0.16.6->-r /Projects/kedro/spaceflights/src/requirements.in (line 8))
  pyarrow<1.0.0,>=0.12.0 (from kedro[pandas]==0.16.6->-r /Projects/kedro/spaceflights/src/requirements.in (line 8))

Question: Is it possible to update requirements.in and have the pandas dependencies installed with the --build-reqs option? Or must I install the dependency with pip?

Answer by André Mello:

You should be able to install pandas by specifying only the components you actually need, as shown in the documentation:

The dependencies above may be sufficient for some projects, but for the spaceflights project, you need to add a requirement for the pandas project because you are working with CSV and Excel files. You can add the necessary dependencies for these file types as follows:

kedro[pandas.CSVDataSet,pandas.ExcelDataSet]==0.17.0

https://kedro.readthedocs.io/en/stable/03_tutorial/02_tutorial_template.html#add-and-remove-project-specific-dependencies

For instance, after adding

kedro[pandas.CSVDataSet]==0.17.0

to your requirements.in and running kedro build-reqs, you should see

kedro[pandas.csvdataset]==0.17.0  # via -r /.../src/requirements.in
(...)
pandas==1.2.0                     # via kedro

in your requirements.txt file.
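
As for why the original kedro[pandas]==0.16.6 line failed: the error message lists two pyarrow constraints, both reached through the full pandas extra, and their ranges do not overlap. The narrower extra presumably avoids the conflict because it does not pull in pandas-gbq or pyarrow, though that is an inference from the error output rather than something checked against kedro's setup.py. The following is a minimal sketch in plain Python (not how pip-compile resolves dependencies internally) that checks the versions from the "Tried:" line against both ranges. The helper names are made up for illustration, and "<2.0dev" is approximated as "<2.0.0", which is equivalent for the release versions listed.

```python
# Version ranges copied from the pip-compile error in the question.
# Each range is (inclusive lower bound, exclusive upper bound).
KEDRO_PANDAS_RANGE = ((0, 12, 0), (1, 0, 0))  # pyarrow>=0.12.0,<1.0.0
BIGQUERY_RANGE = ((1, 0, 0), (2, 0, 0))       # pyarrow>=1.0.0,<2.0dev (approximated)

# Distinct versions listed under "Tried:" in the error message.
TRIED = [
    "0.9.0", "0.10.0", "0.11.0", "0.11.1", "0.12.0", "0.12.1", "0.13.0",
    "0.14.0", "0.15.1", "0.16.0", "0.17.0", "0.17.1", "1.0.0", "1.0.1", "2.0.0",
]


def parse(version):
    """Turn '1.0.1' into (1, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, bounds):
    """Return True if lower <= version < upper."""
    lower, upper = bounds
    return lower <= parse(version) < upper


def satisfying(versions, *ranges):
    """Return the versions that fall inside every given range."""
    return [v for v in versions if all(in_range(v, r) for r in ranges)]


only_kedro = satisfying(TRIED, KEDRO_PANDAS_RANGE)
only_bigquery = satisfying(TRIED, BIGQUERY_RANGE)
both = satisfying(TRIED, KEDRO_PANDAS_RANGE, BIGQUERY_RANGE)

print("kedro[pandas] range:", only_kedro)
print("google-cloud-bigquery range:", only_bigquery)
print("both ranges:", both)
```

The last list comes out empty: one range ends exactly where the other begins, so no pyarrow release can satisfy both, and the resolver has nothing to pick.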