I encountered a strange error while training my MLP model, and I have no idea what to change or how to fix it. I first ran it in a conda env with the following packages:
- cudatoolkit=11.3
- cudnn=7.6.5
- python=3.7.4
- python-dateutil=2.8.0
- pip=19.2.3
- pytorch=1.11.0
- torchvision==0.12.0
- torchaudio==0.11.0
- pillow==6.1
- dgl-cuda11.3
- numpy=1.19.2
- matplotlib=3.1.0
- tensorboard=1.14.0
- tensorboardx=1.8
- future=0.18.2
- absl-py
- networkx=2.3
- scikit-learn=0.21.2
- scipy=1.3.0
- notebook=6.0.0
- h5py=2.9.0
- mkl=2019.4
- ipykernel=5.1.2
- ipython=7.7.0
- ipython_genutils=0.2.0
- ipywidgets=7.5.1
- jupyter=1.0.0
- jupyter_client=5.3.1
- jupyter_console=6.0.0
- jupyter_core=4.5.0
- plotly=4.1.1
- scikit-image=0.15.0
- requests==2.22.0
- tqdm==4.43.0
I got the following error:
```
Traceback (most recent call last):
  File "main_COLLAB_edge_classification.py", line 578, in <module>
    main()
  File "main_COLLAB_edge_classification.py", line 573, in main
    train_val_pipeline(MODEL_NAME, dataset, params, net_params, dirs)
  File "main_COLLAB_edge_classification.py", line 308, in train_val_pipeline
    epoch_train_loss, optimizer, train_loader, val_loader, test_loader = train_epoch(model, optimizer, device, graph, train_edges, params['batch_size'], epoch, dataset, 4, monet_pseudo)
  File "E:\link-prediction-V2\benchmarking\train\train_COLLAB_drnl_edge_classification.py", line 63, in train_epoch_sparse
    for subgs, _ in train_loader:
  File "F:\Aga\Python38\lib\site-packages\dgl\dataloading\dataloader.py", line 512, in __next__
    self._next_non_threaded() if not self.use_thread else self._next_threaded()
  File "F:\Aga\Python38\lib\site-packages\dgl\dataloading\dataloader.py", line 507, in _next_threaded
    exception.reraise()
  File "F:\Aga\Python38\lib\site-packages\dgl\utils\exception.py", line 57, in reraise
    raise exception
dgl._ffi.base.DGLError: Caught DGLError in prefetcher.
Original Traceback (most recent call last):
  File "F:\Aga\Python38\lib\site-packages\dgl\dataloading\dataloader.py", line 380, in _prefetcher_entry
    batch, feats, stream_event = _prefetch(batch, dataloader, stream)
  File "F:\Aga\Python38\lib\site-packages\dgl\dataloading\dataloader.py", line 338, in _prefetch
    batch = recursive_apply(batch, _record_stream, current_stream)
  File "F:\Aga\Python38\lib\site-packages\dgl\utils\internal.py", line 1038, in recursive_apply
    return [recursive_apply(v, fn, *args, **kwargs) for v in data]
  File "F:\Aga\Python38\lib\site-packages\dgl\utils\internal.py", line 1038, in <listcomp>
    return [recursive_apply(v, fn, *args, **kwargs) for v in data]
  File "F:\Aga\Python38\lib\site-packages\dgl\utils\internal.py", line 1040, in recursive_apply
    return fn(data, *args, **kwargs)
  File "F:\Aga\Python38\lib\site-packages\dgl\dataloading\dataloader.py", line 307, in _record_stream
    x.record_stream(stream)
  File "F:\Aga\Python38\lib\site-packages\dgl\heterograph.py", line 5605, in record_stream
    self._graph.record_stream(stream)
  File "F:\Aga\Python38\lib\site-packages\dgl\heterograph_index.py", line 290, in record_stream
    return _CAPI_DGLHeteroRecordStream(self, to_dgl_stream_handle(stream))
  File "F:\Aga\Python38\lib\site-packages\dgl\_ffi\_ctypes\function.py", line 188, in __call__
    check_call(_LIB.DGLFuncCall(
  File "F:\Aga\Python38\lib\site-packages\dgl\_ffi\base.py", line 65, in check_call
    raise DGLError(py_str(_LIB.DGLGetLastError()))
dgl._ffi.base.DGLError: [12:02:11] C:\Users\Administrator\dgl-0.5\src\runtime\ndarray.cc:284: Check failed: td->IsAvailable(): RecordStream only works when TensorAdaptor is available.
```
I've tried changing the versions of CUDA and of the DGL packages (both downgrading and upgrading), without any luck. Next I decided to leave the conda env and run the script with my plain Python installation (which had worked for me before), and got the same error. Could you give me any hints on how to solve this problem?
The references to `F:\Aga\Python38\lib\site-packages\dgl\` imply that you have used `pip install --user`, which is not recommended for Conda users because it leads to confusing situations like this one. Packages installed at user level take precedence (see this thread for details) and are not necessarily compatible with the other packages in the environment. Consider removing all packages in `F:\Aga\Python38\lib\site-packages` and making sure that every package you need is installed in the Conda environment.
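One way to check this diagnosis is to ask the interpreter itself which Python is running and where it would import `torch` and `dgl` from. The script below is a minimal diagnostic sketch that uses only the standard library; the helper names `package_origin`, `is_under` and `report` are illustrative and not part of any library. If the reported location of `torch` or `dgl` lies outside `sys.prefix` while a conda environment is active (for example under `F:\Aga\Python38\lib\site-packages`), those packages are not coming from the conda environment.

```python
"""Report which interpreter is running and where key packages would be imported from.

Diagnostic sketch: the package names checked by default (torch, dgl) come from
the question above; pass a different tuple to report() for other packages.
"""
import importlib.util
import os
import site
import sys


def package_origin(name):
    """Return the file a top-level package would be loaded from, or None if it is not found.

    find_spec locates the package without importing it.
    """
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def is_under(path, root):
    """Return True if `path` lies inside the directory `root`."""
    path = os.path.normcase(os.path.abspath(path))
    root = os.path.normcase(os.path.abspath(root))
    try:
        return os.path.commonpath([path, root]) == root
    except ValueError:  # e.g. paths on different drives on Windows
        return False


def report(packages=("torch", "dgl")):
    """Print interpreter details and package locations; return {name: origin}."""
    print("interpreter:      ", sys.executable)
    print("sys.prefix:       ", sys.prefix)
    print("user site enabled:", site.ENABLE_USER_SITE)
    print("user site dir:    ", site.getusersitepackages())
    # CONDA_PREFIX is set by `conda activate`; it is None outside an activated env.
    print("CONDA_PREFIX:     ", os.environ.get("CONDA_PREFIX"))
    origins = {}
    for name in packages:
        origin = package_origin(name)
        origins[name] = origin
        inside = origin is not None and is_under(origin, sys.prefix)
        print(f"{name:>6}: {origin}  (inside sys.prefix: {inside})")
    return origins


if __name__ == "__main__":
    report()
```

Run it with the same command you use to start training (from the activated conda environment), so that it inspects the same interpreter that raised the error.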