I'm in a team using dvc with git to version-control data files. We are using dvc 1.3.1 with an S3 bucket remote. I'm getting this error when executing dvc fetch or dvc pull on a colleague's branch:
ERROR: failed to fetch data from the cloud - DVC-file 'C:\Users\blah\Documents\repo\data\processed_data.dvc' format error: extra keys not allowed @ data['outs'][0]['size']
When I check the dvc file for a cached file with which I have no problem I see this:
md5: ded591aacbe363f0518ceb9c3bc1836b
outs:
- md5: efdab20e8b59903b9523cc188ff727e5
  path: completion_header.p
  cache: true
  metric: false
  persist: false
but a problematic file only has this:
outs:
- md5: f4e15187d9a0bbb328e629eabd8d1784.dir
  size: 112007
  nfiles: 3
  path: processed_data
In all cases, files are added to dvc with the command dvc add %dirname%. This is the second time I've seen this on a colleague's branch (2 different people).
Since posting, I have realized that my colleague dvc'd a directory. I have attempted creating the directory first, then calling dvc fetch, but get the same error.
There is a high chance that one of the dvc files was created with a newer version of dvc and that you are trying to operate on it with an older one. Are all of your colleagues using the same dvc version when adding new files?
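
If it helps to find which dvc files on a branch were written by a different version, here is a rough Python sketch that lists keys under outs: that do not appear in the file that works for you. Everything in it is an assumption for illustration: the function name unexpected_out_keys is made up, the "known" key set is simply copied from the working file in the question (it is not dvc's real schema), and the line-based scan expects the usual indented layout of a .dvc file rather than doing full YAML parsing.

```python
import re

# Keys seen in the .dvc file that dvc 1.3.1 reads without complaint
# (taken from the question). Illustrative only, not dvc's actual schema.
KNOWN_OUT_KEYS = {"md5", "path", "cache", "metric", "persist"}

# Matches "key:", "  key:" or "- key:" at the start of a line.
KEY_RE = re.compile(r"^(\s*)(-\s+)?([A-Za-z_]\w*):")


def unexpected_out_keys(dvc_text, known=KNOWN_OUT_KEYS):
    """Return keys found under `outs:` that are not in `known`, in file order."""
    extras = []
    in_outs = False
    for line in dvc_text.splitlines():
        match = KEY_RE.match(line)
        if not match:
            continue
        indent, dash, key = match.groups()
        if not indent and not dash:
            # A top-level key: we are inside the outs list only if it is `outs`.
            in_outs = key == "outs"
            continue
        if in_outs and key not in known and key not in extras:
            extras.append(key)
    return extras


# The problematic file from the question.
problem_file = """\
outs:
- md5: f4e15187d9a0bbb328e629eabd8d1784.dir
  size: 112007
  nfiles: 3
  path: processed_data
"""

# The file that works, from the question.
working_file = """\
md5: ded591aacbe363f0518ceb9c3bc1836b
outs:
- md5: efdab20e8b59903b9523cc188ff727e5
  path: completion_header.p
  cache: true
  metric: false
  persist: false
"""

print(unexpected_out_keys(problem_file))
print(unexpected_out_keys(working_file))
```

Any file for which this returns a non-empty list was most likely written by a dvc version other than the one you have installed. Comparing the output of dvc version on each machine and settling on one version across the team should avoid the mismatch.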