DVC Files Incomplete

I'm on a team using dvc with git to version-control data files. We are using dvc 1.3.1, with an S3 bucket as the remote. I'm getting this error when executing dvc fetch or dvc pull on a colleague's branch:

ERROR: failed to fetch data from the cloud - DVC-file 'C:\Users\blah\Documents\repo\data\processed_data.dvc' format error: extra keys not allowed @ data['outs'][0]['size']

When I check the dvc file for a cached file with which I have no problem I see this:

md5: ded591aacbe363f0518ceb9c3bc1836b
outs:
- md5: efdab20e8b59903b9523cc188ff727e5
  path: completion_header.p
  cache: true
  metric: false
  persist: false

but a problematic file only has this:

outs:
- md5: f4e15187d9a0bbb328e629eabd8d1784.dir
  size: 112007
  nfiles: 3
  path: processed_data

In all cases, files are added to dvc with the command dvc add %dirname%. This is the second time I've seen this on a colleague's branch (2 different people).

Since posting, I have realized that my colleague dvc'd a directory. I have attempted creating the directory first, then calling dvc fetch, but get the same error.

Best answer, by Batuhan Taskaya:

In all cases, files are added to dvc with the command dvc add %filename%.

It seems highly likely that one of the dvc files was created with a newer version of dvc and that you are trying to operate on it with an older version. Do all of your colleagues use the same dvc version when adding new files?
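
One way to check this across a branch is to look for .dvc files that carry the keys your installed version rejects. The snippet below is only a rough sketch, not part of dvc: suspect_keys_in and scan_repo are hypothetical helper names, and the key list is an assumption taken from the problematic file in the question (the error message itself only names size). It does plain text matching instead of real YAML parsing so that it needs nothing beyond the standard library.

```python
from pathlib import Path

# Keys copied from the problematic file in the question. The error message
# only names 'size'; treating 'nfiles' as suspect too is an assumption.
SUSPECT_KEYS = ("size", "nfiles")


def suspect_keys_in(dvc_text, suspect=SUSPECT_KEYS):
    """Return the suspect keys that appear as keys in a .dvc file's text."""
    found = []
    for line in dvc_text.splitlines():
        # Drop indentation and the leading "- " of a list item.
        stripped = line.strip()
        if stripped.startswith("- "):
            stripped = stripped[2:].strip()
        if ":" not in stripped:
            continue
        key = stripped.split(":", 1)[0].strip()
        if key in suspect and key not in found:
            found.append(key)
    return found


def scan_repo(root="."):
    """Map each .dvc file under root to the suspect keys it contains."""
    report = {}
    for path in Path(root).rglob("*.dvc"):
        # The repository's own ".dvc" directory also matches the pattern.
        if not path.is_file():
            continue
        keys = suspect_keys_in(path.read_text(encoding="utf-8"))
        if keys:
            report[str(path)] = keys
    return report
```

Calling scan_repo() from the repository root returns a dict of file paths and the suspect keys found in each. Any file it lists was probably written by a different dvc version than the one that produced the shorter files, and comparing the output of dvc version on each machine should confirm that.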