This question is about using git with the nbstripout filter, which removes some fields from a Jupyter notebook (a JSON file) before it is stored in git. The filter is used to minimize merge conflicts when several developers work on the same notebook.
Here is the repo configuration to start with:
$ cat .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[remote "origin"]
url = git@github.com:stas00/fastai_v1.git
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
remote = origin
merge = refs/heads/master
[include]
path = ../.gitconfig
$ cat .gitconfig
[filter "nbstripout"]
clean = nbstripout
smudge = cat
required = true
[diff "ipynb"]
textconv = nbstripout -t
$ cat .gitattributes
*.ipynb filter=nbstripout
*.ipynb diff=ipynb
With this configuration, during git diff or git commit the notebook is run through nbstripout, which removes JSON fields that are local (like a cell's execution_count) and vary from developer to developer.
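For reference, a clean filter receives the working-tree file on stdin, and git stores whatever the command writes to stdout (smudge = cat passes content through unchanged on checkout). The sketch below imitates that contract in Python under simplifying assumptions: strip_notebook and clean_text are hypothetical names, only outputs and execution_count are handled, and this is not nbstripout's implementation, which may strip more fields and support more options.

```python
import json
import sys


def strip_notebook(nb: dict) -> dict:
    """Hypothetical, simplified clean step: drop per-run data from code cells.

    Illustration only; this is not how nbstripout is implemented.
    """
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # rendered results differ per run
            cell["execution_count"] = None  # the counter differs per session
    return nb


def clean_text(raw: str) -> str:
    """Map raw notebook text to the text that git would store."""
    nb = json.loads(raw)
    return json.dumps(strip_notebook(nb), indent=1, sort_keys=True,
                      ensure_ascii=False) + "\n"


def main() -> None:
    # As a clean command: read the working-tree file from stdin and write
    # the content git should store to stdout.
    sys.stdout.write(clean_text(sys.stdin.read()))
```

Because clean_text is idempotent, cleaning an already-cleaned notebook yields the same text, which is what lets a clean filter report "no change" for files whose only edits are in the stripped fields.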
Now consider a normal situation where the same notebook changes upstream and locally. Trying to sync the local repo with the upstream fails:
$ git pull origin master
From github.com:stas00/fastai_v1
* branch master -> FETCH_HEAD
Updating 1ea49ad..ae0ba93
error: Your local changes to the following files would be overwritten by merge:
dev_nb/004_callbacks.ipynb
Please commit your changes or stash them before you merge.
Aborting
$ git diff dev_nb/004_callbacks.ipynb | wc -l
60
$ git stash
Saved working directory and index state WIP on pull-merge: 1ea49ad Console progress bar
$ git diff dev_nb/004_callbacks.ipynb | wc -l
0
$ git pull origin master
From github.com:stas00/fastai_v1
* branch master -> FETCH_HEAD
Updating 1ea49ad..ae0ba93
error: Your local changes to the following files would be overwritten by merge:
dev_nb/004_callbacks.ipynb
Please commit your changes or stash them before you merge.
Aborting
This shouldn't happen, since git stash should have stashed away all local changes. I'm not sure exactly what happens, but I think git stash also runs files through the filter and stashes only the changes that show through the nbstripout filter. So perhaps git stash doesn't bring the files back to their pre-modified state? Yet after I disable the filter, git diff shows nothing (and it showed nothing before disabling it either).
In other words, why does git pull see a potential conflict and refuse to merge, even though git diff shows no local changes? (The changes do exist in reality; they are just the ones that get stripped out by the filter.)
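The suspicion in the paragraph above can be illustrated with a toy comparison: two versions of a notebook that differ only in execution_count and outputs are different byte for byte, yet identical once both pass through a strip step. The notebook dicts and the strip_volatile helper are hypothetical, and this models the idea only, not git's actual change-detection logic.

```python
import copy
import json


def strip_volatile(nb: dict) -> dict:
    """Hypothetical helper: return a copy of nb without per-run code-cell data."""
    nb = copy.deepcopy(nb)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def serialize(nb: dict) -> str:
    # Deterministic text form, so equal dicts give equal strings.
    return json.dumps(nb, indent=1, sort_keys=True)


# Hypothetical committed version: already stripped.
committed = {"cells": [{"cell_type": "code", "execution_count": None,
                        "metadata": {}, "outputs": [], "source": ["2 + 2"]}]}

# The same notebook after running the cell locally.
local = copy.deepcopy(committed)
local["cells"][0]["execution_count"] = 7
local["cells"][0]["outputs"] = [{"output_type": "execute_result",
                                 "execution_count": 7, "metadata": {},
                                 "data": {"text/plain": ["4"]}}]

raw_differs = serialize(local) != serialize(committed)
filtered_differs = (serialize(strip_volatile(local))
                    != serialize(strip_volatile(committed)))
print(raw_differs, filtered_differs)  # True False
```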
At the very least I'd expect git diff to show the changes once the nbstripout filter is disabled, but it doesn't.
To make git stash; git pull work, I have to disable the filter before running git stash:
$ nbstripout --uninstall
$ git stash
Saved working directory and index state WIP on pull-merge: 1ea49ad Console progress bar
$ git pull origin master
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 3 (delta 2), reused 3 (delta 2), pack-reused 0
Unpacking objects: 100% (3/3), done.
From github.com:stas00/fastai_v1
* branch master -> FETCH_HEAD
1ea49ad..ae0ba93 master -> origin/master
Updating 1ea49ad..ae0ba93
Fast-forward
dev_nb/004_callbacks.ipynb | 1268 ----------------------------------------------------------------------------------------------------------------------------------------
1 file changed, 1268 deletions(-)
After that, I have to remember to re-enable the filter:
$ nbstripout --install
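Until there is a better workflow, the manual sequence above can at least be wrapped so the filter is always re-enabled. This is a minimal sketch, assuming nbstripout and git are on PATH; stash_and_pull is a hypothetical helper name, and the run parameter exists only so the commands can be swapped for a stub when testing.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def _run_checked(cmd: Sequence[str]) -> None:
    # Raises subprocess.CalledProcessError if the command exits non-zero.
    subprocess.run(cmd, check=True)


def stash_and_pull(remote: str = "origin", branch: str = "master",
                   run: Optional[Callable[[Sequence[str]], None]] = None
                   ) -> List[List[str]]:
    """Hypothetical helper: disable the filter, stash, pull, always re-enable.

    Returns the commands that were run, in order.
    """
    run = run or _run_checked
    attempted: List[List[str]] = []

    def step(cmd: List[str]) -> None:
        attempted.append(cmd)
        run(cmd)

    step(["nbstripout", "--uninstall"])
    try:
        step(["git", "stash"])
        step(["git", "pull", remote, branch])
    finally:
        # Re-enable the filter even if stash or pull fails.
        step(["nbstripout", "--install"])
    return attempted
```

As in the transcript, the stash is not popped afterwards; a git stash pop step could be added if the local edits should be restored after the pull.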
Is there a better workflow that doesn't require disabling/enabling the filter for this to work?