How to tell git-svn that files are NOT binary

I am converting a large SVN repository (~28k commits) to Git using git-svn. When the process finished (after about 1 1/2 weeks), I found that some .ps1 files were being treated as binary in the diffs. After the conversion, I committed a .gitattributes file on master that tells Git which files to treat as text and which as binary:

* -text
*.snk binary
*.ico binary
*.chm binary
and so on...

Note: -text just tells Git to leave line endings as they are (no normalization to Unix line endings in its database).

However, I was only able to add the .gitattributes file AFTER the conversion, of course.

The .gitattributes file does not quite take effect in Git Extensions for previous commits/diffs, but that might be a separate problem that I don't want to discuss right now. The main problem is the files that are stored as binary in Git.

I read somewhere that you can store attributes in .git/info/attributes. I could do this before the conversion process, but I haven't given that another try, since the conversion takes more than a week and I would like to get it right on the first attempt.

So my question is this: given the already converted repository, can I convert existing binary files in an existing Git repository to text files?

If not: How would I tell git-svn which files are to be treated as text/binary using gitattributes for the whole conversion procedure?

EDIT: The problem had nothing to do with the conversion (Git was not deliberately treating the files as binary). Instead, the files were being treated as binary by 'git diff' and Git Extensions (see the answer). With a diff tool (for example Beyond Compare) you can still work with those files; it is only a little annoying in the history. The conversion worked flawlessly, since the files were migrated as is (as UTF-16, that is).

Answer by ulrichb (best answer)

The binary attribute "macro" is a shorthand for -diff -merge -text (see gitattributes docs).

In contrast to the text attribute, which influences the line-ending conversion of files between the repository and the working copy, the diff and merge attributes do not influence how Git stores files. These two attributes only influence how Git interprets file contents (e.g. how Git creates a diff/patch for a file).

If you have no explicit *.ps1 binary or *.ps1 -diff entry in your .gitattributes, the reason git diff interprets your .ps1 files as binary is probably their encoding. Git's built-in diff does not handle UTF-16/UCS-2, for example: such text typically contains NUL bytes, so Git classifies the file as binary.
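To see why the encoding matters, here is a rough Python sketch of the kind of check Git applies when it decides whether content is binary. It is an approximation, not Git's actual code: the function name is hypothetical, and the 8000-byte window is my understanding of the current implementation rather than a documented guarantee.

```python
# Approximation of Git's heuristic: content counts as binary if a NUL
# byte appears near the start. The window size is an assumption about
# Git's implementation, not a documented constant.
FIRST_FEW_BYTES = 8000


def looks_binary_to_git(data: bytes) -> bool:
    """Return True if the leading bytes contain a NUL byte."""
    return b"\0" in data[:FIRST_FEW_BYTES]


script = "Write-Host 'hello'\r\n"
utf8 = script.encode("utf-8")    # one byte per ASCII character
utf16 = script.encode("utf-16")  # BOM, then two bytes per character

print(looks_binary_to_git(utf8))   # False
print(looks_binary_to_git(utf16))  # True
```

Every ASCII character in UTF-16 carries a zero byte, so even a plain PowerShell script trips the check.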

If this is the case, one option is to create a custom "diff driver" that converts the files to UTF-8 when diffing (as proposed in this answer).
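A minimal sketch of such a converter, assuming Python is available wherever the diffs are viewed. Git runs a textconv program with a single argument (the file to convert) and reads the converted text from stdout. The script name `utf16_to_utf8.py` and the driver name `utf16` are placeholders of mine: you would point the `diff.utf16.textconv` config setting at the script and add a line `*.ps1 diff=utf16` to .gitattributes.

```python
import codecs
import sys


def to_utf8(data: bytes) -> bytes:
    """Re-encode data as UTF-8 if it starts with a UTF-16 byte order mark."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec uses the BOM to pick the byte order and drops it.
        return data.decode("utf-16").encode("utf-8")
    # Anything else (ASCII, UTF-8, ...) is passed through unchanged.
    return data


# Git calls a textconv program with exactly one argument: the file path.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as f:
        sys.stdout.buffer.write(to_utf8(f.read()))
```

This only changes what git diff displays; the blobs stored in the repository stay UTF-16. UTF-16 files without a BOM are not detected by this sketch.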

Alternatively, since Windows PowerShell can also cope with UTF-8 script files, you could convert all your .ps1 files to UTF-8. (If you want to convert files in existing/migrated commits, you could use git filter-branch.)
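For the working tree, the conversion could look roughly like the following Python sketch. The function name is hypothetical, and the choice to write a UTF-8 BOM is an assumption on my part: as far as I know, Windows PowerShell 5.1 reads BOM-less scripts using the system's ANSI code page, so the BOM keeps non-ASCII characters intact.

```python
import codecs
from pathlib import Path


def convert_ps1_to_utf8(root):
    """Rewrite UTF-16 .ps1 files under root as UTF-8 with BOM.

    Returns the list of paths that were changed.
    """
    changed = []
    for path in sorted(Path(root).rglob("*.ps1")):
        if ".git" in path.parts:
            continue  # never touch Git's own metadata
        data = path.read_bytes()
        if not data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
            continue  # no UTF-16 BOM: leave the file untouched
        text = data.decode("utf-16")
        # "utf-8-sig" prepends a UTF-8 BOM to the encoded text.
        path.write_bytes(text.encode("utf-8-sig"))
        changed.append(str(path))
    return changed
```

Calling `convert_ps1_to_utf8(".")` in a checkout and committing the result fixes the files from that commit onward. To rewrite migrated history as well, something like this would have to run once per commit (for example through the `--tree-filter` option of git filter-branch), which changes every commit hash and is slow on a repository of this size.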