I am converting a large SVN repository (~28k commits) to Git using git-svn. When the process had finished (after ~1 1/2 weeks), I found that some .ps1 files were being treated as binary in the diffs. I have committed a .gitattributes file on master (after the conversion, of course) that tells Git to treat the files as text:
* -text
*.snk binary
*.ico binary
*.chm binary
and so on...
Note: -text just tells Git to leave line endings as they are (i.e. not to normalize them to Unix line endings in its database).
However, I was only able to add the .gitattributes AFTER the conversion, of course.
The .gitattributes does not quite work in Git Extensions for previous commits/diffs, but that might be a separate problem that I don't want to discuss right now. The main problem is the files that are stored as binary in Git.
I read somewhere that you can also keep attributes inside the repository's .git directory (in .git/info/attributes). I could set this up before the conversion, but I haven't tried again yet, since the conversion takes more than a week and I would like to get it right on the first try.
So basically my question is this: given the already converted repository, can I convert existing binary files in an existing Git repository to text files?
If not: How would I tell git-svn which files are to be treated as text/binary using gitattributes for the whole conversion procedure?
EDIT: The problem was not in the conversion (Git treating files as binary on purpose) but in the files being treated as binary by 'git diff' or Git Extensions (see answer). When using a diff tool (for example BeyondCompare) you can still work with those files; it is then only a little annoying in the history. The conversion worked flawlessly, since the files were migrated as is (that is, as UTF-16).
The binary attribute "macro" is shorthand for -diff -merge -text (see gitattributes docs).
In contrast to the text attribute, which influences the line-ending conversion of files between the repository and the working copy, the diff and merge attributes do not influence how Git stores files. They only influence how Git interprets file contents (e.g. how Git creates a diff/patch for a file).
If you have no explicit *.ps1 binary or *.ps1 -diff in your .gitattributes, the reason why git diff interprets your .ps1 files as binary is probably their encoding. Note that Git doesn't support UTF-16/UCS-2, for example: such files contain NUL bytes, which makes Git's diff classify them as binary.
If this is the case, you can either create a custom "diff driver" which converts the files to UTF-8 for diffing (as proposed in this answer).
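As a rough sketch of what such a diff driver could look like: the driver name utf16 below is arbitrary (my choice, not anything Git predefines), and the textconv command assumes an iconv binary is available on the PATH of whatever shell Git uses. textconv only changes what git diff and git log -p display; the stored blobs stay UTF-16.

```
# .gitattributes (or .git/info/attributes)
# Route .ps1 files through a diff driver called "utf16" (hypothetical name)
*.ps1 diff=utf16

# .git/config (or ~/.gitconfig)
# Git runs this command with the file name appended and diffs its stdout
[diff "utf16"]
	textconv = "iconv -f utf-16 -t utf-8"
```

Note that the [diff "utf16"] section lives in the Git configuration, not in the repository contents, so every clone that wants readable diffs has to define it again. Whether a GUI front end honours textconv is something you would have to check for the tool in question.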
Or, since Windows PowerShell can also cope with UTF-8 script files, you could convert all your .ps1 files to UTF-8. (If you want to convert files in existing/migrated commits, you could use git-filter-branch.)
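For the re-encoding itself, a minimal Python sketch might look like the following. It assumes the affected files start with a UTF-16 byte order mark (BOM); files without one are left alone. The helper names utf16_to_utf8 and convert_tree are made up for this illustration, and line endings are passed through unchanged.

```python
import codecs
from pathlib import Path


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode data as UTF-8 if it starts with a UTF-16 BOM, else return it unchanged."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec uses the BOM to pick the byte order and drops it.
        return data.decode("utf-16").encode("utf-8")
    return data


def convert_tree(root: str, pattern: str = "*.ps1") -> list:
    """Rewrite every matching UTF-16 file under root in place; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        original = path.read_bytes()
        converted = utf16_to_utf8(original)
        if converted != original:
            # Bytes in, bytes out: CRLF/LF line endings are not touched.
            path.write_bytes(converted)
            changed.append(path)
    return changed


if __name__ == "__main__":
    for changed_path in convert_tree("."):
        print("converted", changed_path)
```

Running this in the working tree and committing the result only fixes the files from that commit onwards; older commits keep their UTF-16 blobs unless the history is rewritten. One caveat, as far as I know: Windows PowerShell 5.1 reads UTF-8 files without a BOM using the system ANSI code page, so scripts containing non-ASCII characters may need to be written with a UTF-8 BOM instead (encoding "utf-8-sig" in Python).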