I have many Mercurial repositories that I am trying to convert to Git.
I used fast-export
https://github.com/frej/fast-export
and everything went well, except that some of my Mercurial repositories contain files with Russian letters in their names.
These are huge repositories with about 20k commits and many branches.
On Ubuntu the docs directory looks like this:
docs/
|-- Android
|-- DataContracts
|-- \302\345\355\344\356\360\373
|-- \304\340\362\340\312\356\355\362\360\340\352\362\373
|-- \310\355\361\362\360\363\352\366\350\350
|-- \310\361\365\356\344\355\340\377\ \344\356\352\363\354\345\355\362\340\366\350\377
|-- \317\360\356\362\356\352\356\353\373
`-- \320\345\353\350\347\355\340\377\ \344\356\352\363\354\345\355\362\340\366\350\377
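The backslash sequences in the listing above are octal escapes for raw bytes. As a quick check (a sketch, assuming the names are stored as Windows-1251 bytes; the helper name `decode_octal_name` is made up for this illustration), they can be decoded in Python 3:

```python
import re

# One token is either a \ooo octal escape, a backslash-escaped character
# (such as "\ " for a space), or a plain character.
TOKEN = re.compile(r"\\([0-7]{3})|\\(.)|(.)", re.DOTALL)

def decode_octal_name(escaped, encoding="cp1251"):
    """Turn a name printed with \\ooo octal escapes back into text.

    Assumption: every unescaped character in the listing is plain ASCII.
    """
    raw = bytearray()
    for match in TOKEN.finditer(escaped):
        octal, escaped_char, plain_char = match.groups()
        if octal is not None:
            raw.append(int(octal, 8))        # \302 -> byte 0xC2
        elif escaped_char is not None:
            raw.extend(escaped_char.encode("ascii"))
        else:
            raw.extend(plain_char.encode("ascii"))
    return bytes(raw).decode(encoding)

print(decode_octal_name(r"\302\345\355\344\356\360\373"))
```

If the cp1251 assumption holds, the result matches the first Cyrillic directory name in the Windows listing.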
On Windows the names display correctly:
Get-ChildItem .\docs\
Каталог: C:\temp\mercurial\Ptk\docs
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 12.01.2021 10:11 Android
d----- 12.01.2021 10:11 DataContracts
d----- 12.01.2021 10:11 Вендоры
d----- 12.01.2021 10:11 ДатаКонтракты
d----- 12.01.2021 10:11 Инструкции
d----- 12.01.2021 10:11 Исходная документация
d----- 12.01.2021 10:11 Протоколы
d----- 12.01.2021 10:11 Релизная документация
-a---- 03.03.2021 12:30 0 Вендоры2
The docs folder contains a lot of documentation in Word, PDF, and other formats.
At first I tried to convert with this command:
~/mercurial/fast-export/hg-fast-export.sh -r ~/mercurial/Ptk -fe ISO-8859-1
but after the conversion the characters in the file names were broken.
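One plausible reason for the broken characters (a sketch, assuming the file names are stored as Windows-1251 bytes, which nothing above confirms directly): declaring the names as ISO-8859-1 maps every byte to a Latin-1 character instead of a Cyrillic one. The effect can be reproduced in Python 3:

```python
# Raw bytes of the directory name "Вендоры" as Windows-1251 would store them
# (assumption: this is the encoding actually used in the repository).
raw = "Вендоры".encode("cp1251")

# Decoding with the wrong codec never fails, because ISO-8859-1 assigns a
# character to all 256 byte values, but the result is accented Latin letters.
wrong = raw.decode("iso-8859-1")

# Decoding with the codec the bytes were written in restores the name.
right = raw.decode("cp1251")

print(wrong)
print(right)
```

Because the wrong decoding raises no error, the damage only becomes visible after the conversion has finished.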
Next I tried to rename all the files in my repo, following https://serverfault.com/questions/319070/mercurial-convert-filename-encoding, with this script (rename.py):
import sys
for path in sys.stdin:
    old = path[:-1] # strip newline
    new = old.decode("cp1251").encode("utf-8")
    print 'rename "%s" "%s"' % (old, new)
I ran it with
$ hg manifest --all | python rename.py > rename.txt
and the output is:
rename ".gitignore" ".gitignore"
rename ".hgignore" ".hgignore"
rename ".hglf/docs/����������/��������� ������������� ��� �� �������/files/android_root.exe" ".hglf/docs/Инструкции/Первичная инициализация ПТК из коробки/files/android_root.exe"
Traceback (most recent call last):
  File "rename.py", line 4, in <module>
    new = old.decode("cp1251").encode("utf-8")
  File "/usr/lib/python2.7/encodings/cp1251.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 12: character maps to <undefined>
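The traceback has a likely explanation: byte 0x98 is unassigned in Windows-1251, so a name containing it cannot be cp1251 text. It does occur in UTF-8 Cyrillic (for example, "И" is encoded as 0xD0 0x98), so some names in the manifest may already be UTF-8. A minimal Python 3 sketch under that assumption (the function name `to_utf8_name` is hypothetical) tries strict UTF-8 first and falls back to cp1251 only when that fails:

```python
def to_utf8_name(raw, fallback="cp1251"):
    """Return the text of a file name given as raw bytes.

    Assumption: each name is either valid UTF-8 already or encoded in the
    fallback codec; mixed encodings inside one name are not handled.
    """
    try:
        # Strict UTF-8 decoding rejects most legacy single-byte text,
        # so success is a strong hint that the name is already UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)

# The same name stored as cp1251 and as UTF-8 decodes to the same text.
print(to_utf8_name("Инструкции".encode("cp1251")))
print(to_utf8_name("Инструкции".encode("utf-8")))
```

For this to work the manifest has to be read as bytes, for example from `sys.stdin.buffer`, so that no implicit decoding happens before the function sees the name.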
I also tried decoding with a different codec, cp1252.
file -ib docs/*
Output:
inode/directory; charset=binary
inode/directory; charset=binary
inode/directory; charset=binary
inode/directory; charset=binary
inode/directory; charset=binary
inode/directory; charset=binary
inode/directory; charset=binary
inode/directory; charset=binary
Next I tried to convert with TortoiseHg https://tortoisehg.bitbucket.io/
hg bookmark -r default master
"C:\Program Files\TortoiseHg\hg.exe" push c:\temp\mercurial\converted-repo
After this conversion the characters were broken as well.
I don't want to delete any of the documentation from my repositories, and the documentation is not the only thing affected: I also have source files with Russian characters, don't ask why :)
Could you advise me on how to convert these repositories to Git?