We received a large patch that modifies about 17000 files. Its size is 5.2G. When we applied it with `git apply -3`, it had not finished after 12 hours.
We split the patch into smaller patches per file and applied them one by one, so that at least we could see the progress.
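A minimal sketch of such a per-file split, assuming the patch is in git format so that every file section starts with a `diff --git` line. The function name `split_patch` and the numbered output file names are made up for illustration. Anything before the first `diff --git` line (for example a commit message) is dropped.

```shell
# split_patch ALL_IN_ONE OUTDIR
# Writes one numbered patch per "diff --git" section into OUTDIR.
split_patch() {
    mkdir -p "$2"
    awk -v dir="$2" '
        /^diff --git / {
            if (out) close(out)          # avoid running out of file handles
            n++
            out = sprintf("%s/%05d.patch", dir, n)
        }
        out { print > out }              # lines before the first section are skipped
    ' "$1"
}
```

For example, `split_patch all_in_one_patch per_file` would create `per_file/00001.patch`, `per_file/00002.patch`, and so on. awk streams the input, so the 5.2G file does not have to fit in memory.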
Once again, it got stuck at one of the file patches, which is still as large as 111M. It modifies an HTML file.
We split this file patch into smaller patches per chunk and got about 57000 chunk patches. Each chunk patch takes around 2-3 seconds, so applying all of them would take roughly 32 to 48 hours, probably longer than applying the file patch itself. I'll try putting more chunks into each patch.
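A rough sketch of the per-chunk split, assuming a single-file text patch in unified format where each chunk starts with an `@@ ` line. The function name `split_hunks` and its optional third argument (chunks per output patch) are hypothetical. The file header is copied into every output patch so that each one can be applied on its own.

```shell
# split_hunks FILE_PATCH OUTDIR [CHUNKS_PER_PATCH]
split_hunks() {
    mkdir -p "$2"
    awk -v dir="$2" -v per="${3:-1}" '
        /^@@ / {
            seen = 1
            if (count % per == 0) {      # start a new output patch
                if (out) close(out)
                out = sprintf("%s/%06d.patch", dir, ++n)
                printf "%s", header > out
            }
            count++
        }
        !seen { header = header $0 "\n"; next }   # collect the file header
        { print > out }
    ' "$1"
}
```

With `split_hunks file_patch chunks 1000`, the 57000 chunks would end up in 57 patches. The line numbers in each chunk still refer to the original file, so the chunk patches need to be applied in order and the tool has to cope with shifted offsets.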
Is there any method to efficiently apply such large patches? Thanks.
Update:
As @ti7 suggested, I tried `patch` and it solved the problem.
In my case, we have 2 kinds of large patches.
One is adding/removing a large binary and the content of the binary is contained as text in the patch. One of the binaries is 188M and the patch size that removes it is 374M.
The other is modifying a large text and has millions of deletions and insertions. One of the text files is 70M before and 162M after. The patch size is 181M and has 2388623 insertions and 426959 deletions.
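Counts like these can be estimated straight from the patch text. The following is only a sketch, assuming a git-format text patch. The function name `count_changes` is made up, and binary sections are not handled specially.

```shell
# count_changes PATCH_FILE
# Counts added and removed lines inside the chunks of a git-format patch.
count_changes() {
    awk '
        /^@@ /           { in_hunk = 1; next }   # a chunk starts
        /^diff --git /   { in_hunk = 0 }         # next file header starts
        in_hunk && /^\+/ { ins++ }
        in_hunk && /^-/  { del++ }
        END { printf "%d insertions, %d deletions\n", ins, del }
    ' "$1"
}
```

The `in_hunk` flag keeps the `---` and `+++` file header lines from being counted as changes.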
After some tests, I think "large" here is about the number of insertions and deletions rather than the size of the patch in bytes.
For the binary patch,
- git apply -3, 7 seconds
- git apply, 6 seconds
- patch, 5 seconds
For the text patch,
- git apply -3, stuck, not finished after 10 minutes
- git apply, stuck, not finished after 10 minutes
- patch, 3 seconds
The binary patch has only 1 insertion and/or 1 deletion. Both `git apply` and `patch` finish in seconds, so all of them are acceptable.
The text patch has a huge number of insertions and deletions, and `patch` is clearly much better in this case. I read some posts on `patch` and learned that some versions of `patch` cannot handle adding, removing, or renaming a file. Luckily, the `patch` on my machine works well.
So we split the all-in-one patch into smaller patches per file. We try `timeout 10s git apply -3 file_patch` first. If it cannot finish in 10 seconds, we try `timeout 10s patch -p1 < file_patch`.
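The fallback can be sketched as a small shell function. This is an illustration under assumptions, not our exact script: the name `apply_one` and its optional time-limit argument are made up, and it relies on GNU `timeout` reporting a timeout as exit status 124. It falls back to `patch` only on a timeout, not when `git apply` fails for another reason.

```shell
# apply_one FILE_PATCH [LIMIT]
# Try git apply -3 first; if it hits the time limit, fall back to patch.
apply_one() {
    file_patch=$1
    limit=${2:-10s}   # hypothetical optional argument, defaults to 10s

    status=0
    timeout "$limit" git apply -3 "$file_patch" || status=$?
    if [ "$status" -eq 124 ]; then      # 124: GNU timeout killed the command
        echo "git apply timed out, trying patch: $file_patch" >&2
        status=0
        timeout "$limit" patch -p1 < "$file_patch" || status=$?
    fi
    return "$status"
}
```

A loop such as `for f in per_file/*.patch; do apply_one "$f" || break; done` would then apply the per-file patches in order and stop at the first one that neither tool can apply.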
In the end, it took about an hour and a half to apply all 17000 patches. That is much better than applying the all-in-one patch and being stuck for 12 hours with nothing done.
I also tried `patch -p1 < all_in_one_patch`. It took only 1m27s, so I think we can improve our patch flow further.
You may be able to use `patch` (Wikipedia) instead of `git apply` to speed up patching!
To my knowledge, `patch` directly spools out a new file line by line, splicing in the changes as it goes, while `git apply` does additional context checking (and, as @j6t notes in a comment, though I haven't confirmed it, will attempt to load and patch the entire file at once before writing it out).