I want to use a regex to extract the section for a specific file (e.g. package-lock.json) from a git diff. I'm taking this approach because I get the whole diff through the GitHub API (using Octokit.js), so as far as I'm aware I can't just run git diff on that one file. The diff for a file like package-lock.json is obviously very large, so there's a lot of content. What I've noticed is that when I try to pull that content out with a regular expression, it fails due to catastrophic backtracking.
Essentially, the diff is structured like this:
diff --git a/package-lock.json b/package-lock.json
lots of content
diff --git a/next-file b/next-file
So my idea was to grab everything between the two diff --git lines.
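For what it's worth, "everything between the two diff --git lines" can be taken without any regex at all, using plain indexOf and slice, so there is nothing to backtrack. A minimal sketch, assuming git's default header form with unquoted paths (extractFileDiff is just an illustrative name). It relies on the fact that inside a file's section, hunk lines start with a space, +, - or \, so a line beginning with "diff --git " can only be the next file's header.

```javascript
// Sketch: extract one file's section from a full unified diff string.
// Assumes headers of the form "diff --git a/<path> b/<path>" with unquoted paths.
// Returns the section (header included) or null if the file isn't in the diff.
function extractFileDiff(diff, path) {
  // Match the whole header line, newline included.
  const header = `diff --git a/${path} b/${path}\n`;

  let start;
  if (diff.startsWith(header)) {
    start = 0; // the target file is the first one in the diff
  } else {
    const idx = diff.indexOf("\n" + header);
    if (idx === -1) return null; // file not present in this diff
    start = idx + 1; // skip the newline that precedes the header
  }

  // Search from the header's own newline for the next file header.
  const next = diff.indexOf("\ndiff --git ", start + header.length - 1);
  // Last file in the diff: take everything to the end of the string.
  return next === -1 ? diff.slice(start) : diff.slice(start, next + 1);
}
```

Usage would be something like extractFileDiff(diffText, "package-lock.json"), where diffText is the diff string you got back from the API.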
I figured I could just use this: /(?<=diff --git )(.+?)(?=diff)/gs
This works when the next "diff" is not far ahead, but once the match has to run a long way through the file, it stops working due to catastrophic backtracking.
I understand why this is happening, but I don't see how to get around it. Perhaps I should split the diff up some other way and only use regex for the more specific details?
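Splitting first and using regex only for the details works well. Separately, note that the lookahead (?=diff) in the regex above stops at the first "diff" anywhere in the text, which could be inside a changed line rather than at the next file header. A sketch of the split-first idea, under the assumption that paths contain no spaces (git quotes unusual paths, which this doesn't handle): walk the diff line by line, start a new chunk at each header line, and run a small anchored regex only against one line at a time. splitDiffByFile and HEADER_RE are illustrative names, not part of any library.

```javascript
// Anchored regex applied to a single line only, so it cannot run away.
// Assumes "diff --git a/<path> b/<path>" headers with no spaces in <path>.
const HEADER_RE = /^diff --git a\/(\S+) b\/(\S+)$/;

// Split a full unified diff into a Map of path -> that file's section text.
function splitDiffByFile(diff) {
  const files = new Map(); // path on the "b/" side -> array of lines
  let current = null;      // lines of the file section being collected

  for (const line of diff.split("\n")) {
    const m = HEADER_RE.exec(line);
    if (m) {
      current = [];          // a header line starts a new file section
      files.set(m[2], current);
    }
    if (current) current.push(line); // lines before the first header are skipped
  }

  // Join each file's lines back into a single string.
  const result = new Map();
  for (const [path, lines] of files) result.set(path, lines.join("\n"));
  return result;
}
```

With this, splitDiffByFile(diffText).get("package-lock.json") gives that file's whole section, and any further regex (on hunk headers, say) only ever runs on short strings.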
Any help would be appreciated.
You're working with lines of data, and as you've found out, a single regex spanning many lines doesn't handle that well. Use a tool like awk that can select ranges of lines. Given this file foo.txt:
use awk to specify the range of lines you want to print:
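If you'd rather stay in JavaScript, as in the question, the same line-range idea (a flag switched on by a start pattern and off by an end pattern) can be sketched as below. This is an analogue, not the awk command itself, and it differs from awk's /start/,/end/ range pattern in two deliberate ways: the end line is not included, and the end pattern is only tested on lines after the start line. The second point matters for diffs, because the start header itself also matches /^diff --git /. linesBetween is an illustrative name.

```javascript
// awk-style line range: start emitting at the first line matching startRe,
// stop just before the next line matching endRe. Returns only the first range.
// Pass regexes WITHOUT the g flag: test() on a global regex keeps lastIndex
// state between calls and would skip matches.
function linesBetween(text, startRe, endRe) {
  const out = [];
  let inRange = false;
  for (const line of text.split("\n")) {
    if (!inRange) {
      if (startRe.test(line)) {
        inRange = true;  // range begins; keep the start line
        out.push(line);
      }
    } else if (endRe.test(line)) {
      break;             // next section begins; stop before it
    } else {
      out.push(line);
    }
  }
  return out;
}
```

For example, linesBetween(diffText, /^diff --git a\/package-lock\.json /, /^diff --git /) returns the package-lock.json section as an array of lines.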