I'm given a string containing the contents of an HTML document, and I need to modify some of the URLs in it. The URLs that need modification have the form:
<script src="https://foo.com/some/variable/path/to/file.js" ...
And must be modified to:
<script src="https://foo.com/some/variable/path/to/NEW/file.js" ...
My current approach uses the GlobalReplace function from Google's RE2 library with the regexp:
"(?i)(<script\\s+(?:[^>]+\\s+)?src=[\"']https://foo\\.com/"
"(?:.*?/)*?)(.*?\\.js[\"'][^>]*>)"
This almost works, but I then realized that the HTML I'm given might already have some of the URLs modified and others not. The already-modified URLs should be left alone.
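To make the problem concrete, here is a minimal sketch of the double insertion. It makes a few assumptions: std::regex stands in for RE2 so that it compiles without extra dependencies, the pattern is a simplified variant of the one above rather than the exact expression, and the names insert_new_segment and demonstrate_double_insert and the sample paths are purely illustrative.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Naive single-pass rewrite: inserts "NEW/" before the file name of every
// matching script URL, whether or not the segment is already present.
// Assumption: std::regex is used here in place of RE2, and the pattern is a
// simplified stand-in for the real one.
std::string insert_new_segment(const std::string& html) {
    static const std::regex pattern(
        R"re((<script\s[^>]*?src=["']https://foo\.com/(?:[^"'/>]+/)*)([^"'/>]+\.js["']))re",
        std::regex::icase);
    // $1 = everything up to and including the last '/', $2 = file name + quote.
    return std::regex_replace(html, pattern, "$1NEW/$2");
}

// Hypothetical check showing the unwanted behaviour on sample input.
void demonstrate_double_insert() {
    // An unmodified URL is rewritten as intended.
    assert(insert_new_segment(
               "<script src=\"https://foo.com/a/b/file.js\"></script>") ==
           "<script src=\"https://foo.com/a/b/NEW/file.js\"></script>");
    // An already-modified URL gets a second "NEW/", which is the problem.
    assert(insert_new_segment(
               "<script src=\"https://foo.com/a/b/NEW/file.js\"></script>") ==
           "<script src=\"https://foo.com/a/b/NEW/NEW/file.js\"></script>");
}
```

With these two sample tags, the first URL comes out as intended, while the second ends up with .../NEW/NEW/file.js.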
Question: What's the easiest way to go about modifying the URLs without modifying the ones that have already been modified upstream?
A single-pass approach is essential.