Modify (potentially) many URLs within an HTML document in C++


I'm given a string that contains the contents of an HTML document, and I need to modify some of the URLs in it. The URLs that need modification appear in tags that begin like this:

<script src="https://foo.com/some/variable/path/to/file.js" ...

And must be modified to:

<script src="https://foo.com/some/variable/path/to/NEW/file.js" ...

My current approach has been to use the GlobalReplace function from Google's RE2 library with the regexp:

"(?i)(<script\\s+(?:[^>]+\\s+)?src=[\"']https://foo\\.com/" "(?:.*?/)*?)(.*?\\.js[\"'][^>]*>)"

This almost works, but I then realized that the HTML I'm given may already contain a mix of URLs: some that were modified upstream and some that were not. The already-modified ones must be left alone.
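The failure can be reproduced with a rough `std::regex` stand-in for the RE2 `GlobalReplace` call (`std::regex` is used only so the sketch compiles without RE2, and `(?i)` becomes `std::regex::icase`). The pattern and the `$1NEW/$2` rewrite are assumptions, not the exact code from the question; in particular the file-name part is narrowed to `[^"'/]*` so that group 1 ends at the last slash. `naive_rewrite` is a hypothetical name:

```cpp
#include <regex>
#include <string>

// Rough stand-in for the RE2 GlobalReplace pass (assumed pattern and
// rewrite; not the exact expression from the question).
std::string naive_rewrite(const std::string& html) {
    static const std::regex script_src(
        R"re((<script\s+(?:[^>]+\s+)?src=["']https://foo\.com/(?:[^"'/]*/)*)([^"'/]*\.js["'][^>]*>))re",
        std::regex::icase);
    // Group 1: tag prefix up to and including the last '/' of the path.
    // Group 2: file name, closing quote and the rest of the tag.
    // "NEW/" is inserted unconditionally, even when it is already there.
    return std::regex_replace(html, script_src, "$1NEW/$2");
}
```

Given a tag whose URL already ends in `.../NEW/file.js`, this produces `.../NEW/NEW/file.js`, because nothing in the pattern distinguishes the two cases.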

Question: What's the easiest way to modify these URLs while leaving alone the ones that were already modified upstream?

A single-pass approach is essential.
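One possible single-pass idea, offered as a sketch rather than a tested RE2 recipe: RE2 does not support lookahead or lookbehind, so instead of skipping already-modified URLs, the pattern can make an optional `NEW/` directly before the file name part of the match and always write it back. A URL that already ends in `.../NEW/file.js` then comes out unchanged, and one without it gains the segment, in one replace-all pass. The sketch again uses `std::regex` (ECMAScript grammar) so it compiles without RE2; since the pattern uses no lookarounds, porting it to `RE2::GlobalReplace` with a `\\1NEW/\\2` rewrite string looks plausible, but that port is an assumption. `normalize_script_urls` is a hypothetical name:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: one regex pass that adds "NEW/" before the file
// name of each matching <script src=...> URL. An existing "NEW/" right
// before the file name is consumed by the optional group and written
// back, so URLs modified upstream come out unchanged.
std::string normalize_script_urls(const std::string& html) {
    static const std::regex script_src(
        R"re((<script\s+(?:[^>]+\s+)?src=["']https://foo\.com/(?:[^"'/]*/)*?)(?:NEW/)?([^"'/]*\.js["'][^>]*>))re",
        std::regex::icase);
    // Group 1: tag prefix plus directories. The lazy *? tries the fewest
    //          directories first, so a "NEW/" directly before the file
    //          name is left for the optional (?:NEW/)? group.
    // Group 2: file name, closing quote and the rest of the tag.
    return std::regex_replace(html, script_src, "$1NEW/$2");
}
```

Running it on its own output is a no-op, which is what makes a mixed document safe to process in one pass. One caveat of keeping the case-insensitive flag: a directory literally named `new` directly before the file name is also treated as the marker and rewritten as `NEW`.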
