Replace matches of one regular expression with matches from another, across two files


I am currently helping a friend reorganise several hundred images on a database-driven website. I have generated a list of the new, reorganised image paths offline and would like to replace each matching image reference in the SQL export of the database with the new path.

EDIT: Here is an example of what I am trying to achieve

The new_paths_list.txt is a file that I generated using a batch script after I had organised all of the existing images into folders. Prior to this all of the images were in just a few folders. A sample of this generated list might be:

image/data/product_photos/telephones/snom/snom_xyz.jpg
image/data/product_photos/telephones/gigaset/giga_xyz.jpg

A sample of my_exported_db.sql (the database exported from the website) might be:

...

,(110,32,'data/phones/snom_xyz.jpg',3),(213,50,'data/telephones/giga_xyz.jpg',0),

...

The result I want is my_exported_db.sql to be:

...

,(110,32,'data/product_photos/telephones/snom/snom_xyz.jpg',3),(213,50,'data/product_photos/telephones/gigaset/giga_xyz.jpg',0),

...

Some pseudo code to illustrate:

1/ Find the first image name in my_exported_db.sql, such as 'snom_xyz.jpg'.

2/ Find the same image name in new_paths_list.txt

3/ If it is present, copy the whole line (the path and filename)

4/ Replace the whole path of this image in my_exported_db.sql with the copied line

5/ Repeat for all other image names in my_exported_db.sql

A regular expression that appears to match image names is:

([^)''"/])+\.(?:jpg|jpeg|gif|png)

and one to match image names, complete with path (for relative or absolute) is:

\bdata[^)''"\s]+\.(?:jpg|jpeg|gif|png)
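A quick way to see what these two patterns pick up is to run them over the sample row with grep -oE. This is only a sketch: POSIX extended regular expressions have no (?:...) group and do not understand \s inside brackets, so the versions below use a plain group and [:space:] and drop the \b. The -o option is available in GNU and BSD grep but is not required by POSIX, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Sample row copied from the my_exported_db.sql excerpt above.
sample=",(110,32,'data/phones/snom_xyz.jpg',3),(213,50,'data/telephones/giga_xyz.jpg',0),"

# ERE adaptations of the two patterns (assumption: plain groups instead of
# (?:...), [:space:] instead of \s, and no leading \b).
name_re="[^)'\"/]+\.(jpg|jpeg|gif|png)"
path_re="data[^)'\"[:space:]]+\.(jpg|jpeg|gif|png)"

# -o prints each match on its own line.
names=$(printf '%s\n' "$sample" | grep -oE "$name_re")
paths=$(printf '%s\n' "$sample" | grep -oE "$path_re")

printf 'names:\n%s\n' "$names"
printf 'paths:\n%s\n' "$paths"
```

On the sample row the first pattern yields the two bare file names and the second yields the two old paths starting at data/.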

I have looked around and have seen that sed or awk may be capable of doing this, but some pointers would be greatly appreciated. I understand that this will only work accurately if there are no duplicated filenames.

Beta On BEST ANSWER

You can use sed to convert new_paths_list.txt into a set of sed replacement commands:

sed 's|\(.*\(/[^/]*$\)\)|s#data\2#\1#|' new_paths_list.txt > rules.sed

The file rules.sed will look like this:

s#data/snom_xyz.jpg#image/data/product_photos/telephones/snom/snom_xyz.jpg#
s#data/giga_xyz.jpg#image/data/product_photos/telephones/gigaset/giga_xyz.jpg#

Then use sed again to translate my_exported_db.sql:

sed -i -f rules.sed my_exported_db.sql
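Since -i overwrites the export, it may be worth keeping a backup while you try this out. A sketch, assuming the attached-suffix form -i.bak, which both GNU and BSD/macOS sed accept (BSD sed expects a suffix after -i, so a bare -i followed by -f is read differently there). The temporary directory and the one-row export, with the image sitting directly under data/, are made up so the snippet runs on its own.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Hypothetical one-row export with an image directly under data/.
printf '%s\n' ",(110,32,'data/snom_xyz.jpg',3)," > "$workdir/my_exported_db.sql"

# One rule in the same form as the rules.sed shown above.
printf '%s\n' \
  's#data/snom_xyz.jpg#image/data/product_photos/telephones/snom/snom_xyz.jpg#' \
  > "$workdir/rules.sed"

# Edit in place; the untouched original is kept as my_exported_db.sql.bak.
sed -i.bak -f "$workdir/rules.sed" "$workdir/my_exported_db.sql"

# Show what changed. diff exits non-zero when the files differ, hence "|| true".
diff "$workdir/my_exported_db.sql.bak" "$workdir/my_exported_db.sql" || true

result=$(cat "$workdir/my_exported_db.sql")
```

If the diff looks wrong, the .bak file can simply be copied back over the export.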

I think in some shells it's possible to combine these steps and do without rules.sed:

sed 's|\(.*\(/[^/]*$\)\)|s#data\2#\1#|' new_paths_list.txt | sed -i -f - my_exported_db.sql

but I'm not certain about that.
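As far as I know, whether the script can be read from standard input depends on the sed implementation rather than the shell. One variation that may be worth trying is to name /dev/stdin as the script file. A sketch, assuming a system that provides /dev/stdin (Linux does) and using -i.bak so a backup is kept; the export row with the image directly under data/ is a made-up example.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# The two sample lines from new_paths_list.txt.
printf '%s\n' \
  'image/data/product_photos/telephones/snom/snom_xyz.jpg' \
  'image/data/product_photos/telephones/gigaset/giga_xyz.jpg' \
  > "$workdir/new_paths_list.txt"

# Hypothetical export row with the image directly under data/.
printf '%s\n' ",(110,32,'data/snom_xyz.jpg',3)," > "$workdir/my_exported_db.sql"

# Generate the rules and hand them to the second sed through /dev/stdin,
# so no intermediate rules.sed file is written.
sed 's|\(.*\(/[^/]*$\)\)|s#data\2#\1#|' "$workdir/new_paths_list.txt" |
  sed -i.bak -f /dev/stdin "$workdir/my_exported_db.sql"

piped_result=$(cat "$workdir/my_exported_db.sql")
printf '%s\n' "$piped_result"
```

Keeping rules.sed as a separate file has the advantage that you can inspect the generated rules before applying them.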

EDIT:

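If the images are spread across several directories under data/ (as in your sample), generate the rules like this instead. It matches whatever path precedes the file name inside the quotes, and it drops the leading image/ so the replacement has the form stored in the database: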

sed "s|image/\(.*\(/[^/]*$\)\)|s#[^']*\2#\1#|" new_paths_list.txt > rules.sed
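To check this against the sample data from the question, here is an end-to-end sketch. It writes the two sample files into a temporary directory (the directory and the output file name updated.sql are made up), builds rules.sed with the command above, and writes the translated export to a new file instead of editing in place.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Sample new_paths_list.txt from the question.
printf '%s\n' \
  'image/data/product_photos/telephones/snom/snom_xyz.jpg' \
  'image/data/product_photos/telephones/gigaset/giga_xyz.jpg' \
  > "$workdir/new_paths_list.txt"

# Sample row of my_exported_db.sql from the question.
printf '%s\n' \
  ",(110,32,'data/phones/snom_xyz.jpg',3),(213,50,'data/telephones/giga_xyz.jpg',0)," \
  > "$workdir/my_exported_db.sql"

# Build one substitution rule per new path; [^']* stands for the old directory part.
sed "s|image/\(.*\(/[^/]*$\)\)|s#[^']*\2#\1#|" "$workdir/new_paths_list.txt" \
  > "$workdir/rules.sed"
cat "$workdir/rules.sed"

# Apply the rules without touching the original export.
sed -f "$workdir/rules.sed" "$workdir/my_exported_db.sql" > "$workdir/updated.sql"
updated=$(cat "$workdir/updated.sql")
printf '%s\n' "$updated"
```

Two caveats: each generated rule replaces only the first match on a line, so if the same image can appear more than once per line the rule would need a trailing g flag; and the dots in the file names are left unescaped, so they match any character, which is normally harmless here.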