Replace incorrect URLs in a text file and fix them in Python


I'm getting URLs with the forward slashes removed, and I need to correct them inside a text file.

The URLs in the file look like this:

https:www.ebay.co.ukitmReds-Challenge-184-214-Holo-Shiny-Rare-Pokemon-Card-SM-Unbroken-Bonds-Rare124315281970?hash=item1cf1c4aa32%3Ag%3AXBAAAOSwJGRfSGI1&LH_BIN=1

I need to correct it to:

https://www.ebay.co.uk/itm/Reds-Challenge-184-214-Holo-Shiny-Rare-Pokemon-Card-SM-Unbroken-Bonds-Rare/124315281970?hash=item1cf1c4aa32%3Ag%3AXBAAAOSwJGRfSGI1&LH_BIN=1

So basically I need a regex, or some other way, to insert the missing forward slashes into each URL in the file and replace the broken URLs there.

There is 1 answer

Unstoppable (BEST ANSWER)
import re

# a 12-digit run is taken to be the eBay item ID that lost its leading slash
ITEM_ID = re.compile(r"\d{12}")

# input file with the broken URLs, output file to write the result to;
# the with-block closes (and flushes) both files when it ends
with open("ebay2.csv", "rt") as fin, open("out.txt", "wt") as fout:
    # for each line in the input file
    for line in fin:
        # drop the 'www.ebay.co.uk/sch/' search prefix, then restore the
        # slashes after the scheme and around the "itm" path segment
        line = (line.replace('https://www.ebay.co.uk/sch/', 'https://')
                    .replace('https:www.ebay', 'https://www.ebay')
                    .replace('ebay.co.ukitm', 'ebay.co.uk/itm/'))
        # put a slash in front of the item ID
        line = ITEM_ID.sub(r"/\g<0>", line, count=1)
        # write the repaired line to the output file and echo it
        fout.write(line)
        print(line, end="")

I eventually came up with this. It's a bit clumsy but it does the job fast enough.
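
For comparison, the same repair can probably be done in one pass with a single regex. This is only a sketch: it assumes every broken URL has the shape shown in the question (host `www.ebay.co.uk`, then `itm`, then the title, then a 12-digit item ID followed by `?`, whitespace, or the end of the text). The names `BROKEN_URL` and `fix_urls` are made up for this example.

```python
import re

# Assumed shape of a broken URL: "https:" + host + "itm" + title + 12-digit ID.
# The lookahead requires the ID to be followed by "?", whitespace, or the end
# of the text, so digits inside the title are not mistaken for the ID.
BROKEN_URL = re.compile(
    r"https:(?P<host>www\.ebay\.co\.uk)itm"
    r"(?P<title>[^\s?]*?)"
    r"(?P<item_id>\d{12})(?=[?\s]|$)"
)


def fix_urls(text: str) -> str:
    """Return text with every URL of the assumed broken shape repaired."""
    return BROKEN_URL.sub(r"https://\g<host>/itm/\g<title>/\g<item_id>", text)
```

Because the pattern only matches `https:` followed directly by the host, URLs that already contain their slashes are left alone, so running it twice over the same file should be harmless. To process a file, something like `fixed = fix_urls(open("ebay2.csv").read())` followed by writing `fixed` to `out.txt` would do.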