Cut and resubmit URL in Python


I'm new to Python and trying to figure this out, so sorry if this has been asked before. I couldn't find it and don't know what this might be called.

The short of it: I want to take a link like:

http://www.somedomainhere.com/embed-somekeyhere-650x370.html

and turn it into this:

http://www.somedomainhere.com/somekeyhere

The long of it: I have been working on an addon for XBMC that goes to a website, grabs a URL, then goes to that URL to find another URL. Basically, a URL resolver.

The program searches the site and comes up with embed-somekeyhere-650x370.html, but that page is in Java and is unusable to me. When I go to .com/somekeyhere instead, the code on that page is usable. So I need to grab the first URL, change it to the usable page, and then scrape that page.

So far the code I have is:

if 'somename' in name:
    try:
        n = re.compile('<iframe title="somename" type="text/html" frameborder="0" scrolling="no" width=".+?" height=".+?" src="(.+?)">" frameborder="0"', re.DOTALL).findall(net().http_GET(url).content)[0]
        # TODO: convert the URL in n to the .com/somekeyhere form so the 'na' request below can read it
        na = re.compile("'file=(.+?)&.+?'", re.DOTALL).findall(net().http_GET(na).content)[0]

Any suggestions on how I can convert the URL?


1 answer

Answer by Vasif (best answer)

I really didn't get the long of your question. However, answering the short:

Assumption: somekey is alphanumeric.

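import re

a = 'http://www.domain.com/embed-somekey-650x370.html'
# Dots are escaped so they match literally; the named group 'key' captures somekey.
p = re.match(r'^http://www\.domain\.com/embed-(?P<key>[0-9A-Za-z]+)-650x370\.html$', a)
somekey = p.group('key')
requiredString = "http://www.domain.com/" + somekey  #comment1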

This answer is deliberately specific: it only matches that one domain name and that one size. You should modify the regex as required. Your code in the question already uses regex, so I assume you can frame one that matches your requirement better.
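As a rough illustration of what a looser regex might look like, here is a minimal sketch. It assumes every embed link has the shape scheme://host/embed-KEY-WIDTHxHEIGHT.html with an alphanumeric key; the names EMBED_RE and embed_to_page are made up for this example and are not part of any library.

```python
import re

# Hypothetical pattern: any host, alphanumeric key, any WIDTHxHEIGHT suffix.
EMBED_RE = re.compile(
    r'^(?P<base>https?://[^/]+/)embed-(?P<key>[0-9A-Za-z]+)-\d+x\d+\.html$'
)

def embed_to_page(url):
    """Return the plain page URL for an embed URL, or None if it does not match."""
    m = EMBED_RE.match(url)
    if m is None:
        return None
    return m.group('base') + m.group('key')

print(embed_to_page('http://www.somedomainhere.com/embed-somekeyhere-650x370.html'))
# prints http://www.somedomainhere.com/somekeyhere
```

Returning None for a non-matching link lets the caller decide whether to skip it or fall back to the original URL.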

EDIT 1: Also see the urlparse module: https://docs.python.org/2/library/urlparse.html?highlight=urlparse#module-urlparse

It provides an easy way to parse your URL.
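A possible way to combine URL parsing with the regex, sketched under the same assumption about the path shape (/embed-KEY-WIDTHxHEIGHT.html). The function name rewrite_embed_url is hypothetical. The import fallback is there because the module is called urlparse in Python 2 and urllib.parse in Python 3.

```python
import re

try:
    from urllib.parse import urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit      # Python 2

def rewrite_embed_url(url):
    """Swap an /embed-KEY-WxH.html path for /KEY, keeping scheme and host."""
    parts = urlsplit(url)
    # Only the path is matched, so the domain never appears in the regex.
    m = re.match(r'^/embed-([0-9A-Za-z]+)-\d+x\d+\.html$', parts.path)
    if m is None:
        return url  # not an embed link: leave it unchanged
    return urlunsplit((parts.scheme, parts.netloc, '/' + m.group(1), '', ''))

print(rewrite_embed_url('http://www.somedomainhere.com/embed-somekeyhere-650x370.html'))
# prints http://www.somedomainhere.com/somekeyhere
```

Because the scheme and host come from the parsed URL, the same function should work for any site that uses this path layout.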

Also, on the line marked "#comment1", you can save the domain name to a variable and reuse it there instead of repeating the literal string.
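A small sketch of that idea, using the same example URL as above. The variable names domain and pattern are just illustrative, and re.escape is used so the dots in the domain are matched literally.

```python
import re

domain = 'http://www.domain.com/'   # stored once, reused below
a = domain + 'embed-somekey-650x370.html'

# Build the pattern from the stored domain instead of repeating it.
pattern = '^' + re.escape(domain) + r'embed-(?P<key>[0-9A-Za-z]+)-650x370\.html$'
p = re.match(pattern, a)
requiredString = domain + p.group('key')

print(requiredString)  # prints http://www.domain.com/somekey
```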