Twitter API: How to parse URLs out of the text of the tweet using the given list of APIs


So I'm working with Python and the Twitter API, using Tweepy and Twitter's Streaming API, which returns Tweet objects in real time. Part of my app, which queries a different API, doesn't play nice with URLs in the tweet text, so I'm using the Python re module to replace them with a harmless identifier string. However, I'm having trouble finding the URLs that need to be parsed out of the text. Instead of searching through the text myself for URLs, I decided to use the ones that the API delivers and do a "find and replace" in the text.

Here is the documentation on what the API gives me. It gives a t.co URL, a display URL, and a fully expanded URL. The problem with just using the t.co URL is that Twitter doesn't automatically convert all URLs in tweets to t.co, only ones past a certain length. This means that the t.co URL isn't always the same one that appears in the tweet text.

So I need to figure out how to get, from the API, the version of the URL which actually appears in the text of the tweet.

Thanks! evamvid


There is 1 answer

Leb (BEST ANSWER)

Try using this for the expanded_url:

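# `tweet` stands for whatever object holds the URL data you extracted from the JSON;
# replace it with the loop/function you use to pull that out.
tweet_url = str(tweet.expanded_url)  # str() may be unnecessary; test it yourself

# Strip any backslashes, e.g. JSON-escaped slashes such as http:\/\/example.com
tweet_url = tweet_url.replace('\\', '')

print(tweet_url)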

That should give you the link without the backslashes, the way you want it.
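
If the goal is to swap every URL in the tweet text for a placeholder, here is a rough sketch of one way to do it. It assumes each URL entity is a dict with the "url", "expanded_url" and "display_url" keys the question mentions, plus an "indices" pair giving the start and end offsets of the URL in the text (with Tweepy this list is usually available as status.entities["urls"]). The function name replace_urls and the "<URL>" placeholder are made up for illustration.

```python
def replace_urls(text, url_entities, placeholder="<URL>"):
    """Replace each URL described by `url_entities` with `placeholder`.

    Assumes every entity is a dict that may carry "indices", "url",
    "expanded_url" and "display_url" keys, as in Twitter's URL entities.
    """
    # Work from the end of the text so that earlier offsets stay valid
    # after each replacement changes the length of the string.
    ordered = sorted(
        url_entities,
        key=lambda e: (e.get("indices") or [0])[0],
        reverse=True,
    )
    for entity in ordered:
        indices = entity.get("indices")
        if indices:
            # Preferred path: cut out exactly the span the API points at,
            # whichever form of the URL happens to be in the text.
            start, end = indices
            text = text[:start] + placeholder + text[end:]
        else:
            # Fallback: try each known form of the URL. The expanded form
            # is tried before the display form because the display form
            # is often a substring of it.
            for key in ("url", "expanded_url", "display_url"):
                candidate = entity.get(key)
                if candidate:
                    text = text.replace(candidate, placeholder)
    return text
```

Using the offsets sidesteps the question of which version of the URL actually appears in the text. One caveat: if the text has been altered after it came from the API (for example by unescaping HTML entities), the offsets may no longer line up, and the string-matching fallback is the safer bet.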