How to cycle through indices


So in this script I am writing to learn Python, I would like to use a wildcard instead of rewriting this whole block just to change the `d.entries[3]` index. What would be the most efficient way to consolidate this into a loop, so that it uses all of d.entries[0-99].content and repeats until finished? Should I use if, while, or for? Also, my try/except does not behave as expected. What gives?

import feedparser, base64
from urlextract import URLExtract 

d = feedparser.parse('https://www.reddit.com/r/PkgLinks.rss')

print (d.entries[3].title)
sr = str(d.entries[3].content)
spl1 = sr.split("<p>")
ss = str(spl1)
spl2 = ss.split("</p>")
try:
    st = str(spl2[0])
#    print(st)
except: 
    binascii.Error
    st = str(spl2[1])
    print(st)
#st = str(spl2[0])
spl3 =st.split("', '")
stringnow=str(spl3[1])
b64s1 = stringnow.encode('ascii')
b64s2 = base64.b64decode(b64s1)
stringnew = b64s2.decode('ascii')

print(stringnew)
## but line 15 (binascii.Error) does nothing; how do I fix that, and also loop through all d.entries[?].content?

There are 2 answers

Answer by TomasO

The loop part is simply done as follows:

import feedparser, base64
from urlextract import URLExtract

d = feedparser.parse('https://www.reddit.com/r/PkgLinks.rss')

# loop over indices 0 to 99 (at most)
# range(100) goes from 0 up to, but not including, 100;
# min() stops early if the feed returned fewer than 100 entries
for i in range(min(100, len(d.entries))):
    print(d.entries[i].title)
    sr = str(d.entries[i].content)

    # ... the rest of your code here ...

The data returned by d.entries[i].content is a list of dictionaries, but you are converting it to a string, so you may want to check that this is really what you want. Also, .split() produces a list of the split items, but you convert that list back to a string (a few times). You may want to revisit that part of the code.
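To make both points concrete, here is a minimal sketch assuming each entry has the shape the feedparser documentation describes: `content` is a list of dictionaries, and the HTML sits under the `'value'` key of the first one. The `entries` list is made-up stand-in data (not real output from the PkgLinks feed) so the snippet runs offline; with a live feed you would loop over `d.entries` instead. Looping over the list itself, rather than `range(100)`, also avoids an `IndexError` when the feed returns fewer than 100 entries.

```python
# Hypothetical stand-in for d.entries; real feedparser entries also
# support entry["title"] and entry["content"] key access.
entries = [
    {"title": "first post",
     "content": [{"type": "text/html", "value": "<p>aGVsbG8=</p> <p>ignored</p>"}]},
    {"title": "second post",
     "content": [{"type": "text/html", "value": "<p>d29ybGQ=</p>"}]},
]

paragraphs = []
# enumerate() yields (index, entry) pairs, so no hard-coded range(100) is needed
for i, entry in enumerate(entries):
    print(i, entry["title"])
    # content is a list of dicts; take the HTML string from the first one
    html = entry["content"][0]["value"]
    # split() already returns a list of strings, so index it directly
    # instead of converting the list back to a str
    after_p = html.split("<p>")[1]          # text after the first <p>
    first_para = after_p.split("</p>")[0]   # text up to the matching </p>
    paragraphs.append(first_para)

print(paragraphs)
```

Note that `split("<p>")[1]` raises `IndexError` for an entry with no `<p>` tag, so real feed data may need a check before indexing.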

Answer by TomasO

I haven't used regex much, but I decided to play with it and got this to work. I retrieved the contents of the 'value' key from the dictionary, then used regex to get the base64 info. I only tried it for the first 5 items (i.e., I changed range(100) to range(5)). Hope it helps; if not, I enjoyed doing this. Oh, I left in all of the print statements I used as I was working down the code.

import feedparser, base64
from urlextract import URLExtract
import re

d = feedparser.parse('https://www.reddit.com/r/PkgLinks.rss')

# only loop over the entries the feed actually returned (at most 100)
for i in range(min(100, len(d.entries))):
    print(d.entries[i].title)
    # .content is a list of dictionaries.
    # print("---------")
    # print(type(d.entries[i].content))
    print(d.entries[i].content)
    print("---------")

    # gets the contents of key 'value' in the dictionary that is the 1st item in the list.
    string_value = d.entries[i].content[0]['value']
    print(string_value)
    print("---------")

    # grabs the text between the first <p> and the following </p> using re.search;
    # the non-greedy .*? stops at the first </p>, and '.' does not match newlines
    pattern = "<p>(.*?)</p>"
    match = re.search(pattern, string_value)
    if match is None:
        # no <p>...</p> pair on one line in this entry; skip it
        continue
    substring = match.group(1)
    print(substring)
    print("---------")
    print("---------")
    print("---------")

    # rest of your code here (e.g. base64-decode substring)
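Neither answer covers the try/except question. In the question, `str(spl2[0])` essentially never raises, so the `except` block never runs; and `binascii.Error` written on its own line inside a bare `except:` is just an expression statement, not a filter (since `binascii` is never imported, reaching it would raise `NameError`). The exception type belongs on the `except` line itself, and the `try` should wrap the call that can actually fail, `base64.b64decode`. A minimal sketch, assuming the base64 payload is one of the entry's `<p>` paragraphs and that falling back to the next paragraph is the intent; the helper `decode_entry` and the sample HTML are hypothetical.

```python
import base64
import binascii
import re

# Hypothetical sample of an entry's HTML: the first paragraph is plain text,
# the second holds the base64 payload.
SAMPLE_HTML = "<p>not base64 at all</p> <p>aHR0cHM6Ly9leGFtcGxlLmNvbQ==</p>"


def decode_entry(html):
    """Return the first <p> paragraph that decodes as base64 ASCII text, else None."""
    # re.DOTALL lets a paragraph span several lines
    for candidate in re.findall(r"<p>(.*?)</p>", html, re.DOTALL):
        try:
            raw = candidate.strip().encode("ascii")
            # validate=True rejects characters outside the base64 alphabet
            # instead of silently discarding them
            decoded = base64.b64decode(raw, validate=True)
            return decoded.decode("ascii")
        except (binascii.Error, UnicodeError):
            # the exception types go on the except line itself;
            # fall through and try the next paragraph
            continue
    return None


print(decode_entry(SAMPLE_HTML))
```

Catching only the expected exception types, rather than using a bare `except:`, means unrelated bugs (such as a `NameError` from a missing import) still surface instead of being silently swallowed. A short plain-text paragraph can occasionally decode by accident, so the result is worth a sanity check.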