How can I make a smooth transition from Part 1 to Part 2 and save the results in Part 3? So far, I have not been able to parse a scraped URL link unless I inserted it into Part 2 myself. Besides, I could not save the output results, because the last URL link overwrote all the other ones.
import urllib
import mechanize
from bs4 import BeautifulSoup
import os, os.path
import urlparse
import re
import csv
Part 1:
path = '/Users/.../Desktop/parsing/1.html'
f = open(path, "r")
if f.mode == 'r':
    contents = f.read()
soup = BeautifulSoup(contents)
search = soup.findAll('div', attrs={'class': 'mf_oH mf_nobr mf_pRel'})
searchtext = str(search)
soup1 = BeautifulSoup(searchtext)
for tag in soup1.findAll('a', href=True):
    raw_url = tag['href'][:-7]
    url = urlparse.urlparse(raw_url)
    p = "http" + str(url.path)
Part 2:
for i in url:
    url = "A SCRAPED URL LINK FROM ABOVE"
    homepage = urllib.urlopen(url)
    soup = BeautifulSoup(homepage)
    for tag in soup.findAll('a', attrs={'name': 'g_my.main.right.gifts.link-send'}):
        searchtext = str(tag['href'])
        original = searchtext
        removed = original.replace("gifts?send=", "")
        print removed
Part 3:
i = 0
for i in removed:
    f = open("1.csv", "a+")
    f.write(removed)
    i += 1
    f.close
Update 1. After the advice, I still get this:

Traceback (most recent call last):
  File "page.py", line 31, in <module>
    homepage = urllib.urlopen(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 180, in open
    fullurl = unwrap(toBytes(fullurl))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1057, in unwrap
    url = url.strip()
AttributeError: 'ParseResult' object has no attribute 'strip'
In part 1, you keep overwriting url with a new URL. You should be using a list and appending the URLs to it:
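Something along these lines (a sketch reusing soup1 and the href slicing from your part 1):

urls = []
for tag in soup1.findAll('a', href=True):
    raw_url = tag['href'][:-7]
    parsed = urlparse.urlparse(raw_url)
    # append the finished URL string, not the ParseResult object
    urls.append("http" + str(parsed.path))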
Then, in part 2, you can iterate over urls directly. Again, removed shouldn't be overwritten on each iteration. Also, there's no need for the variable original - your searchtext won't be changed by a replace operation, since replace returns a new string and leaves the original alone:
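For example, a sketch that collects every result in a list:

removed = []
for url in urls:
    homepage = urllib.urlopen(url)
    soup = BeautifulSoup(homepage)
    for tag in soup.findAll('a', attrs={'name': 'g_my.main.right.gifts.link-send'}):
        searchtext = str(tag['href'])
        removed.append(searchtext.replace("gifts?send=", ""))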
Then, in part 3, you don't have to open and close the file for each line you're outputting. In fact, you weren't even closing it properly, because you never called the close() method. The proper way is to use the with statement anyway:
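A sketch, assuming removed is the list built in part 2:

with open("1.csv", "a+") as f:
    for line in removed:
        f.write(line + "\n")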
Although I don't see how this is a CSV file (only one item per line?)...