I'm a beginner at scraping, and I have scraped some data. I have two problems: all of the data ends up in a single row, and every time I refresh the page, the data is saved to the database again.
```python
import requests
from django.shortcuts import render, redirect
from bs4 import BeautifulSoup
from .models import Content

toi_r = requests.get("some web site")
toi_soup = BeautifulSoup(toi_r.content, 'html5lib')
toi_headings = toi_soup.findAll("h2", {"class": "entry-title format-icon"})[0:9]
toi_category = toi_soup.findAll("a", {"class": ""})[0:9]

toi_news = []
toi_cat = []
for th in toi_headings:
    toi_news.append(th.text)
for tr in toi_category:
    toi_cat.append(tr.text)

# saving the data in the database
n = Content()
n.title = toi_news
n.category = toi_cat
n.save()
```
You're indeed only creating one Django object.

You can use `zip()` to pair up each title and category, and then create one object per pair. (I also took the liberty of shortening the `for` loops into simple list comprehensions.)

As for "every time when I refresh the page, each time the data is saved into the database": yes, well, the view code is run for every request. You could, e.g., check whether a `Content` with the same title already exists before creating one, to avoid that.
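A minimal, Django-free sketch of the pairing step, assuming the headings and categories have already been scraped into two lists of strings. The `pair_rows` name and the sample values are hypothetical, and the `.strip()` call is an addition not in the question's code. Each `(title, category)` tuple that `zip()` produces corresponds to one row, i.e. one `Content` object, instead of one object holding both whole lists.

```python
def pair_rows(headings, categories):
    """Pair the i-th heading with the i-th category.

    zip() stops at the shorter list, so an unmatched trailing
    heading or category is silently dropped.
    """
    # List comprehensions replace the two append loops; strip() (an
    # assumption) removes the newlines scraped .text often carries.
    titles = [h.strip() for h in headings]
    cats = [c.strip() for c in categories]
    return list(zip(titles, cats))


if __name__ == "__main__":
    # Hypothetical sample data standing in for th.text / tr.text values.
    headings = ["\nFirst headline\n", "Second headline", "Third headline"]
    categories = ["World", "Sports"]
    for title, category in pair_rows(headings, categories):
        print(title, "->", category)
    # prints:
    # First headline -> World
    # Second headline -> Sports
```

In the view, the loop body would then create one object per pair, e.g. `Content.objects.create(title=title, category=category)`, rather than assigning both lists to a single `Content()`.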
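For the refresh problem, here is a minimal sketch of the existence check the answer suggests. `save_new_rows`, `exists` and `create` are hypothetical names; the two callables stand in for database access so the logic can run on its own, with a plain dict playing the role of the `Content` table.

```python
from typing import Callable, Iterable, List, Tuple


def save_new_rows(
    rows: Iterable[Tuple[str, str]],
    exists: Callable[[str], bool],
    create: Callable[[str, str], None],
) -> List[str]:
    """Create a row for each (title, category) pair whose title is not stored yet.

    Returns the titles that were actually created.
    """
    created = []
    for title, category in rows:
        if exists(title):
            continue  # already saved on an earlier request: skip it
        create(title, category)
        created.append(title)
    return created


if __name__ == "__main__":
    # In-memory stand-in for the Content table (hypothetical).
    table = {}

    def exists(title):
        return title in table

    def create(title, category):
        table[title] = category

    rows = [("First headline", "World"), ("Second headline", "Sports")]
    print(save_new_rows(rows, exists, create))  # first request
    print(save_new_rows(rows, exists, create))  # simulated page refresh: []
```

With Django's ORM the callables could be `lambda t: Content.objects.filter(title=t).exists()` and `lambda t, c: Content.objects.create(title=t, category=c)`; `Content.objects.get_or_create(title=t, defaults={"category": c})` combines both steps in one call. A check-then-create sequence can still race between concurrent requests, so a `unique=True` constraint on `title` is the more robust guard at the database level.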