All the data is populated into a single row


I am a beginner at scraping. I have scraped some data, and I have two problems: all of the data ends up in a single row, and every time I refresh the page, the data is saved to the database again.

import requests
from django.shortcuts import render, redirect
from bs4 import BeautifulSoup
from .models import Content

toi_r = requests.get("some web site")
toi_soup = BeautifulSoup(toi_r.content, 'html5lib')
toi_headings = toi_soup.findAll("h2", {"class": "entry-title format-icon"})[0:9]
toi_category = toi_soup.findAll("a", {"class": ""})[0:9]
toi_news = []
toi_cat =[]

for th in toi_headings:
    toi_news.append(th.text)

for tr in toi_category:
    toi_cat.append(tr.text)

# Saving the data to the database
n = Content()
n.title = toi_news
n.category = toi_cat
n.save()

There is 1 answer

AKX On

You're indeed only creating one Django object.

You can use zip() to pair up each title and category, and then create objects. (I also took the liberty of shortening the for loops into simple list comprehensions.)

toi_news = [th.text for th in toi_headings]
toi_cat = [tr.text for tr in toi_category]

for title, category in zip(toi_news, toi_cat):
    n = Content.objects.create(title=title, category=category)
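
One thing to keep in mind with zip(): it stops at the end of the shorter input. If the page yields fewer categories than headings (or the other way round), the unmatched items are dropped silently. A small illustration with made-up lists standing in for the scraped data:

```python
# Hypothetical sample data; in the view these would be toi_news and toi_cat.
titles = ["First headline", "Second headline", "Third headline"]
categories = ["World", "Sports"]

# zip() pairs items by position and stops at the shorter list,
# so "Third headline" has no partner and is left out.
pairs = list(zip(titles, categories))
print(pairs)  # [('First headline', 'World'), ('Second headline', 'Sports')]
```

On Python 3.10 and later, zip(titles, categories, strict=True) raises a ValueError in that situation instead, which may be preferable if a mismatch means the selectors are wrong.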

As for "every time when I refresh the page, each time the data is saved into the database" - yes, well, the view code is run for every request. You could e.g. check whether a Content with the same title exists before creating one to avoid that.
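
A minimal sketch of that check, assuming the title alone is enough to identify an article. It uses Django's get_or_create(), which looks a row up by the given fields and only creates it (filling in the defaults) when no match exists. The helper name save_new_items and its manager parameter are made up for illustration; in the view you would pass Content.objects.

```python
def save_new_items(manager, titles, categories):
    """Create one row per (title, category) pair, skipping titles already stored.

    `manager` is assumed to be a Django model manager such as Content.objects.
    Returns the number of rows that were newly created.
    """
    created_count = 0
    for title, category in zip(titles, categories):
        # Look up by title; `defaults` is only used when a new row is created.
        _obj, created = manager.get_or_create(
            title=title,
            defaults={"category": category},
        )
        if created:
            created_count += 1
    return created_count
```

In the view this would be called as save_new_items(Content.objects, toi_news, toi_cat). Note that get_or_create() raises MultipleObjectsReturned if the table already contains several rows with the same title, so duplicates left over from earlier refreshes would need to be cleaned up first.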