I am trying to parse HTML text from a number of webpages for sentiment analysis. With help from the community I have been able to iterate over many URLs and produce a sentiment score for each one using the textblob library, and I have used the print function successfully to output a score for each url. However, I have not been able to collect the outputs of my function into a list, so that I can store the numbers and use them later to calculate averages and display my results in a graph.
Code with the print function:
import requests
import json
import urllib
from bs4 import BeautifulSoup
from textblob import TextBlob
#you can add to this
urls = ["http://www.thestar.com/business/economy/2015/05/19/canadian-consumer-confidence-dips-but-continues-to-climb-in-us-report.html",
"http://globalnews.ca/news/2012054/canada-ripe-for-an-invasion-of-u-s-dollar-stores-experts-say/",
"http://www.cp24.com/news/tsx-flat-in-advance-of-fed-minutes-loonie-oil-prices-stabilize-1.2381931",
"http://www.marketpulse.com/20150522/us-and-canadian-gdp-to-close-out-week-in-fx/",
"http://www.theglobeandmail.com/report-on-business/canada-pension-plan-fund-sees-best-ever-annual-return/article24546796/",
"http://www.marketpulse.com/20150522/canadas-april-inflation-slowest-in-two-years/"]
def parse_websites(list_of_urls):
    for url in list_of_urls:
        html = urllib.urlopen(url).read()
        soup = BeautifulSoup(html)

        # kill all script and style elements
        for script in soup(["script", "style"]):
            script.extract()    # rip it out

        # get text
        text = soup.get_text()

        # break into lines and remove leading and trailing space on each
        lines = (line.strip() for line in text.splitlines())
        # break multi-headlines into a line each
        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
        # drop blank lines
        text = '\n'.join(chunk for chunk in chunks if chunk)

        #print(text)
        wiki = TextBlob(text)
        r = wiki.sentiment.polarity
        print r

parse_websites(urls)
output:
>>>
0.10863027172
0.156074203574
0.0766585497835
0.0315555555556
0.0752548359411
0.0902824858757
>>>
But when I try to use a return statement to collect the values into a list, I get no result. Code:
import requests
import json
import urllib
from bs4 import BeautifulSoup
from textblob import TextBlob
#you can add to this
urls = ["http://www.thestar.com/business/economy/2015/05/19/canadian-consumer-confidence-dips-but-continues-to-climb-in-us-report.html",
"http://globalnews.ca/news/2012054/canada-ripe-for-an-invasion-of-u-s-dollar-stores-experts-say/",
"http://www.cp24.com/news/tsx-flat-in-advance-of-fed-minutes-loonie-oil-prices-stabilize-1.2381931",
"http://www.marketpulse.com/20150522/us-and-canadian-gdp-to-close-out-week-in-fx/",
"http://www.theglobeandmail.com/report-on-business/canada-pension-plan-fund-sees-best-ever-annual-return/article24546796/",
"http://www.marketpulse.com/20150522/canadas-april-inflation-slowest-in-two-years/"]
def parse_websites(list_of_urls):
    for url in list_of_urls:
        html = urllib.urlopen(url).read()
        soup = BeautifulSoup(html)

        # kill all script and style elements
        for script in soup(["script", "style"]):
            script.extract()    # rip it out

        # get text
        text = soup.get_text()

        # break into lines and remove leading and trailing space on each
        lines = (line.strip() for line in text.splitlines())
        # break multi-headlines into a line each
        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
        # drop blank lines
        text = '\n'.join(chunk for chunk in chunks if chunk)

        #print(text)
        wiki = TextBlob(text)
        r = wiki.sentiment.polarity
        r = []
        return [r]

parse_websites(urls)
output:
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
>>>
How can I get the numbers into a list like [r1, r2, r3, ...] so I can work with them (add them, subtract them, and so on)?
Thank you in advance.
From your code below, you are asking Python to return an empty list:
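    r = wiki.sentiment.polarity
    r = []          # this rebinds r to an empty list, discarding the score you just computed
    return [r]      # and returning inside the for loop exits after the first url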
If I understood your issue correctly, all you have to do is create the list before the loop, append each score to it, and return it after the loop finishes. A minimal sketch, using the same imports as your question (the `scores` name is just illustrative):
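def parse_websites(list_of_urls):
    scores = []                          # one polarity score per url
    for url in list_of_urls:
        html = urllib.urlopen(url).read()
        soup = BeautifulSoup(html)
        for script in soup(["script", "style"]):
            script.extract()
        text = soup.get_text()
        lines = (line.strip() for line in text.splitlines())
        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
        text = '\n'.join(chunk for chunk in chunks if chunk)
        scores.append(TextBlob(text).sentiment.polarity)   # keep the score instead of overwriting it
    return scores                        # return once, after every url has been processed

scores = parse_websites(urls)
print scores                             # [r1, r2, r3, ...]
print sum(scores) / len(scores)          # the average you mentioned
The key changes: the list is created once before the loop, each score is appended instead of overwritten, and return sits outside the loop so all of the urls get processed.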
Alternatively, you could create a dictionary with the url as the key. A dictionary may make more sense for you, as it is easier to retrieve, edit, and delete a score when the url it came from is its key. A sketch along the same lines (again, `scores` is just an illustrative name):
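def parse_websites(list_of_urls):
    scores = {}                          # maps each url to its polarity score
    for url in list_of_urls:
        html = urllib.urlopen(url).read()
        soup = BeautifulSoup(html)
        for script in soup(["script", "style"]):
            script.extract()
        text = soup.get_text()
        lines = (line.strip() for line in text.splitlines())
        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
        text = '\n'.join(chunk for chunk in chunks if chunk)
        scores[url] = TextBlob(text).sentiment.polarity
    return scores

scores = parse_websites(urls)
print scores[urls[0]]                    # look up a single score by its url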
I am kind of new to Python, so hopefully others will correct me if this doesn't make sense...