Extracting numbers from a Yahoo Finance financial statement, willing to pay some money through PayPal

I am trying to extract financial data from Yahoo Finance using Python. Below is a link to an image in which the data I am trying to retrieve is circled. It shows how the data table is organized, but I do not know where to begin with the information shown in the picture.

The image shows where the numbers I am trying to extract sit in the page's HTML, along with the table name and the td elements.

[image]

I realize that I must somehow use the td elements to find the numbers I need, but I'm not sure which basic commands I need to use.

This is a link to an example of the data table that I'm trying to scrape.

Best answer, by 宏杰李

The page you scraped is rendered by JavaScript, and requests and urllib cannot execute JavaScript. I recommend using selenium together with BeautifulSoup to extract the data. This is what the page looks like with JavaScript disabled: [image]

The data you want is at this URL:

http://financials.morningstar.com/ajax/ReportProcess4HtmlAjax.html?&t=XNAS:AAPL&region=usa&culture=en-US&ops=clear&cur=&reportType=is&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=378724&callback=jsonp1482077238548&_=1482077239651
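
The query string of that URL can be split apart to see which parameters the endpoint receives. This is a small sketch using only the standard library. The readings given in the comments (t as exchange and ticker, reportType=is as income statement) are guesses from the values, not documented behaviour; callback appears to be the name of the JSONP wrapper that the response text starts with.

```python
from urllib.parse import parse_qsl, urlsplit

# The same URL as above, split across lines only for readability.
url = (
    'http://financials.morningstar.com/ajax/ReportProcess4HtmlAjax.html'
    '?&t=XNAS:AAPL&region=usa&culture=en-US&ops=clear&cur=&reportType=is'
    '&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year'
    '&rounding=3&view=raw&r=378724&callback=jsonp1482077238548&_=1482077239651'
)

# keep_blank_values=True keeps empty parameters such as "cur=".
params = dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))

# Assumed meanings (not documented): t = exchange:ticker, reportType = statement type.
for name, value in params.items():
    print(f'{name:12} = {value!r}')
```

Changing a value in params and rebuilding the query with urllib.parse.urlencode would be the natural next step, though whether the endpoint accepts other values is not something this answer confirms.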

I loaded it into bs4; you can extract the data yourself from there:

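import json

import bs4
import requests

url = 'http://financials.morningstar.com/ajax/ReportProcess4HtmlAjax.html?&t=XNAS:AAPL&region=usa&culture=en-US&ops=clear&cur=&reportType=is&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=378724&callback=jsonp1482077238548&_=1482077239651'
r = requests.get(url)

# The response is JSONP, i.e. jsonp1482077238548({...}); keep only the JSON inside the parentheses.
# str.strip() removes a set of characters rather than a prefix, so slice explicitly instead.
js = r.text[r.text.index('(') + 1:r.text.rindex(')')]
html_str = json.loads(js)['result']  # 'result' holds the table as an HTML string
soup = bs4.BeautifulSoup(html_str, 'lxml')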
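
Once the HTML is parsed, the values can be pulled out with ordinary BeautifulSoup lookups. This is a minimal sketch, assuming the markup keeps the shape shown in this answer's output (a hidden div with id baseline, and rows whose id starts with label_). The sample_html excerpt, the variable names, and the html.parser backend are illustrative choices so the example runs offline; they are not part of the original answer, and what the baseline figures represent is not stated there either.

```python
import bs4

# Hypothetical excerpt that mirrors the structure of the real response.
sample_html = """
<div id="baseline" style="display:none">
 <div>156508000000</div>
 <div>170910000000</div>
 <div>182795000000</div>
</div>
<div class="left ">
 <div class="rf_crow1" id="label_i1">
  <div class="lbl">Revenue</div>
 </div>
</div>
"""

soup = bs4.BeautifulSoup(sample_html, 'html.parser')

# Each direct child <div> of #baseline holds one raw number as text.
baseline = soup.find('div', id='baseline')
values = [
    int(div.get_text(strip=True))
    for div in baseline.find_all('div', recursive=False)
]

# Row labels sit in <div class="lbl"> inside elements whose id starts with "label_".
labels = {
    row['id']: row.find('div', class_='lbl').get_text(strip=True)
    for row in soup.find_all('div', id=lambda v: v and v.startswith('label_'))
}

print(values)
print(labels)
```

With the real response, the same lookups would run on the soup object built from html_str instead of sample_html.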

Printing soup.prettify() gives:

<html>
 <body>
  <div id="baseline" style="display:none">
   <div>
    156508000000
   </div>
   <div>
    170910000000
   </div>
   <div>
    182795000000
   </div>
   <div>
    233715000000
   </div>
   <div>
    215639000000
   </div>
   <div>
    215639000000
   </div>
  </div>
  <div class="left ">
   <div class="r_xcmenu rf_table_left">
    <div class="rf_header ">
     <div class="lbl " currency="USD" fiscalyearend="September" fyenumber="9" id="unitsAndFiscalYear">
     </div>
    </div>
    <div class="rf_crow1" id="label_i1" style="_height:16px; _float:none;">
     <div class="lbl">
      Revenue
     </div>
     <div class="chart_contain_free" id="chart_i1">
      <div class="chart_icon">
      </div>
     </div>
    </div>