I'm trying to download the form.idx file from sec.gov for the first quarter of 2016. Since I'm only interested in the 10-Ks, I wanted to save the file as a .csv and delete the rows I don't need. I tried to filter by form type, but that didn't work.
My code so far is the following:
import requests
import os
import pandas as pd

years = [2016]
quarters = ['QTR1']
base_path = '/Users/xyz/Desktop'
current_dirs = os.listdir(path=base_path)
for yr in years:
    # Create a folder per year if it does not exist yet
    if str(yr) not in current_dirs:
        os.mkdir('/'.join([base_path, str(yr)]))
    current_files = os.listdir('/'.join([base_path, str(yr)]))
    for qtr in quarters:
        local_filename = f'{yr}-{qtr}.csv'
        local_file_path = '/'.join([base_path, str(yr), local_filename])
        if local_filename in current_files:
            print(f'Skipping file for {yr}, {qtr} because it is already saved.')
            continue
        url = f'https://www.sec.gov/Archives/edgar/full-index/{yr}/{qtr}/form.idx'
        # Stream the index file to disk in small chunks
        r = requests.get(url, stream=True)
        with open(local_file_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=128):
                f.write(chunk)

r2 = pd.read_csv('/Users/xyz/Desktop/2016-QTR1.csv', sep=";", encoding="utf-8")
r2.head()
filt = (r2 ['Form Type'] == '10-K')
r2_10K = r2.loc[filt]
r2_10K.head()
r2_10K.to_csv('/Users/xyz/Desktop/modified.csv')
The error message I get is:
Traceback (most recent call last):
  File "<ipython-input-5-f84e3f81f3d1>", line 61, in <module>
    filt = (r2 ['Form Type'] == '10-K')
  File "/Users/xyz/opt/anaconda3/envs/spyder-4.1.5_1/lib/python3.8/site-packages/pandas/core/frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/xyz/opt/anaconda3/envs/spyder-4.1.5_1/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 'Form Type'
Maybe there's a way to just delete the rows I don't need in the file? Otherwise, I'm also thankful for any kind of help on that problem.
Many thanks in advance.
Kind regards, Elena
Here is the full working code for you. The main issue was the format of the file you are downloading from the site: it is not a regular CSV. Full code: https://rextester.com/QUGF24653
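To illustrate the format problem: form.idx is a fixed-width text file that starts with a few descriptive lines, then a line of column names, then a rule of dashes, and only then the data. It contains no semicolons, so `pd.read_csv(..., sep=";")` most likely reads everything into a single column and there is no 'Form Type' key to look up. Below is a minimal sketch of one way to handle that, not necessarily what the linked code does. The function names (`parse_form_idx`, `filter_forms`, `write_csv`) are made up for this example, and it assumes the data rows line up with the column names in the header line.

```python
import csv

COLUMNS = ["Form Type", "Company Name", "CIK", "Date Filed", "File Name"]


def parse_form_idx(text):
    """Turn the text of a form.idx file into a list of row dicts."""
    lines = text.splitlines()
    # Everything above the dashed rule is descriptive header text.
    rule = next(i for i, line in enumerate(lines) if line.startswith("-----"))
    header = lines[rule - 1]
    # Assumption: data rows are aligned with the column names in the header.
    starts = [header.index(name) for name in COLUMNS]
    bounds = list(zip(starts, starts[1:] + [None]))
    rows = []
    for line in lines[rule + 1:]:
        if line.strip():  # skip blank lines
            rows.append(
                {name: line[a:b].strip() for name, (a, b) in zip(COLUMNS, bounds)}
            )
    return rows


def filter_forms(rows, form_type="10-K"):
    """Keep rows whose form type matches exactly (10-K/A is not kept)."""
    return [row for row in rows if row["Form Type"] == form_type]


def write_csv(rows, out):
    """Write the rows as real CSV to an open text file object."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

To use it, read the downloaded file as text, pass the text to `parse_form_idx`, filter with `filter_forms`, and write the result with `write_csv` to a file opened with `newline=""`. If you prefer to stay in pandas, the list of dicts can be passed straight to `pd.DataFrame(rows)`, and your `r2['Form Type'] == '10-K'` filter should then work as you wrote it.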
What I did:
Output: