I am building a Python script that will run regularly and alert me if prices change on a website. I have gotten pretty far with my newbie approach thanks to a lot of other posts, but I am stuck on the final hurdle.

I reviewed the pandas documentation and found a few things that should have allowed me to remove the rows that contain empty cells, but I never got them to work.

import pandas as pd

pd.set_option('display.width', 800)
# url is the address of the page being scraped; read_html returns a list of DataFrames
df = pd.read_html(url)

with pd.option_context('display.max_rows', 0, 'display.max_columns', 3):

    df[0].replace(to_replace=r' €', value='', regex=True).replace(to_replace=r'^A.*', value='', regex=True).bfill().to_csv("mac0.csv", index=False)

Currently I am living with it, but if I could stop the empty rows from being written, I could apply the same technique to other websites I need to pull data from.

(Image: the output I currently get, as of posting.)

1 Answer

Community On

Use pandas dropna:

df = df.dropna()

This method drops (deletes) the rows that contain NaN values. If the values are not NaN but something else, such as empty strings or 0s, first use the replace method to put NaN in those cells:

df = df.replace(yourvalue, np.nan)  # requires: import numpy as np
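
As a minimal sketch of the replace-then-drop idea, assuming the blank cells are empty or whitespace-only strings (the sample table and the column names `Product` and `Price` are made up for illustration, not taken from the question's website):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for a scraped price table.
df = pd.DataFrame({
    "Product": ["Laptop", "", "Tablet"],
    "Price": ["1299", "  ", "499"],
})

# Turn empty or whitespace-only strings into NaN so dropna() can detect them,
# then drop every row that still contains a NaN.
cleaned = df.replace(r"^\s*$", np.nan, regex=True).dropna()

print(cleaned)
```

Note that pd.read_html returns a list of DataFrames, so with the question's code the same calls would go on df[0] rather than on df itself.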


Read the docs for a better understanding of this method: it can drop rows where at least one element is missing (how='any', the default), rows where all elements are missing (how='all'), or only rows where the value in a certain column is missing (subset=[...]).
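
The three behaviours can be compared side by side. This is only a sketch on invented data (the column names `Product` and `Price` are assumptions), using the `how` and `subset` parameters of `DataFrame.dropna`:

```python
import numpy as np
import pandas as pd

# Hypothetical table with different patterns of missing values.
df = pd.DataFrame({
    "Product": ["Laptop", np.nan, "Tablet", "Phone", np.nan],
    "Price": [1299.0, np.nan, np.nan, 799.0, 99.0],
})

# Drop a row if ANY of its values is missing (the default behaviour).
any_missing = df.dropna(how="any")

# Drop a row only if ALL of its values are missing.
all_missing = df.dropna(how="all")

# Drop a row only if the value in the "Price" column is missing.
price_missing = df.dropna(subset=["Price"])

print(any_missing, all_missing, price_missing, sep="\n\n")
```

For a scraped table where blank separator rows are entirely empty, how='all' is probably the safest choice, since it keeps rows that merely have one missing field.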