Trying to merge Excel sheets with Python and pandas

I am not very experienced with Python and need some help, please. I have to merge two Excel files many times a day. The first file is the base file; wherever a cell in it is empty, it should be filled with the value from the same cell in the second file. Both files have the same index in the first column, and both files have headers.

File one looks like:

| Index | B | C | D |
|-------|---|---|---|
| 1 | B1| | D1|
| 2 | B2| C2| |

File two looks like:

| Index | B | C | D |
|-------|---|---|---|
| 1 | B1| C1| |
| 2 | C2| D2| |

The merged file should look like:

| Index | B | C | D |
|-------|---|---|---|
| 1 | B1| C1| D1|
| 2 | B2| C2| D2|

But with the code I have, the result is:

| Index | B | C | D |
|-------|---|---|---|
| 1a | B1| | D1|
| 2a | B2| C2| |
| 1b | B1| C1| |
| 2b | C2| D2| |

It looks like it is adding the rows just under the other rows, instead of merging them.
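The stacking can be reproduced without any Excel files. The following is a minimal sketch, not a definitive fix: the frames are built in memory as stand-ins for the two workbooks, and row 2 of file two is assumed to hold C2 in column C and D2 in column D with B empty (the table is ambiguous on that point). It contrasts `pd.concat`, which appends rows, with `DataFrame.combine_first`, which is one way to keep the first frame's values and fill only its missing cells from the second frame.

```python
import pandas as pd

# Stand-ins for input1.xlsx and input2.xlsx (assumed values, see lead-in).
# Empty Excel cells are read as missing values, represented here by None.
idx = pd.Index([1, 2], name="Index")
file1 = pd.DataFrame(
    {"B": ["B1", "B2"], "C": [None, "C2"], "D": ["D1", None]}, index=idx
)
file2 = pd.DataFrame(
    {"B": ["B1", None], "C": ["C1", "C2"], "D": [None, "D2"]}, index=idx
)

# concat places file2's rows under file1's rows. No two rows are fully
# identical, so drop_duplicates removes nothing and four rows remain.
stacked = pd.concat([file1, file2]).drop_duplicates(keep="first")

# combine_first aligns both frames on the index and takes a value from
# file2 only where file1 has a missing value.
filled = file1.combine_first(file2)

print(stacked)
print(filled)
```

With real files, the same call would apply to the frames returned by `pd.read_excel(..., index_col=0)`, provided both sheets use the same index values and column headers.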

I have the following Python code:

import pandas as pd
import os

inputFolder = r'd:/testin'
outputFolder = r'd:/testout'

file1 = pd.read_excel(os.path.join(inputFolder, 'input1.xlsx'),index_col=0)
file2 = pd.read_excel(os.path.join(inputFolder, 'input2.xlsx'),index_col=0)

merged = pd.concat([file1, file2]).drop_duplicates(keep='first')

merged.to_excel(os.path.join(outputFolder, 'merged.xlsx'), index=True)

I guess the "error" is in the concat line, but I am not able to get it working. If you have any tips for me, I would be grateful.

I have tried to change the merged line to

merged = pd.merge(file1, file2, how='inner', right_index=True)

But this throws an error.
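A likely cause of that error: `right_index=True` names a join key only for the right frame, so `pd.merge` has no key for the left frame and raises a `MergeError`. An index-to-index join needs `left_index=True` as well. Even then, a merge does not fill cells; it places both sets of columns side by side, and they still have to be coalesced. The sketch below illustrates this with in-memory stand-ins for the two files. The cell values and the `_1`/`_2` suffixes are assumptions chosen for illustration.

```python
import pandas as pd

# Stand-ins for the two workbooks (assumed values); None marks an empty cell.
idx = pd.Index([1, 2], name="Index")
file1 = pd.DataFrame(
    {"B": ["B1", "B2"], "C": [None, "C2"], "D": ["D1", None]}, index=idx
)
file2 = pd.DataFrame(
    {"B": ["B1", None], "C": ["C1", "C2"], "D": [None, "D2"]}, index=idx
)

# Join on the index of both frames. Overlapping column names get suffixes,
# giving B_1, C_1, D_1, B_2, C_2, D_2.
joined = pd.merge(
    file1,
    file2,
    how="inner",
    left_index=True,
    right_index=True,
    suffixes=("_1", "_2"),
)

# Coalesce each pair: keep file1's value, fall back to file2's where missing.
filled = pd.DataFrame(
    {col: joined[f"{col}_1"].fillna(joined[f"{col}_2"]) for col in file1.columns}
)

print(joined)
print(filled)
```

The extra coalescing step is why a fill-oriented method tends to be a more direct fit for this task than a join.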
