# Here's the first try:
# Create a custom function for merging the data together:
def getXDataMerged():
print('Income Statement CSV data is(rows, columns): ', df1.shape)
print('Balance Sheet CSV data is: ', df2.shape)
print('Cash Flow CSV data is: ' , df3.shape)
# Merge the data together
result = pd.merge(df1, df2, on=['Ticker', 'SimFinId', 'Currency',
'Fiscal Year', 'Fiscal Period', 'Report Date', 'Publish Date'], how='inner')
result = pd.merge(result, df3, on=['Ticker','SimFinId','Currency',
'Fiscal Year','Report Date','Publish Date'])
print('Merged X data matrix shape is: ', result.shape)
return result
# Use getXDataMerged() to retrieve some data, and then save it to a CSV file named "Annual_Stock_Price_Fundamentals.csv"
X = getXDataMerged()
X.to_csv("Annual_Stock_Price_Fundamentals.csv")
# Output for first try:
Income Statement CSV data is(rows, columns): (17185, 28)
Balance Sheet CSV data is: (17185, 30)
Cash Flow CSV data is: (17185, 28)
Merged X data matrix shape is: (0, 73)
# Second try (only changed the merging method to 'outer', everything else stays the same:
# Merge the data together
result = pd.merge(df1, df2, on=['Ticker', 'SimFinId', 'Currency',
'Fiscal Year', 'Fiscal Period', 'Report Date', 'Publish Date'], how='outer')
result = pd.merge(result, df3, on=['Ticker','SimFinId','Currency',
'Fiscal Year','Report Date','Publish Date'])
# Output for second try:
Income Statement CSV data is(rows, columns): (17185, 28)
Balance Sheet CSV data is: (17185, 30)
Cash Flow CSV data is: (17185, 28)
Merged X data matrix shape is: (34370, 73)
Problem during merge: either gets no data output, or simply merges them by doubling the column in data
59 views Asked by Dun AtThere are 2 answers
I can't see your data, so it's hard to tell where excatly the problem is, but when using 'inner' statement of pd.merge() method, new dataframe will have only matching keys from both dataframe objects, you trying to merge. It's always good to use 'inner' if you have same keys in both frames etc., from what I understand, your data is not replicable and you have different values across frames you trying to merge, that's why you have no values after merge. Eg. column 'SimFindID' is not the same, for both dataframes, that's why it's not matching results. Try to use less columns to merge on.
In terms of 'outer' method of merge, story is the same, but it gives you "not matching" rows of data, so that's why you will not sort it in any way, as all rows are different for both frames.
If what you want to achieve is just adding to frames together, use 'cross' option of pd.merge() method or look if you have any matching keys in both dataframes, and pick only those columns that are matching.
If you need more help, please add your data for me to look at it
My method:
Output: