I want to concat two tables based on the same index. But the table format is different. How to transfer them to the same format and then concat?

one table is like this:

close     date

0  1658.92   2009-02-01

1  1835.84   2009-03-01

2  2057.33   2009-04-01

3  2120.32   2009-05-01

4  2174.52   2009-06-01

5  2348.48   2009-07-01

6  2378.73   2009-08-01

7  2510.82   2009-09-01

8  2417.32   2009-10-01

9  2532.77   2009-11-01

10  2684.40  2009-12-01

the other is like this:

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec year

0   7.8 8.3 8.7 9.0 9.4 9.5 9.5 9.6 9.8 10.0    9.9 9.9 2009

1   9.8 9.8 9.9 9.9 9.6 9.4 9.4 9.5 9.5 9.4 9.8 9.3 2010

2   9.1 9.0 9.0 9.1 9.0 9.1 9.0 9.0 9.0 8.8 8.6 8.5 2011

3   8.3 8.3 8.2 8.2 8.2 8.2 8.2 8.1 7.8 7.8 7.7 7.9 2012

4   8.0 7.7 7.5 7.6 7.5 7.5 7.3 7.2 7.2 7.2 6.9 6.7 2013

5   6.6 6.7 6.7 6.2 6.3 6.1 6.2 6.1 5.9 5.7 5.8 5.6 2014

6   5.7 5.5 5.4 5.4 5.6 5.3 5.2 5.1 5.0 5.0 5.1 5.0 2015

7   4.9 4.9 5.0 5.0 4.8 4.9 4.8 4.9 5.0 4.9 4.7 4.7 2016

8   4.7 4.7 4.4 4.4 4.4 4.3 4.3 4.4 4.2 4.1 4.2 4.1 2017

9   4.1 4.1 4.0 3.9 3.8 4.0 3.9 3.8 3.7 3.8 3.7 3.9 2018

10  4.0 3.8 3.8 3.6 -1  -1  -1  -1  -1  -1  -1  -1  2019

I'm new to python and not familiar with data processing. Please give me some suggestions and advice. Thank you guys.

I want to combine them to one table and column could be 'year', 'month','data1', 'data2'

1 Answers

1
Alexandre B. On

Here is one solution that iterate over all the dataframe. It's not the most efficient but it's readable.

import pandas as pd

df = pd.DataFrame([[7.8, 8.3, 8.7, 9.0, 9.4, 9.5, 9.5, 9.6, 9.8, 10.0, 9.9, 9.9, 2009],
                   [9.8, 9.8, 9.9, 9.9, 9.6, 9.4, 9.4, 9.5, 9.5, 9.4, 9.8, 9.3, 2010],
                   [9.1, 9.0, 9.0, 9.1, 9.0, 9.1, 9.0, 9.0, 9.0, 8.8, 8.6, 8.5, 2011],
                   [8.3, 8.3, 8.2, 8.2, 8.2, 8.2, 8.2, 8.1, 7.8, 7.8, 7.7, 7.9, 2012],
                   [8.0, 7.7, 7.5, 7.6, 7.5, 7.5, 7.3, 7.2, 7.2, 7.2, 6.9, 6.7, 2013],
                   [6.6, 6.7, 6.7, 6.2, 6.3, 6.1, 6.2, 6.1, 5.9, 5.7, 5.8, 5.6, 2014],
                   [5.7, 5.5, 5.4, 5.4, 5.6, 5.3, 5.2, 5.1, 5.0, 5.0, 5.1, 5.0, 2015],
                   [4.9, 4.9, 5.0, 5.0, 4.8, 4.9, 4.8, 4.9, 5.0, 4.9, 4.7, 4.7, 2016],
                   [4.7, 4.7, 4.4, 4.4, 4.4, 4.3, 4.3, 4.4, 4.2, 4.1, 4.2, 4.1, 2017],
                   [4.1, 4.1, 4.0, 3.9, 3.8, 4.0, 3.9, 3.8, 3.7, 3.8, 3.7, 3.9, 2018],
                   [4.0, 3.8, 3.8, 3.6, -1, -1, -1, -1, -1, -1, -1, -1, 2019]],
    columns=["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "year"],
)

df1 = pd.DataFrame([[1658.92, "2009-02-01"],
                    [1835.84, "2009-03-01"],
                    [2057.33 ,"2009-04-01"],
                    [2120.32, "2009-05-01"],
                    [2174.52, "2009-06-01"],
                    [2348.48, "2009-07-01"],
                    [2378.73 ,"2009-08-01"],
                    [2510.82, "2009-09-01"],
                    [2417.32, "2009-10-01"],
                    [2532.77, "2009-11-01"],
                    [2684.40, "2009-12-01"]],
    columns=["close"   ,  "date"])

# Rename columns
df.columns = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "year"]

# Convert to datetime for having the year and the month
df1["date"] = pd.to_datetime(df1["date"])
df1["month"] = df1.date.dt.month.astype(int)
df1["year"] = df1.date.dt.to_period('Y').astype(int)
df1= df1[["close", "month", "year"]]

# Create a new DataFrame
new_df = pd.DataFrame(columns=["month", "year", "df1"])
# Iterate over all the dataFrame
for index, row in df.iterrows():
    for i, cell in enumerate(row[["1", "2", "3", "4", "5", "6",
                     "7", "8", "9", "10", "11", "12"]]):
        new_df.loc[index*12 + i] = [int(df.columns[i]),
                                    int(row.year), cell]

# Add the close column to the "new_df"
new_df = pd.merge(new_df, df1, how='outer')

print(new_df)
#      month    year   df1    close
# 0      1.0  2009.0   7.8      NaN
# 1      2.0  2009.0   8.3      NaN
# 2      3.0  2009.0   8.7      NaN
# 3      4.0  2009.0   9.0      NaN
# 4      5.0  2009.0   9.4      NaN
# 5      6.0  2009.0   9.5      NaN
# 6      7.0  2009.0   9.5      NaN
# 7      8.0  2009.0   9.6      NaN
# 8      9.0  2009.0   9.8      NaN
# 9     10.0  2009.0  10.0      NaN
# 10    11.0  2009.0   9.9      NaN
# ..     ...     ...   ...      ...
# 128    9.0  2019.0 - 1.0      NaN
# 129   10.0  2019.0 - 1.0      NaN
# 130   11.0  2019.0 - 1.0      NaN
# 131   12.0  2019.0 - 1.0      NaN
# 132    2.0    39.0   NaN  1658.92
# 133    3.0    39.0   NaN  1835.84
# 134    4.0    39.0   NaN  2057.33
# 135    5.0    39.0   NaN  2120.32
# 136    6.0    39.0   NaN  2174.52
# 137    7.0    39.0   NaN  2348.48
# 138    8.0    39.0   NaN  2378.73
# 139    9.0    39.0   NaN  2510.82
# 140   10.0    39.0   NaN  2417.32
# 141   11.0    39.0   NaN  2532.77
# 142   12.0    39.0   NaN  2684.40