Pandas Pivot table nearest neighbor

602 views Asked by At

SOLUTION

df = pd.read_csv('data.txt')
df['z-C+1'] = df.groupby(['a','b','d'])['z'].transform(lambda x:x.shift(+1))
df['z-C-1'] = df.groupby(['a','b','d'])['z'].transform(lambda x:x.shift(-1))
df['z-D+1'] = df.groupby(['a','b','c'])['z'].transform(lambda x:x.shift(+1))
df['z-D-1'] = df.groupby(['a','b','c'])['z'].transform(lambda x:x.shift(-1))

QUESTION

I have a CSV which is sorted by a few indexes. There is one index in particular I am interested in, and I want to keep the table the same. All I want to do is add extra columns which are a function of the table. So, lets say "v" is the column of interest. I want to take the "z" column, and add more "z" columns from other places in the table where "c" = "c+1" and "c-1" and "d+1", "d-1", and just join those on the end. In the end I want the same number of rows, but with the "Z" column expanded to columns that are "Z.C-1.D", "Z.C.D", "Z.C+1.D", "Z.C.D-1", "Z.C.D+1". If that makes any sense. I'm having difficulties. I've tried the pivot_table method, and that brought me somewhere, while also adding confusion.

If this helps: Think about it like a point in a matrix, and I have an independent variable & dependent variable. I want to extract the neighboring independent variables for every location I have an observation

Here is my example csv:

        a   b   c   d   v       z
        10  1   15  42  0.90    5460
        10  2   15  42  0.97    6500
        10  1   16  42  1.04    7540
        10  2   16  42  1.11    8580
        10  1   15  43  1.18    9620
        10  2   15  43  0.98    10660
        10  1   16  43  1.32    3452
        10  2   16  43  1.39    4561
        11  1   15  42  0.54    5670
        11  2   15  42  1.53    6779
        11  1   16  42  1.60    7888
        11  2   16  42  1.67    8997
        11  1   15  43  1.74    10106
        11  2   15  43  1.81    11215
        11  1   16  43  1.88    12324
        11  2   16  43  1.95    13433

And my desired output:

        a   b   c   d   v       z       z[c-1]  z[c+1]  z[d-1]  z[d+1]
        10  1   15  42  0.90    5460    Nan     7540    Nan     9620
        10  2   15  42  0.97    6500    Nan     8580    Nan     10660
        10  1   16  42  1.04    7540    5460    Nan     Nan     3452
        10  2   16  42  1.11    8580    6500    Nan     Nan     4561
        10  1   15  43  1.18    9620    Nan     3452    5460    Nan
        10  2   15  43  0.98    10660   Nan     4561    6500    Nan
        10  1   16  43  1.32    3452    9620    Nan     7540    Nan
        10  2   16  43  1.39    4561    10660   Nan     8580    Nan
        11  1   15  42  0.54    5670    Nan     7888    Nan     10106
        11  2   15  42  1.53    6779    Nan     8997    Nan     11215
        11  1   16  42  1.60    7888    5670    Nan     Nan     12324
        11  2   16  42  1.67    8997    6779    Nan     Nan     13433
        11  1   15  43  1.74    10106   Nan     12324   5670    Nan
        11  2   15  43  1.81    11215   Nan     13433   6779    Nan
        11  1   16  43  1.88    12324   10106   Nan     7888    Nan
        11  2   16  43  1.95    13433   11215   Nan     8997    Nan
1

There are 1 answers

3
Roman Pekar On BEST ANSWER

Don't know if I understood you, but you can use shift() method to add shifted columns, like:

df['z-1'] = df.groupby('a')['z'].transform(lambda x:x.shift(-1))

update

If you want selection by values, you can use apply():

def lkp_data(c,d,v):
    d = df[(df['c'] == c) & (df['d'] == d) & (df['v'] == v)]['z']
    return None if len(d) == 0 else d.values[0]

df['z[c-1]'] = df.apply(lambda x: lkp_data(x['c'] - 1, x['d'], x['v']), axis=1)
df['z[c+1]'] = df.apply(lambda x: lkp_data(x['c'] + 1, x['d'], x['v']), axis=1)
df['z[d-1]'] = df.apply(lambda x: lkp_data(x['c'], x['d'] - 1, x['v']), axis=1)
df['z[d+1]'] = df.apply(lambda x: lkp_data(x['c'], x['d'] + 1, x['v']), axis=1)

    c   d      z  v  z[c-1]  z[c+1]  z[d-1]  z[d+1]
0  15  42   5460  1     NaN    7540     NaN    9620
1  15  42   6500  2     NaN    8580     NaN   10660
2  16  42   7540  1    5460     NaN     NaN    3452
3  16  42   8580  2    6500     NaN     NaN    4561
4  15  43   9620  1     NaN    3452    5460     NaN
5  15  43  10660  2     NaN    4561    6500     NaN
6  16  43   3452  1    9620     NaN    7540     NaN
7  16  43   4561  2   10660     NaN    8580     NaN

But I think, this one would be really inefficient