Restructuring DataFrame to match value with other column number

42 views Asked by At

I have the following dataframe

Name  Rank1  Rank2  Rank3  Rank4  Rank5
Name1  Title1  Title3  Title7  Title2  Title6
Name2  Title3  Title2  Title6  Title5  Title4
Name3  Title2  Title5  Title3  Title6  Title4
Name4  Title4  Title6  Title1  Title5  Title2

and I would like to make the dataframe look like this

    Name  Title1  Title2  Title3  Title4  Title5  Title6  Title7
    Name1       1       4       2      10      10       5       3
    Name2      10       2       1       4       4       3      10
    Name3      10       1       3       5       2       4      10
    Name4       3       5      10       1       4       2      10

Any advice on how to do this?

3

There are 3 answers

0
Gilseung Ahn On

Use the following code.

import pandas as pd
import numpy as np 
def find_rank(row, v):
    rank = np.where(row == v)[0]
    if len(rank) == 0:
        return 10
    else:
        return rank[0]

for v in ['Title1', 'Title2', 'Title3', 'Title4', 'Title5', 'Title6']:
    df[v] = df.apply(find_rank, v = v, axis = 1)

df = df.drop(['Rank1', 'Rank2', 'Rank3', 'Rank4', 'Rank5'], axis = 1)

This code finds the value and returns its index for each row.
If there is no value in a row, then it returns 10.

4
mozway On

melt to 2 columns, extract the number from the rank, pivot_table to wide format with titles as columns:

out = (df.melt('Name')
         .assign(Rank=lambda d: d['variable'].str.extract(r'(\d+)$', expand=False).astype(int))
         .pivot_table(index='Name', columns='value', values='Rank', fill_value=10)
         .rename_axis(columns=None). reset_index()
      )

More robust variant to keep the original index and handle sorting of the columns for titles > "Title 9":

# pip install natsort
from natsort import natsort_key

out = (df.melt('Name', ignore_index=False).reset_index()
         .assign(Rank=lambda d: d['variable'].str.extract(r'(\d+)$', expand=False).astype(int))
         .pivot_table(index=['index', 'Name'], columns='value', values='Rank', fill_value=10)
         .sort_index(axis=1, key=natsort_key)
         .reset_index('Name').rename_axis(index=None, columns=None)
      )

Output:

    Name  Title1  Title2  Title3  Title4  Title5  Title6  Title7
0  Name1       1       4       2      10      10       5       3
1  Name2      10       2       1       5       4       3      10
2  Name3      10       1       3       5       2       4      10
3  Name4       3       5      10       1       4       2      10
0
sammywemmy On

One option is with numpy (a bit weary to flip and unflip - worried about the costs - just a hunch, and i may be wrong) - and then recreating the final dataframe

columns = ['Title1', 'Title2', 'Title3', 'Title4', 'Title5', 'Title6', 'Title7']
out = (df
       .set_index('Name')
       .apply(lambda x: x.str[-1])
       .astype(int)
      )
# get the positions within each row
sort_indices = out.apply(np.argsort,axis=1, raw=True).to_numpy()
# get column positions, starting from zero
# and reorder them based on sort_indices
length = len(out)
positions = out.to_numpy()-1
positions = positions[np.arange(length)[:, None], sort_indices]

# create new array that will contain the properly positioned values:
arr = np.empty(shape=(length, len(columns)),dtype=float)
arr[:] = np.nan
arr[np.arange(length)[:, None], positions] = sort_indices

# increase value by 1
# fill nulls with 10
# create dataframe
arr += 1
arr = np.where(np.isnan(arr), 10, arr).astype(int)
(pd.DataFrame(arr, columns=columns, index=out.index)
# avoid reset_index if you can,
# indexes are good, use them when possible
.reset_index()
)

    Name  Title1  Title2  Title3  Title4  Title5  Title6  Title7
0  Name1       1       4       2      10      10       5       3
1  Name2      10       2       1       5       4       3      10
2  Name3      10       1       3       5       2       4      10
3  Name4       3       5      10       1       4       2      10