I would like to get the matches from one column with the other columns in a dataframe. Below is an example:

  tableNameFrom   tableNameJoin   attributeName
1 film            language        [film.languageId, language.languageID]
2 inventory       rental          [invetory.inventoryId, rental.filmId]

In the example above, I would like to match between tablenameFrom and tablenameJoin with the attributeName. Here desired output:

  tableName    attributeName
1 film         languageId
2 language     languageID
3 inventory    inventoryId
4 rental       filmId

1 Answers

0
jezrael On Best Solutions

My solution working only with attributeName column, because from sample data there are all data need for output.

Use Series.str.split first by ,, then reshape DataFrame by DataFrame.stack and split by .:

df1 = (df['attributeName'].str.split(', ', expand=True)
                          .stack()
                          .str.split('.', expand=True)
                          .reset_index(drop=True))
df1.columns = ['tableName','attributeName']
print (df1)
  tableName attributeName
0      film    languageId
1  language    languageID
2  invetory   inventoryId
3    rental        filmId

EDIT:

If values are lists use DataFrame constructor:

print (type(df.loc[1, 'attributeName']))
<class 'list'>

df1 = (pd.DataFrame(df['attributeName'].values.tolist())
                          .stack()
                          .str.split('.', expand=True)
                          .reset_index(drop=True))
df1.columns = ['tableName','attributeName']
print (df1)
  tableName attributeName
0      film    languageId
1  language    languageID
2  invetory   inventoryId
3    rental        filmId