Is there a function in pandas which lets you create columns for dictionary key + value pairs efficiently?

58 views Asked by At

I have a list of form [[[key1,value1],[key2,value2],...,500],...,n] - aka L2 orderbook data with 500 levels. I want to translate this list into a dataframe, containing columns key_1,...,key_500 and value_1,...,value_500. - A dataframe that has a price + volume column for each level (e.g. level_1_volume, level_2_volume, ..., level_500_volume)

My current solution is simply looping through the existing dataframe (which simply has this list as a column) with df.iterrows(), extracting the values into separate lists, then creating dataframes using these lists and merging these as columns. Doesn't feel terribly efficient compared to other operations on my data set, is there some way to do this with built-in methods?

List1 = [[key1.1,val1.1],[key2.1,val2.1],...]
List2 = [[key1.2,val1.2],[key2.2,val2.2],...]

         key1_column, val1_column, key2_column, val2_column
Row_List1 key1.1       val1.1           key2.1          val2.1
Row_List2 key1.2       val1.2           key2.2          val2.2

Current Solution ["bids"] and ["asks"] simply contain dictionary/Json object of form {key1:value1; key2:value2} for 500 pairs.

# Initialize empty lists for prices and volumes
prices_bids = [[] for _ in range(500)]
volumes_bids = [[] for _ in range(500)]
prices_asks = [[] for _ in range(500)]
volumes_asks = [[] for _ in range(500)]

# Iterate through each row
for index, row in df.iterrows():

    attributes = row["bids"]
    result_bids = [[key,attributes[key]] for key in sorted(attributes.keys(), reverse=True)]

    attributes = row["asks"]
    result_asks = [[key,attributes[key]] for key in sorted(attributes.keys(), reverse=False)]
    
    for i in range(500):
        prices_bids[i].append(np.float64(result_bids[i][0]))
        volumes_bids[i].append(np.float64(result_bids[i][1]))
        prices_asks[i].append(np.float64(result_asks[i][0]))
        volumes_asks[i].append(np.float64(result_asks[i][1]))
        

# Create DataFrame from lists
for i in range(500):
    # Expressing prices as spreads
    df[f"bid_level_{i}_price"] = pd.Series((df["mid_price"]/pd.Series(prices_bids[i],dtype='float64')-1)*10000,dtype="float64")
    df[f"bid_level_{i}_volume"] = pd.Series(volumes_bids[i],dtype='float64')
    df[f"ask_level_{i}_price"] = pd.Series((df["mid_price"]/pd.Series(prices_asks[i],dtype='float64')-1)*10000,dtype="float64")
    df[f"ask_level_{i}_volume"] = pd.Series(volumes_asks[i],dtype='float64')
2

There are 2 answers

0
Garvit Gupta On

Try executing the below mentioned code, It should work import pandas as pd

# Your list of lists
data = [['key1', 'value1'],
        ['key2', 'value2'],
        ['key3', 'value3'],
        ['key4', 'value4']]

# Convert the list of lists into a dataframe
df = pd.DataFrame(data, columns=['Key', 'Value'])

# Display the original dataframe
print("Original DataFrame:")
print(df)

# Reshape the dataframe by setting the 'Key' column as the index
df.set_index('Key', inplace=True)

# Transpose the dataframe to convert keys into columns
df = df.T

# Reset the index to have numeric index
df.reset_index(drop=True, inplace=True)

# Display the final dataframe
print("\nFinal DataFrame:")
print(df)
0
Onyambu On

Using data from @Nick, you could do:

pd.DataFrame(map(dict, data))

   key1  key2  key3  key4
0   100   101   102   103
1   200   201   202   203
2   300   301   302   303