Parallel Coordinate plot in plotly with continuous and categorical data


Let's say I have some dataframe df with continuous and categorical data. Now I'd like to make a parallel-coordinate plot in plotly that contains both types of coordinates. Is it possible to combine these into one plot such that each datapoint line goes through all axes?

In the documentation I did find go.Parcoords and go.Parcats that treat these separately, but I didn't find a way to combine them. This is my minimal example:

import pandas as pd
import plotly.graph_objs as go
df = pd.DataFrame()
# continuous data
df['x1'] = [1,2,3,4]
df['x2'] = [9,8,7,6]
# categorical data
df['x3'] = ['a', 'b', 'b', 'c']
df['x4'] = ['A', 'B', 'C', 'C']
col_list = [dict(range=(df[col].min(), df[col].max()),
                 label=col,
                 values=df[col])
            for col in df.keys()
            #if col not in ['x3', 'x4']  # only works if we exclude these (uncomment to run)
            ]
fig = go.Figure(data=go.Parcoords(dimensions=col_list))
fig.show()

There is 1 answer

flawr On BEST ANSWER

Here is a solution that stays with go.Parcoords and customizes the tick labels (ticktext). First we replace each categorical value with an integer code, and then we place a tick at each code (tickvals) and label it with the corresponding categorical value as a string (ticktext):

import pandas as pd
import plotly.graph_objs as go
df = pd.DataFrame()
df['x1'] = [1,2,3,4]
df['x2'] = [9,8,7,6]
df['x3'] = ['a', 'b', 'b', 'c']
df['x4'] = ['A', 'B', 'C', 'C']
categorical_columns = ['x3', 'x4']
col_list = []

for col in df.columns:
    if col in categorical_columns:  # categorical columns
        # map each distinct category to an integer code, in order of first appearance
        values = df[col].unique()
        value2dummy = dict(zip(values, range(len(values))))  # works if values are strings, otherwise we probably need to convert them
        # note: this overwrites the column in df with its integer codes
        df[col] = [value2dummy[v] for v in df[col]]
        col_dict = dict(
            label=col,
            tickvals=list(value2dummy.values()),
            ticktext=list(value2dummy.keys()),
            values=df[col],
        )
    else:  # continuous columns
        col_dict = dict(
            range=(df[col].min(), df[col].max()),
            label=col,
            values=df[col],
        )
    col_list.append(col_dict)
fig = go.Figure(data=go.Parcoords(dimensions=col_list))
fig.show()
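
The loop above overwrites the categorical columns in df with their integer codes. If you need to keep the original dataframe intact, a variation along the following lines might help. It is only a sketch: build_dimensions is a hypothetical helper name of my own, not part of plotly, and it uses pd.factorize instead of the hand-built dictionary to produce the same codes (integers in order of first appearance).

```python
import pandas as pd


def build_dimensions(df, categorical_columns):
    """Hypothetical helper: return Parcoords dimension dicts without modifying df."""
    dimensions = []
    for col in df.columns:
        if col in categorical_columns:
            # factorize assigns integer codes in order of first appearance;
            # missing values (NaN) would get the code -1 and no tick label
            codes, uniques = pd.factorize(df[col])
            dimensions.append(dict(
                label=col,
                tickvals=list(range(len(uniques))),
                ticktext=[str(u) for u in uniques],
                values=codes.tolist(),
            ))
        else:
            # continuous columns are passed through unchanged
            dimensions.append(dict(
                range=(df[col].min(), df[col].max()),
                label=col,
                values=df[col].tolist(),
            ))
    return dimensions


df = pd.DataFrame({
    'x1': [1, 2, 3, 4],
    'x2': [9, 8, 7, 6],
    'x3': ['a', 'b', 'b', 'c'],
    'x4': ['A', 'B', 'C', 'C'],
})
dims = build_dimensions(df, ['x3', 'x4'])
```

The resulting list can be passed to the figure exactly as before, i.e. go.Figure(data=go.Parcoords(dimensions=dims)), and df still holds the original strings afterwards.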