Plotly Express strip plot with temporal data


A Plotly Express strip plot does not separate the points by color when temporal data is used on the x-axis.

First, set up some data with random groups and a status column (which will be the color of the points in the plot):

import pandas as pd
import plotly.express as px
import random

random.seed(0)
n = 100
df = pd.DataFrame(
    data=dict(
        group=random.choices(["A","B","C"], k=n),
        status=random.choices(["on", "off"], k=n),
        time=pd.date_range('2/5/2019', periods=n, freq='2h'),
    )
)

Our DataFrame is

print(df)

   group status                time
0      C    off 2019-02-05 00:00:00
1      C    off 2019-02-05 02:00:00
2      B     on 2019-02-05 04:00:00
3      A    off 2019-02-05 06:00:00
4      B     on 2019-02-05 08:00:00
..   ...    ...                 ...
95     C     on 2019-02-12 22:00:00
96     C    off 2019-02-13 00:00:00
97     A     on 2019-02-13 02:00:00
98     B    off 2019-02-13 04:00:00
99     B     on 2019-02-13 06:00:00

[100 rows x 3 columns]

When I make a strip plot with "time" on the x-axis and "status" as the color, the points for both status values end up on the same y level within each group:

px.strip(df, x="time", y="group", color="status")


But if I use the DataFrame's integer index as the x-axis instead, the colors are placed on different y levels:

px.strip(df.reset_index(), x="index", y="group", color="status")


I would like the temporal data to plot like the integer data (with different colors on different y levels). I see nothing in the documentation that says temporal data is an issue.


There are 2 answers

0
r-beginners On

px.strip appears to be built on the box plot trace; you can check this with fig = px.strip(...); print(fig.data). Because of that, setting the jitter value to 0 removes the random scatter of the points. A comment on the existing answer says the hover should also show the time series data, so I add the time as custom data and update the hover template to display it. The hover template is also updated so that each trace shows its own status.

import pandas as pd
import plotly.express as px
import random

random.seed(0)
n = 100
df = pd.DataFrame(
    data=dict(
        group=random.choices(["A","B","C"], k=n),
        status=random.choices(["on", "off"], k=n),
        time=pd.date_range('2/5/2019', periods=n, freq='2h'),
    )
)

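# Plot against the integer index so the colors are offset, and pass each row's
# timestamp as custom data so it stays aligned with the points of every trace.
fig = px.strip(df.reset_index(), x="index", y="group", color="status", custom_data=["time"])
fig.update_traces(jitter=0)

# Build the hover text per trace from the trace name ("on" or "off").
fig.for_each_trace(
    lambda t: t.update(
        hovertemplate=f"status={t.name}<br>time=%{{customdata[0]}}<br>group=%{{y}}<extra></extra>"
    )
)

# Relabel every 12th index (one tick per day at 2-hour spacing) with its date.
fig.update_xaxes(tickvals=df.index[::12], ticktext=df['time'][::12].dt.strftime('%b %d<br>%Y'))
fig.show()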


1
Ingwersen_erik On

There is probably a simpler way to achieve the result you want, but one workaround is to create the strip plot using integer indices as the x-axis and then replace the tick labels with the datetime values.

The downside of this solution is that certain things that plotly usually manages automatically for you, like tick labels spacing, will now have to be handled manually by your code.

Here's the source code for this approach:

import plotly.express as px
import pandas as pd
import numpy as np
import random

random.seed(0)

# Assuming df is your DataFrame with a 'time' column containing datetime values,
# 'group' for y-values, and 'status' for coloring.
n = 100
df = pd.DataFrame(
    data=dict(
        group=random.choices(["A","B","C"], k=n),
        status=random.choices(["on", "off"], k=n),
        time=pd.date_range('2/5/2019', periods=n, freq='2h'),
    )
)

# Optionally, ensure 'time' is a datetime column
df['time'] = pd.to_datetime(df['time'])

# Create a numeric sequence for the x-axis
numeric_x = np.arange(len(df))

# Create the plot figure
fig = px.strip(df, x=numeric_x, y="group", color="status")

# Format the datetime values as strings
formatted_dates = df['time'].dt.strftime('%b %d<br>%Y')

# Select a subset of formatted datetime values for tick labels to avoid overcrowding
# Here, we select every Nth label, where N depends on the density of your data
N = max(1, len(df) // 5) # Adjust this based on your data density
tick_vals = np.array([*numeric_x[::N], numeric_x[-1]])
tick_texts = np.array([*formatted_dates[::N], formatted_dates.iloc[-1]])

# Set the customized tick labels
fig.update_xaxes(tickvals=tick_vals, ticktext=tick_texts)

# Update layout (optional)
fig.update_layout(
    xaxis_title="Time",
    yaxis_title="Group Value",
    legend_title="Status"
)

# Show the plot
fig.show()
