Order of plotting in Pandas.plotting.parallel_coordinates

Question

Asked by WolfiG on 05 October 2020 at 16:54 (1.3k views)

I have a series of measurements that I want to plot with pandas.plotting.parallel_coordinates, where the color of each line is given by the value of one DataFrame column.

Code looks like this:

# ... data retrieval and preparation from a couple of Excel files
# ---> output: largeDataFrame

theColormap: ListedColormap = cm.get_cmap('some cmap name')

# Attempt to stack the lines in the right order (doesn't work)
largeDataFrame.sort_values(column_for_line_color_derivation, inplace=True, ascending=True)

# here comes the actual plotting of data
sns.set_style('ticks')
sns.set_context('paper')
plt.figure(figsize=(10, 6))
thePlot: plt.Axes = parallel_coordinates(largeDataFrame, class_column=column_for_line_color_derivation, cols=[columns to plot], color=theColormap.colors)
plt.title('My Title')
thePlot.get_legend().remove()
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

This works quite well and yields the following result:

Now I would like the yellow lines (high values of "column_for_line_color_derivation") to be plotted in front of the green and darker lines, so that they become more prominent. In other words, I want to control the stacking order of the lines by the values of "column_for_line_color_derivation". So far I haven't found a way to do that.

Answer

**JohanC** · Accepted Answer · 2020-10-05T18:26:25+00:00

I ran some tests with the pandas versions 1.1.2 and 1.0.3 and in both cases the lines are drawn from low to high value of the coloring column, independent of the dataframe order.

You can temporarily pass a larger line width, as in parallel_coordinates(..., lw=5), which makes the stacking order very clear. With thin lines the order is harder to see, as the yellow lines have less contrast.

The parameter sort_labels= seems to have the opposite effect of its name: when False (the default), the lines are drawn in sorted order; when True, they keep the dataframe order.

Here is a small reproducible example:

import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt

df = pd.DataFrame({ch: np.random.randn(100) for ch in 'abcde'})
df['coloring'] = np.random.randn(len(df))

fig, axes = plt.subplots(ncols=2, figsize=(14, 6))
for ax, lw in zip(axes, [1, 5]):
    parallel_coordinates(df, class_column='coloring', cols=df.columns[:-1], colormap='viridis', ax=ax, lw=lw)
    ax.set_title(f'linewidth={lw}')
    ax.get_legend().remove()
plt.show()
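
If the stacking should not be left to chance, one option is to set the zorder of each line explicitly after plotting. The following is only a sketch, not something from the tests above. It assumes that parallel_coordinates adds one matplotlib line per dataframe row, in row order, which is an assumption about pandas internals and may differ between versions. The names df_sorted and data_lines are made up for this illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

rng = np.random.default_rng(0)
df = pd.DataFrame({ch: rng.standard_normal(100) for ch in "abcde"})
df["coloring"] = rng.standard_normal(len(df))

# Sort so that rows with high 'coloring' values come last.
df_sorted = df.sort_values("coloring", ascending=True)

fig, ax = plt.subplots(figsize=(8, 6))
parallel_coordinates(df_sorted, class_column="coloring", cols=list("abcde"),
                     colormap="viridis", ax=ax)

# Keep only the data lines; the vertical axis markers have identical x endpoints.
data_lines = [ln for ln in ax.lines if ln.get_xdata()[0] != ln.get_xdata()[-1]]

# ASSUMPTION: data_lines follows the row order of df_sorted.
# Later rows (higher 'coloring' values) get a higher zorder and end up in front.
for rank, ln in enumerate(data_lines):
    ln.set_zorder(2 + rank / len(data_lines))

ax.set_title("explicit zorder per line")
ax.get_legend().remove()
```

Call plt.show() afterwards to display the figure. If the row-order assumption does not hold for your pandas version, the zorder values would be assigned to the wrong lines, so it is worth comparing line.get_ydata() against a few rows first.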

Another idea is to make the line width depend on the class, so that lines drawn later are thicker:

fig, ax = plt.subplots(figsize=(8, 6))

parallel_coordinates(df, class_column='coloring', cols=df.columns[:-1], colormap='viridis', ax=ax)
num_lines = len(ax.lines)
for ind, line in enumerate(ax.lines):
    xs = line.get_xdata()
    if xs[0] != xs[-1]:  # skip the vertical lines representing axes
        line.set_linewidth(1 + 3 * ind / num_lines)
ax.set_title('linewidth depending on class_column')
ax.get_legend().remove()
plt.show()
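
For full control over both color and stacking, a possible alternative (not part of the tests above) is to draw the lines directly with matplotlib instead of going through parallel_coordinates. The sketch below assumes a numeric class column. The helper name parallel_lines_by_value and its parameters are hypothetical, invented for this illustration. It maps each row's value to a color with Normalize and plots the rows from low to high with an increasing zorder, so the highest values end up on top.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize


def parallel_lines_by_value(df, class_column, cols, cmap_name="viridis", ax=None, **plot_kwargs):
    """Hypothetical helper: draw one line per row, highest class value in front."""
    if ax is None:
        ax = plt.gca()
    cols = list(cols)
    values = df[class_column].to_numpy(dtype=float)
    data = df[cols].to_numpy(dtype=float)

    # Map every class value to a color of the chosen colormap.
    norm = Normalize(vmin=values.min(), vmax=values.max())
    colors = plt.get_cmap(cmap_name)(norm(values))

    x = np.arange(len(cols))
    order = np.argsort(values, kind="stable")  # row positions, low to high value
    for rank, row in enumerate(order):
        ax.plot(x, data[row], color=tuple(colors[row]),
                zorder=2 + rank / len(order), **plot_kwargs)

    # Vertical markers for the coordinate axes, kept behind the data lines.
    for xpos in x:
        ax.axvline(xpos, color="black", linewidth=1, zorder=1)
    ax.set_xticks(x)
    ax.set_xticklabels(cols)
    ax.set_xlim(x[0], x[-1])
    return ax


rng = np.random.default_rng(0)
demo = pd.DataFrame({ch: rng.standard_normal(100) for ch in "abcde"})
demo["coloring"] = rng.standard_normal(len(demo))

fig, ax_demo = plt.subplots(figsize=(8, 6))
parallel_lines_by_value(demo, "coloring", cols=list("abcde"), ax=ax_demo, lw=1)
ax_demo.set_title("stacking order set explicitly via zorder")
```

Call plt.show() afterwards to display the figure. Here the color follows the actual value of the class column rather than the order in which classes appear, and the stacking no longer depends on how the dataframe happens to be sorted.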
