Altair Scatterplot adds unwanted lines


When layered above a heatmap, the Altair scatterplot only seems to work if the point values also appear on the axis of the heatmap. In any other case, white lines are added along the points' x and y values. Here's a minimal example:

import streamlit as st
import altair as alt
import numpy as np
import pandas as pd

# Compute x^2 + y^2 across a 2D grid
x, y = np.meshgrid(range(-5, 5), range(-5, 5))
z = x ** 2 + y ** 2

# Convert this grid to columnar data expected by Altair
source = pd.DataFrame({'x': x.ravel(),
                     'y': y.ravel(),
                     'z': z.ravel()})

c = alt.Chart(source).mark_rect().encode(
    x='x:O',
    y='y:O',
    color='z:Q'
)

scatter_source = pd.DataFrame({'x': [-1.001,-3], 'y': [0,1]})
s = alt.Chart(scatter_source).mark_circle(size=100).encode(
    x='x:O',
    y='y:O'
)

st.altair_chart(c + s)

Result

Is there any way to prevent this behavior? I'd like to animate the points later on, so adding values to the heatmap axis is not an option.

Answer by jakevdp (accepted)

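Ordinal encodings (marked by :O) always create a discrete axis with one band per unique value. When the heatmap and the scatter layer are combined, they share a single scale whose domain is the union of both layers' values, so a point value that does not occur in the heatmap (such as -1.001) gets its own empty band, which shows up as a white line. It sounds like you would like to visualize your data with a quantitative encoding (marked by :Q), which creates a continuous, real-valued axis.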
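To see where the extra band comes from, here is a small sketch that approximates the shared ordinal domain as the sorted union of both layers' distinct x values. The helper name shared_ordinal_domain is hypothetical, and it only illustrates the effect; it is not Vega-Lite's actual domain-merging code.

```python
import numpy as np
import pandas as pd

# Heatmap data from the question: integer grid values -5..4 on each axis
x, y = np.meshgrid(range(-5, 5), range(-5, 5))
source = pd.DataFrame({'x': x.ravel(), 'y': y.ravel()})

# Scatter data from the question
scatter_source = pd.DataFrame({'x': [-1.001, -3], 'y': [0, 1]})


def shared_ordinal_domain(*columns):
    """Hypothetical helper: sorted union of the distinct values in each column.

    This approximates how a shared ordinal scale combines categories from
    several layers; it is an illustration, not Vega-Lite's implementation.
    """
    values = set()
    for col in columns:
        values.update(col.tolist())
    return sorted(values)


x_domain = shared_ordinal_domain(source['x'], scatter_source['x'])

# Categories contributed only by the scatter layer get a band with no rect,
# which is rendered as an empty (white) column in the heatmap
heatmap_values = set(source['x'].tolist())
empty_bands = [v for v in x_domain if v not in heatmap_values]

print(x_domain)     # [-5, -4, -3, -2, -1.001, -1, 0, 1, 2, 3, 4]
print(empty_bands)  # [-1.001]
```

The y values 0 and 1 already exist in the heatmap, so in this particular example only the x axis gains an extra band.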

In the case of the heatmap, though, this complicates things: if you're no longer treating the data as ordered categories, you must specify the starting and ending point for each bin along each axis. This requires some thought about what your bins represent: does the value "2" represent numbers spanning from 2 to 3? from 1 to 2? from 1.5 to 2.5? The answer will depend on context.
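The three conventions mentioned above can be written down explicitly. The helper bin_edges and its align parameter are hypothetical names used only for illustration; they map a bin label to its (start, end) interval.

```python
def bin_edges(value, align="center", width=1.0):
    """Hypothetical helper: return the (start, end) interval that `value` labels.

    align="left":   2 -> (2, 3)     value is the start of the bin
    align="right":  2 -> (1, 2)     value is the end of the bin
    align="center": 2 -> (1.5, 2.5) value is the middle of the bin
    """
    if align == "left":
        return value, value + width
    if align == "right":
        return value - width, value
    if align == "center":
        return value - width / 2, value + width / 2
    raise ValueError(f"unknown align: {align!r}")


for align in ("left", "right", "center"):
    print(align, bin_edges(2, align))
```

The calculate transform in the example that follows corresponds to the "center" convention with width 1.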

Here is an example of computing these bin boundaries using a calculate transform, assuming the values represent the center of unit bins:

c = alt.Chart(source).transform_calculate(
    x1=alt.datum.x - 0.5,
    x2=alt.datum.x + 0.5,
    y1=alt.datum.y - 0.5,
    y2=alt.datum.y + 0.5,
).mark_rect().encode(
    x='x1:Q', x2='x2:Q',
    y='y1:Q', y2='y2:Q',
    color='z:Q'
).properties(
    width=400, height=400
)

scatter_source = pd.DataFrame({'x': [-1.001,-3], 'y': [0,1]})
s = alt.Chart(scatter_source).mark_circle(size=100).encode(
  x='x:Q',
  y='y:Q'
)

st.altair_chart(c + s)
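If you prefer, the same bin edges can be precomputed in pandas instead of with transform_calculate; the x1/x2/y1/y2 columns could then be encoded directly (x='x1:Q', x2='x2:Q', and so on). The sketch below, which assumes the centered unit-bin convention from above, also checks which heatmap cell each scatter point falls into. The helper containing_cell is hypothetical and only for illustration.

```python
import numpy as np
import pandas as pd

x, y = np.meshgrid(range(-5, 5), range(-5, 5))
source = pd.DataFrame({'x': x.ravel(), 'y': y.ravel(),
                       'z': (x ** 2 + y ** 2).ravel()})

# Same edges as the transform_calculate above, computed in pandas instead
source = source.assign(x1=source['x'] - 0.5, x2=source['x'] + 0.5,
                       y1=source['y'] - 0.5, y2=source['y'] + 0.5)

scatter_source = pd.DataFrame({'x': [-1.001, -3], 'y': [0, 1]})


def containing_cell(px, py, cells):
    """Hypothetical helper: rows whose [x1, x2) x [y1, y2) rectangle holds (px, py)."""
    mask = ((cells['x1'] <= px) & (px < cells['x2']) &
            (cells['y1'] <= py) & (py < cells['y2']))
    return cells[mask]


for px, py in scatter_source.itertuples(index=False):
    cell = containing_cell(px, py, source)
    print(px, py, '-> cell', cell[['x', 'y', 'z']].to_dict('records'))
```

With a quantitative axis, -1.001 simply lands inside the cell centered on x = -1 instead of creating a new category.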


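Alternatively, if you would like this binning to happen more automatically, you can use a bin transform on each axis. The bin boundaries are then chosen automatically, so they may not line up with the centered unit bins used above: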

c = alt.Chart(source).mark_rect().encode(
    x=alt.X('x:Q', bin=True),
    y=alt.Y('y:Q', bin=True),
    color='z:Q'
).properties(
    width=400,
    height=400
)

scatter_source = pd.DataFrame({'x': [-1.001,-3], 'y': [0,1]})
s = alt.Chart(scatter_source).mark_circle(size=100).encode(
    x='x:Q',
    y='y:Q'
)

st.altair_chart(c + s)
