Applying transform_lookup on datasets with different number of rows


I am currently learning Altair's maps feature, and while looking into one of the examples (https://altair-viz.github.io/gallery/airport_connections.html), I noticed that the datasets (airports.csv and flights-airport.csv) have different numbers of rows. Is it possible to apply transform_lookup even if that's the case?

There is 1 answer
Answer by jakevdp (best answer)

Yes, it is possible to apply transform_lookup to datasets with different numbers of rows. The lookup transform amounts to a one-sided join based on a specified key column: regardless of how many rows each dataset has, each row of the main dataset is joined with the first matching row in the lookup data.

A simple example to demonstrate this:

import altair as alt
import pandas as pd

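# Main dataset: three rows
df1 = pd.DataFrame({
    'key': ['A', 'B', 'C'],
    'x': [1, 2, 3]
})

# Lookup dataset: four rows, including a key ('D') that df1 does not have
df2 = pd.DataFrame({
    'key': ['A', 'B', 'C', 'D'],
    'y': [1, 2, 3, 4]
})

# For each row of df1, pull in 'y' from the first row of df2 with the same key
alt.Chart(df1).transform_lookup(
    lookup='key',
    from_=alt.LookupData(df2, key='key', fields=['y'])
).mark_bar().encode(
    x='x:Q',
    y='y:O',
    color='key:N'
)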

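To make the join semantics concrete, here is a rough pure-Python sketch of what a one-sided lookup does to the rows. This is not Altair's or Vega-Lite's actual implementation; the `lookup_join` helper and its `default` argument are hypothetical names used only for illustration, and the sketch assumes that a main row with no match is filled with a default value (None here).

```python
def lookup_join(main_rows, lookup_rows, key, fields, default=None):
    """Emulate a one-sided lookup join on lists of dict records."""
    # Index the lookup data by key, keeping only the first row seen per key.
    index = {}
    for row in lookup_rows:
        index.setdefault(row[key], row)

    joined = []
    for row in main_rows:
        match = index.get(row[key])
        out = dict(row)  # copy so the input rows are left untouched
        for field in fields:
            # Assumption: unmatched rows receive the default value.
            out[field] = match[field] if match is not None else default
        joined.append(out)
    return joined


# Main data has 3 rows; lookup data has 4 rows and no entry for 'E'.
main = [
    {'key': 'A', 'x': 1},
    {'key': 'B', 'x': 2},
    {'key': 'E', 'x': 5},
]
other = [
    {'key': 'A', 'y': 1},
    {'key': 'B', 'y': 2},
    {'key': 'C', 'y': 3},
    {'key': 'D', 'y': 4},
]

result = lookup_join(main, other, key='key', fields=['y'])
```

In this sketch the result always has exactly as many rows as the main data: extra rows in the lookup data ('C' and 'D') are simply never used, and the unmatched main row ('E') keeps its own fields with 'y' set to the default.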

More information is available in the Lookup transform docs.