Feature and FeatureView versioning


My team is interested in a feature store solution that enables rapid experimentation with features, probably using feature versioning. In the Feast Slack history, I found @Benjamin Tan’s post explaining their Feast workflow, including how they version FeatureViews:

from feast import Feature, FeatureView, ValueType

# Each version is a separately named FeatureView (other required
# arguments, such as entities and the data source, are omitted here).
insights_v1 = FeatureView(
    name="insights_v1",
    features=[
        Feature(name="insight_type", dtype=ValueType.STRING),
    ],
)
insights_v2 = FeatureView(
    name="insights_v2",
    features=[
        Feature(name="customer_id", dtype=ValueType.STRING),
        Feature(name="insight_type", dtype=ValueType.STRING),
    ],
)
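
A versioned view is then selected by name at retrieval time, with each feature referenced as "view_name:feature_name". A minimal sketch, assuming Feast's standard get_historical_features API (entity_df is a placeholder for your entity dataframe):

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Build a training set from the v2 view; entity_df holds entity keys
# and event timestamps.
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "insights_v2:customer_id",
        "insights_v2:insight_type",
    ],
).to_df()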

Is this the recommended best practice for FeatureView versioning? It looks like Features do not have a version field. Is there a recommended strategy for Feature versioning? Creating a new column for each Feature version is one approach:

driver_rating_v1
driver_rating_v2
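
In a FeatureView, that might look something like this (a sketch, assuming the same Feast API as above; the view name is made up):

driver_stats = FeatureView(
    name="driver_stats",
    features=[
        # Each experiment adds another versioned column.
        Feature(name="driver_rating_v1", dtype=ValueType.FLOAT),
        Feature(name="driver_rating_v2", dtype=ValueType.FLOAT),
    ],
)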

But that could get unwieldy if we want to experiment with dozens of permutations of the same Feature. Featureform appears to support feature versions through its "variant" field, but the documentation is a bit unclear.

1 Answer

MMBazel:

To add some clarity on Featureform: a variant is analogous to a version. You supply a string, which becomes an immutable identifier for that version of the transformation, source, etc. Variant is one of the common metadata fields across the Featureform API.

Using an ecommerce dataset with Spark, here's how the variant field versions a source (a Parquet file in this case):

# Register the raw Parquet file as a source, pinned to the "default" variant.
orders = spark.register_parquet_file(
    name="orders",
    variant="default",
    description="This is the core dataset. From each order you can find all the other information.",
    file_path="path_to_file",
)

You can also set the variant in a variable ahead of time:

VERSION = "v1"  # change this to rerun the definitions with new variants

orders = spark.register_parquet_file(
    name="orders",
    variant=VERSION,
    description="This is the core dataset. From each order you can find all the other information.",
    file_path="path_to_file",
)

And you can create variants of the transformations as well -- here I'm taking a previously registered dataframe called total_paid_per_customer_per_day and aggregating it. Note that the inputs list references each upstream source as a (name, variant) tuple, so a new transformation variant can pin the exact input version it reads from:

# Get average order value per day; the variant string uniquely
# identifies this version of the transformation.
@spark.df_transformation(inputs=[("total_paid_per_customer_per_day", "default")], variant="skeller88_20220110")
def average_daily_transaction(df):
    # PySpark import lives inside the function so it is available
    # wherever the transformation is executed.
    from pyspark.sql.functions import mean
    return df.groupBy("day_date").agg(mean("total_customer_order_paid").alias("average_order_value"))

There are some more details on the Featureform CLI here: https://docs.featureform.com/getting-started/interact-with-the-cli
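
For instance, resource definitions like the ones above are registered by applying the definitions file with the CLI. A sketch, assuming the standard apply workflow (the filename is a placeholder):

featureform apply definitions.py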