I am currently trying to create a feature table and write the data from a dataframe into it:
from databricks import feature_store
from databricks.feature_store import feature_table
from databricks.feature_store import FeatureStoreClient

# Convert to a Spark dataframe
pyspark_df = dataframe.to_spark()

fs = FeatureStoreClient()

# Create the feature table from the dataframe's schema
customer_feature_table = fs.create_table(
    name='FeatureStore.Features',
    primary_keys=['ID1', 'ID2'],
    schema=pyspark_df.schema,
    description='CustomerProfit features'
)

# Write the dataframe into the feature table
fs.write_table(
    name='FeatureStore.Features',
    df=pyspark_df,
    mode='overwrite'
)
If I execute this code, I run into the following error message:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 554.0 failed 4
times, most recent failure: Lost task 0.3 in stage 554.0 (TID 1100) (10.139.64.9 executor 19):
ExecutorLostFailure (executor 19 exited caused by one of the running tasks)
Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues.
Check driver logs for WARN messages.
I am using runtime version 10.3 ML (includes Apache Spark 3.2.1, Scala 2.12).
I tried the same code on a smaller dataframe and it worked. I also tried using a more powerful driver type, but I still run into the issue. Why do I get this error, and is there a solution or workaround?
Try using the partition_columns argument when creating the table. Partitioning makes writing and loading the data more efficient. Visit https://docs.databricks.com/machine-learning/feature-store/feature-tables.html for more information.
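A minimal sketch of how that could look with your code, assuming your dataframe has a low-cardinality column to partition on (the event_date column below is only a placeholder):

from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Same create_table call as in the question, now with partition_columns.
# 'event_date' is a placeholder -- use a low-cardinality column
# (e.g. a date) that actually exists in your dataframe.
customer_feature_table = fs.create_table(
    name='FeatureStore.Features',
    primary_keys=['ID1', 'ID2'],
    partition_columns=['event_date'],
    schema=pyspark_df.schema,
    description='CustomerProfit features'
)

fs.write_table(
    name='FeatureStore.Features',
    df=pyspark_df,
    mode='overwrite'
)

Partitioning the underlying table splits the write into smaller per-partition tasks, which may help with the executor failures you are seeing on the large dataframe.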