I am trying to create a temporary view from a DataFrame in AWS Glue 4.0, but I get the error: AnalysisException: Table or view not found. The tables in the Glue database are stored in Hudi format.
Code:
import sys
from awsglue.transforms import *
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.sql.functions import *
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
# Define the Glue Data Catalog database and table names
database_name = "hudi_db"
table4_name = 'd_person'
table4 = glueContext.create_data_frame.from_catalog(
    database=database_name,
    table_name=table4_name,
)
rows = table4.count()
distinct_rows = table4.distinct().count()
print(f"Number of rows in data frame: {rows} and distinct rows are: {distinct_rows}")
table4.createOrReplaceTempView(table4_name + '_glue_view')
custom_sql_query = """
SELECT count(*)
FROM d_person_glue_view
"""
# Execute the custom SQL query
result_df = spark.sql(custom_sql_query)
Are there any additional configs required for this? What could cause this error?
Thank you.
I have tried the following:
- Providing my own SparkSession for the GlueContext constructor to use.
- Running the SQL on the spark_session object of the GlueContext.
- Using Spark SQL directly instead of creating a DataFrame. This works, but I want to load the data into a DataFrame first and then create a view.
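Temporary views are scoped to the SparkSession that registers them, so one thing worth ruling out is that the DataFrame returned by from_catalog belongs to a different session than the spark object that runs the query. The following is a minimal sketch of that check, not a confirmed fix for Hudi tables in Glue. The helper names session_of and count_via_temp_view are hypothetical, and the session mismatch itself is an assumption about the cause.

```python
def session_of(df):
    """Return the SparkSession that owns `df`.

    DataFrame.sparkSession is available from Spark 3.3 (the version shipped
    with Glue 4.0); older releases expose it as df.sql_ctx.sparkSession.
    """
    session = getattr(df, "sparkSession", None)
    if session is None:
        session = df.sql_ctx.sparkSession
    return session


def count_via_temp_view(df, view_name):
    """Register `df` as a temp view and count its rows with SQL.

    The query runs on the DataFrame's own session, so the view is looked up
    in the same session that registered it.
    """
    df.createOrReplaceTempView(view_name)
    owner = session_of(df)
    rows = owner.sql(f"SELECT count(*) AS n FROM {view_name}").collect()
    return rows[0]["n"]


# Usage inside the job from the question (table4 as defined there):
#   print(count_via_temp_view(table4, "d_person_glue_view"))
```

If this version succeeds where spark.sql(...) fails, that would suggest the view is being registered in a session other than the one held in spark. If the view has to be visible across sessions, createOrReplaceGlobalTempView together with a global_temp. prefix in the query is the standard Spark alternative.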
Therefore, the following solution worked for me:
Hope this helps!