AWS Glue : AnalysisException: Table or view not found

Question

Asked by Arjun Singh on 30 October 2023 at 09:17 · 534 views

I am trying to create a temporary view from a DataFrame in AWS Glue 4.0, but querying the view fails with "AnalysisException: Table or view not found". The tables in the Glue database are stored in Hudi format.

Code:

import sys
from awsglue.transforms import *
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.sql.functions import *

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)

# Define the Glue Data Catalog database and table names
database_name = "hudi_db"
table4_name = 'd_person'

table4 = glueContext.create_data_frame.from_catalog(
    database=database_name,
    table_name=table4_name,
)

rows = table4.count()
distinct_rows = table4.distinct().count()
print(f"Number of rows in data frame: {rows} and distinct rows are: {distinct_rows}")


# Register the DataFrame as a temporary view so it can be queried with SQL
table4.createOrReplaceTempView(table4_name + '_glue_view')

custom_sql_query = """
    SELECT count(*)
    FROM d_person_glue_view
"""

# Execute the custom SQL query
result_df = spark.sql(custom_sql_query)

Are any additional configurations required for this? What could cause this error?

Thank you.

I have tried the following:

Passing my own SparkSession to the GlueContext constructor.
Running the SQL on the spark_session object of the GlueContext.
Using Spark SQL directly instead of creating a DataFrame. This works, but I want to load the data into a DataFrame first and then create a view from it.

There is 1 answer

**Shubham Joshi** · Answer 1 · 2023-10-30T14:41:22+00:00

The following reads a Hudi table from an S3 location into a DataFrame and registers it as a view. It works well for me:

spark.read.format("hudi").load(S3_basePath).createOrReplaceTempView("test")

res = spark.sql("select * from test")

When reading non-streaming data sources stored in the Glue Data Catalog (the "getCatalogSource" path), use a DynamicFrame instead of a DataFrame, and convert the DynamicFrame into a Spark DataFrame with "toDF()" if needed. This is because the function "getDataFrameFromCatalog()" is designed for AWS Glue streaming sources.
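In the Python API, the DynamicFrame route described above could look roughly like the sketch below. The helper name register_catalog_table_as_view and its parameters are hypothetical, introduced only for illustration. It assumes the Glue job is already configured to read Hudi tables (for example through the --datalake-formats job parameter), which is not verified here.

```python
def register_catalog_table_as_view(glue_context, database, table_name, view_name):
    """Read a Data Catalog table as a DynamicFrame and expose it as a temp view.

    Hypothetical helper: `glue_context` is expected to be an
    awsglue.context.GlueContext created by the job.
    """
    # Read through the DynamicFrame API rather than create_data_frame
    dynamic_frame = glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
    )
    # Convert to a regular Spark DataFrame
    df = dynamic_frame.toDF()
    # Register the view in the Spark session that owns the DataFrame
    df.createOrReplaceTempView(view_name)
    return df
```

With the names from the question, this would be called as register_catalog_table_as_view(glueContext, "hudi_db", "d_person", "d_person_glue_view"), after which the view can be queried with spark.sql.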

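The following solution worked for me:

from awsglue.dynamicframe import DynamicFrame

# Load the catalog table as a Spark DataFrame
df1 = glueContext.create_data_frame.from_catalog(
    database="hudidb", table_name="huditable"
)

# Wrap the DataFrame in a DynamicFrame
AWSGlueDataCatalog_node1698763846214 = DynamicFrame.fromDF(
    df1,
    glueContext,
    "AWSGlueDataCatalog_node1698763846214",
)

# Convert back to a Spark DataFrame and register the view
df2 = AWSGlueDataCatalog_node1698763846214.toDF()

df2.createOrReplaceTempView("hudiview")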
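If the view still cannot be found, one possible cause (not confirmed by the question) is that the query runs on a different SparkSession from the one that registered the view, since temporary views are scoped to a session. A minimal sketch that rules this out is to run the SQL on the session that owns the DataFrame. The helper name query_view_on_owning_session is hypothetical, and the sketch assumes PySpark 3.3 or later, where DataFrame.sparkSession is available.

```python
def query_view_on_owning_session(df, view_name, query):
    """Register `df` as a temp view and run `query` on the same session.

    Hypothetical helper: using df.sparkSession guarantees that the view
    and the query share one SparkSession.
    """
    # Temp views are visible only in the session that registered them
    df.createOrReplaceTempView(view_name)
    # Run the SQL on the DataFrame's own session (PySpark 3.3+)
    return df.sparkSession.sql(query)
```

For the view above, this would be called as query_view_on_owning_session(df2, "hudiview", "SELECT count(*) FROM hudiview").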

Hope this helps!
