Get unsupported Python packages in a Snowflake Snowpark Python worksheet


I am using a Snowflake Python worksheet to perform text analysis on some data in a Snowflake table. This includes lemmatizing the text.

I created this function in the Snowflake Python worksheet:

import nltk
from nltk.stem import WordNetLemmatizer

def lemmatize_text(text):
    # Initialize NLTK's WordNet Lemmatizer for lemmatization
    lemmatizer = WordNetLemmatizer()
    words = nltk.word_tokenize(text)
    lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
    return ' '.join(lemmatized_words)

It gives me an error saying that word_tokenize is not a known member of nltk. I suppose it is not supported directly in the Snowflake Anaconda packages.

How can I solve this problem?

I am new to Snowflake and Snowpark. In my Jupyter notebook, I tried to create a UDF and put it on a Snowflake stage, but I don't know what to do next.

from snowflake.snowpark import Session
from snowflake.snowpark.functions import udf, sproc, col
from snowflake.snowpark.types import IntegerType, FloatType, StringType, BooleanType, Variant
from snowflake.snowpark import functions as fn

session = Session.builder.configs(connection_parameters).create()

session.sql("CREATE STAGE IF NOT EXISTS nlp_text_analysis").collect()

def lemmatize_text(session: Session, text: str) -> Variant:
    import nltk
    from nltk.stem import PorterStemmer
    from nltk.stem import WordNetLemmatizer
    import re
    nltk.download('punkt')
    nltk.download('wordnet')
    nltk.download('omw-1.4')

    lemmatizer = WordNetLemmatizer()
    words = nltk.word_tokenize(text)
    lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
    return ' '.join(lemmatized_words)

session.sproc.register(func=lemmatize_text, name="lemmatize_text", replace=True)

result: <snowflake.snowpark.stored_procedure.StoredProcedure at 0x1ddd9888850>


There is 1 answer

user23406494

Try rolling back the spaCy package version to 3.5.3. That helped me when using the NLP modules.
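Separately from the version rollback, the asker's "what to do next" with the stage might look something like the sketch below. It rests on several assumptions. By default, code running inside Snowflake has no outbound network access, so the nltk.download(...) calls probably cannot fetch anything there. The corpora would therefore have to be zipped locally and uploaded to the nlp_text_analysis stage first. The file name nltk_data.zip and the /tmp/nltk_data extraction path are hypothetical, and the archive is assumed to contain the folder layout NLTK expects (for example tokenizers/punkt and corpora/wordnet; newer NLTK releases may look for punkt_tab instead). This is not a confirmed fix for the worksheet error, only one possible way to make the data available to a UDF.

```python
import os
import sys
import zipfile
from typing import Optional

# Hypothetical names: the archive file and extraction directory are illustrative.
ARCHIVE_NAME = "nltk_data.zip"
TARGET_DIR = "/tmp/nltk_data"


def unpack_nltk_data(import_dir: Optional[str] = None,
                     archive_name: str = ARCHIVE_NAME,
                     target_dir: str = TARGET_DIR) -> str:
    """Extract the staged archive of NLTK data and return the extraction directory."""
    if import_dir is None:
        # Inside a Snowflake UDF, staged imports are exposed through this option.
        import_dir = sys._xoptions["snowflake_import_directory"]
    archive_path = os.path.join(import_dir, archive_name)
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
    return target_dir


def lemmatize_text(text: str) -> str:
    # Imported lazily so the module loads even where nltk is not installed.
    import nltk
    from nltk.stem import WordNetLemmatizer

    # Point NLTK at the unpacked data instead of calling nltk.download().
    data_dir = unpack_nltk_data()
    if data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)

    lemmatizer = WordNetLemmatizer()
    words = nltk.word_tokenize(text)
    return " ".join(lemmatizer.lemmatize(word) for word in words)


def register_lemmatizer(session):
    """Register lemmatize_text as a UDF; `session` is an existing Snowpark Session."""
    from snowflake.snowpark.types import StringType

    return session.udf.register(
        func=lemmatize_text,
        return_type=StringType(),
        input_types=[StringType()],
        name="lemmatize_text",
        replace=True,
        packages=["nltk"],
        # Assumes the archive was already uploaded to this stage.
        imports=["@nlp_text_analysis/" + ARCHIVE_NAME],
    )
```

After calling register_lemmatizer(session) from the notebook, the function should be callable from SQL as lemmatize_text(some_text_column). Note that this sketch re-extracts the archive on every call, which is wasteful. A real version would probably extract once and guard the extraction with a lock, since several UDF processes may run at the same time.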