I have been working with Google's Cloud Natural Language Model in BigQuery, and I have noticed that a significant percent of requests generate a NULL response. Why?
Here's an example...
This code creates a table with 54 movie reviews from the publicly available IMDB movie reviews dataset, then creates a remote connection to the API, and then uses the function ML.UNDERSTAND_TEXT to do NLP on the reviews via the API. 10 out of the 54 results = NULL. I have tried this again with a different sample of movie reviews with the same result.
The code:
-- from: https://www.samthebrand.com/sentiment-analysis-using-sql-ai-bigquery/
-- Isolate a sample of natural language data
CREATE OR REPLACE TABLE `[project].[dataset].[table]` AS
SELECT review, movie_url, label, reviewer_rating
FROM `bigquery-public-data.imdb.reviews`
WHERE reviewer_rating IS NOT NULL AND RAND() < 0.001;
-- Create a connection to a remote model
CREATE OR REPLACE MODEL `[project].[dataset].[model_name_a]`
REMOTE WITH CONNECTION `[dataset].[location].[dataset]`
OPTIONS (REMOTE_SERVICE_TYPE = 'CLOUD_AI_NATURAL_LANGUAGE_V1');
-- Run the data through your model to extract sentiment
CREATE OR REPLACE TABLE `[project].[dataset].[table]` AS
SELECT *, float64(ml_understand_text_result.document_sentiment.score) as sentiment_score, float64(ml_understand_text_result.document_sentiment.magnitude) as magnitude_score,
FROM ML.UNDERSTAND_TEXT(
MODEL `[project].[dataset].[model_name_a]`,
(SELECT review AS text_content, movie_url, label, reviewer_rating from `[dataset].[table]`)
STRUCT('analyze_sentiment' AS nlu_option));
-- see how many NULLs
SELECT sentiment_score, COUNT(*)
FROM `[project].[dataset].[table]`
GROUP BY 1 ORDER BY 1;