BigQuery: Split String into Most Frequent Words


I'm trying to find which words are most frequent in a BigQuery column (the product description column).

Is there a way to go further and find which words most commonly follow the word "Knife" in that column?

I'm trying to isolate the product descriptions that refer only to sharp, dangerous knives (excluding Halloween knives, knife blocks, knife trays, knife organizers, etc.).

https://docs.google.com/spreadsheets/d/1c_XLVA2gh7i3BFIsIyg3qAtcdXDY46QomFK6u-nB08E/edit#gid=350499651

There is 1 answer

Println On

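Try the query below. It collects the words that appear immediately before and after "knife" and "knives" and counts how often each one occurs. Replace the sample string with your column name, and add any words you want to leave out to the NOT IN list in exclude_words.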

    -- Sample input. Replace the string literal with your column name
    -- and add a FROM clause for your table.
    with sample_text as (
      select LOWER('SHARPAL 191H Pocket Kitchen Chef Knife Scissors Sharpener for Straight & Serrated Knives, 3-Stage Knife Sharpening Knives Tool Helps Repair and Restore Blades') as description
    ),
    -- Words immediately before "knives"
    before_knives_words as (
      select vals
      from sample_text, UNNEST(REGEXP_EXTRACT_ALL(description, r'(\w+) knives')) as vals
    ),
    -- Words immediately after "knives"
    after_knives_words as (
      select vals
      from sample_text, UNNEST(REGEXP_EXTRACT_ALL(description, r'knives (\w+)')) as vals
    ),
    -- Words immediately before "knife"
    before_knife_words as (
      select vals
      from sample_text, UNNEST(REGEXP_EXTRACT_ALL(description, r'(\w+) knife')) as vals
    ),
    -- Words immediately after "knife"
    after_knife_words as (
      select vals
      from sample_text, UNNEST(REGEXP_EXTRACT_ALL(description, r'knife (\w+)')) as vals
    ),
    -- Combine all neighbouring words into one list
    union_all as (
      select * from before_knives_words
      union all
      select * from after_knives_words
      union all
      select * from before_knife_words
      union all
      select * from after_knife_words
    ),
    -- Drop the words that should not be counted
    exclude_words as (
      select * from union_all
      where vals not in ('chef', 'stage')
    )
    select vals, count(*) as frequency
    from exclude_words
    group by vals
    order by frequency desc
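
For the first part of the question (the most frequent words overall), a minimal sketch along the same lines could split every description into words with REGEXP_EXTRACT_ALL and count them. The products CTE, its sample rows, and the stop-word list below are assumptions for illustration only; swap in your own table and description column.

```sql
-- Hypothetical sample rows; in practice, select the description column
-- from your own table instead of this inline array.
WITH products AS (
  SELECT description
  FROM UNNEST([
    'Chef Knife Set with Knife Block',
    'Halloween Plastic Knife Prop',
    'Pocket Knife Sharpener for Serrated Knives'
  ]) AS description
),
words AS (
  -- Lowercase each description and emit one row per word.
  SELECT word
  FROM products,
       UNNEST(REGEXP_EXTRACT_ALL(LOWER(description), r'\w+')) AS word
)
SELECT word, COUNT(*) AS frequency
FROM words
WHERE word NOT IN ('with', 'for')  -- example stop words to ignore
GROUP BY word
ORDER BY frequency DESC, word
LIMIT 20;
```

Words such as "block", "tray", or "halloween" that rank highly here are likely candidates for the phrases to filter out when isolating the real knives.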