Cloud Dataprep - Replace a code or ID with its value from a lookup dataset

I'm really new to GCP Dataprep and am trying to create a recipe, but I can't figure out how to do it.

In summary, I have 2 files. The first one has these columns: NAME, COUNTRY_CODE, ...

The second one has: COUNTRY_CODE, COUNTRY_NAME

How do I replace the COUNTRY_CODE in the first dataset with the COUNTRY_NAME from the second one (matching on the corresponding COUNTRY_CODE)?

Thanks in advance!

There are 2 answers

Alejandro Barone (best answer)

For anyone trying to do this kind of thing: you can achieve it with the Lookup feature in Dataprep!

Just select the column you want to change (in my case COUNTRY_CODE), then select Lookup and pick the dataset to look up against (in my case the second one). It will replace the codes as expected!
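Conceptually, the lookup step behaves like a dictionary lookup keyed on the country code. The Python sketch below is only an analogy to illustrate that idea, not how Dataprep implements Lookup; the sample rows and the `lookup_country_names` helper are hypothetical. Codes with no match in the second dataset end up as `None`, so every row of the first dataset is kept.

```python
# Hypothetical sample rows mirroring the two datasets in the question.
people = [
    {"NAME": "Ana", "COUNTRY_CODE": "AR"},
    {"NAME": "Bob", "COUNTRY_CODE": "US"},
    {"NAME": "Chen", "COUNTRY_CODE": "XX"},  # no matching code in countries
]
countries = [
    {"COUNTRY_CODE": "AR", "COUNTRY_NAME": "Argentina"},
    {"COUNTRY_CODE": "US", "COUNTRY_NAME": "United States"},
]


def lookup_country_names(rows, lookup_rows):
    """Hypothetical helper: return copies of rows with COUNTRY_CODE
    replaced by the matching COUNTRY_NAME (None when there is no match)."""
    # Build the code -> name mapping once from the lookup dataset.
    code_to_name = {r["COUNTRY_CODE"]: r["COUNTRY_NAME"] for r in lookup_rows}
    result = []
    for row in rows:
        # Copy every column except the code, then add the looked-up name.
        new_row = {k: v for k, v in row.items() if k != "COUNTRY_CODE"}
        new_row["COUNTRY_NAME"] = code_to_name.get(row["COUNTRY_CODE"])
        result.append(new_row)
    return result


for row in lookup_country_names(people, countries):
    print(row)
```

Building the mapping once keeps each lookup constant-time instead of rescanning the second dataset for every row.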

justbeez

While the Lookup answer will work correctly, the JOIN option may be a better solution for extensibility: it supports multiple columns on the remote side, shows you the match rate, allows fuzzy matching, and has lots of other goodies (like ignoring whitespace in the matches). You can also choose the join type to control how output and missing rows are handled.
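To illustrate how the join type decides which rows survive, here is a plain-Python sketch of an inner join versus a left join on COUNTRY_CODE. It is an analogy under stated assumptions, not Dataprep's implementation: the `join_on_code` helper and sample rows are hypothetical, fuzzy matching is not modeled, and "ignoring whitespace" is approximated by stripping leading and trailing spaces from the keys.

```python
def join_on_code(left, right, how="left"):
    """Hypothetical helper: join two lists of dicts on COUNTRY_CODE.

    how="inner" drops left rows with no match; how="left" keeps them
    with COUNTRY_NAME set to None. Keys are stripped of surrounding
    whitespace before matching (an assumed stand-in for Dataprep's option).
    """
    if how not in ("inner", "left"):
        raise ValueError(f"unsupported join type: {how}")
    # Index the right-hand rows by normalized key; a key may map to several rows.
    index = {}
    for r in right:
        index.setdefault(r["COUNTRY_CODE"].strip(), []).append(r)
    out = []
    for row in left:
        matches = index.get(row["COUNTRY_CODE"].strip(), [])
        if matches:
            # One output row per matching right-hand row.
            for m in matches:
                out.append({**row, "COUNTRY_NAME": m["COUNTRY_NAME"]})
        elif how == "left":
            out.append({**row, "COUNTRY_NAME": None})
    return out


people = [
    {"NAME": "Ana", "COUNTRY_CODE": "AR "},  # trailing space still matches
    {"NAME": "Bob", "COUNTRY_CODE": "US"},
    {"NAME": "Chen", "COUNTRY_CODE": "XX"},  # no match
]
countries = [
    {"COUNTRY_CODE": "AR", "COUNTRY_NAME": "Argentina"},
    {"COUNTRY_CODE": "US", "COUNTRY_NAME": "United States"},
]

print(len(join_on_code(people, countries, how="inner")))  # unmatched rows dropped
print(len(join_on_code(people, countries, how="left")))   # unmatched rows kept
```

The left join corresponds to the lookup behavior sketched for the first answer, while the inner join silently drops unmatched codes, which is why choosing the join type matters.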

Semantically, these two options aren't much different, and I haven't seen a real performance difference, but I've been able to simplify some of those operations by using a Join like this:

[Screenshot: Google Cloud Dataprep step menu showing a simple Join operation]