Custom join in Dataprep

396 views Asked by At

Can I join two tables using custom condition rather than equal? Eg, I have two tables in BigQuery. Table A has 3 columns start_range, end_range and grade. Table B has data coming from Storage using cloud functions and has a particular column "marks". I am creating a flow in DataPrep such that join these tables which should satisfy the condition:

SELECT data.grade FROM  data  INNER JOIN student_data ON student_data.marks BETWEEN data.start_range AND data.end_range 

and then paste the data to a new table. But I could not find any provision for custom condition. Only two columns can be matched for join. Any idea?

1

There are 1 answers

5
Lefteris S On

I think that your best chance would be to implement your solution in BigQuery:

SELECT
  grade, marks
FROM
  `project.dataset.student_data` a
JOIN
  `project.dataset.data` b
ON
  a.marks > b.start_range AND a.marks < b.end_range

If you insist on Dataprep, I don't believe there's any way to join on a range. If I understood your case correctly, what you have is something like this:

enter image description here enter image description here

...and you want to get As and Bs instead of the integer marks in student_data. It's probably not the most elegant solution, but you can use a new formula with a case, something like this:

  CASE([(marks >= 0) && (marks < 50),
    'D',
    (marks >= 50) && (marks < 100),
    'C',
    (marks >= 100) && (marks < 150),
    'B',
    (marks >= 150) && (marks <= 200),
    'A'])

You'd just need to click "New step" on the recipe, chose "New formula" and then add it under "Formula". It should look like so: enter image description here

Then you can even use "Lookup" to join any other useful data you might have in the data table using "grade" as the lookup key.