Spark 2 DataFrame using Scala

Data Frame 1

Person  Work_order  email
P1      111         [email protected]
P2      222         [email protected]
P3      111         [email protected]
P4      -1          [email protected]
P5      444         [email protected]

// toDF on a local Seq needs the implicit encoders from the active SparkSession
import spark.implicits._

val person = Seq(
  ("P1", "111", "[email protected]"),
  ("P2", "222", "[email protected]"),
  ("P3", "111", "[email protected]"),
  ("P4", "-1", "[email protected]"),
  ("P5", "444", "[email protected]")
).toDF("person", "work_order_person", "email_person")

Data Frame 2

Work_order  email
111         [email protected]
222         [email protected]
444         [email protected]

val workOrder = Seq(
  ("111", "[email protected]"),
  ("222", "[email protected]"),
  ("444", "[email protected]")
).toDF("work_order", "email")

Using the two datasets above (workOrder and person), I need to build the report below. Any idea how this can be done?

Report
Work_order  email            Total_count_aggregation_on_the_email  Count_excluding_the_self_work_order_id
111         [email protected]  4                                     2
222         [email protected]  4                                     3
444         [email protected]  1                                     0

The following is not part of the report; it just explains how the counts above are derived.

Total_count_aggregation_on_the_email

4 = [[email protected],[email protected],[email protected],[email protected]]
4 = [[email protected],[email protected],[email protected],[email protected]]
1 = [[email protected]]

Count_excluding_the_self_work_order_id

2 = [222  [email protected], -1  [email protected]]
3 = [111  [email protected], 111  [email protected], -1  [email protected]]
0 = []

Is it possible to create such a report with DataFrames? I'd appreciate your help.
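One way to build this report (a sketch, assuming a SparkSession in scope as `spark` and the `person` and `workOrder` DataFrames defined above; note the email values shown here are redacted placeholders, so actual counts depend on the real addresses): left-join `person` onto `workOrder` by email, then aggregate per work order, counting all matched person rows and, separately, only those whose own work order differs from the current row's.

```scala
import org.apache.spark.sql.functions.{count, when}

// Join every work-order row to all person rows sharing the same email.
// count(col) counts only non-null values, so count(when(cond, col))
// counts just the rows where cond holds.
val report = workOrder
  .join(person, workOrder("email") === person("email_person"), "left")
  .groupBy(workOrder("work_order"), workOrder("email"))
  .agg(
    count(person("person"))
      .as("Total_count_aggregation_on_the_email"),
    count(when(person("work_order_person") =!= workOrder("work_order"),
               person("person")))
      .as("Count_excluding_the_self_work_order_id"))

report.orderBy("work_order").show(false)
```

A left join keeps work orders with no matching person rows (they aggregate to 0/0), whereas an inner join would drop them from the report entirely.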
