How to apply custom logic to a Spark DataFrame using Scala


Imagine the data for this question is in a nested JSON structure. I have flattened the data from the JSON using explode() and loaded it into one DataFrame with the columns project, Task, Task-Evidence, Task-Remarks, Project-Evidence.

*Note: This DataFrame has 1 project with 2 tasks; the first task has 1 task-link and the second task has 1 task-link. At the project level there are 3 project-links.

Result of DF

Expected Result
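To make the setup concrete, here is a minimal sketch of what such a flattened DataFrame might look like; the project, task, and link values are made up for illustration and are not from the original post:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("flatten-example")
  .getOrCreate()
import spark.implicits._

// Hypothetical exploded rows: one project, two tasks (one task-link each),
// and three project-level links repeated across the rows.
val df = Seq(
  ("Project-A", "Task-1", "task-link-1", "remark-1", "proj-link-1"),
  ("Project-A", "Task-1", "task-link-1", "remark-1", "proj-link-2"),
  ("Project-A", "Task-1", "task-link-1", "remark-1", "proj-link-3"),
  ("Project-A", "Task-2", "task-link-2", "remark-2", "proj-link-1"),
  ("Project-A", "Task-2", "task-link-2", "remark-2", "proj-link-2"),
  ("Project-A", "Task-2", "task-link-2", "remark-2", "proj-link-3")
).toDF("project", "Task", "Task-Evidence", "Task-Remarks", "Project-Evidence")

df.show(truncate = false)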


1 Answer

Islam Elbanna

AFAIU, if you have flattened the JSON, then you just need to group the tasks, task evidences, etc. by project. So you can group by project and use collect_set, something like this:

import org.apache.spark.sql.functions._

// Collapse the exploded rows back to one row per project,
// collecting the distinct values of each column into an array.
val df2 = df.groupBy("project").agg(
  collect_set("Task").as("Tasks"),
  collect_set("Task-Evidence").as("Task-Evidences"),
  collect_set("Task-Remarks").as("Task-Remarks"),
  collect_set("Project-Evidence").as("Project-Evidences")
)
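With the hypothetical sample data sketched above, this would collapse the six exploded rows back into a single row per project, roughly like this (collect_set removes duplicates and does not guarantee the order of elements inside the arrays):

df2.show(truncate = false)
// project:           Project-A
// Tasks:             [Task-1, Task-2]
// Task-Evidences:    [task-link-1, task-link-2]
// Task-Remarks:      [remark-1, remark-2]
// Project-Evidences: [proj-link-1, proj-link-2, proj-link-3]

If duplicate values matter, collect_list can be used instead of collect_set; it keeps duplicates, though the ordering of the collected elements is still not deterministic after a shuffle.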