How to optimize the code to choose the lines I want to extract in Hadoop Hue and concat a text from a column?

52 views Asked by At

I'm working with Hadoop on Hue and there is a limitation of 100000 lines that can be downloaded. I would like to choose the lines that I will download in order to download the entire base. Example: line 1 to 100000, 100001 to 200000 ...

Issue 1: I'm using a code, but it is taking too long to bring results and the connection time on their server ends up dropping, i would like to know how i can optimize this code. I'm new in SQL.

Issue 2: At the bottom one of the columns is a text field, only the text is divided by lines. Example: line 1 - id1 - word 1, line 2 - id 1 - word 2, line 3 - id 1 - word 3. And to decrease the number of lines I am trying to concatenate the words by id: line 1 - id 1 - word 1 + word 2 + word 3. However the code I'm using doesn't work because it says I don't have access to the base, removing the function to concatenate the text, I can access the base.

For problem 1 I'm using this code:

select *
from (select *, row_number() over (partition by ID order by ID) as row_num from tab) user_table
where row_num between 1 and 100000

For problem 2 I'm using this:

select *, concat_ws ('', collect_list (WORD)) as words
from tab
where ORG = 'card'
group by ID

And I would like to combine both, but neither is working as it should:

select *, concat_ws ('', collect_list (WORD)) as words
from (select *, row_number() over (partition by ID order by ID) as row_num from tab) user_table
where row_num between 1 and 100000 and ORG = 'card'
group by ID
0

There are 0 answers