How to replace column values in dataframe-js?


I have two dataframe-js DataFrames:

    const df1 = new DataFrame([
        [1, 2, 3, 4, 5], 
        [1, 2, 3, 4, 5],
        [1, 2, 3, 4, 5],
    ], ['c1', 'c2', 'c3', 'c4', 'c5']);

and

    const df2 = new DataFrame([
        [11, 22, 33, 44, 55], 
        [11, 22, 33, 44, 55],
        [11, 22, 33, 44, 55],
    ], ['c1', 'c2', 'c3', 'c4', 'c5']);

df1.show(df1.count()) gives:

| c1        | c2        | c3        | c4        | c5        |
------------------------------------------------------------
| 1         | 2         | 3         | 4         | 5         |
| 1         | 2         | 3         | 4         | 5         |
| 1         | 2         | 3         | 4         | 5         |

df2.show(df2.count()) gives:

| c1        | c2        | c3        | c4        | c5        |
------------------------------------------------------------
| 11        | 22        | 33        | 44        | 55        |
| 11        | 22        | 33        | 44        | 55        |
| 11        | 22        | 33        | 44        | 55        |

What is the best way to replace all values in columns c2 and c3 in df1 with column values from df2?

So eventually I want to end up with:

| c1        | c2        | c3        | c4        | c5        |
------------------------------------------------------------
| 1         | 22        | 33        | 4         | 5         |
| 1         | 22        | 33        | 4         | 5         |
| 1         | 22        | 33        | 4         | 5         |

There is 1 answer

Tony On BEST ANSWER

The way I did it (fast):

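    const cols = ['c2', 'c3']
    const values = df2.select(...cols).toArray()

    // df1 is declared with const in the question, so accumulate into a new variable
    let result = df1
    cols.forEach((col, i) => {
        // values[j] is row j of df2 restricted to cols; i is the column's position in cols
        result = result.withColumn(col, (row, j) => values[j][i])
    })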

Or alternatively (equally fast):

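    const cols = ['c2', 'c3']
    const values = df2.select(...cols).toArray()

    // df1 is declared with const in the question, so accumulate into a new variable
    let result = df1
    cols.forEach((col, i) => {
        // Overwrite the column on each row with the value from the same row of df2
        result = result.chain((row, j) => row.set(col, values[j][i]))
    })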

Or even shorter (but about 10 times slower, because the select and toArray calls on df2 are repeated for every row):

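    const cols = ['c2', 'c3']

    // df1 is declared with const in the question, so accumulate into a new variable
    let result = df1
    cols.forEach((col) => {
        // df2.select(col).toArray() is rebuilt on every row, which is what makes this slow
        result = result.withColumn(col, (row, j) => df2.select(col).toArray()[j][0])
    })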

Is there any easier way to achieve the same?
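
One possibly simpler route, offered only as a sketch: do the replacement on plain column dictionaries (objects of the form `{ c1: [...], c2: [...] }`) and rebuild the DataFrame once at the end, instead of calling `withColumn` once per column. `replaceColumns` below is a hypothetical helper, not part of dataframe-js. The commented lines at the bottom assume that `toDict()` returns such a dictionary, that `listColumns()` returns the column names, and that the `DataFrame` constructor accepts a column dictionary plus a column list; please check those against the documentation for your version before relying on them.

```javascript
// Hypothetical helper (not part of dataframe-js): returns a copy of `target`
// in which every column listed in `cols` is taken from `source` instead.
function replaceColumns(target, source, cols) {
    const result = { ...target } // shallow copy, so `target` itself is not modified
    for (const col of cols) {
        if (!(col in source)) {
            throw new Error(`Column "${col}" is missing from the source`)
        }
        result[col] = [...source[col]] // copy the array to avoid sharing it with `source`
    }
    return result
}

// Plain-object stand-ins for the two DataFrames from the question
const dict1 = { c1: [1, 1, 1], c2: [2, 2, 2], c3: [3, 3, 3], c4: [4, 4, 4], c5: [5, 5, 5] }
const dict2 = { c1: [11, 11, 11], c2: [22, 22, 22], c3: [33, 33, 33], c4: [44, 44, 44], c5: [55, 55, 55] }

const merged = replaceColumns(dict1, dict2, ['c2', 'c3'])
console.log(merged)

// Assumed dataframe-js usage (not executed here):
// const df3 = new DataFrame(
//     replaceColumns(df1.toDict(), df2.toDict(), ['c2', 'c3']),
//     df1.listColumns()
// )
```

This assumes both DataFrames have the same number of rows in the same order; whether it is actually faster than the `withColumn` loop would need to be measured.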