Spark reading in fixed width file

Question


2.8k views · Asked by Dan on 04 January 2017 at 18:23

I'm new to Spark (less than a month!) and am working with raw input data in a fixed-width flat file. I'm using sqlContext to read the file with com.databricks.spark.csv and then using .withColumn to take substrings of each row based on the set widths:

    from pyspark.sql.functions import trim

    rawData.withColumn("ID", trim(rawData['c0'].substr(1, 8)))
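Column.substr takes a 1-based start position and a length, so substr(1, 8) covers characters 1 through 8 of the line. When working out the offsets, it can help to check the layout on a sample line outside Spark first. Below is a minimal pure-Python sketch of the same slicing. Only the ID field (1, 8) comes from the question; the TYPE and AMOUNT fields and the sample line are hypothetical.

```python
# Hypothetical layout: (name, 1-based start, length), mirroring Column.substr(start, length).
# Only ID = (1, 8) comes from the question; TYPE and AMOUNT are made up for illustration.
FIELDS = [("ID", 1, 8), ("TYPE", 9, 2), ("AMOUNT", 11, 6)]


def parse_fixed(line, fields=FIELDS):
    """Slice a fixed-width line into a dict of fields with surrounding whitespace removed.

    Python slices are 0-based and end-exclusive, so a 1-based
    (start, length) pair maps to line[start - 1 : start - 1 + length].
    """
    return {name: line[start - 1:start - 1 + length].strip()
            for name, start, length in fields}


print(parse_fixed("00001234AB  42.5"))
# {'ID': '00001234', 'TYPE': 'AB', 'AMOUNT': '42.5'}
```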

The issue I'm running into is that the last field has a variable width: it starts at a fixed position but holds a variable number of 'sets' of data, each about 20 characters wide. For example:

    Row 1  A 1243 B 42225 C 23213
    Row 2  A 12425
    Row 3  A 111 B 2222 C 3 D 4 E55555

Eventually I need to read in those variable fields, pull out just the first character of each group in the variable-width column, and then transpose the result so that the output looks like this:

    Row 1 A
    Row 1 B
    Row 1 C
    Row 2 A
    ...
    Row 3 D
    Row 3 E

I've read in the fixed-width columns I need, but I'm stuck on the variable-width field.
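If every group in the tail really has the same width, the transpose can be done line by line: cut the tail into equal chunks, keep the first character of each chunk, and emit one (row, code) pair per chunk. The following is a minimal plain-Python sketch. It assumes a hypothetical TAIL_START offset for the variable-width field and a group width of exactly 20 characters (the question only says "like 20 chars"). The row key is taken from the fixed-width prefix, which is also an assumption.

```python
TAIL_START = 8    # hypothetical: 0-based index where the variable-width field begins
GROUP_WIDTH = 20  # assumption: each group is exactly 20 characters wide


def explode_groups(line, tail_start=TAIL_START, width=GROUP_WIDTH):
    """Return one (row_key, code) pair per fixed-width group in the variable-width tail."""
    row_key = line[:tail_start].strip()      # assumption: the prefix identifies the row
    tail = line[tail_start:].rstrip("\n")
    pairs = []
    for offset in range(0, len(tail), width):
        group = tail[offset:offset + width]
        if group.strip():                    # skip trailing blank padding
            pairs.append((row_key, group[0]))  # first character identifies the group
    return pairs


line = "Row 1   " + "A 1243".ljust(20) + "B 42225".ljust(20) + "C 23213"
print(explode_groups(line))
# [('Row 1', 'A'), ('Row 1', 'B'), ('Row 1', 'C')]
```

In PySpark, a per-line function like this could be passed to RDD.flatMap (for example `sc.textFile(path).flatMap(explode_groups)`), and the resulting pairs could be converted to a DataFrame with `toDF(["row", "code"])`. The wiring is shown here as an untested outline.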


There is 1 answer

**Bhargav Kosaraju** · Accepted Answer · 2017-04-02T02:17:54+00:00

zipWithIndex and explode can be combined to transpose the data into one row per element:

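    import org.apache.spark.sql.functions.explode
    import spark.implicits._  // enables toDF and the $"col" syntax

    sc.textFile("csv.data").map(_.split("\\s+")).zipWithIndex
      .toDF("dataArray", "rowId").select($"rowId", explode($"dataArray")).show(false)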

    +-----+------+
    |rowId|col   |
    +-----+------+
    |0    |A     |
    |0    |1243  |
    |0    |B     |
    |0    |42225 |
    |0    |C     |
    |0    |23213 |
    |1    |A     |
    |1    |12425 |
    |2    |A     |
    |2    |111   |
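This output still mixes the group letters with their values. Splitting on whitespace also cannot separate a code from a value when no space sits between them, as with E55555 in the third sample row. One way to reach the requested output is to keep only tokens that begin with a letter and take their first character. The plain-Python sketch below does this and assumes that every group code is a single letter and every value starts with a digit. enumerate plays the role of zipWithIndex, so row ids start at 0 as in the answer.

```python
SAMPLE = [
    "A 1243 B 42225 C 23213",
    "A 12425",
    "A 111 B 2222 C 3 D 4 E55555",
]


def group_codes(line):
    """Return the leading letter of each whitespace-separated token that starts with a letter.

    Assumes codes are letters and values start with digits, so 'E55555' yields 'E'.
    """
    return [tok[0] for tok in line.split() if tok[:1].isalpha()]


# One (rowId, code) pair per group, like the exploded DataFrame but without the values.
rows = [(row_id, code) for row_id, line in enumerate(SAMPLE) for code in group_codes(line)]
print(rows)
# [(0, 'A'), (0, 'B'), (0, 'C'), (1, 'A'), (2, 'A'), (2, 'B'), (2, 'C'), (2, 'D'), (2, 'E')]
```

On the answer's DataFrame, the same idea would roughly mean filtering the exploded col with Column.rlike("^[A-Za-z]") and then taking substring(col, 1, 1). This is an untested suggestion, not part of the accepted answer.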
