How do I define the record structure of an EBCDIC file?


I have an EBCDIC file in HDFS. I want to load the data into a Spark DataFrame, process it, and write the results as ORC files. I found an open-source solution, Cobrix, that can read data from EBCDIC files, but the developer must provide a copybook file, which is the schema definition.

A few lines of my EBCDIC file are shown in the attached image. I want to work out the copybook format for this file. Essentially, I want to read three fields: vin, whose length is 17; vin_data, whose length is 3; and vin_val, whose length is 100.

[Image: sample lines from the EBCDIC file]


There are 2 answers

0
Gilbert Le Blanc On BEST ANSWER

Based on your comment in the question, and looking at the input file, you could start with this.

01  VIN-RECORD.
    05  VIN                 PIC X(17).
    05  VIN-COUNT           PIC S9(5) COMP-3.
    05  VIN-VALUE           PIC X(100).

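I'm guessing that the second field is COMP-3 because all six examples end with a C in the sign position (the low-order nibble of the last byte). This indicates a positive COMP-3 value. A D would indicate a negative COMP-3 value, and an F would indicate an unsigned COMP-3 value. PIC S9(5) COMP-3 occupies 3 bytes (five digit nibbles plus the sign nibble), which matches the length of 3 given in the question.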

The content of the third field varies in length and is right-padded with spaces to 100 bytes.
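
To sanity-check this layout outside of Cobrix, a single record can be decoded by hand. The following is a minimal Python sketch, not a definitive implementation. It assumes fixed-length 120-byte records (17 + 3 + 100) and EBCDIC code page 037 (`cp037`); the actual code page of the file is not stated in the question. The function names `unpack_comp3` and `parse_vin_record` are made up for illustration, and only the sign nibbles C, D, and F mentioned above are accepted.

```python
RECORD_LENGTH = 17 + 3 + 100  # VIN + VIN-COUNT (COMP-3) + VIN-VALUE


def unpack_comp3(raw: bytes) -> int:
    """Decode a packed-decimal (COMP-3) field.

    Every byte holds two 4-bit digits, except the last byte, whose
    low-order nibble is the sign: C = positive, D = negative, F = unsigned.
    """
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign = raw[-1] & 0x0F

    if sign not in (0x0C, 0x0D, 0x0F):
        raise ValueError(f"unexpected sign nibble: {sign:X}")
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble in packed field")

    value = int("".join(str(d) for d in digits))
    return -value if sign == 0x0D else value


def parse_vin_record(record: bytes, encoding: str = "cp037") -> dict:
    """Split one fixed-length record into its three fields."""
    if len(record) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} bytes, got {len(record)}")
    return {
        "VIN": record[0:17].decode(encoding),
        "VIN_COUNT": unpack_comp3(record[17:20]),
        # Text is right-padded with EBCDIC spaces, so strip after decoding.
        "VIN_VALUE": record[20:120].decode(encoding).rstrip(" "),
    }


# Build a synthetic record (made-up data) and decode it again.
sample = (
    "1HGCM82633A004352".encode("cp037")
    + bytes([0x00, 0x04, 0x2C])          # packed +42
    + "ABC".ljust(100).encode("cp037")
)
parsed = parse_vin_record(sample)
```

If the decoded VIN values look like garbage, the code page is probably not 037; if the count field raises an error, the second field is probably not COMP-3 after all.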

0
Simon Sobisch On

how to define a copybook file of ebcdic data?

You don't.

A copybook may be used as a record definition (that is, how the data is stored); it has nothing to do with the encoding of the data stored in it.

This leaves the question "How do I define the record structure?"

You'd need the number of fields, their lengths and types (it is likely not only USAGE DISPLAY), and then just define them with some fancy names. Ideally, you get the original record definition from the COBOL program that writes the file, put it into a copybook if it isn't in one yet, and use that.
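
When the original record definition is not available, one quick plausibility check for a guessed layout is to add up the field sizes and compare the total with the file size. The sketch below is in Python and rests on several assumptions: the file holds fixed-length records with no record descriptor words or line separators, and the field list mirrors the three fields described in the question. The helper names are hypothetical.

```python
def comp3_bytes(digits: int) -> int:
    """Storage size of a COMP-3 field: one nibble per digit plus a sign nibble,
    rounded up to whole bytes."""
    return digits // 2 + 1


# (name, size in bytes); sizes follow the lengths given in the question.
FIELDS = [
    ("VIN", 17),                       # PIC X(17)
    ("VIN-COUNT", comp3_bytes(5)),     # PIC S9(5) COMP-3
    ("VIN-VALUE", 100),                # PIC X(100)
]


def record_length(fields) -> int:
    """Total record size implied by the field list."""
    return sum(size for _, size in fields)


def layout_fits(file_size: int, fields) -> bool:
    """True if the file size is a whole multiple of the record length."""
    return file_size % record_length(fields) == 0
```

A file size that is not a multiple of the computed record length suggests that a field is missing, a length or type is wrong, or the records are not fixed-length.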

Your link has samples that show what a copybook actually looks like. If you struggle with the definition, please edit your question to include the copybook you've defined, and we may be able to help.