z/OS Syncsort: omit duplicates without sort

I can't work out from the manual how to solve this problem with Syncsort (the solutions we found for DFSORT didn't help). Due to a program error (which can't be fixed in time; you know: programmer, test, quality check, deployment...) we got duplicate records in a file (FB, LRECL 250) in which

  • a header line exists
  • consecutive duplicate data lines exist, which have to be omitted so that only one copy of each remains
  • data lines must not be sorted (because some records have mandatory logical relations to each other)
  • the trailer includes the data line count.

The file cannot be edited manually because of its size (more than 2 million records).

Example infile:

HEADER xxxx
cccc
bbbb 123
bbbb 123
bbbb 123
dddd
aaaa 123
aaaa 123
aaaa
TRAILER COUNT: 8

Expected outfile:

HEADER xxxx
cccc
bbbb 123
dddd
aaaa 123
aaaa
TRAILER COUNT: 5

So the outfile is not sorted at all. The omitted records

bbbb 123         (omitted)
bbbb 123         (omitted)
aaaa 123         (omitted)

are not needed at all and may go straight into Nirvana.
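
To pin down the requirement independently of any sort product, here is a minimal Python sketch of the transformation, offered only as a reference model and not as a Syncsort solution. It assumes the first line is the header, the last line is the old trailer, and "duplicate" means an identical line directly following its twin. The function name `drop_adjacent_duplicates` is made up for this illustration.

```python
from itertools import groupby


def drop_adjacent_duplicates(lines):
    """Keep the header, collapse runs of identical adjacent data lines,
    and rebuild the trailer with the new data line count."""
    header, data = lines[0], lines[1:-1]  # the last line is the old trailer
    # groupby only groups *adjacent* equal items, so nothing gets reordered
    unique = [key for key, _run in groupby(data)]
    return [header, *unique, f"TRAILER COUNT: {len(unique)}"]


sample = [
    "HEADER xxxx", "cccc", "bbbb 123", "bbbb 123", "bbbb 123",
    "dddd", "aaaa 123", "aaaa 123", "aaaa", "TRAILER COUNT: 8",
]
result = drop_adjacent_duplicates(sample)
print("\n".join(result))
```

For the sample infile this yields the expected outfile, with the data lines in their original order.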

(I would even be happy with a solution omitting header/trailer as I could easily concatenate manually generated lines in the subsequent job.)

Thanks for your help!

There are 2 answers

Srinivasan JV On

I was able to achieve your expected result using two SYNCSORT steps.

Step 1:

INREC FIELDS=(1:SEQNUM,4,ZD,5:1,8)
SORT FIELDS=(5,8,CH,A),SKIPREC=1  
SUM FIELDS=NONE

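Using INREC, I've prefixed each record with a 4-byte sequence number, followed by the first 8 bytes of the data record. I then sorted on those 8 data bytes (positions 5 to 12 of the reformatted record), and SUM FIELDS=NONE drops the records with duplicate keys. The header record is skipped using SKIPREC. Note that this is sized for the sample data: for the real file, the sequence number needs more digits (4 digits only count up to 9999) and the data field must cover the full record.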

Step 2:

SORT FIELDS=(1,4,CH,A)                                              
OUTFIL FNAMES=SORTOF01,REMOVECC,                                    
OUTREC=(1:5,8,80:X),TRAILER1=('TRAILER COUNT:',COUNT=(M11,LENGTH=8))

In Step 2, the output file from Step 1 is read as input. Since the data lines must keep their original order, I've sorted the input on the sequence number. Using OUTREC, I leave the sequence number out of the final output file. TRAILER1 writes the record count as the last line.
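
The idea behind the two steps (tag each record with its position, sort and deduplicate by key, then sort back by position) can be modelled in Python roughly as follows. This is only a sketch of the technique, not of Syncsort itself. The function name `two_pass_dedupe` is hypothetical, and the sketch assumes the old trailer is stripped along with the header, which the cards above do not appear to do on their own.

```python
def two_pass_dedupe(lines, key_len=8):
    """Model of: add sequence number, sort by key, drop duplicate keys,
    then sort back into the original order."""
    data = lines[1:-1]  # assumption: header and old trailer are both dropped
    # Step 1a: prefix every record with its sequence number (like INREC SEQNUM)
    tagged = list(enumerate(data, start=1))
    # Step 1b: sort by the first key_len bytes; Python's sort is stable
    by_key = sorted(tagged, key=lambda t: t[1][:key_len])
    # Step 1c: keep one record per key (like SUM FIELDS=NONE)
    kept = []
    for seq, rec in by_key:
        if not kept or kept[-1][1][:key_len] != rec[:key_len]:
            kept.append((seq, rec))
    # Step 2: sort on the sequence number again and strip it off
    restored = [rec for _seq, rec in sorted(kept)]
    return restored, len(restored)


sample = [
    "HEADER xxxx", "cccc", "bbbb 123", "bbbb 123", "bbbb 123",
    "dddd", "aaaa 123", "aaaa 123", "aaaa", "TRAILER COUNT: 8",
]
restored, count = two_pass_dedupe(sample)
print(restored, count)
```

One consequence of this design is that duplicates are removed across the whole file, not only when they are adjacent. If the same line may legitimately occur again later in the file, this approach would drop the later occurrence too.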

Hope this helps. Please let me know if you have an alternative that works more efficiently.

Karthick On

See my sort card below. It is built for the sample data shown above.

//SORTOUT DD SYSOUT=*                                                 
//SYSIN DD *                                                          
 OPTION COPY                                                          
 INREC FIELDS=(1,50,SEQNUM,7,ZD,RESTART=(1,8))                        
 OUTFIL REMOVECC,OMIT=(51,7,ZD,GT,01,|,1,7,CH,EQ,C'TRAILER'),         
        OUTREC=(1,50),TRAILER1=(C'TRAILER COUNT:',COUNT-1=(M11,LENGTH=8))
/*
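
For readers who do not read sort cards fluently, the logic of the card above can be sketched in Python like this. It is a model under assumptions, not a statement of how Syncsort behaves in every detail. The function name `restart_seqnum_filter` is invented for the example, and the trailer count is assumed to render as 8 digits with leading zeros.

```python
def restart_seqnum_filter(lines, key_len=8):
    """Model of: number the records within each run of equal keys,
    keep only number 1 of each run, drop the old trailer, write a new one."""
    out = []
    prev_key, seq = None, 0
    for rec in lines:
        key = rec[:key_len]
        # like SEQNUM with RESTART=(1,8): restart at 1 when the key changes
        seq = seq + 1 if key == prev_key else 1
        prev_key = key
        # like the OMIT condition: sequence number > 1, or the old trailer
        if seq > 1 or rec.startswith("TRAILER"):
            continue
        out.append(rec)
    count = len(out) - 1  # like COUNT-1: the header is not a data line
    out.append(f"TRAILER COUNT:{count:08d}")  # assumed leading-zero format
    return out


sample = [
    "HEADER xxxx", "cccc", "bbbb 123", "bbbb 123", "bbbb 123",
    "dddd", "aaaa 123", "aaaa 123", "aaaa", "TRAILER COUNT: 8",
]
out = restart_seqnum_filter(sample)
print("\n".join(out))
```

Because the file is only copied and never sorted, the header stays first and the data lines keep their order. Only the first 8 bytes are compared here, as in the card, so for the real 250-byte records the key length would presumably have to be widened.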