Need the last entry while using XSUM in SYNCSORT JCL

9.1k views Asked by At

I have some data like below:

This is not the actual data, but actual data is similar to this. And, the data comes in a file with 2 spaces between each field. No database is involved in input or output. I am using table format just to make it understandable.

 Name    Number   code
+---------------------+
 Albert  122234   xcc
 Robert  565435   rtd
 Robert  776567   iuy
 Robert  452890   yyt
 Stuart  776565   ter

In a file the data would look like..

Albert  122234  xcc
Robert  565435  rtd
Robert  776567  iuy
Robert  452890  yyt
Stuart  776565  ter

Now, I need to eliminate the duplicates using SYNCSORT. I can do this using XSUM, but I would get the following data:

 Name    Number   code
+---------------------+
 Albert  122234   xcc
 Robert  565435   rtd
 Stuart  776565   ter

But I need:

 Name     Number   code
+----------------------+
 Albert   122234   xcc
 Robert   452890   yyt
 Stuart   776565   ter

The last set of data has the last occurance of Robert in the output, while the former set has the first occurance.

So, is there any way to achieve this using XSUM...?

3

There are 3 answers

0
JackCColeman On

It looks like you want to keep the LAST record of a set of records that have the same sort key.

If you have a recent release of SyncSort then use DUPKEYS with LASTDUP, and EQUALS as mentioned in other answers.

It has been a while since I've used SyncSort, but if I remember correctly, it is possible to code an exit routine that has access to the sortkeys and can accept or reject records. The exit routine is entered for each record and so it is possible to keep prior sortkeys for comparison.

Also, I like writing exits in assembler (BAL), but this could be done with COBOL code.

So, if SyncSort supports a command that does you what want then by all means use it! If not then exits are relatively easy to code...

1
cschneid On

Two steps, the first adds a sequence number and sorts by the name ascending and by the sequence number descending. This is to get the last record for each name to come first.

The second step sorts just by name with the EQUALS parameter, which says to keep the data in the same sequence as the input file in the case of duplicate values in the SORT fields. Then we use SUM FIELDS=NONE to eliminate duplicates.

//SORT1    EXEC PGM=SORT
//SORTIN   DD  *
ALBERT  122234  XCC
ROBERT  565435  RTD
ROBERT  776567  IUY
ROBERT  452890  YYT
STUART  776565  TER
//SYSIN    DD  *
 INREC FIELDS=(001:001,020,         * REGULAR INPUT DATA
               021:SEQNUM,4,BI)     * ADD A SEQUENCE NUMBER
 SORT  FIELDS=(001,008,CH,A,
               021,004,BI,D)
 OUTREC FIELDS=(001:001,020,        * ORIGINAL INPUT DATA
                080:C' ')           * PADDING
//SORTOUT  DD  DISP=(NEW,PASS),
//             AVGREC=U,
//             LRECL=80,
//             RECFM=FB,
//             SPACE=(80,(1000,100))
//SYSOUT   DD  SYSOUT=*
//*
//SORT2    EXEC PGM=SORT
//SORTIN   DD  DISP=(SHR,PASS),DSN=*.SORT1.SORTOUT
//SYSIN    DD  *
 SORT  FIELDS=(001,008,CH,A),EQUALS
 SUM FIELDS=NONE
//SORTOUT  DD  SYSOUT=*
//SYSOUT   DD  SYSOUT=*
//*
0
Bill Woodger On

There are multiple ways to do this.

First, with SyncTool (most likely "aliased" to ICETOOL, such that you have EXEC PGM=ICETOOL somewhere along the way). Look at the SELECT operator, which has LAST, which should give you what you want. If your file is already sorted, don't sort it again (OPTION COPY in a USING file).

An example can be found under "Keep dropped duplicate records (XSUM)" in this publication: ftp://ftp.software.ibm.com/storage/dfsort/mvs/sorttrck.pdf

You'll end up with a Control Card along these lines:

  SELECT FROM(IN) TO(OUT) ON(1,3,CH) FIRST DISCARD(SORTXSUM)

Also a simple example here: http://www.ibmmainframes.com/viewtopic.php?p=310008#310008

If your data is already sorted, or after it has been sorted, use OUTFIL and the "reporting functions". Look at REMOVECC and NODETAIL, use SECTIONS and TRAILER3. Here, http://www.ibmmainframes.com/viewtopic.php?p=309955#309955, is an example.

You'll end up with Control Cards along these lines:

  OPTION EQUALS                                         
  SORT FIELDS=(1,3,CH,A)                                 
  OUTFIL REMOVECC,NODETAIL,SECTIONS=(1,3,TRAILER3=(1,80))

If you are sorting the data and want to use XSUM to keep the disacrded duplicates in a separate dataset, you can include a sequence number in a temporary extension to the record and SORT on that, Descending, after the main key. Drop the temporary extension. This cannot be done in one step, but SELECT (certainly the DFSORT version) can do all that XSUM does and more, in one function.

Finally, after something nagging at the back of my mind, if you have a current SyncSort of at least 1.4, you can do exactly what you want (with the proviso that we still don't know what that is) with DUPKEYS with LASTDUP and XDUP instead of SUM and XSUM.

You'll end up with a Control Card along these lines:

  DUPKEYS LASTDUP,NODUPS,XDUP