Oracle SQL Pipe Delimited File w/One Comma Delimited Column to Split and Load Using SQLLDR

3.3k views Asked by At

Input sample data:

Style|Size|Codes
Modern|9|A224, B153, C166
Retro|8|D532, E533, F122
Vintage|7|G324, H243, I432

Using SQLLDR - need this output:

Style     Size     Code1      Code2      Code3
Modern    9        A224       B153       C166
Retro     8        D532       E533       F122
Vintage   7        G324       H243       I432
1

There are 1 answers

0
Gary_W On BEST ANSWER

You could do a transformation on the data on the way in. It is specified in the control file. See this recent post for an example: SQLLDR CTL: Load date field received in DDMonYYYY to db fields with formats of YYYYMM or MM/DD/YYYY

I would use REGEXP_SUBSTR to extract the various codes from their positions in the comma separated list. Give it a try and let us know what happens.

Looks like your target table will have a finite amount of code columns. What if your data has more coeds than the target table has columns? Consider making codes for an item its own table, lest you violate first normal form (a single value only per column).

If the codes string from the source file is really a description as part of an item, then they should most likely remain together as a code_description column. It depends on how this data is used by your app or report.

All that being said, the trick is to define the codes field as BOUNDFILLER. This says treat it as a FILLER and not load it, but let it be used as a bind variable in later expressions. Try this in your control file:

LOAD DATA
INFILE 'data.txt'
APPEND
INTO TABLE X_TEST
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(
 STYLE       CHAR
,SIZE        CHAR
,CODESTRING  BOUNDFILLER
,CODE1      "trim(regexp_substr(:CODESTRING, '([^,]*)(,|$)', 1, 1, NULL, 1))"
,CODE2      "trim(regexp_substr(:CODESTRING, '([^,]*)(,|$)', 1, 2, NULL, 1))"
,CODE3      "trim(regexp_substr(:CODESTRING, '([^,]*)(,|$)', 1, 3, NULL, 1))"
)

The regular expression can be read as "Return the nth occurrence of zero or more non-comma characters that are followed by a comma or the end of the line, and return the 1st subgroup (which is the data less the comma or end of the line). Just wrap it in a call to TRIM() to remove the spaces or wait for one of the REGEXP wizards to give an improved REGEX. If there are only 2 codes, CODE3 will be NULL.