I've tried to find the answer in other questions, and none of the "standard" answers are working, so I'm hoping someone can either point me to where this has already been answered, or can tell me how to do this.
I have a large file with multiple documents in it. As a sample, assume something like this:
DOCUMENT_IDENTIFIER 123400000000000000000123457 OTHER STUFF HERE
LINE WITH STUFF HERE
LINE WITH STUFF HERE
DOCUMENT_IDENTIFIER 123500000000000000000127456 OTHER STUFF HERE
LINE WITH STUFF HERE
LINE WITH STUFF HERE
Now, I need to preserve everything in the DOCUMENT_IDENTIFIER line, starting with the first 0 through the 123 (or 127 in the second document). That header line, plus all the LINE WITH STUFF HERE lines below it, should make up one document, and a new document should start at the second DOCUMENT_IDENTIFIER line.
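To make the expected result concrete, here is a rough Python sketch of the split I'm after. It is not BizTalk code and says nothing about how the flat file disassembler works; it only illustrates the grouping rule described above. The names `split_documents` and `HEADER_TAG` are made up for this example.

```python
from typing import Iterable, List

# Tag that opens each document, taken from the sample above.
HEADER_TAG = "DOCUMENT_IDENTIFIER"


def split_documents(lines: Iterable[str]) -> List[List[str]]:
    """Group lines into documents; every header line starts a new document."""
    documents: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(HEADER_TAG):
            # A header line opens a new document.
            documents.append([line])
        elif documents:
            # Any other line belongs to the most recent document.
            documents[-1].append(line)
        # Lines before the first header are ignored in this sketch.
    return documents


sample = """DOCUMENT_IDENTIFIER 123400000000000000000123457 OTHER STUFF HERE
LINE WITH STUFF HERE
LINE WITH STUFF HERE
DOCUMENT_IDENTIFIER 123500000000000000000127456 OTHER STUFF HERE
LINE WITH STUFF HERE
LINE WITH STUFF HERE
"""

docs = split_documents(sample.splitlines())
```

With the sample above, `docs` holds two documents, each made of one header line followed by its two detail lines.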
When I attempt to use the standard debatching techniques, the pipeline fails: either it fails completely (when, for instance, I try to define header and body schemas for the pipeline) or it never starts the second document (if I try just a body schema).
I'm certain this is something fairly simple, but I'm completely missing how to get it done. Any suggestions/direction would be welcome.
If it matters, I'm stuck on BT2006 R2 at present.
What does your Body Schema look like? I would start by getting that right and make sure that you have something that will create XML with a separate record for each of the "DOCUMENT_IDENTIFIER 1234" lines.
I would use the "DOCUMENT_IDENTIFIER 1234" bit as the Tag Identifier, and then I would set the Tag Offset to 4, to avoid the first 4 characters.
You should have a structure like this:
RecordForDocumentIdentifiers (Root of your Schema)
    Group Maxoccurs=*
        RecordForDocumentIdentifier (Set the Tag Identifier here)
            Fields for the columns you want to parse
When that seems to parse your example okay, and generate the XML you want, I would start creating my header and body schemas from that. I know it is 2 steps, but it takes some of the guesswork out of it.
I guess the Header schema would be built from RecordForDocumentIdentifier, and the body would be RecordForOtherLines (the outer record for the remaining lines).
I hope that helps. If not, post your actual file and schema and let us take a look at it.