This might take a while to explain, but I have a file (XMLList.txt) that contains the paths to multiple IDOC XMLs. The contents of the XMLList.txt look like this:
/usr/local/sterlingcommerce/data/archive/SFGprdr/SFTPGET/2017/Dec/week_4/AU_DHL_PW_Inbound_Delivery_from_Pfizer_20171220071754.xml
/usr/local/sterlingcommerce/data/archive/SFGprdr/SFTPGET/2017/Dec/week_4/AU_DHL_PW_Inbound_Delivery_from_Pfizer_20171220083310.xml
/usr/local/sterlingcommerce/data/archive/SFGprdr/SFTPGET/2017/Dec/week_4/CCMastOut_MQ_GLB_1_20171220154826.xml
I'm attempting to create a Perl script that reads each XML and extracts just the values of the DOCNUM, SNDPRN, and RCVPRN tags from each file into a pipe-delimited file, report.csv.
Another thing to note is that my XML files could be all on a single line, for example:
<?xml version="1.0" encoding="UTF-8"?><ZDELVRY073PL><IDOC BEGIN="1">
<EDI_DC40 SEGMENT="1"><TABNAM>EDI_DC40</TABNAM><MANDT>400</MANDT>
<DOCNUM>0000000443474886</DOCNUM><DOCREL>731</DOCREL><STATUS>30</STATUS>
<DIRECT>1</DIRECT><OUTMOD>4</OUTMOD><IDOCTYP>DELVRY07</IDOCTYP>
<CIMTYP>ZDELVRY073PL</CIMTYP><MESTYP>ZIBDADV</MESTYP><MESCOD>IBG</MESCOD>
<SNDPOR>SAPQ01</SNDPOR><SNDPRT>LS</SNDPRT><SNDPRN>Q01CLNT400</SNDPRN>
<RCVPOR>XMLDIST_MT</RCVPOR><RCVPRT>LS</RCVPRT><RCVPFC>LS</RCVPFC>
<RCVPRN>AU_DHL</RCVPRN>.... </EDI_DC40></IDOC>
or multiline XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<INVOIC02>
<IDOC>
<EDI_DC40>
<TABNAM/>
<DOCNUM>0000000658056255</DOCNUM>
<DIRECT/>
<IDOCTYP>INVOIC02</IDOCTYP>
<MESTYP>INVOIC</MESTYP>
<SNDPOR>SAPP01</SNDPOR>
<SNDPRT/>
<SNDPRN>ALE400</SNDPRN>
<RCVPOR>XMLINVOICE</RCVPOR>
<RCVPRT>KU</RCVPRT>
<RCVPRN>C18BASWARE</RCVPRN>
<CREDAT>20171220</CREDAT>
<CRETIM>222323</CRETIM>
</EDI_DC40>
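For reference, given the two samples above, report.csv should end up with pipe-delimited rows like:

```
0000000443474886|Q01CLNT400|AU_DHL
0000000658056255|ALE400|C18BASWARE
```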
The script I've used so far seems to work for small XMLs. However, some XMLs > 50 MB throw this error:
Out of memory!
Out of memory!
Callback called exit at /usr/opt/perl5/lib/site_perl/5.10.1/XML/SAX/Base.pm line 1941 (#1)
    (F) A subroutine invoked from an external package via call_sv()
        exited by calling exit.
Out of memory!
So, here's the code I'm using; I'd appreciate help tweaking it:
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

# use modules
use XML::Simple;
use Data::Dumper;

# create object
my $xml = XML::Simple->new;

my $file_list = 'XMLList.txt';
open(my $fh_i, '<:encoding(UTF-8)', $file_list)
    or die "Could not open file '$file_list' $!";

my $csv_out = 'report.csv';
open(my $fh_o, '>', $csv_out)
    or die "Could not open file '$csv_out' $!";

while (my $row = <$fh_i>) {
    $row =~ s/\R//g;
    my $data = $xml->XMLin($row);
    print $fh_o "$data->{IDOC}->{EDI_DC40}->{DOCNUM}|";
    print $fh_o "$data->{IDOC}->{EDI_DC40}->{SNDPRN}|";
    print $fh_o "$data->{IDOC}->{EDI_DC40}->{RCVPRN}\n";
}
close $fh_o;
close $fh_o;
First off, if the file contains newlines, your while loop is going to read one line at a time from the file and attempt the XML conversion on that line alone instead of on the whole document. I would recommend that you slurp each file into a buffer and use a regex to strip newlines and carriage returns before the XMLin conversion. Also, XMLin will die unceremoniously if there are any XML errors in the file, so you want to run it inside an eval block.
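Putting that advice together, a minimal sketch might look like this. The helper names read_xml() and extract_fields() are my own, and the demo runs on an inline fragment modeled on the question's second sample rather than on the real files:

```perl
#!/usr/bin/perl
# Sketch only: slurp each document whole, strip line endings,
# and guard XMLin with eval. read_xml() and extract_fields()
# are hypothetical helper names, not from the original script.
use strict;
use warnings;
use XML::Simple;

my $xml = XML::Simple->new;

# Slurp an entire file into one scalar instead of reading line by line.
sub read_xml {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "Could not open '$path': $!";
    my $buf = do { local $/; <$fh> };   # undef $/ => slurp mode
    close $fh;
    $buf =~ s/\R//g;                    # drop newlines and carriage returns
    return $buf;
}

# Parse one XML document string; returns "DOCNUM|SNDPRN|RCVPRN",
# or undef if XMLin dies on malformed XML.
sub extract_fields {
    my ($doc) = @_;
    my $data = eval { $xml->XMLin($doc) };
    if ($@) {
        warn "XML parse failed: $@";
        return;
    }
    my $dc40 = $data->{IDOC}{EDI_DC40};
    return join '|', @{$dc40}{qw(DOCNUM SNDPRN RCVPRN)};
}

# Demo on an inline fragment shaped like the question's second sample.
my $sample = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<INVOIC02><IDOC><EDI_DC40>
<DOCNUM>0000000658056255</DOCNUM>
<SNDPRN>ALE400</SNDPRN>
<RCVPRN>C18BASWARE</RCVPRN>
</EDI_DC40></IDOC></INVOIC02>
XML
print extract_fields($sample), "\n";   # 0000000658056255|ALE400|C18BASWARE
```

Bear in mind that XML::Simple still builds the entire document tree in memory, so this handles the line-at-a-time and parse-error problems but won't, by itself, cure genuine out-of-memory failures on very large files.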