Does anyone know a tool to convert from Cobol Copybook to XSD? Or XML.
Convert Cobol copybook to XSD
9.3k views. Asked by lemotdit. There are 5 answers.
A long time ago, I built some code to parse COBOL copybooks and generate XSD files.
Since the structure of COBOL is pretty regular, I crafted a regular expression to extract variable names and identify field lengths. From that parsed structure, I could also create XML test data, MSXML DOM code to manipulate the structure, and HTML forms to test those IMS transactions.
Bottom line: regular expressions could be really useful to do that.
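As a rough illustration of that idea (not the original code, which isn't shown here), a minimal Python sketch could look like the following. It assumes one data description entry per line, sequence numbers already stripped, and only simple X, A and 9 picture symbols. The names `FIELD_RE`, `pic_length` and `parse_copybook` are made up for this example.

```python
import re

# Assumption: one entry per line, e.g. "05 CUST-NAME PIC X(30)."
FIELD_RE = re.compile(
    r"^\s*(?P<level>\d{2})\s+(?P<name>[A-Z0-9-]+)"
    r"(?:\s+PIC(?:TURE)?\s+(?:IS\s+)?(?P<pic>\S+?))?\s*\.\s*$",
    re.IGNORECASE,
)

def pic_length(pic):
    """Count character positions in a simple picture string (X, A, 9 only)."""
    # Expand repeat counts: "9(3)" becomes "999".
    expanded = re.sub(r"(.)\((\d+)\)",
                      lambda m: m.group(1) * int(m.group(2)), pic.upper())
    # S (sign) and V (implied decimal point) take no position here.
    return sum(1 for ch in expanded if ch in "XA9")

def parse_copybook(text):
    """Return (level, name, picture, length) tuples for lines that match."""
    fields = []
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            pic = m.group("pic")
            fields.append((int(m.group("level")), m.group("name").upper(),
                           pic, pic_length(pic) if pic else None))
    return fields

sample = """
01  CUSTOMER-REC.
    05  CUST-ID    PIC 9(6).
    05  CUST-NAME  PIC X(30).
    05  BALANCE    PIC S9(7)V99.
"""
for field in parse_copybook(sample):
    print(field)
```

Group items such as CUSTOMER-REC come back with no picture and no length. Anything fancier (OCCURS, REDEFINES, COMP-3, edited pictures) is silently skipped by this sketch and would need more patterns.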
Rational Developer for Z, XML Thunder, Syncsort ETL...there are many products that will do this.
Really though, if you learn the rules of schema datatypes, you can do it very easily manually. Mostly, you will deal with xsd:string, xsd:decimal, xsd:integer and some flavors of xsd:date to match your Cobol copybook.
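To make the manual mapping concrete, here is a small Python sketch of one possible convention: alphanumeric pictures become xsd:string with a maxLength, numeric pictures without V become xsd:integer, and pictures with V become xsd:decimal. The function names are invented for the example, and the choice of facets is an assumption rather than a standard. Dates can't be recognised from the picture alone, so they are left out.

```python
import re
from xml.sax.saxutils import quoteattr

def expand_pic(pic):
    """Expand repeat counts, e.g. 'S9(3)V99' -> 'S999V99'."""
    return re.sub(r"(.)\((\d+)\)",
                  lambda m: m.group(1) * int(m.group(2)), pic.upper())

def restriction(base, facets):
    """Wrap facets in an anonymous simpleType restricting the given base."""
    return (f'<xsd:simpleType><xsd:restriction base="{base}">'
            f'{facets}</xsd:restriction></xsd:simpleType>')

def xsd_element(name, pic):
    """Build an xsd:element for a simple X/A/9/S/V picture string."""
    p = expand_pic(pic)
    if re.fullmatch(r"[XA]+", p):
        body = restriction("xsd:string", f'<xsd:maxLength value="{len(p)}"/>')
    else:
        m = re.fullmatch(r"S?(9*)(?:V(9*))?", p)
        if not m or not (m.group(1) or m.group(2)):
            raise ValueError(f"unsupported picture string: {pic}")
        frac = len(m.group(2) or "")
        total = len(m.group(1)) + frac
        facets = f'<xsd:totalDigits value="{total}"/>'
        if frac:
            facets += f'<xsd:fractionDigits value="{frac}"/>'
        body = restriction("xsd:decimal" if frac else "xsd:integer", facets)
    return f"<xsd:element name={quoteattr(name)}>{body}</xsd:element>"

print(xsd_element("CUST-NAME", "X(30)"))
print(xsd_element("BALANCE", "S9(7)V99"))
```

One caveat: COBOL data names may start with a digit, which is not allowed in an XML element name, so a real converter would have to rename such fields.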
You could try my Koopa Cobol parser project. While it doesn't do preprocessing, I found that for most copybooks this isn't really necessary. It should cover most of what you'd expect from a standard copybook, and if not, you can always extend the parser. It can export the parse tree to XML, which you can then process in any way you want.
Building a full blown parser for COBOL copybooks has a few challenges:
Copybooks are incorporated into COBOL programs during the text manipulation phase of compilation. The copybook source by itself may be incomplete. The only way to obtain a complete source for parsing is to pre-process it as if it had been brought into a COBOL source program. Normally copybooks are brought into a COBOL program via the COPY directive. Bringing this up may seem a bit pointless, but consider the following:
1) The COPY directive comes with a REPLACING option. On the surface this may seem simple enough to deal with, but once you get into the details it becomes very "interesting". See: COPY DIRECTIVE
2) The REPLACE directive. This directive may also manipulate source text after the COPY directive has done its bit. See: REPLACE DIRECTIVE
3) Nested copybooks. This one may not be as nasty as the previous two but keep nesting in mind too.
4) The syntax of COBOL Picture strings is nothing to laugh at either. Have a look at: Picture String Symbols
5) Your parser will need to deal with COBOL continuation rules as well. See: Continuation Lines, and continuation of PSEUDO TEXT in particular.
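To give a feel for point 5, here is a minimal Python sketch that normalises fixed-format source before any parsing. It assumes the classic layout (sequence area in columns 1-6, indicator in column 7, text in columns 8-72) and only joins continued words. Continued literals and pseudo-text, which are the hard cases, are deliberately not handled. The function name is made up for the example.

```python
def normalize_fixed_format(source):
    """Return logical lines: comments dropped, simple continuations joined."""
    logical = []
    for raw in source.splitlines():
        line = raw.ljust(72)[:72]          # ignore columns 73-80
        indicator, body = line[6], line[7:]
        if indicator in "*/":              # comment line
            continue
        if not body.strip():               # blank line
            continue
        if indicator == "-" and logical:
            # Continuation: the first non-blank character follows directly
            # after the last non-blank character of the previous line.
            logical[-1] = logical[-1].rstrip() + body.strip()
        else:
            logical.append(body.rstrip())
    return logical

src = "\n".join([
    "000050* customer record",
    "000100 01  CUSTOMER-REC.",
    "000200     05  CUSTOMER-NA",
    "000300-        ME PIC X(30).",
])
for text in normalize_fixed_format(src):
    print(text)
```

COPY ... REPLACING, REPLACE and nested COPY would each need their own pass on top of this, which is where most of the real work lies.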
I don't want to discourage you, but parsing COBOL is not a trivial task.
On the bright side, if your copybooks have a drop-dead-simple structure to them, as many do, it may be possible to get this done using a cascade of regular expressions. This approach is fairly common among those who need to parse COBOL programs (and copybooks) on software renovation projects. Maybe have a look at: RegReg
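A hypothetical sketch of what such a cascade might look like in Python is below. It is not RegReg, just an illustration: one pass flattens whitespace, the next splits the text into period-terminated entries, and the last applies one small pattern per clause. It assumes comments and sequence numbers have already been removed and that no literal contains a period followed by a space.

```python
import re

# Clause keyword must not be the tail of a hyphenated data name.
NOT_IN_NAME = r"(?<![A-Za-z0-9-])"
PIC_RE = re.compile(NOT_IN_NAME + r"PIC(?:TURE)?\s+(?:IS\s+)?(\S+)", re.I)
OCCURS_RE = re.compile(NOT_IN_NAME + r"OCCURS\s+(\d+)", re.I)
REDEFINES_RE = re.compile(NOT_IN_NAME + r"REDEFINES\s+([A-Za-z0-9-]+)", re.I)

def parse_entries(text):
    # Pass 1: collapse whitespace so entries may span several lines.
    flat = re.sub(r"\s+", " ", text).strip()
    # Pass 2: split at a period followed by whitespace or end of text.
    entries = [e.strip() for e in re.split(r"\.(?:\s+|$)", flat) if e.strip()]
    # Pass 3: level and name first, then one regex per clause.
    rows = []
    for entry in entries:
        head = re.match(r"(\d{1,2})\s+([A-Za-z0-9-]+)", entry)
        if not head:
            continue
        rest = entry[head.end():]
        pic = PIC_RE.search(rest)
        occurs = OCCURS_RE.search(rest)
        redefines = REDEFINES_RE.search(rest)
        rows.append({
            "level": int(head.group(1)),
            "name": head.group(2).upper(),
            "pic": pic.group(1) if pic else None,
            "occurs": int(occurs.group(1)) if occurs else None,
            "redefines": redefines.group(1).upper() if redefines else None,
        })
    return rows

sample = """
01 ORDER-REC.
   05 ORDER-ID PIC 9(8).
   05 ORDER-ALT REDEFINES ORDER-ID PIC X(8).
   05 LINE-ITEM
         OCCURS 10 TIMES.
      10 QTY PIC 9(3).
"""
for row in parse_entries(sample):
    print(row)
```

Each pass stays simple on its own, which is the appeal of the approach, but the assumptions pile up quickly once copybooks stop being drop-dead simple.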
Cheers...