xjc or jaxb2-maven-plugin or maven: weird behavior while compiling XSDs, processing files twice but with altered names?


Not sure where the issue is here; I suspect XJC, but it's being driven by the jaxb2-maven-plugin within Maven, so there are a couple of layers to unpack.

I'm compiling a folder of XSDs and it seems to be processing each file twice, once with the actual filename and once with a slightly altered filename. [This is on OSX, by the way, but I don't think it's a straight case-sensitive filesystem problem at all (as you'll see later).]

Here's the relevant part of the pom.xml:

  <plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>jaxb2-maven-plugin</artifactId>
    <version>1.6</version>
    <executions>
      <execution>
        <id>xjc</id>
        <goals>
          <goal>xjc</goal>
        </goals>
      </execution>
    </executions>
    <configuration>
      <schemaDirectory>src/main/resources</schemaDirectory>
    </configuration>
  </plugin>

The src/main/resources directory contains these XSDs:

ATIS_03_00_74_Local.xsd
ATIS_Partial_03_00_74.xsd
IM_03_00_38_Local.xsd
IM_Partial_03_00_38.xsd
ITIS_3_0_0_Local.xsd
ITIS_Final_3_0_0.xsd
ITIS_Final_3_0_0_for_atis.xsd
ITIS_Local_for_atis.xsd
ITIS_Local_for_im.xsd
LRMS_09_07_Local.xsd
LRMS_Final_09_07.xsd
LRMS_Final_09_07_for_atis.xsd
LRMS_Final_09_07_for_im.xsd
LRMS_Local_for_atis.xsd
LRMS_Local_for_im.xsd
TCIP_4_0_0_Final.xsd
TCIP_4_0_0_Local.xsd
TMDD_Partial_0_0_0.xsd

When I run Maven, it fails on every single file with something like:

[ERROR] file:/Users/dhaskin/clients/cs/onebusaway-nyc/onebusaway-nyc-tcip-api/src/main/resources/atis_Partial_03_00_74.xsd[35,50]
org.xml.sax.SAXParseException: 'RouteRequest' is already defined
...
[ERROR] file:/Users/dhaskin/clients/cs/onebusaway-nyc/onebusaway-nyc-tcip-api/src/main/resources/ATIS_Partial_03_00_74.xsd[22,38]
org.xml.sax.SAXParseException: (related to above error) the first definition appears here
...

Note that the filename in the first error doesn't even exist; it's the same as the second filename (which does exist) with the first underscore-separated word transformed to lowercase (but note that the second word, Partial, is left unchanged).

Looking at the mvn -X output, I'm pretty sure it's XJC itself that's doing this, but I haven't yet been able to determine how to fix it.

Note that this project is a sub-project in a larger Maven project, but I don't think that's relevant. For what it's worth, my Maven command line in the parent project is: mvn -X -U install -pl onebusaway-nyc-tcip-api. (onebusaway-nyc-tcip-api is this sub-project.)


There are 3 answers

Answer by lexicore

This can be caused by many different things.

  • First of all, check your imports and includes, the wrong name very probably comes from one of them. (Credit goes to Xstian).
  • If it does, consider using a catalog file to fix it.
  • Next, as the error points to two different places in a schema file, I would also consider the possibility of an error in the schema itself. What do you have there? Can you show the relevant schema fragments? Is it a public schema we could check?
  • This might be a relevant issue.

OK, here's how I would address this.

Disclaimer: I am the author of the maven-jaxb2-plugin and also the author/lead dev of the OGC Schemas project, which compiles a huge set of GIS schemas.

  • I'd use my own plugin as it
    • fixes a number of XJC issues
    • outputs debug information on schema resolution
  • Put all the schemas in src/main/resources
  • Check all the imports and includes to make sure everything is correct
  • In case of problems with schema locations in imports and includes, I'd write a catalog file which fixes/rewrites the invalid references
  • I'd run the compilation with mvn -X clean install and check the log, specifically the schema resolution part.
  • In case of problems I'd either edit the catalog file or, as a last resort, patch the schemas (sometimes you really have to do this)
  • For patching I'd create a separate "original" copy of schemas and apply a patch using the Maven patch plugin during the build. (Not just simply edit local copies.)

I did all of this in the mentioned OGC Schemas Project.

An example of what the schema resolution log may look like:

REWRITE_SYSTEM: http://www.w3.org
    maven:org.hisrc.w3c:w3c-schemas:jar::!/w3c
resolveSystem(http://schemas.opengis.net/gml/3.2.1/gml.xsd)
Resolved system: http://schemas.opengis.net/gml/3.2.1/gml.xsd
    maven:org.jvnet.ogc:ogc-schemas:jar::!/ogc/gml/3.2.1/gml.xsd
[DEBUG] Resolved dependency resource [Dependency {groupId=org.jvnet.ogc, artifactId=ogc-schemas, version=2.0.1-SNAPSHOT, type=jar, classifier=null, resource=ogc/gml/3.2.1/gml.xsd}] to resource URL [jar:file:/C:/Repository/org/jvnet/ogc/ogc-schemas/2.0.1-SNAPSHOT/ogc-schemas-2.0.1-SNAPSHOT.jar!/ogc/gml/3.2.1/gml.xsd].

This normally sheds some light on what's happening and which schemas get loaded.

Most of the hints are applicable to as well.

Answer by denishaskin

@Xstian had the answer (although I found it before I read the comment).

I was working with a pom.xml that I assumed had been used successfully before, but now I don't believe that's the case.

The issue was that I needed to be compiling only the primary XSD, which was including other XSDs via import. Because I was having XJC compile all of the XSDs in the folder, it was compiling some of them twice and hence the duplicate definitions.

By changing the relevant part of the pom.xml to the following, I no longer have that issue (although I have another one, which I will post separately):

  <plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>jaxb2-maven-plugin</artifactId>
    <version>1.6</version>
    <executions>
      <execution>
        <id>xjc</id>
        <goals>
          <goal>xjc</goal>
        </goals>
      </execution>
    </executions>
    <configuration>
      <schemaDirectory>src/main/resources</schemaDirectory>
      <schemaFiles>TCIP_4_0_0_Final.xsd</schemaFiles>
    </configuration>
  </plugin>
Answer by lexicore

Ok, I have checked your schemas.

Your original problem is due to incorrect imports. For instance, the schema TCIP_4_0_0_Final.xsd has the following imports:

<xs:import namespace="http://www.im-partial-03-00-38" schemaLocation="im_Partial_03_00_38.xsd"/>
<xs:import namespace="http://www.itis-final-3-0-0" schemaLocation="itis_Final_3_0_0.xsd"/>
<xs:import namespace="http://www.lrms-final-09-07" schemaLocation="lrms_Final_09_07.xsd"/>
<xs:import namespace="http://www.tcip-4-0-0-local" schemaLocation="tcip_4_0_0_local.xsd"/>

But the actual files are named with different case:

IM_Partial_03_00_38.xsd
ITIS_Final_3_0_0.xsd
LRMS_Final_09_07.xsd
TCIP_4_0_0_Local.xsd

This is incorrect, since the path part of a URL is case-sensitive. So I would say the import structure of these schemas is invalid.

When you compile the schemas, XJC creates and maintains a hashmap of URL -> schema document, which it uses to avoid loading the same schema twice. URLs (or, more specifically, "system ids") are treated as case-sensitive.
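To make the case-sensitivity point concrete, here is a small sketch using the standard java.net.URI class. The cache below is a hypothetical stand-in for the URL -> document map described above, not XJC's actual code; the `file:/schemas/...` paths are made up for illustration. It shows that URI equality ignores case in the scheme but not in the path, so two spellings of the same file name become two cache entries.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a "system id -> schema document" cache (not XJC code).
class SystemIdCacheDemo {

    /** Returns how many distinct "documents" end up in the cache for the given system ids. */
    static int documentsLoaded(String... systemIds) {
        Map<URI, String> cache = new HashMap<>();
        for (String id : systemIds) {
            // A schema is "parsed" only the first time its exact URI is seen.
            cache.computeIfAbsent(URI.create(id), uri -> "parsed " + uri);
        }
        return cache.size();
    }

    public static void main(String[] args) {
        URI upper = URI.create("file:/schemas/IM_Partial_03_00_38.xsd");
        URI lower = URI.create("file:/schemas/im_Partial_03_00_38.xsd");
        // The path component is compared case-sensitively...
        System.out.println(upper.equals(lower)); // false
        // ...while the scheme is compared case-insensitively.
        System.out.println(upper.equals(URI.create("FILE:/schemas/IM_Partial_03_00_38.xsd"))); // true
        // So both spellings of the same file are cached (and compiled) separately.
        System.out.println(documentsLoaded(upper.toString(), lower.toString())); // 2
    }
}
```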

If you compile all of the schemas (*.xsd), then some of the schemas are included in the compilation set (at least) twice: the first time directly and the second time via a direct or indirect import. So you basically get IM_Partial_03_00_38.xsd twice. And since XJC uses case-sensitive system ids for the schema cache, it thinks these are two different documents and tries to compile the same file twice, which leads to collisions (the errors you get).

If you compile just TCIP_4_0_0_Final.xsd, then each schema is accessed only once. The operating system (on a case-insensitive filesystem, which is the default on macOS and Windows) happily ignores case when opening the files, and everything works.
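The difference between "compile *.xsd" and "compile only the root" can be simulated in a few lines. This is a hypothetical model, not XJC code: it walks the import graph from the given root files, caches by exact (case-sensitive) system id, and then reports ids that a case-insensitive filesystem would open as the same file. Only the four imports of TCIP_4_0_0_Final.xsd quoted in this answer are modeled; the other schemas' imports are left out.

```java
import java.util.*;

// Hypothetical simulation of the compilation set (not XJC code).
class CompileSetDemo {
    // Only the imports of TCIP_4_0_0_Final.xsd quoted in the answer are modeled.
    static final Map<String, List<String>> IMPORTS = Map.of(
        "TCIP_4_0_0_Final.xsd", List.of(
            "im_Partial_03_00_38.xsd", "itis_Final_3_0_0.xsd",
            "lrms_Final_09_07.xsd", "tcip_4_0_0_local.xsd"));

    /** Returns ids that were loaded separately but name the same file when case is ignored. */
    static List<String> collisions(List<String> roots) {
        Set<String> loaded = new LinkedHashSet<>();   // case-sensitive "cache" of system ids
        Deque<String> queue = new ArrayDeque<>(roots);
        while (!queue.isEmpty()) {
            String id = queue.poll();
            if (loaded.add(id)) {                      // first time this exact spelling is seen
                queue.addAll(IMPORTS.getOrDefault(id, List.of()));
            }
        }
        // Group the loaded ids by the file a case-insensitive filesystem would open.
        Map<String, List<String>> byFile = new TreeMap<>();
        for (String id : loaded) {
            byFile.computeIfAbsent(id.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(id);
        }
        List<String> result = new ArrayList<>();
        for (List<String> ids : byFile.values()) {
            if (ids.size() > 1) result.addAll(ids);    // same file, loaded more than once
        }
        return result;
    }

    public static void main(String[] args) {
        // Only the root: every file is loaded once, under a single spelling.
        System.out.println(collisions(List.of("TCIP_4_0_0_Final.xsd"))); // []
        // Root plus a file it imports (what *.xsd does): the same file is loaded twice.
        System.out.println(collisions(List.of("IM_Partial_03_00_38.xsd", "TCIP_4_0_0_Final.xsd")));
        // [IM_Partial_03_00_38.xsd, im_Partial_03_00_38.xsd]
    }
}
```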

I have experimented with your schemas in the following demo project. (The schemas ZIP is downloaded during the build, so it's legally unproblematic.) I had to add a bindings file, but got it to work rather fast. This works on my machine (Windows), but I think it may fail on *nix. Not sure, though.

Then I thought I could use a catalog file to fix the case problem in schema URLs.

To my regret, this did not work easily.

First, I found out that it is not practical to rewrite system ids when compiling local files. The URLs to be rewritten are given as fully-qualified absolute file://.../schema.xsd URLs, so including such rewriting rules in a catalog file would make the catalog file directory- and machine-specific, which is not practical. This is actually an XJC catalog resolver problem, but I will try to address it.
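A quick way to see why such rules are machine-specific: the absolute system id of a local file embeds the checkout location. The sketch below uses java.io.File, whose `toURI()` produces the same `file:/...` form seen in the error messages in the question; the relative path is an assumption for illustration, and the file does not need to exist.

```java
import java.io.File;

// Sketch: the absolute system id of a local schema depends on the checkout directory.
class LocalSystemIdDemo {
    static String systemIdOf(String relativePath) {
        // Resolved against the current working directory, so the prefix differs per machine.
        return new File(relativePath).getAbsoluteFile().toURI().toString();
    }

    public static void main(String[] args) {
        // e.g. file:/Users/<someone>/<checkout>/src/main/resources/im_Partial_03_00_38.xsd
        // A catalog rule matching this prefix would only work in this one directory.
        System.out.println(systemIdOf("src/main/resources/im_Partial_03_00_38.xsd"));
    }
}
```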

Next, I thought that if local file URLs don't work, then absolute URLs would do. Let's pretend we want to compile schemas from inside the ZIP file at the original URL:

<plugin>
    <groupId>org.jvnet.jaxb2.maven2</groupId>
    <artifactId>maven-jaxb2-plugin</artifactId>
    <executions>
        <execution>
            <id>generate</id>
            <goals>
                <goal>generate</goal>
            </goals>
            <configuration>
                <schemaIncludes/>
                <catalog>src/main/resources/catalog.cat</catalog>
                <schemas>
                    <schema>
                        <url>http://www.aptatcip.com/APTA-TCIP-S-01%204.0_files/Schema%20Set.zip!/Schema%20Set/TCIP_4_0_0_Final.xsd</url>
                    </schema>
                </schemas>
            </configuration>
        </execution>
    </executions>
</plugin>

The URL http://www.aptatcip.com/APTA-TCIP-S-01%204.0_files/Schema%20Set.zip!/Schema%20Set/TCIP_4_0_0_Final.xsd does not work, of course. (Remember, we're just pretending.) But it gives an absolute URL which is not machine-specific.

To let XJC resolve schemas from this URL we'll need a catalog file. If we've made local copies of the schemas in src/main/resources, then the catalog file src/main/resources/catalog.cat should map http://www.aptatcip.com/APTA-TCIP-S-01%204.0_files/Schema%20Set.zip!/Schema%20Set to src/main/resources:

REWRITE_SYSTEM "http://www.aptatcip.com/APTA-TCIP-S-01%204.0_files/Schema%20Set.zip!/Schema%20Set/" "./"
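For readers unfamiliar with REWRITE_SYSTEM, here is a minimal sketch of the matching rule as described in the OASIS XML Catalogs specification for rewriteSystem: among the entries whose prefix matches the start of the system id, the longest prefix wins, and that prefix is replaced by the rewrite prefix. This is not the resolver code XJC actually uses; the class and method names are hypothetical, and a real resolver would additionally make "./" absolute against the catalog file's location.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of rewriteSystem prefix matching (longest matching prefix wins).
class RewriteSystemDemo {
    static String rewrite(Map<String, String> rules, String systemId) {
        String bestPrefix = null;
        for (String prefix : rules.keySet()) {
            if (systemId.startsWith(prefix)
                    && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix;
            }
        }
        if (bestPrefix == null) {
            return systemId; // no rule matched: the system id is left untouched
        }
        return rules.get(bestPrefix) + systemId.substring(bestPrefix.length());
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("http://www.aptatcip.com/APTA-TCIP-S-01%204.0_files/Schema%20Set.zip!/Schema%20Set/", "./");
        System.out.println(rewrite(rules,
            "http://www.aptatcip.com/APTA-TCIP-S-01%204.0_files/Schema%20Set.zip!/Schema%20Set/TCIP_4_0_0_Final.xsd"));
        // ./TCIP_4_0_0_Final.xsd
        // A bare relative id does not start with the prefix, so this rule cannot touch it.
        System.out.println(rewrite(rules, "lrms_Final_09_07.xsd")); // lrms_Final_09_07.xsd
    }
}
```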

I thought we could then also rewrite the invalid lowercase URLs, and everybody would be happy.

Well, this worked, but only for the absolute URL http://www.aptatcip.com/APTA-TCIP-S-01%204.0_files/Schema%20Set.zip!/Schema%20Set/TCIP_4_0_0_Final.xsd. XJC was resolving relative imports as plain relative URLs like lrms_Final_09_07.xsd instead of (as I had expected) as absolute URLs like http://www.aptatcip.com/APTA-TCIP-S-01%204.0_files/Schema%20Set.zip!/Schema%20Set/lrms_Final_09_07.xsd.
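The expected behaviour is ordinary relative-reference resolution against the importing schema's URL. The sketch below performs it with java.net.URI.resolve; it shows what the answer expected, not what XJC actually did, and the helper name is hypothetical.

```java
import java.net.URI;

// Sketch of the expected resolution of a relative schemaLocation (not XJC's behaviour).
class RelativeImportDemo {
    static String resolveImport(String baseSchemaUrl, String schemaLocation) {
        // Standard resolution: drop the last path segment of the base, append the relative path.
        return URI.create(baseSchemaUrl).resolve(schemaLocation).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.aptatcip.com/APTA-TCIP-S-01%204.0_files/Schema%20Set.zip!/Schema%20Set/TCIP_4_0_0_Final.xsd";
        System.out.println(resolveImport(base, "lrms_Final_09_07.xsd"));
        // http://www.aptatcip.com/APTA-TCIP-S-01%204.0_files/Schema%20Set.zip!/Schema%20Set/lrms_Final_09_07.xsd
    }
}
```

Had the imports been resolved this way, the prefix rule shown earlier in this answer would have matched them too.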

So, when I was compiling local files, they were resolved as absolute URLs first, but when I tried to compile a schema via an absolute URL, relative imports were resolved as relative URLs.

Nevertheless, in the end I arrived at the following catalog file:

REWRITE_SYSTEM "http://www.aptatcip.com/APTA-TCIP-S-01%204.0_files/Schema%20Set.zip!/Schema%20Set/" "./"
REWRITE_SYSTEM "tmdd_Partial_0_0_0.xsd" "TMDD_Partial_0_0_0.xsd"
REWRITE_SYSTEM "lrms_Final_09_07.xsd" "LRMS_Final_09_07.xsd"
REWRITE_SYSTEM "atis_Partial_03_00_74.xsd" "ATIS_Partial_03_00_74.xsd"
REWRITE_SYSTEM "im_Partial_03_00_38.xsd" "IM_Partial_03_00_38.xsd"
REWRITE_SYSTEM "itis_Final_3_0_0.xsd" "ITIS_Final_3_0_0.xsd"
REWRITE_SYSTEM "tcip_4_0_0_local.xsd" "TCIP_4_0_0_Local.xsd"

REWRITE_SYSTEM "TCIP_4_0_0_Final.xsd" "TCIP_4_0_0_Final.xsd"
REWRITE_SYSTEM "atis_Partial_03_00_74.xsd" "atis_Partial_03_00_74.xsd"
REWRITE_SYSTEM "ITIS_Final_3_0_0_for_atis.xsd" "ITIS_Final_3_0_0_for_atis.xsd"
REWRITE_SYSTEM "ITIS_Local_for_atis.xsd" "ITIS_Local_for_atis.xsd"
REWRITE_SYSTEM "LRMS_Final_09_07_for_atis.xsd" "LRMS_Final_09_07_for_atis.xsd"
REWRITE_SYSTEM "LRMS_Local_for_atis.xsd" "LRMS_Local_for_atis.xsd"
REWRITE_SYSTEM "ATIS_03_00_74_Local.xsd" "ATIS_03_00_74_Local.xsd"
REWRITE_SYSTEM "TMDD_Partial_0_0_0.xsd" "TMDD_Partial_0_0_0.xsd"
REWRITE_SYSTEM "ITIS_Local_for_im.xsd" "ITIS_Local_for_im.xsd"
REWRITE_SYSTEM "LRMS_Final_09_07_for_im.xsd" "LRMS_Final_09_07_for_im.xsd"
REWRITE_SYSTEM "LRMS_Local_for_im.xsd" "LRMS_Local_for_im.xsd"
REWRITE_SYSTEM "IM_03_00_38_Local.xsd" "IM_03_00_38_Local.xsd"
REWRITE_SYSTEM "ITIS_3_0_0_Local.xsd" "ITIS_3_0_0_Local.xsd"
REWRITE_SYSTEM "LRMS_09_07_Local.xsd" "LRMS_09_07_Local.xsd"

Rewriting lowercase file names into uppercase was exactly what I wanted to do. But why I also had to rewrite all the other file names into identical file names is beyond my understanding.

Nevertheless, the catalog file above is three times longer than it should have been, but it works. Here is another demo project which now also builds without errors.

Unfortunately, I have to say that catalogs still do not work satisfactorily. I have filed the following issues in :

To be clear, from my point of view none of these issues are bugs in my maven-jaxb2-plugin. This is something I "inherit" from XJC and the catalog resolver used there.

But I will try to solve it in my plugin.