XSL transformation from EAD to MARC skips over 2nd subject term


I have a very strange problem. I have XML documents encoded in EAD that I'm transforming into MARC records for a library catalog. There is a section of the EAD document that looks like this:

    <controlaccess>
        <list type="simple">
            <item><subject encodinganalog="650" source="lcsh">Prisons -- History -- 19th century</subject></item>
            <item><subject encodinganalog="650" source="lcsh">Prisons -- Statistics -- History -- 19th century</subject></item>
            <item><subject encodinganalog="650" source="lcsh">Prisons -- Statistics -- Extra term 1 -- History -- 19th century</subject></item>
            <item><subject encodinganalog="650" source="lcsh">Prisons -- Statistics -- Extra term 1 -- Extra term 2 -- History -- 19th century</subject></item>
        </list>
    </controlaccess>

The code correctly pulls out each item/subject and creates a MARC field for each one, and each term separated by "--" is put into its own subfield (a, x, y, z, or v, depending on the term).

The code does this properly if there are 1-3 terms in a single subject element, but if there are 4 or more terms, the second term gets left out entirely and the rest of the terms (from the third one on) are extracted properly. I can't figure out why the second term gets skipped over if there are 4+ terms. That's what I'd like your help figuring out.

I'm using XSLT 1.0, and the subject portion of the code looks like this. The template is called from the main template with the correct parameter.

    <xsl:template name="subject_template">
        <xsl:param name="string" />
        <marc:datafield>
            <xsl:choose>
                <xsl:when test="contains($string, '--')!=0">
                    <xsl:variable name="tmp1" select="substring-before($string, '--')" />
                    <xsl:variable name="tmp2" select="substring-after($string, '--')" />
                    <marc:subfield code="a">
                        <xsl:value-of select="$tmp1" />
                    </marc:subfield>
                    <xsl:call-template name="subject_tokenize">
                        <xsl:with-param name="string" select="$tmp2" />
                        <xsl:with-param name="type" select="'x'" />
                    </xsl:call-template>
                </xsl:when>
                <xsl:otherwise>
                    <marc:subfield code="a">
                        <xsl:value-of select="$string" />
                    </marc:subfield>
                </xsl:otherwise>
            </xsl:choose>
        </marc:datafield>
    </xsl:template>

Here is the tokenize template, which is hundreds of lines long, so I have tried to include only what is relevant to my problem. The four variables at the beginning (genx, etc.) each pull from a large list of terms used to determine what the subfield code should be.

<xsl:template name="subject_tokenize">
    <xsl:param name="string" />
    <xsl:param name="type" />
    <xsl:variable name="genx">
        <xsl:call-template name="genx" />
    </xsl:variable>
    <xsl:variable name="geny">
        <xsl:call-template name="geny" />
    </xsl:variable>
    <xsl:variable name="formlist">
        <xsl:call-template name="formlist" />
    </xsl:variable>
    <xsl:variable name="geoglist">
        <xsl:call-template name="geoglist" />
    </xsl:variable>
    <xsl:if test="contains($string, '--')!=0">
        <xsl:variable name="str1" select="substring-before($string, '--')"/>
        <xsl:variable name="str2" select="substring-after($string, '--')"/>
        <xsl:if test="contains($str2, '--')!=0">
            <xsl:variable name="newstr2" select="substring-after($str2, '--')"/>
            <xsl:variable name="tmpvar" select="substring-before($str2, '--')"/>
            <xsl:choose>
                <xsl:when test="testsomething">
                    do stuff
                </xsl:when>
                <xsl:otherwise>
                    <xsl:if test="contains($geoglist, translate($str1, '.', ''))!=0">
                        <marc:subfield code="z">
                            <xsl:value-of select="$str1"/>
                        </marc:subfield>
                        <xsl:if
                            test="contains($formlist, translate(substring-before($str2, '--'), '.', ''))!=0">
                            <marc:subfield code="v">
                                <xsl:value-of select="substring-before($str2, '--')"/>
                            </marc:subfield>
                        </xsl:if>
                        <xsl:if
                            test="contains($geny, translate(substring-before($str2, '--'), '.', ''))!=0">
                            <marc:subfield code="y">
                                <xsl:value-of select="substring-before($str2, '--')"/>
                            </marc:subfield>
                        </xsl:if>
                        <xsl:if
                            test="contains($genx, translate(substring-before($str2, '--'), '.', ''))!=0">
                            <marc:subfield code="x">
                                <xsl:value-of select="substring-before($str2, '--')"/>
                            </marc:subfield>
                        </xsl:if>
                        <xsl:if
                            test="contains($formlist, translate(substring-before($str2, '--'), '.', ''))=0 and contains($genx, translate(substring-before($str2, '--'), '.', ''))=0 and contains($geny, translate(substring-before($str2, '--'), '.', ''))=0">
                            <marc:subfield code="z">
                                <xsl:value-of select="substring-before($str2, '--')"/>
                            </marc:subfield>
                        </xsl:if>
                    </xsl:if>
                    <xsl:if test="contains($formlist, translate($str1, '.', ''))!=0">
                        <marc:subfield code="v">
                            <xsl:value-of select="$str1"/>
                        </marc:subfield>
                    </xsl:if>
                    <xsl:if test="contains($geny, translate($str1, '.', ''))!=0">
                        <marc:subfield code="y">
                            <xsl:value-of select="$str1"/>
                        </marc:subfield>
                    </xsl:if>
                    <xsl:if
                        test="contains($formlist, translate($str1, '.', ''))=0 and contains($geny, translate($str1, '.', ''))!=0">
                        <marc:subfield code="x">
                            <xsl:value-of select="$str1"/>
                        </marc:subfield>
                    </xsl:if>
                    <xsl:if test="contains($geoglist, translate($str1, '.', ''))=0">
                        <xsl:if
                            test="contains($formlist, translate(substring-before($str2, '--'), '.', ''))!=0">
                            <marc:subfield code="v">
                                <xsl:value-of select="substring-before($str2, '--')"/>
                            </marc:subfield>
                        </xsl:if>
                        <xsl:if
                            test="contains($geny, translate(substring-before($str2, '--'), '.', ''))!=0">
                            <marc:subfield code="y">
                                <xsl:value-of select="substring-before($str2, '--')"/>
                            </marc:subfield>
                        </xsl:if>
                        <xsl:if
                            test="contains($geoglist, translate(substring-before($str2, '--'), '.', ''))!=0">
                            <marc:subfield code="z">
                                <xsl:value-of select="substring-before($str2, '--')"/>
                            </marc:subfield>
                        </xsl:if>
                        <xsl:if
                            test="contains($geoglist, translate(substring-before($str2, '--'), '.', ''))=0 and contains($geny, translate(substring-before($str2, '--'), '.', ''))=0 and contains($formlist, translate(substring-before($str2, '--'), '.', ''))=0">
                            <marc:subfield code="x">
                                <xsl:value-of select="substring-before($str2, '--')"/>
                            </marc:subfield>
                        </xsl:if>
                    </xsl:if>
                    <xsl:call-template name="subject_tokenize">
                        <xsl:with-param name="string" select="$newstr2"/>
                        <xsl:with-param name="type" select="'x'"/>
                    </xsl:call-template>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:if>

My output looks like this:

=650  \0$aPrisons $x History $x 19th century
=650  \0$aPrisons $x History $x 19th century
=650  \0$aPrisons $x Extra term 1 $x History $x 19th century
=650  \0$aPrisons $x Extra term 1 $x Extra term 2 $x History $x 19th century

The first 650 field is correct. The following 3 are all missing the second term, "Statistics." This is just an example and has been replicated with different terms, different ordering of terms, and/or different quantity of terms. I assume the problem lies in the XSL code I showed because that's the only part of the code that should be affecting the example I provided. If nobody finds any errors in the XSL snippet, perhaps someone could take a look at the full XSL.

UPDATE: Here is a link (https://drive.google.com/folderview?id=0B647OE0WvD5-RFFPMjhqSjk3cVE&usp=sharing) to all of the files. This includes the entire XSL and XML, an additional XSL that gets imported, the resulting output MRC file, and a TXT version of the MRC file for easier viewing.

Answer by Bryn Lewis:

I would change this:

<xsl:variable name="str1" select="substring-before($string, '--')"/>
<xsl:variable name="str2" select="substring-after($string, '--')"/>

to this:

<xsl:variable name="str1" select="substring-before($string, '--')"/>
<xsl:variable name="str2" select="substring-after($string, concat($str1,'--'))"/>

The intent is to make sure that str2 starts after the '--' that you think it does.
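
One caveat: since $str1 is by definition everything before the first '--', concat($str1,'--') is always a prefix of $string, so both versions of str2 should evaluate to the same value. Tracing the snippet as posted with " Statistics -- History -- 19th century" points somewhere else. When $str2 still contains '--', the only place $str1 can be written as subfield x is the xsl:if that tests contains($geny, ...)!=0, so a term that is in none of the lists (like "Statistics") is never output, while the term after it is. With three or fewer terms that branch is never entered, which would match the symptom. A possible fix, assuming the intent is "fall back to x when $str1 is in none of the lists", is sketched below. It is a fragment meant to replace that xsl:if, not a standalone stylesheet. The extra $geoglist check is my own assumption (to avoid writing a geographic term as both z and x), and the elided xsl:when branches may behave differently.

```xml
<!-- Fallback for $str1: write subfield x only when the term matches none of the lists. -->
<!-- Assumption: the original "contains($geny, ...)!=0" was meant to be "=0". -->
<!-- Assumption: geographic terms already written as z above should not also get x. -->
<xsl:if
    test="contains($geoglist, translate($str1, '.', ''))=0 and contains($formlist, translate($str1, '.', ''))=0 and contains($geny, translate($str1, '.', ''))=0">
    <marc:subfield code="x">
        <xsl:value-of select="$str1"/>
    </marc:subfield>
</xsl:if>
```

With this change, a term that matches none of the lists would be written as subfield x before the term that follows it, instead of being dropped.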