I have a very strange problem. I have XML documents encoded in EAD that I'm transforming into MARC records for a library catalog. There is a section of the EAD document that looks like this:
<controlaccess>
  <list type="simple">
    <item><subject encodinganalog="650" source="lcsh">Prisons -- History -- 19th century</subject></item>
    <item><subject encodinganalog="650" source="lcsh">Prisons -- Statistics -- History -- 19th century</subject></item>
    <item><subject encodinganalog="650" source="lcsh">Prisons -- Statistics -- Extra term 1 -- History -- 19th century</subject></item>
    <item><subject encodinganalog="650" source="lcsh">Prisons -- Statistics -- Extra term 1 -- Extra term 2 -- History -- 19th century</subject></item>
  </list>
</controlaccess>
The code correctly pulls out each item/subject and creates a MARC field for it. Within a subject, each term separated by "--" is supposed to go into its own subfield (a, x, y, v, or z, depending on the term).
The code does this properly if there are 1-3 terms in a single subject element, but if there are 4 or more terms, the second term gets left out entirely and the rest of the terms (from the third one on) are extracted properly. I can't figure out why the second term gets skipped over if there are 4+ terms. That's what I'd like your help figuring out.
I'm using XSLT 1.0, and the subject portion of the code looks like this. The parameter is passed in correctly from the main template.
<xsl:template name="subject_template">
  <xsl:param name="string" />
  <marc:datafield>
    <xsl:choose>
      <xsl:when test="contains($string, '--')!=0">
        <xsl:variable name="tmp1" select="substring-before($string, '--')" />
        <xsl:variable name="tmp2" select="substring-after($string, '--')" />
        <marc:subfield code="a">
          <xsl:value-of select="$tmp1" />
        </marc:subfield>
        <xsl:call-template name="subject_tokenize">
          <xsl:with-param name="string" select="$tmp2" />
          <xsl:with-param name="type" select="'x'" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <marc:subfield code="a">
          <xsl:value-of select="$string" />
        </marc:subfield>
      </xsl:otherwise>
    </xsl:choose>
  </marc:datafield>
</xsl:template>
Here is the tokenize template. The full template is hundreds of lines long, so I've included only the parts that seem relevant to my problem. The four variables at the beginning (genx, geny, formlist, and geoglist) each pull from a huge list of terms and are used to determine what the subfield code should be.
<xsl:template name="subject_tokenize">
  <xsl:param name="string" />
  <xsl:param name="type" />
  <xsl:variable name="genx">
    <xsl:call-template name="genx" />
  </xsl:variable>
  <xsl:variable name="geny">
    <xsl:call-template name="geny" />
  </xsl:variable>
  <xsl:variable name="formlist">
    <xsl:call-template name="formlist" />
  </xsl:variable>
  <xsl:variable name="geoglist">
    <xsl:call-template name="geoglist" />
  </xsl:variable>
  <xsl:if test="contains($string, '--')!=0">
    <xsl:variable name="str1" select="substring-before($string, '--')"/>
    <xsl:variable name="str2" select="substring-after($string, '--')"/>
    <xsl:if test="contains($str2, '--')!=0">
      <xsl:variable name="newstr2" select="substring-after($str2, '--')"/>
      <xsl:variable name="tmpvar" select="substring-before($str2, '--')"/>
      <xsl:choose>
        <xsl:when test="testsomething">
          do stuff
        </xsl:when>
        <xsl:otherwise>
          <xsl:if test="contains($geoglist, translate($str1, '.', ''))!=0">
            <marc:subfield code="z">
              <xsl:value-of select="$str1"/>
            </marc:subfield>
            <xsl:if test="contains($formlist, translate(substring-before($str2, '--'), '.', ''))!=0">
              <marc:subfield code="v">
                <xsl:value-of select="substring-before($str2, '--')"/>
              </marc:subfield>
            </xsl:if>
            <xsl:if test="contains($geny, translate(substring-before($str2, '--'), '.', ''))!=0">
              <marc:subfield code="y">
                <xsl:value-of select="substring-before($str2, '--')"/>
              </marc:subfield>
            </xsl:if>
            <xsl:if test="contains($genx, translate(substring-before($str2, '--'), '.', ''))!=0">
              <marc:subfield code="x">
                <xsl:value-of select="substring-before($str2, '--')"/>
              </marc:subfield>
            </xsl:if>
            <xsl:if test="contains($formlist, translate(substring-before($str2, '--'), '.', ''))=0 and contains($genx, translate(substring-before($str2, '--'), '.', ''))=0 and contains($geny, translate(substring-before($str2, '--'), '.', ''))=0">
              <marc:subfield code="z">
                <xsl:value-of select="substring-before($str2, '--')"/>
              </marc:subfield>
            </xsl:if>
          </xsl:if>
          <xsl:if test="contains($formlist, translate($str1, '.', ''))!=0">
            <marc:subfield code="v">
              <xsl:value-of select="$str1"/>
            </marc:subfield>
          </xsl:if>
          <xsl:if test="contains($geny, translate($str1, '.', ''))!=0">
            <marc:subfield code="y">
              <xsl:value-of select="$str1"/>
            </marc:subfield>
          </xsl:if>
          <xsl:if test="contains($formlist, translate($str1, '.', ''))=0 and contains($geny, translate($str1, '.', ''))!=0">
            <marc:subfield code="x">
              <xsl:value-of select="$str1"/>
            </marc:subfield>
          </xsl:if>
          <xsl:if test="contains($geoglist, translate($str1, '.', ''))=0">
            <xsl:if test="contains($formlist, translate(substring-before($str2, '--'), '.', ''))!=0">
              <marc:subfield code="v">
                <xsl:value-of select="substring-before($str2, '--')"/>
              </marc:subfield>
            </xsl:if>
            <xsl:if test="contains($geny, translate(substring-before($str2, '--'), '.', ''))!=0">
              <marc:subfield code="y">
                <xsl:value-of select="substring-before($str2, '--')"/>
              </marc:subfield>
            </xsl:if>
            <xsl:if test="contains($geoglist, translate(substring-before($str2, '--'), '.', ''))!=0">
              <marc:subfield code="z">
                <xsl:value-of select="substring-before($str2, '--')"/>
              </marc:subfield>
            </xsl:if>
            <xsl:if test="contains($geoglist, translate(substring-before($str2, '--'), '.', ''))=0 and contains($geny, translate(substring-before($str2, '--'), '.', ''))=0 and contains($formlist, translate(substring-before($str2, '--'), '.', ''))=0">
              <marc:subfield code="x">
                <xsl:value-of select="substring-before($str2, '--')"/>
              </marc:subfield>
            </xsl:if>
          </xsl:if>
          <xsl:call-template name="subject_tokenize">
            <xsl:with-param name="string" select="$newstr2"/>
            <xsl:with-param name="type" select="'x'"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:if>
My output looks like this:
=650 \0$aPrisons $x History $x 19th century
=650 \0$aPrisons $x History $x 19th century
=650 \0$aPrisons $x Extra term 1 $x History $x 19th century
=650 \0$aPrisons $x Extra term 1 $x Extra term 2 $x History $x 19th century
The first 650 field is correct. The following 3 are all missing the second term, "Statistics." This is just an example and has been replicated with different terms, different ordering of terms, and/or different quantity of terms. I assume the problem lies in the XSL code I showed because that's the only part of the code that should be affecting the example I provided. If nobody finds any errors in the XSL snippet, perhaps someone could take a look at the full XSL.
UPDATE: Here is a link (https://drive.google.com/folderview?id=0B647OE0WvD5-RFFPMjhqSjk3cVE&usp=sharing) to all of the files. This includes the entire XSL and XML, an additional XSL that gets imported, the resulting output MRC file, and a TXT version of the MRC file for easier viewing.
I would change this:
to this:
This ensures that str2 starts after the '--' you think it does.
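Separately from the change above, here is a minimal sketch of a tokenizer that peels off exactly one term per recursive call, so no term can be consumed without being written out. It is not the original stylesheet: the template name split_terms is hypothetical, every subdivision gets a fixed code of 'x' in place of the lookups against genx, geny, formlist, and geoglist, the tag and indicators are taken from the "=650 \0" shown in the output, and the input is assumed to have no namespace, as in the sample above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:marc="http://www.loc.gov/MARC21/slim">

  <xsl:output method="xml" indent="yes"/>

  <!-- One datafield per subject; assumes un-namespaced EAD as in the sample. -->
  <xsl:template match="/">
    <marc:collection>
      <xsl:for-each select="//subject">
        <marc:datafield tag="650" ind1=" " ind2="0">
          <xsl:call-template name="split_terms">
            <xsl:with-param name="string" select="string(.)"/>
            <xsl:with-param name="code" select="'a'"/>
          </xsl:call-template>
        </marc:datafield>
      </xsl:for-each>
    </marc:collection>
  </xsl:template>

  <!-- Hypothetical helper: emit the first term, then recurse on the remainder. -->
  <xsl:template name="split_terms">
    <xsl:param name="string"/>
    <xsl:param name="code" select="'x'"/>
    <xsl:choose>
      <xsl:when test="contains($string, '--')">
        <!-- Only the term before the first '--' is handled in this call. -->
        <marc:subfield code="{$code}">
          <xsl:value-of select="normalize-space(substring-before($string, '--'))"/>
        </marc:subfield>
        <!-- Everything after the first '--' goes to the next call. -->
        <!-- Placeholder: a real version would choose x, y, v, or z per term here. -->
        <xsl:call-template name="split_terms">
          <xsl:with-param name="string" select="substring-after($string, '--')"/>
          <xsl:with-param name="code" select="'x'"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- Last (or only) term: no separator left. -->
        <marc:subfield code="{$code}">
          <xsl:value-of select="normalize-space($string)"/>
        </marc:subfield>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Because each call writes one subfield and passes on only what follows the first '--', the number of subfields always equals the number of terms. The per-term code lookup could then live in a single place instead of being repeated for $str1 and for substring-before($str2, '--').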