SXLM0001: Too many nested apply-templates calls. saxon 8

SXLM0001: Too many nested apply-templates calls. saxon 8

Dave Pawson-2
Tiny source file,
single template in the xslt file.
200K external xml file loaded into a variable (but not transformed)
Calls the identity transform (imported) which is the source
of the reported error.

Can I get round this precaution?

regards DaveP




<xsl:variable name="acronyms"
    select="document('acronyms.xml')/acronyms" />
<xsl:variable name="acronym-regex" as="xs:string"
    select="string-join($acronyms/acr, '|')" />

<xsl:template match="p|pre" priority="0.6">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:analyze-string select="." regex="{$acronym-regex}">
      <xsl:matching-substring>
        <acronym><xsl:value-of select="." /></acronym>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="." />
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:copy>
</xsl:template>






--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk


-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help

Re: SXLM0001: Too many nested apply-templates calls. saxon 8

Christian Roth-4
Dave Pawson wrote:

>    <xsl:copy>
>      <xsl:copy-of select="@*"/>
>      <xsl:analyze-string select="." regex="{$acronym-regex}">
>        <xsl:matching-substring>
>            <acronym><xsl:value-of select="." /></acronym>
>            </xsl:matching-substring>
>           <xsl:non-matching-substring>
>              <xsl:value-of select="." />
>            </xsl:non-matching-substring>
>          </xsl:analyze-string>
>        </xsl:copy>

This looks very much to be the same issue I reported under the subject
"[saxon] Regex, Mac OS X VM differences and a console message" a few
days ago. We were seeing it only on Mac OS X JVM 1.4.2 at that time, but
in the meantime have been getting reports of this error message on some
versions of the Java 1.4.x runtime on Windows as well, though we were
not able to reproduce it on our own Windows installations so far. We are
in the process of collecting and narrowing down the variables to produce a
test case small enough to be usable - something you seem to
have already succeeded at. :-)

The important thing in common seems to be the use of regular
expressions. I'll be watching this thread and hopefully be able to add
another test case shortly.

Regards, Christian.




RE: SXLM0001: Too many nested apply-templates calls. saxon 8

Michael Kay
In reply to this post by Dave Pawson-2
The error message just means "stack overflow". The "Too many nested
apply-templates calls" is a bad guess at the reason.

What seems to be happening is that you've constructed a rather large regular
expression programmatically, and the Java regex engine can't evaluate it
with the stack space available. How large is the regex, as a matter of
interest?
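
To answer the size question on your own data, a minimal sketch along these lines may help. It assumes the `acronyms.xml` layout shown in this thread (`acronyms/acr`) and an XSLT 2.0 processor. It only reports how big the generated pattern is and never evaluates it, so it should not hit the stack problem itself.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Same two variables as in the original post -->
  <xsl:variable name="acronyms"
      select="document('acronyms.xml')/acronyms"/>
  <xsl:variable name="acronym-regex" as="xs:string"
      select="string-join($acronyms/acr, '|')"/>

  <!-- Report the number of alternatives and the pattern length;
       the pattern is measured here, not used for matching -->
  <xsl:template match="/">
    <xsl:message>
      <xsl:value-of select="concat('alternatives: ', count($acronyms/acr),
                                   ', regex length: ', string-length($acronym-regex))"/>
    </xsl:message>
  </xsl:template>

</xsl:stylesheet>
```

Run against any small source document, this writes one line to the message output, which is enough to see whether the pattern is a few hundred characters or a few hundred thousand.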

I would have thought that there were better ways of achieving what you're
trying to do: perhaps you could explain the problem you are trying to solve?

Michael Kay
http://www.saxonica.com/ 






Re: SXLM0001: Too many nested apply-templates calls. saxon 8

Dave Pawson-2
On 13/03/06, Michael Kay <[hidden email]> wrote:

> What seems to be happening is that you've constructed a rather large regular
> expression programmatically, and the Java regex engine can't evaluate it
> with the stack space available. How large is the regex, as a matter of
> interest?
..... About 200K?

>
> I would have thought that there were better ways of achieving what you're
> trying to do: perhaps you could explain the problem you are trying to solve?

Problem. Pick out the acronyms from 6000 documents on Gutenberg.
Solution
http://www.dpawson.co.uk/xsl/rev2/regex2.html#d14001e157
(I blame Jeni :-)

Then pick a list of 12000 acronyms,
mark them up and process.

Result? The stack issue.

I'm working with 1GB of memory, but not declaring it to the jvm.

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk



RE: SXLM0001: Too many nested apply-templates calls. saxon 8

Michael Kay
I don't really know how well a general-purpose regex engine is going to cope
with a regex expression of 200K characters, but it certainly wouldn't
surprise me if it splutters and dies.

I don't think you need it here. Construct an XML document containing the
acronyms and their definitions, and use keys to index it.
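
A rough sketch of what a key-indexed lookup document could look like, assuming XSLT 2.0 and its three-argument `key()`. The `expansion` attribute and the `term` source element are invented for illustration and do not come from anyone's actual files in this thread.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed lookup file (the expansion attribute is hypothetical):
       <acronyms>
         <acr expansion="International Business Machines">IBM</acr>
       </acronyms> -->
  <xsl:variable name="acronyms"
      select="document('acronyms.xml')/acronyms"/>

  <!-- Index every acr element by its text content -->
  <xsl:key name="acronyms" match="acr" use="."/>

  <!-- Hypothetical source element such as <term>IBM</term>.
       The third argument of key() points the lookup at the
       external document instead of the source tree. -->
  <xsl:template match="term">
    <xsl:variable name="hit" select="key('acronyms', ., $acronyms)"/>
    <xsl:choose>
      <xsl:when test="$hit">
        <acronym title="{$hit[1]/@expansion}">
          <xsl:value-of select="."/>
        </acronym>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

The point of the design is that the cost of a lookup no longer depends on the length of the list in the way one giant alternation does; the processor builds the index once and each test is a single keyed access.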

Michael Kay
http://www.saxonica.com/






Re: SXLM0001: Too many nested apply-templates calls. saxon 8

Dave Pawson-2
On 14/03/06, Michael Kay <[hidden email]> wrote:
> I don't really know how well a general-purpose regex engine is going to cope
> with a regex expression of 200K characters, but it certainly wouldn't
> surprise me if it splutters and dies.
Ditto, but it is so elegant :-)

>
> I don't think you need it here. Construct an XML document containing the
> acronyms and their definitions, and use keys to index it.

I have the acronyms in such a file.
What that omits is matching any of the acronyms within the CDATA
of the source document?

I can't see how keys can be used in place of a regex Michael?

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk



RE: SXLM0001: Too many nested apply-templates calls. saxon 8

Michael Kay
 
>
> I can't see how keys can be used in place of a regex Michael?

You need to show me cut-down samples of the input and output files. So far I
haven't seen the detail of the problem, only an outline description.

Michael Kay





Re: SXLM0001: Too many nested apply-templates calls. saxon 8

Dave Pawson-2
On 14/03/06, Michael Kay <[hidden email]> wrote:
>
> >
> > I can't see how keys can be used in place of a regex Michael?
>
> You need to show me cut-down samples of the input and output files. So far I
> haven't seen the detail of the problem, only an outline description.

XSL

<!-- External acronyms -->
<xsl:variable name="acronyms"
    select="document('acronyms.xml')/acronyms" />

<xsl:variable name="acronym-regex" as="xs:string"
    select="string-join($acronyms/acr, '|')" />

<xsl:template match="p|pre" priority="0.6">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:analyze-string select="." regex="{$acronym-regex}">
      <xsl:matching-substring>
        <acronym><xsl:value-of select="." /></acronym>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="." />
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:copy>
</xsl:template>

I'm only processing p and pre elements as yet.
I import the identity transform.

Input is a gutenberg text, basically any sequence of p and pre elements.


E.g.

<pre>

   [NOTE by the Project Gutenberg Contributor of this file:

   This etext was prepared by Alan. R. Light To assure a high quality text,
   the original was typed in (manually) twice and electronically compared.
   Italicized words or phrases are CAPITALIZED.
</pre>

(Which is why I'm failing badly with plain regex)

The acronyms file looks like
<?xml version="1.0" encoding="utf-8" ?>
<acronyms>
<acr>0TLP</acr>
<acr>10K</acr>
<acr>10Q</acr>
<acr>10bt</acr>
<acr>12AF</acr>
<acr>1394</acr>
<acr>143</acr>
<acr>1FTR</acr>
<acr>1LT</acr>
<acr>24-7</acr>
<acr>24KHGE</acr>
<acr>2B1Q</acr>
<acr>2D</acr>
<acr>2DEG</acr>
<acr>2IC</acr>
<acr>2Lt</acr>
<acr>2PC</acr>
<acr>2h0t4u</acr>
<acr>3ACC</acr>
<acr>3AF</acr>
<acr>3D</acr>
<acr>3DMF</acr>
<acr>3DS</acr>
<acr>3M</acr>
</acronyms>

Yes, could be shortened a little.


I think that presents the problem.

If we can crack this, we have synthetic voice versions
of most of the Gutenberg texts for our customers.


regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk



Re: SXLM0001: Too many nested apply-templates calls. saxon 8

Andrew Welch
On 3/14/06, Dave Pawson <[hidden email]> wrote:

> I can't see how keys can be used in place of a regex Michael?

I'll jump in too if that's ok...

I would create a key:

<xsl:key name="acronyms" match="acr" use="."/>

and then check if the acronym exists by seeing if it's in the key:

<xsl:choose>
  <xsl:when test="key('acronyms', ., $acronyms)">
    <acronym>....</acronym>
  </xsl:when>
  <xsl:otherwise>
    <xsl:value-of select="."/>
  </xsl:otherwise>
</xsl:choose>

cheers
andrew



RE: SXLM0001: Too many nested apply-templates calls. saxon 8

Michael Kay
The only change I would make to Andrew's code is to use templates:

<xsl:template match="*[key('acronyms', ., $acronyms)]">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <acronym><xsl:value-of select="."/></acronym>
  </xsl:copy>
</xsl:template>

<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:value-of select="."/>
  </xsl:copy>
</xsl:template>

However, I'm not quite sure this is what you want. In your code
<p>IBMICL</p> would expand to
<p><acronym>IBM</acronym><acronym>ICL</acronym></p> if IBM and ICL are both
acronyms. Is this what you want? I would have thought that if you are
looking for acronyms anywhere in the text, then you need to do some kind of
tokenization of the text first, and then process each token to see if it is
an acronym. It might well be appropriate to use a regex for the
tokenization, but testing whether the token is an acronym is still best done
using a key.

Michael Kay






Re: SXLM0001: Too many nested apply-templates calls. saxon 8

Dave Pawson-2
On 14/03/06, Michael Kay <[hidden email]> wrote:
> The only change I would make to Andrew's code is to use templates:

<snip/>

> However, I'm not quite sure this is what you want. In your code
> <p>IBMICL</p> would expand to
> <p><acronym>IBM</acronym><acronym>ICL</acronym></p> if IBM and ICL are both
> acronyms. Is this what you want?
Clearly not in this example! This usage is to feed a text to speech engine
so my focus is on pronunciation.


> I would have thought that if you are
> looking for acronyms anywhere in the text, then you need to do some kind of
> tokenization of the text first, and then process each token to see if it is
> an acronym. It might well be appropriate to use a regex for the
> tokenization, but testing whether the token is an acronym is still best done
> using a key.

David C sent me a message (didn't arrive on the list, or I didn't see it)
which suggested that.
Use a simple regex (I'd probably use \b[A-Z]+\b to get isolated words),
then as you suggest use the key.

I want to see how that works out in practice. Because Gutenberg comes
from so many sources it's not going to be easy to find something common
across all sources :-)

Thanks Michael (Andrew and David!)

I'm off for two days, but I'll let you know how I get on later.


--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk



Re: SXLM0001: Too many nested apply-templates calls. saxon 8

Dave Pawson-2
On 15/03/06, Dave Pawson <[hidden email]> wrote:

> I'm off for two days, but I'll let you know how I get on later.

Half way.

A solution.
Unsure if my key usage is performant, but with
a 360K acronyms file the processing time is negligible.

Mike's book, page 334, implies I can use variables in the key -
but provides no example, so I used Jeni's example, page 517.

<xsl:key name="acronyms" match="a" use="."/>

<xsl:variable name="acronym-regex" select="'[A-Z0-9]{2,}'"/>

<xsl:template match="p|pre" priority="0.6">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:analyze-string select="." regex="{$acronym-regex}">
      <xsl:matching-substring>
        <xsl:choose>
          <xsl:when test="key('acronyms',.,doc('acronyms.xml')/acronyms)">
            <acronym><xsl:value-of select="." /></acronym>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="." />
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:copy>
</xsl:template>

The acronyms.xml file has this format:
<acronyms>
<a>$$</a>
<a>0G</a>
<a>0TLP</a>
<a>100VG</a>
<a>10BASE-T</a>

Tested with a tiny input file, the weaknesses are shown.

<doc>

<p>Test for ACR and RNIB NONaACRONYM</p>
</doc>

output is

<doc>

   <p>Test for <acronym>ACR</acronym> and <acronym>RNIB</acronym>
      <acronym>NON</acronym>a<acronym>ACRONYM</acronym>
   </p>
</doc>

David C pointed out a late change request which blocks
the non capturing \b.
He points out that non capturing regexp expressions are "not germane"
to XPath, don't you know?

http://www.w3.org/Bugs/Public/show_bug.cgi?id=2732

Michael. How do I get a feature request in for the next revision please?

I can't get negative lookbehind either so I can't do it that way.

So I've got half a solution....


Unless you can see a better solution.

regards







--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk



Re: SXLM0001: Too many nested apply-templates calls. saxon 8

Dave Pawson-2
An improvement. Now picks out and ignores midword acronyms.

<xsl:key name="acronyms" match="a" use="."/>

<xsl:variable name="acronym-regex" select="'\p{L}{2,}'"/>

<xsl:template match="p|pre" priority="0.6">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:analyze-string select="." regex="{$acronym-regex}">
      <xsl:matching-substring>
        <xsl:choose>
          <xsl:when test="key('acronyms',.,doc('acronyms.xml')/acronyms)">
            <acronym><xsl:value-of select="." /></acronym>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="." />
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:copy>
</xsl:template>




Thanks for the help. That will do until I can improve the regex.
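
One possible next step for the regex, offered as a sketch rather than a tested fix: `\p{L}{2,}` matches letters only, so list entries containing digits (the earlier sample had `10K` and `3M`) are split at the digits and never reach the key as one token. The variant below treats a token as a run of letters or digits. The variable names `acronym-list` and `token-regex` are made up here, and the identity template is inlined only so the stylesheet stands alone.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:key name="acronyms" match="a" use="."/>

  <!-- Lookup document bound to a variable so the template reads more simply -->
  <xsl:variable name="acronym-list" select="doc('acronyms.xml')/acronyms"/>

  <!-- Assumption: a token is a run of two or more letters or digits -->
  <xsl:variable name="token-regex" select="'[\p{L}\p{N}]{2,}'"/>

  <!-- Identity transform, normally imported -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p|pre" priority="0.6">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:analyze-string select="." regex="{$token-regex}">
        <xsl:matching-substring>
          <xsl:choose>
            <!-- The whole token must be in the list, so NONaACRONYM
                 is looked up as one string and left alone -->
            <xsl:when test="key('acronyms', ., $acronym-list)">
              <acronym><xsl:value-of select="."/></acronym>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="."/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Entries containing punctuation, such as `24-7`, `10BASE-T` or `$$`, would still be missed by this token pattern, and single-character tokens are never tested.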

regards
--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

