Matching pattern with key()

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Matching pattern with key()

Imsieke, Gerrit, le-tex
I’m using generated XSLT 2.0 stylesheets with templates of the form:

<xsl:template match="key('by-srcpath', '/some/string')">

Their purpose is to match elements with certain values of their @srcpath
attribute.

When I recently migrated from Saxon 9.5 to 9.6, I noticed that a
stylesheet that used to take less than a second would run for hours.

In order to see what’s going on, I put the key’s @use expression in a
function and used -TP to count how often the function was called.

The source document contains 3231 @srcpath attributes. In Saxon 9.5.1.8,
the function is called 3231 times. In Saxon 9.6.0.5, the function is
called 63,576,387 times.

I boiled this down to the following test case:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:my="my"
  version="2.0" exclude-result-prefixes="xs">

  <xsl:function name="my:key" as="xs:string?">
    <xsl:param name="arg" as="xs:string?"/>
    <xsl:message select="'arg', $arg"/>
    <xsl:sequence select="$arg"/>
  </xsl:function>

  <xsl:key name="by-key" match="*[@key]" use="my:key(@key)"/>

  <xsl:template match="@* | *">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="key('by-key', 'c')"/>
</xsl:stylesheet>

Applied to the following input,

<doc>
  <p key="a">A</p>
  <p key="b">B</p>
  <p key="c">C</p>
</doc> ,

it yields

arg a
arg b
arg c
arg a
arg b
arg c
arg a
arg b
arg c
arg a
arg b
arg c
arg a
arg b
arg c
arg a
arg b
arg c
arg a
arg b
arg c
arg a
arg b
arg c
arg a
arg b
arg c
arg a
arg b
arg c
arg a
arg b
arg c
arg a
arg b
arg c
arg a
arg b
arg c
<?xml version="1.0" encoding="UTF-8"?><doc>
  <p key="a">A</p>
  <p key="b">B</p>

</doc>

This amounts to 39 function invocations.

One could argue that because of the -TP or the message, the processor
isn’t able to optimize. However, in my generated code, I usually don’t
output messages or use -TP measurements, and still it takes hours.

Saxon 9.5 gives this result for the example above:

arg a
arg b
arg c
<?xml version="1.0" encoding="UTF-8"?><doc>
  <p key="a">A</p>
  <p key="b">B</p>

</doc>

39 is the number of attributes and nodes (excluding the document node)
times the number of computed keys. If I remove two whitespace text
nodes, the count will decrease to 33 (11×3). If I remove another p
element (including its attribute and text node), the count is 16 (8×2).
If I put a comment around the p element, the count is 18.

If I compute a key for every node() instead of elements, the count is
143 – 11 nodes incl. doc node, times (10 non-doc nodes plus 3 atts).

As a workaround, I’m going to switch to patterns as in
<xsl:template match="*[my:key(@key) = 'c']"/>

Is the 9.6 phenomenon a side effect of XSLT 3 pattern matching?

Or am I using “pattern match by key()” on the unfounded assumption that
it will work like “pattern match by id()”?

Gerrit

------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 
Reply | Threaded
Open this post in threaded view
|

Re: Matching pattern with key()

Michael Kay
Thanks for reporting it. I am investigating. The problem appears to occur in Saxon-HE but not in Saxon-EE.

Michael Kay
Saxonica
[hidden email]
+44 (0) 118 946 5893




On 28 Apr 2015, at 20:54, Imsieke, Gerrit, le-tex <[hidden email]> wrote:

> I’m using generated XSLT 2.0 stylesheets with templates of the form:
>
> <xsl:template match="key('by-srcpath', '/some/string')">
>
> Their purpose is to match elements with certain values of their @srcpath
> attribute.
>
> When I recently migrated from Saxon 9.5 to 9.6, I noticed that a
> stylesheet that used to take less than a second would run for hours.
>
> In order to see what’s going on, I put the key’s @use expression in a
> function and used -TP to count how often the function was called.
>
> The source document contains 3231 @srcpath attributes. In Saxon 9.5.1.8,
> the function is called 3231 times. In Saxon 9.6.0.5, the function is
> called 63,576,387 times.
>
> I boiled this down to the following test case:
>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>  xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:my="my"
>  version="2.0" exclude-result-prefixes="xs">
>
>  <xsl:function name="my:key" as="xs:string?">
>    <xsl:param name="arg" as="xs:string?"/>
>    <xsl:message select="'arg', $arg"/>
>    <xsl:sequence select="$arg"/>
>  </xsl:function>
>
>  <xsl:key name="by-key" match="*[@key]" use="my:key(@key)"/>
>
>  <xsl:template match="@* | *">
>    <xsl:copy>
>      <xsl:apply-templates select="@*, node()"/>
>    </xsl:copy>
>  </xsl:template>
>
>  <xsl:template match="key('by-key', 'c')"/>
> </xsl:stylesheet>
>
> Applied to the following input,
>
> <doc>
>  <p key="a">A</p>
>  <p key="b">B</p>
>  <p key="c">C</p>
> </doc> ,
>
> it yields
>
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> <?xml version="1.0" encoding="UTF-8"?><doc>
>  <p key="a">A</p>
>  <p key="b">B</p>
>
> </doc>
>
> This amounts to 39 function invocations.
>
> One could argue that because of the -TP or the message, the processor
> isn’t able to optimize. However, in my generated code, I usually don’t
> output messages or use -TP measurements, and still it takes hours.
>
> Saxon 9.5 gives this result for the example above:
>
> arg a
> arg b
> arg c
> <?xml version="1.0" encoding="UTF-8"?><doc>
>  <p key="a">A</p>
>  <p key="b">B</p>
>
> </doc>
>
> 39 is the number of attributes and nodes (excluding the document node)
> times the number of computed keys. If I remove two whitespace text
> nodes, the count will decrease to 33 (11×3). If I remove another p
> element (including its attribute and text node), the count is 16 (8×2).
> If I put a comment around the p element, the count is 18.
>
> If I compute a key for every node() instead of elements, the count is
> 143 – 11 nodes incl. doc node, times (10 non-doc nodes plus 3 atts).
>
> As a workaround, I’m going to switch to patterns as in
> <xsl:template match="*[my:key(@key) = 'c']"/>
>
> Is the 9.6 phenomenon a side effect of XSLT 3 pattern matching?
>
> Or am I using “pattern match by key()” on the unfounded assumption that
> it will work like “pattern match by id()”?
>
> Gerrit
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> saxon-help mailing list archived at http://saxon.markmail.org/
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/saxon-help


------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 
Reply | Threaded
Open this post in threaded view
|

Re: Matching pattern with key()

Michael Kay
In reply to this post by Imsieke, Gerrit, le-tex
OK, got it.

The regression is caused by the fix to bug 1968, here:

https://saxonica.plan.io/issues/1968

The essence of this bug is that it's not safe to assume (as Saxon was doing up to and including 9.5) that an index for a given (stylesheet, sourceDoc) pair is reusable for different transformations if the key definition depends on per-transformation variables, e.g. on the global context item or global parameters.

A key definition that calls a user-defined function is deemed unsafe under this rule, because the analysis otherwise gets too complicated. But in Saxon-EE the call on the user-defined function has been eliminated by inlining.

The fix to bug 1968 produced correct results but bad performance, because in effect the index isn't being reused at all, not even within a single transformation: it's being rebuilt each time it is needed.

The fix-to-the-fix appears to be trivial to implement.

Michael Kay
Saxonica
[hidden email]
+44 (0) 118 946 5893




On 28 Apr 2015, at 20:54, Imsieke, Gerrit, le-tex <[hidden email]> wrote:

> I’m using generated XSLT 2.0 stylesheets with templates of the form:
>
> <xsl:template match="key('by-srcpath', '/some/string')">
>
> Their purpose is to match elements with certain values of their @srcpath
> attribute.
>
> When I recently migrated from Saxon 9.5 to 9.6, I noticed that a
> stylesheet that used to take less than a second would run for hours.
>
> In order to see what’s going on, I put the key’s @use expression in a
> function and used -TP to count how often the function was called.
>
> The source document contains 3231 @srcpath attributes. In Saxon 9.5.1.8,
> the function is called 3231 times. In Saxon 9.6.0.5, the function is
> called 63,576,387 times.
>
> I boiled this down to the following test case:
>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>  xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:my="my"
>  version="2.0" exclude-result-prefixes="xs">
>
>  <xsl:function name="my:key" as="xs:string?">
>    <xsl:param name="arg" as="xs:string?"/>
>    <xsl:message select="'arg', $arg"/>
>    <xsl:sequence select="$arg"/>
>  </xsl:function>
>
>  <xsl:key name="by-key" match="*[@key]" use="my:key(@key)"/>
>
>  <xsl:template match="@* | *">
>    <xsl:copy>
>      <xsl:apply-templates select="@*, node()"/>
>    </xsl:copy>
>  </xsl:template>
>
>  <xsl:template match="key('by-key', 'c')"/>
> </xsl:stylesheet>
>
> Applied to the following input,
>
> <doc>
>  <p key="a">A</p>
>  <p key="b">B</p>
>  <p key="c">C</p>
> </doc> ,
>
> it yields
>
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> arg a
> arg b
> arg c
> <?xml version="1.0" encoding="UTF-8"?><doc>
>  <p key="a">A</p>
>  <p key="b">B</p>
>
> </doc>
>
> This amounts to 39 function invocations.
>
> One could argue that because of the -TP or the message, the processor
> isn’t able to optimize. However, in my generated code, I usually don’t
> output messages or use -TP measurements, and still it takes hours.
>
> Saxon 9.5 gives this result for the example above:
>
> arg a
> arg b
> arg c
> <?xml version="1.0" encoding="UTF-8"?><doc>
>  <p key="a">A</p>
>  <p key="b">B</p>
>
> </doc>
>
> 39 is the number of attributes and nodes (excluding the document node)
> times the number of computed keys. If I remove two whitespace text
> nodes, the count will decrease to 33 (11×3). If I remove another p
> element (including its attribute and text node), the count is 16 (8×2).
> If I put a comment around the p element, the count is 18.
>
> If I compute a key for every node() instead of elements, the count is
> 143 – 11 nodes incl. doc node, times (10 non-doc nodes plus 3 atts).
>
> As a workaround, I’m going to switch to patterns as in
> <xsl:template match="*[my:key(@key) = 'c']"/>
>
> Is the 9.6 phenomenon a side effect of XSLT 3 pattern matching?
>
> Or am I using “pattern match by key()” on the unfounded assumption that
> it will work like “pattern match by id()”?
>
> Gerrit
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> saxon-help mailing list archived at http://saxon.markmail.org/
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/saxon-help


------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help