Quantcast

RE: Saxon 8.5.1 virtual copy is O(N^2) for big enough N

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Saxon 8.5.1 virtual copy is O(N^2) for big enough N

Michael Kay
An interim response: I haven't tried tracing Saxon's behaviour on this yet,
doing that is always very time-consuming.

The use of virtual node copies (when it kicks in at all) makes the cost of
copying a node independent of the size of the subtree rooted at that node.
However, what you are doing here is copying a sequence of nodes, and the
cost of this will still be proportional to the length of the sequence.

It's not clear to me here why you are copying the nodes at all. Why not use
xsl:sequence instead of xsl:copy? (To be honest, I suspect that the "virtual
copy" optimization is essentially bringing the cost of xsl:copy close to
that of xsl:sequence in situations where the user could have written
xsl:sequence in the first place. So perhaps my hopes for it, quoted in your
message, were a little over-optimistic. It's still worth doing, though,
because people will write xsl:copy where they didn't need to.)

You should be getting some benefits here from the ordinary lazy evaluation
behavior. This should mean that the result of the f:envAppend call is not
actually materialized immediately. However, Saxon puts an artificial limit
on the depth of nesting of "closures" - when they get more than 10 deep, the
sequence is materialized, because it actually takes less memory to store the
sequence of 10 items than to store a nested tree of 10 closures. This is
probably less true than it was, because I'm more selective about which local
variables are saved in the closure, and it might be worth experimenting with
different cut-offs.

More relevantly, however, Saxon currently does an optimization designed to
help head-tail recursion: when you do a delayed evaluation of
$x[position()>1] and it sees that $x is itself a delayed evaluation of
$y[position()>n], then it collapses the two expressions constructing
$y[position()>n+1]. This means that head-tail recursion is often linear in
space and time. What seems to be needed here is a similar collapsing of
levels for append operations, so that append($x, X), when $x is itself
append($y, Y), gets turned into append($y, Y, X). That doesn't look too
difficult in principle - it's a question of recognizing the actual patterns
that arise in practice.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of
> Dimitre Novatchev
> Sent: 03 September 2005 06:40
> To: [hidden email]
> Subject: [saxon] Saxon 8.5.1 virtual copy is O(N^2) for big enough N
>
> Hi,
>
> Sometimes ago in the thread:
>   "Template recursion, StackOverflowError, saxon:while a nd variable
> assignability"
>
> Michael Kay recommended that using xsl:copy-of to append to a node-set
> will only need linear time:
>
> "This style of programming, where updating an environment
> involves copying
> the old environment using xsl:copy-of and appending a new
> name=value pair,
> should benefit greatly from the optimization introduced a
> couple of releases
> ago whereby xsl:copy-of creates a virtual copy of a tree
> rather than a real
> copy. This won't reduce the search time for getting the value of a
> particular variable in the store, but it will greatly reduce
> the copy time
> and the use of memory; and for an environment that only contains the
> (changing) value of one variable, this is what matters.
>
> The virtual copy is in essence just a pointer to the node
> that has been
> copied. The virtual nodes in the virtual copy are
> instantiated only when
> they are referenced, and themselves consist simply of pointers to the
> original nodes: they differ from the original nodes only in
> that they have
> different node identifiers (and thus a different position in
> document order)
> and that axes cannot navigate outside the subtree that was
> copied. A copy of
> a virtual copy is implemented as a virtual copy of the
> original, to avoid
> long chains of indirection."
>
> Indeed for small N I observe that the time needed to append N nodes ro
> the node-set increases in a linear fashion.
>
> The code below:
>
> <xsl:stylesheet version="2.0"
>  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>  xmlns:xs="http://www.w3.org/2001/XMLSchema"
>  xmlns:f="http://fxsl.sf.net/"
>  exclude-result-prefixes="f xs"
>  >
> <xsl:import href="../f/func-iter.xsl"/>
> <xsl:output omit-xml-declaration="yes"/>
>
> <xsl:variable name="vEnv" as="element()*">
> <x n="n1">
> <v>v1</v>
> </x>
> </xsl:variable>
>
> <xsl:template match="/">
> <t>
>   <xsl:sequence select="f:iter(100,
> f:envAppend(), $vEnv)"/>
> </t>
> </xsl:template>
>
> <xsl:function name="f:envAppend" as="element()*">
> <xsl:param name="pEnv" as="element()*"/>
> <x name="Nx">
> <v>Vz</v>
> </x>
> <xsl:copy-of select="$pEnv"/>
> </xsl:function>
>
> <xsl:function name="f:envAppend" as="element()">
> <f:envAppend/>
> </xsl:function>
>
> <xsl:template match="f:envAppend" mode="f:FXSL">
> <xsl:param name="arg1" as="element()*"/>
> <xsl:sequence select="f:envAppend($arg1)"/>
> </xsl:template>
> </xsl:stylesheet>
>
>
> takes 172 milliseconds to append 100 nodes.
>
> It takes 1672 milliseconds for appending 1000 nodes (specified by
> changing the firs argument of f:iter() above from 100 to 1000).
>
> However, the time for appending 5000 nodes is 33828 ms (20 times more
> than for 1000 nodes)
>
> and for appending 10000 nodes the time is  137140 ms (4 times more
> than the time for 5000 nodes).
>
> I did check that the implementation of f:iter() does not contribute to
> the observed behaviour -- iterating other functions exibits even
> sublinear times.
>
> For example this code:
>
> <xsl:stylesheet version="2.0"
>  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>  xmlns:xs="http://www.w3.org/2001/XMLSchema"
>  xmlns:f="http://fxsl.sf.net/"
>  exclude-result-prefixes="f xs"
>  >
>    <xsl:import href="../f/func-iter.xsl"/>
>    <xsl:import href="../f/func-Operators.xsl"/>
>    
>    <xsl:output omit-xml-declaration="yes"/>
>   <xsl:template match="/">
>   <t>
>     <xsl:sequence select="f:iter(1000, f:add(1), 1)"/>
>   </t>
>  
>  </xsl:template>  
> </xsl:stylesheet>
>
> takes 672 milliseconds for 1000 additions.
> It takes 11141 ms for 100 thousand additions --not 100 times more, but
> only 20 times more time.
>
>
> If virtual copy really takes linear times then what is it I'm
> doing wrong?
>
> --
> Cheers,
> Dimitre Novatchev
>
>
> -------------------------------------------------------
> SF.Net email is Sponsored by the Better Software Conference & EXPO
> September 19-22, 2005 * San Francisco, CA * Development
> Lifecycle Practices
> Agile & Plan-Driven Development * Managing Projects & Teams *
> Testing & QA
> Security * Process Improvement & Measurement *
> http://www.sqe.com/bsce5sf
> _______________________________________________
> saxon-help mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/saxon-help
>




-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Saxon 8.5.1 virtual copy is O(N^2) for big enough N

Dimitre Novatchev
Hi Mike,
Thank you for your reply.

I changed xsl:copy-of to xsl:sequence and although the times
decreased, the observed
O(N^2) time complexity still holds for N > 1000.

I hope that an optimisation described in the introductory chapter of
the book "Purely Functional Data Structures" by Chris Okasaki:

   http://www.amazon.com/exec/obidos/tg/detail/-/0521663504/002-7517683-2986405?v=glance

Chapter 2 Persistence -- page 9 and 10 (can be seen using Amazon's
"View Inside" feature).

are very logical, easy to understand and to implement. They are called
"List Sharing" and "Tree Sharing"


The idea is that the concatenation of two lists:

     zs = xs ++ ys

can be implemented by copying only the first list -- xs and pointing
from its last element to (sharing) the list ys.


It is easy to see that using list sharing it is possible to implement
repeated append of an element in front of a list in linear time.

From the description of "virtual copy" I was left with the impression
that it implemented exactly the list sharing technique.

Similarly, adding a node to a binary search tree can be done by
copying only the nodes that are on the way of the new node until it
finds its place in the tree (page 13) and pointing from the new/copied
nodes to the rest of the unchanged nodes (subtree sharing).

I believe that implementing efficiently the data structures described
by Okasaki is extremely important as this decreases significantly the
time necessary for "update" operations.


Thanks,
Dimitre Novatchev

On 9/7/05, Michael Kay <[hidden email]> wrote:

> An interim response: I haven't tried tracing Saxon's behaviour on this yet,
> doing that is always very time-consuming.
>
> The use of virtual node copies (when it kicks in at all) makes the cost of
> copying a node independent of the size of the subtree rooted at that node.
> However, what you are doing here is copying a sequence of nodes, and the
> cost of this will still be proportional to the length of the sequence.
>
> It's not clear to me here why you are copying the nodes at all. Why not use
> xsl:sequence instead of xsl:copy? (To be honest, I suspect that the "virtual
> copy" optimization is essentially bringing the cost of xsl:copy close to
> that of xsl:sequence in situations where the user could have written
> xsl:sequence in the first place. So perhaps my hopes for it, quoted in your
> message, were a little over-optimistic. It's still worth doing, though,
> because people will write xsl:copy where they didn't need to.)
>
> You should be getting some benefits here from the ordinary lazy evaluation
> behavior. This should mean that the result of the f:envAppend call is not
> actually materialized immediately. However, Saxon puts an artificial limit
> on the depth of nesting of "closures" - when they get more than 10 deep, the
> sequence is materialized, because it actually takes less memory to store the
> sequence of 10 items than to store a nested tree of 10 closures. This is
> probably less true than it was, because I'm more selective about which local
> variables are saved in the closure, and it might be worth experimenting with
> different cut-offs.
>
> More relevantly, however, Saxon currently does an optimization designed to
> help head-tail recursion: when you do a delayed evaluation of
> $x[position()>1] and it sees that $x is itself a delayed evaluation of
> $y[position()>n], then it collapses the two expressions constructing
> $y[position()>n+1]. This means that head-tail recursion is often linear in
> space and time. What seems to be needed here is a similar collapsing of
> levels for append operations, so that append($x, X), when $x is itself
> append($y, Y), gets turned into append($y, Y, X). That doesn't look too
> difficult in principle - it's a question of recognizing the actual patterns
> that arise in practice.
>
> Michael Kay
> http://www.saxonica.com/
>
> > -----Original Message-----
> > From: [hidden email]
> > [mailto:[hidden email]] On Behalf Of
> > Dimitre Novatchev
> > Sent: 03 September 2005 06:40
> > To: [hidden email]
> > Subject: [saxon] Saxon 8.5.1 virtual copy is O(N^2) for big enough N
> >
> > Hi,
> >
> > Sometimes ago in the thread:
> >   "Template recursion, StackOverflowError, saxon:while a nd variable
> > assignability"
> >
> > Michael Kay recommended that using xsl:copy-of to append to a node-set
> > will only need linear time:
> >
> > "This style of programming, where updating an environment
> > involves copying
> > the old environment using xsl:copy-of and appending a new
> > name=value pair,
> > should benefit greatly from the optimization introduced a
> > couple of releases
> > ago whereby xsl:copy-of creates a virtual copy of a tree
> > rather than a real
> > copy. This won't reduce the search time for getting the value of a
> > particular variable in the store, but it will greatly reduce
> > the copy time
> > and the use of memory; and for an environment that only contains the
> > (changing) value of one variable, this is what matters.
> >
> > The virtual copy is in essence just a pointer to the node
> > that has been
> > copied. The virtual nodes in the virtual copy are
> > instantiated only when
> > they are referenced, and themselves consist simply of pointers to the
> > original nodes: they differ from the original nodes only in
> > that they have
> > different node identifiers (and thus a different position in
> > document order)
> > and that axes cannot navigate outside the subtree that was
> > copied. A copy of
> > a virtual copy is implemented as a virtual copy of the
> > original, to avoid
> > long chains of indirection."
> >
> > Indeed for small N I observe that the time needed to append N nodes ro
> > the node-set increases in a linear fashion.
> >
> > The code below:
> >
> > <xsl:stylesheet version="2.0"
> >  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> >  xmlns:xs="http://www.w3.org/2001/XMLSchema"
> >  xmlns:f="http://fxsl.sf.net/"
> >  exclude-result-prefixes="f xs"
> >  >
> >       <xsl:import href="../f/func-iter.xsl"/>
> >       <xsl:output omit-xml-declaration="yes"/>
> >
> >       <xsl:variable name="vEnv" as="element()*">
> >               <x n="n1">
> >                       <v>v1</v>
> >               </x>
> >       </xsl:variable>
> >
> >       <xsl:template match="/">
> >               <t>
> >                  <xsl:sequence select="f:iter(100,
> > f:envAppend(), $vEnv)"/>
> >               </t>
> >       </xsl:template>
> >
> >       <xsl:function name="f:envAppend" as="element()*">
> >               <xsl:param name="pEnv" as="element()*"/>
> >               <x name="Nx">
> >                       <v>Vz</v>
> >               </x>
> >               <xsl:copy-of select="$pEnv"/>
> >       </xsl:function>
> >
> >       <xsl:function name="f:envAppend" as="element()">
> >               <f:envAppend/>
> >       </xsl:function>
> >
> >       <xsl:template match="f:envAppend" mode="f:FXSL">
> >               <xsl:param name="arg1" as="element()*"/>
> >               <xsl:sequence select="f:envAppend($arg1)"/>
> >       </xsl:template>
> > </xsl:stylesheet>
> >
> >
> > takes 172 milliseconds to append 100 nodes.
> >
> > It takes 1672 milliseconds for appending 1000 nodes (specified by
> > changing the firs argument of f:iter() above from 100 to 1000).
> >
> > However, the time for appending 5000 nodes is 33828 ms (20 times more
> > than for 1000 nodes)
> >
> > and for appending 10000 nodes the time is  137140 ms (4 times more
> > than the time for 5000 nodes).
> >
> > I did check that the implementation of f:iter() does not contribute to
> > the observed behaviour -- iterating other functions exibits even
> > sublinear times.
> >
> > For example this code:
> >
> > <xsl:stylesheet version="2.0"
> >  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> >  xmlns:xs="http://www.w3.org/2001/XMLSchema"
> >  xmlns:f="http://fxsl.sf.net/"
> >  exclude-result-prefixes="f xs"
> >  >
> >    <xsl:import href="../f/func-iter.xsl"/>
> >    <xsl:import href="../f/func-Operators.xsl"/>
> >
> >    <xsl:output omit-xml-declaration="yes"/>
> >   <xsl:template match="/">
> >   <t>
> >     <xsl:sequence select="f:iter(1000, f:add(1), 1)"/>
> >   </t>
> >
> >  </xsl:template>
> > </xsl:stylesheet>
> >
> > takes 672 milliseconds for 1000 additions.
> > It takes 11141 ms for 100 thousand additions --not 100 times more, but
> > only 20 times more time.
> >
> >
> > If virtual copy really takes linear times then what is it I'm
> > doing wrong?
> >
> > --
> > Cheers,
> > Dimitre Novatchev
> >
> >
> > -------------------------------------------------------
> > SF.Net email is Sponsored by the Better Software Conference & EXPO
> > September 19-22, 2005 * San Francisco, CA * Development
> > Lifecycle Practices
> > Agile & Plan-Driven Development * Managing Projects & Teams *
> > Testing & QA
> > Security * Process Improvement & Measurement *
> > http://www.sqe.com/bsce5sf
> > _______________________________________________
> > saxon-help mailing list
> > [hidden email]
> > https://lists.sourceforge.net/lists/listinfo/saxon-help
> >
>
>
>
>
> -------------------------------------------------------
> SF.Net email is Sponsored by the Better Software Conference & EXPO
> September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
> Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
> Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
> _______________________________________________
> saxon-help mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/saxon-help
>


-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Saxon 8.5.1 virtual copy is O(N^2) for big enough N

Michael Kay
Thanks for the reference. Saxon is doing a kind of list sharing when you
create a subsequence of an existing list, but it isn't currently doing it
for an append operation, except in certain cases where you get this effect
implicitly as a consequence of lazy evaluation. An explicit list sharing for
append operations seems like a good idea.

The other thing that could be useful here is an extension to the
optimization for tail-recursive functions so that a function f() whose body
is (EXP, f()) or (f(), EXP) can be unwound in the same way as a truly
tail-recursive function. I'll look at these possibilities.

Michael Kay

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of
> Dimitre Novatchev
> Sent: 07 September 2005 12:12
> To: [hidden email]
> Subject: Re: [saxon] Saxon 8.5.1 virtual copy is O(N^2) for
> big enough N
>
> Hi Mike,
> Thank you for your reply.
>
> I changed xsl:copy-of to xsl:sequence and although the times
> decreased, the observed
> O(N^2) time complexity still holds for N > 1000.
>
> I hope that an optimisation described in the introductory chapter of
> the book "Purely Functional Data Structures" by Chris Okasaki:
>
>    
> http://www.amazon.com/exec/obidos/tg/detail/-/0521663504/002-7
> 517683-2986405?v=glance
>
> Chapter 2 Persistence -- page 9 and 10 (can be seen using Amazon's
> "View Inside" feature).
>
> are very logical, easy to understand and to implement. They are called
> "List Sharing" and "Tree Sharing"
>
>
> The idea is that the concatenation of two lists:
>
>      zs = xs ++ ys
>
> can be implemented by copying only the first list -- xs and pointing
> from its last element to (sharing) the list ys.
>
>
> It is easy to see that using list sharing it is possible to implement
> repeated append of an element in front of a list in linear time.
>
> From the description of "virtual copy" I was left with the impression
> that it implemented exactly the list sharing technique.
>
> Similarly, adding a node to a binary search tree can be done by
> copying only the nodes that are on the way of the new node until it
> finds its place in the tree (page 13) and pointing from the new/copied
> nodes to the rest of the unchanged nodes (subtree sharing).
>
> I believe that implementing efficiently the data structures described
> by Okasaki is extremely important as this decreases significantly the
> time necessary for "update" operations.
>
>
> Thanks,
> Dimitre Novatchev
>
> On 9/7/05, Michael Kay <[hidden email]> wrote:
> > An interim response: I haven't tried tracing Saxon's
> behaviour on this yet,
> > doing that is always very time-consuming.
> >
> > The use of virtual node copies (when it kicks in at all)
> makes the cost of
> > copying a node independent of the size of the subtree
> rooted at that node.
> > However, what you are doing here is copying a sequence of
> nodes, and the
> > cost of this will still be proportional to the length of
> the sequence.
> >
> > It's not clear to me here why you are copying the nodes at
> all. Why not use
> > xsl:sequence instead of xsl:copy? (To be honest, I suspect
> that the "virtual
> > copy" optimization is essentially bringing the cost of
> xsl:copy close to
> > that of xsl:sequence in situations where the user could have written
> > xsl:sequence in the first place. So perhaps my hopes for
> it, quoted in your
> > message, were a little over-optimistic. It's still worth
> doing, though,
> > because people will write xsl:copy where they didn't need to.)
> >
> > You should be getting some benefits here from the ordinary
> lazy evaluation
> > behavior. This should mean that the result of the
> f:envAppend call is not
> > actually materialized immediately. However, Saxon puts an
> artificial limit
> > on the depth of nesting of "closures" - when they get more
> than 10 deep, the
> > sequence is materialized, because it actually takes less
> memory to store the
> > sequence of 10 items than to store a nested tree of 10
> closures. This is
> > probably less true than it was, because I'm more selective
> about which local
> > variables are saved in the closure, and it might be worth
> experimenting with
> > different cut-offs.
> >
> > More relevantly, however, Saxon currently does an
> optimization designed to
> > help head-tail recursion: when you do a delayed evaluation of
> > $x[position()>1] and it sees that $x is itself a delayed
> evaluation of
> > $y[position()>n], then it collapses the two expressions constructing
> > $y[position()>n+1]. This means that head-tail recursion is
> often linear in
> > space and time. What seems to be needed here is a similar
> collapsing of
> > levels for append operations, so that append($x, X), when
> $x is itself
> > append($y, Y), gets turned into append($y, Y, X). That
> doesn't look too
> > difficult in principle - it's a question of recognizing the
> actual patterns
> > that arise in practice.
> >
> > Michael Kay
> > http://www.saxonica.com/
> >
> > > -----Original Message-----
> > > From: [hidden email]
> > > [mailto:[hidden email]] On Behalf Of
> > > Dimitre Novatchev
> > > Sent: 03 September 2005 06:40
> > > To: [hidden email]
> > > Subject: [saxon] Saxon 8.5.1 virtual copy is O(N^2) for
> big enough N
> > >
> > > Hi,
> > >
> > > Sometimes ago in the thread:
> > >   "Template recursion, StackOverflowError, saxon:while a
> nd variable
> > > assignability"
> > >
> > > Michael Kay recommended that using xsl:copy-of to append
> to a node-set
> > > will only need linear time:
> > >
> > > "This style of programming, where updating an environment
> > > involves copying
> > > the old environment using xsl:copy-of and appending a new
> > > name=value pair,
> > > should benefit greatly from the optimization introduced a
> > > couple of releases
> > > ago whereby xsl:copy-of creates a virtual copy of a tree
> > > rather than a real
> > > copy. This won't reduce the search time for getting the value of a
> > > particular variable in the store, but it will greatly reduce
> > > the copy time
> > > and the use of memory; and for an environment that only
> contains the
> > > (changing) value of one variable, this is what matters.
> > >
> > > The virtual copy is in essence just a pointer to the node
> > > that has been
> > > copied. The virtual nodes in the virtual copy are
> > > instantiated only when
> > > they are referenced, and themselves consist simply of
> pointers to the
> > > original nodes: they differ from the original nodes only in
> > > that they have
> > > different node identifiers (and thus a different position in
> > > document order)
> > > and that axes cannot navigate outside the subtree that was
> > > copied. A copy of
> > > a virtual copy is implemented as a virtual copy of the
> > > original, to avoid
> > > long chains of indirection."
> > >
> > > Indeed for small N I observe that the time needed to
> append N nodes ro
> > > the node-set increases in a linear fashion.
> > >
> > > The code below:
> > >
> > > <xsl:stylesheet version="2.0"
> > >  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> > >  xmlns:xs="http://www.w3.org/2001/XMLSchema"
> > >  xmlns:f="http://fxsl.sf.net/"
> > >  exclude-result-prefixes="f xs"
> > >  >
> > >       <xsl:import href="../f/func-iter.xsl"/>
> > >       <xsl:output omit-xml-declaration="yes"/>
> > >
> > >       <xsl:variable name="vEnv" as="element()*">
> > >               <x n="n1">
> > >                       <v>v1</v>
> > >               </x>
> > >       </xsl:variable>
> > >
> > >       <xsl:template match="/">
> > >               <t>
> > >                  <xsl:sequence select="f:iter(100,
> > > f:envAppend(), $vEnv)"/>
> > >               </t>
> > >       </xsl:template>
> > >
> > >       <xsl:function name="f:envAppend" as="element()*">
> > >               <xsl:param name="pEnv" as="element()*"/>
> > >               <x name="Nx">
> > >                       <v>Vz</v>
> > >               </x>
> > >               <xsl:copy-of select="$pEnv"/>
> > >       </xsl:function>
> > >
> > >       <xsl:function name="f:envAppend" as="element()">
> > >               <f:envAppend/>
> > >       </xsl:function>
> > >
> > >       <xsl:template match="f:envAppend" mode="f:FXSL">
> > >               <xsl:param name="arg1" as="element()*"/>
> > >               <xsl:sequence select="f:envAppend($arg1)"/>
> > >       </xsl:template>
> > > </xsl:stylesheet>
> > >
> > >
> > > takes 172 milliseconds to append 100 nodes.
> > >
> > > It takes 1672 milliseconds for appending 1000 nodes (specified by
> > > changing the firs argument of f:iter() above from 100 to 1000).
> > >
> > > However, the time for appending 5000 nodes is 33828 ms
> (20 times more
> > > than for 1000 nodes)
> > >
> > > and for appending 10000 nodes the time is  137140 ms (4 times more
> > > than the time for 5000 nodes).
> > >
> > > I did check that the implementation of f:iter() does not
> contribute to
> > > the observed behaviour -- iterating other functions exibits even
> > > sublinear times.
> > >
> > > For example this code:
> > >
> > > <xsl:stylesheet version="2.0"
> > >  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> > >  xmlns:xs="http://www.w3.org/2001/XMLSchema"
> > >  xmlns:f="http://fxsl.sf.net/"
> > >  exclude-result-prefixes="f xs"
> > >  >
> > >    <xsl:import href="../f/func-iter.xsl"/>
> > >    <xsl:import href="../f/func-Operators.xsl"/>
> > >
> > >    <xsl:output omit-xml-declaration="yes"/>
> > >   <xsl:template match="/">
> > >   <t>
> > >     <xsl:sequence select="f:iter(1000, f:add(1), 1)"/>
> > >   </t>
> > >
> > >  </xsl:template>
> > > </xsl:stylesheet>
> > >
> > > takes 672 milliseconds for 1000 additions.
> > > It takes 11141 ms for 100 thousand additions --not 100
> times more, but
> > > only 20 times more time.
> > >
> > >
> > > If virtual copy really takes linear times then what is it I'm
> > > doing wrong?
> > >
> > > --
> > > Cheers,
> > > Dimitre Novatchev
> > >
> > >
> > > -------------------------------------------------------
> > > SF.Net email is Sponsored by the Better Software Conference & EXPO
> > > September 19-22, 2005 * San Francisco, CA * Development
> > > Lifecycle Practices
> > > Agile & Plan-Driven Development * Managing Projects & Teams *
> > > Testing & QA
> > > Security * Process Improvement & Measurement *
> > > http://www.sqe.com/bsce5sf
> > > _______________________________________________
> > > saxon-help mailing list
> > > [hidden email]
> > > https://lists.sourceforge.net/lists/listinfo/saxon-help
> > >
> >
> >
> >
> >
> > -------------------------------------------------------
> > SF.Net email is Sponsored by the Better Software Conference & EXPO
> > September 19-22, 2005 * San Francisco, CA * Development
> Lifecycle Practices
> > Agile & Plan-Driven Development * Managing Projects & Teams
> * Testing & QA
> > Security * Process Improvement & Measurement *
> http://www.sqe.com/bsce5sf
> > _______________________________________________
> > saxon-help mailing list
> > [hidden email]
> > https://lists.sourceforge.net/lists/listinfo/saxon-help
> >
>
>
> -------------------------------------------------------
> SF.Net email is Sponsored by the Better Software Conference & EXPO
> September 19-22, 2005 * San Francisco, CA * Development
> Lifecycle Practices
> Agile & Plan-Driven Development * Managing Projects & Teams *
> Testing & QA
> Security * Process Improvement & Measurement *
> http://www.sqe.com/bsce5sf
> _______________________________________________
> saxon-help mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/saxon-help
>




-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Saxon 8.5.1 virtual copy is O(N^2) for big enough N

Michael Kay
In reply to this post by Dimitre Novatchev
> I changed xsl:copy-of to xsl:sequence and although the times
> decreased, the observed
> O(N^2) time complexity still holds for N > 1000.

Yes, I would have expected both these effects. How much was the decrease, as
a matter of interest?

I don't know why it's not quadratic from the start. It may be something to
do with the cut-off on depth of lazy-evaluation closures, but as that limit
is set to 10 I can't immediately see how this would affect it.

Michael Kay




-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Saxon 8.5.1 virtual copy is O(N^2) for big enough N

Dimitre Novatchev


On 9/7/05, Michael Kay <[hidden email]> wrote:
> > I changed xsl:copy-of to xsl:sequence and although the times
> > decreased, the observed
> > O(N^2) time complexity still holds for N > 1000.
>
> Yes, I would have expected both these effects. How much was the decrease, as
> a matter of interest?

Nodes          xsl:copy-of  Time (ms)     xsl:sequence Time (ms)
===============================================================
     100             156                             93
    1000            1656                            516
    5000           33516                           5296
   10000          136500                          19266
                    




Cheers,
Dimitre Novatchev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Saxon 8.5.1 virtual copy is O(N^2) for big enough N

Dimitre Novatchev
In reply to this post by Michael Kay
On 9/7/05, Michael Kay <[hidden email]> wrote:
> Thanks for the reference. Saxon is doing a kind of list sharing when you
> create a subsequence of an existing list, but it isn't currently doing it
> for an append operation, except in certain cases where you get this effect
> implicitly as a consequence of lazy evaluation. An explicit list sharing for
> append operations seems like a good idea.

Yes, then appending N times an element at the start of a list would be O(N).

>
> The other thing that could be useful here is an extension to the
> optimization for tail-recursive functions so that a function f() whose body
> is (EXP, f()) or (f(), EXP) can be unwound in the same way as a truly
> tail-recursive function. I'll look at these possibilities.

I am sure everyone will benefit immediately from these optimisations.


Cheers,
Dimitre Novatchev


>
> Michael Kay
>
> > -----Original Message-----
> > From: [hidden email]
> > [mailto:[hidden email]] On Behalf Of
> > Dimitre Novatchev
> > Sent: 07 September 2005 12:12
> > To: [hidden email]
> > Subject: Re: [saxon] Saxon 8.5.1 virtual copy is O(N^2) for
> > big enough N
> >
> > Hi Mike,
> > Thank you for your reply.
> >
> > I changed xsl:copy-of to xsl:sequence and although the times
> > decreased, the observed
> > O(N^2) time complexity still holds for N > 1000.
> >
> > I hope that an optimisation described in the introductory chapter of
> > the book "Purely Functional Data Structures" by Chris Okasaki:
> >
> >
> > http://www.amazon.com/exec/obidos/tg/detail/-/0521663504/002-7
> > 517683-2986405?v=glance
> >
> > Chapter 2 Persistence -- page 9 and 10 (can be seen using Amazon's
> > "View Inside" feature).
> >
> > are very logical, easy to understand and to implement. They are called
> > "List Sharing" and "Tree Sharing"
> >
> >
> > The idea is that the concatenation of two lists:
> >
> >      zs = xs ++ ys
> >
> > can be implemented by copying only the first list -- xs and pointing
> > from its last element to (sharing) the list ys.
> >
> >
> > It is easy to see that using list sharing it is possible to implement
> > repeated append of an element in front of a list in linear time.
> >
> > From the description of "virtual copy" I was left with the impression
> > that it implemented exactly the list sharing technique.
> >
> > Similarly, adding a node to a binary search tree can be done by
> > copying only the nodes that are on the way of the new node until it
> > finds its place in the tree (page 13) and pointing from the new/copied
> > nodes to the rest of the unchanged nodes (subtree sharing).
> >
> > I believe that implementing efficiently the data structures described
> > by Okasaki is extremely important as this decreases significantly the
> > time necessary for "update" operations.
> >
> >
> > Thanks,
> > Dimitre Novatchev
> >
> > On 9/7/05, Michael Kay <[hidden email]> wrote:
> > > An interim response: I haven't tried tracing Saxon's
> > behaviour on this yet,
> > > doing that is always very time-consuming.
> > >
> > > The use of virtual node copies (when it kicks in at all)
> > makes the cost of
> > > copying a node independent of the size of the subtree
> > rooted at that node.
> > > However, what you are doing here is copying a sequence of
> > nodes, and the
> > > cost of this will still be proportional to the length of
> > the sequence.
> > >
> > > It's not clear to me here why you are copying the nodes at
> > all. Why not use
> > > xsl:sequence instead of xsl:copy? (To be honest, I suspect
> > that the "virtual
> > > copy" optimization is essentially bringing the cost of
> > xsl:copy close to
> > > that of xsl:sequence in situations where the user could have written
> > > xsl:sequence in the first place. So perhaps my hopes for
> > it, quoted in your
> > > message, were a little over-optimistic. It's still worth
> > doing, though,
> > > because people will write xsl:copy where they didn't need to.)
> > >
> > > You should be getting some benefits here from the ordinary
> > lazy evaluation
> > > behavior. This should mean that the result of the
> > f:envAppend call is not
> > > actually materialized immediately. However, Saxon puts an
> > artificial limit
> > > on the depth of nesting of "closures" - when they get more
> > than 10 deep, the
> > > sequence is materialized, because it actually takes less
> > memory to store the
> > > sequence of 10 items than to store a nested tree of 10
> > closures. This is
> > > probably less true than it was, because I'm more selective
> > about which local
> > > variables are saved in the closure, and it might be worth
> > experimenting with
> > > different cut-offs.
> > >
> > > More relevantly, however, Saxon currently does an
> > optimization designed to
> > > help head-tail recursion: when you do a delayed evaluation of
> > > $x[position()>1] and it sees that $x is itself a delayed
> > evaluation of
> > > $y[position()>n], then it collapses the two expressions constructing
> > > $y[position()>n+1]. This means that head-tail recursion is
> > often linear in
> > > space and time. What seems to be needed here is a similar
> > collapsing of
> > > levels for append operations, so that append($x, X), when
> > $x is itself
> > > append($y, Y), gets turned into append($y, Y, X). That
> > doesn't look too
> > > difficult in principle - it's a question of recognizing the
> > actual patterns
> > > that arise in practice.
> > >
> > > Michael Kay
> > > http://www.saxonica.com/
> > >
> > > > -----Original Message-----
> > > > From: [hidden email]
> > > > [mailto:[hidden email]] On Behalf Of
> > > > Dimitre Novatchev
> > > > Sent: 03 September 2005 06:40
> > > > To: [hidden email]
> > > > Subject: [saxon] Saxon 8.5.1 virtual copy is O(N^2) for
> > big enough N
> > > >
> > > > Hi,
> > > >
> > > > Sometimes ago in the thread:
> > > >   "Template recursion, StackOverflowError, saxon:while a
> > nd variable
> > > > assignability"
> > > >
> > > > Michael Kay recommended that using xsl:copy-of to append
> > to a node-set
> > > > will only need linear time:
> > > >
> > > > "This style of programming, where updating an environment
> > > > involves copying
> > > > the old environment using xsl:copy-of and appending a new
> > > > name=value pair,
> > > > should benefit greatly from the optimization introduced a
> > > > couple of releases
> > > > ago whereby xsl:copy-of creates a virtual copy of a tree
> > > > rather than a real
> > > > copy. This won't reduce the search time for getting the value of a
> > > > particular variable in the store, but it will greatly reduce
> > > > the copy time
> > > > and the use of memory; and for an environment that only
> > contains the
> > > > (changing) value of one variable, this is what matters.
> > > >
> > > > The virtual copy is in essence just a pointer to the node
> > > > that has been
> > > > copied. The virtual nodes in the virtual copy are
> > > > instantiated only when
> > > > they are referenced, and themselves consist simply of
> > pointers to the
> > > > original nodes: they differ from the original nodes only in
> > > > that they have
> > > > different node identifiers (and thus a different position in
> > > > document order)
> > > > and that axes cannot navigate outside the subtree that was
> > > > copied. A copy of
> > > > a virtual copy is implemented as a virtual copy of the
> > > > original, to avoid
> > > > long chains of indirection."
> > > >
> > > > Indeed for small N I observe that the time needed to
> > append N nodes ro
> > > > the node-set increases in a linear fashion.
> > > >
> > > > The code below:
> > > >
> > > > <xsl:stylesheet version="2.0"
> > > >  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> > > >  xmlns:xs="http://www.w3.org/2001/XMLSchema"
> > > >  xmlns:f="http://fxsl.sf.net/"
> > > >  exclude-result-prefixes="f xs"
> > > >  >
> > > >       <xsl:import href="../f/func-iter.xsl"/>
> > > >       <xsl:output omit-xml-declaration="yes"/>
> > > >
> > > >       <xsl:variable name="vEnv" as="element()*">
> > > >               <x n="n1">
> > > >                       <v>v1</v>
> > > >               </x>
> > > >       </xsl:variable>
> > > >
> > > >       <xsl:template match="/">
> > > >               <t>
> > > >                  <xsl:sequence select="f:iter(100,
> > > > f:envAppend(), $vEnv)"/>
> > > >               </t>
> > > >       </xsl:template>
> > > >
> > > >       <xsl:function name="f:envAppend" as="element()*">
> > > >               <xsl:param name="pEnv" as="element()*"/>
> > > >               <x name="Nx">
> > > >                       <v>Vz</v>
> > > >               </x>
> > > >               <xsl:copy-of select="$pEnv"/>
> > > >       </xsl:function>
> > > >
> > > >       <xsl:function name="f:envAppend" as="element()">
> > > >               <f:envAppend/>
> > > >       </xsl:function>
> > > >
> > > >       <xsl:template match="f:envAppend" mode="f:FXSL">
> > > >               <xsl:param name="arg1" as="element()*"/>
> > > >               <xsl:sequence select="f:envAppend($arg1)"/>
> > > >       </xsl:template>
> > > > </xsl:stylesheet>
> > > >
> > > >
> > > > takes 172 milliseconds to append 100 nodes.
> > > >
> > > > It takes 1672 milliseconds for appending 1000 nodes (specified by
> > > > changing the firs argument of f:iter() above from 100 to 1000).
> > > >
> > > > However, the time for appending 5000 nodes is 33828 ms
> > (20 times more
> > > > than for 1000 nodes)
> > > >
> > > > and for appending 10000 nodes the time is  137140 ms (4 times more
> > > > than the time for 5000 nodes).
> > > >
> > > > I did check that the implementation of f:iter() does not
> > contribute to
> > > > the observed behaviour -- iterating other functions exibits even
> > > > sublinear times.
> > > >
> > > > For example this code:
> > > >
> > > > <xsl:stylesheet version="2.0"
> > > >  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> > > >  xmlns:xs="http://www.w3.org/2001/XMLSchema"
> > > >  xmlns:f="http://fxsl.sf.net/"
> > > >  exclude-result-prefixes="f xs"
> > > >  >
> > > >    <xsl:import href="../f/func-iter.xsl"/>
> > > >    <xsl:import href="../f/func-Operators.xsl"/>
> > > >
> > > >    <xsl:output omit-xml-declaration="yes"/>
> > > >   <xsl:template match="/">
> > > >   <t>
> > > >     <xsl:sequence select="f:iter(1000, f:add(1), 1)"/>
> > > >   </t>
> > > >
> > > >  </xsl:template>
> > > > </xsl:stylesheet>
> > > >
> > > > takes 672 milliseconds for 1000 additions.
> > > > It takes 11141 ms for 100 thousand additions --not 100
> > times more, but
> > > > only 20 times more time.
> > > >
> > > >
> > > > If virtual copy really takes linear times then what is it I'm
> > > > doing wrong?
> > > >
> > > > --
> > > > Cheers,
> > > > Dimitre Novatchev
> > > >
> > > >
> > > > -------------------------------------------------------
> > > > SF.Net email is Sponsored by the Better Software Conference & EXPO
> > > > September 19-22, 2005 * San Francisco, CA * Development
> > > > Lifecycle Practices
> > > > Agile & Plan-Driven Development * Managing Projects & Teams *
> > > > Testing & QA
> > > > Security * Process Improvement & Measurement *
> > > > http://www.sqe.com/bsce5sf
> > > > _______________________________________________
> > > > saxon-help mailing list
> > > > [hidden email]
> > > > https://lists.sourceforge.net/lists/listinfo/saxon-help
> > > >
> > >
> > >
> > >
> > >
> > > -------------------------------------------------------
> > > SF.Net email is Sponsored by the Better Software Conference & EXPO
> > > September 19-22, 2005 * San Francisco, CA * Development
> > Lifecycle Practices
> > > Agile & Plan-Driven Development * Managing Projects & Teams
> > * Testing & QA
> > > Security * Process Improvement & Measurement *
> > http://www.sqe.com/bsce5sf
> > > _______________________________________________
> > > saxon-help mailing list
> > > [hidden email]
> > > https://lists.sourceforge.net/lists/listinfo/saxon-help
> > >
> >
> >
> > -------------------------------------------------------
> > SF.Net email is Sponsored by the Better Software Conference & EXPO
> > September 19-22, 2005 * San Francisco, CA * Development
> > Lifecycle Practices
> > Agile & Plan-Driven Development * Managing Projects & Teams *
> > Testing & QA
> > Security * Process Improvement & Measurement *
> > http://www.sqe.com/bsce5sf
> > _______________________________________________
> > saxon-help mailing list
> > [hidden email]
> > https://lists.sourceforge.net/lists/listinfo/saxon-help
> >
>
>
>
>
> -------------------------------------------------------
> SF.Net email is Sponsored by the Better Software Conference & EXPO
> September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
> Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
> Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
> _______________________________________________
> saxon-help mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/saxon-help
>


--
Cheers,
Dimitre Novatchev
---------------------------------------
Harry did not ask how Dumbledore knew; ...but Harry had long since
learned that bangs and smoke were more often the marks of ineptitude
than expertise.


-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Saxon 8.5.1 virtual copy is O(N^2) for big enough N

Michael Kay
I've done an experimental implementation of a sequence implementation that
allows content sharing. It seems to be working on the test cases run so far,
and runs twice as fast as the old code for a test case consisting of a
simple recursive function that builds a Fibonacci sequence:

declare variable $limit external;

declare function local:fibonacci($seed as xs:integer*) {
  let $n := count($seed)
  let $next := ($seed, ($seed[$n] + $seed[$n - 1]))
  return
     if ($n < $limit)
     then local:fibonacci($next)
     else $next
};

<a>{local:fibonacci((1,1))}</a>

As it happens this example is still quadratic for sequences longer than 1000
or so, this being caused entirely by the cost of doing arithmetic on very
large integers. If you make a trivial change to the function like replacing
($seed[$n - 1]) by (1), it becomes linear, generating over 100
items/millisecond.

The sequence sharing only works for sequences that share their initial
subsequence, that is, for appending at the end. Appending at the start is
still likely be quadratic. A more general solution would have ended up
effectively representing sequences as a linked list, with the result that
operations like count($seed) and $seed[$n] could not be done in constant
time. So it's not going to work on Dimitre's function

<xsl:function name="f:envAppend" as="element()*">
          <xsl:param name="pEnv" as="element()*"/>
          <x name="Nx">
             <v>Vz</v>
          </x>
          <xsl:copy-of select="$pEnv"/>
</xsl:function>

as written.

Incidentally, the reason the performance appears to be linear at first and
then become quadratic is probably that the cost function is (An + Bn*n),
where the constant A is much larger than B.

Michael Kay



> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of
> Dimitre Novatchev
> Sent: 07 September 2005 21:17
> To: [hidden email]
> Subject: Re: [saxon] Saxon 8.5.1 virtual copy is O(N^2) for
> big enough N
>
> On 9/7/05, Michael Kay <[hidden email]> wrote:
> > Thanks for the reference. Saxon is doing a kind of list
> sharing when you
> > create a subsequence of an existing list, but it isn't
> currently doing it
> > for an append operation, except in certain cases where you
> get this effect
> > implicitly as a consequence of lazy evaluation. An explicit
> list sharing for
> > append operations seems like a good idea.
>
> Yes, then appending N times an element at the start of a list
> would be O(N).
>
> >
> > The other thing that could be useful here is an extension to the
> > optimization for tail-recursive functions so that a
> function f() whose body
> > is (EXP, f()) or (f(), EXP) can be unwound in the same way
> as a truly
> > tail-recursive function. I'll look at these possibilities.
>
> I am sure everyone will benefit immediately from these optimisations.
>
>
> Cheers,
> Dimitre Novatchev
>
>
> >
> > Michael Kay
> >
> > > -----Original Message-----
> > > From: [hidden email]
> > > [mailto:[hidden email]] On Behalf Of
> > > Dimitre Novatchev
> > > Sent: 07 September 2005 12:12
> > > To: [hidden email]
> > > Subject: Re: [saxon] Saxon 8.5.1 virtual copy is O(N^2) for
> > > big enough N
> > >
> > > Hi Mike,
> > > Thank you for your reply.
> > >
> > > I changed xsl:copy-of to xsl:sequence and although the times
> > > decreased, the observed
> > > O(N^2) time complexity still holds for N > 1000.
> > >
> > > I hope that an optimisation described in the introductory
> chapter of
> > > the book "Purely Functional Data Structures" by Chris Okasaki:
> > >
> > >
> > > http://www.amazon.com/exec/obidos/tg/detail/-/0521663504/002-7
> > > 517683-2986405?v=glance
> > >
> > > Chapter 2 Persistence -- page 9 and 10 (can be seen using Amazon's
> > > "View Inside" feature).
> > >
> > > are very logical, easy to understand and to implement.
> They are called
> > > "List Sharing" and "Tree Sharing"
> > >
> > >
> > > The idea is that the concatenation of two lists:
> > >
> > >      zs = xs ++ ys
> > >
> > > can be implemented by copying only the first list -- xs
> and pointing
> > > from its last element to (sharing) the list ys.
> > >
> > >
> > > It is easy to see that using list sharing it is possible
> to implement
> > > repeated append of an element in front of a list in linear time.
> > >
> > > From the description of "virtual copy" I was left with
> the impression
> > > that it implemented exactly the list sharing technique.
> > >
> > > Similarly, adding a node to a binary search tree can be done by
> > > copying only the nodes that are on the way of the new
> node until it
> > > finds its place in the tree (page 13) and pointing from
> the new/copied
> > > nodes to the rest of the unchanged nodes (subtree sharing).
> > >
> > > I believe that implementing efficiently the data
> structures described
> > > by Okasaki is extremely important as this decreases
> significantly the
> > > time necessary for "update" operations.
> > >
> > >
> > > Thanks,
> > > Dimitre Novatchev
> > >
> > > On 9/7/05, Michael Kay <[hidden email]> wrote:
> > > > An interim response: I haven't tried tracing Saxon's
> > > behaviour on this yet,
> > > > doing that is always very time-consuming.
> > > >
> > > > The use of virtual node copies (when it kicks in at all)
> > > makes the cost of
> > > > copying a node independent of the size of the subtree
> > > rooted at that node.
> > > > However, what you are doing here is copying a sequence of
> > > nodes, and the
> > > > cost of this will still be proportional to the length of
> > > the sequence.
> > > >
> > > > It's not clear to me here why you are copying the nodes at
> > > all. Why not use
> > > > xsl:sequence instead of xsl:copy? (To be honest, I suspect
> > > that the "virtual
> > > > copy" optimization is essentially bringing the cost of
> > > xsl:copy close to
> > > > that of xsl:sequence in situations where the user could
> have written
> > > > xsl:sequence in the first place. So perhaps my hopes for
> > > it, quoted in your
> > > > message, were a little over-optimistic. It's still worth
> > > doing, though,
> > > > because people will write xsl:copy where they didn't need to.)
> > > >
> > > > You should be getting some benefits here from the ordinary
> > > lazy evaluation
> > > > behavior. This should mean that the result of the
> > > f:envAppend call is not
> > > > actually materialized immediately. However, Saxon puts an
> > > artificial limit
> > > > on the depth of nesting of "closures" - when they get more
> > > than 10 deep, the
> > > > sequence is materialized, because it actually takes less
> > > memory to store the
> > > > sequence of 10 items than to store a nested tree of 10
> > > closures. This is
> > > > probably less true than it was, because I'm more selective
> > > about which local
> > > > variables are saved in the closure, and it might be worth
> > > experimenting with
> > > > different cut-offs.
> > > >
> > > > More relevantly, however, Saxon currently does an
> > > optimization designed to
> > > > help head-tail recursion: when you do a delayed evaluation of
> > > > $x[position()>1] and it sees that $x is itself a delayed
> > > evaluation of
> > > > $y[position()>n], then it collapses the two expressions
> constructing
> > > > $y[position()>n+1]. This means that head-tail recursion is
> > > often linear in
> > > > space and time. What seems to be needed here is a similar
> > > collapsing of
> > > > levels for append operations, so that append($x, X), when
> > > $x is itself
> > > > append($y, Y), gets turned into append($y, Y, X). That
> > > doesn't look too
> > > > difficult in principle - it's a question of recognizing the
> > > actual patterns
> > > > that arise in practice.
> > > >
> > > > Michael Kay
> > > > http://www.saxonica.com/
> > > >
> > > > > -----Original Message-----
> > > > > From: [hidden email]
> > > > > [mailto:[hidden email]] On Behalf Of
> > > > > Dimitre Novatchev
> > > > > Sent: 03 September 2005 06:40
> > > > > To: [hidden email]
> > > > > Subject: [saxon] Saxon 8.5.1 virtual copy is O(N^2) for
> > > big enough N
> > > > >
> > > > > Hi,
> > > > >
> > > > > Sometimes ago in the thread:
> > > > >   "Template recursion, StackOverflowError, saxon:while a
> > > nd variable
> > > > > assignability"
> > > > >
> > > > > Michael Kay recommended that using xsl:copy-of to append
> > > to a node-set
> > > > > will only need linear time:
> > > > >
> > > > > "This style of programming, where updating an environment
> > > > > involves copying
> > > > > the old environment using xsl:copy-of and appending a new
> > > > > name=value pair,
> > > > > should benefit greatly from the optimization introduced a
> > > > > couple of releases
> > > > > ago whereby xsl:copy-of creates a virtual copy of a tree
> > > > > rather than a real
> > > > > copy. This won't reduce the search time for getting
> the value of a
> > > > > particular variable in the store, but it will greatly reduce
> > > > > the copy time
> > > > > and the use of memory; and for an environment that only
> > > contains the
> > > > > (changing) value of one variable, this is what matters.
> > > > >
> > > > > The virtual copy is in essence just a pointer to the node
> > > > > that has been
> > > > > copied. The virtual nodes in the virtual copy are
> > > > > instantiated only when
> > > > > they are referenced, and themselves consist simply of
> > > pointers to the
> > > > > original nodes: they differ from the original nodes only in
> > > > > that they have
> > > > > different node identifiers (and thus a different position in
> > > > > document order)
> > > > > and that axes cannot navigate outside the subtree that was
> > > > > copied. A copy of
> > > > > a virtual copy is implemented as a virtual copy of the
> > > > > original, to avoid
> > > > > long chains of indirection."
> > > > >
> > > > > Indeed for small N I observe that the time needed to
> > > append N nodes ro
> > > > > the node-set increases in a linear fashion.
> > > > >
> > > > > The code below:
> > > > >
> > > > > <xsl:stylesheet version="2.0"
> > > > >  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> > > > >  xmlns:xs="http://www.w3.org/2001/XMLSchema"
> > > > >  xmlns:f="http://fxsl.sf.net/"
> > > > >  exclude-result-prefixes="f xs"
> > > > >  >
> > > > >       <xsl:import href="../f/func-iter.xsl"/>
> > > > >       <xsl:output omit-xml-declaration="yes"/>
> > > > >
> > > > >       <xsl:variable name="vEnv" as="element()*">
> > > > >               <x n="n1">
> > > > >                       <v>v1</v>
> > > > >               </x>
> > > > >       </xsl:variable>
> > > > >
> > > > >       <xsl:template match="/">
> > > > >               <t>
> > > > >                  <xsl:sequence select="f:iter(100,
> > > > > f:envAppend(), $vEnv)"/>
> > > > >               </t>
> > > > >       </xsl:template>
> > > > >
> > > > >       <xsl:function name="f:envAppend" as="element()*">
> > > > >               <xsl:param name="pEnv" as="element()*"/>
> > > > >               <x name="Nx">
> > > > >                       <v>Vz</v>
> > > > >               </x>
> > > > >               <xsl:copy-of select="$pEnv"/>
> > > > >       </xsl:function>
> > > > >
> > > > >       <xsl:function name="f:envAppend" as="element()">
> > > > >               <f:envAppend/>
> > > > >       </xsl:function>
> > > > >
> > > > >       <xsl:template match="f:envAppend" mode="f:FXSL">
> > > > >               <xsl:param name="arg1" as="element()*"/>
> > > > >               <xsl:sequence select="f:envAppend($arg1)"/>
> > > > >       </xsl:template>
> > > > > </xsl:stylesheet>
> > > > >
> > > > >
> > > > > takes 172 milliseconds to append 100 nodes.
> > > > >
> > > > > It takes 1672 milliseconds for appending 1000 nodes
> (specified by
> > > > > changing the firs argument of f:iter() above from 100
> to 1000).
> > > > >
> > > > > However, the time for appending 5000 nodes is 33828 ms
> > > (20 times more
> > > > > than for 1000 nodes)
> > > > >
> > > > > and for appending 10000 nodes the time is  137140 ms
> (4 times more
> > > > > than the time for 5000 nodes).
> > > > >
> > > > > I did check that the implementation of f:iter() does not
> > > contribute to
> > > > > the observed behaviour -- iterating other functions
> exibits even
> > > > > sublinear times.
> > > > >
> > > > > For example this code:
> > > > >
> > > > > <xsl:stylesheet version="2.0"
> > > > >  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> > > > >  xmlns:xs="http://www.w3.org/2001/XMLSchema"
> > > > >  xmlns:f="http://fxsl.sf.net/"
> > > > >  exclude-result-prefixes="f xs"
> > > > >  >
> > > > >    <xsl:import href="../f/func-iter.xsl"/>
> > > > >    <xsl:import href="../f/func-Operators.xsl"/>
> > > > >
> > > > >    <xsl:output omit-xml-declaration="yes"/>
> > > > >   <xsl:template match="/">
> > > > >   <t>
> > > > >     <xsl:sequence select="f:iter(1000, f:add(1), 1)"/>
> > > > >   </t>
> > > > >
> > > > >  </xsl:template>
> > > > > </xsl:stylesheet>
> > > > >
> > > > > takes 672 milliseconds for 1000 additions.
> > > > > It takes 11141 ms for 100 thousand additions --not 100
> > > times more, but
> > > > > only 20 times more time.
> > > > >
> > > > >
> > > > > If virtual copy really takes linear times then what is it I'm
> > > > > doing wrong?
> > > > >
> > > > > --
> > > > > Cheers,
> > > > > Dimitre Novatchev
> > > > >
> > > > >
> > > > > -------------------------------------------------------
> > > > > SF.Net email is Sponsored by the Better Software
> Conference & EXPO
> > > > > September 19-22, 2005 * San Francisco, CA * Development
> > > > > Lifecycle Practices
> > > > > Agile & Plan-Driven Development * Managing Projects & Teams *
> > > > > Testing & QA
> > > > > Security * Process Improvement & Measurement *
> > > > > http://www.sqe.com/bsce5sf
> > > > > _______________________________________________
> > > > > saxon-help mailing list
> > > > > [hidden email]
> > > > > https://lists.sourceforge.net/lists/listinfo/saxon-help
> > > > >
> > > >
> > > >
> > > >
> > > >
> > > > -------------------------------------------------------
> > > > SF.Net email is Sponsored by the Better Software
> Conference & EXPO
> > > > September 19-22, 2005 * San Francisco, CA * Development
> > > Lifecycle Practices
> > > > Agile & Plan-Driven Development * Managing Projects & Teams
> > > * Testing & QA
> > > > Security * Process Improvement & Measurement *
> > > http://www.sqe.com/bsce5sf
> > > > _______________________________________________
> > > > saxon-help mailing list
> > > > [hidden email]
> > > > https://lists.sourceforge.net/lists/listinfo/saxon-help
> > > >
> > >
> > >
> > > -------------------------------------------------------
> > > SF.Net email is Sponsored by the Better Software Conference & EXPO
> > > September 19-22, 2005 * San Francisco, CA * Development
> > > Lifecycle Practices
> > > Agile & Plan-Driven Development * Managing Projects & Teams *
> > > Testing & QA
> > > Security * Process Improvement & Measurement *
> > > http://www.sqe.com/bsce5sf
> > > _______________________________________________
> > > saxon-help mailing list
> > > [hidden email]
> > > https://lists.sourceforge.net/lists/listinfo/saxon-help
> > >
> >
> >
> >
> >
> > -------------------------------------------------------
> > SF.Net email is Sponsored by the Better Software Conference & EXPO
> > September 19-22, 2005 * San Francisco, CA * Development
> Lifecycle Practices
> > Agile & Plan-Driven Development * Managing Projects & Teams
> * Testing & QA
> > Security * Process Improvement & Measurement *
> http://www.sqe.com/bsce5sf
> > _______________________________________________
> > saxon-help mailing list
> > [hidden email]
> > https://lists.sourceforge.net/lists/listinfo/saxon-help
> >
>
>
> --
> Cheers,
> Dimitre Novatchev
> ---------------------------------------
> Harry did not ask how Dumbledore knew; ...but Harry had long since
> learned that bangs and smoke were more often the marks of ineptitude
> than expertise.
>
>
> -------------------------------------------------------
> SF.Net email is Sponsored by the Better Software Conference & EXPO
> September 19-22, 2005 * San Francisco, CA * Development
> Lifecycle Practices
> Agile & Plan-Driven Development * Managing Projects & Teams *
> Testing & QA
> Security * Process Improvement & Measurement *
> http://www.sqe.com/bsce5sf
> _______________________________________________
> saxon-help mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/saxon-help
>




-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Saxon 8.5.1 virtual copy is O(N^2) for big enough N

Dimitre Novatchev
That is great, Mike. Thank you for this!

I don't have the code of your implementation, but it seems to me that
if it is possible to implement initial subsequence sharing, then it
should be possible to implement tail-subsequence sharing -- using
almost a "symmetric" implementation.

It may be a matter of taste, but it seems to me that people would feel
a little bit awkward to have to look for the recently added elements
to the list, starting from the end and iterating in a backwards
direction.

Just an idea:
I imagine having a big pool to place (references to) the elements of a
sequence. While there is a space in the pool, new elements simply
occupy the first available adjacent space. On overflow (only), the
pool is copied into a bigger (let's say twice) pool.

In case the implementation places list elements in the pool from left
to right, then we have initial sublist sharing.

In case list elements are placed from right to left, then we have tail
sublist sharing.

We could even start placing the first element at the center of the
pool. Then we could have both of these types of sharing.

Cheers,
Dimitre Novatchev


On 9/9/05, Michael Kay <[hidden email]> wrote:

> I've done an experimental implementation of a sequence implementation that
> allows content sharing. It seems to be working on the test cases run so far,
> and runs twice as fast as the old code for a test case consisting of a
> simple recursive function that builds a Fibonacci sequence:
>
> declare variable $limit external;
>
> declare function local:fibonacci($seed as xs:integer*) {
>  let $n := count($seed)
>  let $next := ($seed, ($seed[$n] + $seed[$n - 1]))
>  return
>     if ($n < $limit)
>     then local:fibonacci($next)
>     else $next
> };
>
> <a>{local:fibonacci((1,1))}</a>
>
> As it happens this example is still quadratic for sequences longer than 1000
> or so, this being caused entirely by the cost of doing arithmetic on very
> large integers. If you make a trivial change to the function like replacing
> ($seed[$n - 1]) by (1), it becomes linear, generating over 100
> items/millisecond.
>
> The sequence sharing only works for sequences that share their initial
> subsequence, that is, for appending at the end. Appending at the start is
> still likely be quadratic. A more general solution would have ended up
> effectively representing sequences as a linked list, with the result that
> operations like count($seed) and $seed[$n] could not be done in constant
> time. So it's not going to work on Dimitre's function
>
> <xsl:function name="f:envAppend" as="element()*">
>          <xsl:param name="pEnv" as="element()*"/>
>          <x name="Nx">
>             <v>Vz</v>
>          </x>
>          <xsl:copy-of select="$pEnv"/>
> </xsl:function>
>
> as written.
>
> Incidentally, the reason the performance appears to be linear at first and
> then become quadratic is probably that the cost function is (An + Bn*n),
> where the constant A is much larger than B.
>
> Michael Kay
>
>
>
> > -----Original Message-----
> > From: [hidden email]
> > [mailto:[hidden email]] On Behalf Of
> > Dimitre Novatchev
> > Sent: 07 September 2005 21:17
> > To: [hidden email]
> > Subject: Re: [saxon] Saxon 8.5.1 virtual copy is O(N^2) for
> > big enough N
> >
> > On 9/7/05, Michael Kay <[hidden email]> wrote:
> > > Thanks for the reference. Saxon is doing a kind of list
> > sharing when you
> > > create a subsequence of an existing list, but it isn't
> > currently doing it
> > > for an append operation, except in certain cases where you
> > get this effect
> > > implicitly as a consequence of lazy evaluation. An explicit
> > list sharing for
> > > append operations seems like a good idea.
> >
> > Yes, then appending N times an element at the start of a list
> > would be O(N).
> >
> > >
> > > The other thing that could be useful here is an extension to the
> > > optimization for tail-recursive functions so that a
> > function f() whose body
> > > is (EXP, f()) or (f(), EXP) can be unwound in the same way
> > as a truly
> > > tail-recursive function. I'll look at these possibilities.
> >
> > I am sure everyone will benefit immediately from these optimisations.
> >
> >
> > Cheers,
> > Dimitre Novatchev
> >
> >
> > >
> > > Michael Kay
> > >
> > > > -----Original Message-----
> > > > From: [hidden email]
> > > > [mailto:[hidden email]] On Behalf Of
> > > > Dimitre Novatchev
> > > > Sent: 07 September 2005 12:12
> > > > To: [hidden email]
> > > > Subject: Re: [saxon] Saxon 8.5.1 virtual copy is O(N^2) for
> > > > big enough N
> > > >
> > > > Hi Mike,
> > > > Thank you for your reply.
> > > >
> > > > I changed xsl:copy-of to xsl:sequence and although the times
> > > > decreased, the observed
> > > > O(N^2) time complexity still holds for N > 1000.
> > > >
> > > > I hope that an optimisation described in the introductory
> > chapter of
> > > > the book "Purely Functional Data Structures" by Chris Okasaki:
> > > >
> > > >
> > > > http://www.amazon.com/exec/obidos/tg/detail/-/0521663504/002-7
> > > > 517683-2986405?v=glance
> > > >
> > > > Chapter 2 Persistence -- page 9 and 10 (can be seen using Amazon's
> > > > "View Inside" feature).
> > > >
> > > > are very logical, easy to understand and to implement.
> > They are called
> > > > "List Sharing" and "Tree Sharing"
> > > >
> > > >
> > > > The idea is that the concatenation of two lists:
> > > >
> > > >      zs = xs ++ ys
> > > >
> > > > can be implemented by copying only the first list -- xs
> > and pointing
> > > > from its last element to (sharing) the list ys.
> > > >
> > > >
> > > > It is easy to see that using list sharing it is possible
> > to implement
> > > > repeated append of an element in front of a list in linear time.
> > > >
> > > > From the description of "virtual copy" I was left with
> > the impression
> > > > that it implemented exactly the list sharing technique.
> > > >
> > > > Similarly, adding a node to a binary search tree can be done by
> > > > copying only the nodes that are on the way of the new
> > node until it
> > > > finds its place in the tree (page 13) and pointing from
> > the new/copied
> > > > nodes to the rest of the unchanged nodes (subtree sharing).
> > > >
> > > > I believe that implementing efficiently the data
> > structures described
> > > > by Okasaki is extremely important as this decreases
> > significantly the
> > > > time necessary for "update" operations.
> > > >
> > > >
> > > > Thanks,
> > > > Dimitre Novatchev
> > > >
> > > > On 9/7/05, Michael Kay <[hidden email]> wrote:
> > > > > An interim response: I haven't tried tracing Saxon's
> > > > behaviour on this yet,
> > > > > doing that is always very time-consuming.
> > > > >
> > > > > The use of virtual node copies (when it kicks in at all)
> > > > makes the cost of
> > > > > copying a node independent of the size of the subtree
> > > > rooted at that node.
> > > > > However, what you are doing here is copying a sequence of
> > > > nodes, and the
> > > > > cost of this will still be proportional to the length of
> > > > the sequence.
> > > > >
> > > > > It's not clear to me here why you are copying the nodes at
> > > > all. Why not use
> > > > > xsl:sequence instead of xsl:copy? (To be honest, I suspect
> > > > that the "virtual
> > > > > copy" optimization is essentially bringing the cost of
> > > > xsl:copy close to
> > > > > that of xsl:sequence in situations where the user could
> > have written
> > > > > xsl:sequence in the first place. So perhaps my hopes for
> > > > it, quoted in your
> > > > > message, were a little over-optimistic. It's still worth
> > > > doing, though,
> > > > > because people will write xsl:copy where they didn't need to.)
> > > > >
> > > > > You should be getting some benefits here from the ordinary
> > > > lazy evaluation
> > > > > behavior. This should mean that the result of the
> > > > f:envAppend call is not
> > > > > actually materialized immediately. However, Saxon puts an
> > > > artificial limit
> > > > > on the depth of nesting of "closures" - when they get more
> > > > than 10 deep, the
> > > > > sequence is materialized, because it actually takes less
> > > > memory to store the
> > > > > sequence of 10 items than to store a nested tree of 10
> > > > closures. This is
> > > > > probably less true than it was, because I'm more selective
> > > > about which local
> > > > > variables are saved in the closure, and it might be worth
> > > > experimenting with
> > > > > different cut-offs.
> > > > >
> > > > > More relevantly, however, Saxon currently does an
> > > > optimization designed to
> > > > > help head-tail recursion: when you do a delayed evaluation of
> > > > > $x[position()>1] and it sees that $x is itself a delayed
> > > > evaluation of
> > > > > $y[position()>n], then it collapses the two expressions
> > constructing
> > > > > $y[position()>n+1]. This means that head-tail recursion is
> > > > often linear in
> > > > > space and time. What seems to be needed here is a similar
> > > > collapsing of
> > > > > levels for append operations, so that append($x, X), when
> > > > $x is itself
> > > > > append($y, Y), gets turned into append($y, Y, X). That
> > > > doesn't look too
> > > > > difficult in principle - it's a question of recognizing the
> > > > actual patterns
> > > > > that arise in practice.
> > > > >
> > > > > Michael Kay
> > > > > http://www.saxonica.com/
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: [hidden email]
> > > > > > [mailto:[hidden email]] On Behalf Of
> > > > > > Dimitre Novatchev
> > > > > > Sent: 03 September 2005 06:40
> > > > > > To: [hidden email]
> > > > > > Subject: [saxon] Saxon 8.5.1 virtual copy is O(N^2) for
> > > > big enough N
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Sometimes ago in the thread:
> > > > > >   "Template recursion, StackOverflowError, saxon:while a
> > > > nd variable
> > > > > > assignability"
> > > > > >
> > > > > > Michael Kay recommended that using xsl:copy-of to append
> > > > to a node-set
> > > > > > will only need linear time:
> > > > > >
> > > > > > "This style of programming, where updating an environment
> > > > > > involves copying
> > > > > > the old environment using xsl:copy-of and appending a new
> > > > > > name=value pair,
> > > > > > should benefit greatly from the optimization introduced a
> > > > > > couple of releases
> > > > > > ago whereby xsl:copy-of creates a virtual copy of a tree
> > > > > > rather than a real
> > > > > > copy. This won't reduce the search time for getting
> > the value of a
> > > > > > particular variable in the store, but it will greatly reduce
> > > > > > the copy time
> > > > > > and the use of memory; and for an environment that only
> > > > contains the
> > > > > > (changing) value of one variable, this is what matters.
> > > > > >
> > > > > > The virtual copy is in essence just a pointer to the node
> > > > > > that has been
> > > > > > copied. The virtual nodes in the virtual copy are
> > > > > > instantiated only when
> > > > > > they are referenced, and themselves consist simply of
> > > > pointers to the
> > > > > > original nodes: they differ from the original nodes only in
> > > > > > that they have
> > > > > > different node identifiers (and thus a different position in
> > > > > > document order)
> > > > > > and that axes cannot navigate outside the subtree that was
> > > > > > copied. A copy of
> > > > > > a virtual copy is implemented as a virtual copy of the
> > > > > > original, to avoid
> > > > > > long chains of indirection."
> > > > > >
> > > > > > Indeed for small N I observe that the time needed to
> > > > append N nodes ro
> > > > > > the node-set increases in a linear fashion.
> > > > > >
> > > > > > The code below:
> > > > > >
> > > > > > <xsl:stylesheet version="2.0"
> > > > > >  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> > > > > >  xmlns:xs="http://www.w3.org/2001/XMLSchema"
> > > > > >  xmlns:f="http://fxsl.sf.net/"
> > > > > >  exclude-result-prefixes="f xs"
> > > > > >  >
> > > > > >       <xsl:import href="../f/func-iter.xsl"/>
> > > > > >       <xsl:output omit-xml-declaration="yes"/>
> > > > > >
> > > > > >       <xsl:variable name="vEnv" as="element()*">
> > > > > >               <x n="n1">
> > > > > >                       <v>v1</v>
> > > > > >               </x>
> > > > > >       </xsl:variable>
> > > > > >
> > > > > >       <xsl:template match="/">
> > > > > >               <t>
> > > > > >                  <xsl:sequence select="f:iter(100,
> > > > > > f:envAppend(), $vEnv)"/>
> > > > > >               </t>
> > > > > >       </xsl:template>
> > > > > >
> > > > > >       <xsl:function name="f:envAppend" as="element()*">
> > > > > >               <xsl:param name="pEnv" as="element()*"/>
> > > > > >               <x name="Nx">
> > > > > >                       <v>Vz</v>
> > > > > >               </x>
> > > > > >               <xsl:copy-of select="$pEnv"/>
> > > > > >       </xsl:function>
> > > > > >
> > > > > >       <xsl:function name="f:envAppend" as="element()">
> > > > > >               <f:envAppend/>
> > > > > >       </xsl:function>
> > > > > >
> > > > > >       <xsl:template match="f:envAppend" mode="f:FXSL">
> > > > > >               <xsl:param name="arg1" as="element()*"/>
> > > > > >               <xsl:sequence select="f:envAppend($arg1)"/>
> > > > > >       </xsl:template>
> > > > > > </xsl:stylesheet>
> > > > > >
> > > > > >
> > > > > > takes 172 milliseconds to append 100 nodes.
> > > > > >
> > > > > > It takes 1672 milliseconds for appending 1000 nodes
> > (specified by
> > > > > > changing the firs argument of f:iter() above from 100
> > to 1000).
> > > > > >
> > > > > > However, the time for appending 5000 nodes is 33828 ms
> > > > (20 times more
> > > > > > than for 1000 nodes)
> > > > > >
> > > > > > and for appending 10000 nodes the time is  137140 ms
> > (4 times more
> > > > > > than the time for 5000 nodes).
> > > > > >
> > > > > > I did check that the implementation of f:iter() does not
> > > > contribute to
> > > > > > the observed behaviour -- iterating other functions
> > exibits even
> > > > > > sublinear times.
> > > > > >
> > > > > > For example this code:
> > > > > >
> > > > > > <xsl:stylesheet version="2.0"
> > > > > >  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> > > > > >  xmlns:xs="http://www.w3.org/2001/XMLSchema"
> > > > > >  xmlns:f="http://fxsl.sf.net/"
> > > > > >  exclude-result-prefixes="f xs"
> > > > > >  >
> > > > > >    <xsl:import href="../f/func-iter.xsl"/>
> > > > > >    <xsl:import href="../f/func-Operators.xsl"/>
> > > > > >
> > > > > >    <xsl:output omit-xml-declaration="yes"/>
> > > > > >   <xsl:template match="/">
> > > > > >   <t>
> > > > > >     <xsl:sequence select="f:iter(1000, f:add(1), 1)"/>
> > > > > >   </t>
> > > > > >
> > > > > >  </xsl:template>
> > > > > > </xsl:stylesheet>
> > > > > >
> > > > > > takes 672 milliseconds for 1000 additions.
> > > > > > It takes 11141 ms for 100 thousand additions --not 100
> > > > times more, but
> > > > > > only 20 times more time.
> > > > > >
> > > > > >
> > > > > > If virtual copy really takes linear times then what is it I'm
> > > > > > doing wrong?
> > > > > >
> > > > > > --
> > > > > > Cheers,
> > > > > > Dimitre Novatchev
> > > > > >
> > > > > >
> > > > > > -------------------------------------------------------
> > > > > > SF.Net email is Sponsored by the Better Software
> > Conference & EXPO
> > > > > > September 19-22, 2005 * San Francisco, CA * Development
> > > > > > Lifecycle Practices
> > > > > > Agile & Plan-Driven Development * Managing Projects & Teams *
> > > > > > Testing & QA
> > > > > > Security * Process Improvement & Measurement *
> > > > > > http://www.sqe.com/bsce5sf
> > > > > > _______________________________________________
> > > > > > saxon-help mailing list
> > > > > > [hidden email]
> > > > > > https://lists.sourceforge.net/lists/listinfo/saxon-help
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > -------------------------------------------------------
> > > > > SF.Net email is Sponsored by the Better Software
> > Conference & EXPO
> > > > > September 19-22, 2005 * San Francisco, CA * Development
> > > > Lifecycle Practices
> > > > > Agile & Plan-Driven Development * Managing Projects & Teams
> > > > * Testing & QA
> > > > > Security * Process Improvement & Measurement *
> > > > http://www.sqe.com/bsce5sf
> > > > > _______________________________________________
> > > > > saxon-help mailing list
> > > > > [hidden email]
> > > > > https://lists.sourceforge.net/lists/listinfo/saxon-help
> > > > >
> > > >
> > > >
> > > > -------------------------------------------------------
> > > > SF.Net email is Sponsored by the Better Software Conference & EXPO
> > > > September 19-22, 2005 * San Francisco, CA * Development
> > > > Lifecycle Practices
> > > > Agile & Plan-Driven Development * Managing Projects & Teams *
> > > > Testing & QA
> > > > Security * Process Improvement & Measurement *
> > > > http://www.sqe.com/bsce5sf
> > > > _______________________________________________
> > > > saxon-help mailing list
> > > > [hidden email]
> > > > https://lists.sourceforge.net/lists/listinfo/saxon-help
> > > >
> > >
> > >
> > >
> > >
> > > -------------------------------------------------------
> > > SF.Net email is Sponsored by the Better Software Conference & EXPO
> > > September 19-22, 2005 * San Francisco, CA * Development
> > Lifecycle Practices
> > > Agile & Plan-Driven Development * Managing Projects & Teams
> > * Testing & QA
> > > Security * Process Improvement & Measurement *
> > http://www.sqe.com/bsce5sf
> > > _______________________________________________
> > > saxon-help mailing list
> > > [hidden email]
> > > https://lists.sourceforge.net/lists/listinfo/saxon-help
> > >
> >
> >
> > --
> > Cheers,
> > Dimitre Novatchev
> > ---------------------------------------
> > Harry did not ask how Dumbledore knew; ...but Harry had long since
> > learned that bangs and smoke were more often the marks of ineptitude
> > than expertise.
> >
> >
> > -------------------------------------------------------
> > SF.Net email is Sponsored by the Better Software Conference & EXPO
> > September 19-22, 2005 * San Francisco, CA * Development
> > Lifecycle Practices
> > Agile & Plan-Driven Development * Managing Projects & Teams *
> > Testing & QA
> > Security * Process Improvement & Measurement *
> > http://www.sqe.com/bsce5sf
> > _______________________________________________
> > saxon-help mailing list
> > [hidden email]
> > https://lists.sourceforge.net/lists/listinfo/saxon-help
> >
>
>
>
>
> -------------------------------------------------------
> SF.Net email is Sponsored by the Better Software Conference & EXPO
> September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
> Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
> Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
> _______________________________________________
> saxon-help mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/saxon-help
>


--
Cheers,
Dimitre Novatchev
---------------------------------------
Harry did not ask how Dumbledore knew; ...but Harry had long since
learned that bangs and smoke were more often the marks of ineptitude
than expertise.


-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Saxon 8.5.1 virtual copy is O(N^2) for big enough N

Dimitre Novatchev
> Just an idea:
> I imagine having a big pool to place (references to) the elements of a
> sequence. While there is a space in the pool, new elements simply
> occupy the first available adjacent space. On overflow (only), the
> pool is copied into a bigger (let's say twice) pool.
>
> In case the implementation places list elements in the pool from left
> to right, then we have initial sublist sharing.
>
> In case list elements are placed from right to left, then we have tail
> sublist sharing.
>
> We could even start placing the first element at the center of the
> pool. Then we could have both of these types of sharing.

Unfortunately, the above describes a *destructive* update, which does
not preserve the original list.

Certainly, there are other ways to implement sharable lists, with the
property that the list "knows" its count, it has references to its
elements and any such reference can be a reference to another list,
and the list has an index, which allows very fast location of list[i].

Cheers,
Dimitre Novatchev


-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help
Loading...