.net components, performance question

.net components, performance question

Stephen Caffo

I have a fairly large transformation that runs very quickly (about 4 sec) when called from the command line with transform.exe.  In fact, it did a lot better than Altova’s engine with the same transformation.

 

But when I try the same transformation with the .net Saxon components (code below), it runs for over 70 sec.  Am I doing something wrong?  Do I need to buy the “SA” version to get the faster .net components or something?

 

Thanks!

Steve

 

// Create a Processor instance.
Processor processor = new Processor();

// Load the source document.
XdmNode input = processor.NewDocumentBuilder().Build(new Uri(@"C:\TestSaxon\PortfolioSnapshot\PortfolioSnapshots.xml"));

// Create a transformer for the stylesheet.
XsltTransformer transformer = processor.NewXsltCompiler().Compile(new Uri(@"C:\TestSaxon\PortfolioSnapshot\PortfolioSnapshot_In1.xslt")).Load();

// Set the root node of the source document to be the initial context node.
transformer.InitialContextNode = input;

// Create a serializer that writes to the output file.
String outfile = @"C:\TestSaxon\Output.xml";
Serializer serializer = new Serializer();
serializer.SetOutputStream(new FileStream(outfile, FileMode.Create, FileAccess.Write));

// Run the transformation, serializing the result to the output file.
transformer.Run(serializer);



Re: .net components, performance question

Michael Kay
I can't think of any obvious reason for the difference. One possibility is that for some reason it's going to the web to fetch a DTD or something in one case but not the other. (When running from the command line, I believe the Java XML parser is used, whereas by default when running from the API, the .NET XML parser is used. That's just a conjecture.) It's certainly nothing to do with whether or not you're using Saxon-SA.
 
Michael Kay
http://www.saxonica.com/



Re: .net components, performance question

Michael Kay
In reply to this post by Stephen Caffo
My initial conclusion on this is that it's caused by the fact that when running from the command line, the "allNodesUntyped" flag is set on the Configuration, whereas this is not the case when running using the API.
 
This flag is known to cause a performance difference, but I have never seen it cause such an enormous performance difference (I would expect something more like 10%), and I haven't yet determined why this should happen here. It's probably because the knowledge that all nodes are untyped enables some powerful optimization to take place; but that's conjecture at the moment.
 
Since you're using Saxon-B, Saxon should know in advance that all nodes are untyped, and should be able to set this flag automatically, so I'll be looking into that as part of the solution.
 
You can set this flag from the .NET API using
 
processor.Implementation.setAllNodesUntyped(true);
 
but you will need to add a couple of references to your build (saxon9.dll and the OpenJDK dll).
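 
For example, a minimal sketch (not a complete program) showing where that call would go in the code you posted - Processor.Implementation exposes the underlying (Java) Configuration object, which is why the extra assembly references are needed:
 
// Enable the flag on the Configuration right after creating the Processor,
// before compiling the stylesheet or building the source document.
Processor processor = new Processor();
processor.Implementation.setAllNodesUntyped(true);

XsltTransformer transformer = processor.NewXsltCompiler()
    .Compile(new Uri(@"C:\TestSaxon\PortfolioSnapshot\PortfolioSnapshot_In1.xslt"))
    .Load();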
 
Given the fact that type information theoretically helps performance, it might seem odd that Saxon performs faster if it knows that all nodes will be untyped. The solution to this paradox is that "untyped" is really just another type, and Saxon can take advantage of knowing that nodes are untyped just as it can use the fact that they are typed. The worst of all possible worlds is not knowing in advance. In particular, Saxon can often generate better code if it knows that elements and attributes will not be list-valued (which in Saxon-B, of course, will always be the case.)
 
I'm going to investigate further to see quite how it happens that this flag setting makes such a big difference.
 
Michael Kay



Re: .net components, performance question

Stephen Caffo

Wow – way faster with that switch on (i.e. processor.Implementation.setAllNodesUntyped(true); )

 

 Thanks for the tip.  I guess I’ll try the SA version too, since schema types seem to be very important to the processing.  Thanks so much for your quick reply and for investigating this.

 

Steve

 

 


Re: .net components, performance question

Michael Kay
I've produced a patch so that the allNodesUntyped() switch will be set automatically with Saxon-B on .NET, as it already is on Java.
 
I've also been doing further investigations on this.
 
In my measurements, I'm seeing the allNodesUntyped() setting giving a 4-fold improvement on Java and an 8-fold improvement on .NET (which is lower than the difference you observed, I think). I don't know why there's a difference between the two platforms, and I don't know why I'm seeing a smaller difference than you are - unfortunately I don't have very good tooling for performance profiling of the code on .NET.
 
The performance is dominated by the filter expression
 
$MappingElements[($Mapping = "XSDtoSQL" and @xsd=$PathToCurrentNode) or ($Mapping = "SQLtoXSD" and @sql=$PathToCurrentNode)]
 
You should be able to improve this fairly trivially by defining a key. However, I've been exploring why Saxon-SA doesn't generate an index for this automatically. There seem to be two reasons: firstly, an expression of the form (A or false()) isn't being simplified to (A), and secondly, indexes aren't created for filter expressions applied to global variables. The first is trivial to fix, the second is a bit more work. Both will probably be optimizer improvements in the next release - I try to resist the temptation to introduce new optimizations in maintenance releases.
 
I've decided that the allNodesUntyped() option is so important here that in future I'm going to make it the default for all stylesheets that don't have an xsl:import-schema declaration. If you want to handle schema-validated input and you don't want to have an import-schema in your stylesheet, you will need to set this option the other way. I'm also making it an option on the XsltCompiler object.
 
This leaves the question of why the code for the comparison (@xsd=$PathToCurrentNode) is so much faster when it is known in advance that the attribute will be untyped. (It would also be faster if it's known in advance that it will be typed; what is expensive is not knowing either way). Because @xsd could be list-valued, Saxon is actually generating the code (some $d in data(@xsd) satisfies $d eq $PathToCurrentNode), and I think the costs are probably going into the binding and dereferencing of the extra local variable. There are other reasons why this expansion is unsatisfactory - for example, it makes it more difficult to introduce indexing - and I'm going to take another look at whether there are better ways of doing it.
 
(Again, you can get the faster code by writing the expression as (string(@xsd) = $PathToCurrentNode). My concern is getting better performance for the code as written.)
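 
Applied to your filter, that rewrite would look roughly like this (same logic, just with the attributes atomized to strings explicitly):
 
$MappingElements[($Mapping = "XSDtoSQL" and string(@xsd) = $PathToCurrentNode)
              or ($Mapping = "SQLtoXSD" and string(@sql) = $PathToCurrentNode)]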
 
Generally, there would be much more scope for the optimizer if you declared the types of variables and parameters, and if you made PathToNode and PathToCurrentNode work entirely in terms of strings rather than temporary trees containing lots of text nodes. That is:
 
<xsl:variable name="PathToCurrentNode" as="xs:string" select="f:PathToNode(.)"/>
 
<xsl:function name="f:PathToNode" as="xs:string">
  <xsl:param name="node" as="node()"/>
  <xsl:sequence select="string-join($node/ancestor-or-self::*/concat(name(), '/'), '')"/>
</xsl:function>
 
This is proving a productive test case for performance tuning and I'll keep you posted on the outcome.
 
Michael Kay
http://www.saxonica.com/



Re: .net components, performance question

Michael Kay
I've now implemented a couple more changes:
 
(a) expressions of the form (A and true()) or (A or false()) are now rewritten as A. This enables the filter expression to be recognized as one that can benefit from indexing.
 
(b) a filter expression $a[X = y] in Saxon-SA is now indexed when $a is a global variable reference. Previously this was happening only for references to local variables.
 
Together these changes bring the execution time down from 3 seconds to 700ms. (And more importantly, the performance will scale better as the data size increases.)
 
(c) The code
 
      <xsl:variable name="PathToCurrentNode">
        <xsl:call-template name="PathToNode"/>
      </xsl:variable>
 
was causing a lot of inefficiency because it creates a temporary tree. I'm now recognizing this as a "text only tree", by looking at what the called template actually returns; there is existing code for handling a "text only tree" as if it were a string if all references to the variable atomize the content. This brings the execution time down to 180ms. Of course you can achieve the same saving yourself by coding the variable as a string directly, rather than as a temporary tree.
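 
That is, roughly (a sketch only - this assumes the PathToNode template does nothing more than concatenate the ancestor element names):
 
      <xsl:variable name="PathToCurrentNode" as="xs:string"
          select="string-join(ancestor-or-self::*/concat(name(), '/'), '')"/>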
 
Although it's no longer affecting the result of this stylesheet, I'm now going to take a look at the code for (@a = 'abcd') where the possibility that the attribute might be list-valued is seriously degrading performance in the 99% of cases where it isn't.
 
Michael Kay



Re: .net components, performance question

Stephen Caffo
In reply to this post by Michael Kay

Wow – fantastic again.  Glad I could find something challenging for you :)  I’ll update my parameters to use types (i.e. string)

 


Re: .net components, performance question

Michael Kay
In reply to this post by Michael Kay
I have now finally established why the stylesheet runs so much faster when it is known in advance that all nodes will be untyped. It is not, as I thought, because atomizing the nodes is significantly faster, or because the logic for doing a sequence comparison is slower than a singleton comparison in the case where the sequence turns out to be a singleton. Rather it is because when nodes are untyped, a dedicated "comparer" is allocated at compile-time, whose task is to compare strings using the Unicode codepoint collation; whereas when it is not known what type the nodes will be, a generic "comparer" is allocated at compile time, which then does some complex run-time decision making to decide how to perform the comparison, and (crucially) ends up choosing a less than optimum strategy.
 
It actually relates to the problem described here: http://saxonica.blogharbor.com/blog/_archives/2006/8/13/2226871.html
 
(I enjoyed the title of that blog...)
 
In fact, I actually describe the bug in the blog posting! "That means implementing a comparesEqual() method in the collator that's separate from the compare() method, and changing ValueComparisons to use this method rather than calling the general compare() method and testing the result against zero."
 
But on this path, I'm not using a ValueComparison, I'm using code that still uses the general compare() method, which because of the UTF-16 problem described in the blog posting, is looking at the characters in the string one-by-one rather than doing a string compare.
 
Once identified, the problem turns out to be quite easy to fix. At any rate, to fix the main problem, which is choosing an efficient strategy for doing the comparisons. There's still a small overhead because the decision making is done at run-time rather than at compile time, but that's almost unnoticeable.
 
It's also not all that surprising that the overhead of doing this low-level manipulation of strings should be higher on the .NET platform than on Java.
 
Meanwhile, until the fix appears in 9.2, please note that doing [string(@x) eq 'abcd'] can be significantly faster than [@a = 'abcd'].
 
Michael Kay
 
 


Re: .net components, performance question

Stephen Caffo

Thank you so much.  Your quick responsiveness is almost inhuman considering the volume of work/email you must process daily!

 

Steve

 

 


Re: .net components, performance question

Stephen Caffo

Ok, follow up question.  As my files get larger, my processing time (even from the command line transform.exe) is slowing way, way down.  Is there a better way to do this:

 

 

<!--Load all the mapping elements-->

<xsl:variable name="MappingElements" select="document('PortfolioSnapshotMapping_In.xslt')/descendant::map:Mapping"/>

 

<!--process each element in the large xml file, and look up the mapping element-->

<xsl:variable name="MappingElement" select='$MappingElements[($Mapping eq "XSDtoSQL" and string(@xsd) eq $PathToCurrentNode) or ($Mapping eq "SQLtoXSD" and string(@sql) eq $PathToCurrentNode)]'/>

 

Steve

 

 


Re: .net components, performance question

Michael Kay
You can either use Saxon-SA, which will optimize this automatically, or you can use keys.
 
<xsl:key name="mapxsd" match="map:Mapping" use="@xsd"/>
<xsl:key name="mapsql" match="map:Mapping" use="@sql"/>
 
then
<xsl:variable name="MappingDocument" select="document('....')"/>
<xsl:variable name="MappingElement" select="
    key('mapxsd', $PathToCurrentNode, $MappingDocument)[$Mapping eq 'XSDtoSQL'] |
    key('mapsql', $PathToCurrentNode, $MappingDocument)[$Mapping eq 'SQLtoXSD']"/>
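 
Assembled into the stylesheet, a rough sketch (assuming $Mapping and $PathToCurrentNode are defined as in your existing code, and the map prefix is bound as before) would be:
 
<!-- top level of the stylesheet: -->
<xsl:key name="mapxsd" match="map:Mapping" use="@xsd"/>
<xsl:key name="mapsql" match="map:Mapping" use="@sql"/>
<xsl:variable name="MappingDocument" select="document('PortfolioSnapshotMapping_In.xslt')"/>
 
<!-- inside the template that computes $PathToCurrentNode: -->
<xsl:variable name="MappingElement" select="
    key('mapxsd', $PathToCurrentNode, $MappingDocument)[$Mapping eq 'XSDtoSQL'] |
    key('mapsql', $PathToCurrentNode, $MappingDocument)[$Mapping eq 'SQLtoXSD']"/>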
 
Michael Kay
http://www.saxonica.com/



Re: .net components, performance question

Trevor Nash
In reply to this post by Stephen Caffo
Try <xsl:key> (though I'm not sure what $PathToCurrentNode is).

This and many other answers are explained in Dr Kay's excellent book.

Regards,
Trevor Nash

Stephen Caffo wrote:

>
> Ok, follow up question.  As my files get larger, my processing time
> (even from the command line transform.exe) is slowing way, way down.  
> Is there a better way to do this:
>
>  
>
>  
>
> <!--Load all the mapping elements-->
>
> <xsl:variable name="MappingElements"
> select="document('PortfolioSnapshotMapping_In.xslt')/descendant::map:Mapping"/>
>
>  
>
> <!--process each element in the large xml file, and look up the
> mapping element-->
>
> <xsl:variable name="MappingElement" select='$MappingElements[($Mapping
> eq "XSDtoSQL" and string(@xsd) eq $PathToCurrentNode) or ($Mapping eq
> "SQLtoXSD" and string(@sql) eq $PathToCurrentNode)]'/>
>
>  
>
> Steve
>
>  
>
>  
>
> *From:* Stephen Caffo [mailto:[hidden email]]
> *Sent:* Wednesday, November 12, 2008 12:00 PM
> *To:* Mailing list for the SAXON XSLT and XQuery processor
> *Subject:* Re: [saxon] .net components, performance question
>
>  
>
> Thank you so much.  Your quick responsiveness is almost inhuman
> considering the volume of work/email you must process daily!
>
>  
>
> Steve
>
>  
>
>  
>
> *From:* Michael Kay [mailto:[hidden email]]
> *Sent:* Tuesday, November 11, 2008 5:43 PM
> *To:* 'Mailing list for the SAXON XSLT and XQuery processor'
> *Subject:* Re: [saxon] .net components, performance question
>
>  
>
> I have now finally established why the stylesheet runs so much faster
> when it is known in advance that all nodes will be untyped. It is not,
> as I thought, because atomizing the nodes is significantly faster, or
> because the logic for doing a sequence comparison is slower than a
> singleton comparison in the case where the sequence turns out to be a
> singleton. Rather it is because when nodes are untyped, a dedicated
> "comparer" is allocated at compile-time, whose task is to compare
> strings using the Unicode codepoint collation; whereas when it is not
> known what type the nodes will be, a generic "comparer" is allocated
> at compile time, which then does some complex run-time decision making
> to decide how to perform the comparison, and (crucially) ends up
> choosing a less than optimum strategy.
>
>  
>
> It actually relates to the problem described here:
>
>  
>
> http://saxonica.blogharbor.com/blog/_archives/2006/8/13/2226871.html
>
>  
>
> (I enjoyed the title of that blog...)
>
>  
>
> In fact, I actually describe the bug in the blog posting! "That means
> implementing a comparesEqual() method in the collator that's separate
> from the compare() method, and changing ValueComparisons to use this
> method rather than calling the general compare() method and testing
> the result against zero."
>
>  
>
> But on this path, I'm not using a ValueComparison, I'm using code that
> still uses the general compare() method, which because of the UTF-16
> problem described in the blog posting, is looking at the characters in
> the string one-by-one rather than doing a string compare.
>
>  
>
> Once identified, the problem turns out to be quite easy to fix. At any
> rate, to fix the main problem, which is choosing an efficient strategy
> for doing the comparisons. There's still a small overhead because the
> decision making is done at run-time rather than at compile time, but
> that's almost unnoticeable.
>
>  
>
> It's also not all that surprising that the overhead of doing this
> low-level manipulation of strings should be higher on the .NET
> platform than on Java.
>
>  
>
> Meanwhile, until the fix appears in 9.2, please note that doing
> [string(@x) eq 'abcd'] can be significantly faster than [@a = 'abcd'].
>
>  
>
> Michael Kay
>
> http://www.saxonica.com/
>
>  
>
>  
>

--
Melvaig Technologies Limited
voice:     +44 (0) 1445 771363
email:     [hidden email]       web:       http://www.melvaig.co.uk

Registered in Scotland No 194737
5 Melvaig, Gairloch, Ross-shire IV21 2EA


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: .net components, performance question

Stephen Caffo
In reply to this post by Stephen Caffo
Ah, ok - I see now that was (another) XSLT question, not a Saxon question.  Sorry about that.

With regard to Saxon-SA - we have a web app that we sell to our clients.  We're going to use XML/XSLT to import/export data from our relational database.  All our XML has a schema defined for it.  Users can only transform our data using our web app - meaning it's not some kind of generic XSLT tool.

And say we have 30 clients that install our web app on their LAN.  Each client has an average of 10 users who use the application.  Plus we have 5 developers on our team.

How does the SA licensing work for that situation?

Steve


-----Original Message-----
From: Michael Kay <[hidden email]>
Sent: Wednesday, November 12, 2008 6:41 PM
To: Mailing list for the SAXON XSLT and XQuery processor <[hidden email]>
Subject: Re: [saxon] .net components, performance question

You can either use Saxon-SA, which will optimize this automatically, or you can use keys.
 
<xsl:key name="mapxsd" match="map:Mapping" use="@xsd"/>
<xsl:key name="mapsql" match="map:Mapping" use="@sql"/>
 
then
<xsl:variable name="MappingDocument" select="document('....')"/>
<xsl:variable name="MappingElement" select="
    key('mapxsd', $PathToCurrentNode, $MappingDocument)[$Mapping eq 'XSDtoSQL'] |
    key('mapsql', $PathToCurrentNode, $MappingDocument)[$Mapping eq 'SQLtoXSD'] "/>
 
Michael Kay
http://www.saxonica.com/
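
For reference, this is how those key declarations and the lookup assemble into a stylesheet skeleton (a sketch only: the map namespace URI is a placeholder for whatever the mapping file actually uses, and $Mapping / $PathToCurrentNode are shown as stand-ins for values the real stylesheet computes):

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:map="urn:example:mapping">

  <!-- Keys must be top-level declarations; the index over the mapping
       document is built once, the first time each key is used -->
  <xsl:key name="mapxsd" match="map:Mapping" use="@xsd"/>
  <xsl:key name="mapsql" match="map:Mapping" use="@sql"/>

  <xsl:variable name="MappingDocument"
      select="document('PortfolioSnapshotMapping_In.xslt')"/>

  <!-- Stand-in for however the mapping direction is really chosen -->
  <xsl:param name="Mapping" select="'XSDtoSQL'"/>

  <xsl:template match="*">
    <!-- Stand-in: the real stylesheet derives this path string elsewhere -->
    <xsl:variable name="PathToCurrentNode" select="name()"/>
    <xsl:variable name="MappingElement" select="
        key('mapxsd', $PathToCurrentNode, $MappingDocument)[$Mapping eq 'XSDtoSQL'] |
        key('mapsql', $PathToCurrentNode, $MappingDocument)[$Mapping eq 'SQLtoXSD']"/>
    <xsl:copy-of select="$MappingElement"/>
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>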
From: Stephen Caffo [mailto:[hidden email]]
Sent: 12 November 2008 22:49
To: Mailing list for the SAXON XSLT and XQuery processor
Subject: Re: [saxon] .net components, performance question

Ok, follow up question.  As my files get larger, my processing time (even from the command line transform.exe) is slowing way, way down.  Is there a better way to do this:
 
 
<!--Load all the mapping elements-->
<xsl:variable name="MappingElements" select="document('PortfolioSnapshotMapping_In.xslt')/descendant::map:Mapping"/>
 
<!--process each element in the large xml file, and look up the mapping element-->
<xsl:variable name="MappingElement" select='$MappingElements[($Mapping eq "XSDtoSQL" and string(@xsd) eq $PathToCurrentNode) or ($Mapping eq "SQLtoXSD" and string(@sql) eq $PathToCurrentNode)]'/>
 
Steve
 
 
From: Stephen Caffo [mailto:[hidden email]]
Sent: Wednesday, November 12, 2008 12:00 PM
To: Mailing list for the SAXON XSLT and XQuery processor
Subject: Re: [saxon] .net components, performance question
 
Thank you so much.  Your quick responsiveness is almost inhuman considering the volume of work/email you must process daily!
 
Steve
 
 
From: Michael Kay [mailto:[hidden email]]
Sent: Tuesday, November 11, 2008 5:43 PM
To: 'Mailing list for the SAXON XSLT and XQuery processor'
Subject: Re: [saxon] .net components, performance question
 
I have now finally established why the stylesheet runs so much faster when it is known in advance that all nodes will be untyped. It is not, as I thought, because atomizing the nodes is significantly faster, or because the logic for doing a sequence comparison is slower than a singleton comparison in the case where the sequence turns out to be a singleton. Rather it is because when nodes are untyped, a dedicated "comparer" is allocated at compile-time, whose task is to compare strings using the Unicode codepoint collation; whereas when it is not known what type the nodes will be, a generic "comparer" is allocated at compile time, which then does some complex run-time decision making to decide how to perform the comparison, and (crucially) ends up choosing a less than optimum strategy.
 
It actually relates to the problem described here:
 
http://saxonica.blogharbor.com/blog/_archives/2006/8/13/2226871.html
 
(I enjoyed the title of that blog...)
 
In fact, I actually describe the bug in the blog posting! "That means implementing a comparesEqual() method in the collator that's separate from the compare() method, and changing ValueComparisons to use this method rather than calling the general compare() method and testing the result against zero."
 
But on this path, I'm not using a ValueComparison, I'm using code that still uses the general compare() method, which because of the UTF-16 problem described in the blog posting, is looking at the characters in the string one-by-one rather than doing a string compare.
 
Once identified, the problem turns out to be quite easy to fix. At any rate, to fix the main problem, which is choosing an efficient strategy for doing the comparisons. There's still a small overhead because the decision making is done at run-time rather than at compile time, but that's almost unnoticeable.
 
It's also not all that surprising that the overhead of doing this low-level manipulation of strings should be higher on the .NET platform than on Java.
 
Meanwhile, until the fix appears in 9.2, please note that doing [string(@x) eq 'abcd'] can be significantly faster than [@a = 'abcd'].
 
Michael Kay
http://www.saxonica.com/
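
To make that interim workaround concrete (a sketch; the variable, attribute, and value names here are purely illustrative):

<!-- General comparison: the comparer is chosen at run time and, as described
     above, may fall back to a slow character-by-character comparison -->
<xsl:variable name="slowMatch" select="$items[@code = 'abcd']"/>

<!-- Workaround: string() forces both operands to be plain strings, so a
     straightforward codepoint string comparison is used instead -->
<xsl:variable name="fastMatch" select="$items[string(@code) eq 'abcd']"/>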

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: .net components, performance question

Michael Kay
In reply to this post by Stephen Caffo
> With regards to saxon sa - we have a web app that we sell to
> our clients.  We're going to use xml/xslt to import/export
> data from our relational database.  All our xml has schema
> defined for it.  Users can only transform our data using our
> web app - meaning it's not some kind of generic xslt tool.
>
> And say we have 30 clients that install our web app on their
> lan.  Each client has an average of 10 users that use that
> application.  Plus we have 5 developers on our team.
>
> How does the sa licensing work for that situation ?

You have two options: you can either tell your clients to purchase a
Saxon-SA license (they need one for each computer on which the software
actually runs, which might just be one from your description), or you can
negotiate an OEM contract with Saxonica that allows you to distribute the
product with your application, and activate it by means of a license key
supplied programmatically. Feel free to contact me off-list to discuss the
commercial terms for this. Our terms and conditions for OEM distributors
typically include unlimited use by the application vendor for development,
testing, and marketing of the application.

Regards,

Michael Kay
http://www.saxonica.com/

