Transformation taking way too long

Campbell, Lance
Saxon:  SaxonEE9-6-0-3J
Java: jdk1.7.0_72
OS: Linux RedHat

We are having terrible performance issues when transforming XML with an XSL stylesheet to HTML.  Below are some timing comparisons between Saxon and Xalan (XSLTC).  The comparisons are for exactly the same data being transformed.  It could be that we are doing something wrong on our end.

Java code used for both the Xalan and Saxon transformations:

private static final String invalidXMLChar = "[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\uD800\uDC00-\uDBFF\uDFFF]";
public static String transform(String xml, String xsl) throws TransformerException
{
        StreamSource xmlSource = new StreamSource(new StringReader(xml.replaceAll(invalidXMLChar, "")));
        StreamSource xslSource = new StreamSource(new StringReader(xsl));
        // TransformerFactory factory = TransformerFactory.newInstance("com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl", null);
        TransformerFactory factory = TransformerFactory.newInstance("com.saxonica.config.EnterpriseTransformerFactory", null);
        Transformer transformer = factory.newTransformer(xslSource);
        transformer.setOutputProperty(OutputKeys.INDENT, "no");
        StringWriter sw = new StringWriter();
        transformer.transform(xmlSource, new StreamResult(sw));
        return sw.toString();
}

Transformation documents:
XML document: http://test.webservices.illinois.edu/blog.xml
XSL document:  http://test.webservices.illinois.edu/blog.xsl

Time in milliseconds.
Transformer: com.saxonica.config.EnterpriseTransformerFactory
transform total time:2700
transform total time:3915
transform total time:3156
transform total time:3100
transform total time:3533

Transformer: com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl
transform total time:1045
transform total time:825
transform total time:717
transform total time:822
transform total time:676

Any help would be appreciated.


Thanks,

Lance Campbell
Software Architect
Web Services at Public Affairs
217-333-0382





_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 

Re: Transformation taking way too long

Michael Kay
We will certainly look into this. Are you measuring the total execution time of your transform() method, including creating the TransformerFactory, compiling the stylesheet, parsing the source document, and executing it?

In our own measurements, which we reported at XML London 2014, Saxon's tree-to-tree transformation time is usually faster than XSLTC, but compile time is often slower, and for some reason XSLTC seems to be faster at the raw parsing/building of the source document. If you need to compile a stylesheet and only want to execute it once, the total of compile+execution time will often be faster if you switch off bytecode generation: the cost of this is often not justified if the bytecode is only used once. Saxon "out of the box" does extra work at compile time in order to improve execution speed.

Also, for a small transformation, the cost of TransformerFactory.newInstance(), especially the classpath search, can dominate the actual transformation cost. This is outside Saxon's control, since it is non-Saxon code, and it may depend on the exact contents of the classpath.
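To see which phase dominates, the phases listed above can be timed separately. The following is only a minimal sketch, not code from this thread: the class name PhaseTimer and its fields are hypothetical, and it uses whichever TransformerFactory JAXP finds by default rather than a specific Saxon or XSLTC class.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: times the three phases of a single transformation separately.
final class PhaseTimer {
    final long factoryMs;    // time spent in TransformerFactory.newInstance()
    final long compileMs;    // time spent compiling the stylesheet (newTemplates)
    final long transformMs;  // time spent parsing the source, executing, and serializing
    final String output;     // the serialized result

    private PhaseTimer(long factoryMs, long compileMs, long transformMs, String output) {
        this.factoryMs = factoryMs;
        this.compileMs = compileMs;
        this.transformMs = transformMs;
        this.output = output;
    }

    static PhaseTimer run(String xml, String xsl) {
        try {
            long t0 = System.nanoTime();
            // Assumption: the JAXP default factory; use newInstance(className, null) to pin one.
            TransformerFactory factory = TransformerFactory.newInstance();
            long t1 = System.nanoTime();
            Templates templates = factory.newTemplates(new StreamSource(new StringReader(xsl)));
            long t2 = System.nanoTime();
            Transformer transformer = templates.newTransformer();
            StringWriter sw = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(sw));
            long t3 = System.nanoTime();
            return new PhaseTimer((t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000,
                                  (t3 - t2) / 1_000_000, sw.toString());
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

A single run on a cold JVM also includes class loading and JIT warm-up, so it is worth repeating the measurement several times and discarding the first few results.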

Michael Kay
Saxonica
[hidden email]
+44 (0) 118 946 5893




On 10 Dec 2014, at 18:54, Campbell, Lance <[hidden email]> wrote:

Re: Transformation taking way too long

Campbell, Lance
Thanks so much for responding.  I was measuring the total time.  

After submitting my question I did some research.  I never knew that one could actually precompile XSLs into a Templates object (via TransformerFactory.newTemplates()).  I wrote some code that caches the compiled XSLs.  That will work great in production.  We won't have any performance issues.
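The caching code itself was not posted. As a rough illustration only, a cache of compiled stylesheets might look like the sketch below. The class name StylesheetCache is hypothetical, keying the cache by the stylesheet text is an assumption (a file name or URL would work equally well), and the JAXP default factory stands in for the Saxon one.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical cache: compiles each stylesheet once and reuses the Templates object.
final class StylesheetCache {
    // One factory for the whole application, created once.
    private final TransformerFactory factory = TransformerFactory.newInstance();
    // Assumption: the stylesheet text itself is the cache key.
    private final ConcurrentMap<String, Templates> cache = new ConcurrentHashMap<>();

    String transform(String xml, String xsl) {
        try {
            Templates templates = cache.get(xsl);
            if (templates == null) {
                Templates compiled = compile(xsl);
                // If another thread compiled the same stylesheet first, use its copy.
                Templates raced = cache.putIfAbsent(xsl, compiled);
                templates = (raced != null) ? raced : compiled;
            }
            // Templates may be shared between threads; a Transformer may not, so create one per call.
            Transformer transformer = templates.newTransformer();
            StringWriter sw = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(sw));
            return sw.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    private Templates compile(String xsl) throws TransformerException {
        // JAXP does not promise that a factory is thread-safe, so compilation is serialized.
        synchronized (factory) {
            return factory.newTemplates(new StreamSource(new StringReader(xsl)));
        }
    }

    int size() {
        return cache.size();
    }
}
```

With this shape, the compile cost is paid once per distinct stylesheet and every later call pays only for parsing the source and running the transformation.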

However, on our test server we are testing new XSL all the time.  You mentioned two things I could do to improve performance when a stylesheet is compiled and run only once:

1) You mentioned something about "switch off bytecode generation".  How do you do that?

2) Replace the line below for slightly better performance:

TransformerFactory factory = TransformerFactory.newInstance("com.saxonica.config.EnterpriseTransformerFactory", null);

With this:

TransformerFactory factory = new com.saxonica.config.EnterpriseTransformerFactory();

Thanks for your help.

Thanks,

Lance Campbell
Software Architect
Web Services at Public Affairs
217-333-0382



-----Original Message-----
From: Michael Kay [mailto:[hidden email]]
Sent: Wednesday, December 10, 2014 6:47 PM
To: Mailing list for the SAXON XSLT and XQuery processor
Subject: Re: [saxon] Transformation taking way to long


Re: Transformation taking way too long

Michael Kay
>
> However, on our test server we are testing new XSL on a consistent basis.  You mentioned two things I could do to improve performance if we only compile once:
>
> 1) You mentioned something about "switch off bytecode generation".  How do you do that?

From the command line: --generateByteCode:off

From Java: factory.setAttribute(FeatureKeys.GENERATE_BYTE_CODE, false)
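In context, the Java call could be wrapped as in the sketch below. This is an illustration under assumptions, not Saxon's documented usage: the class name ByteCodeSwitch is hypothetical, and the URI string is assumed to be the value of the FeatureKeys.GENERATE_BYTE_CODE constant, spelled out so that the sketch compiles without Saxon on the classpath. With Saxon available, the constant itself is the better choice.

```java
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: asks a factory to switch off bytecode generation.
final class ByteCodeSwitch {
    // Assumption: this is the value of Saxon's FeatureKeys.GENERATE_BYTE_CODE.
    static final String GENERATE_BYTE_CODE = "http://saxon.sf.net/feature/generateByteCode";

    /** Returns true if the factory accepted the attribute, false if it rejected it. */
    static boolean disableByteCode(TransformerFactory factory) {
        try {
            factory.setAttribute(GENERATE_BYTE_CODE, Boolean.FALSE);
            return true;
        } catch (IllegalArgumentException e) {
            // JAXP factories throw this for attributes they do not recognize (non-Saxon factories, for example).
            return false;
        }
    }
}
```

The call would go right after the factory is created and before newTransformer() or newTemplates() compiles the stylesheet.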

Michael Kay
Saxonica

Re: Transformation taking way too long

debbie
In reply to this post by Campbell, Lance
Here are the time measurements we get for running this XSL transform:

Driver implemented: XSLTC
Average time for stylesheet compile: 169.47ms. Number of iterations: 6
Average time for fileToFileTransform: 11.11ms. Number of iterations: 90
Average time for treeToTreeTransform: 7.42ms. Number of iterations: 135

Driver implemented: SaxonEE-9.6.0.3-J (with bytecode generation OFF)
Average time for stylesheet compile: 170.29ms. Number of iterations: 6
Average time for fileToFileTransform: 15.21ms. Number of iterations: 66
Average time for treeToTreeTransform: 3.62ms. Number of iterations: 276

Driver implemented: SaxonEE-9.6.0.3-J (with bytecode generation ON)
Average time for stylesheet compile: 662.24ms. Number of iterations: 5
Average time for fileToFileTransform: 16.13ms. Number of iterations: 62
Average time for treeToTreeTransform: 4.81ms. Number of iterations: 208  
 
Certainly it is the compile time which dominates the total transform time, and this is about 4 times slower with bytecode generation on (consistent with our previous findings for similar cases). In this particular case, bytecode generation doesn't achieve any benefits; this often seems to be the case with XSLT 1.0 stylesheets where there is no type information.
 
The fact that Saxon's tree-to-tree transformation is faster than XSLTC, while Saxon's file-to-file transform is slower than XSLTC, is also consistent with our previous performance testing. Currently we do not fully understand the reasons for this difference (though we would like to)!

Debbie Lockett
Saxonica


Re: Transformation taking way too long

Campbell, Lance
We shut off bytecode generation on our test server.  The performance is great.  Thanks so much for your help.  I cannot wait to push out our changes to production, which will use bytecode generation with template caching.  The performance will be outstanding.

Lance

> On Dec 11, 2014, at 6:50 AM, "[hidden email]" <[hidden email]> wrote:

Re: Transformation taking way too long

Andrew Welch
In reply to this post by Michael Kay


On 11 December 2014 at 00:47, Michael Kay <[hidden email]> wrote:
> ... and for some reason XSLTC seems to be faster at the raw parsing/building of the source document.

Fwiw, if Xerces-j is doing your underlying parsing, you can speed up the XMLReader creation a little:





Re: Transformation taking way too long

Michael Kay
Yes, instantiating the XML parser can take surprisingly long, and this is another reason for re-using the Saxon TransformerFactory, because the Saxon Configuration will then reuse the XML parser when it can.

Michael Kay
Saxonica
+44 (0) 118 946 5893




On 13 Dec 2014, at 10:38, Andrew Welch <[hidden email]> wrote:


