Java API vs. command line performance

Java API vs. command line performance

Steve Ylvisaker-IRS
Forum:
We are implementing a DocBook-to-HTML process using the DocBook 5.0 portfolio and Saxon9ee. We have made relatively minor changes to the vanilla DocBook and have the process running successfully in a Windows 7 workstation environment. In that environment we process via the command line interface using a simple .bat file. We are working with a relatively large 4,200 KB DocBook file as a baseline. It processes in 4 minutes on a modestly outfitted laptop.

When we port this process to our Linux server environment, where Saxon is invoked through the Java API, the same file takes 32 minutes to process. A test on that same Linux server using a command-line call in a shell script gives a much better processing time of ~3 minutes.

We have studied the September/October thread where this issue was discussed, and the Java developers thought they had made the necessary changes to resolve it. However, the improvement was only modest: this same file now processes in ~19 minutes.

We removed the xsl:strip-space element; the resulting HTML file was unchanged, and there was no impact on processing time.

Are there other threads that we should be studying to resolve this issue, or are we looking at something new? Would the forum be willing to engage with us to try to resolve this issue? If so, what should we provide?
 
Steve


_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 

Re: Java API vs. command line performance

Graydon
The default answer for anything to do with performance and Saxon is to
use -T to find out what's really the source of slowness.  (And maybe
-TJ in your case, since it might be the API calls rather than the
transform.)

However, if the command line is giving you ~3 minutes and the Java API
is giving you ~30 minutes, I'd say the first two things to check are
the Java heap size available to the API -- as soon as the heap starts
swapping, run time goes horrible, and that's what the 3-to-30 minute
difference looks like -- and whether or not the API call version is
managing to do a lot of re-parsing somehow.

It can also be the case that which Java you are using matters; the
OpenJDK has had some very slow versions and some quite sprightly
versions with respect to Saxon-J.
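
One way to check the heap and Java-version points above is to print both from inside each environment and compare. This is a minimal sketch that assumes nothing beyond the JDK; the class name JvmReport is made up for illustration and is not part of Saxon.

```java
import java.util.Locale;

// Hypothetical helper (not part of Saxon): reports the JVM settings that
// commonly differ between a shell-script launch and a server-hosted API call.
class JvmReport {

    /** Largest heap the JVM will try to use, in megabytes (the -Xmx ceiling). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** One-line summary of JVM vendor, version, and heap ceiling. */
    static String describe() {
        return String.format(Locale.ROOT,
                "java.vendor=%s java.version=%s maxHeap=%dMB",
                System.getProperty("java.vendor"),
                System.getProperty("java.version"),
                maxHeapMb());
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this once from the shell script and logging describe() once from inside the server application should show whether the two runs get the same JVM and the same heap ceiling.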

-- Graydon

On Fri, May 2, 2014 at 11:25 AM, Ylvisaker Steven J [Contractor]
<[hidden email]> wrote:

Re: Java API vs. command line performance

Michael Kay
In reply to this post by Steve Ylvisaker-IRS
Obviously there is no reason why an application running via the Java API should be slower than using the command line, since the command line is itself an application using the Java API.

So you're doing something wrong, but it's hard to see what without knowing what you're doing.

Probably the most common mistake when people use the Java API is to build a DOM. Using a DOM with Saxon is 4-10 times slower than using Saxon's native tree model.
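
To illustrate the DOM point, the sketch below feeds the same document to a JAXP identity transform, once as a StreamSource and once as a DOMSource. It uses only the JDK's javax.xml.transform classes, so it shows the two ways of supplying input rather than Saxon's timings; the class name SourceChoice and the sample XML are made up for illustration. With a StreamSource the processor is free to build its own internal tree, whereas a DOMSource hands it a prebuilt DOM.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not taken from the thread.
class SourceChoice {

    /** Runs an identity transform over the given source and returns the serialized result. */
    static String identity(Source source) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(source, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Lets the processor parse the text and build whatever tree it prefers. */
    static Source asStream(String xml) {
        return new StreamSource(new StringReader(xml));
    }

    /** Builds a W3C DOM first and passes that to the processor. */
    static Source asDom(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return new DOMSource(
                    f.newDocumentBuilder().parse(new InputSource(new StringReader(xml))));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><p>hi</p></doc>";
        System.out.println(identity(asStream(xml)));
        System.out.println(identity(asDom(xml)));
    }
}
```

Both calls produce the same content; the difference is only in which tree model the processor ends up working on, which is the thing worth checking in the calling code.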

Michael Kay
Saxonica


On 2 May 2014, at 16:25, Ylvisaker Steven J [Contractor] <[hidden email]> wrote:

Re: Java API vs. command line performance

Steve Ylvisaker-IRS
In reply to this post by Steve Ylvisaker-IRS
Forum - I would like to reopen this issue.

We continue to see material differences between our command line processing performance and the processing that results from using the APIs. Our most recent example is a 49-minute composition via the API that, with the same input and the same result, completes in less than 2 minutes when using the command line.

Would it be possible for someone to evaluate the following Java code and comment on anything that looks suspect:

import java.io.File;
import java.util.HashMap;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.Configuration;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.QName;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.Serializer;
import net.sf.saxon.s9api.XdmAtomicValue;
import net.sf.saxon.s9api.XdmNode;
import net.sf.saxon.s9api.XsltCompiler;
import net.sf.saxon.s9api.XsltExecutable;
import net.sf.saxon.s9api.XsltTransformer;
import org.xml.sax.InputSource;

        public boolean transform(CompParm proc, FilePkg fp, HashMap<String, String> params, File xslt, String inputFileName, String outputFileName) {
                boolean ret = false; // initialize to false = failure

                File xmlInput = new File(inputFileName);
                File outFile = new File(outputFileName);

                Processor saxonProc = new Processor(false);
                Configuration procConfig = saxonProc.getUnderlyingConfiguration();
                procConfig.setErrorListener(new ParserMessages(proc, fp));

                DocumentBuilder docBuilder = saxonProc.newDocumentBuilder();

                XdmNode input = null;
                try {
                        // input = docBuilder.build(new SAXSource(new InputSource(xmlInput.toURI().toString())));
                        input = docBuilder.build(new StreamSource(xmlInput));
                } catch (SaxonApiException e) {
                        procLogger.error(e.getMessage());
                        return ret;
                }

                DocumentBuilder stylesheetBuilder = saxonProc.newDocumentBuilder();
                StreamSource stylesheetSource = new StreamSource(xslt);
                XdmNode stylesheet = null;
                try {
                        stylesheet = stylesheetBuilder.build(stylesheetSource);
                } catch (SaxonApiException e) {
                        procLogger.fatal("Could not build stylesheet: " + stylesheetSource.getSystemId());
                        return ret;
                }

                Serializer serializedOutput = new Serializer();
                serializedOutput.setOutputFile(outFile);

                XsltCompiler xsltCompiler = saxonProc.newXsltCompiler();
                XsltExecutable foStyle = null;
                try {
                        foStyle = xsltCompiler.compile(stylesheet.asSource());
                } catch (SaxonApiException sae) {
                        procLogger.fatal("Stylesheet compile failed.");
                        return ret;
                }

                XsltTransformer foTransformer = foStyle.load();
                foTransformer.setDestination(serializedOutput);
                foTransformer.setInitialContextNode(input);

                // set up runtime parameters
                QName user = new QName("user");
                XdmAtomicValue userVal = new XdmAtomicValue("CPM");
                foTransformer.setParameter(user, userVal);
                // repeated for other parameters...

                foTransformer.setMessageListener(new XSLTMessages(proc, fp));

                try {
                        foTransformer.transform();
                        if (outFile.exists()) {
                                ret = true; // nominal success
                        }
                } catch (SaxonApiException e) {
                        procLogger.error(e.getMessage());
                } finally {
                        if (foTransformer != null) {
                                try {
                                        foTransformer.close();
                                } catch (SaxonApiException e) {
                                }
                        }
                }
                return ret;
        }
}

Re: Java API vs. command line performance

Michael Kay
I would start by inserting some metering calls at key stages into this code to see which stage accounts for the excessive time; output the value of System.currentTimeMillis() between the main processing phases.
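
The metering could be sketched roughly as below. This is only one possible shape, assuming nothing beyond the JDK; the class name PhaseTimer and the phase labels are made up, and the sleep calls merely stand in for the real build, compile, and transform steps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: records wall-clock time between successive phases.
class PhaseTimer {
    private final Map<String, Long> elapsedMs = new LinkedHashMap<>();
    private long last = System.currentTimeMillis();

    /** Stores the time elapsed since the previous mark (or construction) under this label. */
    void mark(String phase) {
        long now = System.currentTimeMillis();
        elapsedMs.put(phase, now - last);
        last = now;
    }

    /** Phase labels and elapsed milliseconds, in the order they were recorded. */
    Map<String, Long> results() {
        return elapsedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        PhaseTimer timer = new PhaseTimer();
        Thread.sleep(20);                 // placeholder for building the input document
        timer.mark("build input");
        Thread.sleep(10);                 // placeholder for compiling the stylesheet
        timer.mark("compile stylesheet");
        Thread.sleep(30);                 // placeholder for running the transformation
        timer.mark("transform");
        timer.results().forEach((phase, ms) -> System.out.println(phase + ": " + ms + " ms"));
    }
}
```

Calling mark() after each try block in the posted method, and logging results() at the end, would show which of the phases accounts for the bulk of the time.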

The way you are compiling the stylesheet by first building an XdmNode, rather than by passing a StreamSource to the compile() method, seems unnecessarily long-winded but it's hard to see why it should cause this problem.

After that, the best thing would be to try to package it up to see if we can reproduce the effect. The trouble with performance problems is that the devil usually lies in some detail, and we don't know what detail to ask about until we know it's there.

Michael Kay
Saxonica
[hidden email]
+44 (0118) 946 5893



On 10 Jun 2014, at 21:31, Steve Ylvisaker-IRS <[hidden email]> wrote:
