Streaming using accumulator runs out of memory, streaming using xsl:stream does not

Costello, Roger L.
Hi Michael,

I am using XSLT streaming to process a huge (3 GB) file.

I am counting the number of <node> elements in an Open Street Map (OSM) XML document.

Below are two versions of my XSLT program.

The first version uses xsl:stream and works great. There are 10 282 777 node elements!

The second version uses <xsl:mode streamable="yes" /> and xsl:accumulator. This version runs out of memory. See error message below. I am using oXygen XML. Is there something wrong with my program, or is it a bug in SAXON?

/Roger

-------------------------------------------------------
          massachusetts-v1.xsl
-------------------------------------------------------
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="#all"
                version="3.0">
   
    <xsl:output method="xml" />
   
    <xsl:template match="/">
        <xsl:stream href="../huge-file/massachusetts.xml">
            <count>
                <xsl:for-each select="osm">
                    <xsl:iterate select="node">
                        <xsl:param name="count" select="0" as="xs:decimal"/>
                        <xsl:next-iteration>
                            <xsl:with-param name="count" select="$count+1"/>
                        </xsl:next-iteration>
                        <xsl:on-completion>
                            <xsl:value-of select="$count"/>
                        </xsl:on-completion>
                    </xsl:iterate>
                </xsl:for-each>
            </count>
        </xsl:stream>
    </xsl:template>
   
</xsl:stylesheet>

-------------------------------------------------------
          massachusetts-v2.xsl
-------------------------------------------------------
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:f="function"
                exclude-result-prefixes="#all"
                version="3.0">
   
    <xsl:output method="xml" />
   
    <xsl:mode streamable="yes" />
   
    <xsl:variable name="MA" select="doc('../huge-file/massachusetts.xml')" />
   
    <xsl:accumulator name="f:node-count"
                     post-descent="f:final-node-count"
                     as="xs:integer"
                     initial-value="0">
        <xsl:accumulator-rule match="node" new-value="$value + 1"/>
    </xsl:accumulator>
   
    <xsl:template match="/">
        <xsl:apply-templates select="$MA/osm" />
    </xsl:template>
   
    <xsl:template match="osm">
        <Massachusetts>
            <xsl:apply-templates select="node"/>
            <NumberOfNodeElements><xsl:value-of select="f:final-node-count()" /></NumberOfNodeElements>
        </Massachusetts>
    </xsl:template>
   
    <xsl:template match="node" />
   
</xsl:stylesheet>

-------------------------------------------------------------
      Error message from version2
-------------------------------------------------------------
The application exceeded the available memory: 1333MB. To avoid stability issues please restart the application. If the application has become unstable and cannot be closed normally, you can use the Force Quit button from this dialog. Be aware that by doing this you will lose any unsaved documents.
If the problem persists, it is recommended to increase the amount of memory available to the application.

You can increase the memory available to the application by setting a larger value for the -Xmx parameter in the startup script, for example -Xmx1866m.
For more details see the 'Performance problems' and 'Setting a parameter in the startup script' sections of the User Manual.

How to avoid these errors:
  In case you were running a diff tool when this problem occurred you can try to use another algorithm next time.
  If you just want to inspect a large file, please use the Large File Viewer available from the Tools menu.
  Do not keep many editors open, close them when you do not need them anymore.


[ Transformation Performer ]  -  java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Unknown Source)
        at java.lang.String.<init>(Unknown Source)
        at net.sf.saxon.om.StructuredQName.getLocalPart(StructuredQName.java:191)
        at net.sf.saxon.om.NoNamespaceName.equals(NoNamespaceName.java:158)
        at net.sf.saxon.event.Stripper.attribute(Stripper.java:148)
        at net.sf.saxon.event.ReceivingContentHandler.startElement(ReceivingContentHandler.java:347)
        at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
        at org.apache.xerces.xinclude.XIncludeHandler.startElement(Unknown Source)
        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:396)
        at net.sf.saxon.event.Sender.send(Sender.java:143)
        at net.sf.saxon.functions.DocumentFn.makeDoc(DocumentFn.java:398)
        at net.sf.saxon.functions.Doc.doc(Doc.java:208)
        at net.sf.saxon.functions.Doc.evaluateItem(Doc.java:158)
        at net.sf.saxon.functions.Doc.evaluateItem(Doc.java:28)
        at net.sf.saxon.expr.Expression.iterate(Expression.java:488)
        at net.sf.saxon.expr.parser.ExpressionTool.evaluate(ExpressionTool.java:338)
        at net.sf.saxon.expr.instruct.GlobalVariable.getSelectValue(GlobalVariable.java:563)
        at net.sf.saxon.expr.instruct.GlobalVariable.actuallyEvaluate(GlobalVariable.java:615)
        at net.sf.saxon.expr.instruct.GlobalVariable.evaluateVariable(GlobalVariable.java:587)
        at net.sf.saxon.expr.VariableReference.evaluateVariable(VariableReference.java:509)
        at net.sf.saxon.expr.VariableReference.evaluateItem(VariableReference.java:473)
        at net.sf.saxon.expr.SimpleStepExpression.iterate(SimpleStepExpression.java:86)
        at net.sf.saxon.expr.parser.ExpressionTool.evaluate(ExpressionTool.java:338)
        at net.sf.saxon.expr.instruct.GlobalVariable.getSelectValue(GlobalVariable.java:563)

_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 

Re: Streaming using accumulator runs out of memory, streaming using xsl:stream does not

Michael Kay
Thanks for the feedback you've been providing on streaming on the xml-dev list, Roger. I've been following with interest.

Your accumulator example binds a variable to the large document. That will always require enough memory to hold the document. If you process the document using xsl:stream, or simply by supplying it as the principal input document, the example should work OK. There have been a few recent bug fixes on streamed accumulators - see here

https://saxonica.plan.io/issues/1890

but I don't think they should affect this example.

Incidentally, your node counting examples can be done a lot more simply using something like

<xsl:stream href="...">
  <xsl:value-of select="count(/a/b/c[@x=2 and @y=4])"/>
</xsl:stream>

(This should work in Saxon 9.5 and in the next W3C draft, though I'm not sure it's streamable according to the current published draft).
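
Applied to the OpenStreetMap file from the first post, that suggestion might reduce to something like the sketch below. It assumes the same relative path and the same osm/node structure as massachusetts-v1.xsl, and it uses the xsl:stream instruction as that stylesheet does. Whether a given processor version accepts count() over this path as streamable is not guaranteed, as the note above says.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">

    <xsl:output method="xml" />

    <xsl:template match="/">
        <!-- Assumption: same input location as in massachusetts-v1.xsl -->
        <xsl:stream href="../huge-file/massachusetts.xml">
            <count>
                <!-- Counting does not depend on document order,
                     so neither xsl:iterate nor an accumulator is used -->
                <xsl:value-of select="count(/osm/node)"/>
            </count>
        </xsl:stream>
    </xsl:template>

</xsl:stylesheet>
```

The document is never bound to a variable here, so the processor is free to discard each node element once it has been counted.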

You only really need xsl:iterate, or accumulators, when there is some order-significance in the processing, that is, when the way a node is processed depends in some way on nodes that were encountered earlier.

Michael Kay
Saxonica

On 13 Sep 2013, at 10:43, Costello, Roger L. wrote:

> Hi Michael,
>
> I am using XSLT streaming to process a huge (3 GB) file.
>
> I am counting the number of <node> elements in an Open Street Map (OSM) XML document.
>
> Below are two versions of my XSLT program.
>
> The first version uses xsl:stream and works great. There are 10 282 777 node elements!
>
> The second version uses <xsl:mode streamable="yes" /> and xsl:accumulator. This version runs out of memory. See error message below. I am using oXygen XML. Is there something wrong with my program, or is it a bug in SAXON?
>
> /Roger



Re: Streaming using accumulator runs out of memory, streaming using xsl:stream does not

Costello, Roger L.
Hi Michael,

> Your accumulator example binds a variable
> to the large document. That will always require
> enough memory to hold the document.

Ah! That is a key insight. "Don't have an XSLT program read a huge XML file into a variable." I have added that to my streaming tutorial.

Okay, I supplied the huge XML file as the principal input document. Below is the new version. I ran it and it didn't produce an "Out of Memory" error. However, after 2 hours of running it still wasn't finished so I killed the process. Contrast with the xsl:stream version which finished in 134 seconds. Do you anticipate that in future versions of SAXON the accumulator version and the xsl:stream version will run in about the same time? Or will an accumulator version always be slower than an xsl:stream version?

/Roger

--------------------------------------------------------------
 Count <node> elements, using accumulator
--------------------------------------------------------------
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:f="function"
                exclude-result-prefixes="#all"
                version="3.0">
   
    <xsl:output method="xml" />
   
    <xsl:mode streamable="yes" />
   
    <xsl:accumulator name="f:node-count"
                     post-descent="f:final-node-count"
                     as="xs:integer"
                     initial-value="0">
        <xsl:accumulator-rule match="node" new-value="$value + 1"/>
    </xsl:accumulator>
   
    <xsl:template match="osm">
        <Massachusetts>
            <xsl:apply-templates select="node"/>
            <NumberOfNodeElements><xsl:value-of select="f:final-node-count()" /></NumberOfNodeElements>
        </Massachusetts>
    </xsl:template>
   
    <xsl:template match="node" />
   
</xsl:stylesheet>


Re: Streaming using accumulator runs out of memory, streaming using xsl:stream does not

Michael Kay
I can't see any reason why the accumulator version should be significantly slower. Perhaps a bug. Do you have a small sample of the data I can test on?

Michael Kay
Saxonica

On 14 Sep 2013, at 19:48, Costello, Roger L. wrote:

> Hi Michael,
>
>> Your accumulator example binds a variable
>> to the large document. That will always require
>> enough memory to hold the document.
>
> Ah! That is a key insight. "Don't have an XSLT program read a huge XML file into a variable." I have added that to my streaming tutorial.
>
> Okay, I supplied the huge XML file as the principal input document. Below is the new version. I ran it and it didn't produce an "Out of Memory" error. However, after 2 hours of running it still wasn't finished so I killed the process. Contrast with the xsl:stream version which finished in 134 seconds. Do you anticipate that in future versions of SAXON the accumulator version and the xsl:stream version will run in about the same time? Or will an accumulator version always be slower than an xsl:stream version?
>
> /Roger



Re: Streaming using accumulator runs out of memory, streaming using xsl:stream does not

Costello, Roger L.
Hi Michael,

> Do you have a small sample of the data I can test on?

Below is a small version of the Open Street Map file.

I think that I am doing something wrong in oXygen (which is what I am using to invoke SAXON and run the XSLT program). If I add a stylesheet PI to the (small) XML document, and then click on the "Apply Transformation" button, I get the result quickly. Of course, it is difficult to add a stylesheet PI to the 3 GB XML document because all the editors that I have used die trying to open a 3 GB file (even Vim, the Windows version of vi, crashed).

So instead of inserting a stylesheet PI in the input XML file, in oXygen I opened the XSLT program and then clicked on the "Configure Transformation Scenario" icon, selected the input XML document, and then clicked on the "Apply associated" button. Now the transformation runs forever, even on the below small XML file. Hey George, am I doing something wrong?

/Roger

-----------------------------------------------------------------
                   Small Input File
-----------------------------------------------------------------
<osm version="0.6" generator="osm-extract.pl">
    <node id="358264143" version="1" timestamp="2009-03-10T04:54:34Z"
        uid="4732" user="iandees" changeset="774950" lat="42.2017681"
        lon="-70.7561527">
        <tag k="gnis:created" v="08/27/2002"/>
        <tag k="gnis:county_id" v="023"/>
        <tag k="name" v="Wayland Middle School"/>
        <tag k="amenity" v="school"/>
        <tag k="gnis:feature_id" v="602607"/>
        <tag k="gnis:state_id" v="25"/>
        <tag k="ele" v="34"/>
    </node>
    <node id="358264143" version="1" timestamp="2009-03-10T04:54:34Z"
        uid="4732" user="iandees" changeset="774950" lat="42.2017681"
        lon="-70.7561527">
        <tag k="gnis:created" v="08/27/2002"/>
        <tag k="gnis:county_id" v="023"/>
        <tag k="name" v="Scituate Center Central School"/>
        <tag k="amenity" v="school"/>
        <tag k="gnis:feature_id" v="602607"/>
        <tag k="gnis:state_id" v="25"/>
        <tag k="ele" v="34"/>
    </node>
    <node id="358264143" version="1" timestamp="2009-03-10T04:54:34Z"
        uid="4732" user="iandees" changeset="774950" lat="42.2017681"
        lon="-70.7561527">
        <tag k="gnis:created" v="08/27/2002"/>
        <tag k="gnis:county_id" v="023"/>
        <tag k="name" v="Walnut Hill School for the Arts"/>
        <tag k="amenity" v="school"/>
        <tag k="gnis:feature_id" v="602607"/>
        <tag k="gnis:state_id" v="25"/>
        <tag k="ele" v="34"/>
    </node>
</osm>

--------------------------------------------------------------
 Count <node> elements, using accumulator
--------------------------------------------------------------
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:f="function"
                exclude-result-prefixes="#all"
                version="3.0">
   
    <xsl:output method="xml" />
   
    <xsl:mode streamable="yes" />
   
    <xsl:accumulator name="f:node-count"
                     post-descent="f:final-node-count"
                     as="xs:integer"
                     initial-value="0">
        <xsl:accumulator-rule match="node" new-value="$value + 1"/>
    </xsl:accumulator>
   
    <xsl:template match="osm">
        <Massachusetts>
            <xsl:apply-templates select="node"/>
            <NumberOfNodeElements><xsl:value-of select="f:final-node-count()" /></NumberOfNodeElements>
        </Massachusetts>
    </xsl:template>
   
    <xsl:template match="node" />
   
</xsl:stylesheet>


Re: Streaming using accumulator runs out of memory, streaming using xsl:stream does not

Dimitre Novatchev
I can run this successfully, independently of oXygen:

Saxon-EE 9.5.0.1J from Saxonica
Java version 1.7.0_25
Using license serial number XXXXXXXXX
Generating byte code...
Stylesheet compilation time: 332 milliseconds
Processing file:/C:/Program%20Files%20(x86)/Java/jre7/bin/marrowtr.xml
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
Execution time: 88ms
Memory used: 10597096
NamePool contents: 5 entries in 5 chains. 6 URIs

And the result is:

<?xml version="1.0" encoding="UTF-8"?><Massachusetts><NumberOfNodeElements>3</NumberOfNodeElements></Massachusetts>


Cheers,
Dimitre



On Sat, Sep 14, 2013 at 1:50 PM, Costello, Roger L. <[hidden email]> wrote:

> Hi Michael,
>
>> Do you have a small sample of the data I can test on?
>
> Below is a small version of the Open Street Map file.
>
> I think that I am doing something wrong in oXygen (which is what I am using to invoke SAXON and run the XSLT program). If I add a stylesheet PI to the (small) XML document, and then click on the "Apply Transformation" button, I get the result quickly. Of course, it is difficult to add a stylesheet PI to the 3GB XML document because all the editors that I have used die trying to open a 3GB file (even the windows version of VI -- VIM -- crashed).
>
> So instead of inserting a stylesheet PI in the input XML file, in oXygen I opened the XSLT program and then clicked on the "Configure Transformation Scenario" icon, selected the input XML document, and then clicked on the "Apply associated" button. Now the transformation runs forever, even on the below small XML file. Hey George, am I doing something wrong?
>
> /Roger
>
> -----------------------------------------------------------------
>                    Small Input File
> -----------------------------------------------------------------
> <osm version="0.6" generator="osm-extract.pl">
>     <node id="358264143" version="1" timestamp="2009-03-10T04:54:34Z"
>         uid="4732" user="iandees" changeset="774950" lat="42.2017681"
>         lon="-70.7561527">
>         <tag k="gnis:created" v="08/27/2002"/>
>         <tag k="gnis:county_id" v="023"/>
>         <tag k="name" v="Wayland Middle School"/>
>         <tag k="amenity" v="school"/>
>         <tag k="gnis:feature_id" v="602607"/>
>         <tag k="gnis:state_id" v="25"/>
>         <tag k="ele" v="34"/>
>     </node>
>     <node id="358264143" version="1" timestamp="2009-03-10T04:54:34Z"
>         uid="4732" user="iandees" changeset="774950" lat="42.2017681"
>         lon="-70.7561527">
>         <tag k="gnis:created" v="08/27/2002"/>
>         <tag k="gnis:county_id" v="023"/>
>         <tag k="name" v="Scituate Center Central School"/>
>         <tag k="amenity" v="school"/>
>         <tag k="gnis:feature_id" v="602607"/>
>         <tag k="gnis:state_id" v="25"/>
>         <tag k="ele" v="34"/>
>     </node>
>     <node id="358264143" version="1" timestamp="2009-03-10T04:54:34Z"
>         uid="4732" user="iandees" changeset="774950" lat="42.2017681"
>         lon="-70.7561527">
>         <tag k="gnis:created" v="08/27/2002"/>
>         <tag k="gnis:county_id" v="023"/>
>         <tag k="name" v="Walnut Hill School for the Arts"/>
>         <tag k="amenity" v="school"/>
>         <tag k="gnis:feature_id" v="602607"/>
>         <tag k="gnis:state_id" v="25"/>
>         <tag k="ele" v="34"/>
>     </node>
> </osm>
>
> --------------------------------------------------------------
>  Count <node> elements, using accumulator
> --------------------------------------------------------------
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                 xmlns:f="function"
>                 exclude-result-prefixes="#all"
>                 version="3.0">
>
>     <xsl:output method="xml" />
>
>     <xsl:mode streamable="yes" />
>
>     <xsl:accumulator name="f:node-count"
>                      post-descent="f:final-node-count"
>                      as="xs:integer"
>                      initial-value="0">
>         <xsl:accumulator-rule match="node" new-value="$value + 1"/>
>     </xsl:accumulator>
>
>     <xsl:template match="osm">
>         <Massachusetts>
>             <xsl:apply-templates select="node"/>
>             <NumberOfNodeElements><xsl:value-of select="f:final-node-count()" /></NumberOfNodeElements>
>         </Massachusetts>
>     </xsl:template>
>
>     <xsl:template match="node" />
>
> </xsl:stylesheet>
>
> _______________________________________________
> saxon-help mailing list archived at http://saxon.markmail.org/
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/saxon-help



--
Cheers,
Dimitre Novatchev


Re: Streaming using accumulator runs out of memory, streaming using xsl:stream does not

Radu Coravu
In reply to this post by Costello, Roger L.
Hi Roger,

You stated:

> in oXygen I opened the XSLT program and then clicked on the "Configure Transformation Scenario" icon, selected the input XML document, and then clicked on the "Apply associated" button. Now the transformation runs forever, even on the below small XML file. Hey George, am I doing something wrong?

I tested this with Oxygen 15.0 (which bundles Saxon 9.5.0.2).
I saved the sample XSL and the small XML content, created a
transformation scenario on the XSL that uses Saxon EE, and applied it
to the XML. The transformation finished immediately and gave this result:

> <Massachusetts><NumberOfNodeElements>3</NumberOfNodeElements></Massachusetts>

Are you sure the latency can be reproduced even on small XML documents?

Regards,
Radu

Radu Coravu
<oXygen/>  XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

On 9/14/2013 11:50 PM, Costello, Roger L. wrote:

> Hi Michael,
>
>> Do you have a small sample of the data I can test on?
>
> Below is a small version of the Open Street Map file.
>
> I think that I am doing something wrong in oXygen (which is what I am using to invoke SAXON and run the XSLT program). If I add a stylesheet PI to the (small) XML document, and then click on the "Apply Transformation" button, I get the result quickly. Of course, it is difficult to add a stylesheet PI to the 3GB XML document because all the editors that I have used die trying to open a 3GB file (even the windows version of VI -- VIM -- crashed).
>
> So instead of inserting a stylesheet PI in the input XML file, in oXygen I opened the XSLT program and then clicked on the "Configure Transformation Scenario" icon, selected the input XML document, and then clicked on the "Apply associated" button. Now the transformation runs forever, even on the below small XML file. Hey George, am I doing something wrong?
>
> /Roger
>
> -----------------------------------------------------------------
>                     Small Input File
> -----------------------------------------------------------------
> <osm version="0.6" generator="osm-extract.pl">
>      <node id="358264143" version="1" timestamp="2009-03-10T04:54:34Z"
>          uid="4732" user="iandees" changeset="774950" lat="42.2017681"
>          lon="-70.7561527">
>          <tag k="gnis:created" v="08/27/2002"/>
>          <tag k="gnis:county_id" v="023"/>
>          <tag k="name" v="Wayland Middle School"/>
>          <tag k="amenity" v="school"/>
>          <tag k="gnis:feature_id" v="602607"/>
>          <tag k="gnis:state_id" v="25"/>
>          <tag k="ele" v="34"/>
>      </node>
>      <node id="358264143" version="1" timestamp="2009-03-10T04:54:34Z"
>          uid="4732" user="iandees" changeset="774950" lat="42.2017681"
>          lon="-70.7561527">
>          <tag k="gnis:created" v="08/27/2002"/>
>          <tag k="gnis:county_id" v="023"/>
>          <tag k="name" v="Scituate Center Central School"/>
>          <tag k="amenity" v="school"/>
>          <tag k="gnis:feature_id" v="602607"/>
>          <tag k="gnis:state_id" v="25"/>
>          <tag k="ele" v="34"/>
>      </node>
>      <node id="358264143" version="1" timestamp="2009-03-10T04:54:34Z"
>          uid="4732" user="iandees" changeset="774950" lat="42.2017681"
>          lon="-70.7561527">
>          <tag k="gnis:created" v="08/27/2002"/>
>          <tag k="gnis:county_id" v="023"/>
>          <tag k="name" v="Walnut Hill School for the Arts"/>
>          <tag k="amenity" v="school"/>
>          <tag k="gnis:feature_id" v="602607"/>
>          <tag k="gnis:state_id" v="25"/>
>          <tag k="ele" v="34"/>
>      </node>
> </osm>
>
> --------------------------------------------------------------
>   Count <node> elements, using accumulator
> --------------------------------------------------------------
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                  xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                  xmlns:f="function"
>                  exclude-result-prefixes="#all"
>                  version="3.0">
>
>      <xsl:output method="xml" />
>
>      <xsl:mode streamable="yes" />
>
>      <xsl:accumulator name="f:node-count"
>                       post-descent="f:final-node-count"
>                       as="xs:integer"
>                       initial-value="0">
>          <xsl:accumulator-rule match="node" new-value="$value + 1"/>
>      </xsl:accumulator>
>
>      <xsl:template match="osm">
>          <Massachusetts>
>              <xsl:apply-templates select="node"/>
>              <NumberOfNodeElements><xsl:value-of select="f:final-node-count()" /></NumberOfNodeElements>
>          </Massachusetts>
>      </xsl:template>
>
>      <xsl:template match="node" />
>
> </xsl:stylesheet>
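
The stylesheet quoted above uses the accumulator syntax of an XSLT 3.0 working draft, as accepted by Saxon 9.5 (a `post-descent` function name and a `new-value` attribute). The final XSLT 3.0 Recommendation spells these differently: the rule takes a `select` attribute and the final value is read with `accumulator-after()`. A minimal sketch of the same count in that later syntax might look like the following. The accumulator name `node-count` is chosen here for illustration, and the sketch has not been run against any particular Saxon release, so treat it as a starting point rather than a tested replacement.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="#all"
                version="3.0">

    <xsl:output method="xml"/>

    <!-- Declare the default mode streamable and list the accumulators it uses. -->
    <xsl:mode streamable="yes" use-accumulators="node-count"/>

    <!-- Running count of node elements (the name "node-count" is hypothetical).
         streamable="yes" asks the processor to check the rules for streamability. -->
    <xsl:accumulator name="node-count" as="xs:integer"
                     initial-value="0" streamable="yes">
        <xsl:accumulator-rule match="node" select="$value + 1"/>
    </xsl:accumulator>

    <xsl:template match="osm">
        <Massachusetts>
            <!-- Consume the node children first; the empty template below discards them. -->
            <xsl:apply-templates select="node"/>
            <!-- Once the children have been read, fetch the accumulator's value
                 as it stands at the end of the osm element. -->
            <NumberOfNodeElements>
                <xsl:value-of select="accumulator-after('node-count')"/>
            </NumberOfNodeElements>
        </Massachusetts>
    </xsl:template>

    <xsl:template match="node"/>

</xsl:stylesheet>
```

The call to `accumulator-after()` is placed after the `xsl:apply-templates` instruction on purpose: the streamability rules expect the element's content to have been consumed before the post-descent value is read.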



Re: Streaming using accumulator runs out of memory, streaming using xsl:stream does not

Radu Coravu
Hi Roger,

I also tested transforming the entire 3 GB XML file (you gave a link to
it on the XML Dev list) with the XSLT from Oxygen.
There is indeed a problem in Oxygen: when the XSLT is opened in the
editor, Oxygen tries to learn the structure of the XML document in order
to propose content completion choices when you edit XPath values in the
XSLT document. That is definitely a bad choice here, as the XML is huge.

But this can be avoided. In Oxygen's Project view you can add the large
XML document, then right-click it and create a transformation scenario
for it using Saxon EE, without having the XSL document opened in Oxygen.
The actual transformation will still take a long time, but from what I
profiled, that time is spent entirely in Saxon's libraries and cannot
be reduced on the Oxygen side.

It gave me this result:

> <Massachusetts><NumberOfNodeElements>10282777</NumberOfNodeElements></Massachusetts>

By the way, if you have Oxygen 15.0 you can also open the XML document
in Oxygen and look inside it. Just try to open it, and Oxygen will advise
you to optimize loading for huge files.

Regards,
Radu

Radu Coravu
<oXygen/>  XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

On 9/16/2013 10:05 AM, Radu Coravu wrote:

> [...]

