Memory management with Saxon 8

Memory management with Saxon 8

michella

Hi all,

I have a big problem when processing multiple XML files:

With 140 XML files (each about 600 KB), the XSLT processing requires
about 650 MB of memory. I tried to redesign the stylesheet so that it
chunks the overall process by writing temporary documents with
xsl:result-document and reopening them later using specific templates.
Unfortunately, this did not help.
The portion of code below is a loop which, at each step, (1) opens
an XML document, (2) does something, and (3) stores the result in a new
XML document.

I would expect the processor to release the opened document at each new
step, because it is no longer used within this template. But it
seems to stay in memory. In this case, the loop runs 140 times...

Any good hint would be appreciated... Eventually I need my stylesheet
to process about 530 XML files... and my computer has only 1 GB of RAM ;-)

Best regards

Lawrence Michel

--------------------------------
(...)
<xsl:template match="/">
        <!-- the list of all fileName elements (about 140) -->
        <xsl:for-each select="$Property_File/Vergleichseigenschaften/Input">
                <xsl:call-template name="Import">
                        <!-- the filename only, such as blabla.xml -->
                        <xsl:with-param name="fileName" select="@Input_XML"/>
                </xsl:call-template>
        </xsl:for-each>
</xsl:template>

<xsl:template name="Import">
        <xsl:param name="fileName" select="/."/>
        <xsl:variable name="Import_Document"
                select="document(concat('../Input/',$fileName,'.xml'))"/>
        <xsl:variable name="dataBasis">
                <xsl:element name="dataBasis">
                        <xsl:for-each
                                select="$Import_Document/x:Workbook/x:Worksheet[@ss:Name='Tabelle1']/x:Table">
                                <!-- (CREATING NEW NODESET) -->
                        </xsl:for-each>
                </xsl:element>
        </xsl:variable>
        <xsl:result-document
                href="{concat('../temp/imported_',$fileName,'.xml')}">
                <xsl:call-template name="NodesetToDocument">
                        <!-- will store the dataBasis variable nodeset
                             in an XML document -->
                        <xsl:with-param name="Nodeset"
                                select="$dataBasis/dataBasis"/>
                </xsl:call-template>
        </xsl:result-document>
</xsl:template>
(...)


-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help

RE: Memory management with Saxon 8

Michael Kay
The document() function saves the document in memory in case you call
document() again with the same URI, in which case the spec requires that the
same nodes are returned.

You can avoid this effect by calling saxon:discard-document. It's best to
call this right away when loading the document: change document(XYZ) to
saxon:discard-document(document(XYZ)). The document will then be available
for garbage collection as soon as there are no remaining references to it.
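
Applied to the Import template from the original post, the change might look like the sketch below. It assumes the saxon prefix is bound to the Saxon extension namespace http://saxon.sf.net/; the surrounding stylesheet element is only there to make the fragment complete, and the rest of the template is left out.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <xsl:template name="Import">
    <xsl:param name="fileName"/>
    <!-- document() is wrapped in saxon:discard-document(), so Saxon does
         not keep the tree for later document() calls with the same URI.
         The tree becomes eligible for garbage collection once nothing
         (such as this variable) refers to it any more. -->
    <xsl:variable name="Import_Document"
        select="saxon:discard-document(
                  document(concat('../Input/', $fileName, '.xml')))"/>
    <!-- the rest of the template stays as in the original post -->
  </xsl:template>

</xsl:stylesheet>
```

Note that a later document() call with the same URI would then read and parse the file again, and would return different nodes.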

Michael Kay
http://www.saxonica.com/
 

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of
> [hidden email]
> Sent: 16 February 2006 17:37
> To: [hidden email]
> Subject: [saxon] Memory management with Saxon 8





Re: Memory management with Saxon 8

Chris Simmons
Michael Kay wrote:

>The document() function saves the document in memory in case you call
>document() again with the same URI, in which case the spec requires that the
>same nodes are returned.
>
>You can avoid this effect by calling saxon:discard-document. It's best to
>call this right away when loading the document: change document(XYZ) to
>saxon:discard-document(document(XYZ)). The document will then be available
>for garbage collection as soon as there are no remaining references to it.
>
>Michael Kay
>http://www.saxonica.com/
>
>
I think I've seen similar issues when processing lots of files.
Notwithstanding the spec, wouldn't it be appropriate to hold the
documents through Java SoftReferences?  This would ensure that the same
document is returned unless nothing is referencing the document, in
which case how can it possibly make any difference?  Perhaps this
wouldn't be appropriate default behaviour, but it would be a useful
alternative to having to use saxon:discard-document.

Just a suggestion :)

Regards,

Chris Simmons.



RE: Memory management with Saxon 8

Michael Kay
>
> >The document() function saves the document in memory in case you call
> >document() again with the same URI, in which case the spec
> requires that the
> >same nodes are returned.
> >
> >
> I think I've seen similar issues when processing lots of files.  Not
> withstanding the spec, wouldn't it be appropriate to use java
> SoftReference's to the documents?  This would ensure that the same
> document is returned unless nothing is referencing the document, in
> which case how can it possibly make any difference?

If the same document is read twice, the spec guarantees that you get the
same nodes back. It would be possible to remember the node identity and
reallocate the same identifiers on a subsequent read, but there's no way of
locking the file on disc (out there in the wild wild web...) and ensuring
its contents haven't changed.

Another possible mechanism (but it seems cumbersome) is for Saxon to detect
when memory is getting low and page out the saved documents to a private
location on disc, e.g. in PTree format.

> Perhaps this
> wouldn't be appropriate default behaviour, but it would be a useful
> alternative to having to use saxon:discard-document.

There's already been some debate about whether saxon:discard-document was
conformant to the spec. There's a similar problem (or worse) with Saxon's
implementation of collection(). I argued that the spec says that extension
functions can do whatever they like; others argued that they're not allowed
to cause other constructs to behave in a non-conformant way. To resolve the
issue I proposed a change to the spec, which has recently been agreed: see

http://www.w3.org/Bugs/Public/show_bug.cgi?id=2553

In the light of this decision, I will be reviewing how best to control this
in Saxon. I may well replace saxon:discard-document() with a command-line
switch that relaxes the isolation level for the whole transformation.

The one thing I won't do is to make the product behave in a non-conformant
way.

Michael Kay
http://www.saxonica.com/






Behaviour of Text nodes??

Sascha Punzmann
Hi,

For the last few months I have worked with XML structures like these:

<para>
   For these tasks the menu
   <italic>Test</italic>
   provides the function
   <italic>Mask</italic>
</para>

Until now the text was formatted (see below) in the order of appearance
in the tree. That was correct, at least by my understanding of the XML
tree, because the element para has 4 child nodes
(text(),italic,text(),italic).

*Example:*
For these tasks the menu "Test" provides the function "Mask"

At the moment Saxon seems to behave differently. The element para in
this case has only 3 child nodes (text(),italic,italic) and is printed
as follows:

*Example:*
For these tasks the menu provides the function "Test" "Mask"

As you can see, Saxon now treats the separate text nodes in the tree
as one big text node. Does anyone know why this happens, or how I
can switch back to the way it was before?

regards Sascha





RE: Behaviour of Text nodes??

Michael Kay
I can't relate this to any change that I'm aware of. There have been some
changes in the handling of whitespace text nodes, but that doesn't seem
relevant here. It looks to me as if you're using some expression that isn't
sorting the nodes into document order for some reason. You'll have to show
me the code that demonstrates the problem.
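
As an illustration only (the poster's stylesheet is not shown, so this is an assumption about one way such output can arise): in XPath 2.0 the union operator returns nodes in document order, while the comma operator simply concatenates its operands. The template names and the "reordered" mode in this sketch are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Union: the selected nodes are sorted into document order,
       so text and italic children are processed interleaved. -->
  <xsl:template match="para">
    <xsl:apply-templates select="text() | italic"/>
  </xsl:template>

  <!-- Comma operator: all text children come first, followed by all
       italic children, regardless of their position in the tree. -->
  <xsl:template match="para" mode="reordered">
    <xsl:apply-templates select="text(), italic"/>
  </xsl:template>

  <!-- Wrap the content of each italic element in quotation marks. -->
  <xsl:template match="italic">
    <xsl:text>"</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>" </xsl:text>
  </xsl:template>

  <!-- Collapse the indentation whitespace around each text node. -->
  <xsl:template match="text()">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text> </xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With the para example from the question, the first template would keep the text and the italic elements interleaved, whereas the second would emit both pieces of text before the two quoted words.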

Michael Kay
http://www.saxonica.com/


> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of
> Sascha Punzmann
> Sent: 18 February 2006 17:22
> To: [hidden email]
> Subject: [saxon] Behaviour of Text nodes??



