Help with Document Pool cache

Jorge Williams-2
Hello all,

We have an app that performs a large number of transformations. These transformations reach out to external XML documents using “doc(…)”, and a transformer is reused between requests. Some of these external documents are transient: they will only be accessed once and don’t need to be cached. Others are lookup tables that will be accessed multiple times and would benefit from caching.

Unfortunately right now we have to make a tough choice — either:

1. Don’t call clearDocumentPool on the controller and risk running out of memory (there are millions of transient documents)
2. Call clearDocumentPool between transformations and take a hit recomputing indexes.

Is there a better option I’m missing?

If not, it would be nice if we had more control over what got cached and what didn’t. Maybe, for example, we could pass a query parameter in the SystemId telling Saxon not to cache a document: http://foo.rackspace.com/mydoc.xml?saxon:dontCache=true. Another possibility is to have the pool use weak references so that the least frequently used documents can be evicted from the cache.
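
To make the eviction idea concrete, here is a minimal sketch of a bounded cache that could live at application level. It is not a Saxon feature, and it swaps in a simpler policy than the one described above: it evicts the least recently used entry (LRU) rather than relying on weak references or usage frequency. The class name LruDocumentCache and the maxEntries parameter are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache keyed by document URI; not part of Saxon. */
class LruDocumentCache<V> extends LinkedHashMap<String, V> {
    private final int maxEntries;

    LruDocumentCache(int maxEntries) {
        // accessOrder = true: every get() or put() moves the entry to the end,
        // so the first entry in iteration order is the least recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Called after each insertion; returning true drops the eldest entry.
        return size() > maxEntries;
    }
}
```

LinkedHashMap is not thread-safe, so an app serving concurrent requests would need to synchronize access, for example with Collections.synchronizedMap.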

Thanks,

-jOrGe W.


_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 

Re: Help with Document Pool cache

Michael Kay
I think the best option would be to implement your own cache within a URIResolver. The URIResolver has to return a Source; a Saxon NodeInfo is a Source, so if a document is present in the cache it can simply return the DocumentInfo at its root. Then you can use clearDocumentPool() between transformations (or perhaps better, simply create a new Transformer) and do the document caching at application level.
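
A rough sketch of what such a resolver might look like, using only the standard JAXP interfaces. The class name CachingURIResolver, the cacheable predicate, and the loader function are all hypothetical. In a Saxon setup the loader would build the document tree and return the NodeInfo at its root; that call is left to the caller here because it depends on the Saxon version in use.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;

/** Hypothetical application-level document cache; not part of Saxon. */
class CachingURIResolver implements URIResolver {
    private final Map<String, Source> cache = new ConcurrentHashMap<>();
    private final Predicate<String> cacheable;      // true for lookup tables
    private final Function<String, Source> loader;  // builds a Source for an absolute URI

    CachingURIResolver(Predicate<String> cacheable, Function<String, Source> loader) {
        this.cacheable = cacheable;
        this.loader = loader;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        String uri;
        try {
            // Resolve the relative reference against the base URI, if one is given.
            uri = (base == null || base.isEmpty())
                    ? href
                    : URI.create(base).resolve(href).toString();
        } catch (IllegalArgumentException e) {
            throw new TransformerException("Bad URI: " + href, e);
        }
        if (!cacheable.test(uri)) {
            return loader.apply(uri);              // transient: built but never retained
        }
        return cache.computeIfAbsent(uri, loader); // lookup table: built once, then reused
    }

    /** Number of documents currently held in the cache. */
    int size() {
        return cache.size();
    }
}
```

The resolver would be registered with the standard Transformer.setURIResolver method. Because the cache lives outside the transformer, clearing the document pool or creating a new Transformer does not discard the cached sources.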

One messiness to watch out for is strip-space/preserve-space. If different stylesheets use the same source documents with different whitespace-stripping options, considerable costs can be incurred.

Michael Kay
Saxonica
[hidden email]
+44 (0) 118 946 5893




(In reply to Jorge Williams, 8 Sep 2014, 18:35.)


Re: Help with Document Pool cache

Jorge Williams-2

So just to clarify, if I cache a NodeInfo, then that will cache any indexes on that document — correct?

On a related note, what happens if I embed my lookup tables in the XSLT itself? Will those indexes get cached and reused even if I call clearDocumentPool() between transformations?

Thanks,

-jOrGe W.


(In reply to Michael Kay, Sep 8, 2014, 5:01 PM.)