command line batch processing cache stylesheet and dtd


command line batch processing cache stylesheet and dtd

Andrew Welch
I'm transforming a directory of XML using Saxon B 8.5.1 from the
command line. Each XML file references entity files on the web:

<!ENTITY % HTMLlat1 PUBLIC '-//W3C//ENTITIES Latin 1 for XHTML//EN'
  'http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent'> %HTMLlat1;
<!ENTITY % HTMLspecial PUBLIC '-//W3C//ENTITIES Special for XHTML//EN'
  'http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent'> %HTMLspecial;
<!ENTITY % HTMLsymbol PUBLIC '-//W3C//ENTITIES Symbols for XHTML//EN'
  'http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent'> %HTMLsymbol;
]>

- Do these get cached for subsequent transforms? It seems they
don't, as transform times vary quite a bit (which for now I'm putting
down to fetching the files). Is a new parser created for each XML
file?

- Does the stylesheet itself get cached, given that Saxon knows it's
processing a directory?

- Does the XML parser fetch all the entity files declared in the XML
file, regardless of whether any entities are actually used in the XML?
I would think it must. If so, it's not a good idea to include the
declarations in every XML file, only in those that actually use the
entities.

thanks
andrew


-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help

Re: command line batch processing cache stylesheet and dtd

Colin Paul Adams
>>>>> "Andrew" == andrew welch <[hidden email]> writes:

    Andrew> I'm transforming a directory of XML using Saxon B
    Andrew> 8.5.1 from the command line. Each XML file references
    Andrew> entity files on the web:

    Andrew> - Does the XML parser fetch all the entity files defined
    Andrew> in the XML file, regardless of whether any entities are
    Andrew> actually used in the XML?  I would think they must be.

Yes, if it reads the DTD at all (which it must be doing in your
case).
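Colin's answer can be checked with a small Java sketch. EntityFetchLog and requestedEntities are hypothetical names, not part of any library: the code installs a SAX EntityResolver that records every external entity the parser asks for. The public ID -//Example//ENTITIES Demo//EN and the example.invalid URL are made-up stand-ins for the XHTML ones, and the resolver hands back an empty entity instead of touching the network. If the parser behaves as described, the ID should be reported even though the document body uses no entity at all, because the %demo; reference in the internal subset is what triggers the fetch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class EntityFetchLog {
    /** Parses the document and returns the public IDs the parser tried to resolve. */
    public static List<String> requestedEntities(String xml) throws Exception {
        List<String> requested = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver((publicId, systemId) -> {
            requested.add(publicId);
            // Return an empty entity rather than fetching systemId over the network.
            InputSource empty = new InputSource(new StringReader(""));
            empty.setSystemId(systemId);
            return empty;
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return requested;
    }

    public static void main(String[] args) throws Exception {
        // Nothing from the external file is used in <doc>, yet %demo; still pulls it in.
        String xml = "<!DOCTYPE doc [\n"
            + "<!ENTITY % demo PUBLIC '-//Example//ENTITIES Demo//EN' 'http://example.invalid/demo.ent'>\n"
            + "%demo;\n"
            + "]>\n"
            + "<doc>plain text, no entity references</doc>";
        System.out.println(requestedEntities(xml));
    }
}
```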

    Andrew> Which means it's not a good idea to include the
    Andrew> definitions in all your XML, only those XML files that
    Andrew> actually use the entities.

This is a classic case for using an OASIS XML Catalog: download
copies of these entity files to your local system, set up a catalog to
point to the local copies, and use Norman Walsh's Catalog Resolver
(part of Apache XML Commons these days, I believe).

If this isn't acceptable, then an alternative would be to write a
caching entity resolver.
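A caching entity resolver of the kind Colin mentions might look like the following Java sketch (CachingEntityResolver is a hypothetical name). It keys an in-memory cache on the system ID, so each entity file should be fetched at most once per JVM. This only helps when one resolver instance is installed on every parser used for the batch, for example through XMLReader.setEntityResolver in a Java driver; it does nothing across separate command-line runs. readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/**
 * Sketch of a caching EntityResolver: the first request for a system ID
 * fetches it; later requests in the same JVM are served from memory.
 */
public class CachingEntityResolver implements EntityResolver {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        if (systemId == null) {
            return null; // let the parser fall back to its default behaviour
        }
        byte[] bytes = cache.get(systemId);
        if (bytes == null) {
            // Two threads may occasionally fetch the same entity; harmless here.
            try (InputStream in = URI.create(systemId).toURL().openStream()) {
                bytes = in.readAllBytes();
            }
            cache.put(systemId, bytes);
        }
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setPublicId(publicId);
        // Keeping the system ID lets relative references inside the entity resolve.
        source.setSystemId(systemId);
        return source;
    }

    /** Number of distinct entities fetched so far. */
    public int cachedCount() {
        return cache.size();
    }
}
```

Keying on the system ID rather than the public ID is a design choice: it works for entities declared with SYSTEM only, at the cost of treating two spellings of the same URL as different entries.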
--
Colin Adams
Preston Lancashire



RE: command line batch processing cache stylesheet and dtd

Michael Kay
In reply to this post by Andrew Welch
>
> I'm transforming a directory of XML using Saxon B 8.5.1 from the
> command line. Each XML file references entity files on the web:
>
> <!ENTITY % HTMLlat1 PUBLIC '-//W3C//ENTITIES Latin 1 for XHTML//EN'
>   'http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent'> %HTMLlat1;
> <!ENTITY % HTMLspecial PUBLIC '-//W3C//ENTITIES Special for XHTML//EN'
>   'http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent'> %HTMLspecial;
> <!ENTITY % HTMLsymbol PUBLIC '-//W3C//ENTITIES Symbols for XHTML//EN'
>   'http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent'> %HTMLsymbol;
> ]>
>
> - Do these get cached for subsequent transforms?  It seems like they
> don't, as transform times vary quite a bit (which I'm putting down to
> fetching the files at the moment - is a new parser created for each
> XML file?)

I think the parser should now be reused, but I would need to do some
investigation to be 100% sure. I've no idea whether the parser will cache
entity files if it is used repeatedly across multiple parses.
>
> - Does the stylesheet itself get cached as Saxon knows it's processing
> a directory?

Yes.
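That answer concerns the command line. When the batch is driven from Java instead, the usual JAXP way to get the same effect is to compile the stylesheet once into a Templates object and create a cheap Transformer from it for each file. The sketch below assumes the JDK's default TransformerFactory (a Saxon TransformerFactory should slot in the same way); BatchTransform and transformDirectory are hypothetical names, and Path.of assumes Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BatchTransform {
    /**
     * Compiles the stylesheet once and applies it to every *.xml file in inDir,
     * writing each result under the same name into outDir. Returns the file count.
     */
    public static int transformDirectory(Path stylesheet, Path inDir, Path outDir)
            throws IOException, TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Compiled once; a Templates object is thread-safe and reusable.
        Templates templates = factory.newTemplates(new StreamSource(stylesheet.toFile()));
        Files.createDirectories(outDir);
        int count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inDir, "*.xml")) {
            for (Path in : files) {
                Path out = outDir.resolve(in.getFileName());
                // A Transformer is cheap to create from Templates but is not thread-safe.
                templates.newTransformer().transform(
                    new StreamSource(in.toFile()), new StreamResult(out.toFile()));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java BatchTransform style.xsl inputDir outputDir
        int n = transformDirectory(Path.of(args[0]), Path.of(args[1]), Path.of(args[2]));
        System.out.println("Transformed " + n + " files");
    }
}
```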
>
> - Does the XML parser fetch all the entity files defined in the XML
> file, regardless of whether any entities are actually used in the XML?

That's a question to direct at the people who support the parser...

Michael Kay
http://www.saxonica.com/





Please unsubscribe me from the list.

Muhammad Masoom Alam
Thanks
MA
----- Original Message -----
From: "Michael Kay" <[hidden email]>
To: <[hidden email]>
Sent: Tuesday, September 27, 2005 11:24 AM
Subject: RE: [saxon] command line batch processing cache stylesheet and dtd






Re: command line batch processing cache stylesheet and dtd

Andrew Welch
In reply to this post by Colin Paul Adams
On 27 Sep 2005 10:17:22 +0100, Colin Paul Adams
<[hidden email]> wrote:

> >>>>> "Andrew" == andrew welch <[hidden email]> writes:
>
>     Andrew> I'm transforming a directory of XML using Saxon B
>     Andrew> 8.5.1 from the command line. Each XML file references
>     Andrew> entity files on the web:
>
>     Andrew> - Does the XML parser fetch all the entity files defined
>     Andrew> in the XML file, regardless of whether any entities are
>     Andrew> actually used in the XML?  I would think they must be.
>
> Yes, if it reads the DTD at all (which it must be doing in your
> case).
>
>     Andrew> Which means it's not a good idea to include the
>     Andrew> definitions in all your XML, only those XML files that
>     Andrew> actually use the entities.
>
> This is a classic case for using an OASIS XML Catalog - download
> copies of these entity files to your local system, set up a catalog to
> point to the local copies, and use Norman Walsh's Catalog Resolver
> (part of Apache XML Commons these days, I believe).

Hi Colin,

Yes, I've thought about using an XML Catalogue; can it be used with
Saxon from the command line? I would much rather have the entities
resolved before they get to me, as most of them are simple one-to-one
mappings from human-readable entity names to Unicode character
references, e.g. &someval; -> &#123;, so they could easily be resolved
out. And I'm pretty sure whatever is generating the XML is putting the
entity declarations in regardless of content, which is A Bad Thing and
should really be sorted.
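For one-to-one character entities like these, any parse-and-reserialize pass resolves them out: the parser replaces each reference with its character, and a serializer writing an encoding that cannot represent that character has to fall back to a numeric character reference. A minimal JAXP identity-transform sketch follows; ExpandEntities and expand are hypothetical names, US-ASCII is chosen only to force every non-ASCII character into a reference, and the entity is declared in the internal subset purely to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ExpandEntities {
    /** Parses xml (expanding entity references) and re-serializes it as US-ASCII. */
    public static String expand(String xml) throws TransformerException {
        // newTransformer() with no stylesheet gives an identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // Characters outside US-ASCII must then be written as numeric references.
        identity.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<!DOCTYPE doc [<!ENTITY eacute '&#233;'>]><doc>caf&eacute;</doc>";
        System.out.println(expand(xml));
    }
}
```

The DOCTYPE is dropped on output, so the result no longer depends on any entity files; whether the reference is written in decimal or hex is up to the serializer.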

> If this isn't acceptable, then an alternative would be to write a
> caching entity resolver.

Yes, all the transforms will be done in Java soon, but for now I'm
using the command line and, surprisingly, it's quite enjoyable.



Re: command line batch processing cache stylesheet and dtd

Colin Paul Adams
>>>>> "Andrew" == andrew welch <[hidden email]> writes:

    Andrew> Yes I've thought about using an XML Catalogue - can it be
    Andrew> used with Saxon from the command line?

I would think this can be done using the -x option.

--
Colin Adams
Preston Lancashire



RE: command line batch processing cache stylesheet and dtd

Michael Kay
>
> >>>>> "Andrew" == andrew welch <[hidden email]> writes:
>
>     Andrew> Yes I've thought about using an XML Catalogue - can it be
>     Andrew> used with Saxon from the command line?
>
> I would think this can be done using the -x option.
>

It can't be done without some Java coding, I believe. You can write an
XMLReader class that wraps a real parser, preconfiguring it to use OASIS
catalogs, and then you can invoke this XMLReader using -x. Alternatively,
chances are someone has already written this Java class, in which case you
can do it without any Java coding...
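The wrapping reader described here might be sketched as follows. LocalEntityReader is a hypothetical class: an XMLFilterImpl with a public no-argument constructor (so Saxon can instantiate it by name) that wraps the JDK's SAX parser and redirects known public IDs to local files. For brevity it swaps in a hard-coded map where a real solution would consult an OASIS catalog, and the /opt/dtd paths are placeholders to adjust.

```java
import java.io.IOException;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical XMLReader for Saxon's -x option: delegates to the JDK's SAX
 * parser but redirects selected public identifiers to local copies.
 */
public class LocalEntityReader extends XMLFilterImpl {
    // Placeholder locations for downloaded copies; adjust for your own system.
    private static final Map<String, String> DEFAULT_COPIES = Map.of(
        "-//W3C//ENTITIES Latin 1 for XHTML//EN", "file:///opt/dtd/xhtml-lat1.ent",
        "-//W3C//ENTITIES Special for XHTML//EN", "file:///opt/dtd/xhtml-special.ent",
        "-//W3C//ENTITIES Symbols for XHTML//EN", "file:///opt/dtd/xhtml-symbol.ent");

    private final Map<String, String> localCopies;

    /** No-argument constructor, as needed when the class is named on the command line. */
    public LocalEntityReader() throws ParserConfigurationException, SAXException {
        this(DEFAULT_COPIES);
    }

    public LocalEntityReader(Map<String, String> localCopies)
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // an XSLT processor needs namespace events
        setParent(factory.newSAXParser().getXMLReader());
        this.localCopies = localCopies;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        String local = publicId == null ? null : localCopies.get(publicId);
        if (local != null) {
            return new InputSource(local); // the parser opens the local URL itself
        }
        // Unknown entity: defer to any resolver set by the caller, else the default.
        return super.resolveEntity(publicId, systemId);
    }
}
```

XMLFilterImpl registers itself as the parent parser's entity resolver at parse time, which is why overriding resolveEntity is enough. Compiled and placed on the classpath next to Saxon, the class would be named with -x LocalEntityReader.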

Michael Kay
http://www.saxonica.com/





Re: command line batch processing cache stylesheet and dtd

Colin Paul Adams
>>>>> "Michael" == Michael Kay <[hidden email]> writes:

    >>  >>>>> "Andrew" == andrew welch <[hidden email]>
    >> writes:
    >>
    Andrew> Yes I've thought about using an XML Catalogue - can it be
    Andrew> used with Saxon from the command line?
    >>  I would think this can be done using the -x option.
    >>

    Michael> It can't be done without some Java coding, I believe. You
    Michael> can write an XMLReader class that wraps a real parser,
    Michael> preconfiguring it to use Oasis catalogs, and then you can
    Michael> invoke this XMLReader using -x. Alternatively, chances
    Michael> are someone has already written this Java class, in which
    Michael> case you can do it without any Java coding...

Well, it's been a long time since I looked at the package, but I
thought I remembered seeing such a class there.
--
Colin Adams
Preston Lancashire



Re: command line batch processing cache stylesheet and dtd

Dave Pawson-2
In reply to this post by Michael Kay
On 27/09/05, Michael Kay <[hidden email]> wrote:

> >
> > >>>>> "Andrew" == andrew welch <[hidden email]> writes:
> >
> >     Andrew> Yes I've thought about using an XML Catalogue - can it be
> >     Andrew> used with Saxon from the command line?
> >
> > I would think this can be done using the -x option.
> >
>
> It can't be done without some Java coding, I believe.

I use it, Mike.

java -cp \sgml;\myjava\saxon653.jar;\myjava\xercesImpl.jar;\myjava\resolver.jar;\sgml\nw\docbook-xsl\extensions\saxon643.jar ^
  com.icl.saxon.StyleSheet -o %3 ^
  -x org.apache.xml.resolver.tools.ResolvingXMLReader ^
  -y org.apache.xml.resolver.tools.ResolvingXMLReader ^
  -r org.apache.xml.resolver.tools.CatalogResolver ^
  -w1 %1 %2 "saxon.extensions=1" %4 %5 %6

Sort out the paths, but that's been working for a while with OASIS
catalogs version 1.0. I'm unsure whether Apache have upgraded to 1.1 yet.
Thanks to Bob Stayton, by the way.

HTH DaveP

