Schema Component Model serialization backwards compatibility.

Chris Simmons-2
I'm using Saxon to serialize a Schema Component Model (SCM) using
SchemaManager's exportComponents and importComponents, which is working fine.

What I would like to know is what backwards/forwards compatibility
guarantees there are for this format, so that I can know whether it's safe
to share should we upgrade Saxon in the future.

Regards,

Chris Simmons.

--
Chris Simmons, CoreFiling Limited
http://www.corefiling.com
Phone: +44-1865-203192


_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 

Re: Schema Component Model serialization backwards compatibility.

Michael Kay
We've never had to change the format yet, and I don't foresee any circumstances in which we would want to do so. But committing to "never" would be a little rash, especially as there's always a conversion route to a new format if need be.

I'm pleased to hear that you're using SCM and would be interested in any feedback on the benefits. It's one of those things that sits quietly in the product and we don't really know if anyone is using it...

Michael Kay
Saxonica
[hidden email]
+44 (0) 118 946 5893




On 12 Nov 2014, at 12:19, Chris Simmons <[hidden email]> wrote:

> I'm using Saxon to serialize a Schema Component Model (SCM) using
> SchemaManager's exportComponents and importComponents, which is working fine.
>
> What I would like to know is what backwards/forwards compatibility
> guarantees there are for this format, so that I can know whether it's safe
> to share should we upgrade Saxon in the future.
>
> Regards,
>
> Chris Simmons.



Re: Schema Component Model serialization backwards compatibility.

Chris Simmons-2
On 12/11/14 16:20, Michael Kay wrote:
> We've never had to change the format yet, and I don't foresee any circumstances in which we would want to do so. But committing to "never" would be a little rash, especially as there's always a conversion route to a new format if need be.
That's helpful to know; it was pretty much what I expected.
>
> I'm pleased to hear that you're using SCM and would be interested in any feedback on the benefits. It's one of those things that sits quietly in the product and we don't really know if anyone is using it...
Loading a cached SCM has made a huge performance difference: building
the Saxon schema model was taking us roughly fifteen seconds, but loading
one we made earlier takes less than half a second.  This is only
feasible, though, because so far the clients that need Saxon can't extend
the schema (i.e. import etc.).  Some of our clients don't follow this
pattern, but so far they haven't needed Saxon.
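
Since the SCM format is not guaranteed never to change, one cautious way to run such a cache is to fold the producing Saxon version into the cache key, so that an upgrade simply misses the old entry and triggers a rebuild. The following is only a sketch under that assumption: `ScmCacheKey`, the file-name pattern and the version strings are hypothetical, and the actual export/import through SchemaManager is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: derives a cache file name for an exported SCM. */
final class ScmCacheKey {
    private ScmCacheKey() {}

    /**
     * Combines the producing Saxon version with a hash of the schema source,
     * so that a Saxon upgrade or a schema edit yields a different file name
     * and a stale cache entry is never picked up.
     */
    static String fileNameFor(String saxonVersion, byte[] schemaBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(saxonVersion.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator between version and content
            md.update(schemaBytes);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            // 16 hex digits keep the name short; an arbitrary choice for this sketch
            return "schema-" + hex.substring(0, 16) + ".scm";
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

If the file named by `fileNameFor` exists it can be imported; otherwise the schema is compiled from source and exported under that name.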

I found a query posted here from a colleague of mine about the SCM and
whether composing multiple schemas would be possible.

http://sourceforge.net/p/saxon/mailman/message/31039356/

Sadly the schemas in question do make (copious) use of substitution
groups so I suspect that they can't be easily decomposed.

Presumably Saxon is converting the substitution group head into a
choice?  Have you looked at how Xerces does this?  I believe that they
don't expand substitution groups like this, which means that the head of
a substitution group doesn't have a (back) reference to everything in
the substitution group.  Instead there's a separate substitution group
handler that's composed (quickly).  The main advantage of this is that,
because there aren't any back-references, caching of individual
schema documents becomes feasible.  Possibly they also get a speed gain
because they're not strictly creating a full DFA?  I'm not sure exactly
how it works, but presumably they try the element's name with the DFA and,
if that doesn't work, try again with the substitution group ancestor, if any.

I know Eclipse's XSD fairly well, and they appear to have gone down a
similar route to Saxon and created DFAs with expanded substitution
groups.  We see similarly bad performance there, with the additional
downside that we can't cache individual schema documents, again due to
the back-references.  I can see the appeal, as the resulting model will
closely match the XML Schema specification, but it's not so good from a
performance perspective.

Chris Simmons.


Re: Schema Component Model serialization backwards compatibility.

Michael Kay
I don't know, first you say you want stability, then you suggest all sorts of improvements... ;-).

>
> Presumably Saxon is converting the substitution group head into a
> choice?  

Yes. The saved SCM includes not only the actual schema component model as defined in the XSD spec, but also the compiled finite state automata, and these follow the Thompson/Tobin algorithm, in which substitution groups are expanded as choices. This does have the consequence that one cannot add to a substitution group after it is compiled. This affects not only the (persistent) SCM, of course, but also in-memory compiled schemas: Saxon "seals" a namespace as soon as complex types in that namespace are used for validation, and once a namespace is sealed, you aren't allowed to derive new complex types by extension, or add to the content of substitution groups.

It would certainly make things a bit more flexible, perhaps with a small cost in performance, to handle substitution groups dynamically. Any change to a schema after it has been used is problematic, even if the only effect is to make instances valid that were not valid before. But there are already other things in that category that we allow, such as adding new global element declarations (which changes the effect of <xs:any processContents="lax">), and that one can in fact make previously valid instances invalid, which is worse.
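
The interaction between expansion and sealing can be pictured with a toy model. This is a sketch only, not Saxon's classes: `SealableSchema` and its methods are hypothetical, names are plain strings rather than QNames or fingerprints, and sealing is triggered by the expansion step as a stand-in for "first use in validation".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of "sealing"; not Saxon's actual implementation. */
final class SealableSchema {
    // substitution group head -> members declared for it
    private final Map<String, Set<String>> groups = new HashMap<>();
    private final Set<String> sealedNamespaces = new HashSet<>();

    /** Adds a member to a head's substitution group, unless the namespace is sealed. */
    void addToSubstitutionGroup(String namespace, String head, String member) {
        if (sealedNamespaces.contains(namespace)) {
            throw new IllegalStateException("namespace is sealed: " + namespace);
        }
        groups.computeIfAbsent(head, h -> new LinkedHashSet<>()).add(member);
    }

    /**
     * Expands a reference to the head into the choice of the head plus all
     * members known at this moment, then seals the namespace because the
     * expansion is now baked into the compiled result.
     */
    Set<String> expandForValidation(String namespace, String head) {
        sealedNamespaces.add(namespace);
        Set<String> choice = new LinkedHashSet<>();
        choice.add(head);
        Set<String> members = groups.get(head);
        if (members != null) {
            choice.addAll(members);
        }
        return choice;
    }
}
```

Once `expandForValidation` has run for a namespace, a later `addToSubstitutionGroup` call for that namespace is rejected, which mirrors the restriction described above.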

> Have you looked at how Xerces does this?

No, I haven't studied the internals of Xerces at all.

>  I believe that they
> don't expand substitution groups like this ...  Possibly they also get a speed gain because
> they're not strictly creating a full DFA?  I'm not sure exactly how it
> works but presumably they try the element's name with the DFA, if that
> doesn't work try again with the substitution group ancestor if any.

It would certainly be possible in principle to build a DFA that contains only the name (or in Saxon's case, fingerprint) of the head element, and do a dynamic check whether the instance element belongs to that substitution group. The main difficulty for Saxon is not the instance validation, but verification that the extended substitution group doesn't violate schema constraints (such as UPA), which Saxon currently handles by analysis of the DFA.
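
A minimal sketch of that dynamic check, assuming the automaton only knows head names and membership is resolved by walking the instance element's substitution group ancestry. `DynamicSubstitution` and its methods are hypothetical, names are plain strings, and block/final constraints as well as the UPA verification mentioned above are not addressed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of matching by substitution-group ancestry. */
final class DynamicSubstitution {
    // element name -> the head it declares itself substitutable for
    private final Map<String, String> headOf = new HashMap<>();

    void declare(String element, String substitutionGroupHead) {
        headOf.put(element, substitutionGroupHead);
    }

    /**
     * Returns the name the automaton should see for this instance element:
     * the element itself if the automaton allows it, otherwise the nearest
     * substitution-group ancestor that it allows, or null if none matches.
     */
    String resolve(String element, Set<String> allowedByAutomaton) {
        String current = element;
        int steps = 0;
        // the step bound guards against accidental cycles in the declarations
        while (current != null && steps++ <= headOf.size()) {
            if (allowedByAutomaton.contains(current)) {
                return current;
            }
            current = headOf.get(current);
        }
        return null;
    }
}
```

Because membership is looked up at validation time, a call to `declare` made after the automaton was built is honoured without rebuilding it.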
>
> I know Eclipse's XSD fairly well and they appear to have gone down a
> similar route to Saxon

I wasn't actually aware that Eclipse's XSD processor differed from Xerces! You live and learn.

The main performance problem with the Saxon DFA is with finite occurrence limits (minOccurs/maxOccurs), where the Thompson/Tobin algorithm is very expensive in both memory and time, especially at DFA construction time but also during validation. Interestingly, in Saxon 9.6 I've rewritten the regex engine so that it no longer uses a DFA, but rather uses a simple recursive interpreter pattern on the expression tree formed by parsing the regular expression. It would certainly be possible to use a similar approach for schema validation, though I'm not immediately sure how the schema constraints (such as checking for valid restrictions) would be handled. However, I'm not highly motivated to do such a radical rewrite at the moment: it's not clear that the disadvantages of the current approach merit it.
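
To illustrate what interpreting the expression tree could look like for a content model, here is a small sketch in which occurrence limits are counted at run time instead of being unrolled into automaton states. It is not how Saxon validates: `ContentModel` and its factory methods are hypothetical, names are plain strings, and no schema constraint checking is attempted.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical content-model interpreter working directly on the particle tree. */
final class ContentModel {
    interface Particle {
        /** Positions in the input reachable after matching this particle from start. */
        Set<Integer> match(List<String> input, int start);
    }

    static Particle element(String name) {
        return (input, start) -> {
            Set<Integer> result = new TreeSet<>();
            if (start < input.size() && input.get(start).equals(name)) {
                result.add(start + 1);
            }
            return result;
        };
    }

    static Particle sequence(Particle... parts) {
        return (input, start) -> {
            Set<Integer> positions = new TreeSet<>();
            positions.add(start);
            for (Particle p : parts) {
                Set<Integer> next = new TreeSet<>();
                for (int pos : positions) next.addAll(p.match(input, pos));
                positions = next;
            }
            return positions;
        };
    }

    static Particle choice(Particle... parts) {
        return (input, start) -> {
            Set<Integer> result = new TreeSet<>();
            for (Particle p : parts) result.addAll(p.match(input, start));
            return result;
        };
    }

    /** min..max repetitions; assumes p consumes at least one element per match. */
    static Particle repeat(Particle p, int min, int max) {
        return (input, start) -> {
            Set<Integer> result = new TreeSet<>();
            Set<Integer> positions = new TreeSet<>();
            positions.add(start);
            if (min == 0) result.add(start);
            for (int count = 1; count <= max && !positions.isEmpty(); count++) {
                Set<Integer> next = new TreeSet<>();
                for (int pos : positions) next.addAll(p.match(input, pos));
                if (count >= min) result.addAll(next);
                positions = next;
            }
            return result;
        };
    }

    /** The child sequence is valid if some match consumes all of it. */
    static boolean accepts(Particle model, List<String> children) {
        return model.match(children, 0).contains(children.size());
    }
}
```

For example, `sequence(element("title"), repeat(choice(element("para"), element("note")), 1, 3))` describes a title followed by one to three para or note children; the bound of 3 costs a loop counter here rather than extra states.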

One thing I have been considering for performance improvement, which is a less radical change, is to build the DFAs for complex types lazily. It's common for many types in a schema to be unused in any given validation episode, so the cost of building the DFAs is wasted - except that it's necessary to do a strict check that the schema is completely valid.
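
The lazy idea can be sketched as a per-type cache that compiles on first use. This is an illustration under assumptions, not a description of Saxon: `LazyAutomata` is hypothetical, the automaton type is a generic placeholder, and the strict whole-schema validity check mentioned above is not covered.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical lazy cache of per-type automata; A stands in for the automaton type. */
final class LazyAutomata<A> {
    private final Map<String, A> built = new ConcurrentHashMap<>();
    private final Function<String, A> compiler;
    private final AtomicInteger compilations = new AtomicInteger();

    LazyAutomata(Function<String, A> compiler) {
        this.compiler = compiler;
    }

    /** Builds the automaton for a complex type on first use and reuses it afterwards. */
    A forType(String typeName) {
        return built.computeIfAbsent(typeName, name -> {
            compilations.incrementAndGet();
            return compiler.apply(name);
        });
    }

    /** Number of automata actually built so far, for observing the saving. */
    int compilationCount() {
        return compilations.get();
    }
}
```

Types that are never requested during a validation episode are never compiled, which is where the saving would come from.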

Regards,

Michael Kay
Saxonica
 
