A couple beginner's issues with schema-aware processing

Cass Costello
Hello, all. 

First, thanks to everyone who contributes to this list, and congratulations, Mike, on the CR milestones and 8.6. You probably need a vacation. :)

I grabbed the eval version of Saxon-SA and have been attempting to write some code to wrap my head around the mechanics of schema-aware processing.  Specifically, I'm interested in type-based XPath expressions, and have constructed a couple of tests that should a) parse an XML stream into a supported object model (I'm playing with tinytrees and XOM), and b) execute an XPath expression that should return an element of a specific schema type.

I've bumped into 2 issues.  First, though I've specified a schemaLocation in my test XML, the schema is not auto-loaded during the parsing process.  I see "unknown type" errors unless I manually load the associated schema via a SchemaAwareConfiguration.  Once loaded, however, everything works as expected.

Test case...

    public void testSaxonXpathStuff() throws Exception {
       
        SchemaAwareConfiguration config = new SchemaAwareConfiguration();
       
        //why is this necessary given xsi:schemaLocation in instance xml?
        File testSchema = new File( "test/resources/test.xsd" );
        config.addSchemaSource( new StreamSource( testSchema ) );
       
        config.setSchemaValidationMode( Validation.STRICT );
       
        XPathFactory xpf = XPathFactory.newInstance( NamespaceConstant.OBJECT_MODEL_SAXON );
        XPath xpathObj = xpf.newXPath();
        XPathEvaluator xpe = ( XPathEvaluator) xpathObj;
        StandaloneContext saCtx = new StandaloneContext( config );
       
        saCtx.declareNamespace( "sh", "uri://www.test.com " );
        xpe.setStaticContext( saCtx );
       
        XPathExpression xpathObjExp = xpathObj.compile( (String) "//element( *, sh:nameType )[1]" );   
       
        NodeInfo node = TinyBuilder.build( new StreamSource( new FileInputStream( "test/resources/test.xml" ) ),
                null,
                config );
       
        String result = (String) xpathObjExp.evaluate( node, XPathConstants.STRING );  
       
        assertEquals( "Cass", result );
    }

Second, moving the code from tinytrees to XOM objects results in the expression returning nothing at all.  I assume that there's some step I'm missing around parsing, but I don't know where to go next.  Any help or insight would be appreciated.

    public void testXomXpathStuff() throws Exception {
       
        XMLReader xerces = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
        xerces.setFeature("http://apache.org/xml/features/validation/schema", true);
       
        Builder parser = new Builder(xerces, true);
        Document doc = parser.build("test/resources/test.xml" );
       
        SchemaAwareConfiguration config = new SchemaAwareConfiguration();

        //why is this necessary given xsi:schemaLocation in instance xml?
        File testSchema = new File( "test/resources/test.xsd" );
        config.addSchemaSource( new StreamSource( testSchema ) );

        config.setSchemaValidationMode( Validation.STRICT );    
       
        XPathFactory xpf = XPathFactory.newInstance( NamespaceConstant.OBJECT_MODEL_XOM );
        XPath xpathObj = xpf.newXPath();
        XPathEvaluator xpe = ( XPathEvaluator) xpathObj;
        StandaloneContext sacont = new StandaloneContext( config );
       
        sacont.declareNamespace( "sh", "uri://www.test.com " );

        xpe.setStaticContext( sacont );
        XPathExpression xpathObjExp = xpathObj.compile( (String) "//element( *, sh:nameType )[1]" );   
       
        String result = (String) xpathObjExp.evaluate( doc, XPathConstants.STRING );
       
        //fails - result is ""
        assertEquals( "Cass", result );
    }

I've attached the xml and xsd. 

Thanks for your time,
-Cass


Attachments: test.xml (266 bytes), test.xsd (627 bytes)
RE: A couple beginner's issues with schema-aware processing

Michael Kay
To get your first example to run I had to remove a trailing space in the namespace URI
 
sacont.declareNamespace( "sh", "uri://www.test.com " );
I don't think that's particularly relevant to your questions, though it's an interesting debating topic.
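
A minimal sketch of why the trailing space matters, assuming only the JDK's built-in XPath 1.0 engine rather than Saxon: namespace names are compared as plain strings, so a prefix bound to "uri://www.test.com " does not match elements in "uri://www.test.com". The class name NamespaceUriDemo and the one-element document are made up for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK XPath 1.0 engine, not Saxon.
final class NamespaceUriDemo {

    // Evaluates /sh:name against a one-element document, binding "sh" to boundUri.
    static String selectName(final String boundUri) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(
                "<name xmlns=\"uri://www.test.com\">Cass</name>")));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "sh".equals(prefix) ? boundUri : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) {
                return null; // not needed for this lookup direction
            }
            @Override public Iterator<String> getPrefixes(String uri) {
                return Collections.<String>emptyIterator();
            }
        });
        // String value of the first selected node, or "" if nothing is selected.
        return xpath.evaluate("/sh:name", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("[" + selectName("uri://www.test.com") + "]");
        System.out.println("[" + selectName("uri://www.test.com ") + "]");
    }
}
```

With the exact URI the element is selected; with the trailing space the path selects nothing and the string result is empty.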
 
When I remove the config.addSchemaSource() call, I get the error
 
No schema has been imported for namespace 'uri://www.test.com'
 
This is because your XPath expression contains a reference to a type defined in this schema, sh:nameType. The source document will still be validated using the schema identified in its xsi:schemaLocation attribute, but the source document isn't involved when the XPath expression is compiled. In the jargon of the spec, any type mentioned in an XPath expression needs to be defined in the static context of the expression. In the case of XPath (as distinct from XSLT and XQuery) there's no "import schema" syntax to achieve that, it's up to the API design to work out how the static context is established (and there's no standard API for XPath 2.0 yet). StandaloneContext is (one) Saxon implementation of a static context for XPath expressions, and it makes every schema known to the Configuration part of the static context for the expression.
 
To demonstrate this further, change your schema so that nameType is a restriction of xs:NMTOKEN, and change your XPath expression to //element(*, xs:NMTOKEN). The element is now selected. That's because xs:NMTOKEN, unlike sh:nameType, is a built-in type and you don't need to do anything special to make it available in the static context.
 
To make this work, I changed the source XML to use the relative path "test.xsd" to refer to the schema, and I changed the source program to say
 
new StreamSource( new File( "c:/MyJava/users/costello/test.xml" ) )
 
rather than
 
new StreamSource( new FileInputStream( "c:/MyJava/users/costello/test.xml" ) )
 
The trouble about supplying a FileInputStream is that the original location of the source document isn't known, and the system then can't resolve a relative URI referring to the schema. However, this is an aside to your main question.
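
The difference can be seen with plain JAXP, independent of Saxon. The sketch below (class and method names are illustrative) shows that a StreamSource built from a File carries a system identifier, one built from a bare stream does not, and that the identifier can be supplied explicitly when a stream is all you have.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; only standard JAXP calls are used.
final class SystemIdDemo {

    // A Source built from a File records where the document lives.
    static String systemIdFromFile(File file) {
        return new StreamSource(file).getSystemId();
    }

    // A Source built from a bare stream has no location at all.
    static String systemIdFromStream(InputStream in) {
        return new StreamSource(in).getSystemId();
    }

    // If a stream is unavoidable, the base location can be passed alongside it.
    static String systemIdFromStreamWithBase(InputStream in, File file) {
        return new StreamSource(in, file.toURI().toASCIIString()).getSystemId();
    }

    public static void main(String[] args) {
        // Path taken from the original post; the file need not exist for this demo.
        File xml = new File("test/resources/test.xml");
        InputStream in = new ByteArrayInputStream("<doc/>".getBytes(StandardCharsets.UTF_8));

        System.out.println(systemIdFromFile(xml));               // a file: URI
        System.out.println(systemIdFromStream(in));              // null
        System.out.println(systemIdFromStreamWithBase(in, xml)); // the same file: URI
    }
}
```

Without a system identifier there is no base URI against which a relative schemaLocation such as "test.xsd" could be resolved.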
 
As a further demonstration, change the Configuration to set Validation.SKIP, and the element is no longer selected, because it is no longer an instance of xs:NMTOKEN.
 
In summary, there are two separate things going on: you have to tell the XPath processor at compile time where to find any schema types referenced in the expression, and you have to tell the document loader where to find any schema definitions used for validating the source. Loading the schema into the Configuration kills both these birds with one stone.
 
Finally, XOM. At present, only the TinyTree supports schema-aware processing, that is, this is the only implementation of the data model that can currently hold the type annotations that result from schema validation. I'm sure other implementations will come in time, but that's the situation today.
 
I hope this helps!
 
Michael Kay


From: [hidden email] [mailto:[hidden email]] On Behalf Of Cass Costello
Sent: 05 November 2005 18:43
To: [hidden email]
Subject: [saxon] A couple beginner's issues with schema-aware processing

Re: A couple beginner's issues with schema-aware processing

Elliotte Harold
Michael Kay wrote:

> Finally, XOM. At present, only the TinyTree supports schema-aware
> processing, that is, this is the only implementation of the data model
> that can currently hold the type annotations that result from schema
> validation. I'm sure other implementations will come in time, but that's
> the situation today.
>  

What would be necessary to support this for XOM? Would it be enough to
create special schema aware subclasses of the standard XOM classes and a
NodeFactory that creates these? Or is something else needed?

--
Elliotte Rusty Harold  [hidden email]
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim


-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help
RE: A couple beginner's issues with schema-aware processing

Michael Kay
> > Finally, XOM. At present, only the TinyTree supports schema-aware
> > processing, that is, this is the only implementation of the data model
> > that can currently hold the type annotations that result from schema
> > validation. I'm sure other implementations will come in time, but that's
> > the situation today.
>
> What would be necessary to support this for XOM? Would it be enough to
> create special schema aware subclasses of the standard XOM classes and a
> NodeFactory that creates these? Or is something else needed?

Since Saxon creates wrapper objects around the XOM node objects anyway, it
could in principle be done without any change to XOM itself, just by holding
the type annotation in the wrapper. However, at present in Saxon validation
is always done as part of the operation of tree construction, so it would
require a change in the way the validation pipeline works internally. The
internal workings to achieve this are probably not that far removed from the
Saxon implementation of the javax.xml.validation.ValidatorHandler class.

Subtyping the XOM nodes to hold a type annotation would be cleaner in many
ways (though the use of subtyping has the problem that it might conflict
with other uses of subtyping: it would be better if XOM allowed an
extensible set of data to be held at each node without requiring a subclass
to be defined). I think one would want to link this to a publicly-accessible
API for access to the schema itself. Saxon has such an API but it's designed
more for internal use than as a stable public API.
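
As a rough illustration of the two places an annotation could live without subclassing the node classes, here is a sketch in plain Java. None of these classes exist in Saxon or XOM; TypedWrapper and AnnotationTable are hypothetical names, the node type is left generic, and the side table is a stand-in for per-node extensible data rather than anything XOM offers.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import javax.xml.namespace.QName;

// Hypothetical sketch; not Saxon's or XOM's actual classes.
final class AnnotationSketch {

    // Option 1: the wrapper carries the annotation; the wrapped node is untouched.
    static final class TypedWrapper<N> {
        private final N node;
        private QName typeAnnotation; // null until validation assigns a type

        TypedWrapper(N node) { this.node = node; }

        N getUnderlyingNode() { return node; }
        QName getTypeAnnotation() { return typeAnnotation; }
        void setTypeAnnotation(QName type) { this.typeAnnotation = type; }
    }

    // Option 2: a side table keyed on node identity, so annotations survive
    // even if wrappers are created and discarded on demand.
    static final class AnnotationTable<N> {
        private final Map<N, QName> types = new IdentityHashMap<>();

        void annotate(N node, QName type) { types.put(node, type); }
        QName typeOf(N node) { return types.get(node); } // null if never annotated
    }

    public static void main(String[] args) {
        Object node = new Object(); // stands in for a XOM Element
        QName nameType = new QName("uri://www.test.com", "nameType");

        TypedWrapper<Object> wrapper = new TypedWrapper<>(node);
        wrapper.setTypeAnnotation(nameType);
        System.out.println(wrapper.getTypeAnnotation());

        AnnotationTable<Object> table = new AnnotationTable<>();
        table.annotate(node, nameType);
        System.out.println(table.typeOf(node));
    }
}
```

Either way, something still has to run validation over the existing tree and assign the annotations, which is the pipeline change described above.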

Michael Kay
http://www.saxonica.com/




Re: A couple beginner's issues with schema-aware processing

Cass Costello
In reply to this post by Michael Kay
>> ...but the source document isn't involved when the XPath expression is compiled...

Ah, of course!  I'm now feeling a little sheepish. :)  Thanks for walking me through that, for pointing out the FileInputStream mistake, and for the quick response.

>> Finally, XOM. At present, only the TinyTree supports schema-aware processing...

Bummer, but good to know.  Ok, I'm off to delve deeper...

Again, thank you,
Cass



Re: A couple beginner's issues with schema-aware processing

frans.englich (Bugzilla)
In reply to this post by Michael Kay
On Sunday 06 November 2005 14:20, Michael Kay wrote:

> > > Finally, XOM. At present, only the TinyTree supports schema-aware
> > > processing, that is, this is the only implementation of the data model
> > > that can currently hold the type annotations that result from schema
> > > validation. I'm sure other implementations will come in time, but that's
> > > the situation today.
> >
> > What would be necessary to support this for XOM? Would it be enough to
> > create special schema aware subclasses of the standard XOM classes and a
> > NodeFactory that creates these? Or is something else needed?
>
> Since Saxon creates wrapper objects around the XOM node objects anyway,

From what I can tell, Saxon connects with DOM by the same technique: wrapper
objects. I'm wondering, isn't that a significant performance penalty? Doesn't
it cost an object per used/referenced node, and thus allocation costs (CPU &
memory) and indirection overhead?

I'm not saying there is a significant performance penalty, and I don't want to
be the one cheering for premature optimization; I'm interested in whatever
experiences/thoughts there are on this.


Cheers,

                Frans



RE: A couple beginner's issues with schema-aware processing

Michael Kay
> From what I can tell, Saxon connects with DOM by the same technique: wrapper
> objects. I'm wondering, isn't that a significant performance penalty? Doesn't
> it cost an object per used/referenced node, and thus allocation costs(CPU &
> memory) and indirection overhead?
>
> I'm not saying there is a significant performance penalty, I don't want to be
> the one cheering to premature optimization, I'm interested in whatever
> experiences/thoughts there is on this.

Using a DOM with Saxon has always been far less efficient than using Saxon's
native tree implementation. In fact I think only a small amount of this
inefficiency comes from the use of wrapper objects, most comes from the
inefficient representation of namespace information in DOM, and from the
cumbersome navigation needed to test whether nodes are in document order.

I don't actually have any recent comparative measures of a given task with
different tree models, it would be interesting to collect this data.
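
For readers unfamiliar with what a document-order test on plain DOM nodes involves, here is a small sketch using the DOM Level 3 method compareDocumentPosition from the JDK's DOM implementation. It is not taken from Saxon's DOM wrapper, whose internals this thread does not show; the class and helper names are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses only standard DOM Level 3 calls.
final class DocumentOrderDemo {

    // True if 'first' comes before 'second' in document order.
    static boolean precedes(Node first, Node second) {
        // The returned bitmask describes where 'second' sits relative to 'first'.
        return (first.compareDocumentPosition(second) & Node.DOCUMENT_POSITION_FOLLOWING) != 0;
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b/><c/></a>");
        Node b = doc.getDocumentElement().getFirstChild();
        Node c = b.getNextSibling();
        System.out.println(precedes(b, c)); // b comes before c
        System.out.println(precedes(c, b)); // c does not come before b
    }
}
```

A native tree can answer the same question far more cheaply if each node already knows its position, for example by comparing integer node numbers.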

Michael Kay
http://www.saxonica.com/





Re: A couple beginner's issues with schema-aware processing

Wolfgang Hoschek-2
In reply to this post by Cass Costello
>> From what I can tell, Saxon connects with DOM by the same technique: wrapper
>> objects. I'm wondering, isn't that a significant performance penalty? Doesn't
>> it cost an object per used/referenced node, and thus allocation costs(CPU &
>> memory) and indirection overhead?
>>
>> I'm not saying there is a significant performance penalty, I don't want to be
>> the one cheering to premature optimization, I'm interested in whatever
>> experiences/thoughts there is on this.
>
> Using a DOM with Saxon has always been far less efficient than using Saxon's
> native tree implementation. In fact I think only a small amount of this
> inefficiency comes from the use of wrapper objects, most comes from the
> inefficient representation of namespace information in DOM, and from the
> cumbersome navigation needed to test whether nodes are in document order.

I can confirm this. Saxon with XOM is wildly more efficient than DOM
according to my measurements (and certainly competitive with tinytree,
with some give and take depending on the use case). Part of the reason
is that the DOM wrappers and iterators are not nearly as optimized as
the ones for XOM. The cost of using wrapper objects is measurable but
not significant given proper implementation. Other factors play a
*much* more prominent role, e.g. namespace and string handling as
well as model conversions via SAX or StAX, i.e. serialization and
deserialization, as exemplified by
http://www.ggf.org/GGF15/presentations/wsPerform_hoschek.pdf

Wolfgang.


