SAXON does not follow page redirects in schema validation

SAXON does not follow page redirects in schema validation

Costello, Roger L.
Hello Michael,

I want to validate this XML document:

<book>
    <title>Parsing Techniques</title>
    <author>Dick Grune</author>
</book>

On my web site I have this redirect page: http://www.xfront.com/book-xsd-redirect.html. If you click on that link, you will see that you are immediately redirected to book.xsd on my web site.

Here is book-xsd-redirect.html:

<!DOCTYPE HTML>
<html lang="en-US">
    <head>
        <meta charset="UTF-8" />
        <meta http-equiv="refresh" content="1;url=http://www.xfront.com/book.xsd" />
        <script type="text/javascript">
            window.location.href = "http://www.xfront.com/book.xsd"
        </script>
        <title>Page Redirection</title>
    </head>
    <body>
        <!-- Note: don't tell people to `click` the link, just tell them that it is a link. -->
        If you are not redirected automatically, follow the <a href='http://www.xfront.com/book.xsd'>link to book.xsd</a>
    </body>
</html>

I invoked SAXON, giving it book.xml and the URL to the redirect page. I would expect SAXON to understand the redirect and proceed to validate book.xml against book.xsd.

However, that's not what happens. SAXON tries to validate book.xml against the redirect page. This error message is produced:

    Error on line 2 of book-xsd-redirect.html:
      Outermost element of schema document must be xs:schema
    Schema processing failed: Outermost element of schema document must be xs:schema

I think that's a bug in SAXON. What do you think?

/Roger

_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 

Re: SAXON does not follow page redirects in schema validation

Michael Kay
Well, if it’s a bug then it’s a bug well below the Saxon level of the system. Saxon supplies the requested URL to the XML parser, and the XML parser supplies it to the Java run-time library…

How are you actually invoking the validation? If you’re just giving Saxon a file name on the command line, then Saxon passes this to the URIResolver, and the default URIResolver generates a SAXSource containing the URI, which is passed to the XML parser’s XMLReader.parse() method. What happens then depends on which XML parser you are invoking, and is outside Saxon’s control.

I note that the class java.net.HttpURLConnection has an option setFollowRedirects() which supposedly controls whether redirects are followed or not, and is true by default. But Saxon doesn’t get anywhere near the HttpURLConnection.

You can of course control in detail how URLs get dereferenced either in the URIResolver, or in the parser’s EntityResolver.

Michael Kay
Saxonica
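
As an illustration of the URIResolver route mentioned above, here is a minimal sketch of a resolver that fetches the resource itself, so that HTTP-level redirects are handled by java.net.HttpURLConnection and the Source is labelled with the URL that was finally retrieved. The class name RedirectAwareResolver is hypothetical, and how such a resolver is registered depends on the Saxon version and entry point, which this sketch does not cover.

```java
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper class; not part of Saxon.
public class RedirectAwareResolver implements URIResolver {

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        try {
            // Resolve href against the base URI when one is supplied.
            URI target = (base == null || base.isEmpty())
                    ? URI.create(href)
                    : URI.create(base).resolve(href);

            URLConnection conn = target.toURL().openConnection();
            if (conn instanceof HttpURLConnection) {
                // Follow 3xx responses for this connection (this is also the default).
                ((HttpURLConnection) conn).setInstanceFollowRedirects(true);
            }

            InputStream in = conn.getInputStream();
            // Once the response has been read, getURL() reports the URL that was
            // actually fetched, so the system ID matches the redirect target.
            return new StreamSource(in, conn.getURL().toExternalForm());
        } catch (IOException | IllegalArgumentException e) {
            throw new TransformerException("Cannot fetch " + href, e);
        }
    }
}
```

Note that this only helps with redirects expressed as HTTP 3xx responses, and that HttpURLConnection does not follow a redirect that switches protocol (for example from http to https).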



> On 28 Jun 2015, at 14:45, Costello, Roger L. <[hidden email]> wrote:
>
> Hello Michael,
>
> I want to validate this XML document:
>
> <book>
>    <title>Parsing Techniques</title>
>    <author>Dick Grune</author>
> </book>
>
> On my web site I have this redirect page: http://www.xfront.com/book-xsd-redirect.html. If you click on that link, you will see that you are immediately redirected to book.xsd on my web site.

Actually I get different results on Safari and Firefox. Neither simply shows me book.xsd. I wonder if this has something to do with MIME types.


Re: SAXON does not follow page redirects in schema validation

Costello, Roger L.
Thanks Michael.

> How are you actually invoking the validation? If you’re just giving Saxon a file name on the command line

Yes, that's how I am invoking Saxon, from the command line.

/Roger

Re: SAXON does not follow page redirects in schema validation

Michael Kay
In reply to this post by Costello, Roger L.
This ancient thread might also be relevant:

https://sourceforge.net/p/saxon/mailman/message/9283758/

In that case it seems that (at least with one choice of parser) the parser was following the redirect, but was setting the base URI of the retrieved document to the requested URI, not the actual URI.
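
To make the base URI point concrete, here is a small sketch of why it matters which URI is recorded for a redirected document: a relative reference inside the schema (for example an xs:include with schemaLocation="types.xsd") resolves to a different place depending on the base. The example.org URLs and the file name types.xsd are invented for illustration.

```java
import java.net.URI;

public class BaseUriDemo {

    /** Resolves a relative schemaLocation against the base URI of the including document. */
    public static URI resolveLocation(String baseUri, String schemaLocation) {
        return URI.create(baseUri).resolve(schemaLocation);
    }

    public static void main(String[] args) {
        // Hypothetical URLs: the first is what was requested,
        // the second is where the server redirected to.
        String requested = "http://example.org/old/book.xsd";
        String actual = "http://example.org/new/book.xsd";

        // Base URI set to the requested URI: the include is looked up under /old/.
        System.out.println(resolveLocation(requested, "types.xsd"));
        // Base URI set to the actual URI: the include is looked up under /new/.
        System.out.println(resolveLocation(actual, "types.xsd"));
    }
}
```

If the included file only exists next to the redirect target, the first lookup fails even though the redirect itself was followed.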

Michael Kay
Saxonica

Re: SAXON does not follow page redirects in schema validation

cmarchand
In reply to this post by Costello, Roger L.

Hello Roger,

Your redirect is a JavaScript redirect, not an HTTP redirect.

So the HTML document must be parsed and its JavaScript executed before anything is redirected to your XSD URL.

An XML parser will not interpret or execute JavaScript; that is not its role.

An HTTP client will not interpret HTML or XML content, so it will not execute JavaScript either; that is not its role.

Only a browser can interpret and execute JavaScript, and so detect the redirect.

So, in my opinion, it's not a Saxon bug.

If you use an HTTP redirect (302 response code), the HTTP client *can* follow the redirect. A 302 response can be configured via a .htaccess file (on Apache), but that is an Apache HTTP Server / HTTP client matter.

Christophe
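
The difference between the two kinds of redirect can be tried out locally. The following is a rough sketch, assuming a JDK that ships the com.sun.net.httpserver module: it starts a throwaway server on the loopback interface with one path that answers with a 302 and a Location header, and another that answers with an HTML page containing a meta refresh. The paths /http-redirect and /html-redirect and the class name are made up for this example.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class RedirectDemo {
    static final String XSD = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"/>";
    static final String HTML =
        "<html><head><meta http-equiv=\"refresh\" content=\"1;url=/book.xsd\"/></head></html>";

    /** Starts a local server, fetches the given path with HttpURLConnection, returns the body. */
    public static String fetchLocal(String path) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            String root = "http://127.0.0.1:" + server.getAddress().getPort();
            server.createContext("/book.xsd", ex -> reply(ex, XSD));
            // "Redirect" expressed only inside the HTML content.
            server.createContext("/html-redirect", ex -> reply(ex, HTML));
            // Redirect expressed at the HTTP level: 302 status plus Location header.
            server.createContext("/http-redirect", ex -> {
                ex.getResponseHeaders().add("Location", root + "/book.xsd");
                ex.sendResponseHeaders(302, -1);
                ex.close();
            });
            server.start();
            try {
                HttpURLConnection conn =
                    (HttpURLConnection) URI.create(root + path).toURL().openConnection();
                try (InputStream in = conn.getInputStream()) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void reply(HttpExchange ex, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        ex.sendResponseHeaders(200, bytes.length);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(bytes);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchLocal("/http-redirect"));
        System.out.println(fetchLocal("/html-redirect"));
    }
}
```

With the 302 path, the HTTP client follows the Location header and the caller receives the schema. With the HTML path, the caller receives the HTML page itself, because nothing in the client reads the meta element.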

 

 

Re: SAXON does not follow page redirects in schema validation

David Carlisle
In reply to this post by Costello, Roger L.
As Liam explained on the other list, this is not a bug, as there is no HTTP redirect involved.

You are serving an HTML page that has

    <meta http-equiv="refresh" content="1;url=http://www.xfront.com/book.xsd" />

so the redirect only happens if the original URL is handled by an HTML renderer that understands the semantics of the HTML meta http-equiv attribute.

If the redirect were handled by sending a 3xx response at the server, then a non-HTML URL handler would have a chance.

David
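
The error in the original report follows from this: what comes back for the URL is the HTML page, whose outermost element is html rather than xs:schema. As a rough sketch of that kind of check (not Saxon's actual code), the following reads only as far as the first start tag and compares its name with the XSD namespace. The class and method names are invented for illustration.

```java
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

public class SchemaRootCheck {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    /** Returns the qualified name of the outermost element of the given XML text. */
    public static QName rootElementName(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            // Do not process DTDs; only the first start tag is of interest here.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getName();
                    }
                }
                throw new IllegalArgumentException("No element found");
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** True if the outermost element is schema in the XSD namespace. */
    public static boolean looksLikeSchema(String xml) {
        QName root = rootElementName(xml);
        return XSD_NS.equals(root.getNamespaceURI()) && "schema".equals(root.getLocalPart());
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs=\"" + XSD_NS + "\"/>";
        String html = "<html lang=\"en-US\"><head/></html>";
        System.out.println(looksLikeSchema(xsd));
        System.out.println(looksLikeSchema(html));
    }
}
```

A check like this could also be run on a fetched document before handing it to the schema processor, to report "this URL returned an HTML page" rather than a schema error.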





