URIResolver exception with xsl:include and Unicode characters in path


URIResolver exception with xsl:include and Unicode characters in path

Clemens Uhlenhut

Hi group!

I’m using Java Saxon-HE 9.6.0.7 (by JNI) to do XSL transformations (and Schematron validation). Whenever a stylesheet uses xsl:include with a relative path and the base path of the stylesheet contains special Unicode characters, the URIResolver throws an exception.

This means I can successfully create the source and transformer objects providing the base path (with Unicode characters like Ç …). But when I run the transformation using transform(), the URIResolver exception is thrown. If I take a different stylesheet at the same path but without any xsl:include, the transformation runs fine.

Is there something I can do about this other than providing my own URIResolver, which I want to avoid?

Do you need some more information?

I searched for a previous thread answering my question about the URIResolver exceptions but didn’t find any related messages.

Kind regards and thanks in advance!

Clemens


------------------------------------------------------------------------------

_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 

Re: URIResolver exception with xsl:include and Unicode characters in path

Earl Hood-3
On Thu, Oct 1, 2015 at 11:15 AM, Clemens Uhlenhut wrote:

> I’m using Java Saxon-HE 9.6.0.7 (by JNI) to do XSL transformations (and
> Schematron validation). Whenever there is a stylesheet that uses xsl:include
> with a relative path and the base path of the stylesheet contains special
> Unicode characters the URIResolver throws an exception.

In your <xsl:include> statement, are you using a valid URI?
Non-ASCII characters will need to be percent encoded.

--ewh
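
For illustration, here is a minimal Java sketch of deriving a percent-encoded file: URI from a local path. The class name, method name and sample path are hypothetical; only java.io.File.toURI() and java.net.URI.toASCIIString() are standard JDK calls. It says nothing about what Saxon does internally.

```java
import java.io.File;
import java.net.URI;

/** Hypothetical helper: turns a local file path into a percent-encoded file: URI. */
class SystemIds {

    /** Returns an ASCII-only URI string that can be used as a system ID. */
    static String toSystemId(String localPath) {
        // File.toURI() builds an absolute, hierarchical file: URI.
        URI uri = new File(localPath).toURI();
        // toASCIIString() percent-encodes non-ASCII characters as UTF-8 octets.
        return uri.toASCIIString();
    }

    public static void main(String[] args) {
        // "\u00C7" is the character Ç; the directory name is made up.
        System.out.println(toSystemId("/data/\u00C7/main.xsl"));
    }
}
```

The exact prefix of the result depends on the platform (drive letters, current directory), but Ç should always appear as %C3%87.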


Re: URIResolver exception with xsl:include and Unicode characters in path

Clemens Uhlenhut
Hi Earl,

That doesn't have any impact. I tested it with included stylesheets whose file names contain only plain ASCII characters, like <xsl:include href="other.xsl"/>.

Clemens
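
As background on why the base path can matter even when the href itself is plain ASCII: a relative href is resolved against the base URI of the including stylesheet. The sketch below uses plain java.net.URI resolution with a made-up base URI. It is not Saxon's resolver code, only an illustration of the general mechanism.

```java
import java.net.URI;

/** Illustrates how a relative href combines with a stylesheet's base URI. */
class IncludeResolution {

    /** Resolves a relative href against a base URI using java.net.URI. */
    static String resolve(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Hypothetical base URI with Ç percent-encoded as UTF-8 (%C3%87).
        String base = "file:///C:/data/%C3%87/main.xsl";
        // The encoded directory segment is carried over into the resolved URI.
        System.out.println(resolve(base, "other.xsl"));
    }
}
```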



Re: URIResolver exception with xsl:include and Unicode characters in path

Michael Kay
In reply to this post by Clemens Uhlenhut
It’s not clear from your description exactly what you are doing and exactly what is failing, but generally things can fail (but won’t always fail) if you fail to percent-encode special characters in the URIs you are using. It’s a pretty murky area, and the devil is in the detail. If you want to provide a reproducible example showing the failure, we’ll be happy to take a look at it.

I’m a little surprised by the precise symptoms you describe, but as I say, the details are murky.

Michael Kay
Saxonica



Re: URIResolver exception with xsl:include and Unicode characters in path

Clemens Uhlenhut

Thanks for all the answers so far!

 

To make it (hopefully) clearer, I simplified the steps and did some more debugging. What I have now is:

1. An input XML file located locally at a path with special Unicode characters.

2. A stylesheet at some other path with no special characters (to reduce the number of possible error sources).

3. I create “java/io/FileInputStream” objects for the input XML and stylesheet paths and use the original local path (not URL encoded) to initialize them. (I noticed that if I pass the paths URL encoded, it fails even for paths with no Unicode characters.)

4. For the input XML and the stylesheet I create “javax/xml/transform/stream/StreamSource” objects. I initialize both in the same way, using the FileInputStream object as the first parameter and the now URL encoded path (file:/// with percent-encoded characters) as the second parameter for the constructor.

5. I create a “javax/xml/transform/stream/StreamResult” object and initialize it with the URL encoded path for the output document.

6. A transformer object is created using the “javax/xml/transform/TransformerFactory” and the stream object for the stylesheet.

7. On this transformer object I call transform() from “javax/xml/transform/Transformer” using the stream objects for the input XML and for the output document.

 

This produces a “java.io.FileNotFoundException” saying that it can’t find the file with the path as it is given in the StreamResult object from step 5. The interesting part here is that the error message prints the path with the question mark symbol and not URL encoded although I created the object using the URL encoded path…

 

Has anyone had a similar case before?

Regards

Clemens
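
The seven steps above could look roughly like this in plain Java. This is only a sketch: the original code goes through JNI and was not posted, so the class and method names are hypothetical, and it uses whichever TransformerFactory JAXP finds rather than a specific Saxon class.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Plain-Java sketch of steps 3 to 7; all names here are hypothetical. */
class TransformSketch {

    /** Percent-encoded file: URI for a local file, used as the system ID. */
    static String systemId(File file) {
        return file.toURI().toASCIIString();
    }

    static void transform(File xml, File xsl, File out) throws Exception {
        // Step 3: open the streams with the raw (unencoded) local paths.
        try (InputStream xmlIn = new FileInputStream(xml);
             InputStream xslIn = new FileInputStream(xsl)) {
            // Step 4: stream plus percent-encoded system ID for both inputs.
            StreamSource xmlSource = new StreamSource(xmlIn, systemId(xml));
            StreamSource xslSource = new StreamSource(xslIn, systemId(xsl));
            // Step 5: the result is identified only by its encoded system ID.
            StreamResult result = new StreamResult(systemId(out));
            // Step 6: build the transformer from the stylesheet source.
            Transformer transformer =
                    TransformerFactory.newInstance().newTransformer(xslSource);
            // Step 7: run the transformation.
            transformer.transform(xmlSource, result);
        }
    }
}
```

In step 5 the transformer has to map the system ID back to a file name by itself. Passing a java.io.File or an already opened OutputStream to StreamResult would leave that mapping to the caller; whether this avoids the exception in the JNI setup described here is untested.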

 

 


Re: URIResolver exception with xsl:include and Unicode characters in path

Earl Hood-3
On Fri, Oct 2, 2015 at 7:50 AM, Clemens Uhlenhut wrote:

> To make it (hopefully) more clear I tried to simplify the steps now and did
> some more debugging. What I have now is:
...[snip]...

The extra details help, but having some real code to show would be
better...

> This produces a “java.io.FileNotFoundException” saying that it can’t find
> the file with the path as it is given in the StreamResult object from step
> 5. The interesting part here is that the error message prints the path with
> the question mark symbol and not URL encoded although I created the object
> using the URL encoded path…

The question marks indicate that a character encoding problem could be
occurring.

When coding in Java, many I/O related operations support the ability to
specify the encoding, and you may need to use them.  If you do not,
Java falls back to your system default, which can cause unexpected
problems if your data does not match the system default.

See the following for more information about the default locale settings
wrt Java:

https://stackoverflow.com/questions/8809098/how-do-i-set-the-default-locale-for-my-jvm

If you are unable to resolve the problem, you will need to provide a
working example that others can replicate.

--ewh


Re: URIResolver exception with xsl:include and Unicode characters in path

Clemens Uhlenhut
Thank you Earl!

I had also read some hints and explanations about the locale settings in the JVM, but I haven't tried setting them yet. Before I do, I would like to understand why it is even necessary to set the locale if I pass in the file name as a UTF-16 string and the operating system, in this case Windows, also uses UTF-16 in its file system APIs.

It seems crazy to me to have to set a locale even though I provide Unicode strings. This is not related to Saxon; I guess Java is the crazy part here.

Regarding the source code: "unfortunately" I'm using the JNI interfaces, and they need a lot of code to call a single Java method. Most of that code is also encapsulated in my own helper classes. It's not that I don't want to share the code; there are just too many lines, and I guess nobody would look at it anyway. That is why my previous post summarized which objects I use and how.

Kind regards
Clemens




Re: URIResolver exception with xsl:include and Unicode characters in path

Earl Hood
On Mon, Oct 5, 2015 at 9:32 AM, Clemens Uhlenhut wrote:

> I was also reading some hints and explanations about the locale
> settings in the JVM but I didn't try to set them until now. Before I
> do so I would like to understand why it is even necessary to set the
> locale if I pass in the file name as an UTF-16 string and, in this
> case Windows, is also using UTF-16 in the file system APIs.

A bit off-topic, but I have seen encoding issues on multiple occasions
when dealing with Java and (XML) textual data.

Internally, all Java strings are stored as UTF-16, but
when creating strings from an input stream (i.e. octet data), the octets
must be decoded since textual data can be encoded in a variety of
formats.  The same applies when writing strings to an output stream,
where an encoder is applied.
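
A small sketch of that decoding step, using only java.lang.String and java.nio.charset.StandardCharsets (the class and method names are made up for illustration): the same two octets become one character or two, depending on the charset used to decode them.

```java
import java.nio.charset.StandardCharsets;

/** Shows that the same octets yield different text under different charsets. */
class DecodeDemo {

    /** The UTF-8 encoding of Ç (U+00C7) is the two octets C3 87. */
    static final byte[] C_CEDILLA_UTF8 = {(byte) 0xC3, (byte) 0x87};

    /** Decodes the octets with the charset they were written in. */
    static String asUtf8() {
        return new String(C_CEDILLA_UTF8, StandardCharsets.UTF_8);
    }

    /** Decodes the same octets with a mismatched charset. */
    static String asLatin1() {
        // ISO-8859-1 maps every octet to one character, so two characters result.
        return new String(C_CEDILLA_UTF8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(asUtf8().length());   // one character
        System.out.println(asLatin1().length()); // two characters
    }
}
```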

Remember, data is basically a bitstream that is typically read in an
octet (aka byte) at a time, where programs have to decode/encode as
needed when treating the data as text.

If you look at the java.io classes that support textual reading and
writing, and even java.lang.String, you will see constructors and
methods that support encoding specifications.  When using the methods
w/o an encoding parameter, Java will default to the encoding of the
system default locale.

You must use the methods that support an encoding parameter when you
know that the encoding of the data may not always match the system
default.  Generally, with XML data, this is how I code things since it
is common in US-based locales where the system default is ISO-8859-1,
but most XML data is in one of the UTF formats.
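
As a rough example of the encoding-aware constructors, the sketch below writes and reads text through in-memory streams with UTF-8 named explicitly, so the platform default never comes into play. The class and method names are hypothetical; the java.io constructors taking a Charset are standard.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Reads and writes text with an explicit charset instead of the platform default. */
class ExplicitCharsetIo {

    /** Encodes text to octets through a Writer that names its charset. */
    static byte[] write(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
        return bytes.toByteArray();
    }

    /** Decodes octets through a Reader that names the same charset. */
    static String read(byte[] data) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }
}
```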

When working with the Java APIs, all it takes is one actor to create a
Reader or Writer w/o designating the encoding to muck things up.

Take your filename scenario, where the filename is in an attribute value
contained in an XML data set.  That filename, along with all the XML data,
is subject to textual decoding when the parser reads the raw octets from
the input stream.  If you instantiate the XML parser with a Reader, but
fail to specify the proper character encoding, all the text will be
decoded based on your default locale setting and not the encoding
specified in the XML file.

Some XML parsers may be smart enough to auto-detect the encoding from
the <?xml> declaration or byte-order mark when given a
java.io.File instance, since the parser can look ahead, then seek back
to the beginning to do a full parse.

However, when giving it a java.io.Reader, seeking is not available, so
the parser has to rely on the underlying Reader instance in decoding the
octet stream.
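
The difference can be sketched with the JDK's built-in DOM parser (an assumption; other parsers may behave differently). Given a byte stream, the parser decodes according to the XML declaration; given a Reader, it sees whatever characters the Reader produced. The class and method names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

/** Compares parsing from raw octets with parsing from a pre-decoded Reader. */
class ParserInputDemo {

    /** A UTF-8 document whose only text content is Ç (U+00C7). */
    static byte[] sampleDocument() {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?><p>\u00C7</p>"
                .getBytes(StandardCharsets.UTF_8);
    }

    private static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    /** The parser receives octets and picks the decoder itself. */
    static String parseFromBytes(byte[] xml) throws Exception {
        InputSource source = new InputSource(new ByteArrayInputStream(xml));
        return newBuilder().parse(source).getDocumentElement().getTextContent();
    }

    /** The caller picks the decoder; the parser only sees characters. */
    static String parseFromReader(byte[] xml, Charset charset) throws Exception {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xml), charset);
        InputSource source = new InputSource(reader);
        return newBuilder().parse(source).getDocumentElement().getTextContent();
    }
}
```

Calling parseFromReader with StandardCharsets.ISO_8859_1 on this document stands in for a Reader created with a mismatched default encoding: the text no longer comes back as Ç.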

In my experience when using the Java XML APIs, I have had to use my own
encoding guesser and reader class so I can ensure that some actor in the
processing chain does not fall back to the default locale.


> In some way it seems crazy to me to set a locale even if I provide
> Unicode strings? Not related to Saxon, I guess Java is the crazy part
> here.

But how is Java to know what is the proper encoding?  As noted above,
everything is a sequence of bytes and you have to tell Java how to decode
that sequence into a sequence of characters.

Depending on your application and the Java XML APIs you use, you may
have to determine the encoding yourself and then pass that to the XML
parser.

--ewh


Re: URIResolver exception with xsl:include and Unicode characters in path

Clemens Uhlenhut
Earl, thank you for your answer.

> Take your filename scenario, where the filename is in an attribute value
> contained in XML data set.

No, this is not the case here. I create a
"javax/xml/transform/stream/StreamResult" object and initialize it with the
URL encoded path for the output document. This path comes from the
application and is not taken from any XML data. Then I call transform() from
"javax/xml/transform/Transformer" using the stream objects for the input
XML and for the output document. This produces a
"java.io.FileNotFoundException" saying that it can't find the output file
with the path as it is given in the StreamResult object from above. The
interesting part here is that the error message, as reported by the Java
exception, prints the path with the question mark symbol and not URL encoded
although I created the object using the URL encoded path.

I'm aware of the stumbling blocks related to XML streams and encoding.
However, I believe that this is not the problem here.

Regards
Clemens




Re: URIResolver exception with xsl:include and Unicode characters in path

Earl Hood-3
On Mon, Oct 5, 2015 at 3:40 PM, Clemens Uhlenhut wrote:

> No, this is not the case here. I create a
> "javax/xml/transform/stream/StreamResult" object and initialize if with the
> URL encoded path for the output document. This path comes from the
> application and is not taken from any XML data. Then I call transform() from
> "javax/xml/transform/Transformer" using the stream objects from the input
> XML and for the output document. This produces a
> "java.io.FileNotFoundException" saying that it can't find the output file
> with the path as it is given in the StreamResult object from above. The
> interesting part here is that the error message, as reported by the Java
> exception, prints the path with the question mark symbol and not URL encoded
> although I created the object using the URL encoded path.

With what you mention above, you should be able to create a simple
standalone test case that others can try out without the baggage of the
other parts of your application.

As for the question marks, that could be a result of the character
encoding of the error message itself to your output device, i.e., if
your system locale is not Unicode-based, the printing of error text to
your console may not print characters correctly.
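
A minimal sketch of how such question marks can arise purely from output encoding: a character that the output charset cannot represent is replaced when the text is encoded. The class name, method name and sample path are hypothetical, and US-ASCII merely stands in for a non-Unicode console encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Simulates printing text to a device that uses a limited charset. */
class ConsoleEncodingDemo {

    /** Encodes text for the device, then decodes what the device would show. */
    static String asShownBy(String text, Charset deviceCharset) {
        // getBytes() substitutes the charset's replacement byte ('?' for
        // US-ASCII) for every character the charset cannot represent.
        byte[] encoded = text.getBytes(deviceCharset);
        return new String(encoded, deviceCharset);
    }

    public static void main(String[] args) {
        String path = "C:\\data\\\u00C7\\out.xml"; // made-up path containing Ç
        System.out.println(asShownBy(path, StandardCharsets.US_ASCII));
    }
}
```

If this is what happens, the file name inside the JVM may be intact even though the printed message shows a question mark, so the message alone does not prove the path was corrupted.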

The following SO post may be related to what you are experiencing:

https://stackoverflow.com/questions/3072376/how-can-i-open-files-containing-accents-in-java

--ewh
