Need Guidance On Implementing Custom Collators For Saxon 9.1, 9.6+

Eliot Kimber-2
I'm starting the process of implementing localized grouping and sorting in
the context of the DITA Open Toolkit, including implementation of a
dictionary-based collator for Simplified Chinese (using the open-source
CEDICT dictionary).

I want to make sure that I'm taking the most appropriate approach--it's
been more than a decade since I last implemented customized collation
features for Saxon (that was back in the Saxon 6 days).

I think there are two basic approaches I could take:

1. Implement a custom collator as a RuleBasedCollator and then use that
with Saxon through a collation URI specified on xsl:sort and similar.

2. Implement a custom extension function that returns sort keys that will
then collate correctly using the default Unicode collator (e.g., for most
languages the sort key would just return the input string but for
Simplified Chinese, in particular, would return the pinyin transliteration
as found in the dictionary).
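
Option 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the method names, and the three-entry map are hypothetical stand-ins for a real CEDICT lookup, and the JDK's java.text.Collator is used in place of whatever collator the stylesheet would apply.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class SortKeySketch {
    // Hypothetical stand-in for a CEDICT lookup: word -> pinyin reading.
    private static final Map<String, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put("北京", "bei3 jing1");
        PINYIN.put("上海", "shang4 hai3");
        PINYIN.put("广州", "guang3 zhou1");
    }

    /** Returns the string to sort by: pinyin for known zh words, otherwise the input unchanged. */
    static String sortKey(String text, String lang) {
        if (lang != null && lang.toLowerCase(Locale.ROOT).startsWith("zh")) {
            String pinyin = PINYIN.get(text);
            if (pinyin != null) {
                return pinyin;
            }
        }
        return text;
    }

    /** Sorts a copy of the words by comparing their sort keys with a default collator. */
    static List<String> sortByKey(List<String> words, String lang) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort((a, b) -> collator.compare(sortKey(a, lang), sortKey(b, lang)));
        return sorted;
    }

    public static void main(String[] args) {
        // prints [北京, 广州, 上海]
        System.out.println(sortByKey(Arrays.asList("上海", "广州", "北京"), "zh-CN"));
    }
}
```

In a stylesheet the equivalent of sortKey() would be exposed as an extension function and called from the select expression of xsl:sort.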

I think my best course of action is to implement a custom collator in Java
and then use the Saxon 9.1 form of custom collator URI.

Is my analysis correct? Is there some other option I've overlooked?

Thanks,

Eliot

--
Eliot Kimber
http://contrext.com
 






------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 

Re: Need Guidance On Implementing Custom Collators For Saxon 9.1, 9.6+

Eliot Kimber-2
I have created a simple extension of the ICU RuleBasedCollator and my
local unit test verifies that it can be used as a Comparator and for
getting sort keys.

However, when I try to use it with Saxon 9.6, I get "Failed to instantiate
class org.ditacommunity.i18n.ZhCnAwareCollator".

So I must be failing to implement the expected instantiation method but I
can't figure out what that might be.

Here is my passing unit test for ZhCnAwareCollator:

Collator collator = ZhCnAwareCollator.getInstance(Locale.CHINESE);
// Pass the object itself: "collator != null" is a Boolean and is never null.
assertNotNull("No comparator", collator);
int result;
result = collator.compare("a", "b");
// compare() only guarantees the sign of the result, not exactly -1.
assertTrue("Compared incorrectly", result < 0);
CollationKey sortKey = collator.getCollationKey("aaa");
CollationKey sortKeyC = collator.getCollationKey("c");
assertNotNull(sortKey);
result = sortKey.compareTo(sortKeyC);
assertTrue("Wrong compare result", result < 0);


How is Saxon doing the class instantiation? I know it's loading the class
because the load failed when I didn't have the ICU4J library in the class
path (my collator is backed by an ICU4J collator).

Thanks,

Eliot

Re: Need Guidance On Implementing Custom Collators For Saxon 9.1, 9.6+

Michael Kay
Saxon-EE supports UCA collation URIs as defined in the XPath 3.1 and XSLT 3.0 specs by using the collation support in the ICU library. I've been fairly immersed in that over the last couple of weeks, as it happens, but I'm struggling to remember exactly what's available if you're using plain Saxon-HE. We probably need to change the packaging at some stage because UCA collation URIs are a mandatory feature in XPath 3.1, though we may continue to support them in HE only using what's in the JDK as distinct from using ICU.

I'd be inclined to avoid using collation keys unless you really need them. According to ICU documentation, a direct sort using a collation is supposed to be much more efficient.
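
The two styles of sorting mentioned here can be compared side by side. The following is a minimal sketch using the JDK's java.text.Collator rather than ICU4J (an assumption made to keep it dependency-free); the class and method names are hypothetical. It shows only that both routes give the same order and makes no claim about their relative speed.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class DirectVersusKeySort {
    /** Sorts by letting the collator compare each pair of strings directly. */
    static List<String> sortDirect(List<String> words, Collator collator) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    /** Sorts by computing one CollationKey per string and comparing the keys. */
    static List<String> sortWithKeys(List<String> words, Collator collator) {
        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            keys.add(collator.getCollationKey(word));
        }
        Collections.sort(keys); // CollationKey implements Comparable<CollationKey>
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        List<String> words = Arrays.asList("cherry", "banana", "Apple");
        System.out.println(sortDirect(words, collator));   // prints [Apple, banana, cherry]
        System.out.println(sortWithKeys(words, collator)); // prints [Apple, banana, cherry]
    }
}
```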

It's not clear from your posts what you are doing to register the collation with Saxon. There are many different approaches as the design has evolved over time. There are two collation URI families recognized by Saxon: the UCA collations defined in XPath 3.1 (see the Functions and Operators spec), and the older Saxon collation URIs described here: http://www.saxonica.com/documentation/index.html#!extensibility/config-extend/collation/implementing-collation

There are also several ways of registering your own collation URIs, including Configuration.registerCollation(), Configuration.setCollationURIResolver(), and the <collation> element in the configuration file.

So to answer the question, how is Saxon doing the class instantiation, we need to know rather more about what interfaces you are using. But the likely answer is that it's simply doing Class.newInstance(). [If you really need to, you can register an overload of DynamicLoader with the Configuration, and override the method DynamicLoader.getInstance() to use a different instantiation method].

I think the approach I would recommend, given your description of what you are trying to do, is to instantiate the RuleBasedCollator yourself, wrap it in an instance of net.sf.saxon.expr.sort.SimpleCollation (which implements net.sf.saxon.lib.StringCollator), and register the collation URI with Configuration.registerCollation().

A collation registered as an instance of SimpleCollation probably can't be used in fn:contains() or other substring-matching functions, nor in fn:collation-key(). But it can be used for sorting, which seems to be your main use case, and for equality and ordering comparisons.

Michael Kay
Saxonica


Re: Need Guidance On Implementing Custom Collators For Saxon 9.1, 9.6+

Eliot Kimber-2
Because this is in the context of the DITA Open Toolkit I don't have the
ability to control the parser configuration, so I'm limited to either
using a collation URI or using a normal URI-accessed Java extension
function.

Here's what I'm currently trying to do:

<xsl:apply-templates select="word">
  <xsl:sort collation="http://saxon.sf.net/collation?class=org.ditacommunity.i18n.ZhCnAwareCollator"/>
</xsl:apply-templates>

In an XSLT 2 transform.

Where ZhCnAwareCollator implements java.util.Comparator and extends
java.text.Collator. The implementation is backed by an ICU
RuleBasedCollator so I can take advantage of the existing ICU usage in the
OT (basically reusing the code that constructs collator rules from
configuration files).

I assumed Saxon was using the java.text.Collator.getInstance(Locale
desiredLocale) method to get an instance as that's how you provide the
locale to use.

If it's just calling Class.newInstance() then how is it providing the
locale?


I also need to do something that works both with Saxon 9.1 and 9.6+. My
current tests have been with Saxon 9.6 as I'm running my initial tests
through oXygen just to keep things simple.


Thanks,

Eliot

Re: Need Guidance On Implementing Custom Collators For Saxon 9.1, 9.6+

Michael Kay
OK thanks.

If you provide a class name to a Saxon collation URI then it has to have a public zero-argument constructor. We don't set a Locale - that's up to you. I think in this scenario the best thing is to implement a simple Comparator which delegates (directly or indirectly) to a RuleBasedCollator.
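
A comparator of that shape might look like the sketch below. It is an illustration under several assumptions: it uses the JDK's java.text.RuleBasedCollator instead of the ICU4J one, the class name and the toy rule string are hypothetical, and the arguments are typed as Object and converted with toString() as a defensive guess about what the caller passes. The reflective helper only mimics instantiation through a no-argument constructor; it is not Saxon's code.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Comparator;

/**
 * Comparator with a public zero-argument constructor that delegates to a
 * RuleBasedCollator. Kept package-private so the sketch fits in one file;
 * a class loaded by name from a collation URI would be declared public.
 */
final class DelegatingComparator implements Comparator<Object> {
    // Hypothetical toy tailoring: "c" deliberately sorts before "b".
    private static final String RULES = "< a < c < b";

    private final RuleBasedCollator delegate;

    /** Zero-argument constructor: any locale or rule choice has to be made in here. */
    public DelegatingComparator() {
        try {
            delegate = new RuleBasedCollator(RULES);
        } catch (ParseException e) {
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    @Override
    public int compare(Object a, Object b) {
        return delegate.compare(a.toString(), b.toString());
    }

    /** Creates an instance reflectively through the no-argument constructor. */
    static Comparator<Object> newByReflection() {
        try {
            return DelegatingComparator.class.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Failed to instantiate comparator", e);
        }
    }

    public static void main(String[] args) {
        Comparator<Object> comparator = newByReflection();
        System.out.println(comparator.compare("c", "b") < 0); // prints true
    }
}
```

Because nothing is passed to the constructor, a language-specific variant would have to be a separate class or read its settings from somewhere else.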

Michael Kay
Saxonica

Re: Need Guidance On Implementing Custom Collators For Saxon 9.1, 9.6+

Eliot Kimber-2
Yes, having a zero-argument constructor results in instantiation success
(can I blame jet lag for fuzzy thinking--I've been doing this work while
returning to the US from Europe).

But if I'm understanding the documentation and what you say below, the
only way to provide the required locale information to the collator is
through a custom collation URI resolver, is that correct?

I guess what I was hoping for was a way that I could directly instantiate
a Collator with the xml:lang value of the context element without having
to do anything more than supply a collator URI that resolves to my
Collator implementation.

As the API for instantiating Collator takes a Locale this seemed like a
reasonable thing for Saxon to do. The locale is always known, either the
effective value of xml:lang, if specified somewhere in the context's
ancestry, or Locale.getDefault() if xml:lang is not set at all.
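
The fallback described here could be written along the following lines. This is a sketch, not something Saxon does on its own according to this thread; the class and method names are hypothetical, and it assumes the effective xml:lang value has already been looked up and is passed in as a string.

```java
import java.text.Collator;
import java.util.Locale;

final class XmlLangLocale {
    /** Maps an xml:lang value to a Locale, falling back to the JVM default when it is unset. */
    static Locale resolve(String xmlLang) {
        if (xmlLang == null || xmlLang.trim().isEmpty()) {
            return Locale.getDefault();
        }
        // xml:lang values are BCP 47 language tags, which forLanguageTag() parses.
        return Locale.forLanguageTag(xmlLang.trim());
    }

    /** Obtains a collator for the resolved locale through the standard factory method. */
    static Collator collatorFor(String xmlLang) {
        return Collator.getInstance(resolve(xmlLang));
    }

    public static void main(String[] args) {
        System.out.println(resolve("zh-CN").toLanguageTag()); // prints zh-CN
        System.out.println(resolve(null).equals(Locale.getDefault())); // prints true
    }
}
```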

I will work out what I need to do to configure a collation URI resolver in
the context of the Open Toolkit.

Cheers,

Eliot

--
Eliot Kimber
http://contrext.com
 






On 10/10/16, 3:34 AM, "Michael Kay" <[hidden email]> wrote:

>OK thanks.
>
>If you provide a class name to a Saxon collation URI then it has to have
>a public zero-argument constructor. We don't set a Locale - that's up to
>you. I think in this scenario the best thing is to implement a simple
>Comparator which delegates (directly or indirectly) to a
>RuleBasedCollator.
>
>Michael Kay
>Saxonica
>
>> On 9 Oct 2016, at 16:21, Eliot Kimber <[hidden email]> wrote:
>>
>> Because this is in the context of the DITA Open Toolkit I don't have the
>> ability to control the parser configuration, so I'm limited to either
>> using a collation URI or using a normal URI-accessed Java extension
>> function.
>>
>> Here's what I'm currently trying to do:
>>
>> <xsl:apply-templates select="word">
>>            <xsl:sort
>>
>>collation="http://saxon.sf.net/collation?class=org.ditacommunity.i18n.ZhC
>>nA
>> wareCollator"/>
>>          </xsl:apply-templates>
>>
>> In an XSLT 2 transform.
>>
>> Where ZhCnAwareCollator implements java.util.Comparator and extends
>> java.text.Collator. The implementation is backed by an ICU
>> RuleBasedCollator so I can take advantage of the existing ICU usage in
>> the OT (basically reusing the code that constructs collator rules from
>> configuration files).
>>
>> I assumed Saxon was using the java.text.Collator.getInstance(Locale
>> desiredLocale) method to get an instance as that's how you provide the
>> locale to use.
>>
>> If it's just calling Class.newInstance() then how is it providing the
>> locale?
>>
>>
>> I also need to do something that works both with Saxon 9.1 and 9.6+. My
>> current tests have been with Saxon 9.6 as I'm running my initial tests
>> through oXygen just to keep things simple.
>>
>>
>> Thanks,
>>
>> Eliot
>> --
>> Eliot Kimber
>> http://contrext.com
>>
>>
>>
>>
>>
>>
>>
>> On 10/8/16, 6:47 PM, "Michael Kay" <[hidden email]> wrote:
>>
>>> Saxon-EE supports UCA collation URIs as defined in the XPath 3.1 and
>>> XSLT 3.0 specs by using the collation support in the ICU library. I've
>>> been fairly immersed in that over the last couple of weeks, as it
>>> happens, but I'm struggling to remember exactly what's available if
>>> you're using plain Saxon-HE. We probably need to change the packaging at
>>> some stage because UCA collation URIs are a mandatory feature in XPath
>>> 3.1, though we may continue to support them in HE only using what's in
>>> the JDK as distinct from using ICU.
>>>
>>> I'd be inclined to avoid using collation keys unless you really need
>>> them. According to ICU documentation, a direct sort using a collation
>>> is supposed to be much more efficient.
>>>
>>> It's not clear from your posts what you are doing to register the
>>> collation with Saxon. There are many different approaches as the design
>>> has evolved over time. There are two collation URI families recognized
>>> by Saxon: the UCA collations defined in XPath 3.1 (see the Functions and
>>> Operators spec), and the older Saxon collation URIs described here:
>>>
>>> http://www.saxonica.com/documentation/index.html#!extensibility/config-extend/collation/implementing-collation
>>>
>>> There are also several ways of registering your own collation URIs,
>>> including Configuration.registerCollation(),
>>> Configuration.setCollationURIResolver(), and the <collation> element in
>>> the configuration file.
>>>
>>> So to answer the question, how is Saxon doing the class instantiation,
>>> we need to know rather more about what interfaces you are using. But the
>>> likely answer is that it's simply doing Class.newInstance(). [If you
>>> really need to, you can register an overload of DynamicLoader with the
>>> Configuration, and override the method DynamicLoader.getInstance() to
>>> use a different instantiation method]. I think the approach I would
>>> recommend, given your description of what you are trying to do, is to
>>> instantiate the RuleBasedCollator yourself, wrap it in an instance of
>>> net.sf.saxon.expr.sort.SimpleCollation (which implements
>>> net.sf.saxon.lib.StringCollator), and register the collation URI with
>>> Configuration.registerCollation().
>>>
>>> A collation registered as an instance of SimpleCollation probably can't
>>> be used in fn:contains() or other substring-matching functions, nor in
>>> fn:collation-key(). But it can be used for sorting, which seems to be
>>> your main use case, and for equality and ordering comparisons.
>>>
>>> Michael Kay
>>> Saxonica
>>>
>>>
>>>> On 8 Oct 2016, at 22:03, Eliot Kimber <[hidden email]> wrote:
>>>>
>>>> I have created a simple extension of the ICU RuleBasedCollator and my
>>>> local unit test verifies that it can be used as a Comparator and for
>>>> getting sort keys.
>>>>
>>>> However, when I try to use it with Saxon 9.6 I get "Failed to
>>>> instantiate class org.ditacommunity.i18n.ZhCnAwareCollator".
>>>>
>>>> So I must be failing to implement the expected instantiation method
>>>> but I can't figure out what that might be.
>>>>
>>>> Here is my passing unit test for ZhCnAwareCollator:
>>>>
>>>> Collator collator = ZhCnAwareCollator.getInstance(Locale.CHINESE);
>>>> // Pass the object itself; "collator != null" boxes to a non-null Boolean
>>>> assertNotNull("No comparator", collator);
>>>> // compare() only guarantees the sign of the result, not exactly -1
>>>> int result = collator.compare("a", "b");
>>>> assertTrue("Compared incorrectly", result < 0);
>>>> CollationKey sortKey = collator.getCollationKey("aaa");
>>>> CollationKey sortKeyC = collator.getCollationKey("c");
>>>> assertNotNull(sortKey);
>>>> result = sortKey.compareTo(sortKeyC);
>>>> assertTrue("Wrong compare result", result < 0);
>>>>
>>>>
>>>> How is Saxon doing the class instantiation? I know it's loading the
>>>> class because the load failed when I didn't have the ICU4J library in
>>>> the class path (my collator is backed by an ICU4J collator).
>>>>
>>>> Thanks,
>>>>
>>>> Eliot
>>>>
>>>> --
>>>> Eliot Kimber
>>>> http://contrext.com

Re: Need Guidance On Implementing Custom Collators For Saxon 9.1, 9.6+

Michael Kay
The trouble is that when you use the class=xxxx parameter in a Saxon collation URI, Saxon doesn't know what kind of object you are instantiating until it has been instantiated. I guess we could try and be smart by loading the class, then seeing whether it's a subclass of java.text.Collator, and if it is, instantiating it using the static factory method. But we don't currently do that. I think there's also a principle at stake, which is that collation URIs are not context-sensitive.

(F&O section 5.3.1: "Note that some specifications use the term collation to refer to an algorithm that can be parameterized, but in this specification, each possible parameterization is considered to be a distinct collation.")
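
As a rough illustration of the "each parameterization is a distinct collation" principle, here is a minimal JDK-only sketch in which the locale travels in the collation URI itself rather than coming from the context. It is not Saxon's resolver API: the class LangCollationResolver, its methods, and the example.org URI are all hypothetical, and the JDK Collator stands in for an ICU one.

```java
import java.text.Collator;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LangCollationResolver {

    // One collator per distinct URI: the URI, parameters included, is the identity
    private final Map<String, Collator> cache = new ConcurrentHashMap<>();

    /** Returns the value of the "lang" query parameter, or null if there is none. */
    static String langParam(String uri) {
        int q = uri.indexOf('?');
        if (q < 0) {
            return null;
        }
        for (String pair : uri.substring(q + 1).split("[&;]")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("lang")) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    /** Resolves a collation URI to a collator using only information in the URI. */
    Collator resolve(String uri) {
        return cache.computeIfAbsent(uri, u -> {
            String lang = langParam(u);
            Locale locale = (lang == null) ? Locale.getDefault() : Locale.forLanguageTag(lang);
            return Collator.getInstance(locale);
        });
    }

    public static void main(String[] args) {
        LangCollationResolver r = new LangCollationResolver();
        Collator zh = r.resolve("http://example.org/collation?lang=zh-CN");
        Collator de = r.resolve("http://example.org/collation?lang=de");
        // The same URI always yields the same collator; a different parameter is a different collation
        System.out.println(zh == r.resolve("http://example.org/collation?lang=zh-CN"));
        System.out.println(zh == de);
    }
}
```

With a scheme like this, a stylesheet that wants locale-specific sorting has to build the URI from xml:lang itself, which keeps the collation independent of where it is used.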

Michael Kay
Saxonica


Re: Need Guidance On Implementing Custom Collators For Saxon 9.1, 9.6+

Eliot Kimber-2
But the documentation says the implementation has to be a
java.text.Collator, which is why I thought (hoped?) it would use the
Collator API to instantiate it.

But I see that delegating that decision to the URI resolver makes sense.

The alternative would be to have a separate class for each possible locale
but that's clearly not a realistic solution.
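
For comparison, the one-class-per-locale alternative, following the zero-argument-constructor advice earlier in the thread, might look like this minimal sketch. It is an assumption-laden illustration: ZhComparator is a hypothetical name, the JDK Collator stands in for the ICU RuleBasedCollator, and the reflective call only mimics what Class.newInstance() would do.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

public class ZeroArgCollatorDemo {

    /** Hypothetical comparator with its locale fixed in the constructor. */
    public static class ZhComparator implements Comparator<CharSequence> {
        private final Collator delegate;

        // Public zero-argument constructor: the locale is chosen here, not by the caller
        public ZhComparator() {
            delegate = Collator.getInstance(Locale.SIMPLIFIED_CHINESE);
        }

        @Override
        public int compare(CharSequence a, CharSequence b) {
            return delegate.compare(a.toString(), b.toString());
        }
    }

    public static void main(String[] args) throws Exception {
        // Instantiate reflectively, with no arguments and no Locale supplied
        Comparator<CharSequence> c = ZhComparator.class.getDeclaredConstructor().newInstance();
        String[] words = {"b", "c", "a"};
        Arrays.sort(words, c);
        System.out.println(Arrays.toString(words));
    }
}
```

Each additional locale would need its own such class, which is why this does not scale.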

Cheers,

E.

--
Eliot Kimber
http://contrext.com
 






>>>>--
>>>> -----
>>>> Check out the vibrant tech community on one of the world's most
>>>> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>>>> _______________________________________________
>>>> saxon-help mailing list archived at http://saxon.markmail.org/
>>>> [hidden email]
>>>> https://lists.sourceforge.net/lists/listinfo/saxon-help
>>>
>>>
>>>
>>>
>>>------------------------------------------------------------------------
>>>--
>>> ----
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>>> _______________________________________________
>>> saxon-help mailing list archived at http://saxon.markmail.org/
>>> [hidden email]
>>> https://lists.sourceforge.net/lists/listinfo/saxon-help
>>>
>>
>>
>>
>>
>>-------------------------------------------------------------------------
>>-----
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>> _______________________________________________
>> saxon-help mailing list archived at http://saxon.markmail.org/
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/saxon-help
>
>
>
>--------------------------------------------------------------------------
>----
>Check out the vibrant tech community on one of the world's most
>engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>_______________________________________________
>saxon-help mailing list archived at http://saxon.markmail.org/
>[hidden email]
>https://lists.sourceforge.net/lists/listinfo/saxon-help
>




Re: Need Guidance On Implementing Custom Collators For Saxon 9.1, 9.6+

Eliot Kimber-2
In reply to this post by Michael Kay
Is it possible to set Saxon features (as with TransformerFactory.setFeature())
using system properties? I've looked through various bits of documentation but
haven't found anything that looks like it should work, so I suspect
this is not provided for. But maybe I missed the obvious?

Thanks,

Eliot

--
Eliot Kimber
http://contrext.com
 






On 10/10/16, 5:43 AM, "Michael Kay" <[hidden email]> wrote:

>The trouble is that when you use the class=xxxx parameter in a Saxon
>collation URI, Saxon doesn't know what kind of object you are
>instantiating until it has been instantiated. I guess we could try and be
>smart by loading the class, then seeing whether it's a subclass of
>java.text.Collator, and if it is, instantiating it using the static
>factory method. But we don't currently do that. I think there's also a
>principle at stake, which is that collation URIs are not
>context-sensitive.
>
>(F&O section 5.3.1: "Note that some specifications use the term collation
>to refer to an algorithm that can be parameterized, but in this
>specification, each possible parameterization is considered to be a
>distinct collation.")
>
>Michael Kay
>Saxonica
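
The constraint described in the quoted message can be sketched in plain JDK terms. This is an illustration under stated assumptions, not Saxon's actual loading code: FixedLocaleComparator and ReflectiveLoader are hypothetical names, the JDK Collator stands in for an ICU RuleBasedCollator, and the reflective call only mimics the Class.newInstance()-style instantiation discussed earlier in this thread. Because a no-argument constructor cannot receive a Locale, the class fixes its locale internally, so each locale would need its own class (or its own collation URI), which matches the F&O remark that each parameterization is a distinct collation.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

/**
 * Hypothetical comparator with a public no-argument constructor.
 * The locale is fixed inside the class because a loader that only
 * calls the no-argument constructor has no way to pass one in.
 * (Declare the class public in its own file if it is to be loaded by name
 * from another package.)
 */
class FixedLocaleComparator implements Comparator<String> {
    private final Collator delegate;

    public FixedLocaleComparator() {
        // Assumption: Simplified Chinese; a real implementation might build
        // an ICU RuleBasedCollator here instead of the JDK collator.
        this.delegate = Collator.getInstance(Locale.SIMPLIFIED_CHINESE);
    }

    @Override
    public int compare(String a, String b) {
        // All comparison work is delegated to the wrapped collator.
        return delegate.compare(a, b);
    }
}

/** Hypothetical stand-in for a loader that knows only a class name. */
final class ReflectiveLoader {
    private ReflectiveLoader() { }

    static Object instantiate(String className) {
        try {
            // Only the no-argument constructor is used, so no Locale is supplied.
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Failed to instantiate class " + className, e);
        }
    }
}
```

A class of this shape could then be named in the class= parameter of a Saxon collation URI; whether a plain Comparator is sufficient for a given Saxon release is something to confirm against that release's documentation.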
>




Re: Need Guidance On Implementing Custom Collators For Saxon 9.1, 9.6+

Eliot Kimber-2
I think the easier thing is just to implement my own TransformerFactory
that can then set whatever features I need. That simplifies configuration.
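
A minimal sketch of that idea, under stated assumptions: PreconfiguredTransformerFactory is a hypothetical name; it wraps the JDK's built-in factory (obtained with TransformerFactory.newDefaultInstance(), available from Java 9) only so that the example is self-contained, whereas a real version would presumably wrap or extend Saxon's own factory class; and the feature set in the constructor is the standard JAXP secure-processing feature, used here as a placeholder for whichever Saxon features are actually needed.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;

/**
 * Hypothetical factory that configures its delegate once, at construction.
 * (Declare it public in its own file when registering it by class name.)
 */
class PreconfiguredTransformerFactory extends TransformerFactory {
    private final TransformerFactory delegate;

    public PreconfiguredTransformerFactory() {
        // Assumption: the JDK default factory stands in for Saxon's factory.
        delegate = TransformerFactory.newDefaultInstance();
        try {
            // Placeholder feature; replace with the features actually required.
            delegate.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("Could not configure factory", e);
        }
    }

    // Everything below simply forwards to the preconfigured delegate.
    @Override
    public Transformer newTransformer(Source source) throws TransformerConfigurationException {
        return delegate.newTransformer(source);
    }

    @Override
    public Transformer newTransformer() throws TransformerConfigurationException {
        return delegate.newTransformer();
    }

    @Override
    public Templates newTemplates(Source source) throws TransformerConfigurationException {
        return delegate.newTemplates(source);
    }

    @Override
    public Source getAssociatedStylesheet(Source source, String media, String title, String charset)
            throws TransformerConfigurationException {
        return delegate.getAssociatedStylesheet(source, media, title, charset);
    }

    @Override
    public void setURIResolver(URIResolver resolver) { delegate.setURIResolver(resolver); }

    @Override
    public URIResolver getURIResolver() { return delegate.getURIResolver(); }

    @Override
    public void setFeature(String name, boolean value) throws TransformerConfigurationException {
        delegate.setFeature(name, value);
    }

    @Override
    public boolean getFeature(String name) { return delegate.getFeature(name); }

    @Override
    public void setAttribute(String name, Object value) { delegate.setAttribute(name, value); }

    @Override
    public Object getAttribute(String name) { return delegate.getAttribute(name); }

    @Override
    public void setErrorListener(ErrorListener listener) { delegate.setErrorListener(listener); }

    @Override
    public ErrorListener getErrorListener() { return delegate.getErrorListener(); }
}
```

With the standard JAXP lookup, such a class can be selected by setting the system property javax.xml.transform.TransformerFactory to its fully qualified name, so the only external configuration is that one property. How that property is best supplied inside the Open Toolkit build is not covered by this sketch.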

Cheers,

E.
--
Eliot Kimber
http://contrext.com
 









------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Need Guidance On Implementing Custom Collators For Saxon 9.1, 9.6+

Eliot Kimber-2
I have successfully implemented a custom transformer factory that returns
an appropriately configured Saxon TransformerFactory with my custom
CollationURIResolver, which can then provide my custom Collator to xsl:sort
when it specifies my magic URI as the collation URI. Whew. I've verified
that this works with the Open Toolkit simply by changing the
javax.xml.transform.TransformerFactory system property, e.g., in ANT_OPTS.
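
For anyone following along, the collator at the end of that chain might look
roughly like the sketch below. It is only an illustration under stated
assumptions: the class name PinyinAwareComparator and the three-entry
dictionary are hypothetical stand-ins (the real implementation is backed by
ICU4J and CEDICT), and the JDK's java.text.Collator is used so the example
runs without extra libraries. It shows a Comparator with a public
zero-argument constructor that compares words by their pinyin transliteration.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical dictionary-backed comparator; not the actual ZhCnAwareCollator. */
final class PinyinAwareComparator implements Comparator<String> {

    // Tiny stand-in for a CEDICT lookup (assumption): headword -> pinyin.
    private static final Map<String, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put("\u5317\u4eac", "bei jing");   // Beijing
        PINYIN.put("\u4e0a\u6d77", "shang hai");  // Shanghai
        PINYIN.put("\u5e7f\u5dde", "guang zhou"); // Guangzhou
    }

    private final Collator delegate;

    // Public zero-argument constructor so the class can be created reflectively.
    // Nothing is passed in, so the locale has to be fixed here.
    public PinyinAwareComparator() {
        delegate = Collator.getInstance(Locale.ENGLISH);
        delegate.setStrength(Collator.PRIMARY); // ignore case and accent differences
    }

    @Override
    public int compare(String a, String b) {
        return delegate.compare(transliterate(a), transliterate(b));
    }

    // Words missing from the dictionary are compared as themselves.
    static String transliterate(String word) {
        return PINYIN.getOrDefault(word, word);
    }
}
```

Sorting the three sample words with this comparator orders them by pinyin
(Beijing, Guangzhou, Shanghai) rather than by code point. For real use the
class would be declared public in its own source file.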

It's a little indirect, but the result requires only one configuration
change to the normal OT setup (the libraries can be provided through a
plugin that extends the OT's classpath automatically).
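
The single configuration change is the standard JAXP factory lookup. A
factory of that general shape can be sketched with the JDK alone, assuming a
hypothetical class name PreconfiguredTransformerFactory and using the
secure-processing feature as a stand-in for the real setup step. The actual
factory configures Saxon and installs the CollationURIResolver instead, which
is not shown here.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;

/** Hypothetical pre-configured factory: every call is forwarded to a delegate. */
class PreconfiguredTransformerFactory extends TransformerFactory {

    private final TransformerFactory delegate;

    // JAXP creates the factory reflectively, so a public no-arg constructor is
    // needed (and, in real use, a public class in its own source file).
    public PreconfiguredTransformerFactory() {
        // newDefaultInstance() (Java 9+) ignores the system property, so it
        // cannot loop back to this class when the property names it.
        delegate = TransformerFactory.newDefaultInstance();
        try {
            // Stand-in for the real setup, e.g. installing a collation resolver.
            delegate.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("Could not configure delegate factory", e);
        }
    }

    @Override public Transformer newTransformer(Source source)
            throws TransformerConfigurationException {
        return delegate.newTransformer(source);
    }
    @Override public Transformer newTransformer()
            throws TransformerConfigurationException {
        return delegate.newTransformer();
    }
    @Override public Templates newTemplates(Source source)
            throws TransformerConfigurationException {
        return delegate.newTemplates(source);
    }
    @Override public Source getAssociatedStylesheet(Source source, String media,
            String title, String charset) throws TransformerConfigurationException {
        return delegate.getAssociatedStylesheet(source, media, title, charset);
    }
    @Override public void setURIResolver(URIResolver resolver) {
        delegate.setURIResolver(resolver);
    }
    @Override public URIResolver getURIResolver() {
        return delegate.getURIResolver();
    }
    @Override public void setFeature(String name, boolean value)
            throws TransformerConfigurationException {
        delegate.setFeature(name, value);
    }
    @Override public boolean getFeature(String name) {
        return delegate.getFeature(name);
    }
    @Override public void setAttribute(String name, Object value) {
        delegate.setAttribute(name, value);
    }
    @Override public Object getAttribute(String name) {
        return delegate.getAttribute(name);
    }
    @Override public void setErrorListener(ErrorListener listener) {
        delegate.setErrorListener(listener);
    }
    @Override public ErrorListener getErrorListener() {
        return delegate.getErrorListener();
    }
}
```

Once such a class is public and on the classpath, it would be selected with
-Djavax.xml.transform.TransformerFactory=<fully qualified class name>, which
is the kind of setting that can go in ANT_OPTS.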

Cheers,

E.

--
Eliot Kimber
http://contrext.com
 










Re: Need Guidance On Implementing Custom Collators For Saxon 9.1, 9.6+

Michael Kay
In reply to this post by Eliot Kimber-2

> On 10 Oct 2016, at 18:59, Eliot Kimber <[hidden email]> wrote:
>
> Is it possible to set Saxon features (as for TransformerFactory.setFeature())
> using system properties? I'm looking at various bits of documentation but
> I'm not seeing anything that looks like it should work, so I'm thinking
> this is not provided for. But maybe I missed the obvious?
>

No, that's not possible.

Features can be set from the command line, though, using --feature:value.

Michael Kay
Saxonica




Re: Need Guidance On Implementing Custom Collators For Saxon 9.1, 9.6+

Michael Kay
In reply to this post by Eliot Kimber-2
Yes, that's often a good way of doing it. I've done the same thing with Ant in the past.

Michael Kay
Saxonica

> On 10 Oct 2016, at 21:18, Eliot Kimber <[hidden email]> wrote:
>
> I have successfully implemented a custom transformer factory that returns
> an appropriately configured Saxon TransformerFactory with my custom
> CollationURIResolver, which can then provide my custom Collator to xsl:sort
> when it specifies my magic URI as the collation URI. Whew. I've verified
> that this works with the Open Toolkit simply by changing the
> javax.xml.transform.TransformerFactory system property, e.g., in ANT_OPTS.
>
> A little bit indirect but the result requires only one configuration
> change to the normal OT setup (the libraries can be provided through a
> plugin that extends the OT's classpath automatically).
>
> Cheers,
>
> E.
>
> --
> Eliot Kimber
> http://contrext.com
>
>
>
>
>
>
>
> On 10/10/16, 2:10 PM, "Eliot Kimber" <[hidden email]> wrote:
>
>> I think the easier thing is just to implement my own TransformerFactory
>> that can then set whatever features I need. That simplifies configuration.
>>
>> Cheers,
>>
>> E.
>> --
>> Eliot Kimber
>> http://contrext.com
>>
>>
>>
>>
>>
>>
>>
>> On 10/10/16, 1:59 PM, "Eliot Kimber" <[hidden email]> wrote:
>>
>>> Is it possible to set Saxon features (as for
>>> TransformFactor.setFeature())
>>> using system properties? I'm looking at various bits of documentation but
>>> I'm not seeing anything that looks like it should work, so I'm thinking
>>> this is not provided for. But maybe I missed the obvious?
>>>
>>> Thanks,
>>>
>>> Eliot
>>>
>>> --
>>> Eliot Kimber
>>> http://contrext.com
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 10/10/16, 5:43 AM, "Michael Kay" <[hidden email]> wrote:
>>>
>>>> The trouble is that when you use the class=xxxx parameter in a Saxon
>>>> collation URI, Saxon doesn't know what kind of object you are
>>>> instantiating until it has been instantiated. I guess we could try and
>>>> be
>>>> smart by loading the class, then seeing whether it's a subclass of
>>>> java.text.Collator, and if it is, instantiating it using the static
>>>> factory method. But we don't currently do that. I think there's also a
>>>> principle at stake, which is that collation URIs are not
>>>> context-sensitive.
>>>>
>>>> (F&O section 5.3.1: "Note that some specifications use the term
>>>> collation
>>>> to refer to an algorithm that can be parameterized, but in this
>>>> specification, each possible parameterization is considered to be a
>>>> distinct collation.")
>>>>
>>>> Michael Kay
>>>> Saxonica
>>>>
>>>>> On 10 Oct 2016, at 08:29, Eliot Kimber <[hidden email]> wrote:
>>>>>
>>>>> Yes, having a zero-argument constructor results in instantiation
>>>>> success
>>>>> (can I blame jet lag for fuzzy thinking--I've been doing this work
>>>>> while
>>>>> returning to the US from Europe).
>>>>>
>>>>> But if I'm understanding the documentation and what you say below, the
>>>>> only way to provide the required locale information to the the
>>>>> collator
>>>>> is
>>>>> through a custom collation URI resolver, is that correct?
>>>>>
>>>>> I guess what I was hoping for was a way that I could directly
>>>>> instantiate
>>>>> a Collator with the xml:lang value of the context element without
>>>>> having
>>>>> to do anything more than supply a collator URI that resolves to my
>>>>> Collator implementation.
>>>>>
>>>>> As the API for instantiating Collator takes a Locale this seemed like
>>>>> a
>>>>> reasonable thing for Saxon to do. The locale is always known, either
>>>>> the
>>>>> effective value of xml:lang, if specified somewhere in the context's
>>>>> ancestry, or Locale.getDefault() if xml:lang is not set at all.
>>>>>
>>>>> I will work out what I need to do to configure a collation URI
>>>>> resolver
>>>>> in
>>>>> the context of the Open Toolkit.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Eliot
>>>>>
>>>>> --
>>>>> Eliot Kimber
>>>>> http://contrext.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 10/10/16, 3:34 AM, "Michael Kay" <[hidden email]> wrote:
>>>>>
>>>>>> OK thanks.
>>>>>>
>>>>>> If you provide a class name to a Saxon collation URI then it has to
>>>>>> have
>>>>>> a public zero-argument constructor. We don't set a Locale - that's up
>>>>>> to
>>>>>> you. I think in this scenario the best thing is to implement a simple
>>>>>> Comparator which delegates (directly or indirectly) to a
>>>>>> RuleBasedCollator.
>>>>>>
>>>>>> Michael Kay
>>>>>> Saxonica
>>>>>>
>>>>>>> On 9 Oct 2016, at 16:21, Eliot Kimber <[hidden email]> wrote:
>>>>>>>
>>>>>>> Because this is in the context of the DITA Open Toolkit I don't have
>>>>>>> the
>>>>>>> ability to control the parser configuration, so I'm limited to
>>>>>>> either
>>>>>>> using a collation URI or using a normal URI-accessed Java extension
>>>>>>> function.
>>>>>>>
>>>>>>> Here's what I'm currently trying to do:
>>>>>>>
>>>>>>> <xsl:apply-templates select="word">
>>>>>>>          <xsl:sort
>>>>>>>
>>>>>>>
>>>>>>> collation="http://saxon.sf.net/collation?class=org.ditacommunity.i18n
>>>>>>> .
>>>>>>> Z
>>>>>>> hC
>>>>>>> nA
>>>>>>> wareCollator"/>
>>>>>>>        </xsl:apply-templates>
>>>>>>>
>>>>>>> In an XSLT 2 transform.
>>>>>>>
>>>>>>> Where ZhCnAwareCollator implements java.util.Comparator and extends
>>>>>>> java.text.Collator. The implementation is backed by an ICU
>>>>>>> RuleBasedCollator so I can take advantage of the existing ICU usage
>>>>>>> in
>>>>>>> the
>>>>>>> OT (basically reusing the code the constructs collator rules from
>>>>>>> configuration files).
>>>>>>>
>>>>>>> I assumed Saxon was using the java.text.Collator.getInstance(Locale
>>>>>>> desiredLocale) method to get an instance as that's how you provide
>>>>>>> the
>>>>>>> locale to use.
>>>>>>>
>>>>>>> If it's just calling Class.newInstance() then how it is it providing
>>>>>>> the
>>>>>>> locale?
>>>>>>>
>>>>>>>
>>>>>>> I also need to do something that works both with Saxon 9.1 and 9.6+.
>>>>>>> My
>>>>>>> current tests have been with Saxon 9.6 as I'm running my initial
>>>>>>> tests
>>>>>>> through oXygen just to keep things simple.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Eliot
>>>>>>> --
>>>>>>> Eliot Kimber
>>>>>>> http://contrext.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 10/8/16, 6:47 PM, "Michael Kay" <[hidden email]> wrote:
>>>>>>>
>>>>>>>> Saxon-EE supports UCA collation URIs as defined in the XPath 3.1
>>>>>>>> and
>>>>>>>> XSLT
>>>>>>>> 3.0 specs by using the collation support in the ICU library. I've
>>>>>>>> been
>>>>>>>> fairly immersed in that over the last couple of weeks, as it
>>>>>>>> happens,
>>>>>>>> but
>>>>>>>> I'm struggling to remember exactly what's available if you're using
>>>>>>>> plain
>>>>>>>> Saxon-HE. We probably need to change the packaging at some stage
>>>>>>>> because
>>>>>>>> UCA collation URIs are a mandatory feature in XPath 3.1, though we
>>>>>>>> may
>>>>>>>> continue to support them in HE only using what's in the JDK as
>>>>>>>> distinct
>>>>>>>> from using ICU.
>>>>>>>>
>>>>>>>> I'd be inclined to avoid using collation keys unless you really
>>>>>>>> need
>>>>>>>> them. According to ICU documentation, a direct sort using a
>>>>>>>> collation
>>>>>>>> is
>>>>>>>> supposed to be much more efficient.
>>>>>>>>
>>>>>>>> It's not clear from your posts what you are doing to register the
>>>>>>>> collation with Saxon. There are many different approaches as the
>>>>>>>> design
>>>>>>>> has evolved over time. There are two collation URI families
>>>>>>>> recognized
>>>>>>>> by
>>>>>>>> Saxon: the UCA collations defined in XPath 3.1 (see the Functions
>>>>>>>> and
>>>>>>>> Operators spec), and the older Saxon collation URIs described here:
>>>>>>>>
>>>>>>>>
>>>>>>>> http://www.saxonica.com/documentation/index.html#!extensibility/conf
>>>>>>>> i
>>>>>>>> g
>>>>>>>> -e
>>>>>>>> xt
>>>>>>>> end/collation/implementing-collation
>>>>>>>>
>>>>>>>> There are also several ways of registering your own collation URIs,
>>>>>>>> including Configuration.registerCollation(),
>>>>>>>> Configuration.setCollationURIResolver(), and the <collation>
>>>>>>>> element
>>>>>>>> in
>>>>>>>> the configuration file.
>>>>>>>>
>>>>>>>> So to answer the question, how is Saxon doing the class
>>>>>>>> instantiation,
>>>>>>>> we
>>>>>>>> need to know rather more about what interfaces you are using. But
>>>>>>>> the
>>>>>>>> likely answer is that it's simply doing Class.newInstance(). [If
>>>>>>>> you
>>>>>>>> really need to, you can register an overload of DynamicLoader with
>>>>>>>> the
>>>>>>>> Configuration, and override the method DynamicLoader.getInstance()
>>>>>>>> to
>>>>>>>> use
>>>>>>>> a different instantiation method]. I think the approach I would
>>>>>>>> recommend, given your description of what you are trying to do, is
>>>>>>>> to
>>>>>>>> instantiate the RuleBasedCollator yourself, wrap it in an instance
>>>>>>>> of
>>>>>>>> net.sf.saxon.expr.sort.SimpleCollation (which implements
>>>>>>>> net.sf.saxon.lib.StringCollator), and register the collation URI
>>>>>>>> with
>>>>>>>> Configuration.registerCollation().
>>>>>>>>
>>>>>>>> A collation registered as an instance of SimpleCollation probably
>>>>>>>> can't
>>>>>>>> be used in fn:contains() or other substring-matching functions, nor
>>>>>>>> in
>>>>>>>> fn:collation-key(). But it can be used for sorting, which seems to
>>>>>>>> be
>>>>>>>> your main use case, and for equality and ordering comparisons.
>>>>>>>>
>>>>>>>> Michael Kay
>>>>>>>> Saxonica
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 8 Oct 2016, at 22:03, Eliot Kimber <[hidden email]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I have created a simple extension of the ICU RuleBasedCollator and
>>>>>>>>> my
>>>>>>>>> local unit test verifies that it can be used as a Comparator and
>>>>>>>>> for
>>>>>>>>> getting sort keys.
>>>>>>>>>
>>>>>>>>> However, when I try to use it with Saxon 9.6I get "Failed to
>>>>>>>>> instantiate
>>>>>>>>> class org.ditacommunity.i18n.ZhCnAwareCollator".
>>>>>>>>>
>>>>>>>>> So I must be failing to implement the expected instantiation
>>>>>>>>> method
>>>>>>>>> but
>>>>>>>>> I
>>>>>>>>> can't figure out what that might be.
>>>>>>>>>
>>>>>>>>> Here is my passing unit test for ZhCnAwareCollator:
>>>>>>>>>
>>>>>>>>> Collator collator = ZhCnAwareCollator.getInstance(Locale.CHINESE);
>>>>>>>>> assertNotNull("No comparator", collator != null);
>>>>>>>>> int result;
>>>>>>>>> result = collator.compare("a", "b");
>>>>>>>>> assertTrue("Compared incorrectly", result == -1);
>>>>>>>>> CollationKey sortKey = collator.getCollationKey("aaa");
>>>>>>>>> CollationKey sortKeyC = collator.getCollationKey("c");
>>>>>>>>> assertNotNull(sortKey);
>>>>>>>>> result = sortKey.compareTo(sortKeyC);
>>>>>>>>> assertEquals("Wrong compare result", result, -1);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> How is Saxon doing the class instantiation? I know it's loading
>>>>>>>>> the
>>>>>>>>> class
>>>>>>>>> because the load failed when I didn't have the ICU4J library in
>>>>>>>>> the
>>>>>>>>> class
>>>>>>>>> path (my collator is backed by an ICU4J collator).
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Eliot
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Eliot Kimber
>>>>>>>>> http://contrext.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/8/16, 2:19 AM, "Eliot Kimber" <[hidden email]> wrote:
>>>>>>>>>
>>>>>>>>>> I'm starting the process of implementing localized grouping and
>>>>>>>>>> sorting in the context of the DITA Open Toolkit, including
>>>>>>>>>> implementation of a dictionary-based collator for Simplified
>>>>>>>>>> Chinese (using the open-source CEDICT dictionary).
>>>>>>>>>>
>>>>>>>>>> I want to make sure that I'm taking the most appropriate
>>>>>>>>>> approach--it's been more than a decade since I last implemented
>>>>>>>>>> customized collation features for Saxon (that was back in the
>>>>>>>>>> Saxon 6 days).
>>>>>>>>>>
>>>>>>>>>> I think there are two basic approaches I could take:
>>>>>>>>>>
>>>>>>>>>> 1. Implement a custom collator as a RuleBasedCollator and then
>>>>>>>>>> use that with Saxon through a collation URI specified on xsl:sort
>>>>>>>>>> and similar.
>>>>>>>>>>
>>>>>>>>>> 2. Implement a custom extension function that returns sort keys
>>>>>>>>>> that will then collate correctly using the default Unicode
>>>>>>>>>> collator (e.g., for most languages the sort key would just return
>>>>>>>>>> the input string but for Simplified Chinese, in particular, would
>>>>>>>>>> return the pinyin transliteration as found in the dictionary).
>>>>>>>>>>
>>>>>>>>>> I think my best course of action is to implement a custom
>>>>>>>>>> collator in Java and then use the Saxon 9.1 form of custom
>>>>>>>>>> collator URI.
>>>>>>>>>>
>>>>>>>>>> Is my analysis correct? Is there some other option I've
>>>>>>>>>> overlooked?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Eliot
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Eliot Kimber
>>>>>>>>>> http://contrext.com


