Syntax of 'select' within 'collection'


Kerry, Richard

 

Please can someone clarify the syntax for 'select' within a 'collection' in Saxon.

I am looking at http://www.saxonica.com/documentation/index.html#!sourcedocs/collections

 

It doesn't clearly say that the parameter value supplied *is* a regular expression, although it implies that is how it is handled.  It goes straight into saying that the pattern *can* use 'glob' syntax, and states how '.', '*' and '?' are converted to RE elements, which leads me to assume that the value is fundamentally treated as a regex.

What would happen if I provided '.*', '\.' or '.?' - can it detect that the parameter is already a regex and skip the glob-to-regex conversion?

 

It goes on to say that "special characters used in the URL may need to be escaped".  Does that mean that if one wishes to use such characters as literals they need to be escaped?  Or does it mean that any usage at all (i.e. as RE special characters) needs them to be escaped?

 

It refers to "Java regular expression rules".  I am not a Java programmer so I don't know where to find these rules documented (there are various different regex systems around so I'd like to be sure I'm using the right syntax).  Could a link please be added pointing to the correct regex spec?

 

 

What I have at the moment is

<x:apply-templates select="collection( 'file:///Temp?select=al_acq_[0-9]{4}.cfg.xml' )" />

or

<x:apply-templates select="collection( 'file:///Temp?select=al_acq_\d{4}.cfg.xml' )" />

 

but when run that gives

Error at char 12 in x:apply-templates/@select on line nn column nn of process.xsl:
  FODC0004: Illegal character in query at index 32: file:///Temp?select=al_acq_[0-9]{4}.cfg.xml

or

Error at char 12 in x:apply-templates/@select on line nn column nn of process.xsl:
  FODC0004: Illegal character in query at index 27: file:///Temp?select=al_acq_\d{4}.cfg.xml

 

I hope someone can clarify this for me, and perhaps update the documentation page in due course.

 

Regards,

Richard.

 

 

 

PS. Apologies if this message comes through twice.  I sent it last Friday but haven’t received it back and it hasn’t appeared in the archives so may have got lost in the ether somewhere.

 

Richard Kerry

BNCS Engineer, SI SOL Telco & Media Vertical Practice

 

T: +44 (0)20 3618 2669

M: +44 (0)7812 325518

Lync: +44 (0) 20 3618 0778

Room G300, Stadium House, Wood Lane, London, W12 7TA

[hidden email]

 

 

 

 

Atos, Atos Consulting, Worldline and Canopy The Open Cloud Company are trading names used by the Atos group. The following trading entities are registered in England and Wales: Atos IT Services UK Limited (registered number 01245534), Atos Consulting Limited (registered number 04312380), Atos Worldline UK Limited (registered number 08514184) and Canopy The Open Cloud Company Limited (registration number 08011902). The registered office for each is at 4 Triton Square, Regent’s Place, London, NW1 3HG.The VAT No. for each is: GB232327983.

This e-mail and the documents attached are confidential and intended solely for the addressee, and may contain confidential or privileged information. If you receive this e-mail in error, you are not authorised to copy, disclose, use or retain it. Please notify the sender immediately and delete this email from your systems. As emails may be intercepted, amended or lost, they are not secure. Atos therefore can accept no liability for any errors or their content. Although Atos endeavours to maintain a virus-free network, we do not warrant that this transmission is virus-free and can accept no liability for any damages resulting from any virus transmitted. The risks are deemed to be accepted by everyone who communicates with Atos by email.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 

Re: Syntax of 'select' within 'collection'

Michael Kay

On 21 Mar 2017, at 16:19, Kerry, Richard <[hidden email]> wrote:

 

>Please can someone clarify the syntax for 'select' within a 'collection' in Saxon.
>
>I am looking at http://www.saxonica.com/documentation/index.html#!sourcedocs/collections
>
>It doesn't clearly say that the parameter value supplied *is* a regular expression, although it implies that is how it is handled.  It goes straight into saying that the pattern *can* use 'glob' syntax, and states how '.', '*' and '?' are converted to RE elements, which leads me to assume that the value is fundamentally treated as a regex.
>
>What would happen if I provided '.*', '\.' or '.?' - can it detect that the parameter is already a regex and skip the glob-to-regex conversion?


It's a bit of a hybrid. It was intended to be "glob" syntax, but that's not very well defined, and some bits of the underlying regex implementation creep through.

I think the documentation is fairly clear that the value you supply is converted to a regular expression by converting any instances of '.', '*', and '?', and then treating the result as a regex. There's no way to avoid the glob-to-regex conversion. 
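
To make the conversion concrete, here is a minimal Java sketch of such a glob-to-regex rewrite. It is not Saxon's code: the class `GlobSketch` and both of its methods are hypothetical, and the exact replacements ('.' becomes '\.', '*' becomes '.*', '?' becomes '.?') are an assumption based on the wording of the documentation page. It illustrates why a value such as '.*' cannot be passed through as a ready-made regex, while character classes and quantifiers in braces survive untouched.

```java
import java.util.regex.Pattern;

class GlobSketch {

    /**
     * Hypothetical helper, not Saxon's implementation: rewrites '.', '*'
     * and '?' and copies every other character unchanged.
     */
    static String globToRegex(String select) {
        StringBuilder sb = new StringBuilder();
        for (char c : select.toCharArray()) {
            switch (c) {
                case '.': sb.append("\\."); break; // dot becomes a literal dot
                case '*': sb.append(".*");  break; // star becomes "any run of characters"
                case '?': sb.append(".?");  break; // question mark becomes "at most one character"
                default:  sb.append(c);            // other regex syntax passes through as-is
            }
        }
        return sb.toString();
    }

    /** Tests a file name against the converted pattern using Java regex rules. */
    static boolean matches(String select, String fileName) {
        return Pattern.matches(globToRegex(select), fileName);
    }

    public static void main(String[] args) {
        System.out.println(globToRegex("*.xml")); // .*\.xml
        System.out.println(globToRegex(".*"));    // \..*
        System.out.println(matches("al_acq_[0-9]{4}.cfg.xml", "al_acq_0123.cfg.xml")); // true
    }
}
```

Under these assumptions, a supplied '.*' ends up as '\..*' (a literal dot followed by anything), whereas '[0-9]{4}' reaches the regex engine unchanged.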

 

>It goes on to say that "special characters used in the URL may need to be escaped".  Does that mean that if one wishes to use such characters as literals they need to be escaped?  Or does it mean that any usage at all (i.e. as RE special characters) needs them to be escaped?


It basically means that it must be a valid URI according to the URI rules, which disallow certain characters: for example space must be written as %20.

 

>It refers to "Java regular expression rules".  I am not a Java programmer so I don't know where to find these rules documented (there are various different regex systems around so I'd like to be sure I'm using the right syntax).  Could a link please be added pointing to the correct regex spec?

 



Googling for "java regular expression syntax" gets me to http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html which is where the rules can be found.

 

>What I have at the moment is
>
><x:apply-templates select="collection( 'file:///Temp?select=al_acq_[0-9]{4}.cfg.xml' )" />
>
>or
>
><x:apply-templates select="collection( 'file:///Temp?select=al_acq_\d{4}.cfg.xml' )" />
>
>but when run that gives
>
>Error at char 12 in x:apply-templates/@select on line nn column nn of process.xsl:
>  FODC0004: Illegal character in query at index 32: file:///Temp?select=al_acq_[0-9]{4}.cfg.xml
>
>or
>
>Error at char 12 in x:apply-templates/@select on line nn column nn of process.xsl:
>  FODC0004: Illegal character in query at index 27: file:///Temp?select=al_acq_\d{4}.cfg.xml


The rules for what is allowed in the query part of a URI (or IRI) are in RFC 3987:

   iquery         = *( ipchar / iprivate / "/" / "?" )
   ipchar         = iunreserved / pct-encoded / sub-delims / ":" / "@"
   iunreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar
   sub-delims     = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="
   iprivate       = (unicode private use characters)
   ucschar        = (loosely, non-ASCII characters)

So curly braces and backslashes aren't allowed. (It seems to be allowing the square brackets, but I'm not sure why - perhaps it's using a different version of the spec).
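
The wording "Illegal character in query at index N" is the message format that `java.net.URI` produces, so a plausible reading (an assumption, not something confirmed in this thread) is that the collection URI is validated with that class. Its Javadoc states that it follows RFC 2396 as amended by RFC 2732, and RFC 2732 adds '[' and ']' to the reserved characters, which would explain why the square brackets pass while '{' and '\' do not. A minimal sketch of that behaviour follows; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriQueryCheck {

    /** Returns -1 if java.net.URI accepts the string, otherwise the index it rejects. */
    static int firstIllegalIndex(String uri) {
        try {
            new URI(uri);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    /** Parses a valid URI and returns its query with percent-escapes decoded. */
    static String decodedQuery(String uri) {
        return URI.create(uri).getQuery();
    }

    public static void main(String[] args) {
        String braces    = "file:///Temp?select=al_acq_[0-9]{4}.cfg.xml";
        String backslash = "file:///Temp?select=al_acq_\\d{4}.cfg.xml";   // one backslash in the actual string
        String escaped   = "file:///Temp?select=al_acq_[0-9]%7B4%7D.cfg.xml";

        System.out.println(firstIllegalIndex(braces));    // 32, the '{'
        System.out.println(firstIllegalIndex(backslash)); // 27, the '\'
        System.out.println(firstIllegalIndex(escaped));   // -1, accepted
        System.out.println(decodedQuery(escaped));        // select=al_acq_[0-9]{4}.cfg.xml
    }
}
```

With the braces written as %7B and %7D the string is a valid URI, and the decoded query still carries the original '{4}' quantifier.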

 


Re: Syntax of 'select' within 'collection'

Kerry, Richard

 

Michael,

Thanks for the help on this. That’s got it working.

 

>It's a bit of a hybrid. It was intended to be "glob" syntax, but that's not very well defined, and some bits of the underlying regex implementation creep through.

[RK>] I got that working – for my use I didn’t have a problem with the glob conversions.


>I think the documentation is fairly clear that the value you supply is converted to a regular expression by converting any instances of '.', '*', and '?', and then treating the result as a regex. There's no way to avoid the glob-to-regex conversion. 

[RK>] I’d say it was only “fairly” clear, but it wasn’t explicit, which I’d have preferred, and had expected.  Perhaps something like "It uses Regular Expression syntax (link), with the following changes to make use of 'glob' syntax simpler."


>It basically means that it must be a valid URI according to the URI rules, which disallow certain characters: for example space must be written as %20.

[RK>] I think that means write the RE as you want it, then call iri-to-uri on it, or it may not be accepted.  That's what I seem to have found.


>Googling for "java regular expression syntax" gets me to http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html which is where the rules can be found.

[RK>] I also found that using Google, but as I’m not a Java programmer and you are (I think), I thought you might be able to confirm (or not) that was the right reference.


>So curly braces and backslashes aren't allowed. (It seems to be allowing the square brackets, but I'm not sure why - perhaps it's using a different version of the spec).

[RK>] An interesting exercise was to build my string, then print it out, then pass it through iri-to-uri and print the result.

That does seem to show that iri-to-uri (which I presume you didn’t write) preserves square brackets!

So either whoever wrote it got it wrong or the spec does allow them.


 <x:variable name="filter-re">file:///Temp?select=al_out_[0-9]{4}.cfg.xml</x:variable>

 <x:message><x:value-of select="$filter-re" /></x:message>

 <x:variable name="filter-re-uri" select="iri-to-uri($filter-re)" />

 <x:message><x:value-of select="$filter-re-uri" /></x:message>


gives

file:///Temp?select=al_out_[0-9]{4}.cfg.xml

file:///Temp?select=al_out_[0-9]%7B4%7D.cfg.xml

 


Next step is to use it to apply-templates on the resultant files.  That'll be a new thread if I can't get it to work.


 

Regards,

Richard.

 

 

 

PS.  Sorry, formatting's a bit weird.  Ongoing mailer issues......




Re: Syntax of 'select' within 'collection'

Michael Kay
I seem to remember many happy days researching the RFCs on this kind of question. Here's a flavour of what you find:


Which one can summarize by saying it's a mess. TimBL is often quoted as saying he got the "//" part of URI syntax wrong, but actually, the whole thing is a disaster.

Michael Kay
Saxonica


On 24 Mar 2017, at 14:39, Kerry, Richard <[hidden email]> wrote:

>That does seem to show that iri-to-uri (which I presume you didn’t write) preserves square brackets!
>
>So either whoever wrote it got it wrong or the spec does allow them.