Using Saxon to Update Files In Place?


Using Saxon to Update Files In Place?

Eliot Kimber-2
I need to update a large number of files in place, rather than copying to a new location.

I’m using the Saxon collection function to get the files to process, e.g.:

    <xsl:variable name="docs" as="document-node()*"
      select="for $x in collection(concat($sourceDir, '?select=*.xml;recurse=yes;on-error=ignore'))
                        return saxon:discard-document($x)"
    />

And then I iterate over the value of $docs to process each doc. This comes from working code that makes a copy of the original files.

But to update the files in place I think I would need to use Java to rename each input document, process the renamed file using the original name as the result URL, and then delete the renamed file (or not, if I want to keep it as a backup).

I think my question is: in a template like this:

    <xsl:for-each select="$docs">
      …
    </xsl:for-each>

Will the file have already been parsed, meaning I can safely rename it before rewriting it, or do I need to write to a temp location and only then delete the original and rename the temp file to the original?

Or is there a simpler way to do an update-in-place?

This is using Saxon HE, so I don’t have a license to use XQuery Update.

Thanks,

Eliot

--
Eliot Kimber
http://contrext.com
 




------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 

Re: Using Saxon to Update Files In Place?

Michael Kay
Complex question but I think the simplest answer is that making any assumptions about the temporal sequence of events is very dangerous.

* The variable $docs won't be evaluated until it's needed

* It will then be evaluated incrementally, each item in the sequence being materialized on demand

* The collection() function is multi-threaded

My instinct would be to write all the new files to a new directory and then rename the directory on final completion. Anything else is at your own risk.
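
The "new directory, then rename" approach can be sketched roughly as follows. This is only an illustration, not code from this thread: the class name `StagedDirectoryUpdate` and the `.new` / `.bak` suffixes are made up, and it uses the standard JAXP API (`javax.xml.transform`) rather than anything Saxon-specific. With only the JDK on the classpath this runs XSLT 1.0; a Saxon jar on the classpath would typically be picked up by `TransformerFactory.newInstance()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper (not part of Saxon): write every result into a sibling
// staging directory and swap directories only after all files are done.
final class StagedDirectoryUpdate {

    static void run(Path sourceDir, Path stylesheet) throws IOException, TransformerException {
        // Assumed naming convention: "<dir>.new" for staging, "<dir>.bak" for the originals.
        Path staging = sourceDir.resolveSibling(sourceDir.getFileName() + ".new");
        Path backup = sourceDir.resolveSibling(sourceDir.getFileName() + ".bak");
        Files.createDirectories(staging);

        // Compile the stylesheet once and reuse it for every input file.
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(stylesheet.toFile()));

        // Collect the file list up front so the directory walk is finished
        // before any output is written.
        List<Path> inputs;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            inputs = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        for (Path in : inputs) {
            Path out = staging.resolve(sourceDir.relativize(in));
            Files.createDirectories(out.getParent());
            if (in.getFileName().toString().endsWith(".xml")) {
                // Open and close the streams explicitly so no handle outlives the loop body.
                try (InputStream is = Files.newInputStream(in);
                     OutputStream os = Files.newOutputStream(out)) {
                    templates.newTransformer().transform(
                            new StreamSource(is, in.toUri().toString()), new StreamResult(os));
                }
            } else {
                Files.copy(in, out); // non-XML files are carried over unchanged
            }
        }

        // Only now touch the originals: move them aside, then promote the staging copy.
        Files.move(sourceDir, backup);
        Files.move(staging, sourceDir);
    }
}
```

A call such as `StagedDirectoryUpdate.run(Paths.get("data"), Paths.get("fix.xsl"))` would leave the new files in `data` and the untouched originals in `data.bak`. Note that this needs room for a second copy of the data, and that the final two moves are not atomic as a pair.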

Michael Kay
Saxonica


> On 31 Mar 2017, at 14:42, Eliot Kimber <[hidden email]> wrote:
>
> I need to update a large number of files in place, rather than copying to a new location.
>
> [...]



Re: Using Saxon to Update Files In Place?

Eliot Kimber-2
Yes, I was coming to the same conclusion. In this case I need to make a change that’s just a little bit harder than a simple sed search and replace but not so involved that it requires XSLT, although I already had a transform that operated on this same data set successfully (but by copying). It's not worth the effort to build more involved infrastructure. In addition, the volume of data relative to the space available on the server at hand (not under my control) makes doing a full copy potentially problematic.

Since XQuery allows for update in place it seems like it would be a useful Saxon extension to allow XSLT to do the same, although I can understand why that might not be either practical or desirable.

Cheers,

Eliot

--
Eliot Kimber
http://contrext.com
 


On 3/31/17, 4:25 PM, "Michael Kay" <[hidden email]> wrote:

    Complex question but I think the simplest answer is that making any assumptions about the temporal sequence of events is very dangerous.
   
    [...]






Re: Using Saxon to Update Files In Place?

Ihe Onwuka-2


On Tue, Apr 4, 2017 at 2:54 AM, Eliot Kimber <[hidden email]> wrote:
> Yes, I was coming to the same conclusion. In this case I need to make a change that’s just a little bit harder than simple sed search and replace but not so involved that it requires XSLT

usually has me reaching for awk. 


Re: Using Saxon to Update Files In Place?

Michael Kay
In reply to this post by Eliot Kimber-2
>
> Since XQuery allows for update in place it seems like it would be a useful Saxon extension to allow XSLT to do the same, although I can understand why that might not be either practical or desirable.
>

Actually there are two parts to that which are almost completely independent of each other:

(a) in-situ update of the source tree

(b) writing the serialized result tree to the file where the source tree was read from

One could do either without doing the other, and (b) does not require any XSLT extensions, other than a relaxation to the rule that you can't write and read the same files within a transformation.

Michael Kay
Saxonica



Re: Using Saxon to Update Files In Place?

Ron Wheeler
In reply to this post by Eliot Kimber-2
Would it not be easier to iterate over the files at the shell level
for each file
    rename it
    process it to original name
    delete input file

That way there would never be any doubt about the sequence of events.
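
For illustration only, here is the same rename / process / delete sequence written as a minimal Java sketch for a single file, so that it could be called in a loop inside one JVM. The class name `RenameProcessDelete` and the `.orig` suffix are invented for this example, and it relies on the standard JAXP API rather than anything Saxon-specific.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: updates one file "in place" via a temporary rename.
final class RenameProcessDelete {

    static void update(Path file, Templates templates) throws IOException, TransformerException {
        // Assumed convention: the renamed input sits next to the original as "<name>.orig".
        Path renamed = file.resolveSibling(file.getFileName() + ".orig");

        // 1. rename it
        Files.move(file, renamed);

        // 2. process it to the original name
        try (InputStream in = Files.newInputStream(renamed);
             OutputStream out = Files.newOutputStream(file)) {
            templates.newTransformer().transform(
                    new StreamSource(in, renamed.toUri().toString()), new StreamResult(out));
        } catch (IOException | TransformerException e) {
            // Both streams are already closed here; put the original back and report the failure.
            Files.move(renamed, file, StandardCopyOption.REPLACE_EXISTING);
            throw e;
        }

        // 3. delete input file (skip this step to keep it as a backup)
        Files.delete(renamed);
    }
}
```

Because each file is fully read and written before the next one is touched, the order of events is explicit, and only one extra file exists at any time.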

What operating system does this have to run under?

http://stackoverflow.com/questions/15567809/batch-extract-path-and-filename-from-a-variable
is a model for Windows.
http://stackoverflow.com/questions/7119223/file-name-without-extension-in-bash-for-loop
is a model for Linux.

Ron



On 04/04/2017 2:54 AM, Eliot Kimber wrote:

> Yes, I was coming to the same conclusion. In this case I need to make a change that’s just a little bit harder than simple sed search and replace but not so involved that it requires XSLT but I already had a transform that operated on this same data set successfully (but by copying). Not worth the effort to build more involved infrastructure. In addition, the volume of data relative to the space available on the server at hand (not under my control) makes doing a fully copy potentially problematic.
>
> [...]


--
Ron Wheeler
President
Artifact Software Inc
email: [hidden email]
skype: ronaldmwheeler
phone: 866-970-2435, ext 102



Re: Using Saxon to Update Files In Place?

Michael Kay

> On 4 Apr 2017, at 10:48, Ron Wheeler <[hidden email]> wrote:
>
> Would it not be easier to iterate over the files at the shell level
> for each file
>    rename it
>    process it to original name
>    delete input file
>

Iterating over files at the shell level and then transforming each one typically involves creating a new Java VM for each transformation, and also recompiling the stylesheet for each transformation, which typically increases the processing time from seconds to hours.
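
As a rough illustration of the alternative (one JVM, one stylesheet compilation, many inputs), here is a minimal sketch using the standard JAXP `Templates` object. The class name `CompileOnce` is made up, and in-memory strings stand in for files to keep the example short; it is not Saxon-specific code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: compile a stylesheet once, then apply it to many documents.
final class CompileOnce {

    static List<String> transformAll(String stylesheet, List<String> documents)
            throws TransformerException {
        // The expensive step: parse and compile the stylesheet. Done exactly once.
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(stylesheet)));

        List<String> results = new ArrayList<>();
        for (String doc : documents) {
            // Creating a Transformer from compiled Templates is cheap by comparison.
            Transformer transformer = templates.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(doc)), new StreamResult(out));
            results.add(out.toString());
        }
        return results;
    }
}
```

The same loop works with file-based sources and results; the point is only that JVM start-up and stylesheet compilation happen once rather than once per file.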

Michael Kay
Saxonica




Re: Using Saxon to Update Files In Place?

cmarchand
Why not give XProc a try for this?
If you have a single XSLT stylesheet plus file renaming, it's easy, and I'm sure
Calabash keeps the compiled stylesheet...

See http://exproc.org/proposed/steps/fileutils.html for renaming a file,
and, for Calabash's support for the EXProc steps:
http://xmlcalabash.com/docs/reference/extsteps.html

Best regards,
Christophe

On 2017-04-04 12:53, Michael Kay wrote:

> Iterating over files at the shell level and then transforming each one
> typically involves creating a new Java VM for each transformation, and
> also recompiling the stylesheet for each transformation, which
> typically increases the processing time from seconds to hours.

Re: Using Saxon to Update Files In Place?

Lizzi, Vincent
In reply to this post by Eliot Kimber-2

Hi Eliot,

I hesitate to suggest a different product on the Saxon mailing list, but I have used VTD-XML (http://vtd-xml.sourceforge.net/) for making surgical edits to XML files. It might be worth a look.

Vincent

From: Eliot Kimber [mailto:[hidden email]]
Sent: Tuesday, April 04, 2017 2:54 AM
To: Mailing list for the SAXON XSLT and XQuery processor <[hidden email]>
Subject: Re: [saxon] Using Saxon to Update Files In Place?

 

Yes, I was coming to the same conclusion. In this case I need to make a change that’s just a little bit harder than simple sed search and replace but not so involved that it requires XSLT but I already had a transform that operated on this same data set successfully (but by copying). Not worth the effort to build more involved infrastructure. In addition, the volume of data relative to the space available on the server at hand (not under my control) makes doing a fully copy potentially problematic.

[...]

Re: Using Saxon to Update Files In Place?

Ron Wheeler
In reply to this post by Michael Kay
Good point.

The relation between file size and number of files determines whether this
makes a difference.
Thousands of small files take a bigger performance hit than dozens of
multi-gigabyte files.

Does Saxon/Java garbage collection handle opening, processing and
closing a lot of files in a single session without eventually collapsing?

Ron

On 04/04/2017 6:53 AM, Michael Kay wrote:

> Iterating over files at the shell level and then transforming each one typically involves creating a new Java VM for each transformation, and also recompiling the stylesheet for each transformation, which typically increases the processing time from seconds to hours.


--
Ron Wheeler
President
Artifact Software Inc
email: [hidden email]
skype: ronaldmwheeler
phone: 866-970-2435, ext 102



Re: Using Saxon to Update Files In Place?

Eliot Kimber-2
In reply to this post by Ihe Onwuka-2

As it happens I worked out how to use find and sed to do what I needed, since it could be done with a simple string replacement in this case. I had never actually used sed before, strange as that may sound.

But there are still likely cases where a full transform would be necessary, and for that it would be nice to be able to update in place if possible.

Cheers,

E.

--
Eliot Kimber
http://contrext.com

From: Ihe Onwuka <[hidden email]>
Reply-To: <[hidden email]>, Mailing list for the SAXON XSLT and XQuery processor <[hidden email]>
Date: Tuesday, April 4, 2017 at 9:04 AM
To: Mailing list for the SAXON XSLT and XQuery processor <[hidden email]>
Subject: Re: [saxon] Using Saxon to Update Files In Place?

[...]