Batching into files using streaming

Batching into files using streaming

Mailing Lists Mail


Hello All,

I have the following requirement.

I need to write an XSLT stylesheet that splits the input into files of a given batch size. For example, with 44 College elements and a batch size of 10, the stylesheet should produce 5 files with at most 10 colleges in each. The elements that precede and follow the <College> elements (country-info, politics, sports, etc.) must be copied as-is into every batch file.

XML Sample

<?xml version="1.0" encoding="UTF-8"?>
<Country code="GB">
    <country-info>
        <tourism/>
        <population/>
        <counties/>
    </country-info>
    <University name="Oxford University">
        <Colleges>
            <College>All Souls College</College>
            <College>Balliol College</College>
            <College>Blackfriars</College>
            <College>Brasenose College</College>
            <College>Campion Hall</College>
            <College>Christ Church</College>
            <College>Corpus Christi College</College>
            <College>Exeter College</College>
            <College>Green Templeton College</College>
            <College>Harris Manchester College</College>
            <College>Hertford College</College>
            <College>Jesus College</College>
            <College>Keble College</College>
            <College>Kellogg College</College>
            <College>Lady Margaret Hall</College>
            <College>Linacre College</College>
            <College>Lincoln College</College>
            <College>Magdalen College</College>
            <College>Mansfield College</College>
            <College>Merton College</College>
            <College>New College</College>
            <College>Nuffield College</College>
            <College>Oriel College</College>
            <College>Pembroke College</College>
            <College>The Queen's College</College>
            <College>Regent's Park College</College>
            <College>St Anne's College</College>
            <College>St Antony's College</College>
            <College>St Benet's Hall</College>
            <College>St Catherine's College</College>
            <College>St Cross College</College>
            <College>St Edmund Hall</College>
            <College>St Hilda's College</College>
            <College>St Hugh's College</College>
            <College>St John's College</College>
            <College>St Peter's College</College>
            <College>St Stephen's House</College>
            <College>Somerville College</College>
            <College>Trinity College</College>
            <College>University College</College>
            <College>Wadham College</College>
            <College>Wolfson College</College>
            <College>Worcester College</College>
            <College>Wycliffe Hall</College>
        </Colleges>
    </University>
    <politics/>
    <sports/>
    <airports/>
    <science-relegion/>
</Country>

 
..

 

I tried the following code, but I realized that it breaks the streaming rules:

 

<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xsl:mode name="batch" streamable="yes" on-no-match="shallow-copy"/>
    <xsl:output method="xml" indent="yes"/>

    <xsl:param name="fileHref" select="'file:///E:/stylesheets/TestBed/InputSource/University.xml'"/>
    <xsl:param name="BatchSize" select="10"/>

    <xsl:template match="/">
        <xsl:stream href="{$fileHref}">
            <xsl:sequence>
                <xsl:for-each-group select="/Country/University/Colleges/College" group-adjacent="(position() - 1) idiv $BatchSize">
                    <xsl:result-document href="file:///E:/stylesheets/TestBed/Result/CollegeBatch{position()}.xml">
                        <xsl:stream href="{$fileHref}">
                            <xsl:sequence>
                                <xsl:apply-templates mode="batch">
                                    <xsl:with-param name="current-group" select="current-group()" tunnel="yes"/>
                                </xsl:apply-templates>
                            </xsl:sequence>
                        </xsl:stream>
                    </xsl:result-document>
                </xsl:for-each-group>
            </xsl:sequence>
        </xsl:stream>
    </xsl:template>

    <xsl:template match="*:Colleges" mode="batch">
        <xsl:param name="current-group" tunnel="yes"/>
        <BatchedColleges>
            <xsl:copy-of select="$current-group"/>
        </BatchedColleges>
    </xsl:template>
</xsl:transform>

 

I also tried changing the xsl:for-each-group to:

<xsl:for-each-group select="/Country/University/Colleges/College/copy-of(.)" group-adjacent="(position() - 1) idiv $BatchSize">

This runs, but it does not copy the right colleges, or the numbers come out completely wrong. Where did I go wrong?

 
Your help is appreciated.
DakTapaal


------------------------------------------------------------------------------

_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 

Re: Batching into files using streaming

David Rudel
It seems to me that you can use xsl:iterate with streaming to do this.

The syntax is discussed here:
http://www.saxonica.com/html/documentation/sourcedocs/streaming/stream-with-iterate.html

So you could iterate with two counters: one keeps track of which
batch you are on, for file-naming purposes, and the other counts
from 1 to batch_size. When the second counter hits batch_size, you call
<xsl:result-document> to send off a chunk of output, reset that
counter, and increment the other.

I think the more recent versions of Saxon use snapshots for streaming,
so you could also recover ancestor information if you need to put that
into the output.
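
The two-counter idea above might look roughly like the following. This is only a sketch under several assumptions: it uses the syntax of the final XSLT 3.0 Recommendation (xsl:source-document with streamable="yes", the later name for the xsl:stream instruction used elsewhere in this thread), it assumes a processor with streaming support, and the parameter names batch and batchNo, the default input name and the relative output file names are illustrative rather than taken from the original post. Instead of a second integer counter it accumulates copies of the colleges and tests count(). It writes only the colleges, not the surrounding elements.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical defaults; override them when invoking the transformation. -->
  <xsl:param name="fileHref" as="xs:string" select="'University.xml'"/>
  <xsl:param name="BatchSize" as="xs:integer" select="10"/>

  <xsl:output method="xml" indent="yes"/>

  <xsl:template name="xsl:initial-template">
    <xsl:source-document href="{$fileHref}" streamable="yes">
      <xsl:iterate select="Country/University/Colleges/College">
        <!-- Colleges collected so far for the batch being built. -->
        <xsl:param name="batch" as="element(College)*" select="()"/>
        <!-- Number of the batch being built; used in the file name. -->
        <xsl:param name="batchNo" as="xs:integer" select="1"/>

        <!-- After the last College, flush a final, partially filled batch. -->
        <xsl:on-completion>
          <xsl:if test="exists($batch)">
            <xsl:result-document href="CollegeBatch{$batchNo}.xml">
              <BatchedColleges>
                <xsl:sequence select="$batch"/>
              </BatchedColleges>
            </xsl:result-document>
          </xsl:if>
        </xsl:on-completion>

        <!-- copy-of(.) is the only consuming operation in the loop body. -->
        <xsl:variable name="full" as="element(College)*"
                      select="($batch, copy-of(.))"/>

        <xsl:choose>
          <xsl:when test="count($full) = $BatchSize">
            <!-- Batch is full: write it, then start an empty one. -->
            <xsl:result-document href="CollegeBatch{$batchNo}.xml">
              <BatchedColleges>
                <xsl:sequence select="$full"/>
              </BatchedColleges>
            </xsl:result-document>
            <xsl:next-iteration>
              <xsl:with-param name="batch" select="()"/>
              <xsl:with-param name="batchNo" select="$batchNo + 1"/>
            </xsl:next-iteration>
          </xsl:when>
          <xsl:otherwise>
            <!-- Batch not yet full: carry the collected colleges forward. -->
            <xsl:next-iteration>
              <xsl:with-param name="batch" select="$full"/>
              <xsl:with-param name="batchNo" select="$batchNo"/>
            </xsl:next-iteration>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:iterate>
    </xsl:source-document>
  </xsl:template>
</xsl:stylesheet>
```

Only one batch of copied colleges is held in memory at a time, which is the point of iterating rather than grouping the whole population first.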






--

"A false conclusion, once arrived at and widely accepted is not
dislodged easily, and the less it is understood, the more tenaciously
it is held." - Cantor's Law of Preservation of Ignorance.


Fwd: Batching into files using streaming

Mailing Lists Mail
In reply to this post by Mailing Lists Mail

Hopefully this is not a Saxon issue?

---------- Forwarded message ----------
From: "Mailing Lists Mail" <[hidden email]>
Date: Sep 16, 2016 2:48 PM
Subject: Batching into files using streaming
To: "Mailing list for the SAXON XSLT and XQuery processor" <[hidden email]>, <[hidden email]>
Cc:


Re: Batching into files using streaming

Michael Kay
In reply to this post by Mailing Lists Mail
I would expect to see something like:

<xsl:stream href="{$fileHref}">
    <xsl:for-each-group select="/Country/University/Colleges/College" group-adjacent="(position() - 1) idiv $BatchSize">
        <xsl:result-document href="file:///E:/stylesheets/TestBed/Result/CollegeBatch{position()}.xml">
            <batch>
                <xsl:copy-of select="current-group()"/>
            </batch>
        </xsl:result-document>
    </xsl:for-each-group>
</xsl:stream>


Michael Kay
Saxonica




Re: Batching into files using streaming

Mailing Lists Mail

Yes, that will work, but I won't get the other parts of the input tree. With the code you are proposing, I will only get the college batches. I want each college batch embedded between the other elements of the tree.
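
One way to get each batch embedded in the surrounding tree, offered here only as a hedged sketch and not as anything proposed in this thread, is to stream the input once to count the colleges and then once more per batch, shallow-copying everything and keeping only the colleges that belong to the current batch. It assumes the final XSLT 3.0 Recommendation syntax (xsl:source-document, streamable accumulators) and a processor with streaming support. The accumulator name college-no, the tunnel parameter batchNo, the default input name and the relative output names are all hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical defaults; override them when invoking the transformation. -->
  <xsl:param name="fileHref" as="xs:string" select="'University.xml'"/>
  <xsl:param name="BatchSize" as="xs:integer" select="10"/>

  <xsl:output method="xml" indent="yes"/>

  <!-- Everything without an explicit rule is copied through unchanged. -->
  <xsl:mode name="batch" streamable="yes" on-no-match="shallow-copy"/>

  <!-- Running count of College elements seen so far, in document order. -->
  <xsl:accumulator name="college-no" as="xs:integer"
                   initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="College" select="$value + 1"/>
  </xsl:accumulator>

  <xsl:template name="xsl:initial-template">
    <!-- First pass: count the colleges. -->
    <xsl:variable name="total" as="xs:integer">
      <xsl:source-document href="{$fileHref}" streamable="yes">
        <xsl:sequence select="count(Country/University/Colleges/College)"/>
      </xsl:source-document>
    </xsl:variable>

    <!-- One further pass per batch; the upper bound is the count rounded up. -->
    <xsl:for-each select="1 to ($total + $BatchSize - 1) idiv $BatchSize">
      <xsl:variable name="batchNo" as="xs:integer" select="."/>
      <xsl:result-document href="CollegeBatch{$batchNo}.xml">
        <xsl:source-document href="{$fileHref}" streamable="yes"
                             use-accumulators="college-no">
          <xsl:apply-templates mode="batch">
            <xsl:with-param name="batchNo" select="$batchNo" tunnel="yes"/>
          </xsl:apply-templates>
        </xsl:source-document>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

  <!-- Keep a College only if its ordinal number falls in the current batch. -->
  <xsl:template match="College" mode="batch">
    <xsl:param name="batchNo" as="xs:integer" tunnel="yes"/>
    <xsl:if test="(accumulator-before('college-no') - 1) idiv $BatchSize
                  = $batchNo - 1">
      <xsl:copy-of select="."/>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

The trade-off is that the input is read once per batch plus once for the count, so this favours low memory use over speed. Whitespace text nodes between the skipped colleges are still copied, so the output may contain extra blank lines unless whitespace is stripped.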



Re: Batching into files using streaming

cmarchand

So this is a job for an XSLT generator and a two-pass process!

See this post by Michael Kay:

http://markmail.org/search/?q=saxon+namespace-alias#query:saxon%20namespace-alias%20from%3A%22Michael%20Kay%22%20list%3Anet.sourceforge.lists.saxon-help+page:1+mid:bnwh7hjxe4irqm7v+state:results


Best,

Christophe

