SXRE0001: Stack overflow (excessive recursion) during regular expression evaluation

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

SXRE0001: Stack overflow (excessive recursion) during regular expression evaluation

Eric van der Vlist
Hi,

In a text, I want to embed quote delimited substrings in <string>
elements and this is quite straightforward:

    <xsl:template match="/">
        <xsl:analyze-string select="$text" regex='"[^"]*"'>
            <xsl:matching-substring>
                <string>
                    <xsl:value-of select="."/>
                </string>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
 
Now, if I also want to support escaped quotes in the form \", the
regular expressions becomes : regex='"(\\"|[^"])*"'

The problem with that is that as soon as the substring becomes longer
than ~3000 characters I get the following error :


Engine name: Saxon-EE 9.5.1.7
Severity: fatal
Description: SXRE0001: Stack overflow (excessive recursion) during
regular expression evaluation
URL:
http://www.saxonica.com/html/documentation/javadoc/net/sf/saxon/trans/SaxonErrorCode.html#SXRE0001

(simple repro case attached).

Is that expected? normal? and what can I do to avoid these errors?

Thanks,

Eric

------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 

repro.xsl (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: SXRE0001: Stack overflow (excessive recursion) during regular expression evaluation

Eric van der Vlist
My real world usage is more complicated than that (I am parsing JSON)
and this complexity gives me an easy workaround: I can replace \" by
\u0022 before applying the regular expression.

I am still interested by the answer, though!

Thanks,

Eric

Le lundi 30 mars 2015 à 20:54 +0200, Eric van der Vlist a écrit :

> Hi,
>
> In a text, I want to embed quote delimited substrings in <string>
> elements and this is quite straightforward:
>
>     <xsl:template match="/">
>         <xsl:analyze-string select="$text" regex='"[^"]*"'>
>             <xsl:matching-substring>
>                 <string>
>                     <xsl:value-of select="."/>
>                 </string>
>             </xsl:matching-substring>
>             <xsl:non-matching-substring>
>                 <xsl:value-of select="."/>
>             </xsl:non-matching-substring>
>         </xsl:analyze-string>
>     </xsl:template>
>  
> Now, if I also want to support escaped quotes in the form \", the
> regular expressions becomes : regex='"(\\"|[^"])*"'
>
> The problem with that is that as soon as the substring becomes longer
> than ~3000 characters I get the following error :
>
>
> Engine name: Saxon-EE 9.5.1.7
> Severity: fatal
> Description: SXRE0001: Stack overflow (excessive recursion) during
> regular expression evaluation
> URL:
> http://www.saxonica.com/html/documentation/javadoc/net/sf/saxon/trans/SaxonErrorCode.html#SXRE0001
>
> (simple repro case attached).
>
> Is that expected? normal? and what can I do to avoid these errors?
>
> Thanks,
>
> Eric
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming The Go Parallel Website, sponsored
> by Intel and developed in partnership with Slashdot Media, is your hub for all
> things parallel software development, from weekly thought leadership blogs to
> news, videos, case studies, tutorials and more. Take a look and join the
> conversation now. http://goparallel.sourceforge.net/
> _______________________________________________ saxon-help mailing list archived at http://saxon.markmail.org/ [hidden email] https://lists.sourceforge.net/lists/listinfo/saxon-help 



------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 
Reply | Threaded
Open this post in threaded view
|

Re: SXRE0001: Stack overflow (excessive recursion) during regular expression evaluation

Michael Kay
In reply to this post by Eric van der Vlist
In Saxon 9.5 we incorporated the Apache Jakarta regex engine, and we discovered that its backtracking algorithm sometimes led to very deep recursion (proportional to the length of the input string). We optimized it to eliminate this in some common cases, but not all. For 9.6 we did a significant rewrite to eliminate the problem.

Apart from moving to Saxon 9.6, the best way to avoid the problem is to avoid unnecessary ambiguities in the regex. With

> regex='"(\\"|[^"])*"'

there's an ambiguity (after reading a backslash you don't know which branch of the choice to take) and therefore a possibility of backtracking, and therefore a possibility of stack overflow. You could perhaps eliminate this by writing it as

regex='"(\\"?|[^"\\])*"'

but I'm not sure...

Michael Kay
Saxonica
[hidden email]
+44 (0) 118 946 5893




On 30 Mar 2015, at 19:54, Eric van der Vlist <[hidden email]> wrote:

> Hi,
>
> In a text, I want to embed quote delimited substrings in <string>
> elements and this is quite straightforward:
>
>    <xsl:template match="/">
>        <xsl:analyze-string select="$text" regex='"[^"]*"'>
>            <xsl:matching-substring>
>                <string>
>                    <xsl:value-of select="."/>
>                </string>
>            </xsl:matching-substring>
>            <xsl:non-matching-substring>
>                <xsl:value-of select="."/>
>            </xsl:non-matching-substring>
>        </xsl:analyze-string>
>    </xsl:template>
>
> Now, if I also want to support escaped quotes in the form \", the
> regular expressions becomes : regex='"(\\"|[^"])*"'
>
> The problem with that is that as soon as the substring becomes longer
> than ~3000 characters I get the following error :
>
>
> Engine name: Saxon-EE 9.5.1.7
> Severity: fatal
> Description: SXRE0001: Stack overflow (excessive recursion) during
> regular expression evaluation
> URL:
> http://www.saxonica.com/html/documentation/javadoc/net/sf/saxon/trans/SaxonErrorCode.html#SXRE0001
>
> (simple repro case attached).
>
> Is that expected? normal? and what can I do to avoid these errors?
>
> Thanks,
>
> Eric
> <repro.xsl>------------------------------------------------------------------------------
> Dive into the World of Parallel Programming The Go Parallel Website, sponsored
> by Intel and developed in partnership with Slashdot Media, is your hub for all
> things parallel software development, from weekly thought leadership blogs to
> news, videos, case studies, tutorials and more. Take a look and join the
> conversation now. http://goparallel.sourceforge.net/_______________________________________________
> saxon-help mailing list archived at http://saxon.markmail.org/
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/saxon-help


------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 
Reply | Threaded
Open this post in threaded view
|

Re: SXRE0001: Stack overflow (excessive recursion) during regular expression evaluation

Michael Kay
In reply to this post by Eric van der Vlist

> My real world usage is more complicated than that (I am parsing JSON)

Is it wise to parse JSON using regular expressions? My usual rule of thumb is, if the BNF grammar is recursive, then you can't parse it using regular expressions.

But perhaps you are only tokenizing using regular expressions? I think that when you use regular expressions for tokenizing long strings, you need to be very careful to avoid ambiguous regular expressions that can result in a lot of backtracking. Backtracking is a big performance problem even if you don't hit the stack overflow issue.

Michael Kay
Saxonica


------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 
Reply | Threaded
Open this post in threaded view
|

Re: SXRE0001: Stack overflow (excessive recursion) during regular expression evaluation

Eric van der Vlist
In reply to this post by Michael Kay
Hi Michael,

Just to say that, no, removing the ambiguity doesn't fix the error.

Thanks for the explanation!

Eric

Le lundi 30 mars 2015 à 23:09 +0100, Michael Kay a écrit :

> In Saxon 9.5 we incorporated the Apache Jakarta regex engine, and we discovered that its backtracking algorithm sometimes led to very deep recursion (proportional to the length of the input string). We optimized it to eliminate this in some common cases, but not all. For 9.6 we did a significant rewrite to eliminate the problem.
>
> Apart from moving to Saxon 9.6, the best way to avoid the problem is to avoid unnecessary ambiguities in the regex. With
>
> > regex='"(\\"|[^"])*"'
>
> there's an ambiguity (after reading a backslash you don't know which branch of the choice to take) and therefore a possibility of backtracking, and therefore a possibility of stack overflow. You could perhaps eliminate this by writing it as
>
> regex='"(\\"?|[^"\\])*"'
>
> but I'm not sure...
>
> Michael Kay
> Saxonica
> [hidden email]
> +44 (0) 118 946 5893
>
>
>
>
> On 30 Mar 2015, at 19:54, Eric van der Vlist <[hidden email]> wrote:
>
> > Hi,
> >
> > In a text, I want to embed quote delimited substrings in <string>
> > elements and this is quite straightforward:
> >
> >    <xsl:template match="/">
> >        <xsl:analyze-string select="$text" regex='"[^"]*"'>
> >            <xsl:matching-substring>
> >                <string>
> >                    <xsl:value-of select="."/>
> >                </string>
> >            </xsl:matching-substring>
> >            <xsl:non-matching-substring>
> >                <xsl:value-of select="."/>
> >            </xsl:non-matching-substring>
> >        </xsl:analyze-string>
> >    </xsl:template>
> >
> > Now, if I also want to support escaped quotes in the form \", the
> > regular expressions becomes : regex='"(\\"|[^"])*"'
> >
> > The problem with that is that as soon as the substring becomes longer
> > than ~3000 characters I get the following error :
> >
> >
> > Engine name: Saxon-EE 9.5.1.7
> > Severity: fatal
> > Description: SXRE0001: Stack overflow (excessive recursion) during
> > regular expression evaluation
> > URL:
> > http://www.saxonica.com/html/documentation/javadoc/net/sf/saxon/trans/SaxonErrorCode.html#SXRE0001
> >
> > (simple repro case attached).
> >
> > Is that expected? normal? and what can I do to avoid these errors?
> >
> > Thanks,
> >
> > Eric
> > <repro.xsl>------------------------------------------------------------------------------
> > Dive into the World of Parallel Programming The Go Parallel Website, sponsored
> > by Intel and developed in partnership with Slashdot Media, is your hub for all
> > things parallel software development, from weekly thought leadership blogs to
> > news, videos, case studies, tutorials and more. Take a look and join the
> > conversation now. http://goparallel.sourceforge.net/_______________________________________________
> > saxon-help mailing list archived at http://saxon.markmail.org/
> > [hidden email]
> > https://lists.sourceforge.net/lists/listinfo/saxon-help
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming The Go Parallel Website, sponsored
> by Intel and developed in partnership with Slashdot Media, is your hub for all
> things parallel software development, from weekly thought leadership blogs to
> news, videos, case studies, tutorials and more. Take a look and join the
> conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> saxon-help mailing list archived at http://saxon.markmail.org/
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/saxon-help 
>



------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help 
Reply | Threaded
Open this post in threaded view
|

Re: SXRE0001: Stack overflow (excessive recursion) during regular expression evaluation

Eric van der Vlist
In reply to this post by Michael Kay
As you said, I am using regular expressions for tokenization and I'll
follow your advise and remove ambiguity!

Thanks,

Eric

Le lundi 30 mars 2015 à 23:13 +0100, Michael Kay a écrit :
> > My real world usage is more complicated than that (I am parsing JSON)
>
> Is it wise to parse JSON using regular expressions? My usual rule of thumb is, if the BNF grammar is recursive, then you can't parse it using regular expressions.
>
> But perhaps you are only tokenizing using regular expressions? I think that when you use regular expressions for tokenizing long strings, you need to be very careful to avoid ambiguous regular expressions that can result in a lot of backtracking. Backtracking is a big performance problem even if you don't hit the stack overflow issue.
>
> Michael Kay
> Saxonica



------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help