Hashing NodeInfos

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Hashing NodeInfos

Wolfgang Hoschek-2
Hi Mike,

I'd like to improve the efficiency of the Nux DefaultResultSequence  
on dealing with non-XOM nodes. To fully retain identities for XOM  
nodes converted from non-XOM NodeInfos (mostly tinytree nodes  
stemming from XQuery node constructors), it maintains a HashMap
(NodeInfo --> XOM Node). The NodeInfos can come from more than one  
document. Since NodeInfo has no useful equals() and hashCode() the  
keys into the map use isSameNode() and generateId().hashCode()  
instead, as shown below. The problem is that generateId() is a very  
expensive method, involving multiple String concatenations. It shows  
up as a significant hotspot. Is there a way to compute a faster  
hashCode? For example, would it help to make the sequence number in  
tinytrees publicly readable? Any other suggestions welcome :-)

Thanks,
Wolfgang.

     /** A key in the "alreadyConverted" map */
     private static final class NodeInfoIdentityKey {
         private final NodeInfo key;

         public NodeInfoIdentityKey(NodeInfo key) {
             this.key = key;
         }

         public boolean equals(Object other) {
             if (other instanceof NodeInfoIdentityKey) {
                 return this.key.isSameNodeInfo(((NodeInfoIdentityKey)
other).key);
             }
             return false;
         }

         public int hashCode() {
             return this.key.generateId().hashCode();
//            key.getDocumentNumber(), key.getNodeKind(),  
key.tree.nodeNr;
         }
     }



-------------------------------------------------------
This SF.Net email is sponsored by the JBoss Inc.  Get Certified Today
Register for a JBoss Training Course.  Free Certification Exam
for All Training Attendees Through End of 2005. For more info visit:
http://ads.osdn.com/?ad_id=7628&alloc_id=16845&op=click
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help
Reply | Threaded
Open this post in threaded view
|

RE: Hashing NodeInfos

Michael Kay
Sorry not to respond to this sooner. I've been pondering it.

It would make sense in many ways to require NodeInfo to implement equals()
and hashCode() methods that test for node identity (so equals() becomes a
synonym of isSameNodeInfo()). Probably the main reason this wasn't done
originally was that NodeInfo also implemented DOM interfaces. I'm a little
hesitant to do it now because I know there are quite a few third-party
implementations of NodeInfo around. But I think this is the right way to go.
I'll just do it with a warning that there may be implementations of NodeInfo
that don't behave correctly.

Michael Kay
http://www.saxonica.com/


> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of
> Wolfgang Hoschek
> Sent: 18 November 2005 16:32
> To: [hidden email]
> Subject: [saxon] Hashing NodeInfos
>
> Hi Mike,
>
> I'd like to improve the efficiency of the Nux DefaultResultSequence  
> on dealing with non-XOM nodes. To fully retain identities for XOM  
> nodes converted from non-XOM NodeInfos (mostly tinytree nodes  
> stemming from XQuery node constructors), it maintains a HashMap
> (NodeInfo --> XOM Node). The NodeInfos can come from more than one  
> document. Since NodeInfo has no useful equals() and hashCode() the  
> keys into the map use isSameNode() and generateId().hashCode()  
> instead, as shown below. The problem is that generateId() is a very  
> expensive method, involving multiple String concatenations. It shows  
> up as a significant hotspot. Is there a way to compute a faster  
> hashCode? For example, would it help to make the sequence number in  
> tinytrees publicly readable? Any other suggestions welcome :-)
>
> Thanks,
> Wolfgang.
>
>      /** A key in the "alreadyConverted" map */
>      private static final class NodeInfoIdentityKey {
>          private final NodeInfo key;
>
>          public NodeInfoIdentityKey(NodeInfo key) {
>              this.key = key;
>          }
>
>          public boolean equals(Object other) {
>              if (other instanceof NodeInfoIdentityKey) {
>                  return
> this.key.isSameNodeInfo(((NodeInfoIdentityKey)
> other).key);
>              }
>              return false;
>          }
>
>          public int hashCode() {
>              return this.key.generateId().hashCode();
> //            key.getDocumentNumber(), key.getNodeKind(),  
> key.tree.nodeNr;
>          }
>      }
>
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by the JBoss Inc.  Get Certified Today
> Register for a JBoss Training Course.  Free Certification Exam
> for All Training Attendees Through End of 2005. For more info visit:
> http://ads.osdn.com/?ad_id=7628&alloc_id=16845&op=click
> _______________________________________________
> saxon-help mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/saxon-help
>




-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help
Reply | Threaded
Open this post in threaded view
|

Re: Hashing NodeInfos

Wolfgang Hoschek-2
Thanks! Make sense to me.

How would the implementation of hashCode for tiny tree nodes look  
like? Just some cheap integer manipulation on the documentId and  
sequence number (preferable), or the heavy string generation implied  
by generateID()? HashMaps require that equals(a,b) always implies  
hashCode(a) = hashCode(b).

Wolfgang.


On Nov 24, 2005, at 1:30 AM, Michael Kay wrote:

> Sorry not to respond to this sooner. I've been pondering it.
>
> It would make sense in many ways to require NodeInfo to implement  
> equals()
> and hashCode() methods that test for node identity (so equals()  
> becomes a
> synonym of isSameNodeInfo()). Probably the main reason this wasn't  
> done
> originally was that NodeInfo also implemented DOM interfaces. I'm a  
> little
> hesitant to do it now because I know there are quite a few third-party
> implementations of NodeInfo around. But I think this is the right  
> way to go.
> I'll just do it with a warning that there may be implementations of  
> NodeInfo
> that don't behave correctly.
>
> Michael Kay
> http://www.saxonica.com/
>
>
>
>> -----Original Message-----
>> From: [hidden email]
>> [mailto:[hidden email]] On Behalf Of
>> Wolfgang Hoschek
>> Sent: 18 November 2005 16:32
>> To: [hidden email]
>> Subject: [saxon] Hashing NodeInfos
>>
>> Hi Mike,
>>
>> I'd like to improve the efficiency of the Nux DefaultResultSequence
>> on dealing with non-XOM nodes. To fully retain identities for XOM
>> nodes converted from non-XOM NodeInfos (mostly tinytree nodes
>> stemming from XQuery node constructors), it maintains a HashMap
>> (NodeInfo --> XOM Node). The NodeInfos can come from more than one
>> document. Since NodeInfo has no useful equals() and hashCode() the
>> keys into the map use isSameNode() and generateId().hashCode()
>> instead, as shown below. The problem is that generateId() is a very
>> expensive method, involving multiple String concatenations. It shows
>> up as a significant hotspot. Is there a way to compute a faster
>> hashCode? For example, would it help to make the sequence number in
>> tinytrees publicly readable? Any other suggestions welcome :-)
>>
>> Thanks,
>> Wolfgang.
>>
>>      /** A key in the "alreadyConverted" map */
>>      private static final class NodeInfoIdentityKey {
>>          private final NodeInfo key;
>>
>>          public NodeInfoIdentityKey(NodeInfo key) {
>>              this.key = key;
>>          }
>>
>>          public boolean equals(Object other) {
>>              if (other instanceof NodeInfoIdentityKey) {
>>                  return
>> this.key.isSameNodeInfo(((NodeInfoIdentityKey)
>> other).key);
>>              }
>>              return false;
>>          }
>>
>>          public int hashCode() {
>>              return this.key.generateId().hashCode();
>> //            key.getDocumentNumber(), key.getNodeKind(),
>> key.tree.nodeNr;
>>          }
>>      }
>>
>>
>>
>> -------------------------------------------------------
>> This SF.Net email is sponsored by the JBoss Inc.  Get Certified Today
>> Register for a JBoss Training Course.  Free Certification Exam
>> for All Training Attendees Through End of 2005. For more info visit:
>> http://ads.osdn.com/?ad_id=7628&alloc_id=16845&op=click
>> _______________________________________________
>> saxon-help mailing list
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/saxon-help
>>
>>
>
>
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc. Do you grep through  
> log files
> for problems?  Stop!  Download the new AJAX search engine that makes
> searching your log files as easy as surfing the  web.  DOWNLOAD  
> SPLUNK!
> http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
> _______________________________________________
> saxon-help mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/saxon-help
>



-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help
Reply | Threaded
Open this post in threaded view
|

RE: Hashing NodeInfos

Michael Kay
> Just some cheap integer manipulation on the documentId and  
> sequence number (preferable)

Yes indeed. Haven't thought of the details, probably something like
((documentId & 1024)<<22) ^ nodeNr.

Michael Kay
http://www.saxonica.com/




-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help
Reply | Threaded
Open this post in threaded view
|

Re: Hashing NodeInfos

Wolfgang Hoschek-2
Sounds good :-)
Wolfgang.

On Nov 24, 2005, at 2:31 PM, Michael Kay wrote:

>> Just some cheap integer manipulation on the documentId and
>> sequence number (preferable)
>>
>
> Yes indeed. Haven't thought of the details, probably something like
> ((documentId & 1024)<<22) ^ nodeNr.
>
> Michael Kay
> http://www.saxonica.com/
>
>
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc. Do you grep through  
> log files
> for problems?  Stop!  Download the new AJAX search engine that makes
> searching your log files as easy as surfing the  web.  DOWNLOAD  
> SPLUNK!
> http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
> _______________________________________________
> saxon-help mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/saxon-help
>



-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
saxon-help mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/saxon-help