HashCode question

Since I’ve settled on EF4, I’ve gone through and reworked a lot of things, including my base entity. I dropped a lot from my old one based on comments here about domain equality, but I have a question about hash codes. Below is my current rewrite. I am satisfied with everything as-is, but was wondering whether there is any reason to do more with GetHashCode?

namespace Genesis.Infrastructure.Domain
{
    /// <summary>
    /// Entity base class.
    /// </summary>
    /// <typeparam name="T">Type of the Id property.</typeparam>
    public abstract class Entity<T>
    {
        /// <summary>
        /// The Id property, settable only by the orm.
        /// </summary>
        public T Id { get; protected set; }
        /// <summary>
        /// Determines if the entity is transient or not.
        /// </summary>
        public bool IsTransient { get { return System.Collections.Generic.EqualityComparer<T>.Default.Equals(Id, default(T)); } }
        /// <summary>
        /// Performs domain comparison on the supplied entity.
        /// </summary>
        /// <param name="obj">The entity to compare to.</param>
        /// <returns>bool</returns>
        public override bool Equals(object obj)
        {
            
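            // return true if reference is the same
            if (ReferenceEquals(this, obj)) return true;
            // a null or a non-entity can never be equal (also avoids a NullReferenceException)
            var other = obj as Entity<T>;
            if (ReferenceEquals(other, null)) return false;
            // return true if the Id is the same
            if (Equals(this.Id, other.Id)) return true;
            // resort to default (calling Equals(this, obj) here would recurse forever)
            return base.Equals(obj);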
        
        }
        /// <summary>
        /// Returns the hash code for this entity.
        /// </summary>
        /// <returns>int</returns>
        public override int GetHashCode()
        {
            
            // resort to default
            return base.GetHashCode();
        
        }
        /// <summary>
        /// Compares two entities.
        /// </summary>
        /// <param name="primary">The left-side entity.</param>
        /// <param name="secondary">The right-side entity.</param>
        /// <returns>bool</returns>
        public static bool operator ==(Entity<T> primary, Entity<T> secondary)
        {
            // use our custom Equals method; the static Equals copes with null operands
            return Equals(primary, secondary);
        }
        /// <summary>
        /// Compares two entities.
        /// </summary>
        /// <param name="primary">The left-side entity.</param>
        /// <param name="secondary">The right-side entity.</param>
        /// <returns>bool</returns>
        public static bool operator !=(Entity<T> primary, Entity<T> secondary)
        {
            // use our custom Equals method; the static Equals copes with null operands
            return !Equals(primary, secondary);
        }
    }
}

When you compare a value object with a different value object you still have to iterate over each property to see whether they’re equal, so iterating over a cached list or over each property is the same. You have a point about the hash code though: caching it in the same way the Entity class does would be more efficient.

The best way I can answer that is with another code paste of some unit tests. I’ve just removed the inheritance from ‘Entity’ on the MockEntity (it’s now renamed to MockObject) and copied in the IsTransient and Id code - no equality overrides at all.

Nine of the tests failed, tests that passed with the equality overrides. Whether those failures matter depends on the application; in this case they do. Every test on non-transient (persisted) objects that passed before now fails, because equality is determined by the object as a whole rather than just its Id. Likewise with hashcode equality: entities whose Ids are the same but whose properties differ no longer generate the same hashcode.

In fact, every hashcode test that previously expected equality to be true now fails, as does every test on persisted objects that previously passed because of identical Id values.

http://codepaste.net/78errd

p.s. I make it my mission to stay away from the pc over the weekends, and it’s 10.30pm here now, so I probably won’t reply until Monday after this post

Ok. I looked at the code above and really liked the simplicity of it. It is very similar to what I had before I started removing things. The two main differences being:

  1. If both entities were transient, I checked only properties marked with a signature attribute instead of treating it like a value object. Your way actually makes a lot more sense though.

  2. I used a propertycache class that cached properties on first call, and stored them internally by type, making future property queries less demanding on resources.
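For anyone following along, a property cache like the one described in point 2 could look roughly like this. It is only a sketch: the name `PropertyCache` and the method `For` are made up for illustration and are not the class from the original code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;

// Hypothetical helper: remembers the public instance properties of each type
// so the reflection lookup is not repeated on every equality check.
public static class PropertyCache
{
    private static readonly ConcurrentDictionary<Type, PropertyInfo[]> Cache =
        new ConcurrentDictionary<Type, PropertyInfo[]>();

    public static PropertyInfo[] For(Type type)
    {
        // GetOrAdd returns the stored array when the type has been seen before,
        // otherwise it reflects over the type and stores the result.
        return Cache.GetOrAdd(type, t =>
            t.GetProperties(BindingFlags.Public | BindingFlags.Instance));
    }
}
```

Calling `PropertyCache.For(typeof(SomeValueObject))` twice should hand back the same array instance the second time, which is where the saving comes from.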

Overall though, here’s something to consider. Before I knew what DDD was, and that my “entities” should be more than just data bound classes, it never once occurred to me to override either equals or gethashcode, and things worked just fine. Typically, objects were short lived enough (usually no longer than the duration of a single mvc action) that it really wasn’t required. At least, I never saw any ill effect from not having used it at the time. So I am wondering now, given that I am devolving things back to that point in time, if these overrides actually benefit me or not. Creating a new entity, or fetching an existing one, changing a few properties, and saving, doesn’t leave much opportunity to make use of it. Or so it would seem. What are your thoughts on that?

One step at a time now…here are my current ValueObject and ValueObjectTests as they sit at the moment. I am happy with them so far. Please let me know if there is anything I missed in terms of code or test. All current tests pass, so unless I did miss something, this should be pretty solid.

http://codepaste.net/qu8nud - ValueObject.cs
http://codepaste.net/yfc4hw - ValueObjectTests.cs

I’ll post entity stuff once I verify that I want to stick with the above.

I found these in an old source folder, they might be of use (the comments maybe anyway). The tests pass so I assume I finished them.

ValueObject: http://codepaste.net/s3pnr4
ValueObjectTests: http://codepaste.net/7b6amf
Entity: http://codepaste.net/qfbrhp
EntityTests: http://codepaste.net/enkx8k

I’m certainly no expert on it, I only know what I’ve read and researched, so I’d read into it further for a better understanding.

I’m not 100% certain, but I think anything that compares through the IEqualityComparer interface will be using the GetHashCode method as part of its equality comparisons. If I remember right, the built-in collections that do this include Hashtable and OrderedDictionary (and the generic Dictionary and HashSet), but there may be more, plus any custom collections that use an IEqualityComparer.

Unless you’re using those then I guess you don’t have to override it, but it’s recommended if you’re overriding Equals. If you are overriding Equals then you’re likely comparing object values (fields, properties) rather than object references (object identity, not an Entity.Id). If that’s the case you should really be overriding GetHashCode as well to ensure consistency.
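A small sketch of why the two overrides belong together. The types `EqualsOnly` and `EqualsAndHash` are hypothetical and exist only for this illustration. As far as I know, hash-based lookups (HashSet, Dictionary, Hashtable, and LINQ operators such as Distinct or GroupBy that use the default equality comparer) check the hash code before they ever call Equals, so an Equals override on its own tends to go unnoticed by them.

```csharp
using System.Collections.Generic;

#pragma warning disable 0659 // Equals overridden without GetHashCode, on purpose
// Hypothetical type: overrides Equals but keeps the default GetHashCode.
public class EqualsOnly
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as EqualsOnly;
        return other != null && other.Id == Id;
    }
}
#pragma warning restore 0659

// Hypothetical type: overrides both, using the same field for each.
public class EqualsAndHash
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as EqualsAndHash;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode() { return Id.GetHashCode(); }
}

public static class HashContractDemo
{
    // Looks up a second instance with the same Id; both overrides are in place.
    public static bool FoundWithHashOverride()
    {
        var set = new HashSet<EqualsAndHash> { new EqualsAndHash { Id = 1 } };
        return set.Contains(new EqualsAndHash { Id = 1 });
    }

    // Same lookup, but the default hash codes of the two instances will
    // almost certainly differ, so the set never gets as far as Equals.
    public static bool FoundWithoutHashOverride()
    {
        var set = new HashSet<EqualsOnly> { new EqualsOnly { Id = 1 } };
        return set.Contains(new EqualsOnly { Id = 1 });
    }
}
```

`FoundWithHashOverride()` returns true. `FoundWithoutHashOverride()` will in practice return false even though `Equals` says the two objects are equal, which is the inconsistency the compiler warning is about.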

NP Matt, have a good weekend, and I’ll see you next week.

It seems the issue is split but related. One is the issue of equality, and the other that of generating a hash code. I suppose what I really need to know is: what is the importance of hash codes?

As far as equality goes, I was also using a Signature attribute at one point, but had decided that it really wasn’t doing all that much, since an entity really gets its identity from its “id”, and if it doesn’t have an id, it really doesn’t matter. If two transient entities have the same properties, I considered them different, or at least, not the same (hope that made sense).

For my value objects, I grab a cached version of GetProperties and compare.

That takes care of equality, but is there any reason at all why I should be doing something specific as opposed to just using the default for hash codes? What are they used for exactly?

I should probably clarify that because of what I said about the ‘Signature’ attribute approach.

I iterate through each property in the class that isn’t null or still at its default value and combine them into the object’s overall hash code. That makes sense in a value object: if the properties aren’t equal they can’t be the same thing. Address1(Flat 1, That Road, That City) is not the same as Address2(Flat 2, That Road, That City), but it is the same as Address3(Flat 1, That Road, That City).
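To make the address example concrete, here is a minimal sketch. It lists the properties by hand instead of reflecting over them, and the class and property names are assumptions for illustration rather than the code from the linked pastes. The 17/31 constants are just a common way of combining hashes.

```csharp
using System;

// Hypothetical value object; property names are illustrative only.
public sealed class Address
{
    public string Line1 { get; private set; }
    public string Street { get; private set; }
    public string City { get; private set; }

    public Address(string line1, string street, string city)
    {
        Line1 = line1;
        Street = street;
        City = city;
    }

    // Value equality: every property has to match.
    public override bool Equals(object obj)
    {
        var other = obj as Address;
        return other != null
            && string.Equals(Line1, other.Line1)
            && string.Equals(Street, other.Street)
            && string.Equals(City, other.City);
    }

    // Combines the hash codes of the non-null properties.
    public override int GetHashCode()
    {
        unchecked // let the arithmetic wrap around instead of throwing
        {
            int hash = 17;
            foreach (var value in new[] { Line1, Street, City })
            {
                if (value != null) hash = hash * 31 + value.GetHashCode();
            }
            return hash;
        }
    }
}
```

With this, `new Address("Flat 1", "That Road", "That City")` equals a second instance built from the same values and shares its hash code, while the "Flat 2" address is not equal to either.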

If the addresses were an entity they’d have an Id. If they’re not transient (have been persisted/have an id) then just compare the Id value - if they’re equal then they’re the same thing. If they are transient then effectively they’re a value object, an entity without an Id. In that case you want to compare the property values to see if they’re the same.

TransientAddress1(Flat 1, That Road, That City)
TransientAddress2(Flat 1, That Road, That City)

I suppose you could argue that they’re not the same, because they’re two different entity objects regardless of what properties they hold, but without being persisted I think they are equal, they contain the same values. You could persist one, in which case they’d no longer be equal because one would have an Id and one wouldn’t. You could then persist the other, they’d now contain the same values but are no longer equal because you’ve persisted them as separate entities, with different Ids. It’s then down to your validation/constraint checks to determine whether a persisted address (as an entity) exists with the same property values and how you’d like to proceed.

If only comparing Ids I’d also first check whether the objects are of the same type before doing anything else. You could end up comparing the Ids of two entities that both have an Id of 100, but one could be a Recipe and one could be a House; in that case they’re clearly not the same thing despite having equal Ids.
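The Recipe/House case can be sketched like this. `EntityBase` and the two subclasses are hypothetical names, and the transient case is deliberately left out to keep the type check in focus.

```csharp
using System;

// Hypothetical base class; only persisted-style Id comparison is shown here.
public abstract class EntityBase
{
    public int Id { get; protected set; }

    public override bool Equals(object obj)
    {
        var other = obj as EntityBase;
        if (ReferenceEquals(other, null)) return false;
        if (ReferenceEquals(this, other)) return true;
        // Different concrete types are never the same entity, whatever their Ids.
        if (GetType() != other.GetType()) return false;
        return Id == other.Id;
    }

    public override int GetHashCode()
    {
        // Mix the type into the hash so a Recipe and a House with the same Id
        // do not have to land in the same bucket.
        unchecked { return (GetType().GetHashCode() * 397) ^ Id; }
    }
}

public class Recipe : EntityBase
{
    public Recipe(int id) { Id = id; }
}

public class House : EntityBase
{
    public House(int id) { Id = id; }
}
```

Here `new Recipe(100).Equals(new House(100))` is false and `new Recipe(100).Equals(new Recipe(100))` is true. One caveat: if the ORM returns runtime proxy subclasses of your entities, an exact `GetType()` comparison like this may need adjusting.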

In my value object’s GetHashCode I aggregate over each non-null property to generate the object hashcode for equality.

My entity then derives from the value object, overriding the Equals and GetHashCode methods. If my entity is transient (yet to be persisted) I bubble down to the value object’s GetHashCode, which builds the hash from the property values. If the two entities aren’t transient I ignore any property values and generate the hashcode from the entity id alone.

It’s been a while since I used/looked at it but I can upload it as a zip and pm you a link if you want to look at how I’ve done it. I would paste it here but it’s 150 lines of code between the two classes, plus unit tests.

The hash code in this case is important because your method of determining entity equality changes depending on its state. The hashcode for a transient entity depends on the values of its (important) fields; the hashcode of a persisted entity depends solely on its Id value.

If the object has been persisted and has an Id (is not transient) then the only determining factor in equality between two entities is their Id values, so the only hashcode for a persisted entity that matters is that of the Id. If you somehow had two objects with identical Ids but differing values and generated the hashcode over every field/property, their hashcodes would differ and they’d never be treated as equal despite their Ids being the same, but one of those objects probably contains incorrect or stale data (unless there’s been some insertion error where two ids have clashed).

If the entity is transient it depends on your application circumstances or on each individual field. Most entities won’t have specific unique fields, and in that case you could just calculate the hashcode based on every field. If there are particular fields that determine unique instances, just override the hashcode generation and generate it from those specific fields alone (signature attributes).
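Put together, the state-dependent approach might look something like the sketch below. `Customer`, its properties and `AssignId` are invented for illustration (`AssignId` just stands in for the ORM setting the Id on save), and the type check is omitted for brevity.

```csharp
using System;

// Hypothetical entity; names are illustrative only.
public class Customer
{
    public int Id { get; private set; }
    public string Name { get; set; }
    public string Email { get; set; }

    public bool IsTransient { get { return Id == 0; } }

    // Stand-in for the ORM assigning an Id when the entity is saved.
    public void AssignId(int id) { Id = id; }

    public override bool Equals(object obj)
    {
        var other = obj as Customer;
        if (ReferenceEquals(other, null)) return false;
        if (ReferenceEquals(this, other)) return true;
        // Both persisted: identity is the Id alone.
        if (!IsTransient && !other.IsTransient) return Id == other.Id;
        // Both transient: fall back to comparing the values.
        if (IsTransient && other.IsTransient)
            return Name == other.Name && Email == other.Email;
        // One persisted and one not: never equal.
        return false;
    }

    public override int GetHashCode()
    {
        // Persisted: hash the Id only.
        if (!IsTransient) return Id.GetHashCode();
        // Transient: hash the non-null values.
        unchecked
        {
            int hash = 17;
            if (Name != null) hash = hash * 31 + Name.GetHashCode();
            if (Email != null) hash = hash * 31 + Email.GetHashCode();
            return hash;
        }
    }
}
```

One thing to watch with this design: the hash code changes at the moment the entity gets its Id, so an entity that was added to a HashSet or used as a Dictionary key while transient may not be found there again after it has been saved.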

http://stackoverflow.com/questions/371328/why-is-it-important-to-override-gethashcode-when-equals-method-is-overriden-in-c

http://devlicio.us/blogs/billy_mccafferty/archive/2007/04/25/using-equals-gethashcode-effectively.aspx

Thanks Matt. I think I understand that a bit better now. But I still feel a bit lost. Maybe it’s the generational gap thing, who knows. Could you answer this one other question then?

  • What object or collection, other than HashSet, makes calls to GetHashCode?

Currently, I have not a single instance in my code where I personally call Equals OR GetHashCode to compare two entities, nor do I use HashSets. Is this kind of equality testing something I even need to bother with then?