CPE102 - How and why to override the equals method in Java...

Two kinds of object equality...

Often in Java programs you need to compare two objects to determine whether they are equal. It turns out there are two different kinds of equality one can determine about objects in Java: reference equality and logical equality. To explore the differences between these two kinds of equality, examine the following code fragment:

// Create three String objects.
String strA = new String("APPLES");
String strB = new String("APPLES");
String strC = new String("ORANGES");

// Create a String reference and assign an existing String's reference to it
// so that both references point to the same String object in memory.
String strD = strA;

// Print out the results of various equality checks
System.out.println(strA == strB);
System.out.println(strA == strC);
System.out.println(strA == strD);

The output of the code fragment above is:

false
false
true

Look carefully at the output of the first comparison: the result of comparing strA to strB (both of whose values are "APPLES") is false! This is because the equality operator (==) compares the references themselves (you can think of them as addresses in memory), not the contents of the objects they refer to. This is known as reference equality. In this case both String objects have exactly the same value, "APPLES", but they are separate objects at different memory locations, so their references are different and the result is false.

The second comparison, between strA and strC, also returns false, as one would expect when comparing "APPLES" and "ORANGES", but not because the characters in the two String objects are different. Rather, it is because the two references refer to two different String objects. This is very subtle: depending on the quality of your tests, you may or may not notice this kind of logical error in your own code.

The third comparison compares strA to strD. In this case strD was assigned the reference held in strA, so strA and strD refer to the same object (the same memory location) and the comparison returns true.

Logical equality compares the data of the objects instead of the value of the references.  Examine the following logical equality checks using the same String object references from the example above:

System.out.println(strA.equals(strB));
System.out.println(strA.equals(strC));
System.out.println(strA.equals(strD));


The output from these comparisons is:

true
false
true

These are the outcomes one would typically expect and want. Notice the use of the equals method instead of the == operator. The String class overrides the equals method it inherits from the Object class and implements logic to compare the two String objects character by character.
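To make "character by character" concrete, here is a minimal sketch of the idea as a static helper. It is a simplified illustration, not the actual JDK source of String.equals (which is more optimized). The class CharCompare and the method sameChars are hypothetical names chosen for this example.

```java
// Hypothetical helper illustrating character-by-character comparison.
// This is NOT the JDK's String.equals implementation, only the idea behind it.
class CharCompare
{
   static boolean sameChars(String a, String b)
   {
      // Same reference (or both null): trivially equal.
      if (a == b)
      {
         return true;
      }
      // Only one of them is null: nothing is equal to null.
      if (a == null || b == null)
      {
         return false;
      }
      // Strings of different lengths can never match.
      if (a.length() != b.length())
      {
         return false;
      }
      // Compare each position; any mismatch means not equal.
      for (int i = 0; i < a.length(); i++)
      {
         if (a.charAt(i) != b.charAt(i))
         {
            return false;
         }
      }
      return true;
   }

   public static void main(String[] args)
   {
      String strA = new String("APPLES");
      String strB = new String("APPLES");
      String strC = new String("ORANGES");
      System.out.println(sameChars(strA, strB)); // true
      System.out.println(sameChars(strA, strC)); // false
   }
}
```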

Why, you might ask, did the String class override the equals method inherited from the Object class?  Because the equals method inherited from Object performs reference equality!  Here is what the implementation of the equals method in Object looks like:

public boolean equals(Object other)
{
   return this == other;
}


The equals method in the Object class does reference equality because it does not know how to do anything else. Remember, every class in Java is an Object (via inheritance). For Object's equals method to work correctly for every class (already written or ever to be written), it would need knowledge of the future and would need to be infinitely large, since an unlimited number of Java classes can be written. That is clearly an impossible task! So Object's equals method does the one thing it can do for every class: strict reference equality.

IMPORTANT: If you want objects of a class you write to be compared using logical equality, you must override the equals method.
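The following minimal sketch shows what happens when a class does not override equals. The class PointNoEquals is hypothetical. Because it inherits Object's equals, two objects holding identical data still compare as not equal, and only an object compared with itself is equal.

```java
// Hypothetical class that does NOT override equals, so it inherits
// Object's reference-equality version.
class PointNoEquals
{
   private final int x;
   private final int y;

   PointNoEquals(int x, int y)
   {
      this.x = x;
      this.y = y;
   }

   public static void main(String[] args)
   {
      PointNoEquals p1 = new PointNoEquals(1, 2);
      PointNoEquals p2 = new PointNoEquals(1, 2);
      PointNoEquals p3 = p1;

      System.out.println(p1.equals(p2)); // false: same data, different objects
      System.out.println(p1.equals(p3)); // true: same object
   }
}
```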


How To Override the equals method inherited from the Object class...

When you override any method you must match the method name, return type, and parameter types exactly.  For the equals method, it must be this:

public boolean equals(Object other)
{
   // Logic here to be discussed below...
}

Notice that the parameter type is Object. It must be Object, or you will have overloaded equals instead of overriding it. The errors that result are subtle: your code will work correctly much of the time but fail some of the time. This happens because Java chooses which overloaded method to call at compile time, based on the declared (static) type of the argument. Runtime binding then only chooses among overriding versions of that already-chosen method. The effect is that, depending on the declared type of the argument passed to equals, sometimes your equals method will execute and sometimes the one inherited from Object will execute, which, you recall, performs reference equality, not logical equality.
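The following sketch reproduces this mistake with a hypothetical Money class whose equals takes a Money parameter, so it overloads rather than overrides. Which version runs depends only on the declared type of the argument. The standard ArrayList.contains method calls equals(Object), so it never reaches the overloaded version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class that OVERLOADS equals (parameter type Money)
// instead of overriding equals(Object). This is the mistake to avoid.
class Money
{
   private final int cents;

   Money(int cents)
   {
      this.cents = cents;
   }

   // Overload, not an override: the parameter type is Money, not Object.
   public boolean equals(Money other)
   {
      return other != null && this.cents == other.cents;
   }

   public static void main(String[] args)
   {
      Money a = new Money(100);
      Money b = new Money(100);
      Object bAsObject = b;

      // Declared type of the argument is Money, so the compiler picks equals(Money).
      System.out.println(a.equals(b));         // true
      // Declared type is Object, so the compiler picks the inherited equals(Object).
      System.out.println(a.equals(bAsObject)); // false
      // ArrayList.contains calls equals(Object), which is Object's version here.
      List<Money> list = new ArrayList<>();
      list.add(b);
      System.out.println(list.contains(a));    // false
   }
}
```

Placing the @Override annotation on an equals method turns this mistake into a compile-time error, because the compiler then checks that the method really overrides a superclass method.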

The next issue is that any reference can be null. If you try to use a null reference (for example, by calling a method on it or accessing one of its fields) you will get a NullPointerException - not good. So, before you use the parameter passed to equals, you must verify that it is not null and, if it is null, return false. This makes sense since something (this) cannot be equal to nothing (null).

public boolean equals(Object other)
{
   if (other == null)
   {
      return false;
   }

   // More logic here to be discussed below...
}
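Here is a minimal sketch of why the null check matters. It uses two hypothetical classes, NoNullCheck and WithNullCheck, that differ only in that check. Each one already includes the class check described in the next step, so focus on the null handling.

```java
// Hypothetical class whose equals forgets the null check.
class NoNullCheck
{
   private final int value;

   NoNullCheck(int value)
   {
      this.value = value;
   }

   @Override
   public boolean equals(Object other)
   {
      // Missing null check: other.getClass() throws NullPointerException when other is null.
      if (this.getClass() != other.getClass())
      {
         return false;
      }
      return this.value == ((NoNullCheck) other).value;
   }
}

// Hypothetical class whose equals checks for null first.
class WithNullCheck
{
   private final int value;

   WithNullCheck(int value)
   {
      this.value = value;
   }

   @Override
   public boolean equals(Object other)
   {
      if (other == null)
      {
         return false; // nothing is equal to null
      }
      if (this.getClass() != other.getClass())
      {
         return false;
      }
      return this.value == ((WithNullCheck) other).value;
   }

   public static void main(String[] args)
   {
      System.out.println(new WithNullCheck(1).equals(null)); // false
      try
      {
         new NoNullCheck(1).equals(null);
      }
      catch (NullPointerException e)
      {
         System.out.println("NoNullCheck threw NullPointerException");
      }
   }
}
```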

Once you determine that the reference is not null, you need to check whether one object (this) is equal to the object passed in (other). Notice that the data type of other is Object, which means any kind of reference can be passed to your equals method. If you assume the parameter has the correct type (the same type as the class in which you are overriding equals), your code will work as long as no other code passes in anything else. That is NOT A GOOD ASSUMPTION, because if you are wrong the consequence is a ClassCastException. You can count on your instructor to pass you unexpected things intentionally, and you can expect your peers in your later professional life to do it unintentionally with surprising frequency! The best way to handle the type issue is to make sure the two objects (this and other) are instances of the same class, as follows:

public boolean equals(Object other)
{
   if (other == null)
   {
      return false;
   }

   if (this.getClass() != other.getClass())
   {
      return false;
   }

   // More logic here to be discussed below...
}

The getClass method (used above) is one of the methods of the Object class and is inherited by all classes. It returns a reference to the Class object that represents the runtime class of the object it is called on. The JVM keeps exactly one Class object for each loaded class, so all instances of the same class share the same Class object. This one-object-per-class arrangement is why it is sometimes described as a singleton. If you call getClass on any two objects of the same runtime type, you will get the same Class reference. This is one of the rare instances where reference equality (or inequality) is what you want, so you do not need to call the equals method of the Class class to compare them. You may, and it will work, but it is slightly less efficient and demonstrates a lack of understanding.
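The sketch below uses two hypothetical classes, BlindCast and CheckedCast, to show both points. BlindCast casts without a type check, so it fails with a ClassCastException when it is handed a String. CheckedCast compares getClass results with != first, so in the same situation it simply returns false. The sketch also shows that two objects of the same class return the identical Class object, so == is enough to compare them.

```java
// Hypothetical class: casts without checking the type first.
class BlindCast
{
   private final int value;

   BlindCast(int value)
   {
      this.value = value;
   }

   @Override
   public boolean equals(Object other)
   {
      if (other == null)
      {
         return false;
      }
      // Assumes other is a BlindCast: throws ClassCastException otherwise.
      return this.value == ((BlindCast) other).value;
   }
}

// Hypothetical class: compares runtime classes before casting.
class CheckedCast
{
   private final int value;

   CheckedCast(int value)
   {
      this.value = value;
   }

   @Override
   public boolean equals(Object other)
   {
      if (other == null)
      {
         return false;
      }
      // One Class object per loaded class, so != on Class references is enough.
      if (this.getClass() != other.getClass())
      {
         return false;
      }
      return this.value == ((CheckedCast) other).value;
   }

   public static void main(String[] args)
   {
      // Same runtime class, so the very same Class object.
      System.out.println(new CheckedCast(1).getClass() == new CheckedCast(2).getClass()); // true
      System.out.println(new CheckedCast(1).equals("ONE")); // false
      try
      {
         new BlindCast(1).equals("ONE");
      }
      catch (ClassCastException e)
      {
         System.out.println("BlindCast threw ClassCastException");
      }
   }
}
```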

Finally, after determining that the reference passed in (other) is not null and refers to an object of the same class as this, you typically check that all of the instance variables of the two objects are equal. This is not a strict requirement, just typical: when you write a class, you decide what equality means and implement it accordingly. Remember that for primitive instance variables you may use the equality operators (== and !=), unless, of course, they are floats or doubles, in which case you have to decide what level of precision you want or need. For instance variables that are object references, you must use the equals method of their classes. Arrays require you to loop through them and compare the elements appropriately, or to use methods of the Java Standard Library (such as java.util.Arrays.equals) that implement those loops for you. For example, let's assume a class called Simple that has two instance variables, an int called x and a String called str. Assuming that str can never be null (a class invariant you must maintain in the rest of the code belonging to the class), here is what a correct equals method might look like:

public boolean equals(Object other)
{
   if (other == null)
   {
      return false;
   }

   if (this.getClass() != other.getClass())
   {
      return false;
   }

   if (this.x != ((Simple)other).x)
   {
      return false;
   }

   if (!this.str.equals(((Simple)other).str))
   {
      return false;
   }

   return true;
}

Notice that the use of this in the example above is not necessary and is included for clarity only. Also notice that there are many logically equivalent ways to write the code above. You may write your equals methods in any logically equivalent manner you wish.
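One logically equivalent style is shown below. It casts other once into a local variable and combines the comparisons into a single boolean expression. It also extends the idea to the double and array fields mentioned earlier. The class Reading and its fields are hypothetical. For the double, it uses Double.compare, which gives exact comparison; this is only one possible precision policy, and note that Double.compare treats NaN as equal to NaN and 0.0 as different from -0.0. For the array, it uses java.util.Arrays.equals.

```java
import java.util.Arrays;

// Hypothetical class extending the Simple example with a double and an array field.
// As with Simple, str and samples are assumed never to be null (class invariant).
class Reading
{
   private final int x;
   private final String str;
   private final double value;
   private final int[] samples;

   Reading(int x, String str, double value, int[] samples)
   {
      this.x = x;
      this.str = str;
      this.value = value;
      this.samples = samples.clone(); // copy so outside code cannot change our array
   }

   @Override
   public boolean equals(Object other)
   {
      if (other == null || this.getClass() != other.getClass())
      {
         return false;
      }
      Reading that = (Reading) other; // safe: the classes match
      return this.x == that.x
         && this.str.equals(that.str)
         // Exact comparison of the doubles (one possible precision policy).
         && Double.compare(this.value, that.value) == 0
         // Element-by-element comparison of the two arrays.
         && Arrays.equals(this.samples, that.samples);
   }
}
```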

Inheritance and the equals Method...

When overriding the equals method in classes making use of inheritance it is important to keep the super class and sub-class as loosely coupled as possible.  Loosely coupled classes make as few assumptions about each other as possible making the code more maintainable over time.  A secondary goal is to avoid duplication of code.  Let's take a look at an example involving three classes, A, B, and C as follows:

public class A
{
   // Class implementation not shown
}

public class B extends A
{
   // Class implementation not shown
}

public class C extends B
{
   // Class implementation not shown
}

As you can see, class A is a sub-class of Object by implicit extension, B is a sub-class of A by explicit extension, and C is a sub-class of B by explicit extension.  Now look at some code making use of some class C objects and checking to see if two objects are equal:

C c1, c2;
c1 = getSomeRandomCObject();
c2 = getSomeRandomCObject();

if (c1.equals(c2))
{
   // Do something, anything, it is not germane to the discussion
}

When class C overrides the equals method, it must check that the object reference passed in is not null, that it refers to an object of the same class (not a requirement, but typical), and that all of the instance data is equal, including all of the superclass instance variables it inherited. Assuming all of C's superclasses maintain strict encapsulation (private instance variables), this presents a problem: how do you access the private instance variables of your superclasses? One solution would be to use the accessor methods (get-methods) to obtain the data of the superclasses, but this is an example of tight coupling - a bad idea! It requires detailed knowledge of the superclasses: the sub-class must know all of their instance data and all of the methods that access it. This approach is more work, more likely to be wrong in the first place, and less likely to be maintained properly in the second place. Imagine someone else wrote class A and you are simply extending it. By calling the accessor methods of class A in class C's implementation of equals, the class C code becomes tightly coupled to class A's implementation. Now imagine that the person who wrote class A makes changes. How likely are they to realize that class C may be affected by those changes? Even if they notice, will they know how to modify class C's implementation to work with the new class A? As you can see, even a relatively simple change to class A that should be isolated has now become more complicated and error prone.

How do you solve this issue?  You reuse your superclass's implementation of equals when you override equals in your class as follows:

public class C extends B
{
   // Class implementation not shown except for equals

   public boolean equals(Object other)
   {
      if (!super.equals(other))
      {
         return false;
      }

      // Rest of equals method here, i.e., check all class C instance variables for equality
   }
}

Notice the absence of the null check and class-type check that were stressed earlier in this document. If you do them here, your code will be logically correct, but you will be duplicating logic, wasting CPU cycles, and contributing to global warming - all unnecessary evils. To see why, let's follow the flow of execution. The super.equals(other) call in the first line of C's equals method calls its superclass's implementation of equals, specifically class B's equals method, which should also call super.equals in its first line, as follows:

public class B extends A
{
   // Class implementation not shown except for equals

   public boolean equals(Object other)
   {
      if (!super.equals(other))
      {
         return false;
      }

      // Rest of equals method here, i.e., check all class B instance variables for equality
   }
}

This super.equals call in class B's equals method calls class A's equals method and class A's equals method looks like this:

public class A
{
   // Class implementation not shown except for equals

   public boolean equals(Object other)
   {
      if (other == null)
      {
         return false;
      }

      if (this.getClass() != other.getClass())
      {
         return false;
      }

      // Rest of equals method here, i.e., check all class A instance variables for equality
   }
}

Remember that class A's superclass is, implicitly, Object. Notice that A's equals does not call super.equals. This is because Object's equals does reference equality, and the entire reason we are overriding equals is to get logical equality instead. Also notice that if we do the null check first in the topmost superclass below Object, then none of its sub-classes needs to do it directly. The check is performed via the super.equals call before the reference is ever used, which avoids any possibility of a NullPointerException. Likewise, the class-type check can be done in that same topmost class and reused by the sub-classes via the super.equals call. Because getClass returns the runtime class of the object, the check in A compares the actual classes of this and other (for example, C against C) even though the code lives in A. This also means that, once super.equals has returned true, casting other to B or C in the sub-classes' equals methods is safe. Now the classes are loosely coupled, meaning each can change its implementation without affecting the implementation of any sub-class or superclass!
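Putting the three pieces together, here is a minimal runnable sketch of the A, B, C hierarchy. The original classes' implementations are not shown, so the fields here are hypothetical: an int id in A, a String name in B, and a double weight in C. Only A checks for null and compares classes. B and C call super.equals first and then compare only their own fields.

```java
// Topmost class below Object: does the null check and the class check once.
class A
{
   private final int id;

   A(int id)
   {
      this.id = id;
   }

   @Override
   public boolean equals(Object other)
   {
      if (other == null)
      {
         return false;
      }
      // getClass() is the runtime class, so a C is compared against C, not A.
      if (this.getClass() != other.getClass())
      {
         return false;
      }
      return this.id == ((A) other).id;
   }
}

class B extends A
{
   private final String name; // assumed never null (class invariant)

   B(int id, String name)
   {
      super(id);
      this.name = name;
   }

   @Override
   public boolean equals(Object other)
   {
      // A's equals handles null, the class check, and A's fields.
      if (!super.equals(other))
      {
         return false;
      }
      // Safe cast: super.equals returned true, so other has the same runtime class as this.
      return this.name.equals(((B) other).name);
   }
}

class C extends B
{
   private final double weight;

   C(int id, String name, double weight)
   {
      super(id, name);
      this.weight = weight;
   }

   @Override
   public boolean equals(Object other)
   {
      // B's equals (and through it, A's) handles everything except C's own field.
      if (!super.equals(other))
      {
         return false;
      }
      return Double.compare(this.weight, ((C) other).weight) == 0;
   }
}
```

With these classes, new C(1, "x", 2.0).equals(new C(1, "x", 2.0)) is true. Comparing a B with a C that holds the same id and name is false, because A's getClass check sees two different runtime classes.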