CPE102 - How and why to override the equals method in Java...
Two kinds of object equality...
Often in Java programs you need to
compare two objects to determine if they are equal or not. It
turns out there are two different kinds of equality one can determine
about objects in Java: reference equality and logical equality. To explore the differences between these two kinds of equality, examine the following code fragment:
// Create three String objects.
String strA = new String("APPLES");
String strB = new String("APPLES");
String strC = new String("ORANGES");
// Create a String reference and assign an existing String's reference to it
// so that both references point to the same String object in memory.
String strD = strA;
// Print out the results of various equality checks
System.out.println(strA == strB);
System.out.println(strA == strC);
System.out.println(strA == strD);
The output of the code fragment above is:
false
false
true
Look carefully at the output of the first comparison - the result of
comparing strA to strB (both of whose values are "APPLES") is false!
This is because the equality operator (==) compares the
references (addresses in memory) of the two Strings, treating them as
two numbers - this is known as reference equality.
In this case both String objects have the exact same value, i.e.,
"APPLES" but they are in different physical memory locations so their
references (addresses in memory) are different and the result is false.
The second comparison between strA and strC also returns false as one
would expect when comparing "APPLES" and "ORANGES" but not because the
characters in the two String objects are different. Rather, it is
because the references of the two different String objects are
different addresses and, therefore, not numerically equal. Very
subtle - depending on the quality of your tests you may or may not
notice this kind of logical error in your own code.
The third comparison compares the reference value strA to strD.
In this case strD was assigned the reference (memory address) of
strA, so strA and strD hold the same reference (memory location)
value and the comparison returns true.
Logical equality
compares the data of the objects instead of the value of the
references. Examine the following logical equality checks using
the same String object references from the example above:
System.out.println(strA.equals(strB));
System.out.println(strA.equals(strC));
System.out.println(strA.equals(strD));
The output from these comparisons is:
true
false
true
These are the outcomes one would typically expect and want. Notice the use of the equals method instead of the == operator.
The String class overrides the equals method it inherited from
the Object class and implements logic to compare the two String
objects character by character.
Why, you might ask, did the String class override the equals method
inherited from the Object class? Because the equals method
inherited from Object performs reference equality! Here is what the implementation of the equals method in Object looks like:
public boolean equals(Object other)
{
    return this == other;
}
The reason the equals method in the Object class does reference
equality is because it does not know how to do anything else.
Remember, every class in Java is an Object (via inheritance). For
the Object class's equals method to work correctly for every class
(already written or ever to be written in the future) it would need
knowledge of the future and need to be infinitely large since there are
an infinite number of Java classes that can be written - clearly an
impossible task! So, Object's equals method does the best it can
without trying too hard - strict reference equality.
IMPORTANT: If you want a class that you write to be able to perform logical equality, you must override the equals method.
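To see the inherited behavior in action, here is a minimal sketch. The class names DefaultEqualsDemo and Fruit are hypothetical and exist only for this illustration. Fruit does not override equals, so the call falls through to the version in Object:

```java
// Hypothetical demo classes, not part of the original handout.
class DefaultEqualsDemo
{
    public static void main(String[] args)
    {
        Fruit a = new Fruit("APPLES");
        Fruit b = new Fruit("APPLES");
        Fruit c = a;

        // Fruit does not override equals, so Object's version runs
        // and compares references only.
        System.out.println(a.equals(b)); // false: two distinct objects
        System.out.println(a.equals(c)); // true: same object
    }
}

class Fruit
{
    private final String name;

    Fruit(String name)
    {
        this.name = name;
    }
}
```

Even though a and b hold the same name, they are not considered equal until Fruit supplies its own equals method.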
How To Override the equals method inherited from the Object class...
When you override any method you must match the method name, return type, and parameter types
exactly. For the equals method, it must be this:
public boolean equals(Object other)
{
    // Logic here to be discussed below...
}
Notice that the parameter type is Object - it must be Object or you
will have overloaded equals instead of overriding it. The errors
that can occur when you do this are subtle. Your code will work
correctly much of the time but fail some of the time. The compiler
picks which equals to call from the declared type of the argument,
so sometimes your equals method will execute and sometimes the one
inherited from Object will be executed instead, which, you recall,
performs reference equality, not logical equality.
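The overloading pitfall can be sketched as follows. The class names OverloadDemo and Overloaded are hypothetical, and the equals method below is deliberately written the wrong way (with a parameter of the class's own type) to show the symptom:

```java
// Hypothetical demo classes, not part of the original handout.
class OverloadDemo
{
    public static void main(String[] args)
    {
        Overloaded p = new Overloaded(1);
        Overloaded q = new Overloaded(1);
        Object qAsObject = q; // same object, declared as Object

        // Argument declared as Overloaded: the overload below is called.
        System.out.println(p.equals(q));         // true
        // Argument declared as Object: Object's equals is called.
        System.out.println(p.equals(qAsObject)); // false
    }
}

class Overloaded
{
    private final int x;

    Overloaded(int x)
    {
        this.x = x;
    }

    // MISTAKE (intentional): the parameter type is Overloaded, not Object,
    // so this overloads equals instead of overriding it.
    public boolean equals(Overloaded other)
    {
        return other != null && this.x == other.x;
    }
}
```

One way to catch this mistake early is to put the @Override annotation on the method; the compiler then rejects a method that does not actually override anything.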
The next issue is that any reference can be null. If you call a
method or access a field through a null reference you will get a
NullPointerException - not good. So, before you can use the parameter
passed to equals you must verify that it is not null and, if it is,
return false. This makes sense since something (this) cannot be equal
to nothing (null).
public boolean equals(Object other)
{
    if (other == null)
    {
        return false;
    }
    // More logic here to be discussed below...
}
Once you determine that the reference is not null you need to check
whether one object (this) is equal to the object passed in (other).
Notice that the data type of other is Object - this means any kind of
reference can be passed to your equals method. If you assume the type
of the parameter is correct (the same type as the class where you are
overriding the equals method) and cast it, your code will work as long
as no other code passes in anything else - NOT A GOOD ASSUMPTION -
since the consequence, if you are wrong, is a ClassCastException. You
can count on your instructor to intentionally pass you things that are
not expected, and you can expect your peers in your later professional
life to do it unintentionally with surprising frequency! The best way
to handle the type issue is to make sure the two objects (this and
other) are instances of the same class as follows:
public boolean equals(Object other)
{
    if (other == null)
    {
        return false;
    }
    if (this.getClass() != other.getClass())
    {
        return false;
    }
    // More logic here to be discussed below...
}
The getClass method (used above) is one of the methods of
the Object class and is inherited by all classes from Object. It
returns a reference to the Class object that describes the run-time
class of the object it is called on. There is only one Class object
for a particular class - this is known as a singleton.
All instances of the same class share the same Class object. If you
call getClass on any two objects of the same class you'll get the
same Class reference value. This is one of the rare instances where
reference equality (or inequality) is what you want, so you do not
need to call the equals method of the Class class to compare them
(you may, it will work, but it is slightly less efficient and
demonstrates lack of understanding).
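A small sketch of the class comparison on its own may help. The class GetClassDemo and its helper method sameClass are hypothetical names introduced for this illustration:

```java
// Hypothetical demo class, not part of the original handout.
class GetClassDemo
{
    // Returns true when both (non-null) objects have the same run-time class.
    static boolean sameClass(Object first, Object second)
    {
        return first.getClass() == second.getClass();
    }

    public static void main(String[] args)
    {
        Object s1 = new String("APPLES");
        Object s2 = new String("ORANGES");
        Object n = Integer.valueOf(42);

        System.out.println(sameClass(s1, s2)); // true: both are Strings
        System.out.println(sameClass(s1, n));  // false: String vs. Integer
    }
}
```

The contents of the two Strings differ, yet the first comparison is true because only the classes are being compared, not the data.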
Finally, after determining that the reference passed in (other) is not
null and is the same class as this, you typically check that all of
the instance variables of each object are equal. This is not a strict
requirement - just typical. When you write a class you decide what
equality means and implement it as such. Remember that for primitive
instance variables you may use the equality operators (== and !=)
unless, of course, they are floats or doubles, in which case you have
to decide what level of precision you want/need. For instance
variables that are object references you must use the equals method of
their classes. Arrays require you to loop through them and do the
right thing or use methods of the Java Standard Library that implement
loops and do the right thing. For example, let's assume a class called
Simple that has two instance variables, an int called x and a String
called str. Assuming that str could never be null (a class invariant
you must maintain in the rest of the code belonging to the class) here
is what a correct equals method might look like:
public boolean equals(Object other)
{
    if (other == null)
    {
        return false;
    }
    if (this.getClass() != other.getClass())
    {
        return false;
    }
    if (this.x != ((Simple)other).x)
    {
        return false;
    }
    if (!this.str.equals(((Simple)other).str))
    {
        return false;
    }
    return true;
}
Notice that the use of this in the example above is not necessary and
is included for clarity only. Also notice that there are many ways to
write the code above that are logically equivalent. You may write
your equals methods in any logically equivalent manner you wish.
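The advice above about floating-point and array instance variables can be sketched the same way. Everything in this example is an assumption made for illustration: the class names ReadingDemo and Reading, the fields, and in particular the tolerance EPSILON, which stands in for whatever level of precision your own class needs. The array is compared with Arrays.equals from the Java Standard Library rather than a hand-written loop.

```java
import java.util.Arrays;

// Hypothetical demo classes, not part of the original handout.
class ReadingDemo
{
    public static void main(String[] args)
    {
        Reading r1 = new Reading(1.5, new int[] {1, 2, 3});
        Reading r2 = new Reading(1.5, new int[] {1, 2, 3});
        Reading r3 = new Reading(1.5, new int[] {1, 2, 4});

        System.out.println(r1.equals(r2)); // true
        System.out.println(r1.equals(r3)); // false: arrays differ
    }
}

class Reading
{
    // Assumed precision; choose a value that suits your own data.
    private static final double EPSILON = 1e-9;

    private final double value;
    private final int[] samples; // class invariant: never null

    Reading(double value, int[] samples)
    {
        this.value = value;
        this.samples = samples.clone();
    }

    @Override
    public boolean equals(Object other)
    {
        if (other == null)
        {
            return false;
        }
        if (this.getClass() != other.getClass())
        {
            return false;
        }
        Reading that = (Reading) other;
        // doubles: compare within the chosen tolerance instead of using ==
        if (Math.abs(this.value - that.value) > EPSILON)
        {
            return false;
        }
        // arrays: Arrays.equals compares lengths and then each element
        return Arrays.equals(this.samples, that.samples);
    }
}
```

A tolerance is only one possible definition of equality for doubles; if exact comparison is what your class needs, that is a legitimate choice too.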
Inheritance and the equals Method...
When overriding the equals method in classes making use of inheritance it is important to keep the super class and sub-class as
loosely coupled
as possible. Loosely coupled classes make as few assumptions
about each other as possible making the code more maintainable over
time. A secondary goal is to avoid duplication of code.
Let's take a look at an example involving three classes, A, B,
and C as follows:
public class A
{
    // Class implementation not shown
}
public class B extends A
{
    // Class implementation not shown
}
public class C extends B
{
    // Class implementation not shown
}
As you can see, class A is a sub-class of Object by implicit extension,
B is a sub-class of A by explicit extension, and C is a sub-class of B
by explicit extension. Now look at some code making use of some
class C objects and checking to see if two objects are equal:
C c1, c2;
c1 = getSomeRandomCObject();
c2 = getSomeRandomCObject();
if (c1.equals(c2))
{
    // Do something, anything, it is not germane to the discussion
}
When class C overrides the equals method it must check that the object
reference passed in is not null, that it is the same class (not a
requirement, but typical), and that all the instance data is equal -
including all superclass instance variables it inherited. Assuming all C's
superclasses maintain strict encapsulation (private instance variables)
this presents a problem: How do you access the private instance
variables of your superclasses? One solution would be to use the
accessor methods (get-methods) to obtain the data of the super classes
- this is an example of tight coupling - a bad idea! First,
this requires detailed knowledge of the super classes - the sub-class
must know all instance data and all methods to access them. The
downside of this approach is that it is more work and more likely to be
wrong in the first place, and less likely to be maintained properly in
the second place. Imagine someone else wrote class A and you are
simply extending it. By calling the accessor methods of class A
in class C's implementation of equals the Class C code is tightly
coupled to Class A's implementation. Now imagine that the person
that wrote class A makes changes - how likely are they to realize that
class C may be affected by the changes? Even if they notice will
they know how to modify Class C's implementation to work with the new
class A? As you can see, even a relatively simple change to Class
A that should be isolated has now become more complicated and error
prone.
How do you solve this issue? You reuse your superclass's
implementation of equals when you override equals in your class as
follows:
public class C extends B
{
    // Class implementation not shown except for equals
    public boolean equals(Object other)
    {
        if (!super.equals(other))
        {
            return false;
        }
        // Rest of equals method here, i.e., check all class C instance variables for equality
    }
}
Notice the absence of the null-check and class-type check that was
stressed earlier in this document. If you do them here your code
will be logically correct but you will be duplicating logic, wasting
CPU cycles, and contributing to global-warming - all unnecessary evils.
To see why, let's follow the flow-of-execution. The
super.equals(other) call in the first line of C's equals method calls
its super class's implementation of equals, specifically class B's
equals method, which should also call its super.equals in its first
line as follows:
public class B extends A
{
    // Class implementation not shown except for equals
    public boolean equals(Object other)
    {
        if (!super.equals(other))
        {
            return false;
        }
        // Rest of equals method here, i.e., check all class B instance variables for equality
    }
}
This super.equals call in class B's equals method calls class A's equals method and class A's equals method looks like this:
public class A
{
    // Class implementation not shown except for equals
    public boolean equals(Object other)
    {
        if (other == null)
        {
            return false;
        }
        if (this.getClass() != other.getClass())
        {
            return false;
        }
        // Rest of equals method here, i.e., check all class A instance variables for equality
    }
}
Remember that class A's super class is, implicitly, Object. Notice
that A's equals does not call super.equals. This is because
Object's equals does reference equality and the entire reason we are
overriding equals is to get logical equality instead of reference
equality.
Also notice that if we do the null-check first in the
most-super-class-prior-to-Object then none of its sub-classes needs to
do it directly since it is done via the super.equals call and is done
before the reference is used thereby avoiding any possibility of a
NullPointerException. Likewise, the class-type check can be done
in the most-super-class-prior-to-Object and reused by the sub-classes
via the super.equals call. Now the classes are loosely coupled -
meaning they can change their implementations without affecting the
implementation of any sub-class or super class!
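Putting the three classes together, a complete chain might look like the sketch below. The single int field in each class, the constructors, and the InheritanceDemo driver are assumptions added so that the example can run; the original classes A, B, and C do not show their contents.

```java
// Hypothetical fields, constructors, and driver; only the equals pattern
// comes from the discussion above.
class InheritanceDemo
{
    public static void main(String[] args)
    {
        C c1 = new C(1, 2, 3);
        C c2 = new C(1, 2, 3);
        C c3 = new C(9, 2, 3); // differs only in the field held by A
        B b1 = new B(1, 2);

        System.out.println(c1.equals(c2));   // true
        System.out.println(c1.equals(c3));   // false: detected in A's equals
        System.out.println(c1.equals(null)); // false: null-check done in A
        System.out.println(b1.equals(c1));   // false: class check done in A
    }
}

class A
{
    private final int a;

    A(int a)
    {
        this.a = a;
    }

    @Override
    public boolean equals(Object other)
    {
        // The null-check and class-type check live here only.
        if (other == null)
        {
            return false;
        }
        if (this.getClass() != other.getClass())
        {
            return false;
        }
        return this.a == ((A) other).a;
    }
}

class B extends A
{
    private final int b;

    B(int a, int b)
    {
        super(a);
        this.b = b;
    }

    @Override
    public boolean equals(Object other)
    {
        if (!super.equals(other))
        {
            return false;
        }
        // Safe cast: A's equals already verified the class.
        return this.b == ((B) other).b;
    }
}

class C extends B
{
    private final int c;

    C(int a, int b, int c)
    {
        super(a, b);
        this.c = c;
    }

    @Override
    public boolean equals(Object other)
    {
        if (!super.equals(other))
        {
            return false;
        }
        return this.c == ((C) other).c;
    }
}
```

Each class touches only its own private field, so a change to the instance variables of A requires a change to A's equals and nothing else.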