CodingBison - Hash Set

As we know the basic property of a Set is that, a Set does not allow duplicates elements. More details about Set interface is available here. One of the two classes that implements the interface Set, is an HashSet. The HashSet class implements all the methods defined in the Set interface. The reason why it is called a HashSet is because it uses an "Hash Table" as its underlying data structure.

Figure: Set Interface and Class Hierarchy

Before we go into the details of the HashSet class, let us take a detour and try to understand the concepts Hash tables, Load Factor, Collisions, Hashing etc.

A hash table is a popular data structure for providing a faster search -- a well-designed hash table can provide an average lookup complexity of O(1). A hash table data structure uses a hashing function that maps a given key to one of the indexes (also referred to as slots) in the hash-table. For a hash-table, indices are identified using an integer and the hashing function converts the passed key (which can be string, number, etc) into an integer that points to an index in the hash table. In other words, if the hash-table has size S and if the first index is indexed as 0, then we can find the table index, y, such that y would be greater than or equal to 0 but but less than S. More details about Hash Table/Hash Functions is available in our data structures modules understanding Hash Tables.

Things To Remember

HashSet can store heterogeneous elements, including NULL. Also, like a Set, a HashSet also does not allow Duplicates.

For hash tables, an import consideration is that it should be able to accommodate collisions. A collision happens when we have two or more elements that are indexed to the same slot. Collisions can potentially increase the search complexity of hash tables and so a good hashing function must be designed to reduce these collisions. The average lookup complexity of O(1) is not possible when there are collisions in the Hashtable.

It is easy to see that if the size of the hash table (S) is smaller than the number of entries (N), then collisions are bound to happen. For an ideal hashing function, if S where to be same as N, then each index would have one entry. In reality, this may not be true. Even if N is smaller than S, collisions can still occur. The ratio of N and S does provide an insight into an average number of collisions. This ratio is often known as load factor of the hash table. Load factor is an important consideration for estimating the amount of collisions in a hash table. In an ideal world, the load factor should be always less than 1.0. In fact, its recommended upper limit is around 0.7. For Instance, if an applications intends to store 700 entries, it is recommended to use an Hash table of size/capacity equal to 1000. so that the load factor will be equal to 700/1000 = 0.70 or 70%.

So, if you are wondering, what is the need to learn all these HashTable concepts now. Well, as we discussed earlier, HashSet uses hash table as its underlying data structure. If an application needs to perform a large number of lookup requests for Set elements, then using HashSet would make lookups faster.

HashSet Class Constructors

HashList class provides 4 Constructors and, here is the table that lists them.

Table: Constructors provided by HashSet Class
Constructor	Description
HashSet()	Creates an empty HashSet with a default size of 16 and the load factor equal to 0.75.
HashSet(Collection c)	Creates a new HashSet with the elements of Collection c. Even if the collection c has any number of duplicate elements, the newly created set will maintain the property of a Set "no Duplicate elements allowed" and thus, ignores duplicate elements from the Collections.
HashSet(int initialSize)	Creates an empty HashSet with a size equal to "initialSize" and load factor equal to 0.75(default)
HashSet(int initialSize, float loadFactor)	Creates an empty HashSet with a size equal to "initialSize" load factor equal to loadFactor.

Here is an Example program, that demonstrates the usage of the above mentioned operations.

 import java.util.ArrayList;
 import java.util.HashSet;

 public class HashSetDemo {
     public static void main(String[] args) {
         HashSet hsetObj1 = new HashSet();
         hsetObj1.add(10);
         hsetObj1.add("abc");
         hsetObj1.add(null);
         hsetObj1.add(null);
         System.out.println(" Hash Set 1: " + hsetObj1);

         ArrayList alistObj1 = new ArrayList();
         alistObj1.add("abc");
         alistObj1.add("abc");
         alistObj1.add(1);
         System.out.println(" Array List: " + alistObj1);
         HashSet hsetObj2 = new HashSet(alistObj1);
         System.out.println(" Hash Set 2: " + hsetObj2);

         HashSet hsetObj3 = new HashSet(30);
         HashSet hsetObj4 = new HashSet(30, (float) 0.80);
     }
 }

Things To Remember

The initial default-size/capacity of an HashSet is 16 and the load factor is 0.75.

The above example program creates 4 new Sets using different constructors of HashSet(). Notice the elements of the HashSet, represented by the object hsetObj1. We will come to know about some of the properties of a Set. First, a HashSet can store heterogeneous elements, Second, NULL element is allowed, Third, by noticing the output(below) you would know that an HashSet is an unordered collection. The order of the elements in an HashSet is not preserved or guaranteed. HashSet object, hsetObj3, and hsetObj4 are created with a capacity and load factor different from the default values.

Let us Compile/run the program, Here is the output:

  Hash Set 1: [null, abc, 10]
  Array List: [abc, abc, 1]
  Hash Set 2: [1, abc]

HashSet Class Methods

As we already know, HashSet is one of the Classes that implements the Set interface. So, HashSet class implements all the methods that are defined in the Set interface. HashSet does not define any specific methods. The method list is similar to the one defined in the table Set Interface Methods List.

Here is an Example program, that demonstrates the usage of the methods.

 import java.util.ArrayList;
 import java.util.HashSet;

 public class HashSetDemo {
     public static void main(String[] args) {
         HashSet hsetObj1 = new HashSet();
         hsetObj1.add(10);
         hsetObj1.add("abc");
         hsetObj1.add(null);
         hsetObj1.add(null);
         System.out.println(" Hash Set 1: " + hsetObj1);

         ArrayList alistObj1 = new ArrayList();
         alistObj1.add("abc");
         alistObj1.add("abc");
         alistObj1.add(1);
         System.out.println(" Array List: " + alistObj1);
         HashSet hsetObj2 = new HashSet(alistObj1);
         System.out.println(" Hash Set 2: " + hsetObj2);
         System.out.println(" Hash Set 2 Size: " + hsetObj2.size());

         hsetObj2.remove(1);
         System.out.println(" Hash Set 2: " + hsetObj2);
         hsetObj2.clear();
         System.out.println(" After Clear, Hash Set 2: " + hsetObj2);
     }
 }

Let us Compile/run the program, Here is the output:

  Hash Set 1: [null, abc, 10]
  Array List: [abc, abc, 1]
  Hash Set 2: [1, abc]
  Hash Set 2 Size: 2
  Hash Set 2: [abc]
  After Clear, Hash Set 2: []

Clone Operation on the HashSet

Set interface provides an other important operation called "clone", where one can get to make a shallow copy of the HashSet. So, what a shallow copy mean? It means the elements of the Set are not copied, it just just creates an instance of the HashSet. A change made using one object, gets reflected to the other object. Basically, if you are familiar with "C" programming it like 2 pointers to the same variable, where the pointer points to the address of a variable and changes made to the variable at that address will be seen by all the pointers pointing to that address.

An example program demonstrating the concept of "Clone" method is available in Array List. The concept of "Clone" is same, be it an ArrayList or HashSet.