How do I implement an erase function for a hash table?

3.8k views Asked by At

I have a hash table using linear probing. I've been given the task to write an erase(int key) function with the following guidelines.

 void erase(int key);

 Preconditions:  key >= 0
 Postconditions: If a record with the specified key exists in the table, then
 that record has been removed; otherwise the table is unchanged.

I was also given some hints to accomplish the task

  • It is important to realize that the insert function will allow you to add a new entry to the table, or to update an existing entry in the table.

  • For the linear probing version, notice that the code to insert an item has two searches. The insert() function calls function findIndex() to search the table to see if the item is already in the table. If the item is not in the table, a second search is done to find the position in the table to insert the item. Adding the ability to delete an entry will require that the insertion process be modified. When searching for an existing item, be sure that the search does not stop when it comes to a location that was occupied but is now empty because the item was deleted. When searching for a position to insert a new item, use the first empty position - it does not matter if the position has ever been occupied or not.

So I've started writing erase(key) and I seem to have run into the problem that the hints are referring to, but I'm not positive what it means. I'll provide code in a second, but what I've done to test my code is set up the hash table so that it will have a collision and then I erase that key and rehash the table but it doesn't go into the correct location.

For instance, I add a few elements into my hash table:

The hash table is:
Index  Key    Data
    0   31     3100
    1    1     100
    2    2     200
    3   -1
    4   -1
    5   -1
    6   -1
    7   -1
    8   -1
    9   -1
   10   -1
   11   -1
   12   -1
   13   -1
   14   -1
   15   -1
   16   -1
   17   -1
   18   -1
   19   -1
   20   -1
   21   -1
   22   -1
   23   -1
   24   -1
   25   -1
   26   -1
   27   -1
   28   -1
   29   -1
   30   -1

So all of my values are empty except the first 3 indices. Obviously key 31 should be going into index 1. But since key 1 is already there, it collides and settles for index 0. I then erase key 1 and rehash the table but key 31 stays at index 0.

Here are the functions that may be worth looking at:

void Table::insert( const RecordType& entry )
{
   bool alreadyThere;
   int index;

   assert( entry.key >= 0 );

   findIndex( entry.key, alreadyThere, index );
   if( alreadyThere )
      table[index] = entry;   
   else
   {
      assert( size( ) < CAPACITY );
      index = hash( entry.key );
      while ( table[index].key != -1 )
         index = ( index + 1 ) % CAPACITY;
      table[index] = entry;
      used++;
   }
}

Since insert uses findIndex, I'll include that as well

void Table::findIndex( int key, bool& found, int& i ) const
{
   int count = 0;

   assert( key >=0 );

   i = hash( key );
   while ( count < CAPACITY && table[i].key != -1 && table[i].key != key )
   {
      count++;
      i = (i + 1) % CAPACITY;
   }   
   found = table[i].key == key;
}

And here is my current start on erase

void Table::erase(int key) 
{
    assert(key >= 0);

    bool found, rehashFound;
    int index, rehashIndex;

    //check if key is in table
    findIndex(key, found, index);

    //if key is found, remove it
    if(found)
    {
        //remove key at position
        table[index].key = -1;
        table[index].data = NULL;
        cout << "Found key and removed it" << endl;
        //reduce the number of used keys
        used--;
        //rehash the table

        for(int i = 0; i < CAPACITY; i++)
        {
            if(table[i].key != -1)
            {
                cout << "Rehashing key : " << table[i].key << endl;
                findIndex(table[i].key, rehashFound, rehashIndex);
                cout << "Rehashed to index : " << rehashIndex << endl;
                table[rehashIndex].key = table[i].key;
                table[rehashIndex].data = table[i].data;
            }
        }
    }
}

Can someone explain what I need to do to make it rehash properly? I understand the concept of a hash table, but I seem to be doing something wrong here.

EDIT

As per user's suggestion:

void Table::erase(int key)
{
    assert(key >= 0);
    bool found;
    int index;

    findIndex(key, found, index);

    if(found) 
    {
        table[index].key = -2;
        table[index].data = NULL;
        used--;

    }

}


//modify insert(const RecordType & entry)

while(table[index].key != -1 || table[index].key != -2)


//modify findIndex

while(count < CAPACITY && table[i].key != -1
      && table[i].key != -2 && table[i].key != key)
2

There are 2 answers

9
user2357112 On BEST ANSWER

When deleting an item from the table, don't move anything around. Just stick a "deleted" marker there. On an insert, treat deletion markers as empty and available for new items. On a lookup, treat them as occupied and keep probing if you hit one. When resizing the table, ignore the markers.

Note that this can cause problems if the table is never resized. If the table is never resized, after a while, your table will have no entries marked as never used, and lookup performance will go to hell. Since the hints mention keeping track of whether an empty position was ever used and treating once-used cells differently from never-used, I believe this is the intended solution. Presumably, resizing the table will be a later assignment.

13
Tony Delroy On

It's not necessary to rehash the entire table every time a delete is done. If you want to minimise degradation in performance, then you can compact the table by considering whether any of the elements after (with wrapping from end to front allowed) the deleted element but before the next -1 hash to a bucket at or before the deleted element - if so, then they can be moved to or at least closer to their hash bucket, then you can repeat the compaction process for the just-moved element.

Doing this kind of compaction will remove the biggest flaw in your current code, which is that after a little use every bucket will be marked as either in use or having been used, and performance for e.g. find of a non-existent value will degrade to O(CAPACITY).

Off the top of my head with no compiler/testing...

int Table::next(int index) const
{
    return (index + 1) % CAPACITY;
}

int Table::distance(int from, int to) const
{
    return from < to ? to - from : to + CAPACITY - from;
}

void Table::erase(int key)
{
    assert(key >= 0);
    bool found;
    int index;

    findIndex(key, found, index);

    if (found) 
    {
        // compaction...
        int limit = CAPACITY - 1;
        for (int compact_from = next(index);
             limit-- && table[compact_from].key >= 0;
             compact_from = next(compact_from))
        {
            int ideal = hash(table[compact_from].key);
            if (distance(ideal, index) <
                distance(ideal, compact_from))
            {
                table[index] = table[compact_from];
                index = compact_from;
            }
        }

        // deletion
        table[index].key = -1;
        delete table[index].data; // or your = NULL if not a leak? ;-.
        --used;
    }
}