EDIT: I have fixed the mistakes and the program returns correct output for all input. The main mistakes were:
fillGap-method was not implemented correctly. It moved the next element after the gap to fill the gap, even if that element didn't hash to that index. fillGap should only fill the empty space with an element that hashes to that index.
"size" was decremented incorrectly in the remove-method. My program reduced size even if the element didn't exist in the array. I could've used my contains-method, but didn't want to because I think it is inefficient. There are of course other ways to make sure the element exists in the array before reducing size, without using the contains-method.
Below is my correct code (it could probably be more efficient, but it works):
import java.io.File;
import java.io.FileNotFoundException;
import java.util.*;
class Hashing{
protected int size = 0;
protected Scanner fileScanner = null;
protected int[] array;
protected boolean isRehashing = false;
public Hashing(){
array = new int[5];
emptySpots();
}
public int hashValue(int x){
return (x % array.length + array.length) % array.length;
}
public double loadFactor(){
return (double) size / array.length;
}
public void emptySpots(){
for (int i = 0; i < array.length; i++){
array[i] = Integer.MIN_VALUE;
}
}
public boolean contains(int x){
int hashValue = hashValue(x);
if (array[hashValue] == Integer.MIN_VALUE){ // Empty spot
return false;
}
else{ // Spot is not empty -- we explore
if (array[hashValue] == x){ // x is at the index it hashes to
return true;
}
// x must be at at an index after hashValue, or it does not exist and we return false
int index = hashValue;
while (array[index] != Integer.MIN_VALUE){
index = (index + 1) % array.length;
if (index == hashValue){
return false;
}
if (array[index] == x){
return true;
}
}
}
return false;
}
public void rehash(){
System.out.println("rehash");
isRehashing = true;
int[] oldArray = array;
array = new int[oldArray.length * 2];
emptySpots();
for (int i = 0; i < oldArray.length; i++){
if (oldArray[i] != Integer.MIN_VALUE){
insert(oldArray[i]);
}
}
isRehashing = false;
}
public void insert (int x){
System.out.println("insert");
if (contains(x)){ // We do not allow duplicates
return;
}
int hashValue = hashValue(x);
if (array[hashValue] == Integer.MIN_VALUE){ // The spot that x hashes to is empty
array[hashValue] = x;
}
else{
int index = hashValue;
while (array[index] != Integer.MIN_VALUE){
index = (index + 1) % array.length;
if (array[index] == Integer.MIN_VALUE){
array[index] = x;
break;
}
}
}
if (!(isRehashing)){
size++;
if (loadFactor() > 0.7){ // If the array is more than 70% full, we rehash
rehash();
}
}
}
public void remove(int x){
int hashValue = hashValue(x);
int index = hashValue;
if (array[hashValue] == Integer.MIN_VALUE){ // There are no elements with this hashValue in the array
return;
}
if (array[hashValue] == x){ // x is at the index it hashes to
array[hashValue] = Integer.MIN_VALUE;
size--; // Decrement size as element is removed
fillGap(hashValue, hashValue);
return;
}
// We have to iterate through the array and see if we can find x
index = (index + 1) % array.length;
while (index != hashValue){
if (array[index] == x){
array[index] = Integer.MIN_VALUE;
size--; // Decrement size as element is removed
fillGap(index, index);
return;
}
index = (index + 1) % array.length;
}
}
public void fillGap(int index, int hashValue){ // Recursive
System.out.println("fillGap");
int nextIndex = (index + 1) % array.length;
if (array[nextIndex] == Integer.MIN_VALUE){ // The next spot is empty, so we return
return;
}
if (hashValue(array[nextIndex]) == hashValue){ // Found a replacement at the next spot
array[hashValue] = array[nextIndex];
array[nextIndex] = Integer.MIN_VALUE;
fillGap(nextIndex, nextIndex);
}
else{
fillGap(nextIndex, hashValue);
}
}
public int size(){
return size;
}
public void readFile(String fileName){
try {
fileScanner = new Scanner(new File(fileName));
}
catch (FileNotFoundException e) {
System.out.println("Could not read from file " + fileName);
System.exit(-1);
}
while (fileScanner.hasNextLine()){
String[] parts = fileScanner.nextLine().split(" ");
if (parts[0].contains("insert")){
insert(Integer.parseInt(parts[1]));
}
if (parts[0].contains("remove")){
remove(Integer.parseInt(parts[1]));
}
if (parts[0].contains("contains")){
System.out.println(contains(Integer.parseInt(parts[1])));
}
if (parts[0].contains("size")){
System.out.println(size());
}
}
fileScanner.close();
}
public static void main(String[] args){
Hashing hashing = new Hashing();
if (args.length > 0){
hashing.readFile(args[0]);
}
}
}
EDIT: I made several changes to the code below while trying to solve the issue, so the code below is not the same as the one I inserted here the first time. I still wanted to include a wrong version of my code, if it helps anyone.
I am writing a Linear Probing-program and testing it with input-files. I compare my output with output-files I've been provided, and my "size" is off by a few numbers.
This is my code:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.*;
class Hashing{
protected int size = 0;
protected Scanner fileScanner = null;
protected int[] array;
protected boolean isRehashing = false;
public Hashing(){
array = new int[5];
emptySpots();
}
public int hashValue(int x){
return (x % array.length + array.length) % array.length;
}
public double loadFactor(){
return (double) size / array.length;
}
public void emptySpots(){
for (int i = 0; i < array.length; i++){
array[i] = Integer.MIN_VALUE;
}
}
public boolean contains(int x){
int hashValue = hashValue(x);
if (array[hashValue] == Integer.MIN_VALUE){ // Empty spot
return false;
}
else{ // Spot is not empty -- we explore
if (array[hashValue] == x){ // x is at the index it hashes to
return true;
}
// x must be at at an index after hashValue, or it does not exist and we return false
int index = hashValue;
while (array[index] != Integer.MIN_VALUE){
index = (index + 1) % array.length;
if (index == hashValue){
return false;
}
if (array[index] == x){
return true;
}
}
}
return false;
}
public void rehash(){
System.out.println("rehash");
isRehashing = true;
int[] oldArray = array;
array = new int[oldArray.length * 2];
emptySpots();
for (int i = 0; i < oldArray.length; i++){
if (oldArray[i] != Integer.MIN_VALUE){
insert(oldArray[i]);
}
}
isRehashing = false;
}
public void insert (int x){
System.out.println("insert");
if (contains(x)){ // We do not allow duplicates
return;
}
int hashValue = hashValue(x);
if (array[hashValue] == Integer.MIN_VALUE){ // The spot that x hashes to is empty
array[hashValue] = x;
}
else{
int index = hashValue;
while (array[index] != Integer.MIN_VALUE){
index = (index + 1) % array.length;
if (array[index] == Integer.MIN_VALUE){
array[index] = x;
break;
}
}
}
if (!(isRehashing)){
size++; // Since rehashing calls upon insert but doesn't affect the size, we do not increase size when rehashing
if (loadFactor() > 0.7){ // If the array is more than 70% full, we rehash
rehash();
}
}
}
public void remove(int x){
System.out.println("remove");
int hashValue = hashValue(x);
int index = hashValue;
if (array[hashValue] == Integer.MIN_VALUE){ // There are no elements with this hashValue in the array
return;
}
if (array[hashValue] == x){ // x is at the index it hashes to
array[hashValue] = Integer.MIN_VALUE;
}
else{ // We have to iterate through the array and see if we can find x
while (array[index] != Integer.MIN_VALUE){
index = (index + 1) % array.length;
if (index == hashValue){
return;
}
if (array[index] == x){
array[index] = Integer.MIN_VALUE;
}
}
}
size--;
fillGap(index);
}
public void fillGap(int index){ // Recursive
int nextIndex = (index + 1) % array.length;
if (array[nextIndex] == Integer.MIN_VALUE){ // The next spot is empty, so we return
return;
}
array[index] = array[nextIndex];
array[nextIndex] = Integer.MIN_VALUE;
fillGap(nextIndex);
}
public int size(){
return size;
}
public void readFile(String fileName){
try {
fileScanner = new Scanner(new File(fileName));
}
catch (FileNotFoundException e) {
System.out.println("Could not read from file " + fileName);
System.exit(-1);
}
while (fileScanner.hasNextLine()){
String[] parts = fileScanner.nextLine().split(" ");
if (parts[0].contains("insert")){
insert(Integer.parseInt(parts[1]));
}
if (parts[0].contains("remove")){
remove(Integer.parseInt(parts[1]));
}
if (parts[0].contains("contains")){
System.out.println(contains(Integer.parseInt(parts[1])));
}
if (parts[0].contains("size")){
System.out.println(size());
}
}
fileScanner.close();
}
public static void main(String[] args){
Hashing hashing = new Hashing();
if (args.length > 0){
hashing.readFile(args[0]);
}
/*System.out.println("List:");
int count = 0;
for (int i = 0; i < hashing.array.length; i++){
System.out.println(hashing.array[i]);
if (hashing.array[i] != Integer.MIN_VALUE){
count++;
}
}
System.out.println("Count: " + count);*/
}
}
I think
remove()
is implemented incorrectly: you clear the slot if the value is found, but ignore the fact that subsequent slots may contain values that hashed to the same or preceding slots. This makes these values unreachable in subsequent callsHere's an example to help you understand the problem. Imagine you have a hash table with 3 slots, all initially empty (
[-1, -1, -1]
) and you insert two values, a and b, which both happen to hash to slot 0. Then after inserting, your hash table looks like this:[a, b, -1]
. Now if you erase e.g. element a, which is in slot 0, the hash table becomes[-1, b, -1]
. Since slot 0 is empty, value b is no longer findable.contains(b)
will return false (incorrectly) andinsert(b)
will re-insert it into slot 0 (incorrectly), creating a hash table[b, b, -1]
that contains the same value twice, which explains why the size is wrong.The standard solutions to this problem are to either rewrite the empty slot with a tombstone value (e.g. -2) which doesn't terminate the search in
contains()
orinsert()
, or to remove the gap in the linear probing sequence by moving subsequent elements backward (e.g. when deleting a,[a, b, -1]
becomes[b, -1, -1]
).For the first solution, you must ensure that tombstones are still counted towards the load factor. The second solution is possible but tricky to get right, because a linear probing sequence might contain elements that hash to different slots, and they don't even need to occur in order! So if this is a toy problem, you probably want to stick with tombstones since they're much easier to implement correctly.