I ran check50 and got a frowny face for "spell-checking is case-insensitive" and "handles most basic words properly."
How could I fix this? Does it have to do with my hash function?
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include "dictionary.h"
// Represents a node in a hash table
typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
}
node;
// Number of words in dictionary
int word_count = 0;
// Number of buckets in hash table
const unsigned int N = 26;
// Hash table
node *table[N];
// Returns true if word is in dictionary, else false
bool check(const char *word)
{
    unsigned int n = hash(word);
    node *cursor = table[n];
    while (cursor != NULL)
    {
        if (strcasecmp(word, cursor->word) == 0)
        {
            return true;
        }
        cursor = cursor->next;
    }
    return false;
}
// Hashes word to a number
// Function credit to staff on CS50 reddit page
unsigned int hash(const char *word)
{
    unsigned int hash_value = 0;
    for (int i = 0, n = strlen(word); i < n; i++)
    {
        hash_value = (hash_value << 2) ^ word[i];
    }
    return hash_value % N;
}
// Loads dictionary into memory, returning true if successful else false
bool load(const char *dictionary)
{
    // Open dictionary and check for memory issue
    // Function guide credit to CS50 Guide by Anvea on YouTube
    FILE *dict = fopen(dictionary, "r");
    char word[LENGTH + 1];
    // Check for memory issue with dict
    if (dict == NULL)
    {
        printf("Dictionary is null\n");
        unload();
        return false;
    }
    while (fscanf(dict, "%s", word) != EOF)
    {
        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            return false;
        }
        strcpy(n->word, word);
        word_count++;
        // Index word using hash function
        int dict_index = hash(word);
        // If the bucket is empty, this word becomes its first node
        if (table[dict_index] == NULL)
        {
            n->next = NULL;
        }
        // Otherwise insert the word as a new node at the head of the list
        else
        {
            n->next = table[dict_index];
        }
        table[dict_index] = n;
    }
    // Close dictionary file
    fclose(dict);
    // Indicate success
    return true;
}
// Returns number of words in dictionary if loaded else 0 if not yet loaded
unsigned int size(void)
{
    return word_count;
    return 0;
}
// Unloads dictionary from memory, returning true if successful else false
bool unload(void)
{
    for (int i = 0; i < N; i++)
    {
        node *cursor = table[i];
        while (cursor)
        {
            node *tmp = cursor;
            cursor = cursor->next;
            free(tmp);
        }
    }
    return true;
}
A few issues ...
1. In `check`, it calls `hash` on the mixed case text, so it can generate different hash values for certain words (e.g.) Won't this work? and No, it won't.
2. In `check`, we need to lowercase the string before calling `hash` so we get the same hash value/index. As a beneficial side effect of this, we can replace `strcasecmp` with `strcmp` (which is faster).
3. In `load`, there is no need to have a special case for when `table[n]` is `NULL`, so we can simplify this.
4. In `hash`, doing `<<` may produce a hash value that isn't evenly distributed, slowing down the search in `check`.
5. In `load`, we can replace `fscanf` with `fgets` [stripping the newline], as `fgets` is much faster.
6. Under `gcc` 8.3.1, the compiler flagged `node *table[N];` because of: `const unsigned int N = 26;`. Unlike C++, this isn't allowed in C -- it requires a literal constant and not just `const`. So, I changed this to `enum { N = 26 };`.
7. A minor style note: `x -> y` instead of `x->y`.

Here is the corrected code. I've checked it against all possible input files and dictionaries. It is annotated with the bugs and fixes:
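(What follows is only a rough sketch of the key `check`/`N` changes rather than a full tested listing; it assumes the standard CS50 `dictionary.h`, which defines `LENGTH` and these prototypes, plus the `node`/`table` definitions above, and `<ctype.h>` for `tolower`. Old code sits under `#if 0` and the replacement under `#else`; the `load` simplification appears in the cleaned up version further below.)

```c
#include <ctype.h>   // for tolower()

// N must be a true compile-time constant to size a file-scope array in C
#if 0
const unsigned int N = 26;
#else
enum { N = 26 };
#endif

// Returns true if word is in dictionary, else false
bool check(const char *word)
{
#if 0
    unsigned int n = hash(word);
#else
    // Lowercase a copy of the word so the text word and the (lowercase)
    // dictionary word hash to the same bucket
    char lower[LENGTH + 1];
    int i;
    for (i = 0; word[i] != '\0' && i < LENGTH; i++)
    {
        lower[i] = tolower((unsigned char) word[i]);
    }
    lower[i] = '\0';
    unsigned int n = hash(lower);
#endif

    node *cursor = table[n];
    while (cursor != NULL)
    {
#if 0
        if (strcasecmp(word, cursor->word) == 0)
#else
        // Both sides are lowercase now, so plain strcmp is enough
        if (strcmp(lower, cursor->word) == 0)
#endif
        {
            return true;
        }
        cursor = cursor->next;
    }
    return false;
}
```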
In the code above, I've used `cpp` conditionals (`#if 0` / `#else` / `#endif`) to denote old vs. new code.

Note: this can be cleaned up by running the file through `unifdef -k`.

Here is the cleaned up version (run through `unifdef -k` with some comment cleanup):
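(Again a sketch under the same assumptions rather than the original listing: the case-insensitive `check`, and `load` with the `NULL` special case removed and `fscanf` replaced by `fgets`, stripping the newline with `strcspn`.)

```c
// Returns true if word is in dictionary, else false
bool check(const char *word)
{
    // Lowercase a copy so text words match the lowercase dictionary
    char lower[LENGTH + 1];
    int i;
    for (i = 0; word[i] != '\0' && i < LENGTH; i++)
    {
        lower[i] = tolower((unsigned char) word[i]);
    }
    lower[i] = '\0';

    node *cursor = table[hash(lower)];
    while (cursor != NULL)
    {
        if (strcmp(lower, cursor->word) == 0)
        {
            return true;
        }
        cursor = cursor->next;
    }
    return false;
}

// Loads dictionary into memory, returning true if successful else false
bool load(const char *dictionary)
{
    FILE *dict = fopen(dictionary, "r");
    if (dict == NULL)
    {
        return false;
    }

    char buf[LENGTH + 2];                // longest word + '\n' + '\0'
    while (fgets(buf, sizeof buf, dict) != NULL)
    {
        buf[strcspn(buf, "\n")] = '\0';  // strip the trailing newline

        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            fclose(dict);
            return false;
        }
        strcpy(n->word, buf);
        word_count++;

        // No NULL special case: pushing onto the head of the list
        // works whether the bucket is empty or not
        unsigned int bucket = hash(buf);
        n->next = table[bucket];
        table[bucket] = n;
    }

    fclose(dict);
    return true;
}
```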
Although the above code is [now] correct, it can be further improved.
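For example, and only as a sketch of one common option (not necessarily what the linked answer does), a djb2-style string hash usually spreads words across the buckets more evenly than the shift-and-xor above:

```c
// djb2-style hash: h = h * 33 + c, reduced modulo the bucket count
unsigned int hash(const char *word)
{
    unsigned long h = 5381;
    for (int i = 0; word[i] != '\0'; i++)
    {
        h = h * 33 + (unsigned char) tolower((unsigned char) word[i]);
    }
    return h % N;
}
```

With only `N = 26` buckets the gain is limited, so a larger table usually goes along with a better hash.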
For some additional fixes and speedups, see my cs50 speller answer: cs50 pset5 Speller optimisation
Here is a comparison of the elapsed time for the `check` function between your [fixed] version vs. my [optimized] version from the link: