I ran check50 and I got a frowny face for "spell-checking is case-insensitive" and "handles most basic words properly."
How could I fix this? Does it have to do with my hash function?
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include "dictionary.h"
// Represents a node in a hash table
typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
}
node;

// Number of words in dictionary
int word_count = 0;

// Number of buckets in hash table
const unsigned int N = 26;

// Hash table
node *table[N];

// Returns true if word is in dictionary, else false
bool check(const char *word)
{
    unsigned int n = hash(word);
    node *cursor = table[n];
    while (cursor != NULL)
    {
        if (strcasecmp(word, cursor->word) == 0)
        {
            return true;
        }
        cursor = cursor->next;
    }
    return false;
}

// Hashes word to a number
// Function credit to staff on CS50 reddit page
unsigned int hash(const char *word)
{
    unsigned int hash_value = 0;
    for (int i = 0, n = strlen(word); i < n; i++)
    {
        hash_value = (hash_value << 2) ^ word[i];
    }
    return hash_value % N;
}

// Loads dictionary into memory, returning true if successful else false
bool load(const char *dictionary)
{
    // Open dictionary and check for memory issue
    // Function guide credit to CS50 Guide by Anvea on YouTube
    FILE *dict = fopen(dictionary, "r");
    char word[LENGTH + 1];

    // Check for memory issue with dict
    if (dict == NULL)
    {
        printf("Dictionary is null\n");
        unload();
        return false;
    }

    while (fscanf(dict, "%s", word) != EOF)
    {
        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            return false;
        }
        strcpy(n->word, word);
        word_count++;

        // Index word using hash function
        int dict_index = hash(word);

        // Insert into hash table if bucket is empty
        if (table[dict_index] == NULL)
        {
            n->next = NULL;
        }
        // Insert word as new node if not empty
        else
        {
            n->next = table[dict_index];
        }
        table[dict_index] = n;
    }

    // Close dictionary file
    fclose(dict);

    // Indicate success
    return true;
}

// Returns number of words in dictionary if loaded, else 0 if not yet loaded
unsigned int size(void)
{
    return word_count;
}

// Unloads dictionary from memory, returning true if successful else false
bool unload(void)
{
    for (int i = 0; i < N; i++)
    {
        node *cursor = table[i];
        while (cursor)
        {
            node *tmp = cursor;
            cursor = cursor->next;
            free(tmp);
        }
    }
    return true;
}

A few issues ...

In check(), it calls hash() on the mixed case text, so it can generate different hash values for certain words (e.g. "Won't this work?" and "No, it won't.").

In check(), we need to lowercase the string before calling hash() so we get the same hash value/index. As a beneficial side effect of this, we can replace strcasecmp with strcmp (which is faster).

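A minimal sketch of that fix (assuming <ctype.h> is included for tolower(), and that the dictionary words are stored lowercase, as CS50's dictionaries already are):

// Lowercase a copy of the word, hash the copy, then compare with strcmp
bool check(const char *word)
{
    char lower[LENGTH + 1];
    int i = 0;

    // Build a lowercased copy so "Won't" and "won't" land in the same bucket
    for (; word[i] != '\0' && i < LENGTH; i++)
    {
        lower[i] = tolower((unsigned char) word[i]);
    }
    lower[i] = '\0';

    // Walk only the bucket that the lowercased word hashes to
    for (node *cursor = table[hash(lower)]; cursor != NULL; cursor = cursor->next)
    {
        if (strcmp(lower, cursor->word) == 0)
        {
            return true;
        }
    }
    return false;
}
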
In load(), there is no need to have a special case for when table[dict_index] is NULL, so we can simplify this.

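Because inserting at the head of the bucket's list works whether or not the bucket is empty, the if/else collapses to roughly:

// table[dict_index] is NULL for an empty bucket, so n->next simply becomes NULL
n->next = table[dict_index];
table[dict_index] = n;
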
hash() doing only a left shift (<<) may produce hash values that aren't evenly distributed across the buckets, slowing down the search in check().

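One common alternative (a suggestion here, not necessarily what the linked answer uses) is a djb2-style hash, again assuming <ctype.h> for tolower():

// djb2-style hash: h = h * 33 + c gives a more even spread across the N buckets
unsigned int hash(const char *word)
{
    unsigned int h = 5381;
    for (const char *p = word; *p != '\0'; p++)
    {
        h = h * 33 + (unsigned char) tolower((unsigned char) *p);
    }
    return h % N;
}
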
In load(), we can replace fscanf with fgets [stripping the newline], as fgets is much faster.

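A rough sketch of that read loop, assuming the usual CS50 dictionary layout of one lowercase word (at most LENGTH characters) per line:

char line[LENGTH + 2];                          // word + '\n' + '\0'
while (fgets(line, sizeof line, dict) != NULL)
{
    line[strcspn(line, "\n")] = '\0';           // strip the trailing newline
    if (line[0] == '\0')
    {
        continue;                               // skip blank lines
    }

    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    strcpy(n->word, line);

    unsigned int b = hash(line);
    n->next = table[b];                         // head insertion, no special case
    table[b] = n;
    word_count++;
}
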
With gcc 8.3.1, the compiler flagged node *table[N]; because of const unsigned int N = 26;. Unlike C++, this isn't allowed in C: an array size at file scope must be a true integer constant, not just a const variable. So, I changed this to enum { N = 26 };

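That is, the file-scope declarations become:

// An enum constant is a compile-time constant, so it is a legal array dimension in C
enum { N = 26 };

// Hash table
node *table[N];
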
Also, a minor style note: x -> y instead of x->y.

Here is the corrected code. I've checked it against all the test input files and dictionaries. It is annotated with the bugs and fixes:

In the code above, I've used cpp conditionals to denote old vs. new code. Note: this can be cleaned up by running the file through unifdef -k.

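As an illustration of that annotation style (a sketch of the pattern, not the actual annotated file), the bucket-count fix might be marked like this:

#if 0
// NOTE/BUG: a const variable is not a valid file-scope array dimension in C
const unsigned int N = 26;
#else
// NOTE/FIX: an enum constant is a compile-time constant
enum { N = 26 };
#endif
node *table[N];

Running unifdef -k over the file evaluates the constant #if 0 expressions and keeps only the live branches, dropping the old code and the directives.
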
Here is the cleaned up version (run through unifdef -k, with some comment cleanup):

Although the above code is [now] correct, it can be further improved.

For some additional fixes and speedups, see my cs50 speller answer: cs50 pset5 Speller optimisation
Here is a comparison of the elapsed time for the check() function between your [fixed] version vs. my [optimized] version from the link: