Relative beginner in C here so go easy on me! Consider the following structures and types:
#include <stdint.h>
typedef uint32_t u32;
typedef u32 color;
typedef struct Vertice* vertice; // Puntero al Vertice
typedef struct Lado* lado; // Puntero al Lado
typedef struct GrafoSt* Grafo;
struct GrafoSt {
u32 n; // Numero de vertices del grafo
u32 m; // Numero de lados del grafo
u32 delta; // Max de grados
u32 sigma; // Min de grados
vertice* _vertices; // Arreglo de punteros a Vertice
lado* _lados; // Max m*2
};
struct Vertice {
u32 nombre; // Nombre del vertice
u32 grado; // Numero de vecinos
color color_;
// Modificar
u32* Vecinos; // Puntero al indice del primer vecino
};
struct Lado {
u32 x;
u32 y;
};
As you can see, _vertices is an array of pointers to Vertice. On initialization, this array is set to size Grafo -> n, the number of vertices in the graph.
I'm dealing with very large graph structures (potentially millions of vertices, and consequently even more edges). Thus, efficiency becomes a serious issue. I've been advised to use hashing to store each pointer to Vertice in the _vertices array. The general idea of the hashing function would be to store a Vertice of nombre = x in (Grapho -> _vertices)[my_hash_function(x)].
I'm new to the concept of hashing and have found a few issues. I found in StackOverflow the following hash function:
u32 Hashv(u32 x) {
x = ((x >> 16) ^ x) * 0x45d9f3b;
x = ((x >> 16) ^ x) * 0x45d9f3b;
x = (x >> 16) ^ x;
return (x);
}
which reportedly is good. However, the range of this function is u32; in other words, it is not restricted to [0, (Graph -> n) - 1] the possible index values of the _vertices array. My workaround was to use mod; this is, to store each each Vertice with nombre = x in (Graph -> _vertices)[Hashv(x) % Graph -> n]. This generated a whole of collisions, which I resolved using linear probing. However, in very large graphs, the great number of collisions makes the implementation suboptimal.
Is the idea of hashing verteces a good approach indeed? If so, how could it be improved in this implementation?
EDIT: The purpose of working with graphs is to design and test graph coloring algorithms. This makes it important to have an efficient access to the neighbors (Vecinos) of a gien vertex. For example, in a Greedy coloring algorithm, to color a vertex x I'll need to know the color of its neighbors.
The way I'm loading the graphs and initializing the struct is not really important I think so I'll skip that. I'm using Ubuntu Linux.