I am trying to recursively calculate the SHA-256 sum of all files in a directory using OpenSSL.
This is my code:
```c
#include <stdlib.h>
#include <stdio.h>
#include <dirent.h>
#include <string.h>
#include <openssl/sha.h>
#include <openssl/md5.h>
#define _MAX_LINE_ 256

int sha256_file (char* path, char output[65]){
    FILE* file = fopen(path, "rb");
    unsigned char hash[SHA256_DIGEST_LENGTH];
    const int bufSize = 32768;
    char* buffer = malloc(bufSize);
    int bytesRead = 0;
    SHA256_CTX sha256;
    if(!file)
        return -1;
    if(!buffer)
        return -1;
    SHA256_Init(&sha256);
    while((bytesRead = fread(buffer, 1, bufSize, file))){
        SHA256_Update(&sha256, buffer, bytesRead);
    }
    SHA256_Final(hash, &sha256);
    sha256_hash_string(hash, output);
    fclose(file);
    free(buffer);
    return 0;
}

void sha256_hash_string (unsigned char hash[SHA256_DIGEST_LENGTH], char outputBuffer[65]){
    int i = 0;
    for(i = 0; i < SHA256_DIGEST_LENGTH; i++){
        sprintf(outputBuffer + (i * 2), "%02x", (unsigned char)hash[i]);
    }
    outputBuffer[64] = 0;
}

void traverse_dirs(char* base_path){
    char path[_MAX_LINE_];
    struct dirent* dp;
    DIR* dir = opendir(base_path);
    unsigned char file_sha[65];
    char* md5_command;
    if(!dir)
        return;
    while((dp = readdir(dir)) != NULL){
        if(strcmp(dp->d_name, ".") != 0 && strcmp(dp->d_name, "..") != 0){
            // calculate the sha256 sum of the file
            sha256_file(dp->d_name, file_sha);
            // print the name of the file followed by the sha256 sum
            printf("%s -> %s\n", dp->d_name, file_sha);
            strcpy(path, base_path);
            strcat(path, "/");
            strcat(path, dp->d_name);
            traverse_dirs(path);
        }
    }
    closedir(dir);
}

int main(int argc, char* argv[]){
    if(argc < 2){
        printf("Usage: <executable> <dirname>\n");
        exit(-1);
    }
    traverse_dirs(argv[1]);
    return 0;
}
```
The sha256_file() function produces the correct sha256sum for each file, as I have tested manually.
The traverse_dirs() function also works fine, as it correctly prints the contents of the directory provided.
The problem is they don't work together. I have figured out that the file is not opening correctly in the sha256_file() function (fopen returns NULL) but I don't get why. If I use it manually on every file, it works just fine.
Any ideas why?
This `sha256_file(dp->d_name, file_sha)` won't work because you are not in the directory that contains that name. `d_name` is only the entry's name, not a path, so `fopen()` looks for it relative to the current working directory. You need to use the path that you construct in `path[]` instead.

You should only call `sha256_file(path, file_sha)` if `path` is a regular file or a symbolic link (if you want to process symbolic links; your call). Only call `traverse_dirs(path)` if `path` is a directory, and do nothing with the entry otherwise. You can check for those using `d_type`; see the man page for `dirent`. Some filesystems report `DT_UNKNOWN` there, and in that case you have to fall back to `lstat()`.

For efficiency, you could have just one `path[]` that gets passed through the `traverse()` calls, appending to `path[]` for each entry. That would use much less stack space, and it would also be faster, with much less copying. You would also allocate `file_sha[]` only in the block that computes the SHA-256, so you're not wasting stack space on it at every level of recursion. Something like:
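Here is a minimal sketch of that design. It makes a few assumptions:

- The system is POSIX and its `struct dirent` has `d_type`. On glibc, that needs `_DEFAULT_SOURCE`.
- When `d_type` is `DT_UNKNOWN`, the sketch falls back to `lstat()`.
- Symbolic links are skipped.
- Any entry whose full path would not fit in `PATH_MAX` is skipped.

The `visit` callback and the returned file count are hypothetical additions. They are there only so the sketch builds without OpenSSL. In the question's program, `visit` would call `sha256_file()` on the path and print the result, with `file_sha[]` declared inside it.

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose the DT_* constants and lstat() */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#ifndef PATH_MAX
#define PATH_MAX 32768      /* fallback when limits.h does not define it */
#endif

/* Walks the directory named by path[0..len), reusing the one buffer `path`
   (at least PATH_MAX + 1 bytes) for every entry instead of copying it per
   level. len == 0 means the root "/", so children come out as "/name".
   Calls visit(path) for each regular file (visit may be NULL) and returns
   how many regular files were found. Symbolic links and other entry types
   are skipped. */
size_t traverse(char *path, size_t len, void (*visit)(const char *path))
{
    size_t count = 0;
    struct dirent *dp;

    path[len] = '\0';
    DIR *dir = opendir(len ? path : "/");
    if (dir == NULL)
        return 0;

    while ((dp = readdir(dir)) != NULL) {
        if (strcmp(dp->d_name, ".") == 0 || strcmp(dp->d_name, "..") == 0)
            continue;

        size_t n = strlen(dp->d_name);
        if (len + 1 + n > PATH_MAX)     /* "/" + name + NUL must fit */
            continue;                   /* too long: skip this entry */
        path[len] = '/';
        memcpy(path + len + 1, dp->d_name, n + 1);  /* copies the NUL too */

        int type = dp->d_type;
        if (type == DT_UNKNOWN) {       /* filesystem did not say: ask lstat() */
            struct stat st;
            if (lstat(path, &st) == 0)
                type = S_ISDIR(st.st_mode) ? DT_DIR
                     : S_ISREG(st.st_mode) ? DT_REG : DT_UNKNOWN;
        }

        if (type == DT_REG) {
            if (visit != NULL)
                visit(path);
            count++;
        } else if (type == DT_DIR) {
            count += traverse(path, len + 1 + n, visit);
        }
    }
    closedir(dir);
    path[len] = '\0';                   /* re-terminate at len for the caller */
    return count;
}
```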
which would be called initially with `*path` having enough space for the maximum path length plus one, and containing the path to open as a directory. The second argument would be the length of the path, or zero if the path is the root, `"/"` (to avoid double slashes). You should get the maximum path length from `limits.h` as `PATH_MAX`, assuming your system is POSIX compliant. If not, 32K should be safe. 256 isn't.

If your `struct dirent` has `d_namlen`, then you can use that instead of `strlen()`. Or, if you have `stpcpy()`, you can use that instead of `strcpy()` and compute the end from its return value. You can also consider guarding against writing past the end of `path[]`. If you don't mind globals, `path` could be a global, so you don't pass the same pointer unnecessarily in every call.

Move your `malloc()` after the first `return` (the `if(!file)` check). If the allocation fails, close the open file before returning. As written, a failed `fopen()` leaks `buffer`, and a failed `malloc()` leaks `file`. Or just allocate the 32K on the stack; that's a small amount for the stack.