Memory leak in a C program

198 views Asked by At

I am learning C programming. As a newbie, I have to say this language is really tough, especially memory management. I used to program with Java and Python, which you don't have to worry about memory leakage. However, it's totally another story with C. Recently I wrote a small program and I ran it with valgrind which said the program caused memory leak. The code is pasted.

#include <stdio.h>
#include <glib.h>
#include <string.h>
#define MAX_LENGTH 256
#define TOKEN_NUM 2

char *tokArr[MAX_LENGTH];

char** strSplit(char *oneStr, const char* delim)
{
    char *localStr = oneStr;
    int index = 0;
    char *token = strtok(localStr, delim);

    while (token != NULL) {
        tokArr[index++] = token;
        token = strtok(NULL, delim);
    }
    free(token);
    return tokArr;
}

void printHash(gpointer key, gpointer val, gpointer userData)
{
    printf(userData, key, val);
    free(key);
    free(val);
}

int main(int argc, char const *argv[])
{   
    if (argc != 2) {
        printf("need a file name as an argument");
        return -1;
    }

    const char *filePath = argv[1];
    FILE *fp = fopen(filePath, "r");

    if (fp == NULL) {
        printf("error: cannot open document %s", filePath);
        return -1;
    }

    GHashTable* hash = g_hash_table_new(g_str_hash, g_str_equal);
    char buffer[MAX_LENGTH];
    while (fgets(buffer, MAX_LENGTH, fp)) {
        char **splitedStr = strSplit(buffer, "@@@@");
        char *key = strdup(splitedStr[0]);
        char *val = strdup(splitedStr[1]);
        g_hash_table_insert(hash, key, val);
        free(splitedStr);
    }
    fclose(fp);
    g_hash_table_foreach(hash, (GHFunc)printHash, "Name: %s, age: %s\n");
    g_hash_table_destroy(hash);
    return 0;
}

As you see, this program is pretty simply. It reads in a text file line by line. Then each line will be split into two parts which are separated by a string "@@@@". Next, the first part of the line is stored into a hash table as the key with the second part of the line as the value. The hash table provided by GLib-2.0 was utilized in this program.

I tried my best to free all possible malloc memories, however, memory leak warnings still popping out. Would anyone help me to figure which parts caused the problem? Many thanks.

3

There are 3 answers

4
martried On

Unclear why someone would recommend to "stay away" from specific libraries. Just RTFM and try to understand what you are dealing with. After all, these libraries themselves could contain memory leaks... You should have a look at Electric Fence, which replaces malloc calls by something more debug-friendly: https://elinux.org/Electric_Fence

Your code could be simplified when keeping in mind that strings are always NUL-terminated. Avoiding memory copies at all, you do not need to use strdup() and work on your input buffer alone.

  • Your key pointer is always the beginning of the buffer, so just get a point to it and replace the first @ character by 0x00.
  • Your val pointer starts after the search pattern, so you can also just get a pointer to the memory location after the last @ character.
  • The tricky part is to ensure that what you are reading in with fgets truly has a NUL termination character at the end: buffer[MAX_LENGTH] = 0;

In order to get acquainted with the way this is done in C, I recommend that you implement the pattern matching at a very basic level. Create a while loop that scans through every character in your buffer and searches for the separation pattern. When it hits 4 times in a row, you have found your pointer.

PS: If you prefer a more convenient solution, try fscanf() instead of fgets().

EDIT: The documentation on GHashTable recommends to use g_strdup(), however reading the entire file into memory would still work as described above. The buffer should be allocated using malloc() and may be enlarged using realloc() after every invocation of fgets(). Special care should be taken when a line is longer than MAX_LENGTH. It is also advisable to only increase the buffer size by MAX_LENGTH plus the number of bytes already read in i.e. from calling ftell().

Since key is a pointer to the buffer, it can be passed to free() when destroying the hash table. If you used realloc(), this is only needed once with the pointer returned from malloc().

0
Kuba hasn't forgotten Monica On

I used to program with Java and Python, which you don't have to worry about memory leakage.

You do have to worry about it. The definition of leaked memory in Java and Python is quite different than in C, but it's still a problem.

In C, a memory leak is when you have allocated a memory block but don't have a pointer to it. This can't happen in Java and Python by definition.

In Java and Python, a memory leak is when you inadvertently hold a reference to an object you don't need anymore.

Dynamically typed languages like Python and Javascript are especially "easy going" in that respect, and you can leak memory in all sorts of creative ways, while still having a reference somewhere to the "leaked" memory. It's a common problem when using introspection, some runtime debugging features, when attaching objects to complex data structures like the DOM in a browser, etc.

It becomes a big problem when using complex frameworks. You'd think that JavaScript running in a browser would be a benign environment in terms of memory leaks, but it's pretty bad. When combining multiple JS libraries and the browser DOM, it becomes hard to reason about who possibly holds references to objects, and memory leaks are commonplace. It takes quite a bit of care to avoid them. Many websites that use JS leak memory, and some leak like sieves.

It also has to be understood what is the impact of a memory leak in a C program that runs in batch mode, i.e. to completion. When the leaked blocks are not repeatedly allocated, it is benign, and especially if those blocks are used till the end of the program. All you'd end up doing is free them right before exiting.

When a lot of memory has been allocated, it is faster to just exit to the operating system without freeing that memory. It won't make the slightest difference: when the C program terminates, all the memory it allocated is freed automatically by the OS, in a very efficient manner - much faster than millions of calls to free() ever could manage.

I ran it with valgrind which said the program caused memory leak

Try that same program with most of the code removed. Sometimes even an "empty" program that uses additional libraries will "leak" memory, even though those leaks are benign.

0
oppressionslayer On

I think valgrind is finding that token points inside oneStr, which you do not dynamically allocate in your code (I don't think this is an issue though as previous comments state), but here is a change and output from valgrind.


#include <stdio.h>
#include <glib.h>
#include <string.h>
#define MAX_LENGTH 256
#define TOKEN_NUM 2

char *tokArr[MAX_LENGTH];

char** strSplit(char *oneStr, const char* delim)
{
    char *localStr = oneStr;
    int index = 0;
    char *token = strtok(localStr, delim);

    while (token != NULL && index < MAX_LENGTH) {
        tokArr[index++] = token;
        token = strtok(NULL, delim);
    }
    return tokArr;
}

void printHash(gpointer key, gpointer val, gpointer userData)
{
    printf("%s -> %s\n", (char*)key, (char*)val);
}

int main(int argc, char const *argv[])
{   
    // if (argc != 2) {
    //     printf("Usage: %s <filename>\n", argv[0]);
    //    return -1;
    // }
    char testStr[] = "Hello@@@@World@@@@What@@@@";
    char **tokens = strSplit(testStr, "@@@@");
    for (int i = 0; tokens[i] != NULL; ++i) {
        printf("%s\n", tokens[i]);
    }

    return 0;
}

Output:

gcc tokens.c -o tokens `pkg-config --cflags --libs glib-2.0`


$ valgrind ./tokens 
==8955== Memcheck, a memory error detector
==8955== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==8955== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==8955== Command: ./tokens
==8955== 
Hello
World
What
==8955== 
==8955== HEAP SUMMARY:
==8955==     in use at exit: 0 bytes in 0 blocks
==8955==   total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
==8955== 
==8955== All heap blocks were freed -- no leaks are possible
==8955== 
==8955== For lists of detected and suppressed errors, rerun with: -s
==8955== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)