How to fix D "memory leaks"


So I've been searching for a solution to this problem for some time. I've written a program that takes data from two separate text files, parses it, and writes the output to another text file and to an ARFF file for analysis in Weka. The problem I'm running into is that the function that handles reading and parsing the data doesn't release its memory. Every successive call uses roughly 100MB more, and I need to call this function over 60 times over the course of the program. Is there a way to force D to de-allocate memory, in particular for arrays, dynamic arrays, and associative arrays?

An example of my problem:

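import std.array : split;
import std.file : readText;
import std.string : splitLines;

struct Datum {
    string Foo;
    int Bar;
}

// parse, parseSomeOtherWay and doSomeOtherCalculation are my own helpers
// (not shown); readFile holds the path of the input file.
Datum[] Collate() {
    Datum[] data;
    int[] userDataSet;
    int[string] secondarySet;
    // .dup copies the array of line slices, not the characters they point to
    string[] raw = splitLines(readText(readFile)).dup;

    foreach (r; raw) {
        userDataSet ~= parse(r);
        // an int[string] needs a string key, not the string[] that split
        // returns (assumed here: the first comma-separated field)
        secondarySet[r.split(",")[0].idup] = parseSomeOtherWay(r);
    }

    data = doSomeOtherCalculation(userDataSet, secondarySet);

    return data;
}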

There are 2 answers

Vladimir Panteleev (best answer)

Are the strings in the returned data still pointing into the original file's text?

Array slicing operations in D do not make a copy of the data - instead, they just store a pointer and length. This also applies to splitLines, split, and possibly to doSomeOtherCalculation. This means that as long as a substring of the original file text exists anywhere in the program, the entire file's contents cannot be freed.
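To make the slicing behaviour concrete, here is a minimal sketch. The helper pointsInto is hypothetical (not part of Phobos), and the string literal stands in for the result of readText(readFile). It checks that each line returned by splitLines is a view into the original buffer, and that calling .dup on the outer array still leaves the elements pointing into that buffer.

```d
import std.stdio : writeln;
import std.string : splitLines;

// Hypothetical helper: true when `part` lies inside `whole`'s memory,
// i.e. it is a slice of `whole` rather than an independent copy.
bool pointsInto(const(char)[] part, const(char)[] whole)
{
    return part.ptr >= whole.ptr
        && part.ptr + part.length <= whole.ptr + whole.length;
}

void main()
{
    // Stand-in for readText(readFile); in the question this is the whole file.
    string text = "alpha,1\nbeta,2\n";
    string[] lines = splitLines(text);

    // splitLines copies no characters: each line is a slice of `text`.
    assert(pointsInto(lines[0], text));
    assert(pointsInto(lines[1], text));

    // .dup on the outer array copies only the (pointer, length) pairs,
    // so the copied elements still point into `text`.
    string[] copied = lines.dup;
    assert(pointsInto(copied[1], text));

    writeln("every line still points into the original buffer");
}
```

As long as any of these slices is reachable, the garbage collector has to keep the whole buffer they point into alive.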

If the data you're returning is only a small fraction of the size of the text file you're reading, you can copy each string you keep (`.idup` gives you a new `string`; `.dup` also copies but yields a mutable `char[]`). This prevents the small strings from pinning the entire file's contents in memory.
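A minimal sketch of that fix, assuming each line has the hypothetical form "name,number" (the helper collectFirstFields and the sample input are illustrative, not from the question). Only the copied first field is stored in Datum, so once the function returns nothing references the input buffer any more.

```d
import std.array : split;
import std.conv : to;
import std.stdio : writeln;
import std.string : splitLines;

struct Datum
{
    string Foo;
    int Bar;
}

// Hypothetical helper: parses "name,number" lines into Datum values.
Datum[] collectFirstFields(string text)
{
    Datum[] data;
    foreach (line; splitLines(text))
    {
        string[] fields = line.split(",");
        // fields[0] is a slice of `text`; .idup allocates an independent
        // copy, so the stored string does not keep `text` alive.
        data ~= Datum(fields[0].idup, to!int(fields[1]));
    }
    return data;
}

void main()
{
    // Stand-in for readText(readFile).
    string text = "alpha,1\nbeta,2\n";
    Datum[] data = collectFirstFields(text);
    writeln(data);
}
```

The copies cost a few small allocations, which is usually far cheaper than keeping a large file buffer reachable through a handful of short slices.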

Abstract type

If the result of Collate() is copied after each call, the old copies are probably never collected by the GC and stay in memory after they are no longer used. In that case you can use a single long-lived (for example global) container that you reset on each call to Collate():

void Collate(out Datum[] data) {
    // data content is cleared because of 'out' param storage class
    // your processing to fill data
}
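A runnable sketch of that out-parameter pattern, under assumptions: the extra `run` parameter and the one-element body are hypothetical stand-ins for the real parsing work. It shows that `out` resets the caller's array to its initial (empty) value on entry, so only the latest call's results remain reachable through `data`.

```d
import std.stdio : writeln;

struct Datum
{
    string Foo;
    int Bar;
}

// Hypothetical body: the real function would read and parse the files.
void Collate(out Datum[] data, int run)
{
    // 'out' has already reset data to Datum[].init (an empty array) here,
    // so elements from the previous call are no longer referenced by it.
    assert(data.length == 0);
    data ~= Datum("run", run);
}

void main()
{
    Datum[] data;
    foreach (run; 0 .. 60)
    {
        Collate(data, run);
        // Use this run's results here, before the next call replaces them.
    }
    writeln(data.length);
}
```

Note that resetting the reference only makes the old contents unreachable; the GC still decides when to actually reclaim that memory.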