I'm reading lists from a large file, which I eventually want to store as array.arrays. Because map(int, line.split()) is very slow, I wrote a small C module which does strtok and a faster version of atoi:
inline long
minhashTables_myatoi(const char* s)
{
    long r;  /* long, not int: the function returns long, and int would truncate */
    for (r = 0; *s; r = r * 10 + *s++ - '0');
    return r;
}
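For reference, here is the same loop as a standalone function (renamed my_atoi so it compiles on its own), with its assumptions spelled out; it only handles non-negative runs of decimal digits:

```c
#include <assert.h>

/* Standalone copy of the accumulation loop above.  It assumes the
 * input is a run of '0'-'9': there is no sign, whitespace, or
 * overflow handling, so a token like "-1" is mis-parsed ('-'
 * contributes '-' - '0' == -3 to the accumulator). */
static long my_atoi(const char* s)
{
    long r;
    for (r = 0; *s; r = r * 10 + *s++ - '0')
        ;
    return r;
}
```

That is fine for my data, which consists only of space-separated non-negative integers.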
static PyObject*
minhashTables_ints(PyObject *self, PyObject *args)
{
    char* s;
    Py_ssize_t slen;
    if (!PyArg_ParseTuple(args, "s#", &s, &slen))
        return NULL;

    long* buf = malloc(sizeof(long) * (slen + 1) / 2);
    const char* tok = strtok(s, " ");
    buf[0] = minhashTables_myatoi(tok);

    Py_ssize_t i;
    for (i = 1; (tok = strtok(NULL, " ")) != NULL; i++)
        buf[i] = minhashTables_myatoi(tok);
    Py_ssize_t buflen = i;

    PyObject* list = PyList_New(buflen);
    PyObject* o;
    for (i = 0; i < buflen; i++)
    {
        o = PyInt_FromLong(buf[i]);
        PyList_SET_ITEM(list, i, o);
    }
    free(buf);
    return list;
}
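Independent of the leak: strtok writes '\0' separators into whatever buffer it is given, and with "s#" that buffer belongs to the Python string object itself. So I also tried a copy-first variant along these lines (a sketch in plain C, without the CPython glue):

```c
#include <stdlib.h>
#include <string.h>

/* Tokenize a private, writable copy of the input instead of the
 * caller's buffer, so the original string is never modified.
 * Returns the number of integers written to out (at most max). */
static size_t parse_ints(const char* s, long* out, size_t max)
{
    size_t len = strlen(s);
    char* copy = malloc(len + 1);     /* private scratch copy */
    if (copy == NULL)
        return 0;
    memcpy(copy, s, len + 1);

    size_t n = 0;
    for (char* tok = strtok(copy, " "); tok != NULL && n < max;
         tok = strtok(NULL, " ")) {
        long v = 0;
        for (const char* p = tok; *p; p++)  /* same digit loop as myatoi */
            v = v * 10 + (*p - '0');
        out[n++] = v;
    }
    free(copy);
    return n;
}
```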
So my Python script calls ints() with a string, passes the result to the array.array constructor, and saves the resulting array in a list.

My problem is that the script now leaks memory, which it did not with map instead of the ints() function, of course. Also, a version that wraps Python's own int() in a C module does not leak memory.
Thanks for your help!
Edit: To valgrind the module I used this script:
import minhashTables
data = ' '.join(map(str, range(10)))
print 'start'
foo = minhashTables.ints(data)
del data
del foo
print 'stop'
And I run valgrind --tool=memcheck --leak-check=full --show-reachable=yes python test.py, but there is no output from valgrind between start and stop, though there are tons of messages before and after.
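I've read that CPython's small-object allocator (pymalloc) carves small allocations out of larger arenas it gets from malloc, which might explain why valgrind stays quiet here: the leak-checker only ever sees the big outer allocations, not the individual Python-level ones. A toy illustration of that allocation pattern (my sketch, not CPython's actual code):

```c
#include <stdlib.h>

/* Toy bump allocator in the spirit of an arena: many "allocations"
 * come out of one big malloc'd block, so a leak-checker only ever
 * sees the single outer allocation. */
typedef struct { char* base; size_t used, cap; } Arena;

static Arena arena_new(size_t cap)
{
    Arena a = { malloc(cap), 0, cap };
    return a;
}

static void* arena_alloc(Arena* a, size_t n)
{
    if (a->base == NULL || a->used + n > a->cap)
        return NULL;                 /* out of arena space */
    void* p = a->base + a->used;
    a->used += n;
    return p;
}

static void arena_free(Arena* a)
{
    free(a->base);                   /* releases every chunk at once */
    a->base = NULL;
}
```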
Edit: Code for confirming it's leaking:

import minhashTables
for i in xrange(1000000000):
    data = ' '.join(map(str, range(10, 10000)))
    foo = minhashTables.ints(data)
I have to recreate the string, because strtok changes it. By the way, copying the string into another memory location doesn't change the behavior.
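To convince myself that strtok really does rewrite its input, I checked with a tiny standalone snippet:

```c
#include <string.h>
#include <assert.h>

/* strtok overwrites each delimiter in its argument with '\0' and
 * keeps internal state between calls, so the buffer is mutated. */
static void demo(void)
{
    char buf[] = "10 20 30";
    char* tok = strtok(buf, " ");
    assert(strcmp(tok, "10") == 0);
    assert(buf[2] == '\0');          /* the first space is gone */
    tok = strtok(NULL, " ");
    assert(strcmp(tok, "20") == 0);
}
```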