Is there a nice way, in Python, to measure the number of memory accesses, or mems, used by a function?


When people ask about memory, they're often asking how much memory is being used, but this is not what I mean.

Rather, I'm reading through Donald Knuth's The Art of Computer Programming and recreating some of its algorithms in Python. Knuth measures the time one of his programs takes to run in mems, meaning the number of times a memory location is read from or written to. This is a nice way to express an algorithm's cost as an exact count that is largely independent of chip architecture or clock speed.

As an example of what I'm looking for, consider this example script:

mylist = [1, 2, 3]
x = 2
y = mylist[x]

You might say there are 5 mems here, like this:

mylist = [1, 2, 3]  # Write to `mylist`. 1 mem
x = 2               # Write to `x`.      1 mem
y = mylist[x]       # Read from `x`, `mylist`; write to `y`. 3 mems

You could also argue that the assignment to mylist ought to count as multiple writes, since it stores more than a single value.

Right now I'm just trying to solve the high-level problem of having some (any) way to reasonably measure mems, ideally without having to do any fancy magic-looking coding :) Later on, I may start to worry more about details like "what is the best way," or "how many mems should this line count as," but this question is focused on "what is the first way to begin doing this?"

And I do mean programmatically, as in, I run a function, and somewhere in Python is a variable which has kept track of the number of mems used as the function's been run. (As opposed to, say, a human statically analyzing the program to get a count, or manually adding n += 1 for every memory access.)

1 Answer

Answered by pabloabur:

I don't really know of a nice way of doing this, and I think an automated count would actually go against Knuth's original approach. In The Stanford GraphBase, he declared

#define o mems++
#define oo mems += 2
#define ooo mems += 3

and then proceeded to manually add those macros, for instance

...
o, a->from = v;
oo, a->klink = bucket[l];
...

His reasons for doing so were:

(1) The macros can be inserted easily and quickly using a text editor.
(2) An implementation need not pay for mems that could be avoided by a suitable optimizing compiler or by making the C program text slightly more complex; thus, authors can use their good judgment to keep programs more readable than if the code were overly hand-optimized.
(3) The programmer should be able to see exactly where mems are being charged, as an aid to bottleneck elimination. Occurrences of o and oo make this plain without messing up the program text.
(4) An implementation need not be charged for mems that merely provide diagnostic output, or mems that do redundant computations just to double-check the validity of "proven" assertions as a program is being tested.
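
Translated into Python, the same manual bookkeeping might look something like the sketch below, applied to the example from the question. The `mems` counter and the `o()` helper are just illustrative names, not an existing library:

mems = 0  # running count of memory accesses, like Knuth's global `mems`

def o(n=1):
    # Charge n mems by hand, in the spirit of Knuth's o/oo/ooo macros.
    global mems
    mems += n

mylist = [1, 2, 3]; o()   # write to `mylist`: 1 mem
x = 2;              o()   # write to `x`:      1 mem
y = mylist[x];      o(3)  # read `x` and `mylist`, write `y`: 3 mems

print(mems)               # 5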


Regarding the objection that the metric is not meaningful because Python is not like C or assembly: Knuth proposed mems as a way to compare algorithms in terms of memory references. So even though Python hides its memory accesses behind object references, counting mems can still be a helpful way to compare competing implementations.
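
As a rough sketch of how that comparison might look, assuming the same hand-inserted counter as above, you could charge one mem per element read in two competing search implementations and compare the totals:

mems = 0

def o(n=1):
    # Charge n memory accesses, as in the previous sketch.
    global mems
    mems += n

def linear_search(seq, target):
    for i in range(len(seq)):
        o()  # one read of seq[i]
        if seq[i] == target:
            return i
    return -1

def binary_search(seq, target):
    lo, hi = 0, len(seq) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        o()  # one read of seq[mid]
        val = seq[mid]
        if val == target:
            return mid
        elif val < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

data = list(range(1000))
mems = 0; linear_search(data, 999); print("linear search:", mems, "mems")
mems = 0; binary_search(data, 999); print("binary search:", mems, "mems")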