I'm experimenting with MATLAB OOP. As a start I mimicked my C++ Logger classes, and I'm putting all my string helper functions in a String class, thinking it would be great to be able to write things like a + b, a == b, or a.find( b ) instead of strcat( a, b ), strcmp( a, b ), retrieving the first element of strfind( a, b ), and so on.
The problem: slowdown
I put the above things to use and immediately noticed a drastic slowdown. Am I doing it wrong (which is certainly possible as I have rather limited MATLAB experience), or does MATLAB's OOP just introduce a lot of overhead?
My test case
Here's the simple test I did for string, basically just appending a string and removing the appended part again:
Note: Don't actually write a String class like this in real code! MATLAB has a native string array type now, and you should use that instead.
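For reference, a minimal sketch of the same operations using the native string type (available since R2016b), which already supports the operator syntax the String class below tries to provide:

```matlab
% Native string arrays support operator syntax directly (R2016b+):
a = "test";
b = "appendme";
c = a + b;            % concatenation, equivalent to [char(a) char(b)]
tf = (a == b);        % comparison, equivalent to strcmp
k = strfind(a, "t");  % match indices; k(1) is the first occurrence
```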
classdef String < handle
    properties
        stringobj = '';
    end
    methods
        function o = String( s )  % constructor, needed for String( 'test' ) below
            if nargin > 0
                o.stringobj = s;
            end
        end
        function o = plus( o, b )
            o.stringobj = [ o.stringobj b ];
        end
        function n = Length( o )
            n = length( o.stringobj );
        end
        function o = SetLength( o, n )
            o.stringobj = o.stringobj( 1 : n );
        end
    end
end
function atest( a, b ) %plain functions
n = length( a );
a = [ a b ];
a = a( 1 : n );
function btest( a, b ) %OOP
n = a.Length();
a = a + b;
a.SetLength( n );
function RunProfilerLoop( nLoop, fun, varargin )
profile on;
for i = 1 : nLoop
fun( varargin{ : } );
end
profile off;
profile report;
a = 'test';
aString = String( 'test' );
RunProfilerLoop( 1000, @(x,y)atest(x,y), a, 'appendme' );
RunProfilerLoop( 1000, @(x,y)btest(x,y), aString, 'appendme' );
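As an aside, on modern MATLAB (R2013b and later) timeit gives more reliable per-call timings than the profiler, since profiling adds instrumentation overhead of its own. A sketch of the same measurement:

```matlab
% timeit runs the function handle repeatedly and compensates for
% measurement overhead; useful for raw per-call timing comparisons.
a = 'test';
aString = String( 'test' );
tPlain = timeit(@() atest(a, 'appendme'));
tOop   = timeit(@() btest(aString, 'appendme'));
fprintf('atest: %.3g s/call, btest: %.3g s/call\n', tPlain, tOop);
```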
The results
Total time in seconds, for 1000 iterations:
btest 0.550 (with String.SetLength 0.138, String.plus 0.065, String.Length 0.057)
atest 0.015
Results for the logger system are similar: 0.1 seconds for 1000 calls to fprintf( 1, 'test\n' ), versus 7 (!) seconds for 1000 calls to my system when using the String class internally. (OK, it has a lot more logic in it, but to compare with C++: the overhead of my system, which uses std::string( "blah" ) and std::cout on the output side, versus plain std::cout << "blah" is on the order of 1 millisecond.)
Is it just overhead when looking up class/package functions?
Since MATLAB is interpreted, it has to look up the definition of a function/object at run time. So I wondered whether much more overhead is involved in looking up class or package functions than functions that are directly on the path. I tried to test this, and it just gets stranger. To rule out the influence of classes/objects, I compared calling a function on the path against a function in a package:
function n = atest( x, y )
n = ctest( x, y ); % ctest is in matlab path
function n = btest( x, y )
n = util.ctest( x, y ); % ctest is in +util directory, parent directory is in path
Results, gathered same way as above:
atest 0.004 sec, 0.001 sec in ctest
btest 0.060 sec, 0.014 sec in util.ctest
So, is all this overhead just coming from MATLAB spending time looking up definitions for its OOP implementation, whereas this overhead is not there for functions that are directly in the path?
I've been working with OO MATLAB for a while, and ended up looking at similar performance issues.
The short answer is: yes, MATLAB's OOP is kind of slow. There is substantial method call overhead, higher than mainstream OO languages, and there's not much you can do about it. Part of the reason may be that idiomatic MATLAB uses "vectorized" code to reduce the number of method calls, and per-call overhead is not a high priority.
I benchmarked the performance by writing do-nothing "nop" functions as the various types of functions and methods. Here are some typical results.
Similar results on R2008a through R2009b. This is on Windows XP x64 running 32-bit MATLAB.
The "Java nop()" is a do-nothing Java method called from within an M-code loop, and includes the MATLAB-to-Java dispatch overhead with each call. "Java nop() from Java" is the same thing called in a Java for() loop and doesn't incur that boundary penalty. Take the Java and C timings with a grain of salt; a clever compiler could optimize the calls away completely.
The package scoping mechanism is new, introduced at about the same time as the classdef classes. Its behavior may be related.
A few tentative conclusions:
The obj.nop() syntax is slower than the nop(obj) syntax, even for the same method on a classdef object. The same holds for Java objects (not shown). If you want to go fast, call nop(obj).
Saying why this is so would just be speculation on my part. The MATLAB engine's OO internals aren't public. It's not an interpreted vs compiled issue per se - MATLAB has a JIT - but MATLAB's looser typing and syntax may mean more work at run time. (E.g. you can't tell from syntax alone whether "f(x)" is a function call or an index into an array; it depends on the state of the workspace at run time.) It may be because MATLAB's class definitions are tied to filesystem state in a way that many other languages' are not.
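For illustration, the two call forms being compared look like this (MyClass here is a hypothetical classdef class with a do-nothing nop() method, not code from the benchmark):

```matlab
obj = MyClass();  % hypothetical classdef class with a nop() method
nop(obj);         % function-call syntax: historically faster dispatch
obj.nop();        % dot syntax: same method, but slower per call in older releases
```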
So, what to do?
An idiomatic MATLAB approach to this is to "vectorize" your code by structuring your class definitions such that an object instance wraps an array; that is, each of its fields hold parallel arrays (called "planar" organization in the MATLAB documentation). Rather than having an array of objects, each with fields holding scalar values, define objects which are themselves arrays, and have the methods take arrays as inputs, and make vectorized calls on the fields and inputs. This reduces the number of method calls made, hopefully enough that the dispatch overhead is not a bottleneck.
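As a minimal sketch of this planar organization (the class and field names are illustrative, not from the original post):

```matlab
% "Planar" organization: one object holds parallel arrays of values,
% instead of an array of scalar objects. Names are illustrative only.
classdef PointSet
    properties
        x = [];  % x-coordinates of all points
        y = [];  % y-coordinates of all points
    end
    methods
        function obj = PointSet(x, y)
            obj.x = x(:).';
            obj.y = y(:).';
        end
        function d = distFromOrigin(obj)
            % One vectorized method call computes all distances at once,
            % instead of paying method-call overhead once per point.
            d = sqrt(obj.x.^2 + obj.y.^2);
        end
    end
end
```

With a million points this costs one method call rather than a million: ps = PointSet(rand(1,1e6), rand(1,1e6)); d = distFromOrigin(ps);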
Mimicking a C++ or Java class in MATLAB probably won't be optimal. Java/C++ classes are typically built such that objects are the smallest building blocks, as specific as you can (that is, lots of different classes), and you compose them in arrays, collection objects, etc, and iterate over them with loops. To make fast MATLAB classes, turn that approach inside out. Have larger classes whose fields are arrays, and call vectorized methods on those arrays.
The point is to arrange your code to play to the strengths of the language - array handling, vectorized math - and avoid the weak spots.
EDIT: Since the original post, R2010b and R2011a have come out. The overall picture is the same, with MCOS calls getting a bit faster, and Java and old-style method calls getting slower.
EDIT: I used to have some notes here on "path sensitivity" with an additional table of function call timings, where function times were affected by how the Matlab path was configured, but that appears to have been an aberration of my particular network setup at the time. The chart above reflects the times typical of the preponderance of my tests over time.
Update: R2011b
EDIT (2/13/2012): R2011b is out, and the performance picture has changed enough to update this.
I think the upshot of this is that classdef method calls have gotten substantially faster, at least when using the foo(obj) syntax. So method speed is no longer a reason to stick with old style classes in most cases. (Kudos, MathWorks!)
Update: R2014a
I've reconstructed the benchmarking code and run it on R2014a.
Update: R2015b: Objects got faster!
Here's R2015b results, kindly provided by @Shaked. This is a big change: OOP is significantly faster, and now the obj.method() syntax is as fast as method(obj), and much faster than legacy OOP objects.
Update: R2018a
Here's R2018a results. It's not the huge jump that we saw when the new execution engine was introduced in R2015b, but it's still an appreciable year over year improvement. Notably, anonymous function handles got way faster.
Update: R2018b and R2019a: No change
No significant changes. I'm not bothering to include the test results.
Update: R2021a: Even faster objects!
Looks like classdef objects have gotten significantly faster again. But structs have gotten slower.
Source Code for Benchmarks
I've put the source code for these benchmarks up on GitHub, released under the MIT License. https://github.com/apjanke/matlab-bench