Consider the following simple speed test for `arrayfun`:

```matlab
T = 4000; N = 500;
x = randn(T, N);
Func1 = @(a) (3*a^2 + 2*a - 1);

% Explicit double loop
tic
Soln1 = ones(T, N);
for t = 1:T
    for n = 1:N
        Soln1(t, n) = Func1(x(t, n));
    end
end
toc

% arrayfun
tic
Soln2 = arrayfun(Func1, x);
toc
```
On my machine (MATLAB 2011b on Linux Mint 12), the output of this test is:
```
Elapsed time is 1.020689 seconds.
Elapsed time is 9.248388 seconds.
```
`arrayfun`, while admittedly the cleaner-looking solution, is an order of magnitude slower. What is going on here?
Further, I ran a similar style of test for `cellfun` and found it to be about 3 times slower than an explicit loop. Again, this result is the opposite of what I expected.
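For reference, a minimal sketch of the kind of `cellfun` comparison I mean (a hypothetical reconstruction, not my exact test script):

```matlab
% Same function and data shape as above, but stored in a cell array
T = 4000; N = 500;
Func1 = @(a) (3*a^2 + 2*a - 1);
xc = num2cell(randn(T, N));   % cell array of scalars

% Explicit double loop over cells
tic
Soln1 = zeros(T, N);
for t = 1:T
    for n = 1:N
        Soln1(t, n) = Func1(xc{t, n});
    end
end
toc

% cellfun (scalar outputs, so the result is a plain numeric matrix)
tic
Soln2 = cellfun(Func1, xc);
toc
```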
My question is: Why are `arrayfun` and `cellfun` so much slower? And given this, are there any good reasons to use them (other than to make the code look good)?
Note: I'm talking about the standard version of `arrayfun` here, NOT the GPU version from the Parallel Computing Toolbox.
EDIT: Just to be clear, I'm aware that `Func1` above can be vectorized, as pointed out by Oli. I only chose it because it yields a simple speed test for the purposes of the actual question.
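(The vectorized version, using element-wise operators, would simply be:)

```matlab
% Vectorized equivalent of Func1 applied to the whole matrix at once
Soln3 = 3*x.^2 + 2*x - 1;
```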
EDIT: Following the suggestion of grungetta, I re-did the test with `feature accel off`. The results are:
```
Elapsed time is 28.183422 seconds.
Elapsed time is 23.525251 seconds.
```
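(For anyone repeating this: `feature accel` is an undocumented command, so its behavior may vary across releases. The toggle itself is just:)

```matlab
feature accel off   % disable the accelerator/JIT (undocumented command)
% ... re-run the timing test from above ...
feature accel on    % re-enable it afterwards
```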
In other words, it would appear that a big part of the difference is that the JIT accelerator does a much better job of speeding up the explicit `for` loop than it does `arrayfun`. This seems odd to me, since `arrayfun` actually provides more information, i.e., its use reveals that the order of the calls to `Func1` does not matter. Also, I noted that whether the JIT accelerator is switched on or off, my system only ever uses one CPU...