Asked  8 Months ago    Answers:  5   Viewed   45 times

First of all, I understand in 90% of applications the performance difference is completely irrelevant, but I just need to know which is the faster construct. That and...

The information currently available on the net is confusing. A lot of people say foreach is bad, but technically it should be faster, since it's supposed to simplify writing an array traversal using iterators. Iterators, which are again supposed to be faster, but in PHP are also apparently dead slow (or is this not a PHP thing?). I'm talking about the array functions: next(), prev(), reset(), etc. (if they are even functions and not one of those PHP language features that merely look like functions).

To narrow this down a little: I'm not interested in traversing arrays in steps of anything more than 1 (no negative steps either, i.e. reverse iteration). I'm also not interested in traversal to and from arbitrary points, just 0 to length. I also don't see arrays with more than 1000 keys being manipulated on a regular basis, but I do see an array being traversed multiple times in an application's logic! As for operations: largely just string manipulation and echoing.

Here are few reference sites:
http://www.phpbench.com/
http://www.php.lt/benchmark/phpbench.php

What I hear everywhere:

  • foreach is slow, and thus for/while is faster
  • PHP's foreach copies the array it iterates over; to make it faster you need to use references
  • code like this:
    $key = array_keys($aHash);
    $size = sizeOf($key);
    for ($i = 0; $i < $size; $i++)
    is faster than a foreach

Here's my problem. I wrote this test script: http://pastebin.com/1ZgK07US and no matter how many times I run the script, I get something like this:

foreach 1.1438131332397
foreach (using reference) 1.2919359207153
for 1.4262869358063
foreach (hash table) 1.5696921348572
for (hash table) 2.4778981208801

In short:

  • foreach is faster than foreach with reference
  • foreach is faster than for
  • foreach is faster than for for a hash table

Can someone explain?

  1. Am I doing something wrong?
  2. Does iterating by reference in PHP's foreach really make a difference? I mean, why would it not copy the array if you pass it by reference?
  3. What's the equivalent iterator code for the foreach statement? I've seen a few versions on the net, but each time I test them the timing is way off; I've also tested a few simple iterator constructs but never seem to get even decent results. Are the array iterators in PHP just awful?
  4. Are there faster ways/methods/constructs to iterate through an array other than FOR/FOREACH (and WHILE)?

PHP Version 5.3.0


Edit: With help from people here I was able to piece together answers to all the questions. I'll summarize them here:
  1. "Am I doing something wrong?" The consensus seems to be: yes, I shouldn't use echo in benchmarks. Personally, I still don't see how echo is some function with a random execution time, or how any other function is somehow different; that, and the fact that the script consistently ranks foreach above everything else, is hard to explain through just "you're using echo" (well, what should I have been using?). However, I concede the test should be done with something better, though an ideal compromise doesn't come to mind.
  2. "Is the PHP foreach reference thing really making a difference? I mean, why would it not copy the array if you pass it by reference?" ircmaxell shows that yes, it is; further testing seems to prove that in most cases the reference should be faster, though given my snippet of code above, most definitely not in all. I accept the issue is probably too non-intuitive to bother with at such a level, and would require something extreme such as decompiling to actually determine which is better for each situation.
  3. "What's the equivalent iterator code for the foreach statement? I've seen a few on the net, but each time I test them the timing is way off; I've also tested a few simple iterator constructs but never seem to get even decent results. Are the array iterators in PHP just awful?" ircmaxell provided the answer below, though the code might only be valid for PHP version >= 5.
  4. "Are there faster ways/methods/constructs to iterate through an array other than FOR/FOREACH (and WHILE)?" Thanks go to Gordon for the answer. Using the new SPL data types in PHP 5 should give either a performance boost or a memory boost (either of which might be desirable depending on your situation). Speed-wise, a lot of the new array-like types don't seem to be better than array(), but SplPriorityQueue and SplObjectStorage do seem to be substantially faster. Link provided by Gordon: http://matthewturland.com/2010/05/20/new-spl-features-in-php-5-3/
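As a quick illustration of one of those SPL types, here is a minimal SplPriorityQueue sketch (PHP >= 5.3; the values and priorities are made up for the example). Elements come back out in priority order, highest first:

```php
<?php
$queue = new SplPriorityQueue();

// insert($value, $priority): higher priority is extracted first.
$queue->insert('low', 1);
$queue->insert('high', 10);
$queue->insert('mid', 5);

$order = array();
foreach ($queue as $item) {   // iteration extracts in priority order
    $order[] = $item;
}
// $order is now array('high', 'mid', 'low')
```

Note that iterating an SplPriorityQueue is destructive: each element is removed as it is extracted.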

Thank you everyone who tried to help.

I'll likely stick to foreach (the non-reference version) for any simple traversal.

 Answers

My personal opinion is to use what makes sense in the context. Personally I almost never use for for array traversal. I use it for other types of iteration, but foreach is just too easy... The time difference is going to be minimal in most cases.

The big thing to watch for is:

for ($i = 0; $i < count($array); $i++) {

That's an expensive loop, since it calls count on every single iteration. So long as you're not doing that, I don't think it really matters...
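A minimal sketch of the fix (assuming a plain indexed array): hoist the count() call out of the loop condition so it runs once instead of on every iteration.

```php
<?php
$array = range(1, 1000);

// Call count() once, before the loop,
// instead of in the loop condition on every iteration.
$size = count($array);
$sum = 0;
for ($i = 0; $i < $size; $i++) {
    $sum += $array[$i];
}
// $sum is now 500500
```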

As for the reference making a difference, PHP uses copy-on-write, so if you don't write to the array, there will be relatively little overhead while looping. However, if you start modifying the array within the loop, that's where you'll start seeing differences between them (since one will need to copy the entire array, while the reference version can just modify it in place)...

As for the iterators, foreach is equivalent to:

$it->rewind();
while ($it->valid()) {
    $key = $it->key();     // If using the $key => $value syntax
    $value = $it->current();

    // Contents of loop in here

    $it->next();
}

As far as there being faster ways to iterate, it really depends on the problem. But I really need to ask, why? I understand wanting to make things more efficient, but I think you're wasting your time for a micro-optimization. Remember, Premature Optimization Is The Root Of All Evil...

Edit: Based upon the comment, I decided to do a quick benchmark run...

$a = array();
for ($i = 0; $i < 10000; $i++) {
    $a[] = $i;
}

$start = microtime(true);
foreach ($a as $k => $v) {
    $a[$k] = $v + 1;
}
echo "Completed in ", microtime(true) - $start, " Seconds\n";

$start = microtime(true);
foreach ($a as $k => &$v) {
    $v = $v + 1;
}
echo "Completed in ", microtime(true) - $start, " Seconds\n";

$start = microtime(true);
foreach ($a as $k => $v) {}
echo "Completed in ", microtime(true) - $start, " Seconds\n";

$start = microtime(true);
foreach ($a as $k => &$v) {}
echo "Completed in ", microtime(true) - $start, " Seconds\n";

And the results:

Completed in 0.0073502063751221 Seconds
Completed in 0.0019769668579102 Seconds
Completed in 0.0011849403381348 Seconds
Completed in 0.00111985206604 Seconds

So if you're modifying the array in the loop, it's several times faster to use references...

And the overhead for just the reference is actually less than copying the array (this is on 5.3.2)... So it appears (on 5.3.2 at least) as if references are significantly faster...

Wednesday, March 31, 2021
 
muaddhib
answered 8 Months ago

In order to echo out the values, you have to select their index in each sub-array:

foreach($array as $arr){
    echo '<a href="'.$arr[0].'">'.$arr[1].'</a>';
}

Here is an example.

Wednesday, March 31, 2021
 
StampyCode
answered 8 Months ago
Stream.of("AAA","BBB","CCC").parallel().forEach(s->System.out.println("Output:"+s));
Stream.of("AAA","BBB","CCC").parallel().forEachOrdered(s->System.out.println("Output:"+s));

The second line will always output

Output:AAA
Output:BBB
Output:CCC

whereas the first one is not guaranteed, since the order is not kept. forEachOrdered will process the elements of the stream in the order specified by its source, regardless of whether the stream is sequential or parallel.

Quoting from forEach Javadoc:

The behavior of this operation is explicitly nondeterministic. For parallel stream pipelines, this operation does not guarantee to respect the encounter order of the stream, as doing so would sacrifice the benefit of parallelism.

whereas the forEachOrdered Javadoc states (emphasis mine):

Performs an action for each element of this stream, *in the encounter order of the stream* if the stream has a defined encounter order.

Wednesday, June 2, 2021
 
Pegues
answered 5 Months ago

because, in one case, a giant pointless loop from zero to length is executed, but not in the other case.


According to the ECMA documentation:

  1. The .forEach method will loop through all array elements based on the array's .length property.
  2. The callback passed to .forEach is only invoked for elements that are not empty (holes in a sparse array are skipped).

To demonstrate this you can simply do:

function doNothing(){}
let perf;


console.log('Array with 50 million length and 1 non-empty element:');
const a = [];
a[49999999] = 'a';
console.log('a.length:', a.length);

perf = performance.now();
a.forEach(doNothing);
console.log('a:', performance.now() - perf);
console.log('');


console.log('Array with 0 length:');
const b = [];
b.foo = 'a';
console.log('b.length:', b.length);

perf = performance.now();
b.forEach(doNothing);
console.log('b:', performance.now() - perf);
console.log('');


console.log('Array with 50 million length and 0 non-empty element:');
const c = [];
c.length = 50000000;
console.log('c.length:', c.length);

perf = performance.now();
c.forEach(doNothing);
console.log('c:', performance.now() - perf);
Thursday, August 19, 2021
 
Peter Thomas
answered 3 Months ago

This is an apples to oranges comparison. See http://redis.io/topics/benchmarks

Redis is an efficient remote data store. Each time a command is executed on Redis, a message is sent to the Redis server, and if the client is synchronous, it blocks waiting for the reply. So beyond the cost of the command itself, you will pay for a network roundtrip or an IPC.

On modern hardware, network roundtrips or IPCs are surprisingly expensive compared to other operations. This is due to several factors:

  • the raw latency of the medium (mainly for network)
  • the latency of the operating system scheduler (not guaranteed on Linux/Unix)
  • memory cache misses are expensive, and the probability of cache misses increases while the client and server processes are scheduled in/out.
  • on high-end boxes, NUMA side effects

Now, let's review the results.

Comparing the implementation using generators and the one using function calls, they do not generate the same number of roundtrips to Redis. With the generator you simply have:

    while time.time() - t - expiry < 0:
        yield r.get(fpKey)

So 1 roundtrip per iteration. With the function, you have:

if r.exists(fpKey):
    return r.get(fpKey)

So 2 roundtrips per iteration. No wonder the generator is faster.

Of course, you are supposed to reuse the same Redis connection for optimal performance. There is no point in running a benchmark that systematically connects/disconnects.

Finally, regarding the performance difference between Redis calls and the file reads, you are simply comparing a local call to a remote one. File reads are cached by the OS filesystem, so they are fast memory transfer operations between the kernel and Python. There is no disk I/O involved here. With Redis, you have to pay for the cost of the roundtrips, so it is much slower.

Monday, September 6, 2021
 
jonboy
answered 2 Months ago