Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursion instead of goto #3

Closed
baghayi opened this issue Sep 22, 2014 · 198 comments
Closed

Recursion instead of goto #3

baghayi opened this issue Sep 22, 2014 · 198 comments

Comments

@baghayi
Copy link

baghayi commented Sep 22, 2014

May I ask if there is a reason why you prefer using goto instead of function recursion?

@igorw
Copy link
Owner

igorw commented Sep 22, 2014

Why hello! Thank you for asking this most excellent question!

I have indeed considered alternatives to the goto. I have evaluated them to a great extent, and I am happy to present the results to you here.

When the PHP parser reads a source file, that source code is compiled down to a series of opcodes, that will then be executed by the Zend (tm) (r) Engine. The compiler does some basic optimizations, but is really quite stupid. And so, depending on what code you write, it will generate different opcodes. This has a direct performance impact.

There are several way in which a loop can be written. Let's start with the one you have proposed, the recursive call.

function retry($retries, callable $fn)
{
    try {
        return $fn();
    } catch (\Exception $e) {
        if (!$retries) {
            throw new FailingTooHardException('', 0, $e);
        }
        retry($retries - 1, $fn)
    }
}

When you give this code to the PHP compiler, it will generate these opcodes:

function name:  igorw\retry
number of ops:  24
compiled vars:  !0 = $retries, !1 = $fn, !2 = $e
line     # *  op                           fetch          ext  return  operands
---------------------------------------------------------------------------------
   7     0  >   RECV                                             !0      
         1      RECV                                             !1      
  11     2      INIT_FCALL_BY_NAME                                       !1
         3      DO_FCALL_BY_NAME                              0  $0      
         4    > RETURN                                                   $0
  12     5*     JMP                                                      ->23
         6  >   CATCH                                        17          'Exception', !2
  13     7      BOOL_NOT                                         ~1      !0
         8    > JMPZ                                                     ~1, ->17
  14     9  >   FETCH_CLASS                                   4  :2      'igorw%5CFailingTooHardException'
        10      NEW                                              $3      :2
        11      SEND_VAL                                                 ''
        12      SEND_VAL                                                 0
        13      SEND_VAR                                                 !2
        14      DO_FCALL_BY_NAME                              3          
        15    > THROW                                         0          $3
  15    16*     JMP                                                      ->17
  16    17  >   INIT_NS_FCALL_BY_NAME                                    
        18      SUB                                              ~5      !0, 1
        19      SEND_VAL                                                 ~5
        20      SEND_VAR                                                 !1
        21      DO_FCALL_BY_NAME                              2  $6      
        22    > RETURN                                                   $6
  18    23*   > RETURN                                                   null

As you can see, it is generating 24 instructions. The most expensive portion of this is the function calls, since the arguments need to be sent individually, and there is an additional instruction (DO_FCALL_BY_NAME) for the actual function call.

There is no reason why this would be necessary. As described by Steele in his paper Lambda: The Ultimate GOTO, tail calls can be compiled to very efficient instructions. The PHP compiler however, does not take advantage of this technique, so calls are quite costly.

Let's try and improve this. By using a while loop.

function retry($retries, callable $fn)
{
    while (true) {
        try {
            return $fn();
        } catch (\Exception $e) {
            if (!$retries) {
                throw new FailingTooHardException('', 0, $e);
            }
            $retries--;
        }
    }
}

Here is what the compiler gives us:

function name:  igorw\retry
number of ops:  23
compiled vars:  !0 = $retries, !1 = $fn, !2 = $e
line     # *  op                           fetch          ext  return  operands
---------------------------------------------------------------------------------
   7     0  >   RECV                                             !0      
         1      RECV                                             !1      
   9     2  >   FETCH_CONSTANT                                   ~0      'igorw%5Ctrue'
         3    > JMPZ                                                     ~0, ->22
  11     4  >   INIT_FCALL_BY_NAME                                       !1
         5      DO_FCALL_BY_NAME                              0  $1      
         6    > RETURN                                                   $1
  12     7*     JMP                                                      ->21
         8  >   CATCH                                        15          'Exception', !2
  13     9      BOOL_NOT                                         ~2      !0
        10    > JMPZ                                                     ~2, ->19
  14    11  >   FETCH_CLASS                                   4  :3      'igorw%5CFailingTooHardException'
        12      NEW                                              $4      :3
        13      SEND_VAL                                                 ''
        14      SEND_VAL                                                 0
        15      SEND_VAR                                                 !2
        16      DO_FCALL_BY_NAME                              3          
        17    > THROW                                         0          $4
  15    18*     JMP                                                      ->19
  16    19  >   POST_DEC                                         ~6      !0
        20      FREE                                                     ~6
  18    21    > JMP                                                      ->2
  19    22  > > RETURN                                                   null

This already looks a bit better. But there is a rather inefficient FETCH_CONSTANT instruction right at the top. This requires doing a namespace lookup against igorw\true. We can optimize that, by replacing while (true) with while (\true).

This gets rid of the FETCH_CONSTANT call, and puts the boolean true inline:

line     # *  op                           fetch          ext  return  operands
---------------------------------------------------------------------------------
   7     0  >   RECV                                             !0      
         1      RECV                                             !1      
   9     2  > > JMPZ                                                     true, ->21

But JUMPZ with argument true is a redundant expression. true is never zero. So ideally we would simply eliminate this check.

PS: for (;;) also has redundant jumps, so let's not use that.

So can we eliminate the redundant jump? Let's try a do-while loop!

function name:  igorw\retry
number of ops:  21
compiled vars:  !0 = $retries, !1 = $fn, !2 = $e
line     # *  op                           fetch          ext  return  operands
---------------------------------------------------------------------------------
   7     0  >   RECV                                             !0      
         1      RECV                                             !1      
  11     2  >   INIT_FCALL_BY_NAME                                       !1
...
        15    > THROW                                         0          $3
  15    16*     JMP                                                      ->17
  16    17  >   POST_DEC                                         ~5      !0
        18      FREE                                                     ~5
  18    19    > JMPNZ                                                    true, ->2
  19    20  > > RETURN                                                   null

Awesome! The extra initial JMPZ has been removed! But, it comes at the cost of a JMPNZ at the end. So it will be more efficient for the success case, but for the retry, we will still do redundant checks for a jump that should be unconditional.

And there is a way to eliminate that final conditional, by using the excellent goto feature built into PHP. It produces the identical opcodes as do...while, but makes that final jump non-conditional!

And there you have it. This is the most efficient solution to do non-conditional loops in PHP. All other options are simply too slow.

@igorw igorw closed this as completed Sep 22, 2014
@ssx
Copy link

ssx commented Sep 22, 2014

Igor will you marry me <3

@michellesanver
Copy link

SCOTT!!! :(

@ssx
Copy link

ssx commented Sep 22, 2014

BUT LOOK AT THIS MAN

@dazz
Copy link

dazz commented Sep 22, 2014

@michellesanver Don't worry, I sit next to him and I was first!

@mickaelandrieu
Copy link

Thanks for this explanation, your PHP knowledge is amazing o_O

@baghayi
Copy link
Author

baghayi commented Sep 23, 2014

Hello @iGore
Thank you for this great explanation.
It was very helpful.

@b-viguier
Copy link

So, you have to rename your Twitter account igorgoto, it will be more efficient... :)

@asgrim
Copy link

asgrim commented Sep 23, 2014

This is the best justification for using "goto" in real life ever. Nicely done Igor.

@rainbow-alex
Copy link

Why not manually unroll the loop a few times. Most invocations will be for small n, so you might eliminate jumps alltogether.

@docteurklein
Copy link

+1 for marrying me too :)

@umpirsky
Copy link

@igorw You should write a book "Answering GitHub Issues The Right Way".

@acleon
Copy link

acleon commented Sep 23, 2014

Thanks for the explanation, but I'm unclear as to where the do...while approach falls down. Where would the overhead be on something like:

do {
    try {
        return $fn();
    }
    catch (\Exception $e) { }
}
while($retries--);

throw new FailingTooHardException('', 0, $e);

@davestevens
Copy link

@acleon I was a bit confused when I was reading this too, if you look at the actually code in the repo:
https://github.com/igorw/retry/blob/6c85162523e6a036f5b65c8cb2ec16c65b120d44/src/retry.php
You'll see that by using goto rather than a loop they are removing the use of conditional jump operands.

@jcoleman
Copy link

"All other options are simply too slow."

Have you actually benchmarked this? You do realize that the cost of network operations is several orders of magnitude larger than the cost of a few extra opcodes?

This is a textbook illustration of premature optimization that gains almost no optimization and sacrifices lots of maintainability in the process. If your purpose is purely academic, then go for it, I suppose. Otherwise...

@umpirsky
Copy link

...otherwise you can still go for it since implementation is fairly simple and consists of only several lines of code, so it is not a big deal.

@jcoleman
Copy link

So readability and maintainability are "not a big deal"?

@umpirsky
Copy link

@jcoleman I don't see any readability nor maintainability issues in this project. Do you?

@tminard
Copy link

tminard commented Sep 23, 2014

goto

@dave1010
Copy link

@davestevens @igorw are conditional jump operands (while($retries)) any different to doing a conditaional (if (!$retries)), and then a jump (goto beginning)?

@jcoleman
Copy link

For reference: https://files.ifi.uzh.ch/rerg/arvo/courses/kvse/uebungen/Dijkstra_Goto.pdf

This is a great demonstration of why so many computer science people look down on the PHP community. Your justification is basically "why not", when my point is that "it doesn't actually gain you anything.

@jbrooksuk
Copy link

@igorw since you're an internal PHP developer, can't you do something about this?

@ankitml
Copy link

ankitml commented Sep 23, 2014

Any improvement which is not done to the worst performing step is not an improvement at all. The code is as fast as the slowest step. So irrespective of this detailed analysis, you need to first show that this loop is the slowest thing in the program. The whole point is you should have strong reasons to go against the standard practices of good programming.

@umpirsky
Copy link

@jcoleman Peronaly, I never use gotos, because such tiny performance improvements does not change anything in apps I write.
But @igorw stated that performance is the most important thing he considered when choosing between different implementations.
You said:

You do realize that the cost of network operations is several orders of magnitude larger than the cost of a few extra opcode.

I don't see any network operations in this project, so your argument is not apply-able here.

@jcoleman
Copy link

"I don't see any network operations in this project, so your argument is not apply-able here."

The readme specifically states that the primary use-case for this project is for retrying network operations. Where else would you use this construct?

@umpirsky
Copy link

@jcoleman Hehe, right. I agree, in case of retrying network operations, this tiny benefit is irrelevant.

@hausdorff
Copy link

@jcoleman @umpirsky Yeah, but your point remains: this code is not unmaintainable. You can complain that it doesn't fit a style standard, but that's about as far as @jcoleman is going to get here. Particularly since the GOTO is restricted just like it is in C (where it is used heavily and where it is even considered good style for things like error handling), which means most of the complaints that Dijkstra is making aren't even applicable here.

@jcoleman
Copy link

This is not C. Is there another language where GOTO is considered good style? Not in common languages.

@josegonzalez
Copy link

Semi-related:

I thought you could not namespace the following constants

  • NULL
  • TRUE
  • FALSE
  • ZEND_THREAD_SAFE
  • ZEND_DEBUG_BUILD

Though this php eval says you can for any version of PHP that supports namespaces (though not on HHVM).

The best part is that if you don't use define to define the named constant, it does show the behavior PHP says isn't possible:

http://3v4l.org/1vAHs

@eddiejaoude
Copy link

Very good reply! 👍

@josegonzalez
Copy link

@bloodyowl says the guy whose entire life depends on not blocking a single thread...

@vita10gy
Copy link

I don't understand the "this is a micro library so one little goto doesn't count the same as it otherwise might" argument. "It's 19 lines, get over it @jcoleman"

Any well designed codebase will have plenty of things encapsulated into little one job routines like this. You should be working with 20 lines of code, give or take, at a time, all the time. The same "well surely a goto is fine in this little ol' function...for speed" argument would apply constantly, and that's how in the big picture you wind up with an unmaintainable turd sandwich.

WE are the expense. WE should strive to do everything we can to ensure everything we do can be maintained as easily and painlessly as possible. If that isn't the fastest possible code, then spend a comparative pittance on upgrading the sever.

Recursion should probably be avoided where a loop is just as painless and readable, but being worried about the "cost" of a loop is, IMO, silly. There are almost certainly always better places for you to invest your time than avoiding the cost of a bleeping while loop in anything you'd be using PHP* to accomplish.

*Edit: This was not a "bash PHP" point. But you're not writing a first person shooter or other such thing where wringing every clock cycle possible out of a predefined system is important, in PHP. If all the "extra" memory of all the while loops you use is effecting your server, then spend $50 on some RAM. Don't spend your time on stuff like this.

@josegonzalez
Copy link

Has anyone here made a pull request to change this to a faster method?

@darkrain
Copy link

now i know how i can get womens

@cloudward
Copy link

@vita10gy yes... this isn't about high level coding style guides... it's just a low level view into how PHP CAN be optimized. in all my years writing PHP, i think i have maybe 2 goto commands in the million+ lines of code i've written, and they were in a compiler for a dynamic language that lives side by side with PHP. using the goto was by far the best solution speed wise and maintainability wise. i'm passionate about this because the "never use goto" camp is strong, and i don't want to see it deprecated. there are very few use cases where it makes a lot of sense, but they do exist.

BTW, thanks for submitting all those portals in phoenix park ;) we live close.

@vita10gy
Copy link

@cloudward You ENL or RES?

Edit: also, as far as this goes, I'm not sure you can in one breath say that you've used goto 2 times in all your years, and in the next worry that the 'no goto ever' brigade is trying to rid the world of a useful tool. Almost by definition something can't be almost never useful and super useful.

I think you could make a good argument that even if it has a very rare proper usage, (which while I can't think of any, I'm sure it does), and lots of ways to completely misuse it, that it would be for the greater good to just not have it.

I mean, I can think of a couple use cases why rigging the windows in my house with explosives would be a good idea. All that really matters is if those use cases are worth all the times I blow guests or my wife up.

@jcoleman
Copy link

@cloudward I have to say that I'd totally agree with most of that. I'm not advocating eliminating GOTO, and I'm not advocating never using it (remember, I've used GOTO in production too).

If the purpose of this code usage was purely and exercise in how to do low-level optimization in PHP, then I think that's cool too.

I just don't think that it's necessary (or that the speed it potentially gains is relevant) here in this use case.

@cloudward
Copy link

@vita10gy goto is like nuclear weapons... if you're using one, you better have a good reason, and most of the people who have them in good faith plan on not using them, but they're still building more and more of them.

i'm ENL

@vita10gy
Copy link

Figures, ENL would be GOTO lovers. ;)

@cloudward
Copy link

@jcoleman i added in the do while to the test, and used as many gotos as i could. the case in point is "n = 0"... where in wall time goto and do while are often exactly equal, and sometimes either one of them are faster. that directly relates to your post about adding a useless MOV to significantly speed up a system... there might be timings that are ripe for better overall performance, if everyone is doing things as conservatively as possible, that might block everyone from doing anything productively. something about good will hunting wanting a blonde. but technically, in the n = 0 case (which is expected to happen most frequently), the goto does a single less thing at the lowest possible level.

<?php

$n = 1000;
$a = 1;

start:

$start = microtime(true);

try {
    $a = 1;
    while(\true) {
        if($a > $n) {
            throw new Exception('done');
        }
        $a++;
    }
} catch(Exception $e) {
}

$after_while = microtime(true);

try {
    $a = 1;
    start_goto:
    if($a > $n) {
        throw new Exception('done');
    }
    $a++;
    goto start_goto;
} catch(Exception $e) {
}

$after_goto = microtime(true);

try {
    $a = 1;
    do {
        if($a > $n) {
            throw new Exception('done');
        }
        $a++;
    } while(\true);
} catch(Exception $e) {
}

$end = microtime(true);

echo "n = $n\n\twhile    " . ($after_while-$start) . "\n\tgoto     " . ($after_goto-$after_while) . "\n\tdo while " . ($end-$after_goto) . "\n\n";

$n = $n / 10;
if($n >= 1) {
    goto start;
} elseif($n > 0) {
    $n = 0;
    goto start;
}

?>

@cloudward
Copy link

n = 1000
while 0.00028491020202637
goto 0.0002291202545166
do while 0.00021195411682129

n = 100
while 5.793571472168E-5
goto 4.6014785766602E-5
do while 5.0067901611328E-5

n = 10
while 3.6954879760742E-5
goto 2.8133392333984E-5
do while 2.6941299438477E-5

n = 1
while 2.598762512207E-5
goto 2.598762512207E-5
do while 2.598762512207E-5

n = 0
while 2.6941299438477E-5
goto 2.5033950805664E-5
do while 2.598762512207E-5

@tamasimrei
Copy link

on a personal note: it's amazing that people actually have time for this.
and like someone said, yeah, let's talk about indentation! :)

@jgoux
Copy link

jgoux commented Sep 25, 2014

@cloudward
Copy link

@tamasimrei the code i pasted was indented with tabs, but github seems to ignore them. i tried wrapping in <pre> and <code>, but it just made it worse. i guess markdown wants me to add 4 spaces to the start of every line. you're right, i don't have time for that.

@evert
Copy link

evert commented Sep 25, 2014

@cloudward you start and end your code with triple-backtick to turn it into a codeblock as such:

```php
// code goes here
```

If you're going to show the world the importance of your argument, you might as well do it with syntax highlighting enabled!

@hassankhan
Copy link

@jgoux Holy shit that was my thread!

@igorw
Copy link
Owner

igorw commented Oct 28, 2014

In case you need some more reasons why the GOTO statement is a feasible option for performance critical code such as the code in this library, I can highly recommend this paper by Knuth:

@vita10gy
Copy link

This is a joke right?

@jcoleman
Copy link

"Performance critical code" such as saving an instruction or two when
retrying web requests. If this isn't trolling, I don't know what is.

On Tue, Oct 28, 2014 at 8:40 PM, vita10gy notifications@github.com wrote:

This is a joke right?


Reply to this email directly or view it on GitHub
#3 (comment).

@DGuidi
Copy link

DGuidi commented Feb 17, 2015

here from this post
thank you all, you made my day.

@PurHur
Copy link

PurHur commented Sep 3, 2019

"Performance critical code" such as saving an instruction or two when
retrying web requests. If this isn't trolling, I don't know what is.

After all this, I just wanted to tell you that you do not always have to follow your textbooks. It makes totaly sense and if you cant read/maintain that single goto, just learn PHP.

@pedroac
Copy link

pedroac commented Nov 17, 2019

That's an old thread, but I'd like to comment some of the @jcoleman replies.

He cited the famous Edgar Dijkstra letter, that criticized "The unbridled use of the go to statement" and remarked that maybe it should be restricted "to alarm exits". Linux has lots of goto statements for that.

Remember that Donald Knuth replied Dijkstra through the also famous paper "Structured Programming with go to Statements".

Supposedly Dijkstra wrote: "Please don't fall into the trap of believing that I am terribly dogmatical about [the go to statement]. I have the uncomfortable feeling that others are making a religion out of it, as if the conceptual problems of programming could be solved by a single trick, by a simple form of coding discipline."

Well, the aphorism "premature optimization is the root of all evil" was written (and maybe invented) by Donald Knuth. For instance, in "The Art of Computer Programming" (1974):

"Another important aspect of program quality is the efficiency with which the computer's resources are actually being used. I am sorry to say that many people nowadays are condemning program efficiency, telling us that it is in bad taste. The reason for this is that we are now experiencing a reaction from the time when efficiency was the only reputable criterion of goodness, and programmers in the past have tended to be so preoccupied with efficiency that they have produced needlessly complicated code; the result of this unnecessary complexity has been that net efficiency has gone down, due to difficulties of debugging and maintenance.

The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming."

If the idea is merely to cite famous works, well, Dijkstra is not dogmatic about "goto" statements and Knuth clearly favors optimization. They're just against abuses.

It's trivial to understand the code and it's very efficient, specially compared to slow recursive functions in PHP. They're two wins specially in a library. I don't understand what's the fuss, except dogmatically avoid a tool that is available and was, seemingly, correctly used.

@jcoleman
Copy link

@pedroac Please read my actual comments (like #3 (comment) ) and note that I am not dogmatically opposed to the use of goto. Nor am I opposed to optimization (it’s one of ,y favorite things to do).

But the speed difference between recursive functions and goto in the intended use case (a web request) is! mathematically speaking, purely statistical noise.

Oh, and if we’re that concerned about speed...probably PHP is the wrong language to use.

@pgl
Copy link

pgl commented Nov 18, 2019

PHP is the wrong language to use

Perhaps PHP isn't the wrong language to use to figure out the most efficient way to do something in PHP. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests