Introduction
You probably came across the term ‘Tail Recursion’ or ‘Tail Recursive’ before. You even have written a piece of Tail Recursive functions/algorithms without knowing it. So, what is ‘Tail Recursion’ and how is it different from other recursion (the traditional ones) ? This article is going to explain the differences.
Recursive methods are either Tail recursive or Non-tail recursive.
Definition: Tail recursive method has the recursive call as the last statement in the method. Recursive methods that are not tail recursive are called non-tail recursive.
Example 1 – Factorial
Let’s start by looking a text-book example of writing first recursive function that computes the factorials
1 2 3 4 5 6 | int factorial(int x) { if (x <= 1) { return 1; // terminal condition } return x * factorial(x - 1);// multiplication pending } |
int factorial(int x) { if (x <= 1) { return 1; // terminal condition } return x * factorial(x - 1);// multiplication pending }
When returning back from a recursive call, there is still one pending operation, multiplication. Therefore, the above factorial is a non-tail recursive method.
Let’s put this function in a C++ project by Visual Studio 2010 and see what the compiler produces. It is known that the compiler will generate different machine/binary code according to different compilation settings. For example, at DEBUG mode, when robustness is preferred over performance, the code is not optimised. Below is the ‘DEBUG’ settings we will use in Visual Studio C++ project.
Noted that the ‘Optimization’ is disabled meaning that the compiler will not attempt to optimize any code, the code will be produced as the way it looks. The ‘Omit Frame Pointers’ will save an additional register EBP because no stack frames are created. It will speed up function calls as well but this might not be so debug-friendly because the debugger needs to preserve stack points to obtain debug information. Therefore, suppressing the frame pointer will not be compatible with other debug complier option (/Z7, /Zi, /ZI).
So, let us see what compiler generates for the above recursion function to compute factorial.
When a function is called, the IP (Instruction Pointer, the address where the current program executes) is pushed on the stack. When the function ends, the IP is restored and the stack is cleared. The program will restore to its previous state.
The above recursive function computes the value for a smaller input and use the value to multiply current input, which gives the final value. Therefore, the final value will not be computed until all sub recursive calls return, which require the stacks to preserve the states.
The recursive calls expanded (go deeper) into smaller sub functions and restored backwards when the deepest recursive call finishes.
How about the code generated in favour of performance? Let’s look at the ‘RELEASE’ mode in Visual Studio 2010 C++.
Apart from the ‘/O2’ optimisation, we turn on the ‘Omit Frame Pointers’ to speed up function calls by omitting the frame pointers i.e. Register EBP (in fact, you have one more register to do the base pointer indexing)
Apart from ‘no ebp is saved’, we can hardly notice any code difference. The recursive functions are still based on assembly ‘call’ instruction that will push and pop the IP.
The stack is a piece of fixed-size memory block (normally 1MB, but can be increased by specific compiler directives). The recursion will use the stack space every time it jumps into sub recursive calls. The precious stack space will be used up quickly if the depth of recursion is large (especially if there is no terminating conditions). In this case, a dreadful run time error will be caught.
If you look at the call stacks, you can see the calls break into smaller calls.
One can do such experiment very easily by defining a endless recursive function.
1 2 3 | int deadly(int x) { return deadly(x - 1); } |
int deadly(int x) { return deadly(x - 1); }
In DEBUG mode, the compiler will generate the stacks and using ‘call’ for recursive calls.
In RELEASE mode (when optimization is on), the compiler will use a direct jump instead of pushing and popping stacks.
Hence, whenever you stop the execution and set a breakpoint, you will only have the latest one stack. There will no more ‘stack-over-flow’ error. The call stacks is like this:
So, it is safer to write recursive functions that do not throw up ‘stack-over-flow’. But not all recursive functions can be converted e.g. ACKMan. (there are nested recursive calls).
Let’s re-write the factorial function in Tail form.
1 2 3 4 5 6 | int factorial_tail(int x, int ans) { if (x == 1) { return ans; } return factorial_tail(x - 1, ans * x); } |
int factorial_tail(int x, int ans) { if (x == 1) { return ans; } return factorial_tail(x - 1, ans * x); }
We have added another parameter to save the intermediate result. The recursive call is the final operation in the current iteration, therefore, the value of the current iteration is the value of the next recursive call.
However, not all compilers will do the optimization on tail recursive functions by default (even some are not clever enough to figure this out).
The stack pointers are not required and the implication of this is once you are ready to perform your next recursive call, you do not actually need the current stack frame any more. This allows for some optimization. Actually, with an appropriately modern compiler, you should never have a stack overflow with a tail recursive call. Simply reuse the current stack frame for the next recursive call.
In DEBUG mode, the compiler fails to optimize the tail recursive function.
The tail-recursive version of factorial function still suffer from ‘Stack Overflow’ error when recursion depth becomes large.
In RELEASE mode, the Visual Studio generates optimal code and removes the recursion.
Instead of ‘call’, the complier translates the tail-recursion using direct jmp which virtually is a loop. It means, that it is not longer a recursion function but you actually code it in recursive form (straightforward). The performance advantage speaks for itself.
Tail Recursive methods are easy to convert to iterative.
For example, the above factorial tail-recursive function can be just simply re-written as:
int factorial_iter(int x) { int ans = 1; for (int i = 0; i < x; i ++) ans *= i; return ans; }
Smart compilers can detect tail recursion and convert it to iterative to optimize code. The tail-recursive functions are often used to implement loops in languages that do not support loop structures explicitly (e.g. prolog)
In general, the tail-recursive form such as:
1 2 3 4 5 | F(x) { if (P(x)) return G(x); return F(H(x)); } |
F(x) { if (P(x)) return G(x); return F(H(x)); }
can be converted to iterative approach:
1 2 3 4 5 6 7 8 | F(x) { int temp_x = x; while (P(x) is not true) { temp_x = x; x = H(temp_x); } return G(x); } |
F(x) { int temp_x = x; while (P(x) is not true) { temp_x = x; x = H(temp_x); } return G(x); }
P(x) is true, the value of F(x) is the value of some other function G(x). Otherwise, the value of F(x) is the value of the function F on some other value, H(x).
Example 2 – Fibonacci Numbers
Fibonacci is another classic example of using recursion. We can easily write such function base on definition of equations.
1 2 3 4 5 6 7 8 | int fib(int x) { if (x == 1) { return 0; } else if (x == 2) { return 1; } return fib(x - 1) + fib(x - 2); } |
int fib(int x) { if (x == 1) { return 0; } else if (x == 2) { return 1; } return fib(x - 1) + fib(x - 2); }
The code generated in DEBUG mode is un-optimised.
We achieve quite similar code apart from the stack pointer at the begining in the RELEASE mode:
Unless we convert this to tail-recursive function, the compiler finds it hard to optimize:
1 2 3 4 5 6 | int fib_tail(int n, int res, int next) { if (n == 1) { return res; } return fib_tail(n - 1, next, res + next); } |
int fib_tail(int n, int res, int next) { if (n == 1) { return res; } return fib_tail(n - 1, next, res + next); }
Here, the first parameter denotes the level of recursion, when it reaches downwards to 1, it means it is time to give a value for recursive function otherwise the recursion won’t stop (and we will run forever in RELEASE or another stackoverflow error in DEBUG mode).
The second parameter stores the intermediate result (that is the n-th Fibonacci number). The third parameter adds up the previous two and pass it to next level. So, virtually, the value of current iteration is the value of the next recursive step.
In DEBUG mode, again, no much things that the compiler can do.
However, in RELEASE mode, the tail recursion is eliminated by a loop instead.
Example 3 – Indirect Recursion
If X makes a recursive call to X itself, it is called direct recursion. Indirect recursive methods call themselves indirectly through calling other methods. In general, indirect recursion is a circular sequence of two or more recursive calls: X1 –> X2 –> … –> X1.
One example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | /* How do we know if a number is even? we know 0 is even. we know that if n is even, then n-1 must be odd. */ bool is_even(int n) { if (n==0) return true; else return (is_odd(n-1)); } // How do we know if a number is odd? It's not even! bool is_odd(int n) { return (!is_even(n)); } |
/* How do we know if a number is even? we know 0 is even. we know that if n is even, then n-1 must be odd. */ bool is_even(int n) { if (n==0) return true; else return (is_odd(n-1)); } // How do we know if a number is odd? It's not even! bool is_odd(int n) { return (!is_even(n)); }
Such indirection recursion are not easy to convert to tail-recursion and therefore, it is recommended to convert to a single-recursive function.
1 2 3 4 5 | bool is_even(int n) { if (n==0) return true; else return (!is_even(n-1)); } |
bool is_even(int n) { if (n==0) return true; else return (!is_even(n-1)); }
Look, it is already tail-recursive, and in RELEASE mode, it will eventually be converted into:
1 2 3 4 5 6 7 8 9 10 11 12 13 | bool is_even(int n) { if (n==0) return true; else return(!is_even(n-1)); 00A81000 mov eax,dword ptr [esp+4] 00A81004 dec eax 00A81005 jne is_even+11h (0A81011h) 00A81007 mov al,1 00A81009 xor ecx,ecx 00A8100B test al,al 00A8100D sete al } 00A81010 ret |
bool is_even(int n) { if (n==0) return true; else return(!is_even(n-1)); 00A81000 mov eax,dword ptr [esp+4] 00A81004 dec eax 00A81005 jne is_even+11h (0A81011h) 00A81007 mov al,1 00A81009 xor ecx,ecx 00A8100B test al,al 00A8100D sete al } 00A81010 ret
There is no recursive call at all, everything is converted into loop.
–EOF (The Ultimate Computing & Technology Blog) —
Last Post: How to Implement Integer Square Root in C/C++?
Next Post: Linux C Programming Coding Exercise - Fork