A Study of Recursive Programs

13 minute read

I’m going to analyze a certain type of program which I’ll call recursive programs and then I’ll attempt to learn something interesting about them.

What is Recursion?

A recursive function is a function which solves a problem by breaking it up into smaller pieces of the same problem.

For instance, the factorial \(n!\) is equal to the product of all integers \(1\) to \(n\). So \(5! = 5\cdot4\cdot3\cdot2\cdot1 = 120\).

A factorial function can be implemented by either multiplying every number from \(1\) to \(n\) or by doing recursive decomposition of the function.

A recursive factorial function could be implemented in Python like this:

f = lambda n: n * f(n-1) if n > 0 else 1

Now, \(f(n) == n \cdot f(n-1)\), so the factorial function is recursively decomposed. This is in contrast to above where the factorial was defined in terms of solving a single instance of the problem, namely multiplying the numbers from \(1\) to \(n\).

What are Recursive Programs?

Recursion is a problem-solving method that involves solving a problem by breaking it up into multiple smaller pieces. A recursive algorithm is an algorithm that utilizes recursion to solve a problem. A recursive function is some concrete implementation of a recursive algorithm.

A recursive program will loosely be defined as a program which is able to run itself to solve a recursive problem. For instance, a Python script that can be run once and it will run itself again.

Note that while a recursive function involves code that calls itself, a recursive program will implement a recursive algorithm via running itself as a program.

A recursive program is identical to a recursive algorithm with one change: the self-referential piece changes from a code reference (the name of the recursive function) to the name of the program.

Recursive Program Strategies

There are a number of different strategies that can be employed to build recursive programs. The key differences are the mechanisms used for input and output.

Recursive Program via sys args

# factorial
import subprocess
import sys

n = int(sys.argv[1])
if n == 0:
    print(1)
else:
    v = subprocess.check_output(['python', sys.argv[0], str(n-1)])
    i = int(v)
    print(n * i)

This is a simple recursive factorial program. Program arguments are being used as input and stdout is being used as output.

Its interface is just an argument for the number n:

>>> python factorial.py 5
120

Recursive Program via local file

We can also use a temporary file for caching intermittent data between calls:

# Factorial
import subprocess
import sys

filename = 'recursion-temp.txt'
if len(sys.argv) > 1:
  f = open(filename, 'w')
  f.write(sys.argv[1])
  f.close()

f = open(filename, 'r')
n = int(f.read())
f.close()
# re-open file to get to the beginning (rewind it instead?)
if n == 0:
    f = open(filename, 'w')
    f.write(str(1))
    f.close()
else:
    f = open(filename, 'w+')
    f.write(str(n-1))
    f.close()
    subprocess.run(['python', sys.argv[0]])
    f = open(filename, 'r')
    v = int(f.read())
    f.close()
    f = open(filename, 'w')
    f.write(str(n * v))
    f.close()

if len(sys.argv) > 1:
    print(open(filename).read())

This is pretty messy, but it’s using a file for the input to the current recursive subproblem and output to the previous recursive subproblem. This would end up being simpler using channels in Go instead of opening and closing files here like I did. This kind of file manipulation is really error-prone, so we’ll want to have a cleaner way to work with it if we want to still employ it as a strategy for communicating between recursive programs.

Summarizing two Recursive Program Styles

The first example with sys.argv is simpler because it involves less code, but it also resembles recursion using function parameters so it’s not that interesting yet.

Follow-up: Is Fibonacci more interesting?

# Fibonacci
import subprocess
import sys

n = int(sys.argv[1])
if n == 0 or n == 1:
    print(1)
else:
    a = subprocess.check_output(['python', sys.argv[0], str(n-1)])
    b = subprocess.check_output(['python', sys.argv[0], str(n-2)])
    print(int(a) + int(b))

It’s not really that interesting yet, except it’s even slower than an already-inefficient uncached Python Fibonacci algorithm because it incurs the Python startup time for each exponential and recursive function call.

Can we add some kind of caching?

A cool next step would be to implement a caching mechanism, such as having one file for each subproblem or reading and writing to an amendable data structure. One thing that’s cool about Python is that it’s really easy to do terrible things in a small amount of code. I’m going to commit one such atrocity here.

I’m going to append to a file and have each line correspond to the result of that subproblem, so the 7th line would be equal to fib(7). Another equally terrible idea which I’m not going to try here would be: use pickle to load / dump a python dictionary between runs of the program and store newly cached variables in the dump.

import subprocess
import sys

n = int(sys.argv[1])
if n < 0:
    raise Exception("n must be larger than 0")

try:
    cache = list(map(lambda x: int(x.strip()), open('f.txt').readlines()))
except FileNotFoundError as fnfe:
    cache = [0, 1]
    f = open('f.txt', 'w')
    f.write('0\n1\n')
    f.close()

if n in [0, 1]:
    print(cache[n])
    sys.exit(0)

if len(cache) <= n:
    a = subprocess.check_output(['python', sys.argv[0], str(n-1)])
    b = subprocess.check_output(['python', sys.argv[0], str(n-2)])
    f = open('f.txt', 'a')
    fibn = int(a) + int(b)
    f.write(str(fibn) + '\n')
    f.close()
    print(fibn)
else:
    print(cache[n-1] + cache[n-2])

After running this and the previous version once, it’s clear that this cached version is a lot faster. But the speedup is mostly due to the fact that Fibonacci has an exponential blowup.

Speed Lineup: Fibonacci, Cached and Uncached

Fibonacci Uncached

>>> time python fibonacci-uncached.py 13
377

real	0m11.336s
user	0m9.663s
sys	0m1.828s

Fibonacci Cache

First let’s compare fibonacci(13) in both.

time python fibonacci-cached.py 13
233

real	0m0.392s
user	0m0.339s
sys	0m0.059s

On cold start it’s definitely faster but still super inefficient.

On cache read, though, it’s blazing (comparatively)!

time python fibonacci-cached.py 13
233

real	0m0.019s
user	0m0.019s
sys	0m0.000s

Most of the time being spent is starting the Python vm, so in a recursive call this is happening n times.

In my mental model, I assume that file reads / writes take basically 0 time. So with n = 13, we have ~13 recursive calls and that would mean around 0.030s are spent for each n. Since the case where it’s cached runs in 0.019s, that means .030s - .019s = 0.011s is spent doing other stuff, mainly subprocess shenanigans.

Just to show that this is still terrible but the cached version at least doesn’t have as bad of a blowup is that it’s possible to compute the 250th Fibonacci number in 7.7 seconds using a cached recursive program:

time python fibonacci-cached.py 250
7896325826131730509282738943634332893686268675876375

real	0m7.760s
user	0m6.616s
sys	0m1.246s

Future Work: DFS and Recoverable Search

Some future work that that might be interesting to follow-up on is a dfs with a self-propogating program. Because I want to do something involving trees. This could either have the program be run from one folder or have the program get propgated to each subdirectory and then run there locally.

Having a program write into each directory is a way to mark that directory as accessed by the algorithm, but this could also be done with a local data structure. If you store which parts of the file tree you’ve fully accessed or have partially accessed, you could easily have an auto-recoverable search algorithm. An auto-recoverable search algorithm would be a search algorithm which can start, be cancelled or killed, and be able to recover and start from the last checkpoint.

Appendix: Interesting Side-Effects of Recursive Programs

When I was programming the recursive factorial program I realized it’s possible to implement recursive logic in an invalid program. An exception was raised for each iteration of the algorithm. Usually a recursive function will raise one exception then terminate the program, so seeing the same error condition happen for each one exposes something unique about this problem style.

read 3
read 2
read 1
read 0
read? 1
Traceback (most recent call last):
  File "test.py", line 30, in <module>
    v = int(f.read())
ValueError: invalid literal for int() with base 10: ''
read? 2
Traceback (most recent call last):
  File "test.py", line 30, in <module>
    v = int(f.read())
ValueError: invalid literal for int() with base 10: ''
read? 3
Traceback (most recent call last):
  File "test.py", line 30, in <module>
    v = int(f.read())
ValueError: invalid literal for int() with base 10: ''