Flattening recursive algorithms is like taking the stairs...
...it's rarely necessary, but you'll be better for it.
Okay, a forced analogy.
2008-01-19: I've learned a lot since Tuesday. Please read!
I'm sure many remember their first algorithms class and their first introduction to recursion: the infamous Tower of Hanoi problem. I'd been coding for a while when I was presented with this puzzle and wondered (probably out loud), "Why don't we use a more concrete example, like the file system?"
I guess I've always been more of a software engineer than a computer scientist.
While the idea of recursion is useful in many situations, the costs are rarely discussed. As our friends at Wikipedia mention:
...in today's programming languages, the stack space available to a thread is often much less than the space available in the heap, and recursive algorithms tend to require more stack space than iterative algorithms.
Well said.
I was hacking on some work stuff that required recursion-- specifically flattening a hierarchy. I realized I'd done this before, in the bag-o-tricks. I also realized how expensive it was.
Not only was I paying in call stack space for every nested call, but I was also allocating a lot of enumerators.
Code for the SelectedDirectories getter.
{
foreach (SelectableDirectory sd in m_directories)
{
if (sd.IsSelected)
{
yield return sd;
}
foreach (SelectableDirectory sd2 in sd.SelectedDirectories)
{
yield return sd2;
}
}
}
So how does one improve this? Replace the call stack with your own stack.
directoryStack.Push(this);
while (directoryStack.Count > 0)
{
SelectableDirectory currentDir = directoryStack.Pop();
if (currentDir.m_directories != null)
{
for (int i = (currentDir.m_directories.Count - 1); i >= 0; i--)
{
directoryStack.Push(currentDir.m_directories[i]);
}
}
if (currentDir.IsSelected)
{
yield return currentDir;
}
}
The savvy reader will notice that the second implementation is slightly different: the target node is included if it is selected. Just a detail.
So, what has improved? Only one enumerator is created. The call stack space for the recursion is also eliminated. The cost is the creation of the local Stack<> to keep track of the location in the hierarchy, which is likely much smaller than the space required for a call stack frame.
Since we want to keep the original order of items returned, we also have to Push each level of directories into the Stack in reverse order, so that the Stack returns them in the original order.
Now how does this match up with our masters?
It's certainly less readable. Since the code's not documented, it would probably take longer for even a savvy dev to figure out what's going on. This means it's less maintainable--certainly a negative.
On the plus size, the scalability of this algorithm is now limited by the size of the heap, which is much larger than the call stack. This only matters if one plans to parse very deep directory structures, though.
Another plus is likely performance, since fewer objects are instantiated and the cost of manipulating a local Stack<> is likely much cheaper than churning the call stack. I say 'likely' because I'm not really sure. I should absolutely measure the performance to be certain. Even then, the perf gain is likely trivial unless this code is called a lot. In that case, it might make sense to cache the result of the method.
In the end (and as always), context really matters. I think this is a good exercise to understand the costs and to challenge one's understanding of recursion. If you can flatten a recursive implementation, you will have a better understanding of what's really going on under the covers--something that is always useful.
Happy hacking!
Note: the updated code for Folder Picker has been committed to the bag-o-tricks directory in my SVN repository. You can take a look at the old implementation using Tortoise. Right-click on Data.cs in the shell. Pick 'Show log' out of the Tortoise menu. Then right-click on revision 58 and pick 'Update item to revision'.