Recursion, Linq, etc. - 3rd Time's the Charm
The joke about Microsoft is that it always takes them three times to get it right.
If you look at my posts on the game Set in WPF, it would seem I follow the same pattern.
Well, here I go again.
Let's start simple: I have a path to a directory and I want to enumerate all of its subdirectories recursively. To be super concrete:
IEnumerable<string> subPaths = SelectRecursivePaths(startPaths);
Easy enough.
What's the scenario-specific implementation of SelectRecursive? Well, our friend System.IO.Directory provides what we need.
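static IEnumerable<string> SelectRecursivePaths(IEnumerable<string> paths)
{
    foreach (string path in paths)
    {
        // Yield the directory itself before any of its children.
        yield return path;
        // Recurse into the immediate subdirectories.
        foreach (string subPath in SelectRecursivePaths(Directory.GetDirectories(path)))
        {
            yield return subPath;
        }
    }
}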
This code does the job: a recursive, depth-first traversal of all of the directories, with each directory returned before its children. But it isn't very Linq-ish, is it?
Getting Linq-ish: SelectRecursiveSimple
Could there be a way to extract the call to Directory.GetDirectories so that this method could be used over any homogeneous, hierarchical structure? Yup.
public static IEnumerable<TSource> SelectRecursiveSimple<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, IEnumerable<TSource>> recursiveSelector)
{
    foreach (TSource element in source)
    {
        yield return element;
        // recursiveSelector supplies the children; recurse into them.
        foreach (TSource subElement in recursiveSelector(element).SelectRecursiveSimple(recursiveSelector))
        {
            yield return subElement;
        }
    }
}
Now we change the call site to embody the call to GetDirectories:
I love lambda expressions and extension methods.
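The call site itself did not survive in this copy of the post. Here is a minimal sketch of what it presumably looks like, assuming startPaths is an IEnumerable<string> of root directories. The class names RecursiveExtensions and CallSiteDemo and the method AllDirectories are made up for the example; the extension method is repeated only so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class RecursiveExtensions
{
    // Same body as SelectRecursiveSimple above, repeated so this compiles standalone.
    public static IEnumerable<TSource> SelectRecursiveSimple<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, IEnumerable<TSource>> recursiveSelector)
    {
        foreach (TSource element in source)
        {
            yield return element;
            foreach (TSource subElement in recursiveSelector(element).SelectRecursiveSimple(recursiveSelector))
            {
                yield return subElement;
            }
        }
    }
}

// Hypothetical wrapper, only here to show the call site.
public static class CallSiteDemo
{
    public static IEnumerable<string> AllDirectories(IEnumerable<string> startPaths)
    {
        // The lambda is the only file-system-specific part of the traversal.
        return startPaths.SelectRecursiveSimple(path => Directory.GetDirectories(path));
    }
}
```

The traversal logic no longer knows anything about directories; swap the lambda and the same method walks any tree whose nodes can list their children.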
With me so far?
Getting Clever: SelectRecursiveClever
Now we go back to my clever idea from Tuesday and Wednesday about flattening recursion. The code I ended up with:
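public static IEnumerable<TSource> SelectRecursiveClever<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, IEnumerable<TSource>> recursiveSelector)
{
    Stack<TSource> stack = new Stack<TSource>();
    // Push in reverse so the first item ends up on top of the stack.
    source.Reverse().ForEach(stack.Push);
    while (stack.Count > 0)
    {
        TSource current = stack.Pop();
        yield return current;
        // Push all of the children, again in reverse.
        recursiveSelector(current).Reverse().ForEach(stack.Push);
    }
}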
Note: ForEach is an extension method I created that does what Array.ForEach does: calls an Action<T> delegate on each item. In this case, pushing every item in the sequence into the stack.
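The ForEach helper is not shown in this post. A minimal sketch of what such an extension method could look like follows; the class name EnumerableExtensions and the null checks are my assumptions, not the author's actual code.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Hypothetical stand-in for the ForEach helper described above:
    // invokes the given action once per item, in sequence order.
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (action == null) throw new ArgumentNullException("action");

        foreach (T item in source)
        {
            action(item);
        }
    }
}
```

With this in scope, a method group such as stack.Push can be passed directly as the Action<T>, which is how the code above fills the stack.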
Now the caller-side behavior of this is exactly the same as the "simple" version above. The implementation tries to be "clever" though, by dropping the "naive" recursion and replacing the call stack with a local stack. I was quite proud of this implementation. As I said on Tuesday:
If you can flatten a recursive implementation, you will have a better understanding of what's really going on under the covers.
Gulp. Perhaps I spoke too soon.
The Problem with Clever
I showed a co-worker this code. It took a bit of work to explain the use of .Reverse(). Stacks are Last-in-first-out, so if you want to get the first item out first, you have to put it in last.
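The ordering argument is easier to see in isolation. This is a small sketch, with a made-up class and method names, that pushes a sequence onto a Stack<T> and records the order in which the items pop back out, with and without the Reverse() call.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class StackOrderDemo
{
    // Pushes the items in the order given, then returns them in pop order.
    public static List<T> PushThenPopAll<T>(IEnumerable<T> items)
    {
        Stack<T> stack = new Stack<T>();
        foreach (T item in items)
        {
            stack.Push(item);
        }

        List<T> popped = new List<T>();
        while (stack.Count > 0)
        {
            popped.Add(stack.Pop());
        }
        return popped;
    }

    // Same thing, but reverses the sequence before pushing it.
    public static List<T> ReversePushThenPopAll<T>(IEnumerable<T> items)
    {
        return PushThenPopAll(items.Reverse());
    }
}
```

For the sequence "a", "b", "c", PushThenPopAll returns c, b, a while ReversePushThenPopAll returns a, b, c, which is the order the traversal needs.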
Explaining this hurt my head a bit. In the "simple"--read "naive"--implementation, one didn't have to reverse the enumeration. Why should the flattened version be different?
The answer: it shouldn't.
At some point in the "clever" implementation, every item at a given level is sitting on the stack at the same time. This means if you have a very wide tree, the stack could grow to be huge.
Imagine your recursiveSelector is smart about not loading all items at a given level into memory at one time--the benefit of sequences, right? It doesn't matter, because SelectRecursiveClever will load everything in one level into a stack (and .Reverse() has to buffer the whole sequence before it can return the first item). It trades the potential benefit of a smaller call stack for the potential blow-up of your working set when you iterate over a folder with 10,000 sub-folders.
Oops.
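To make the cost concrete, here is a rough sketch that counts how many children a lazy selector is forced to produce when the caller only wants the first couple of results. Everything in it is made up for the demonstration: the WideTreeDemo class, the Produced counter, and the one-level tree where node 0 has `width` children. Clever and Simple are stripped-down copies of the two implementations above, with plain foreach loops in place of ForEach.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WideTreeDemo
{
    // How many children the selector has actually been asked to produce.
    public static int Produced;

    // Lazy selector: node 0 has 'width' children, every other node is a leaf.
    static IEnumerable<int> Children(int node, int width)
    {
        if (node != 0) yield break;
        for (int i = 1; i <= width; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Stack-of-items version: pushes a whole level at once.
    public static IEnumerable<T> Clever<T>(IEnumerable<T> source, Func<T, IEnumerable<T>> selector)
    {
        Stack<T> stack = new Stack<T>();
        foreach (T item in source.Reverse()) stack.Push(item);
        while (stack.Count > 0)
        {
            T current = stack.Pop();
            yield return current;
            foreach (T child in selector(current).Reverse()) stack.Push(child);
        }
    }

    // Plain recursive version: pulls children one at a time.
    public static IEnumerable<T> Simple<T>(IEnumerable<T> source, Func<T, IEnumerable<T>> selector)
    {
        foreach (T item in source)
        {
            yield return item;
            foreach (T child in Simple(selector(item), selector)) yield return child;
        }
    }

    // Takes the first 'take' results and reports how many children were produced.
    public static int ProducedAfterTaking(bool clever, int width, int take)
    {
        Produced = 0;
        IEnumerable<int> roots = new List<int> { 0 };
        Func<int, IEnumerable<int>> selector = n => Children(n, width);
        IEnumerable<int> all = clever ? Clever(roots, selector) : Simple(roots, selector);
        foreach (int unused in all.Take(take)) { }
        return Produced;
    }
}
```

Calling ProducedAfterTaking(true, 10000, 2) should report that all 10,000 children were produced just to hand back the first one, while ProducedAfterTaking(false, 10000, 2) should report only the child or two that was actually consumed.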
You have to keep track of your depth in the recursion somewhere: either the runtime stack or your own stack.
But all you have to keep track of is where you are in each sequence at each level. You don't have to queue up all of the items at a given level.
Turns out the CLR has this obscure feature to handle this: IEnumerator<T>.
Yeah, not so obscure.
It's actually what the simple version of this code does. When you call down into the nested SelectRecursiveSimple, an IEnumerator<T> will be waiting patiently at the current spot for the call to return.
Getting Smart(er): SelectRecursive
The code is more ugly, but it also does a better job of illuminating what is happening in the truly recursive, "simple" version.
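public static IEnumerable<TSource> SelectRecursive<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, IEnumerable<TSource>> recursiveSelector)
{
    // One enumerator per level of depth, instead of one entry per item.
    Stack<IEnumerator<TSource>> stack = new Stack<IEnumerator<TSource>>();
    stack.Push(source.GetEnumerator());
    while (stack.Count > 0)
    {
        if (stack.Peek().MoveNext())
        {
            TSource current = stack.Peek().Current;
            yield return current;
            // "Call down" into the children of the current item.
            stack.Push(recursiveSelector(current).GetEnumerator());
        }
        else
        {
            // This level is exhausted: "return" to the parent level.
            stack.Pop();
        }
    }
}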
2008-01-19: Turns out this code is wrong, too.
The stack now holds on to what it should: the IEnumerator<T> at each level. The while loop runs through IEnumerator<T>.MoveNext(), going deeper with every stack.Push() until there are no more items and then "returning" with stack.Pop().
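The 2008-01-19 note above does not say what is wrong with this version. One gap I can point to with confidence is that the enumerators are never disposed, neither when a level runs out nor when the caller stops early; whether that is what the note refers to is a guess on my part. Below is a sketch of a variant that cleans up after itself. The names RecursiveDisposeSketch and SelectRecursiveDisposing are mine, chosen to avoid clashing with the method above.

```csharp
using System;
using System.Collections.Generic;

public static class RecursiveDisposeSketch
{
    // Same traversal as SelectRecursive, but every enumerator gets disposed.
    public static IEnumerable<TSource> SelectRecursiveDisposing<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, IEnumerable<TSource>> recursiveSelector)
    {
        Stack<IEnumerator<TSource>> stack = new Stack<IEnumerator<TSource>>();
        try
        {
            stack.Push(source.GetEnumerator());
            while (stack.Count > 0)
            {
                if (stack.Peek().MoveNext())
                {
                    TSource current = stack.Peek().Current;
                    yield return current;
                    stack.Push(recursiveSelector(current).GetEnumerator());
                }
                else
                {
                    // Level exhausted: release its enumerator on the way back up.
                    stack.Pop().Dispose();
                }
            }
        }
        finally
        {
            // Runs if the caller stops early (break, Take, ...) or an exception
            // escapes: dispose whatever is still on the stack.
            while (stack.Count > 0)
            {
                stack.Pop().Dispose();
            }
        }
    }
}
```

The foreach loops in the "simple" version get this cleanup for free, because the compiler wraps each loop's enumerator in a try/finally. It is one more thing the hand-rolled stack has to do explicitly.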
Popping the stack
It's pretty easy to implement a recursive construct using Linq. The first "simple" implementation works great and it's really useful when partying over the file system or any other hierarchy.
I still claim that flattening recursive code is a good thing to do for reasons I mentioned on Tuesday.
Just make sure you do it correctly.
Put another way: pay attention to your masters.
If you can't be smart, be simple.
Don't ever be clever.
Clever will burn you.
Happy hacking.
PS: SVN has been updated again with the latest code. Hopefully the last time I'll touch RecursiveSelect for a while.