Showing posts with label Linq. Show all posts
Showing posts with label Linq. Show all posts

Friday, January 18, 2008

Source code, HTML, and an Exercise for the Reader

No clue why I want to be vague about this. Probably because I'm late for a party, but really wanted to share.

And you're smart. You'll figure it out.

<Window x:Class="MainWindow"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    Title="VS Copy-Paste to HTML" >
    <Grid>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="*" />
            <ColumnDefinition Width="*" />
        </Grid.ColumnDefinitions>
        <RichTextBox Name="m_richTextBox" TextChanged="m_richTextBox_TextChanged" />
        <TextBox Grid.Column="1" Name="m_textBoxHTML" />
    </Grid>
</Window>

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Web;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Documents;
using System.Windows.Media;

public partial class MainWindow : Window
{
    public MainWindow()
    {
        InitializeComponent();
    }

    private void m_richTextBox_TextChanged(object sender, TextChangedEventArgs e)
    {
        m_textBoxHTML.Clear();
        m_textBoxHTML.AppendText("<div style=\"font-family:monospace;\">" + Environment.NewLine);
        foreach (Paragraph paragraph in m_richTextBox.Document.Blocks.Cast<Paragraph>())
        {
            foreach (Run run in Decompose(paragraph.Inlines))
            {
                m_textBoxHTML.AppendText(toCleanedColoredSpan(run));
            }
            m_textBoxHTML.AppendText("<br/>" + Environment.NewLine);
        }
        m_textBoxHTML.AppendText("</div>");
    }

    private string toCleanedColoredSpan(Run run)
    {
        SolidColorBrush solidColorBrush = run.Foreground as SolidColorBrush;
        if (solidColorBrush == null)
        {
            return cleanString(run.Text);
        }
        else
        {
            if (solidColorBrush.Color == Colors.Black)
            {
                return cleanString(run.Text);
            }
            else
            {
                string colorString = solidColorBrush.Color.ToString();
                Debug.Assert(colorString.Length == 9);

                colorString = colorString.Substring(3);

                return string.Format("<span style=\"color:#{0};\">{1}</span>",
                    colorString, cleanString(run.Text));
            }
        }
    }

    private string cleanString(string source)
    {
        source = HttpUtility.HtmlEncode(source);
        source = source.Replace(" ""&nbsp;");
        return source;
    }

    private static IEnumerable<Run> Decompose(IEnumerable<Inline> inlines)
    {
        foreach (Inline inline in inlines)
        {
            if (inline is Span)
            {
                foreach (Run run in Decompose(((Span)inline).Inlines))
                {
                    yield return run;
                }
            }
            else if (inline is Run)
            {
                yield return (Run)inline;
            }
            else
            {
                throw new NotSupportedException();
            }
        }
    }
}

Recursion, Linq, etc. - 3rd Times the Charm

The joke about Microsoft is that it always takes them three times to get it right.

If you look at my posts on the game Set in WPF, it would seem I follow the same pattern.

Well, here I go again.

Let's start simple: I have a path to a directory and I want to enumerate over all of it's subdirectories recursively. To be super concrete:

string[] startPaths = new string[] { @"D:\j832_svn\PublicSoftware\" }; 
IEnumerable<string> subPaths = SelectRecursivePaths(startPaths);

Easy enough.

What's the scenario-specific implementation of SelectRecursive? Well, our friend System.IO.Directory provides what we need.

private static IEnumerable<string> SelectRecursivePaths(IEnumerable<string> paths)
{
    foreach (string path in paths)
    {
        yield return path;
        foreach (string subPath in SelectRecursivePaths(Directory.GetDirectories(path)))
        {
            yield return subPath;
        }
    }
}

This code does the job for doing an in-order, recursive traversal of all of all of the directories. But it isn't very Linq-ish, is it?

Getting Linq-ish: SelectRecursiveSimple

Could there be a way to extract out the call to Directory.GetDirectories such that this method could be used over any homogenous, hierarchical structure? Yup.

public static IEnumerable<TSource> SelectRecursiveSimple<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, IEnumerable<TSource>> recursiveSelector)
{
    foreach (TSource element in source)
    {
        yield return element;
        foreach (TSource subElement in recursiveSelector(element).SelectRecursiveSimple(recursiveSelector))
        {
            yield return subElement;
        }
    }
}

Now we change the call site to embody the call to GetDirectories:

IEnumerable<string> subPaths = startPaths.SelectRecursiveSimple(path => Directory.GetDirectories(path));

I love lambda expressions and extension methods.

With me so far?

Getting Clever: SelectRecursiveClever

Now we go back to my clever idea from Tuesday and Wednesday about flattening recursion. The code I ended up with:

public static IEnumerable<TSource> SelectRecursiveClever<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, IEnumerable<TSource>> recursiveSelector)
{
    Stack<TSource> stack = new Stack<TSource>();
    source.Reverse().ForEach(stack.Push);
    while (stack.Count > 0)
    {
        TSource current = stack.Pop();
        yield return current;
        recursiveSelector(current).Reverse().ForEach(stack.Push);
    }
}

Note: ForEach is an extension method I created that does what Array.ForEach does: calls an Action<T> delegate on each item. In this case, pushing every item in the sequence into the stack.

Now the caller-side behavior of this is exactly the same as the "simple" version above. The implementation tries to be "clever" though, by discarding the "naive" use of recursion by replacing the call stack with a local stack. I was quite proud of this implementation. As I said on Tuesday:

If you can flatten a recursive implementation, you will have a better understanding of what's really going on under the covers.

Gulp. Perhaps I spoke to soon.

The Problem with Clever

I showed a co-worker this code. It took a bit of work to explain the use of .Reverse(). Stacks are Last-in-first-out, so if you want to get the first item out first, you have to put it in last.

Explaining this hurt my head a bit. In the "simple"--read "naive"--implementation, one didn't have to reverse the enumeration. Why should the flattened version be different?

The answer: it shouldn't.

At some point in the "clever" implementation, every item is on the stack. This means if you have a very wide tree, the stack could grow to be huge.

Imagine your recursiveSelector is smart about not loading all items at a given level into memory at one time--the benefit of sequences, right? It doesn't matter, because SelectRecursiveClever will load everything in one level into a stack. It trades the potential benefit of a smaller call stack with the potential blow-up of your working set when iterate over a folder with 10,000 sub-folders.

Oops.

You have to keep track of your depth in the recursion somewhere: either the runtime stack or your own stack.

But all you have to keep track of is where you are in each sequence at each level. You don't have to queue up all of the items at a given level.

Turns out the CLR has this obscure feature to handle this: IEnumerator<T>.

Yeah, not so obscure.

It's actually what the simple version of this code does. When you call down into the nested SelectRecursiveSimle, an IEnumerator<T> will be waiting patiently at the the current spot for the call to return.

Getting Smart(er): SelectRecursive

The code is more ugly, but it also does a better job of illuminating what is happening in the truly recursive, "simple" version.

public static IEnumerable<TSource> SelectRecursive<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, IEnumerable<TSource>> recursiveSelector)
{
    Stack<IEnumerator<TSource>> stack = new Stack<IEnumerator<TSource>>();
    stack.Push(source.GetEnumerator());

    while (stack.Count > 0)
    {
        if (stack.Peek().MoveNext())
        {
            TSource current = stack.Peek().Current;
            yield return current;
            stack.Push(recursiveSelector(current).GetEnumerator());
        }
        else
        {
            stack.Pop();
        }
    }
}

2008-01-19: Turns out this code is wrong, too.

The stack now holds on to what it should: the IEnumerator<T> at each level. The while loop runs through IEnumerator<T>.MoveNext(), going deeper with every stack.Push() until there are no more items and then "returning" with stack.Pop().

Popping the stack

It's pretty easy to implement a recursive construct using Linq. The first "simple" implementation works great and it's really useful when partying over the file system or any other hierarchy.

I still claim that flattening recursive code is a good thing to do for reasons I mentioned on Tuesday.

Just make sure you do it correctly.

Put another way: pay attention to your masters.

If you can't be smart, be simple.

Don't ever be clever.

Clever will burn you.

 

Happy hacking.

 

kick it on DotNetKicks.com

PS: SVN has been updated again with the latest code. Hopefully the last time I'll touch RecursiveSelect for a while.

Wednesday, January 16, 2008

Flattening recursive algorithms with Linq

I wake up this morning to find a comment from Stefan regarding my post yesterday about flattening recursion:

In the future version of C# we may receive a new language construct yield foreach which will eliminate this problem. Read this paper by Erik Meijer or this blog post by Wes Dyer if you are interested.

Ironically, I had read the paper and associated blog post last week.

And it got me thinking: could I abstract out my recursive traversal?

Even better: could I abstract it out in a way that was Linq-friendly?

The answer to both: oh yeah.

Check out the browser-friendly source here.

Now the code to find all of the selected directories becomes

SubDirectories
.SelectRecursive(directory => directory.SubDirectories)
.Where(directory => directory.IsSelected);

How's that for straight-forward?

Note 1: The savvy reader will notice that the third implementation is now like the original: the target node is not included in SelectedDirectories if it is selected. I like this more.

Note 2: Since I made SelectRecursive more general, I'm not quite as efficient about creating a bunch of enumerable instances. I still (likely) save a lot in call stack space and having a general implementation is pretty cool.

Note 3: You'll notice that SelectRecursive handles the case where there recursiveSelector returns null. This is different than how default extensions like SelectMany work. It made my usage in this scenario cleaner, but it's important to note. Or not. This bothered me for reasons I'll go into in another post. The code as checked-in now throws if null is returned. These subtle implementation details can cause headaches later.

As I mentioned yesterday, you can get the full source code from my SVN repository. The new methods extension methods have been added to Extensions.cs in J832.Common.

Happy recursive Linq-ing!