Jungle Coder
written by night, read by day

Performance Gotchas in .Net 2 - Regex timeouts

Every programming framework has certain corner cases that suck the performance out of an application. The .NET Framework is no exception. I’ve discovered a few in my work with C#, and I blog about them as I find time.

I love using regex for search/replace/text manipulation tasks in programming. You don’t have to go full Perl mode to enjoy the awesomeness that is Regex.Replace(input, @"(\w+)-(\d+)", "$2-$1");. This awesomeness is counterbalanced by the risk of catastrophic backtracking when using the non-regular varieties of regex, like those in Perl, JavaScript and, most pressing for me, .NET.
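For the curious, here is a minimal sketch of that capture-group swap. The class name, method name and sample input are made up for illustration; the only library call is the static Regex.Replace(input, pattern, replacement) overload.

```csharp
using System.Text.RegularExpressions;

public static class SwapDemo
{
    // Hypothetical helper: rewrites every "word-123" as "123-word".
    public static string SwapWordAndNumber(string input)
    {
        // $1 refers to the (\w+) group and $2 to the (\d+) group.
        return Regex.Replace(input, @"(\w+)-(\d+)", "$2-$1");
    }
}
```

Calling SwapDemo.SwapWordAndNumber("widget-42") returns "42-widget".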

Catastrophic backtracking can take exponential time, leading to long page load times, unresponsive applications and denial of service attacks on websites that use broken regexes. Jeff Atwood covered some of the bare basics of horrendous regex performance here, ending with a wish for a way to keep regular expressions from going into a full ReDoS on your server. Some years later, in .NET 4.5, his wish has been granted. C# and VB.NET now allow you to specify a TimeSpan that denotes how long a regular expression is allowed to run before giving up. When that limit is exceeded, the match throws a RegexMatchTimeoutException.
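To get a feel for how quickly the cost grows, a rough timing sketch like the one below can help. It assumes a backtracking engine with no special-case optimization for this pattern, and the class and method names are invented for this example. Exact timings depend on your machine and runtime version, so treat the numbers as indicative only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class BacktrackingDemo
{
    // Times a failing match of (x+x+)+y against a run of n x's for each n.
    // Returns elapsed milliseconds per length, or -1 where the timeout fired.
    public static List<long> TimeFailingMatches(IEnumerable<int> lengths, TimeSpan timeout)
    {
        var re = new Regex("(x+x+)+y", RegexOptions.None, timeout);
        var results = new List<long>();
        foreach (int n in lengths)
        {
            // No trailing 'y', so the engine has to exhaust its options before failing.
            string input = new string('x', n);
            var watch = Stopwatch.StartNew();
            try
            {
                re.IsMatch(input);
                results.Add(watch.ElapsedMilliseconds);
            }
            catch (RegexMatchTimeoutException)
            {
                results.Add(-1);
            }
        }
        return results;
    }
}
```

Something like BacktrackingDemo.TimeFailingMatches(new[] { 16, 18, 20, 22, 24 }, TimeSpan.FromSeconds(2)) should show the times climbing steeply as the input grows by only a couple of characters at a time.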

Below is a C# snippet based on Atwood’s code that can be run in LINQPad to illustrate the timeout in action:

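// The nested quantifiers make this pattern backtrack badly when no 'y' follows the x's.
string pattern = "(x+x+)+y";
RegexOptions options = RegexOptions.None;
// Give up on any single match attempt after one second.
Regex re = new Regex(pattern, options, TimeSpan.FromMilliseconds(1000));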

string success = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxy";
string failure = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";

re.Match(success).Dump(); // matches almost immediately
re.Match(failure).Dump(); // should throw RegexMatchTimeoutException after about a second

This is no excuse for not writing proper regular expressions, but I like that it adds another layer of defense in depth against denial of service attacks based on the clever intern’s bad regex from last summer.
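In application code you would probably want to catch the timeout rather than let it bubble up. Here is one possible shape for that, as a sketch only: SafeRegex and IsMatchOrNull are hypothetical names, and returning null on timeout is just one of several reasonable policies.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafeRegex
{
    // Hypothetical helper: true or false when the match attempt completes,
    // null when the engine gave up because the timeout elapsed.
    public static bool? IsMatchOrNull(string input, string pattern, TimeSpan timeout)
    {
        try
        {
            return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
        }
        catch (RegexMatchTimeoutException)
        {
            // "Could not decide in time"; the caller chooses to reject, log or retry.
            return null;
        }
    }
}
```

A caller handling untrusted input might treat a null result as a rejection and log the offending pattern, so the bad regex gets noticed instead of quietly eating a CPU core.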
