Micro-optimizing string comparison in .NET

Christian Nesmark

With this blog post, I'm targeting the Most Useful Blog Post of the Year Award™. Anyway, I find this stuff interesting, and hope to enlighten others who believe that this might actually matter. TL;DR: It doesn't.
 

What puzzled me

A while ago, I came across this piece of code during a code review:

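// First() is Linq's Enumerable.First, so the file needs: using System.Linq;
private bool IsSomething(string somethingToCheck)
{
    // Treats the string as a sequence of chars and compares the first one
    return somethingToCheck.First() == 'd';
}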

My first reaction was to comment on this and tell the developer to use string.StartsWith() instead. I assumed it was a more suitable method than Linq: it has a better name for this particular task, it is optimized for strings, it performs better, and so on.
 

But I was wrong

Or rather, it depends. The review did make me wonder which method would be best for this type of string comparison, though. As long as only a single character is tested, maybe direct array access would be better? Or string.IndexOf()? Eventually, my wondering led to the comparison below.

Here are the methods I compared for checking the first (or last) character of a string, one at a time.
 

Direct array access

A string can be indexed like an array of characters: its indexer returns the char at a given position. Thus, checking the first character can be done like this, and in theory it should be faster than wrapping the same operation in Linq:

haystack[0] == 'd'  
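One detail the one-liner hides is the empty string: indexing position 0 of "" throws an IndexOutOfRangeException. Here is a minimal sketch of guarded first- and last-character checks. The helper names FirstCharIs and LastCharIs are hypothetical (not from my test suite), and the sketch assumes a modern .NET SDK (C# 9 or later, for top-level statements). The last-character variant indexes at Length - 1, which is the idea behind the "...IndexLength" tests further down.

```csharp
using System;

// Hypothetical helpers; assumes C# 9+ top-level statements.

// First character via the string indexer. The Length check avoids an
// IndexOutOfRangeException when the string is empty.
static bool FirstCharIs(string haystack, char needle) =>
    haystack.Length > 0 && haystack[0] == needle;

// Last character via the indexer at position Length - 1.
static bool LastCharIs(string haystack, char needle) =>
    haystack.Length > 0 && haystack[haystack.Length - 1] == needle;

Console.WriteLine(FirstCharIs("asdf", 'a')); // True
Console.WriteLine(LastCharIs("jklø", 'ø'));  // True
Console.WriteLine(FirstCharIs("", 'a'));     // False, and no exception
```

The Length check costs one extra comparison, which is about as cheap as it gets.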

 

Linq.First()

Linq is great for working with collections. A string implements IEnumerable<char>, so it is a collection by nature, and Linq's extension methods work on it directly. Our particular comparison is written like this:

haystack.First() == 'd'  
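Linq treats the string as a plain sequence of chars, which has a couple of consequences worth seeing in code. The following is a minimal sketch with hypothetical helper names, assuming a modern .NET SDK (C# 9 or later). First() throws an InvalidOperationException on an empty string, while FirstOrDefault() returns '\0' instead. Also, string does not implement IList<char> in the .NET versions I am aware of, so Last() has no indexed shortcut and walks the whole string, which plausibly explains why LinqLast is slower than LinqFirst in the tables below.

```csharp
using System;
using System.Linq;

// Hypothetical helpers; assumes C# 9+ top-level statements.

// First() enumerates the string; throws InvalidOperationException if empty.
static bool FirstCharIsLinq(string haystack, char needle) =>
    haystack.First() == needle;

// FirstOrDefault() returns default(char), '\0', for an empty string
// instead of throwing. (It would "match" an empty string if needle were '\0'.)
static bool FirstCharIsLinqSafe(string haystack, char needle) =>
    haystack.FirstOrDefault() == needle;

// Last() has no indexed shortcut for string, so it enumerates every char.
static bool LastCharIsLinq(string haystack, char needle) =>
    haystack.Last() == needle;

Console.WriteLine(FirstCharIsLinq("asdf", 'a'));  // True
Console.WriteLine(FirstCharIsLinqSafe("", 'a'));  // False
Console.WriteLine(LastCharIsLinq("asdf", 'f'));   // True

try
{
    FirstCharIsLinq("", 'a');
}
catch (InvalidOperationException)
{
    Console.WriteLine("First() throws on an empty string");
}
```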

 

String.StartsWith()

The String class is full of methods optimized for string manipulation and would be a natural choice for this task. It even has a method called StartsWith(), which describes exactly what we are doing here. Another good thing about this method is that you can specify options such as case sensitivity and culture. If we know for sure that we are looking for a lowercase "d", that functionality is redundant, though, and the StartsWith(string) overload used below performs a culture-sensitive comparison with the current culture by default, so we pay for it anyway.

haystack.StartsWith("d")  
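StartsWith(string) without a StringComparison argument compares using the current culture; passing StringComparison.Ordinal asks for a plain comparison of the characters instead. Here is a minimal sketch of the relevant overloads, assuming a modern .NET SDK (C# 9 or later). The StartsWith(char) overload exists on .NET Core 2.0 and later but not on .NET Framework. I have not benchmarked these variants, so I make no claims about their numbers here.

```csharp
using System;

// Minimal sketch; assumes C# 9+ top-level statements.
string haystack = "asdf";

// No StringComparison argument: culture-sensitive, using the current culture.
bool cultureSensitive = haystack.StartsWith("a");

// Ordinal: compares the UTF-16 code units directly, ignoring culture rules.
bool ordinal = haystack.StartsWith("a", StringComparison.Ordinal);

// Ordinal but case-insensitive, for when "A" and "a" should both match.
bool ignoreCase = haystack.StartsWith("A", StringComparison.OrdinalIgnoreCase);

// .NET Core 2.0 and later (not .NET Framework): a char overload, always ordinal.
bool singleChar = haystack.StartsWith('a');

// EndsWith offers the same StringComparison overloads.
bool endsOrdinal = haystack.EndsWith("f", StringComparison.Ordinal);

Console.WriteLine($"{cultureSensitive} {ordinal} {ignoreCase} {singleChar} {endsOrdinal}");
```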

 

String.IndexOf()

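Not my first choice, but checking whether the needle is found at index 0 also achieves our goal. Keep in mind that when the character is not at the start, IndexOf() goes on to search the rest of the string, so a miss costs more on long strings.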

haystack.IndexOf("d") == 0  
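Like StartsWith(), IndexOf() has both culture-sensitive and ordinal overloads. Here is a minimal sketch, assuming a modern .NET SDK (C# 9 or later): IndexOf(string) without a StringComparison uses the current culture, while IndexOf(char) always searches ordinally. Which overload a given benchmark uses may matter for its numbers, but I have not measured that here.

```csharp
using System;

// Minimal sketch; assumes C# 9+ top-level statements.
string haystack = "asdf";

// IndexOf(string) without a StringComparison is culture-sensitive.
bool viaString = haystack.IndexOf("a") == 0;

// IndexOf(char) always performs an ordinal search.
bool viaChar = haystack.IndexOf('a') == 0;

// An ordinal search can also be requested explicitly for string needles.
bool viaOrdinal = haystack.IndexOf("a", StringComparison.Ordinal) == 0;

// On a miss at index 0, IndexOf keeps searching: 'f' is found at index 3,
// so this is false, but only after scanning the whole string.
bool miss = haystack.IndexOf('f') == 0;

Console.WriteLine($"{viaString} {viaChar} {viaOrdinal} {miss}");
```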

 

Comparing the methods

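In my test suite, I ran each of the string comparison methods 10 000 000 times. Yes, ten million. That's how many iterations it took to show a real performance difference. The times below are the totals for all ten million iterations.

The FC_ tests check the first character and the LC_ tests check the last character. Tests whose names contain "Index0" or "IndexLength" use direct array access.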

Does string "asdf" start with character 'a'?  
FC_StringIndex0               TRUE       66 ms  
FC_StringIndex0Equals         TRUE       89 ms  
FC_LinqFirst                  TRUE      456 ms  
FC_StringStartsWith           TRUE     2907 ms  
FC_StringIndexOf              TRUE      162 ms

Does string "asdf" start with character 'f'?  
FC_StringIndex0               FALSE      54 ms  
FC_StringIndex0Equals         FALSE      69 ms  
FC_LinqFirst                  FALSE     499 ms  
FC_StringStartsWith           FALSE    2761 ms  
FC_StringIndexOf              FALSE     185 ms

Does string "jklø" start with character 'ø'?  
FC_StringIndex0               FALSE      57 ms  
FC_StringIndex0Equals         FALSE      72 ms  
FC_LinqFirst                  FALSE     486 ms  
FC_StringStartsWith           FALSE    1867 ms  
FC_StringIndexOf              FALSE     130 ms

Does string "asdf" end with character 'a'?  
LC_StringIndexLength          FALSE     101 ms  
LC_StringIndexLengthEquals    FALSE      98 ms  
LC_LinqLast                   FALSE     685 ms  
LC_StringEndsWith             FALSE    4925 ms  
LC_StringIndexOf              FALSE     113 ms

Does string "asdf" end with character 'f'?  
LC_StringIndexLength          TRUE       84 ms  
LC_StringIndexLengthEquals    TRUE       99 ms  
LC_LinqLast                   TRUE      683 ms  
LC_StringEndsWith             TRUE     3338 ms  
LC_StringIndexOf              TRUE      130 ms

Does string "jklø" end with character 'ø'?  
LC_StringIndexLength          TRUE       84 ms  
LC_StringIndexLengthEquals    TRUE       99 ms  
LC_LinqLast                   TRUE      701 ms  
LC_StringEndsWith             TRUE     2891 ms  
LC_StringIndexOf              TRUE      133 ms  
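For anyone who wants to try this kind of measurement, here is a minimal Stopwatch-based harness. It is a hypothetical sketch, not the test suite behind these tables: the candidate names and the Measure helper are my own, each check is called through a Func<bool> delegate (which adds the same overhead to every candidate), and it assumes a modern .NET SDK (C# 9 or later). Absolute numbers on a current runtime will likely differ from the tables above. For serious measurements, a dedicated library such as BenchmarkDotNet handles warm-up and statistics far more carefully.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical harness; assumes C# 9+ top-level statements.
string haystack = "asdf";

var candidates = new (string Name, Func<bool> Check)[]
{
    ("Index0",     () => haystack[0] == 'a'),
    ("LinqFirst",  () => haystack.First() == 'a'),
    ("StartsWith", () => haystack.StartsWith("a")),
    ("IndexOf",    () => haystack.IndexOf("a") == 0),
};

foreach (var (name, check) in candidates)
{
    var (result, ms) = Measure(check, 10_000_000);
    Console.WriteLine($"{name,-12} {result,-6} {ms,6} ms");
}

// Runs check the given number of times and returns the last result
// together with the total elapsed wall-clock time in milliseconds.
static (bool Result, long Ms) Measure(Func<bool> check, int iterations)
{
    bool result = check(); // warm-up call keeps JIT compilation out of the timing
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        result = check();
    }
    sw.Stop();
    return (result, sw.ElapsedMilliseconds);
}
```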

 

Substring comparison

I was curious whether the same pattern would hold when comparing substrings rather than a single character, so I wrote tests for that, too.

Does string "asdf" start with string "asd"?  
SSW_CharArray                 TRUE      266 ms  
SSW_StringStartsWith          TRUE     2988 ms  
SSW_StringIndexOf             TRUE     3147 ms

Does string "asdf" start with string "abc"?  
SSW_CharArray                 FALSE     262 ms  
SSW_StringStartsWith          FALSE    2753 ms  
SSW_StringIndexOf             FALSE    5262 ms

Does string "asdf" end with string "sdf"?  
SEW_CharArray                 TRUE      305 ms  
SEW_StringEndsWith            TRUE     4246 ms  
SEW_StringIndexOf             TRUE     3398 ms

Does string "asdf" end with string "cba"?  
SEW_CharArray                 FALSE     305 ms  
SEW_StringEndsWith            FALSE    4401 ms  
SEW_StringIndexOf             FALSE    3679 ms  
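A char-by-char comparison along the lines of the CharArray tests might look like the following. This is a sketch, not the code behind SSW_CharArray and SEW_CharArray: the helper names are hypothetical, and it assumes a modern .NET SDK (C# 9 or later). It compares UTF-16 code units one at a time, so it gives the same result as StartsWith/EndsWith with StringComparison.Ordinal rather than the default culture-sensitive overloads.

```csharp
using System;

// Hypothetical helpers; assumes C# 9+ top-level statements.

// Ordinal "starts with": compare the needle against the start of the haystack.
static bool StartsWithChars(string haystack, string needle)
{
    if (needle.Length > haystack.Length) return false;
    for (int i = 0; i < needle.Length; i++)
    {
        if (haystack[i] != needle[i]) return false;
    }
    return true;
}

// Ordinal "ends with": compare the needle against the tail of the haystack.
static bool EndsWithChars(string haystack, string needle)
{
    int offset = haystack.Length - needle.Length;
    if (offset < 0) return false;
    for (int i = 0; i < needle.Length; i++)
    {
        if (haystack[offset + i] != needle[i]) return false;
    }
    return true;
}

Console.WriteLine(StartsWithChars("asdf", "asd")); // True
Console.WriteLine(StartsWithChars("asdf", "abc")); // False
Console.WriteLine(EndsWithChars("asdf", "sdf"));   // True
Console.WriteLine(EndsWithChars("asdf", "cba"));   // False
```

An empty needle matches any haystack, which is consistent with how string.StartsWith("") behaves.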

 

Conclusion

Clearly, this does not matter in most applications. Running a single iteration showed no noticeable difference between the methods. The first sign of a difference came at around 10 000 iterations, and even then, the slowest methods took only about 3-5 ms. Put differently, even the slowest case in the tables, StringEndsWith at 4925 ms for ten million calls, costs roughly half a microsecond per call. But if you are using .NET to build a high-performance system with millions of transactions per second, you should consider using direct array access for string comparison. For the rest of us, it is basically just about readability and niceness.

Personally, I prefer this one for single-character comparison: it is by far the fastest alternative, the most compact code, and, to my eye, the most readable, as long as you keep in mind that a string can be indexed like a character array and that indexing an empty string throws an IndexOutOfRangeException.

haystack[0] == 'd'  

 
