Micro-optimizing string comparison in .NET
With this blog post, I'm targeting the Most Useful Blog Post of the Year Award™. Anyway, I find this stuff interesting, and hope to enlighten others who believe that this might actually matter. TL;DR: It doesn't.
What puzzled me
A while ago, I came across this piece of code during a code review:
private bool IsSomething(string somethingToCheck)
{
    return somethingToCheck.First() == 'd';
}
My first reaction was to comment on this and tell the developer to use string.StartsWith() instead. My thinking was that it is a more suitable method than Linq: it has a better name for this particular task, it is optimized for strings, it performs better, and so on.
But I was wrong
Or, that depends. It did get me wondering which method would be the best one for this type of string comparison, though. As long as only a single character is being tested, maybe direct array access would be better? Or string.IndexOf()? Eventually, my wondering led to this.
It is a set of methods for comparing substrings. I present each one here.
Direct array access
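A string is, in effect, a sequence of characters that can be indexed like an array. Thus, checking the first character can be done like this, and in theory it feels like this should be faster than wrapping the same operation in Linq. Note that indexing an empty string throws an exception: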
haystack[0] == 'd'
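Since the indexer throws IndexOutOfRangeException on an empty string (and any member access throws on null), a guarded variant may be worth having. This is a minimal sketch; the class and method names are hypothetical and are not taken from my test suite.

```csharp
using System;

public static class FirstCharCheck
{
    // Hypothetical helper: true when haystack is non-empty and begins with needle.
    // The length check comes first because haystack[0] would throw
    // IndexOutOfRangeException on an empty string.
    public static bool StartsWithChar(string haystack, char needle)
    {
        return !string.IsNullOrEmpty(haystack) && haystack[0] == needle;
    }
}
```

The extra check costs a little, so whether it is worth it depends on whether empty input can actually reach the call.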
Linq.First()
Linq is great for working with collections. As mentioned, a string is really just a sequence of characters (it implements IEnumerable<char>), so it is a collection by nature. Linq provides some very useful wrapping around collections, and our particular string comparison is written like this:
haystack.First() == 'd'
String.StartsWith()
The native String class is full of methods optimized for string manipulation, and would be a natural choice for this task. It even has a method called StartsWith(), which is exactly what we are doing here. Another good thing about this method is that you can specify things like case sensitivity and culture. Keep in mind that the single-argument overload used here performs a culture-sensitive comparison using the current culture. If we know for sure that what we are looking for is a lowercase "d", that functionality is redundant, though.
haystack.StartsWith("d")
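For comparison, here is a small sketch of the culture-sensitive call next to an ordinal one, using the StringComparison overload of StartsWith(). The class and method names are hypothetical. I have not timed the ordinal variant in the tests below, so any speed difference is something to measure rather than assume.

```csharp
using System;

public static class PrefixCheck
{
    // Single-argument overload: culture-sensitive comparison using the current culture.
    public static bool StartsWithCulture(string haystack, string needle)
    {
        return haystack.StartsWith(needle);
    }

    // Ordinal overload: compares the UTF-16 code units directly, with no culture rules.
    public static bool StartsWithOrdinal(string haystack, string needle)
    {
        return haystack.StartsWith(needle, StringComparison.Ordinal);
    }
}
```

When the needle is a fixed token such as a lowercase "d", the ordinal overload states the intent more precisely.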
String.IndexOf()
Not my first choice, but still, checking whether the needle we are looking for is at position 0 would also let us achieve our goal.
haystack.IndexOf("d") == 0
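One thing to be aware of: when the needle is not at position 0, IndexOf() keeps searching the rest of the string before it can report a result, so it does more work than the question requires. The sketch below wraps the same check with an explicit ordinal comparison; the class and method names are hypothetical.

```csharp
using System;

public static class IndexOfCheck
{
    // True when the first occurrence of needle is at position 0.
    // IndexOf returns -1 when needle does not occur at all.
    public static bool StartsWithViaIndexOf(string haystack, string needle)
    {
        return haystack.IndexOf(needle, StringComparison.Ordinal) == 0;
    }
}
```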
Comparing the methods
In my test suite, I ran each of the string comparison methods 10 000 000 times. Yes, ten million. That's how many iterations it took to show some real performance difference.
The tests called "...Index0" and "...IndexLength" are using direct array access.
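To give an idea of how such a measurement can be set up, here is a minimal sketch built on System.Diagnostics.Stopwatch. It is not the actual test suite: the ComparisonTimer class and its methods are hypothetical, and calling each check through a delegate adds overhead of its own, which matters most for the fastest checks.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class ComparisonTimer
{
    // Calls check() the given number of times and returns the elapsed milliseconds.
    // lastResult receives the value returned by the final call.
    public static long Time(Func<bool> check, int iterations, out bool lastResult)
    {
        bool result = false;
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            result = check();
        }
        stopwatch.Stop();
        lastResult = result;
        return stopwatch.ElapsedMilliseconds;
    }

    // Times the four first-character checks from this post, one output line per check.
    public static void Run(string haystack, char needle, int iterations)
    {
        string needleText = needle.ToString();
        Report("StringIndex0", () => haystack[0] == needle, iterations);
        Report("LinqFirst", () => haystack.First() == needle, iterations);
        Report("StringStartsWith", () => haystack.StartsWith(needleText), iterations);
        Report("StringIndexOf", () => haystack.IndexOf(needleText) == 0, iterations);
    }

    private static void Report(string name, Func<bool> check, int iterations)
    {
        bool result;
        long elapsed = Time(check, iterations, out result);
        Console.WriteLine("{0} {1} {2} ms", name, result, elapsed);
    }
}
```

A call such as ComparisonTimer.Run("asdf", 'a', 10000000) would print one line per method. The numbers will vary with the machine, the runtime version and the build configuration.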
Does string "asdf" start with character 'a'?
FC_StringIndex0 TRUE 66 ms
FC_StringIndex0Equals TRUE 89 ms
FC_LinqFirst TRUE 456 ms
FC_StringStartsWith TRUE 2907 ms
FC_StringIndexOf TRUE 162 ms
Does string "asdf" start with character 'f'?
FC_StringIndex0 FALSE 54 ms
FC_StringIndex0Equals FALSE 69 ms
FC_LinqFirst FALSE 499 ms
FC_StringStartsWith FALSE 2761 ms
FC_StringIndexOf FALSE 185 ms
Does string "jklø" start with character 'ø'?
FC_StringIndex0 FALSE 57 ms
FC_StringIndex0Equals FALSE 72 ms
FC_LinqFirst FALSE 486 ms
FC_StringStartsWith FALSE 1867 ms
FC_StringIndexOf FALSE 130 ms
Does string "asdf" end with character 'a'?
LC_StringIndexLength FALSE 101 ms
LC_StringIndexLengthEquals FALSE 98 ms
LC_LinqLast FALSE 685 ms
LC_StringEndsWith FALSE 4925 ms
LC_StringIndexOf FALSE 113 ms
Does string "asdf" end with character 'f'?
LC_StringIndexLength TRUE 84 ms
LC_StringIndexLengthEquals TRUE 99 ms
LC_LinqLast TRUE 683 ms
LC_StringEndsWith TRUE 3338 ms
LC_StringIndexOf TRUE 130 ms
Does string "jklø" end with character 'ø'?
LC_StringIndexLength TRUE 84 ms
LC_StringIndexLengthEquals TRUE 99 ms
LC_LinqLast TRUE 701 ms
LC_StringEndsWith TRUE 2891 ms
LC_StringIndexOf TRUE 133 ms
Substring comparison
I was curious whether the same numbers would apply when comparing substrings, not just a single character, and wrote tests to check for this, too.
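The "CharArray" tests below compare the strings one character at a time. The exact test code is not reproduced here, so the following is only a sketch of how such a comparison could look; the OrdinalAffix class and its method names are hypothetical.

```csharp
using System;

public static class OrdinalAffix
{
    // True when haystack begins with needle, compared character by character.
    public static bool StartsWith(string haystack, string needle)
    {
        if (needle.Length > haystack.Length) return false;
        for (int i = 0; i < needle.Length; i++)
        {
            if (haystack[i] != needle[i]) return false;
        }
        return true;
    }

    // True when haystack ends with needle, compared character by character.
    public static bool EndsWith(string haystack, string needle)
    {
        // Index in haystack where needle would have to begin.
        int offset = haystack.Length - needle.Length;
        if (offset < 0) return false;
        for (int i = 0; i < needle.Length; i++)
        {
            if (haystack[offset + i] != needle[i]) return false;
        }
        return true;
    }
}
```

Both methods stop at the first mismatch and ignore culture rules entirely, so they only fit cases where an exact character-for-character match is what you want.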
Does string "asdf" start with string "asd"?
SSW_CharArray TRUE 266 ms
SSW_StringStartsWith TRUE 2988 ms
SSW_StringIndexOf TRUE 3147 ms
Does string "asdf" start with string "abc"?
SSW_CharArray FALSE 262 ms
SSW_StringStartsWith FALSE 2753 ms
SSW_StringIndexOf FALSE 5262 ms
Does string "asdf" end with string "sdf"?
SEW_CharArray TRUE 305 ms
SEW_StringEndsWith TRUE 4246 ms
SEW_StringIndexOf TRUE 3398 ms
Does string "asdf" end with string "cba"?
SEW_CharArray FALSE 305 ms
SEW_StringEndsWith FALSE 4401 ms
SEW_StringIndexOf FALSE 3679 ms
Conclusion
Clearly, this does not matter in most applications. Running a single iteration did not show any noticeable difference between the various methods. The first sign of a difference came at around 10 000 iterations, and even then, the slowest methods only took about 3-5 ms. But if you are using .NET to create a high-performance system with millions of transactions per second, you should consider using direct array access for string comparison. For the rest of us, it is basically just about readability and niceness.
Personally, I actually prefer this one for single character comparison, as it is by far the fastest alternative, the most compact code, and to my eye, the most readable. Just keep in mind that strings can be indexed like character arrays, and that an empty string has no first character to index.
haystack[0] == 'd'