Sunday, January 6, 2013

Benchmarking C# code

Getting the codez:
Since writing this post, I've made improvements to the code described below. You can read about them in a newer post. The NuGet and GitHub links to the right are for the new version.
When doing casual benchmarking, I have often just thrown together a StopWatch and loop. I hate to admit the number of times I’ve wasted time thinking that a tweak I made was an improvement (or the opposite), only to discover after some head scratching that I had forgotten I’d changed the loop limit, and was comparing apples and oranges with regard to the previous benchmarks.
I’m working on some code that has me doing a fair amount of benchmarking and optimizing, so I decided to write a little utility to make benchmarking easier, and remove the source of past red herrings. I had the following goals:
  • Simple to use, with minimal coding needed
  • Produce results that are comparable, and independent of loop counts
  • Automatically figure out an appropriate number of times to execute the target code
  • Provide the ability to exclude set up code from the timing
  • Be able to benchmark small bits of code with very short run times
  • Remove the benchmarking overhead time (loop processing, etc.) from results
This post presents my first pass at it, and it seems to do the things I was looking for. The benchmark class achieves this using the StopWatch class, and a main timing loop. It takes an Action delegate for the code being benchmarked, which can be a normal method, or an anonymous method. In the case of code that has very small run times (under 1 microsecond), it’s difficult to get accurate results with a single loop that repeatedly calls the target method. The substantial overhead of looping and calling the method drowns out the tiny amount of time used by the target code, which gets lost in the noise. For those cases, the target method can have its own simple inner for loop. The count for the loop is passed in to the target method. This has two benefits. Since the benchmark utility knows the loop count, it can better calculate the associated overhead of the for loop and remove that from the result. Another benefit is that you can let the benchmark utility figure out the appropriate loop count your target method should use. In some cases you’ll want to control the loop count yourself, in cases where the loop count is an intrinsic part of what you’re benchmarking. For example, if I was benchmarking adding to a list, I might want to always benchmark the time to add 10,000 items. So I’d want to specify the count myself.

This benchmarking utility is all contained within a class called Bench. The following code snippet shows some examples of using the Bench class.

public class Program
{
    public static void Main(string[] args) {
        TimeSleep();
        TimeStringConcat();
        TimeDictionaryRemove();
        Console.Write("done");
        Console.ReadKey();
    }
    
    public static void TimeSleep() {
        new Bench().Time("Sleep", () => { Thread.Sleep(2); });
    }

    public static void TimeStringConcat() {
        string s1 = DateTime.UtcNow.ToString();
        string s2 = DateTime.UtcNow.ToString();
        string s;
        new Bench().Time("String Concat", (count) => {
            for (int i = 0; i < count; i++) s = s1 + s2;
        } );
    }
    
    public static void TimeDictionaryRemove() {
        new Bench().Time("Dictionary.Remove", 1000, DictionaryRemove );
    }

    public static void DictionaryRemove(int count, Bench.TimeControl tc) {
        tc.Pause();
        var d = new Dictionary<int, string>();
        for (int i = 0; i < count; i++) d.Add(i, i.ToString());
        tc.Resume();
        for (int i = 0; i < count; i++) d.Remove(i);
    }
}
The TimeSleep method shows the simplest use of Bench, using a lambda expression. The TimeStringConcat method shows timing a chunk of code with a very small execution time, so it is wrapped in a for loop and using the overload of Time that will let Bench calculate an appropriate loop count. That loop count is passed in as a parameter to the Action method. The Time method will return only the time for the concatenation. The overhead to repeatedly call the Action method, as well as the time of the for loop are removed from the final timing results. The last example of DictionaryRemove shows the use of TimeControl’s Pause/Resume to exclude the set up from the timing. I’ve chosen in this case to explicitly state the loop count since I’m using the loop count to build the Dictionary, and the size of the dictionary likely will affect the time of the Remove I’m timing. By explicitly specifying the loop count, I ensure all timings are comparable. This example also shows using a regular method instead of an anonymous method or lambda expression. Again, the time returned will be for just the d.Remove(i) code. The code above produces the following output to the Console:

Name= Sleep
Millisecs/Op= 2.006, , Ops= 2,490, ElapsedMillisecs= 4,994.50
Name= String Concat
Nanosecs/Op= 67.871, , Ops= 73,200,000, ElapsedMillisecs= 4,968.17
Name= Dictionary.Remove
Nanosecs/Op= 26.370, , Ops= 169,354,000, ElapsedMillisecs= 4,465.93


The Bench class uses the term “Iteration” to describe one call to the target method. If the target method doesn’t have a for loop, then the term “Operation” is the same as Iteration. But if the target method has a for loop, then Operation is a single execution of the body of the loop, and the number of operations executed will be the Iteration count times the loop count.

I should mention the normal caveats for casual benchmarking. This utility uses StopWatch, which measures clock time, not CPU time. This is an important point, because your results will vary some from run to run because of the other things going on in Windows and in the .Net runtime. If you need to do serious CPU time profiling, it’s better to just go to the commercial profiling tools. I tend to use both. I use the profiler to identify where I need to optimize, and use casual benchmarking while optimizing a chunk of code, to quickly see the effect of my changes. You can benchmark small bits of code, but when the time per operation is below 100 nanoseconds, don’t take the numbers too literally. Numbers that small are more prone to the effects of outside factors. You should also keep in mind the possible effects of optimizations made by the just-in-time compiler, which may even optimize away some of the code you are trying to benchmark.

Caveats aside, you can use Bench to do some pretty fine grained benchmarking, as the following example shows. This benchmark shows the difference between a normal mod operation and a mod of a power of 2.
    var bench = new Bench();
    int x;
    bench.Time("Mod 13", (c) => { for (int i = 0; i < c; i++) x = i % 13; });
    bench.Time("Mod 2^4", (c) => { for (int i = 0; i < c; i++) x = i % 16; });
The code for the parts of Bench used for timing is below, with comments stripped out. Use the download link at the top of this post to get the complete file with all the comments and documentation. There are also some methods in Bench for benchmarking the size of objects and struct instance, which are described in the next post.
// Copyright ©2013 SoftWx, Inc.
// Released under the MIT License the text of which appears at the end of this file.
// 1/3/2013
// <authors> Steve Hatchett

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

namespace SoftWx.Diagnostics
{
    /// <summary>
    /// Tools for benchmarking code. The Bench class helps do timing benchmarks, and
    /// also object size determiniation. 
    /// </summary>
    public class Bench{
        
        public int MinIterations { get; set; }
        public int MinMilliseconds { get; set; }
        public bool WriteToConsole { get; set; }
                
        public Bench() : this(5, 5000, true) { }
        
        public Bench(int minIterations, int minMilliseconds, bool writeToConsole) {
            MinIterations = minIterations;
            MinMilliseconds = minMilliseconds;
            WriteToConsole = writeToConsole;
        }
        
        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        public TimeResult Time(string name, Action target) {
            if (target == null) throw new ArgumentNullException("target");

            return Time(name, 1, 
                        (c, tc) => { target();},
                        (c, tc) => { });
        }

        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        public TimeResult Time(string name, Action<TimeControl> target) {
            if (target == null) throw new ArgumentNullException("target");

            return Time(name, 1,
                        (c, tc) => { target(tc);},
                        (c, tc) => { });
        }

        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        public TimeResult Time(string name, Action<int> target) {
            if (target == null) throw new ArgumentNullException("target");
            int count = CalculateGoodCount((c, tc) => { target(c); });
            return Time(name, count, target);
        }
        
        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        public TimeResult Time(string name, Action<int, TimeControl> target) {
            if (target == null) throw new ArgumentNullException("target");
            
            int count = CalculateGoodCount(target);
            return Time(name, count, target);
        }
        
        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        public TimeResult Time(string name, int count, Action<int> target) {
            if (count < 0) throw new ArgumentOutOfRangeException("count");
            if (target == null) throw new ArgumentNullException("target");

            return Time(name, count,
                        (c, tc) => { target(c);},
                        (c, tc) => { for(int i = 0; i < c; i++) { } });
        }
        
        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        public TimeResult Time(string name, int count, Action<int, TimeControl> target) {
            if (count < 0) throw new ArgumentOutOfRangeException("count");
            if (target == null) throw new ArgumentNullException("target");
            
            return Time(name, count, target, 
                        (c, tc) => { for (int i = 0; i < c; i++) { } });
        }
        
        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        private TimeResult Time(string name, int count, Action<int, TimeControl> target, 
                                Action<int, TimeControl> overhead) {
            if (count < 1) count = 1;
            GC.Collect();
            GC.WaitForPendingFinalizers();

            TimeResult targetResult = Time(count, target, this.MinIterations, this.MinMilliseconds);
            TimeResult overheadResult = Time(count, overhead, targetResult.Iterations, 0);
            var adjustedTime = targetResult.Elapsed.Subtract(overheadResult.Elapsed);
            if (adjustedTime.Ticks < 0) adjustedTime = new TimeSpan(0);
            targetResult = new TimeResult(name, targetResult.Operations, adjustedTime, 
                                          targetResult.Iterations);
            if (this.WriteToConsole) {
                Console.WriteLine(targetResult.ToString());
            }
            return targetResult;
        }
        
        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        private TimeResult Time(int count, Action<int, TimeControl> target, 
                                long minIterations, long minMilliseconds) {
            TimeControl tc = new TimeControl();
            target(1, tc); // ensure the code being timed has been jitted
            tc.Reset();
            long iterations = 0;
            tc.StopWatch.Start(); // don't use tc Pause/Resume here, because we don't want to count overhead for that
            do {
                target(count, tc);
                iterations++;
            } while((tc.StopWatch.ElapsedMilliseconds < minMilliseconds) || (iterations < minIterations));
            tc.StopWatch.Stop();
            long totalOperations = (long) count * iterations;
            return new TimeResult(null, totalOperations, tc.Elapsed, iterations);
        }

        private int CalculateGoodCount(Action<int, TimeControl> target) {
            int count = 1;
            do {
                var result = Time(count, target, 0, 0);
                if (result.Elapsed.TotalMilliseconds > 1) {
                    break;
                } 
                count *= 10;
            } while (count < 1000000000);
            return count;
        }
        
        /// <summary>
        /// Allows control of what portions of the method are benchmarked, by
        /// pausing and resuming the timing through the TimeControl.
        /// </summary>
        public class TimeControl {
            private Stopwatch sw;
            private long pauseResumeOverhead; // ticks
            private long accumulatedOverhead; // ticks
            
            public TimeControl() {
                this.sw = new Stopwatch();
                CalculatePauseResumeOverhead(); // ensure it's jitted before using the calculated overhead
                CalculatePauseResumeOverhead();
            }
            
            public void Reset() {
                sw.Reset();
                this.accumulatedOverhead = 0;
            }
            
            public void Pause() { this.sw.Stop(); }
            
            public void Resume() { 
                this.accumulatedOverhead += pauseResumeOverhead;
                this.sw.Start(); 
            }

            internal TimeSpan Elapsed { 
                get {
                    var ticks = Math.Max(0L, this.sw.Elapsed.Ticks - this.accumulatedOverhead);
                    return new TimeSpan(ticks);
                } 
            }
            
            internal Stopwatch StopWatch { get { return this.sw; } }
                        
            [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
            private void CalculatePauseResumeOverhead() {
                Stopwatch sw = new Stopwatch();
                this.pauseResumeOverhead = 0;
                int count = 1000;
                long computed = 0;
                // try several times to get a non-zero result 
                for (int trys = 0; trys < 10; trys++) {
                    long ticks1 = 0;
                    sw.Reset();
                    sw.Start();
                    for (int i = 0; i < count; i++) { }
                    sw.Stop();
                    ticks1 = sw.Elapsed.Ticks;
                    sw.Reset();
                    sw.Start();
                    for (int i = 0; i < count; i++) {
                        sw.Stop();
                        sw.Start();
                    }
                    sw.Stop();
                    computed = (sw.Elapsed.Ticks - ticks1) / count;
                    if (computed >= 0) {
                        this.pauseResumeOverhead = computed;
                        break;
                    }
                }
            }            
        }
        
        /// <summary>Results of benchmark timing.</summary>
        public class TimeResult : IComparable<TimeResult> {
            private string name;
            private long operations;
            private TimeSpan elapsed;
            private long iterations;
            
            private TimeResult() { }
            
            public TimeResult(string name, long operations, TimeSpan elapsed, long iterations) {
                this.name = name;
                this.operations = operations;
                this.elapsed = elapsed;
                this.iterations = iterations;
            }
            
            public string Name { get { return this.name; } }
            
            public long Operations { get { return this.operations; } }
            
            public double ElapsedMilliseconds { get { return this.elapsed.TotalMilliseconds; } }
            
            public TimeSpan Elapsed { get { return this.elapsed; } }
            
            public long Iterations { get { return this.iterations; } }
            
            public double NanosecondsPerOperation {
                get { return (1000000.0 / this.operations) * this.elapsed.TotalMilliseconds; }
            }
            
            public double MicrosecondsPerOperation {
                get { return (1000.0 * this.elapsed.TotalMilliseconds) / this.operations; }
            }
            
            public double MillisecondsPerOperation {
                get { return this.elapsed.TotalMilliseconds / this.operations; }
            }
            
            public double NanosecondsPerIteration {
                get { return (1000000.0 / this.iterations) * this.elapsed.TotalMilliseconds; }
            }

            public double MicrosecondsPerIteration {
                get { return (1000.0 * this.elapsed.TotalMilliseconds) / this.iterations; }
            }

            public double MillisecondsPerIteration {
                get { return this.elapsed.TotalMilliseconds / this.iterations; }
            }
            
            public override string ToString() {
                string timePerOpsLabel;
                double timePerOps;
                if (MillisecondsPerOperation > 1) {
                    timePerOpsLabel = "Millisecs/Op";
                    timePerOps = MillisecondsPerOperation;
                } else if (MicrosecondsPerOperation > 1) {
                    timePerOpsLabel = "Microsecs/Op";
                    timePerOps = MicrosecondsPerOperation;
                } else {
                    timePerOpsLabel = "Nanosecs/Op";
                    timePerOps = NanosecondsPerOperation;
                }
                string debugger = System.Diagnostics.Debugger.IsAttached ? "(Debugger Attached)" : "";
                return string.Format("Name= {0} {1}\n{2}= {3}, , Ops= {4}, ElapsedMillisecs= {5}",
                                     name, debugger, timePerOpsLabel,
                                     timePerOps.ToString("0.000"),
                                     operations.ToString("#,##0"),
                                     ElapsedMilliseconds.ToString("#,##0.00"));

            }
            
            public int CompareTo(Bench.TimeResult other)    {
                return this.MillisecondsPerOperation.CompareTo(other.MillisecondsPerOperation);
            }
        }
    }
}