Saturday, January 12, 2013

Benchmarking C# Struct and Object Sizes

Getting the codez:
In the previous post, I described a class you can use to do casual execution time benchmarking of C# code. That Bench class also contains a couple methods that can be used to obtain memory size information about instances of classes or structs.

These methods measure the total memory consumed by an object or struct, including all the objects created as part of its creation. The methods work by taking a parameter that is a Func<T>, which is expected to create and return an object or struct instance for measurement. The technique used to measure the memory consumption is pretty simple; it just asks the garbage collector what the total memory consumed is both before and after the object creation, and notes the difference between the two. This may not give accurate results in a busy system, but during development, it seems to work pretty reliably. It works in either 32 or 64 bits, the results of which are slightly different from each other (identical objects or structs will often consume a little bit more memory in 64 bit builds).

There are two methods for obtaining memory consumption data. ByteSize returns a long value of the memory consumed by the object created by the specified Func. ByteSizeDescription returns a string describing the total memory consumed, as well as further information about the breakdown of that memory.
ByteSize and ByteSizeDescription will work for both objects and value types (like structs). Value types are handled by boxing them as objects so they consume heap memory, and then backing out the memory associated with the boxing overhead. Note that just as it is when measuring objects, for structs containing references to objects, the size returned by Bench’s methods will include not only the size of the reference in the struct, but also the size of the object referred to, if those objects are created by the Func method passed in.

There is a little fuzziness about the value type sizes, because .Net likes things to align on 4 byte boundaries in 32 bit builds, and 8 byte boundaries in 64 bit builds. It's also not entirely straightforward how much of an object's size is overhead and how much is content. Technically, there are 8 bytes of overhead for objects in 32 bit builds. However, an object will never take less then 12 bytes of size. For an empty object, 4 bytes of that are unused. An object with a single int member variable will also occupy 12 bytes - 8 bytes of overhead, plus 4 bytes for the int. As an object has more member variables, the size will increase, but in jumps of 4 bytes at a time. For 64 bit builds, there are 16 bytes of overhead, and a minimum of 8 bytes of content, whether it's used or not. To read more detail about this topic, I recommend these blog posts and articles:
Of Memory And Strings
How much much memory does a C# string take up?
and for the real nitty gritty detail (although a bit old)
Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects

The ByteSize function will return the aligned size of the struct returned by the passed in Func method. So, for instance, a struct containing a single byte field in a 32 bit build will return a size of 4 bytes. The ByteSizeDescription function will give additional information. Besides the deep aligned size of the struct returned by the Func method, it reports the deep packed size, and also the aligned size and packed size of a default instance of the struct. The reason for displaying aligned and packed versions can be seen when you have an array of structs. Using the example of a struct with a single byte field, an array of 1 element will have a size of 16 bytes (12 bytes of object overhead of the array object, plus 4 bytes for the single struct element). However, an array of 2, 3, or 4 elements will have this same size of 16 bytes. An array of 5 elements jumps the memory to 20 bytes. So the actual memory consumed by a struct may depend on how the struct is being used.

The following snippet of code shows some examples of using the ByteSize functions.
struct S1 { byte b; }
struct S2 { int i; }
struct S3 { string s; public S3(string s) { this.s = s;} }

public class Program {
    public static void Main(string[] args) {
        var bench = new Bench();
        Console.WriteLine(bench.ByteSize(() => { return new S1(); }));
        Console.WriteLine(bench.ByteSize(() => { return new S2(); }));
        Console.WriteLine(bench.ByteSize(() => { return new string('a', 16); }));
        Console.WriteLine(bench.ByteSize(() => { return new S3(new string('a', 16)); }));
        Console.WriteLine(bench.ByteSize(() => { return new int[16]; }));
        Console.WriteLine(bench.ByteSize(() => { 
            var d = new Dictionary<int,int>(10000);
            for (int i = 0; i < 10000; i++) d.Add(i,i);;
            return d;
         }));

        Console.WriteLine();
        Console.WriteLine(bench.ByteSizeDescription(() => { return new S1(); }) + "\n");
        Console.WriteLine(bench.ByteSizeDescription(() => { return new S2(); }) + "\n");
        Console.WriteLine(bench.ByteSizeDescription(() => { return new string('a', 16); }) + "\n");
        Console.WriteLine(bench.ByteSizeDescription(() => { return new S3(new string('a', 16)); }) + "\n");
        Console.WriteLine(bench.ByteSizeDescription(() => { return new int[16]; }) + "\n");
        Console.WriteLine(bench.ByteSizeDescription(() => { 
            var d = new Dictionary<int,int>(10000);
            for (int i = 0; i < 10000; i++) d.Add(i,i);;
            return d;
         }));

        Console.Write("done");
        Console.ReadKey();
    }
}
This produces the following output in 32 bit builds,

4
4
4
48
52
76
202136

S1 (struct): deep aligned size= 4 bytes, deep packed size= 1 bytes
    aligned default size= 4 bytes, packed default size= 1 bytes

S2 (struct): deep aligned size= 4 bytes, deep packed size= 4 bytes
    aligned default size= 4 bytes, packed default size= 4 bytes

String: size= 48 bytes, objOverhead= 8 bytes, content= 40 bytes

S3 (struct): deep aligned size= 52 bytes, deep packed size= 52 bytes
    aligned default size= 4 bytes, packed default size= 4 bytes

Int32[]: size= 76 bytes, objOverhead= 8 bytes, content= 68 bytes
    count= 16, avg/item= 4 bytes

Dictionary`2: size= 202136 bytes, objOverhead= 8 bytes, content= 202128 bytes
    count= 10000, avg/item= 20 bytes

and the following in 64 bit builds.

8
8
8
64
72
88
202192

S1 (struct): deep aligned size= 8 bytes, deep packed size= 1 bytes
    aligned default size= 8 bytes, packed default size= 1 bytes

S2 (struct): deep aligned size= 8 bytes, deep packed size= 4 bytes
    aligned default size= 8 bytes, packed default size= 4 bytes

String: size= 64 bytes, objOverhead= 16 bytes, content= 48 bytes

S3 (struct): deep aligned size= 72 bytes, deep packed size= 72 bytes
    aligned default size= 8 bytes, packed default size= 8 bytes

Int32[]: size= 88 bytes, objOverhead= 16 bytes, content= 72 bytes
    count= 16, avg/item= 4 bytes

Dictionary`2: size= 202192 bytes, objOverhead= 16 bytes, content= 202176 bytes
    count= 10000, avg/item= 20 bytes

The source code for the two ByteSize methods is included below. It’s also in the the full Bench class that can be downloaded from the link at the top of this post.
static private long overheadSize = ComputeOverheadSize();

/// <summary>
/// Determines the size of the object or struct created and returned by the
///    specified method. </summary>
/// <remarks>Should not be used in production! This is meant for use during
/// development, not as a general purpose sizeof function.</remarks>
/// <param name="maker">The method that creates and returns the object or 
/// struct whose size will be determined.</param>
/// <returns>The size in bytes of the object created by the method.</returns>
[MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
public long ByteSize<T>(Func<T> maker) {
    if (maker == null) throw new ArgumentNullException("maker");

    long size = long.MinValue;
    long prevSize;
    int count = 0;
    // because we're using an unreliable method of obtaining size, repeat until
    // we get a confirmed result to help eliminate reporting spurious results.
    const int maxAttempts = 10;
    long dummy;
    do {
        prevSize = size;
        count++;
        if (typeof(T).IsValueType) {
            long objSize = ByteSize(() => { return (object) 1L; });
            long boxedSize = ByteSize(() => { return (object)maker(); });
            // size is the boxed size less the overhead of boxing
            size = (boxedSize - objSize) + sizeof(long);
        } else {
            object obj = null;
            long startSize = GC.GetTotalMemory(true);
            obj = maker(); 
            long endSize = GC.GetTotalMemory(true);
            size = endSize - startSize;
            // ensure object stays alive through measurement
            dummy = obj.GetHashCode();
        }
    } while ((count < maxAttempts) && ((size != prevSize) || (size <= 0)));
    return size;
}

/// <summary>
/// Returns a string describing details about the size of the object or struct
/// created by the specified method.
/// </summary>
/// <remarks>Should not be used in production! This is meant for use during
/// development, not as a general purpose sizeof function.</remarks>
/// <param name="maker">The method that creates and returns the object or struct
/// whose size will be determined.</param>
/// <returns>String describing details about the size of an object.</returns>
[MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
public string ByteSizeDescription<T>(Func<T> maker) {
    if (maker == null) throw new ArgumentNullException("maker");
                
    // get size of target
    long byteSize = ByteSize(() => { return maker(); });
    string s = typeof(T).Name;
    if (typeof(T).IsValueType) {
        // special handling of value types (i.e. structs)
        long emptyArray = ByteSize(() => { return new T[0]; });
        long deepPacked = (ByteSize(() => {
            var x = new T[16];
            for (int i = 0; i < x.Length; i++) x[i] = maker();
            return x;
        }) - emptyArray) / 16;
        long alignedSize = ByteSize(() => { return new T[1]; }) - emptyArray;
        long packedSize = (ByteSize(() => { return new T[16]; }) - emptyArray) / 16;
        s += " (struct): deep aligned size= " + byteSize + " bytes, deep packed size= "
            + deepPacked + " bytes"
            + Environment.NewLine + "    aligned default size= " + alignedSize
            + " bytes, packed default size= " + packedSize + " bytes";
    } else {
        // handling of objects
        s += ": size= " + byteSize + " bytes"
            + ", objOverhead= " + overheadSize 
            + " bytes, content= " + (byteSize - overheadSize) + " bytes";
        if (typeof(System.Collections.ICollection).IsAssignableFrom(typeof(T))) {
            // special handling of classes implementing ICollection
            var coll = maker() as System.Collections.ICollection;
            int count = coll.Count;
            s += Environment.NewLine + "    count= " + count;
            if (count > 0) s += ", avg/item= " + ((byteSize - overheadSize) / count) + " bytes";
        }
    }
    return s;
}

static private long ComputeOverheadSize() {
    var bench = new Bench();
    long simpleSize = bench.ByteSize(() => { return new SimpleObject(); });
    return simpleSize - SimpleObject.InternalSize;
}


private class SimpleObject {
    public const int InternalSize = sizeof(int) * 2;
    int one;
    int two;
}

Sunday, January 6, 2013

Benchmarking C# code

Getting the codez:
Since writing this post, I've made improvements to the code described below. You can read about them in a newer post. The NuGet and GitHub links to the right are for the new version.
When doing casual benchmarking, I have often just thrown together a StopWatch and loop. I hate to admit the number of times I’ve wasted time thinking that a tweak I made was an improvement (or the opposite), only to discover after some head scratching that I had forgotten I’d changed the loop limit, and was comparing apples and oranges with regard to the previous benchmarks.
I’m working on some code that has me doing a fair amount of benchmarking and optimizing, so I decided to write a little utility to make benchmarking easier, and remove the source of past red herrings. I had the following goals:
  • Simple to use, with minimal coding needed
  • Produce results that are comparable, and independent of loop counts
  • Automatically figure out an appropriate number of times to execute the target code
  • Provide the ability to exclude set up code from the timing
  • Be able to benchmark small bits of code with very short run times
  • Remove the benchmarking overhead time (loop processing, etc.) from results
This post presents my first pass at it, and it seems to do the things I was looking for. The benchmark class achieves this using the StopWatch class, and a main timing loop. It takes an Action delegate for the code being benchmarked, which can be a normal method, or an anonymous method. In the case of code that has very small run times (under 1 microsecond), it’s difficult to get accurate results with a single loop that repeatedly calls the target method. The substantial overhead of looping and calling the method drowns out the tiny amount of time used by the target code, which gets lost in the noise. For those cases, the target method can have its own simple inner for loop. The count for the loop is passed in to the target method. This has two benefits. Since the benchmark utility knows the loop count, it can better calculate the associated overhead of the for loop and remove that from the result. Another benefit is that you can let the benchmark utility figure out the appropriate loop count your target method should use. In some cases you’ll want to control the loop count yourself, in cases where the loop count is an intrinsic part of what you’re benchmarking. For example, if I was benchmarking adding to a list, I might want to always benchmark the time to add 10,000 items. So I’d want to specify the count myself.

This benchmarking utility is all contained within a class called Bench. The following code snippet shows some examples of using the Bench class.

public class Program
{
    public static void Main(string[] args) {
        TimeSleep();
        TimeStringConcat();
        TimeDictionaryRemove();
        Console.Write("done");
        Console.ReadKey();
    }
    
    public static void TimeSleep() {
        new Bench().Time("Sleep", () => { Thread.Sleep(2); });
    }

    public static void TimeStringConcat() {
        string s1 = DateTime.UtcNow.ToString();
        string s2 = DateTime.UtcNow.ToString();
        string s;
        new Bench().Time("String Concat", (count) => {
            for (int i = 0; i < count; i++) s = s1 + s2;
        } );
    }
    
    public static void TimeDictionaryRemove() {
        new Bench().Time("Dictionary.Remove", 1000, DictionaryRemove );
    }

    public static void DictionaryRemove(int count, Bench.TimeControl tc) {
        tc.Pause();
        var d = new Dictionary<int, string>();
        for (int i = 0; i < count; i++) d.Add(i, i.ToString());
        tc.Resume();
        for (int i = 0; i < count; i++) d.Remove(i);
    }
}
The TimeSleep method shows the simplest use of Bench, using a lambda expression. The TimeStringConcat method shows timing a chunk of code with a very small execution time, so it is wrapped in a for loop and using the overload of Time that will let Bench calculate an appropriate loop count. That loop count is passed in as a parameter to the Action method. The Time method will return only the time for the concatenation. The overhead to repeatedly call the Action method, as well as the time of the for loop are removed from the final timing results. The last example of DictionaryRemove shows the use of TimeControl’s Pause/Resume to exclude the set up from the timing. I’ve chosen in this case to explicitly state the loop count since I’m using the loop count to build the Dictionary, and the size of the dictionary likely will affect the time of the Remove I’m timing. By explicitly specifying the loop count, I ensure all timings are comparable. This example also shows using a regular method instead of an anonymous method or lambda expression. Again, the time returned will be for just the d.Remove(i) code. The code above produces the following output to the Console:

Name= Sleep
Millisecs/Op= 2.006, , Ops= 2,490, ElapsedMillisecs= 4,994.50
Name= String Concat
Nanosecs/Op= 67.871, , Ops= 73,200,000, ElapsedMillisecs= 4,968.17
Name= Dictionary.Remove
Nanosecs/Op= 26.370, , Ops= 169,354,000, ElapsedMillisecs= 4,465.93


The Bench class uses the term “Iteration” to describe one call to the target method. If the target method doesn’t have a for loop, then the term “Operation” is the same as Iteration. But if the target method has a for loop, then Operation is a single execution of the body of the loop, and the number of operations executed will be the Iteration count times the loop count.

I should mention the normal caveats for casual benchmarking. This utility uses StopWatch, which measures clock time, not CPU time. This is an important point, because your results will vary some from run to run because of the other things going on in Windows and in the .Net runtime. If you need to do serious CPU time profiling, it’s better to just go to the commercial profiling tools. I tend to use both. I use the profiler to identify where I need to optimize, and use casual benchmarking while optimizing a chunk of code, to quickly see the effect of my changes. You can benchmark small bits of code, but when the time per operation is below 100 nanoseconds, don’t take the numbers too literally. Numbers that small are more prone to the effects of outside factors. You should also keep in mind the possible effects of optimizations made by the just-in-time compiler, which may even optimize away some of the code you are trying to benchmark.

Caveats aside, you can use Bench to do some pretty fine grained benchmarking, as the following example shows. This benchmark shows the difference between a normal mod operation and a mod of a power of 2.
    var bench = new Bench();
    int x;
    bench.Time("Mod 13", (c) => { for (int i = 0; i < c; i++) x = i % 13; });
    bench.Time("Mod 2^4", (c) => { for (int i = 0; i < c; i++) x = i % 16; });
The code for the parts of Bench used for timing is below, with comments stripped out. Use the download link at the top of this post to get the complete file with all the comments and documentation. There are also some methods in Bench for benchmarking the size of objects and struct instance, which are described in the next post.
// Copyright ©2013 SoftWx, Inc.
// Released under the MIT License the text of which appears at the end of this file.
// 1/3/2013
// <authors> Steve Hatchett

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

namespace SoftWx.Diagnostics
{
    /// <summary>
    /// Tools for benchmarking code. The Bench class helps do timing benchmarks, and
    /// also object size determiniation. 
    /// </summary>
    public class Bench{
        
        public int MinIterations { get; set; }
        public int MinMilliseconds { get; set; }
        public bool WriteToConsole { get; set; }
                
        public Bench() : this(5, 5000, true) { }
        
        public Bench(int minIterations, int minMilliseconds, bool writeToConsole) {
            MinIterations = minIterations;
            MinMilliseconds = minMilliseconds;
            WriteToConsole = writeToConsole;
        }
        
        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        public TimeResult Time(string name, Action target) {
            if (target == null) throw new ArgumentNullException("target");

            return Time(name, 1, 
                        (c, tc) => { target();},
                        (c, tc) => { });
        }

        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        public TimeResult Time(string name, Action<TimeControl> target) {
            if (target == null) throw new ArgumentNullException("target");

            return Time(name, 1,
                        (c, tc) => { target(tc);},
                        (c, tc) => { });
        }

        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        public TimeResult Time(string name, Action<int> target) {
            if (target == null) throw new ArgumentNullException("target");
            int count = CalculateGoodCount((c, tc) => { target(c); });
            return Time(name, count, target);
        }
        
        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        public TimeResult Time(string name, Action<int, TimeControl> target) {
            if (target == null) throw new ArgumentNullException("target");
            
            int count = CalculateGoodCount(target);
            return Time(name, count, target);
        }
        
        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        public TimeResult Time(string name, int count, Action<int> target) {
            if (count < 0) throw new ArgumentOutOfRangeException("count");
            if (target == null) throw new ArgumentNullException("target");

            return Time(name, count,
                        (c, tc) => { target(c);},
                        (c, tc) => { for(int i = 0; i < c; i++) { } });
        }
        
        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        public TimeResult Time(string name, int count, Action<int, TimeControl> target) {
            if (count < 0) throw new ArgumentOutOfRangeException("count");
            if (target == null) throw new ArgumentNullException("target");
            
            return Time(name, count, target, 
                        (c, tc) => { for (int i = 0; i < c; i++) { } });
        }
        
        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        private TimeResult Time(string name, int count, Action<int, TimeControl> target, 
                                Action<int, TimeControl> overhead) {
            if (count < 1) count = 1;
            GC.Collect();
            GC.WaitForPendingFinalizers();

            TimeResult targetResult = Time(count, target, this.MinIterations, this.MinMilliseconds);
            TimeResult overheadResult = Time(count, overhead, targetResult.Iterations, 0);
            var adjustedTime = targetResult.Elapsed.Subtract(overheadResult.Elapsed);
            if (adjustedTime.Ticks < 0) adjustedTime = new TimeSpan(0);
            targetResult = new TimeResult(name, targetResult.Operations, adjustedTime, 
                                          targetResult.Iterations);
            if (this.WriteToConsole) {
                Console.WriteLine(targetResult.ToString());
            }
            return targetResult;
        }
        
        [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
        private TimeResult Time(int count, Action<int, TimeControl> target, 
                                long minIterations, long minMilliseconds) {
            TimeControl tc = new TimeControl();
            target(1, tc); // ensure the code being timed has been jitted
            tc.Reset();
            long iterations = 0;
            tc.StopWatch.Start(); // don't use tc Pause/Resume here, because we don't want to count overhead for that
            do {
                target(count, tc);
                iterations++;
            } while((tc.StopWatch.ElapsedMilliseconds < minMilliseconds) || (iterations < minIterations));
            tc.StopWatch.Stop();
            long totalOperations = (long) count * iterations;
            return new TimeResult(null, totalOperations, tc.Elapsed, iterations);
        }

        private int CalculateGoodCount(Action<int, TimeControl> target) {
            int count = 1;
            do {
                var result = Time(count, target, 0, 0);
                if (result.Elapsed.TotalMilliseconds > 1) {
                    break;
                } 
                count *= 10;
            } while (count < 1000000000);
            return count;
        }
        
        /// <summary>
        /// Allows control of what portions of the method are benchmarked, by
        /// pausing and resuming the timing through the TimeControl.
        /// </summary>
        public class TimeControl {
            private Stopwatch sw;
            private long pauseResumeOverhead; // ticks
            private long accumulatedOverhead; // ticks
            
            public TimeControl() {
                this.sw = new Stopwatch();
                CalculatePauseResumeOverhead(); // ensure it's jitted before using the calculated overhead
                CalculatePauseResumeOverhead();
            }
            
            public void Reset() {
                sw.Reset();
                this.accumulatedOverhead = 0;
            }
            
            public void Pause() { this.sw.Stop(); }
            
            public void Resume() { 
                this.accumulatedOverhead += pauseResumeOverhead;
                this.sw.Start(); 
            }

            internal TimeSpan Elapsed { 
                get {
                    var ticks = Math.Max(0L, this.sw.Elapsed.Ticks - this.accumulatedOverhead);
                    return new TimeSpan(ticks);
                } 
            }
            
            internal Stopwatch StopWatch { get { return this.sw; } }
                        
            [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
            private void CalculatePauseResumeOverhead() {
                Stopwatch sw = new Stopwatch();
                this.pauseResumeOverhead = 0;
                int count = 1000;
                long computed = 0;
                // try several times to get a non-zero result 
                for (int trys = 0; trys < 10; trys++) {
                    long ticks1 = 0;
                    sw.Reset();
                    sw.Start();
                    for (int i = 0; i < count; i++) { }
                    sw.Stop();
                    ticks1 = sw.Elapsed.Ticks;
                    sw.Reset();
                    sw.Start();
                    for (int i = 0; i < count; i++) {
                        sw.Stop();
                        sw.Start();
                    }
                    sw.Stop();
                    computed = (sw.Elapsed.Ticks - ticks1) / count;
                    if (computed >= 0) {
                        this.pauseResumeOverhead = computed;
                        break;
                    }
                }
            }            
        }
        
        /// <summary>Results of benchmark timing.</summary>
        public class TimeResult : IComparable<TimeResult> {
            private string name;
            private long operations;
            private TimeSpan elapsed;
            private long iterations;
            
            private TimeResult() { }
            
            public TimeResult(string name, long operations, TimeSpan elapsed, long iterations) {
                this.name = name;
                this.operations = operations;
                this.elapsed = elapsed;
                this.iterations = iterations;
            }
            
            public string Name { get { return this.name; } }
            
            public long Operations { get { return this.operations; } }
            
            public double ElapsedMilliseconds { get { return this.elapsed.TotalMilliseconds; } }
            
            public TimeSpan Elapsed { get { return this.elapsed; } }
            
            public long Iterations { get { return this.iterations; } }
            
            public double NanosecondsPerOperation {
                get { return (1000000.0 / this.operations) * this.elapsed.TotalMilliseconds; }
            }
            
            public double MicrosecondsPerOperation {
                get { return (1000.0 * this.elapsed.TotalMilliseconds) / this.operations; }
            }
            
            public double MillisecondsPerOperation {
                get { return this.elapsed.TotalMilliseconds / this.operations; }
            }
            
            public double NanosecondsPerIteration {
                get { return (1000000.0 / this.iterations) * this.elapsed.TotalMilliseconds; }
            }

            public double MicrosecondsPerIteration {
                get { return (1000.0 * this.elapsed.TotalMilliseconds) / this.iterations; }
            }

            public double MillisecondsPerIteration {
                get { return this.elapsed.TotalMilliseconds / this.iterations; }
            }
            
            public override string ToString() {
                string timePerOpsLabel;
                double timePerOps;
                if (MillisecondsPerOperation > 1) {
                    timePerOpsLabel = "Millisecs/Op";
                    timePerOps = MillisecondsPerOperation;
                } else if (MicrosecondsPerOperation > 1) {
                    timePerOpsLabel = "Microsecs/Op";
                    timePerOps = MicrosecondsPerOperation;
                } else {
                    timePerOpsLabel = "Nanosecs/Op";
                    timePerOps = NanosecondsPerOperation;
                }
                string debugger = System.Diagnostics.Debugger.IsAttached ? "(Debugger Attached)" : "";
                return string.Format("Name= {0} {1}\n{2}= {3}, , Ops= {4}, ElapsedMillisecs= {5}",
                                     name, debugger, timePerOpsLabel,
                                     timePerOps.ToString("0.000"),
                                     operations.ToString("#,##0"),
                                     ElapsedMilliseconds.ToString("#,##0.00"));

            }
            
            public int CompareTo(Bench.TimeResult other)    {
                return this.MillisecondsPerOperation.CompareTo(other.MillisecondsPerOperation);
            }
        }
    }
}