
Sasha Goldshtein is a Senior Consultant for Sela Group, an Israeli company specializing in training, consulting, and outsourcing for local and international customers. Sasha's work spans three primary disciplines: he consults for clients on architecture, development, debugging, and performance issues; he actively develops code using the latest Microsoft technologies; and he conducts training classes on a variety of topics, from Windows Internals to .NET Performance. You can read more about Sasha's work and his latest ventures at his blog: http://blogs.microsoft.co.il/blogs/sasha. Sasha writes from Herzliya, Israel, and is a DZone MVB.

On ‘stackalloc’ Performance and The Large Object Heap

10.26.2013

An interesting blog post is making the rounds on Twitter, 10 Things You Maybe Didn’t Know About C#.  There are some nice points in there, such as using the FieldOffset attribute to create unions or specifying custom add/remove accessors for events.
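For reference, the two features mentioned above look roughly like this (a minimal sketch of my own, not code from the original post):

```csharp
using System;
using System.Runtime.InteropServices;

// A C-style union: both fields occupy the same four bytes,
// so writing one and reading the other reinterprets the bits.
[StructLayout(LayoutKind.Explicit)]
struct IntFloatUnion
{
    [FieldOffset(0)] public int AsInt;
    [FieldOffset(0)] public float AsFloat;
}

class Publisher
{
    private EventHandler _handlers;

    // Custom add/remove accessors; here they simply delegate to a
    // private field, but they could also add locking, logging, etc.
    public event EventHandler Changed
    {
        add { _handlers += value; }
        remove { _handlers -= value; }
    }

    public void RaiseChanged() => _handlers?.Invoke(this, EventArgs.Empty);
}
```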

However, item #4 on the list claims that using stackalloc is not faster than allocating a standard array. The proof is given in the form of a benchmark program that allocates 10,000-element arrays – so far so good – and then proceeds to store values in them. The values are obtained by calling Math.Pow. The benchmark results show that using a stackalloc-ed array is not faster than using a heap-allocated array, and I was able to reproduce this result on my machine.

Unfortunately, this benchmark cannot support the claim “Using stackalloc is not faster than allocating a standard array”. This benchmark shows that if you need to calculate and store powers of large numbers in an array, it doesn’t matter from a speed perspective if you store them in a heap-allocated array or in a stack-allocated array.

The benchmark also has additional problems – it does not allow the program to warm up, it does not display statistics over the results (average, min, max, standard deviation), and it only uses a single array size, namely 10,000 elements. I refer you to Eric Lippert’s blog series on micro-benchmarking, and also Chapter 2 in Pro .NET Performance [available in part on my blog], for a more detailed treatment of micro-benchmarking best practices.

I made a few changes to the benchmark program so that it:

  1. Allocates arrays of varying sizes and compares the performance for each size. The sizes I used were 10 – 4510 (step 500) and 6000 – 96000 (step 10,000).
  2. Repeats the measurement 5 times and displays the average, min, and max times.
  3. Warms up by calling each function at least once before measuring anything.

The final version of the benchmark I used is here.
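The structure of such a benchmark looks roughly as follows (a sketch only – the repetition count and the simple fill loop are my assumptions, not the linked program's exact code, which you should consult for the real measurements):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class Benchmark
{
    const int Runs = 5;           // measurements per array size
    const int RepsPerRun = 10000; // allocations per timed run (assumed value)

    static void Main()
    {
        foreach (int size in new[] { 10, 510, 1010, 96000 })
        {
            // Warm up both code paths so JIT compilation is not measured.
            HeapRun(size, 1);
            StackRun(size, 1);

            var heap = new double[Runs];
            var stack = new double[Runs];
            for (int i = 0; i < Runs; ++i)
            {
                var sw = Stopwatch.StartNew();
                HeapRun(size, RepsPerRun);
                heap[i] = sw.Elapsed.TotalMilliseconds;

                sw = Stopwatch.StartNew();
                StackRun(size, RepsPerRun);
                stack[i] = sw.Elapsed.TotalMilliseconds;
            }
            Console.WriteLine("Array size: {0}", size);
            Console.WriteLine("Heap Array  - AVG = {0}, MIN = {1}, MAX = {2}",
                              heap.Average(), heap.Min(), heap.Max());
            Console.WriteLine("Stack Array - AVG = {0}, MIN = {1}, MAX = {2}",
                              stack.Average(), stack.Min(), stack.Max());
        }
    }

    static void HeapRun(int size, int reps)
    {
        for (int r = 0; r < reps; ++r)
        {
            int[] array = new int[size];
            for (int i = 0; i < size; ++i) array[i] = i;
        }
    }

    // Requires compiling with /unsafe. stackalloc memory is reclaimed only
    // when the method returns, so each allocation gets its own method call
    // rather than running stackalloc inside a loop.
    static void StackRun(int size, int reps)
    {
        for (int r = 0; r < reps; ++r) StackFill(size);
    }

    static unsafe void StackFill(int size)
    {
        int* array = stackalloc int[size];
        for (int i = 0; i < size; ++i) array[i] = i;
    }
}
```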

The raw results are below (times are in milliseconds):

Array size: 10
Heap Array  - AVG = 0.85394, MIN = 0.5395, MAX = 1.0125
Stack Array - AVG = 0.23302, MIN = 0.2055, MAX = 0.3028
Array size: 510
Heap Array  - AVG = 7.22854, MIN = 6.9549, MAX = 7.4369
Stack Array - AVG = 6.38972, MIN = 5.8119, MAX = 6.8206
Array size: 1010
Heap Array  - AVG = 12.53882, MIN = 10.0942, MAX = 14.1942
Stack Array - AVG = 10.92058, MIN = 10.5542, MAX = 11.5668
Array size: 1510
Heap Array  - AVG = 17.6329, MIN = 13.9484, MAX = 19.9116
Stack Array - AVG = 15.12048, MIN = 14.0906, MAX = 16.5836
Array size: 2010
Heap Array  - AVG = 21.68702, MIN = 20.5229, MAX = 22.5228
Stack Array - AVG = 19.7369, MIN = 19.0708, MAX = 21.2995
Array size: 2510
Heap Array  - AVG = 22.80914, MIN = 22.2542, MAX = 23.0967
Stack Array - AVG = 23.6264, MIN = 23.4815, MAX = 23.7922
Array size: 3010
Heap Array  - AVG = 27.27846, MIN = 26.4928, MAX = 28.2462
Stack Array - AVG = 27.96436, MIN = 27.798, MAX = 28.1741
Array size: 3510
Heap Array  - AVG = 32.08726, MIN = 30.7842, MAX = 35.0712
Stack Array - AVG = 33.52876, MIN = 32.6167, MAX = 35.2137
Array size: 4010
Heap Array  - AVG = 35.50722, MIN = 35.0341, MAX = 36.1861
Stack Array - AVG = 37.25514, MIN = 37.1412, MAX = 37.4977
Array size: 4510
Heap Array  - AVG = 39.46634, MIN = 39.2239, MAX = 40.0683
Stack Array - AVG = 43.32186, MIN = 41.906, MAX = 46.0319
Array size: 6000
Heap Array  - AVG = 51.58616, MIN = 51.2831, MAX = 52.1263
Stack Array - AVG = 56.75872, MIN = 55.4964, MAX = 58.3105
Array size: 16000
Heap Array  - AVG = 139.4916, MIN = 136.2144, MAX = 148.4627
Stack Array - AVG = 148.48424, MIN = 148.1188, MAX = 148.7517
Array size: 26000
Heap Array  - AVG = 405.05262, MIN = 398.2785, MAX = 412.4401
Stack Array - AVG = 242.50534, MIN = 242.1933, MAX = 242.9641
Array size: 36000
Heap Array  - AVG = 570.28486, MIN = 549.2248, MAX = 631.8309
Stack Array - AVG = 339.10658, MIN = 335.6826, MAX = 341.6048
Array size: 46000
Heap Array  - AVG = 716.25066, MIN = 692.9255, MAX = 741.0803
Stack Array - AVG = 435.29708, MIN = 431.5396, MAX = 442.7383
Array size: 56000
Heap Array  - AVG = 871.52846, MIN = 862.8028, MAX = 881.2801
Stack Array - AVG = 532.46454, MIN = 526.9268, MAX = 536.4717
Array size: 66000
Heap Array  - AVG = 1015.86848, MIN = 995.1994, MAX = 1040.2593
Stack Array - AVG = 627.60132, MIN = 623.1848, MAX = 637.2896
Array size: 76000
Heap Array  - AVG = 1201.39392, MIN = 1159.1772, MAX = 1236.559
Stack Array - AVG = 725.7904, MIN = 717.0982, MAX = 730.3976
Array size: 86000
Heap Array  - AVG = 1361.79128, MIN = 1336.9031, MAX = 1394.9041
Stack Array - AVG = 819.4298, MIN = 812.741, MAX = 827.5579
Array size: 96000
Heap Array  - AVG = 1554.59876, MIN = 1538.4992, MAX = 1577.4435
Stack Array - AVG = 916.43956, MIN = 911.0777, MAX = 919.0551

If you plot the averages, you get the following:

[Chart: average time (ms) vs. array size for heap-allocated and stack-allocated arrays. The two curves track each other closely up to ~16,000 elements, then diverge in favor of stack allocation.]

It’s immediately evident that something happens between 16,000 and 26,000 elements that opens a gap in the results in favor of stack allocations – and this gap only grows as the array size increases.

That “something” is the large object heap. Objects larger than 85,000 bytes are allocated from the large object heap, which 1) is not compacted and 2) is collected only during a full generation 2 collection. At 4 bytes per element, 16,000 integers occupy 64,000 bytes, which fits under the threshold, whereas 26,000 integers occupy 104,000 bytes, which does not.
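You can see the threshold arithmetic, and observe the LOH in action, with a few lines of C# (a small illustration of my own; the generation-number behavior described in the comment is how the desktop CLR reports LOH objects):

```csharp
using System;

class LohThreshold
{
    static void Main()
    {
        // At 4 bytes per int, the 85,000-byte LOH threshold falls at
        // roughly 21,250 elements (ignoring the array object header):
        Console.WriteLine(16000 * sizeof(int)); // 64000  -> small object heap
        Console.WriteLine(26000 * sizeof(int)); // 104000 -> large object heap

        // On the desktop CLR, a freshly allocated small array starts in
        // generation 0, while LOH objects report generation 2 immediately:
        Console.WriteLine(GC.GetGeneration(new int[16000]));
        Console.WriteLine(GC.GetGeneration(new int[26000]));
    }
}
```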

We can conclude that the larger the allocated arrays, the bigger the overhead of allocating them from the large object heap. On the other hand, we can’t really allocate arbitrarily sized arrays from the stack at all: the default stack size for a Windows thread is 1MB, and an array of 100,000 integers already takes 400,000 bytes – a considerable portion of that space. But that’s a whole different question.

To summarize, we were able to show that stack allocations are faster than large object heap allocations if you consider the GC times as part of the benchmark.

Published at DZone with permission of Sasha Goldshtein, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)