Sasha Goldshtein is a Senior Consultant for Sela Group, an Israeli company specializing in training, consulting and outsourcing to local and international customers. Sasha's work is divided across these three primary disciplines. He consults for clients on architecture, development, debugging and performance issues; he actively develops code using the latest bits of technology from Microsoft; and he conducts training classes on a variety of topics, from Windows Internals to .NET Performance. You can read more about Sasha's work and his latest ventures at his blog: http://blogs.microsoft.co.il/blogs/sasha. Sasha writes from Herzliya, Israel. Sasha is a DZone MVB, is not an employee of DZone, and has posted 204 posts at DZone.

Revisiting Value Types vs. Reference Types

04.11.2013

Why do C#, the .NET Framework, and the CLR need value types and reference types? Why two categories of types? Why the added complexity in training developers to understand why and when to use each type of type?

There are many answers, but very few get to the crux of the matter. You could try to justify the need for two types of types by looking at the semantic differences C# affords each. For example, you know that by default, instances of value types are copied when passed to a function, but instances of reference types are not -- only the references are copied. Or, you could say that by default, the Equals method compares whether two instances of a reference type are identical (point to the same memory location), but for instances of value types it compares their contents. There are many other superficial semantic differences, too. But do they justify having two types of types?
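The copy and equality differences described above can be sketched in C++, one of the three languages compared in this post. This is only an approximation of the C# behavior: passing a `Point` by value stands in for a value type, and passing a reference or pointer stands in for a reference type. The `Point` type and every function name here are made up for illustration, and the sketch assumes a C++14 or later compiler.

```cpp
// Hypothetical type, used only for illustration.
struct Point {
    int x;
    int y;
};

// Pass by value: the function works on its own copy.
constexpr void shift_copy(Point p) { p.x += 10; }

// Pass by reference: the function works on the caller's object.
constexpr void shift_shared(Point& p) { p.x += 10; }

// Contents comparison, analogous to the default Equals on a value type.
constexpr bool same_contents(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y;
}

// Identity comparison, analogous to the default Equals on a reference type.
constexpr bool same_object(const Point* a, const Point* b) { return a == b; }

constexpr int x_after_copy_call() {
    Point p{1, 2};
    shift_copy(p);    // only the copy changes
    return p.x;
}

constexpr int x_after_shared_call() {
    Point p{1, 2};
    shift_shared(p);  // the caller's object changes
    return p.x;
}

// Two separate objects with equal fields: equal contents, different identity.
constexpr bool equal_but_distinct() {
    Point a{1, 2};
    Point b{1, 2};
    return same_contents(a, b) && !same_object(&a, &b);
}
```

With these definitions, `x_after_copy_call()` yields 1, `x_after_shared_call()` yields 11, and `equal_but_distinct()` is true.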

It seems that the standard reasons of "stack vs. heap", "by value vs. by reference", "identity vs. contents" are not by themselves enough to justify the associated language and implementation complexity of having two categories of types. Here is an attempt at an alternative explanation.

Consider C#, Java, and C++ for a moment. All three languages have types that are more lightweight than others:

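+----------+-----------------------------------------+--------------------------------------+
| Language | Lightweight Types                       | Heavyweight Types                    |
+----------+-----------------------------------------+--------------------------------------+
| Java     | Primitives (int, boolean)               | Object types (String, Integer)       |
| C#       | Value types                             | Reference types                      |
| C++      | Structs/classes without virtual methods | Structs/classes with virtual methods |
+----------+-----------------------------------------+--------------------------------------+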

What do the "heavyweight" types have in common? Instances of these types -- in all three languages -- are provided some additional services by the compiler/runtime/execution environment at the expense of additional overhead. These services, and that overhead, are the reason for having two types of types.

What are those services, then? Although it somewhat depends on the language and environment, all three examples above afford heavyweight types with support for polymorphism, namely virtual method invocation. Virtual methods in Java, C#, and C++ rely on a method table stored in memory, and pointed to from the header of each object instance. This pointer ("vfptr" in C++, "method table pointer" in the CLR) is the overhead, the cost heavyweight types must pay for a service that lightweight types do not have access to.

+---Ref Type Instance---+
| Object Header Word    |
| Method Table Pointer ------> +---Method Table (Simplified)---+
| ... object fields ... |      | Ptr to Base MT                |
+-----------------------+      | Ptr to Object.Equals          |
                               | Ptr to Object.ToString        |
                               | ... additional methods ...    |
                               +-------------------------------+
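In C++ the method table pointer can be observed indirectly with `sizeof`. What follows is a minimal sketch, assuming a mainstream compiler (GCC, Clang, MSVC) that stores one hidden pointer in every instance of a class with virtual methods. The C++ standard itself does not mandate that layout. All type and constant names are hypothetical.

```cpp
#include <cstddef>

// Lightweight: no virtual methods, so each instance holds only its fields.
struct PlainPoint {
    int x;
    int y;
    int sum() const { return x + y; }
};

// Heavyweight: virtual methods make mainstream compilers store a pointer
// to the class's method table inside every instance.
struct VirtualPoint {
    int x;
    int y;
    virtual int sum() const { return x + y; }
    virtual ~VirtualPoint() = default;
};

// Overrides sum(); instances point at a different method table.
struct ScaledPoint : VirtualPoint {
    int sum() const override { return 2 * (x + y); }
};

// The call is resolved at run time through the object's method table.
inline int sum_via_base(const VirtualPoint& p) { return p.sum(); }

constexpr std::size_t kPlainSize = sizeof(PlainPoint);
constexpr std::size_t kVirtualSize = sizeof(VirtualPoint);

// Extra bytes each instance pays for virtual dispatch on this platform.
constexpr std::size_t kPerInstanceOverhead = kVirtualSize - kPlainSize;
```

On a typical 64-bit build I would expect `kPlainSize` to be 8 and `kVirtualSize` to be 16, but the exact numbers depend on the compiler and target. The service bought with those bytes is that `sum_via_base` picks `ScaledPoint::sum` when handed a `ScaledPoint`.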

In the CLR and the JVM, reference types enjoy additional services on top of virtual method invocation. For example, reference types participate in Monitor synchronization: you can use the C# lock or Java synchronized keyword to use an arbitrary reference type instance for synchronization. Additionally, both the CLR and the JVM offer garbage collection for heap objects. Both of these services require additional memory overhead in every instance of a reference type.
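To get a feel for why "any instance can be locked" costs memory, here is a deliberately naive C++ sketch in which every instance carries its own lock. This is an assumption-laden illustration, not how the CLR or the JVM does it: as far as I know, both keep a compact header word per object instead of embedding a full mutex. The type names are hypothetical.

```cpp
#include <cstddef>
#include <mutex>

// Lightweight: just the payload.
struct PlainCounter {
    int value;
};

// Naive "heavyweight" counterpart: every instance carries its own lock,
// whether or not anyone ever synchronizes on it.
struct LockableCounter {
    std::mutex monitor;
    int value = 0;

    // Roughly what lock (C#) or synchronized (Java) on the instance gives you.
    void increment() {
        std::lock_guard<std::mutex> guard(monitor);
        ++value;
    }
};

// Extra bytes per instance paid for the ability to lock it.
constexpr std::size_t kLockOverhead =
    sizeof(LockableCounter) - sizeof(PlainCounter);
```

The point is only that the ability to synchronize on an instance has to live somewhere in that instance, so `kLockOverhead` is never zero.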

It is not the semantic difference in copying vs. passing a reference or comparing identity vs. comparing contents that explains why two types of types are so prevalent. The additional services -- supporting virtual methods, object synchronization, garbage collection, finalization -- make the overhead necessary for reference types. This very overhead is not acceptable for small, primitive types of which millions of instances are likely to be required. Integers, floats, characters, Booleans, two-dimensional points, and rectangle coordinates cannot afford to waste 4-16 bytes of overhead per instance.
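The arithmetic behind "cannot afford" is easy to work through. The sketch below assumes a fixed per-instance header and ignores alignment padding and allocator bookkeeping. The header sizes passed in are picked from the 4-16 byte range above, and the function names are hypothetical.

```cpp
#include <cstddef>

// Total bytes for `count` instances when each one carries `header_bytes`
// of bookkeeping in addition to `payload_bytes` of actual data.
constexpr std::size_t total_bytes(std::size_t payload_bytes,
                                  std::size_t header_bytes,
                                  std::size_t count) {
    return (payload_bytes + header_bytes) * count;
}

// Share of each instance spent on the header, in whole percent (rounded down).
constexpr std::size_t header_share_percent(std::size_t payload_bytes,
                                           std::size_t header_bytes) {
    return 100 * header_bytes / (payload_bytes + header_bytes);
}
```

For example, one million 4-byte integers with an assumed 8-byte header take `total_bytes(4, 8, 1000000)`, or 12,000,000 bytes, instead of 4,000,000, and `header_share_percent(4, 8)` comes out at 66. Two thirds of the memory would be spent on services an integer never uses.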

This is why C#, Java, and C++ have two categories of types -- even if you don't think of them as different categories. And this is also why you should consider using value types: not because they make it easier to copy objects by value or compare their contents, but because they do not pay the cost of services you will not require of them.

Published at DZone with permission of Sasha Goldshtein, author and DZone MVB. (source)
