Performance Zone is brought to you in partnership with:

Sasha Goldshtein is a Senior Consultant for Sela Group, an Israeli company specializing in training, consulting and outsourcing to local and international customers.Sasha's work is divided across these three primary disciplines. He consults for clients on architecture, development, debugging and performance issues; he actively develops code using the latest bits of technology from Microsoft; and he conducts training classes on a variety of topics, from Windows Internals to .NET Performance. You can read more about Sasha's work and his latest ventures at his blog: http://blogs.microsoft.co.il/blogs/sasha. Sasha writes from Herzliya, Israel. Sasha is a DZone MVB and is not an employee of DZone and has posted 202 posts at DZone. You can read more from them at their website. View Full User Profile

Runtime Representation of Generics

08.27.2012
| 3751 views |
  • submit to reddit

This is an excerpt from Chapter 5 (Collections and Generics) of Pro .NET Performance, scheduled to appear in less than a month. I might be publishing a few more of these before and after the book is out.

Before we can concern ourselves with the runtime implementation of generics, it’s paramount to ask whether there is a runtime representation of generics—as we will see shortly, C++ templates, a similar mechanism, have no runtime representation to speak of. This question is easily answered if you look at the wonders Reflection can do with generic types at runtime:

Type openList = typeof(List<>);
Type listOfInt = openList.MakeGenericType(typeof(int));
IEnumerable<int> ints = (IEnumerable<int>
          Activator.CreateInstance(listOfInt);
Dictionary<string, int> frequencies =
          new Dictionary<string, int>();
Type openDictionary =
          frequencies.GetType().GetGenericTypeDefinition();
Type dictStringToDouble = openDictionary.MakeGenericType(
          typeof(string), typeof(double));

As you see, we can dynamically create generic types from an existing generic type and parameterize an “open” generic type to create an instance of a “closed” generic type. This demonstrates that generics are first-class citizens and have a runtime representation, which we now survey.

The syntactic features of CLR generics are fairly similar to Java generics and even slightly resemble C++ templates. However, it turns out that their internal implementation and the limitations imposed upon programs using them are very different from Java and C++. To understand these differences we should briefly review Java generics and C++ templates.

A generic class in Java can have generic type parameters, and there is even a constraint mechanism quite similar to what .NET has to offer (bounded type parameters and wildcards). For example, here’s a first attempt at a generic list in Java:

public class List<E> {
  private E[] items;
  private int size;
  public List(int initialCapacity) {
    items = new E[initialCapacity];
  }
  public void add(E item) {
    if (size < items.Length – 1) {
      items[size] = item;
      ++size;
    } else {
      //Allocate a larger array, copy the elements,
      //then place ‘item’ in it
    }
  }
  public E getAt(int index) {
    if (index < 0 || index >= size)
      throw IndexOutOfBoundsException(index);
    return items[index];
  }
  //Many more methods omitted for brevity
}

Unfortunately, this code does not compile. Specifically, the expression new E[initialCapacity] is not legal in Java. The reason has to do with the way Java compiles generic code. The Java compiler removes any mentions of the generic type parameter and replaces them with java.lang.Object, a process called type erasure. As a result, there is only one type at runtime — List, the raw type — and any information about the generic type argument provided is lost. (To be fair, by using type erasure Java retains binary compatibility with libraries and applications that were created before generics — something that .NET 2.0 does not offer for .NET 1.1 code.)

Not all is lost, though. By using an Object array instead, we can reconcile the compiler and still have a type-safe generic class that works well at compilation time:

public class List<E> {
  private Object[] items;
  private int size;
  public void List(int initialCapacity) {
    items = new Object[initialCapacity];
  }
  //The rest of the code is unmodified
}

List<Employee> employees = new List<Employee>(7);
employees.add(new Employee(“Kate”));
employees.add(42); //Does not compile!

However, adopting this approach in the CLR voices a concern: what is to become of value types? One of the two reasons for introducing generics was to avoid boxing at any cost. Inserting a value type to an array of objects requires boxing and is not acceptable.

Compared to Java generics, C++ templates may appear enticing. (They are also extremely powerful: you may have heard that the template resolution mechanism in itself is Turing-complete.) The C++ compiler does not perform type erasure — quite the opposite — and there’s no need for constraints, because the compiler is happy to compile whatever you throw in its way. Let’s start with the list example, and then consider what happens with constraints:

template <typename T>
class list {
private:
  T* items;
  int size;
  int capacity;
public:
  list(int initialCapacity) :
    size(0), capacity(initialCapacity) {
    items = new T[initialCapacity];
  }
  void add(const T& item) {
    if (size < capacity) {
      items[size] = item;
      ++size;
    } else {
      //Allocate a larger array, copy the elements,
      //then place ‘item’ in it
    }
  }
  const T& operator[](int index) const {
    if (index < 0 || index >= size)
      throw exception(“Index out of bounds”);
    return items[index];
  }
  //Many more methods omitted for brevity
};

The list template class is completely type-safe: every instantiation of the template creates a new class that uses the template definition as a… template. Although this happens under the hood, here is an example of what it could look like:

//Original C++ code:
list<int> listOfInts(14);

//Expanded by the compiler to:
class __list__int {
private:
  int* items;
  int size;
  int capacity;
public:
  __list__int(int initialCapacity) :
    size(0), capacity(initialCapacity) {
    items = new int[initialCapacity];
  }
};
__list__int listOfInts(14);

Note that the add and operator[] methods were not expanded —the calling code did not use them, and the compiler generates only the parts of the template definition that are used for the particular instantiation. Also note that the compiler does not generate anything from the template definition; it waits for a specific instantiation before it produces any code.

This is why there is no need for constraints in C++ templates. In a binary search example, the following is a perfectly reasonable implementation:

template <typename T>
int BinarySearch(T* array, int size, const T& element) {
  //At some point in the algorithm, we need to compare:
  if (array[x] < array[y]) {
    ...
  }
}

There’s no need to prove anything to the C++ compiler. After all, the template definition is meaningless; the compiler waits patiently for any instantiations:

int numbers[10];
//Compiles, int has an operator <
BinarySearch(numbers, 10, 42);

class empty {};
empty empties[10];
//Does not compile, empty does not have an operator <
BinarySearch(empties, 10, empty());

Although this design is extremely tempting, C++ templates have unattractive costs and limitations, which are undesirable for CLR generics:

  • Because the template expansion occurs at compile-time, there is no way to share template instantiations between different binaries. For example, two DLLs loaded into the same process may have separate compiled versions of list<int>. This consumes a large amount of memory and causes long compilation times, by which C++ is renowned.
  • For the same reason, template instantiations in two different binaries are considered incompatible. There is no clean and supported mechanism for exporting template instantiations from DLLs (such as an exported function that returns list<int>).
  • There is no way to produce a binary library that contains template definitions: template definitions exist only in source form, as header files that can be #included into a C++ file.
In the next post, we’ll take a look at how CLR generics are implemented at runtime.

 

 

 

 

 

 

Published at DZone with permission of Sasha Goldshtein, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)