The String Intern Pool

By | February 12, 2012

Strings form a large part of most programs. Irrespective of what the program is used for, a large share of the actual code (almost 70% by my rough estimate) generally deals with strings. As we all know, .NET strings are immutable. Producing a large number of string constants inside a program, or assigning the same string content to more than one object, can eat up the object heap. .NET provides the String Intern Pool to help optimize the usage of strings.

The String Intern Pool is a special table allocated on the Large Object Heap that holds a reference to each unique string literal the program loads, along with any string that is interned explicitly at run time. The CLR keeps track of these strings and reuses the existing object when it needs to load the same literal again. This ensures that no new memory is used when the content of the string is the same.

The String class exposes the Intern method to retrieve the reference of a string object held in the pool. Let us test this using code:

string string1 = "This is a string";
string string2 = "This is a string";

// Casting to object forces a reference comparison instead of string's value comparison
if ((object)string1 == (object)string2)
    Console.WriteLine("The references of string1 and string2 are equal");
else
    Console.WriteLine("The references of string1 and string2 are unequal");

Console.ReadKey(true);

The above will print “The references of string1 and string2 are equal”. Why? We created two string references, didn’t we? Well, let's check the IL that is generated by the code first.

In the IL generated for both assignments, the compiler emits “ldstr” to load the string into memory. If you look into the documentation of ldstr, you will see that MSDN says “The Common Language Infrastructure (CLI) guarantees that the result of two ldstr instructions referring to two metadata tokens that have the same sequence of characters return precisely the same string object”. This is achieved by going through the String Intern Pool whenever a string literal is loaded into memory.

When the program runs and a string literal is loaded for the first time, the string is interned for later use. When the same literal is loaded again, the runtime automatically looks up the interned equivalent in the String Intern Pool table and returns a reference to the same object.
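The behaviour described above can be checked with a small sketch. The class and method names here (LiteralInterningSketch, SameLiteralTwice, FoldedConstant) are made up for illustration. The second method relies on the C# compiler folding a constant concatenation into a single literal, so it ends up as the same ldstr operand. In an ordinary JIT-compiled run both checks would be expected to report True.

```csharp
using System;

public static class LiteralInterningSketch
{
    // Two identical literals: each one is loaded with ldstr,
    // so both variables should refer to the same interned object.
    public static bool SameLiteralTwice()
    {
        string first = "This is a string";
        string second = "This is a string";
        return object.ReferenceEquals(first, second);
    }

    // A concatenation of constants is folded at compile time,
    // so it is emitted as one literal with the same content.
    public static bool FoldedConstant()
    {
        const string folded = "This is " + "a string";
        string literal = "This is a string";
        return object.ReferenceEquals(folded, literal);
    }

    public static void Main()
    {
        Console.WriteLine(SameLiteralTwice());
        Console.WriteLine(FoldedConstant());
    }
}
```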

Similarly, if I just change the code a little:

string string1 = "This is a string";
// StringBuilder lives in System.Text; ToString() allocates a new string at run time
string string2 = new StringBuilder().Append("This is a string").ToString();

if ((object)string1 == (object)string2)
    Console.WriteLine("The references of string1 and string2 are equal");
else
    Console.WriteLine("The references of string1 and string2 are unequal");

Console.ReadKey(true);

Now the output will show “The references of string1 and string2 are unequal”. StringBuilder maintains its own buffer of characters, and when ToString is called it allocates a new string object at a new memory location rather than returning the one that is interned.
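It is worth separating the two kinds of equality involved here. The following is a minimal sketch (EqualityKindsSketch and its methods are illustrative names, not part of any library): the string built at run time still has the same content as the literal, so a value comparison succeeds, while the reference comparison fails because the two variables point at different objects.

```csharp
using System;
using System.Text;

public static class EqualityKindsSketch
{
    // Builds the text at run time, so the result is a fresh string object.
    public static string BuildAtRuntime()
    {
        return new StringBuilder().Append("This is ").Append("a string").ToString();
    }

    // string's == operator compares the characters, not the references.
    public static bool ValueEqual()
    {
        string literal = "This is a string";
        return literal == BuildAtRuntime();
    }

    // ReferenceEquals asks whether both variables point at one object.
    public static bool ReferenceEqual()
    {
        string literal = "This is a string";
        return object.ReferenceEquals(literal, BuildAtRuntime());
    }

    public static void Main()
    {
        Console.WriteLine(ValueEqual());      // same content
        Console.WriteLine(ReferenceEqual());  // different objects
    }
}
```

This is why the snippets in this post cast both operands to object before using ==: without the cast, the comparison would be done on the content and would always succeed.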

The string.Intern method is used to get the reference of the equivalent string from the String Intern Pool table. For instance, if I use

string string2 = string.Intern(new StringBuilder().Append("This is a string").ToString());

it will retrieve the interned string from the pool table, which is the same object that string1 refers to. This ensures that the references of the two strings are equal.
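Put together as a runnable sketch (InternSketch is a hypothetical name, and the check assumes the literal has already been interned by the runtime, which is the usual case under JIT compilation), the value returned by string.Intern is the pooled object, while the string that came out of the StringBuilder remains a separate instance:

```csharp
using System;
using System.Text;

public static class InternSketch
{
    public static bool InternedMatchesLiteral()
    {
        string literal = "This is a string";
        string built = new StringBuilder().Append("This is ").Append("a string").ToString();

        // Intern looks the content up in the pool and returns the pooled reference.
        string interned = string.Intern(built);

        // The interned result is the literal's object; 'built' itself is untouched.
        return object.ReferenceEquals(literal, interned)
            && !object.ReferenceEquals(built, interned);
    }

    public static void Main()
    {
        Console.WriteLine(InternedMatchesLiteral());
    }
}
```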

Note: 

Remember, the String Intern Pool is stored in the LOH (Large Object Heap) section of application memory. Even though the LOH is garbage collected (though only rarely), the string references held in the String Intern Pool are unlikely to be removed from memory until the process terminates. Interning a large number of strings therefore tends to have side effects, because that memory stays in use for the lifetime of the process. Another important fact is that, to get the reference of an interned string, we first need to create a string object in memory. Even though that temporary object will eventually be garbage collected, it still puts additional memory pressure on the CLR.
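Both points in the note can be illustrated with string.IsInterned, which only looks a string up and returns null when it is not in the pool. This is a rough sketch under a few assumptions: PoolLifetimeSketch and its members are illustrative names, and the "key-" prefix plus a GUID is used only to get content that cannot already be in the pool.

```csharp
using System;

public static class PoolLifetimeSketch
{
    // Creates content at run time that no literal in the program can match.
    public static string MakeUniqueString()
    {
        return "key-" + Guid.NewGuid().ToString();
    }

    // The string has to exist as an ordinary object before it can be looked up or interned.
    public static bool NotPooledBeforeIntern()
    {
        string temp = MakeUniqueString();
        return string.IsInterned(temp) == null;
    }

    // Once interned, any later string with the same content resolves to the pooled object.
    public static bool PooledAfterIntern()
    {
        string temp = MakeUniqueString();
        string pooled = string.Intern(temp);

        // A second, separately allocated copy of the same characters.
        string copy = new string(temp.ToCharArray());

        return !object.ReferenceEquals(copy, pooled)
            && object.ReferenceEquals(string.IsInterned(copy), pooled);
    }

    public static void Main()
    {
        Console.WriteLine(NotPooledBeforeIntern());
        Console.WriteLine(PooledAfterIntern());
    }
}
```

Every call to PooledAfterIntern leaves one more entry in the pool, which is the kind of growth the note warns about when interning is applied to large numbers of distinct strings.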

I hope this post helped.

Happy programming.