Sasha Goldshtein is a Senior Consultant for Sela Group, an Israeli company specializing in training, consulting and outsourcing to local and international customers. Sasha's work is divided across these three primary disciplines. He consults for clients on architecture, development, debugging and performance issues; he actively develops code using the latest bits of technology from Microsoft; and he conducts training classes on a variety of topics, from Windows Internals to .NET Performance. You can read more about Sasha's work and his latest ventures at his blog: http://blogs.microsoft.co.il/blogs/sasha. Sasha writes from Herzliya, Israel. Sasha is a DZone MVB and is not an employee of DZone and has posted 160 posts at DZone.

Lap Around Roslyn CTP: Syntax Rewriting with Symbol Information

01.13.2013

Last time around, we were replacing the 42 numeric literal with 43. This time let’s pretend to do something more useful. Suppose you really don’t like developers calling the Console.Write method and insist on using Console.WriteLine instead. You might be slightly reluctant to use find-and-replace because, just like last time, you don’t want to modify Console.Write text within comments or string literals. Nor, and this is the vicious case, do you want to touch calls to a Write method on something that is not the System.Console class from the mscorlib assembly, like maybe a property called Console!
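To make the problem concrete, here is a small input program of the kind a textual find-and-replace would get wrong. It is only an illustrative sketch: the Logger type, the Console property and the Program class are hypothetical names that do not come from the original sample.

```csharp
using System;

// Hypothetical logger type; its Write method is unrelated to System.Console.
class Logger
{
    public void Write(string message) { }
}

class Program
{
    // Inside Program, the simple name Console binds to this property,
    // not to the System.Console class.
    static Logger Console { get { return new Logger(); } }

    static void Main()
    {
        // Console.Write("in a comment");      <- comment: must stay as-is
        string s = "Console.Write(\"text\")";  // string literal: must stay as-is
        Console.Write(s);                      // Logger.Write: must stay as-is
        System.Console.Write(s);               // the only call we want rewritten
    }
}
```

All four occurrences look the same to a text search, and the last two even have the same syntactic shape. Only the final one refers to the Write method of the BCL Console class.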

The C# parser, which we met in its SyntaxTree incarnation, doesn’t bind InvocationExpression nodes to the actual method being invoked. All it cares about is the proper structure of the expression. For all it cares, Console could be a private class, and not the BCL one.

Enter the semantic model (SemanticModel class), which represents everything the compiler knows about your code after binding the syntax tree to symbols. In this case, the semantic model will give us a symbol for the Console.Write invocation expression, and we’ll be able to tell which Console.Write is being invoked and replace it accordingly.

To obtain a SemanticModel instance, we need to provide Roslyn with all the information to perform binding—i.e., the assembly references for our code. Recall that we could create a SyntaxTree without specifying those! This information is wrapped by a Compilation class instance, which can be used (eventually) to emit actual code.

Compilation compilation = Compilation.Create(
    "MyCompilation",
    CompilationOptions.Default,
    new SyntaxTree[] { tree },
    new MetadataReference[] {
        new AssemblyFileReference(
            typeof(object).Assembly.Location)
    },
    null, null);
SemanticModel model = compilation.GetSemanticModel(tree);

Now that we have the semantic model, we can pass it to our rewriter’s constructor:

/// <summary>
/// Replaces Console.Write calls with equivalent
/// Console.WriteLine calls.
/// </summary>
class MyConsoleWriteRewriter : SyntaxRewriter
{
    private readonly SemanticModel _semanticModel;

    public MyConsoleWriteRewriter(SemanticModel model)
    {
        _semanticModel = model;
    }

    protected override SyntaxNode
        VisitInvocationExpression(
            InvocationExpressionSyntax node)
    {
        SemanticInfo info =
            _semanticModel.GetSemanticInfo(node);
        // Symbol is null when the call cannot be bound, and the
        // callee is not always a member access, so avoid hard casts.
        MethodSymbol symbol = info.Symbol as MethodSymbol;
        MemberAccessExpressionSyntax old =
            node.Expression as MemberAccessExpressionSyntax;
        if (symbol != null && old != null &&
            symbol.Name == "Write" &&
            symbol.ContainingType.Name == "Console" &&
            symbol.ContainingNamespace.Name == "System" &&
            symbol.ContainingAssembly.Name == "mscorlib")
        {
            // Keep the receiver and the dot; swap only the method name.
            return node.ReplaceNode(
                old,
                old.Update(
                    old.Expression,
                    old.OperatorToken,
                    Syntax.IdentifierName("WriteLine")));
        }
        return node;
    }
}

To understand what’s going on here, let’s take a look at the structure of the InvocationExpression node for a typical Console.Write call:

[Figure: syntax tree for a typical Console.Write call]

The InvocationExpression, in this case, consists of a MemberAccessExpression, which specifies the method to invoke, and an ArgumentList that specifies the arguments. Because we trust Console.WriteLine to accept the same arguments Console.Write accepts, we don’t need to touch the ArgumentList node. Moreover, we don’t even need to touch the first IdentifierName under the MemberAccessExpression. All we need to replace is the second IdentifierName.

Therefore, we return a new node from our VisitInvocationExpression method whenever we have something to replace the existing node with. Specifically, we ask the semantic model to give us symbol information for the method invocation expression—if it matches the System.Console.Write method from the mscorlib assembly, we keep the entire expression except the method name identifier.

Of course, to apply this rewriter to our tree, we need to pass it the SemanticModel instance retrieved earlier:

SyntaxNode newRoot =
    new MyConsoleWriteRewriter(model).Visit(tree.Root);

This actually starts looking useful. Next time, we won’t bother with rewriting, but instead perform more complicated analysis of the syntax tree and semantic model, including data flow and control flow within a method.



Published at DZone with permission of Sasha Goldshtein, author and DZone MVB. (source)