
Sasha Goldshtein is a Senior Consultant for Sela Group, an Israeli company specializing in training, consulting and outsourcing to local and international customers. Sasha's work is divided across these three primary disciplines. He consults for clients on architecture, development, debugging and performance issues; he actively develops code using the latest bits of technology from Microsoft; and he conducts training classes on a variety of topics, from Windows Internals to .NET Performance. You can read more about Sasha's work and his latest ventures at his blog: http://blogs.microsoft.co.il/blogs/sasha. Sasha writes from Herzliya, Israel. Sasha is a DZone MVB, is not an employee of DZone, and has posted 160 posts at DZone.

Lap Around Roslyn CTP: Syntax Rewriting

01.12.2013

To start doing something useful with Roslyn, we’re going to inspect a syntax tree, locate something interesting—and then modify it! The complex structure of a C# program’s syntax tree (SyntaxTree class) is exposed through a fairly intuitive object model, featuring three types of entities:

Nodes are the major elements of the language; for example, an IfStatementSyntax is a node representing an “if” statement and a LiteralExpressionSyntax is a node representing a literal expression.

Tokens are secondary but nonetheless very important elements, such as identifiers, string literals, and numeric literals. Tokens are always attached to a node. For example, an IfStatementSyntax node has an ExpressionSyntax node as its Condition property, and that might turn out to be a BinaryExpressionSyntax node with Left and Right properties describing more ExpressionSyntax nodes and an OperatorToken token that describes the operation.

Trivia are all the rest—preprocessor directives, whitespace, comments—riding on top of tokens.

The syntax tree, of course, has parent-child relationships between nodes, and there’s a set of APIs for traversing these relationships, such as DescendantNodes and FirstAncestor.
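As a minimal sketch of such a traversal, the following program lists the numeric literals in a parsed tree. It sticks to member names that appear in this post (ParseCompilationUnit, Root, DescendantNodes, Kind, Token) and makes two assumptions that may not hold for every CTP release: that DescendantNodes can be called with no arguments, and that the CTP types live in the Roslyn.Compilers.CSharp namespace.

```csharp
using System;
using System.Linq;
using Roslyn.Compilers.CSharp; // assumed namespace of the CTP syntax types

class ListNumericLiterals
{
    static void Main()
    {
        SyntaxTree tree = SyntaxTree.ParseCompilationUnit(@"
public class C {
    public static void M() {
        System.Console.Write(""42 in a string"");
        System.Console.Write(42);
    }
}
");
        // Enumerate the nodes below the root and keep only literal expressions.
        // Calling DescendantNodes without arguments is an assumption.
        var literals = tree.Root.DescendantNodes()
                                .OfType<LiteralExpressionSyntax>();

        foreach (LiteralExpressionSyntax literal in literals)
        {
            // String literals are literal expressions too, so filter by kind.
            if (literal.Kind == SyntaxKind.NumericLiteralExpression)
            {
                Console.WriteLine(literal.Token.Value);
            }
        }
    }
}
```

Filtering with OfType keeps the traversal generic: the same loop works for any node type by changing the type argument.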

Syntax trees can be created from source code very easily. It is also a very quick process, because no binding or code emission takes place—only the lexer and parser are involved in the construction of the tree. (In the next post we’ll look at the semantic model as well, which requires symbol construction and binding.)

SyntaxTree tree = SyntaxTree.ParseCompilationUnit(@"
using System;
public class MyClass {
    public static void MyMethod() {
        Console.Write(""Hello There {0}"", 42);
        Console.Write(42);
    }
}
");
Console.WriteLine(tree.Root.GetFullText());

Inspecting the tree in the debugger visualizer (supplied with the Roslyn CTP as a sample) shows the following structure:

[Image: the syntax tree as shown in the debugger visualizer]

You can inspect and modify syntax trees directly, but the easier way would be to use a visitor class derived from SyntaxWalker or SyntaxRewriter. A quick code demo is better than a thousand words describing it, so here’s a rewriter that will modify numeric literals with the value 42 to the value 43:

/// <summary>
/// Replaces the numeric literal 42 with the
/// numeric literal 43.
/// </summary>
class MyLiteralRewriter : SyntaxRewriter
{
    protected override SyntaxNode VisitLiteralExpression(
        LiteralExpressionSyntax node)
    {
        if (node.Kind ==
            SyntaxKind.NumericLiteralExpression)
        {
            SyntaxToken token = node.Token;
            if (token.Value is int &&
                (int)token.Value == 42)
            {
                return node.ReplaceToken(
                    token, Syntax.Literal(
                           token.LeadingTrivia,
                           "43", 43,
                           token.TrailingTrivia));
            }
        }
        return node;
    }
}

Note that the VisitLiteralExpression method does not modify the node—it either returns the existing node, or returns a new node with a new literal token. The entire Roslyn API surface is like that—all objects are immutable, and you create new objects off existing ones.

How is this visitor applied to a syntax tree? To apply it, we need to give it the tree root, and it will return a new tree root. This new tree root can be compiled, analyzed, or simply … serialized to text:

SyntaxNode newRoot =
    new MyLiteralRewriter().Visit(tree.Root);
tree = SyntaxTree.Create(
    tree.FileName, (CompilationUnitSyntax)newRoot);
Console.WriteLine(tree.Root.GetFullText());

Because it works on the parsed tree rather than on raw text, this rewriter matches only numeric literals: it will not touch the number 42 when it appears in comments or in strings, as a more fallible regex-based approach easily might.
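To make the contrast concrete, here is a small sketch of the text-based alternative using only the standard .NET Regex class. The class name, method name, and sample input are hypothetical and chosen for illustration; the point is that a pattern has no notion of comments or string literals.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexPitfallDemo
{
    // Naive textual rewrite: replaces every standalone "42",
    // regardless of where it appears in the source text.
    public static string NaiveRewrite(string source)
    {
        return Regex.Replace(source, @"\b42\b", "43");
    }

    static void Main()
    {
        string source =
            "// prints 42\n" +
            "Console.Write(\"The answer is 42\");\n" +
            "Console.Write(42);\n";

        // All three occurrences change, including the ones in the
        // comment and inside the string literal.
        Console.Write(NaiveRewrite(source));
    }
}
```

A syntax rewriter avoids this because the comment is trivia and the string is a different kind of literal token, so neither one is ever compared against the number 42.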

In the next post, we’ll look into a somewhat more complicated syntax rewriting visitor, which will require the semantic model of the code (i.e., symbols and their meanings) and not just the syntactic information.



Published at DZone with permission of Sasha Goldshtein, author and DZone MVB.
