What’s the plan?
“Why is this so slow?” Have you put your application into testing (you did this before going to production, right?) and wondered why you’re not getting VoltDB’s world-class performance? The problem might be with the SQL execution plan.
This article shows you how to look at the SQL execution plans and use the information to tune your application.
Getting the plans at compile time
In VoltDB, you can get execution plan information when you compile your stored procedures or later, when the database is up and running. When you build the application catalog using the voltdb compile command, the console shows information about the stored procedures, and writes detailed plan information to the debugoutput directory.
Console output plan summary
The console output contains a summary and some annotations that tell you basic information about your queries. In V3.2.1, the annotations are:
[MP] / [SP] – multi-partition / single-partition procedure
[READ] / [WRITE] – read-only / read-write procedure. Any INSERT, UPDATE, or DELETE SQL statement will mark a procedure as WRITE, even if it is never executed. (Prior to VoltDB V3.2.1, the annotations were [RO] and [RW])
[TABLE SCAN] – contains one or more sequential table scans. (Prior to VoltDB V3.2.1, the annotation was [SEQ])
[NDO] – non-deterministic order – a statement in a [WRITE] procedure that could return rows in an arbitrary order (e.g. SELECT * FROM mytable)
[NDC] – non-deterministic content – a statement in a RW procedure that could return arbitrary rows (e.g. SELECT * from mytable LIMIT 2)
As an example, compile the code in the voter example. The Results query outputs the following:
[MP][READ] Results [TABLE SCAN]SELECT a.contestant_name AS contestant_name, a.contestant_number...
This tells you that your procedure is multi-partition, read-only and contains a sequential table scan. It then shows which statement has the scan (here we have only one).
For performance, review statements and procedures marked with the [MP] and [Seq] annotations. Too many multi-partition transactions or any sequential scans on large tables may slow down your application.
For correctness, fix all the occurrences of [NDO] and [NDC] statements in [WRITE] procedures. These are important – so they also show up as WARNINGs on the console and in the log file. In a read-write procedure, non-deterministic results could be used later in the procedure as input to a write statement, causing copies of the same data to diverge. The VoltDB compiler is conservative and gives a warning on any non-deterministic SQL in a read-write procedure – it does not try to determine if the write actually uses the results of the non-deterministic read. Adding a unique index to the schema or an ORDER BY clause to the query can make the query deterministic. For more information see the documentation about VoltDB Stored Procedures and Determinism.
In VoltDB V3.2.1, we added a summary report, showing counts of procedure types as well as explanations of the annotations.
Detailed execution plan output
The detailed plan output is saved in the debugoutput/statement-winner-plans directory. Each statement in each procedure has a plan description. For example, the Results statement plan is:
RETURN RESULTS TO STORED PROCEDURE ORDER BY (SORT) AGGREGATION ops: sum NESTLOOP INDEX JOIN inline (INDEX SCAN of "CONTESTANTS" using "SYS_IDX_PK_CONTESTANTS_10019" (unique-scan covering)) RECEIVE FROM ALL PARTITIONS SEND PARTITION RESULTS TO COORDINATOR SEQUENTIAL SCAN of "V_VOTES_BY_CONTESTANT_NUMBER_STATE"
With this plan, you can see where an index is being used (the index scan of Contestants) and where it is not (the sequential scan of the view, v_votes_by_contestant_number_state). In addition, the lines for SEND TO / RECEIVE FROM partitions is only present in multi-partition procedures.
Note: The output also contains a ‘cost’ number. These numbers are used by the compiler to evaluate plans, but have no absolute meaning or scale. You should not rely on these to compare different SQL statements, nor different versions of the VoltDB compiler. The numbers are subject to change and may even go away at some point.
In the case of the Results procedure in Voter, we chose to do nothing. If this were a real application, we would make sure that Results was called infrequently, since it is multi-partition. With the low frequency of calls, and the cardinality of the view for the sequential scan being low and unlikely to change (it is # of contestants * # of states), we would also choose to leave the sequential scan in place.
Getting the plans from a running database
Once the database is up you can get the plan from the interactive tools, sqlcmd or WebStudio, using the explain and explainproc commands. These commands map to the system procedures, @Explain and @ExplainProc, respectively. You can use these system procedures directly through the VoltDB client API to develop your own live admin tool that can monitor query plan effects as your database and application evolve.”
Both commands show the same detailed plan information that is in the compiler’s debugoutput folder.
Use explainproc to show any of the compiled stored procedures in the database. Use explain followed by valid SQL, to see the plan for any single ad-hoc statement. Note, SQL that is compiled into stored procedures can take advantage of more information than ad hoc plans, such as partition directives. As a result, the explain output for adhoc may be different (typically less optimized) than the same SQL used in a stored procedure. If you want to know how a statement will behave in a stored procedure, you can create a single statement procedure without having to write any Java code. If you find that explain reports a better query plan for a statement than you get from explainproc, this may indicate that the partition directives for the stored procedure are missing or incorrect.
I’ve got the plan, now what?
The query execution plans can give you valuable information about how your application will perform. Here are a few things to remember.
Single-partition is faster than multi-partition. VoltDB can parallelize single-partition executions, so make these the bulk of your application, especially the latency-sensitive operations. If you use multi-partition procedures, don’t call them too often. A multi-partition procedure call can delay all other procedure calls while it coordinates its work across all the partitions.
Make sure that the most frequent access to large tables uses an index. As you plan your application, keep track of which tables might grow over time. We’ve had a few customers run into bottlenecks a short while after going live, due to growth of their tables and aggregation queries that don’t use an index.
Do not allow non-determinism in your procedures. VoltDB would rather shut down the system, than allow an inconsistency.
If you think you have a better plan than the one selected by the planner (for example, you’ve provided an index, but the plan isn’t using it), post an example to the VoltDB Forum.
For more information on performance, see the VoltDB Performance Guide.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)