Performance Zone is brought to you in partnership with:

I specialise MySQL Server performance as well as in performance of application stacks using MySQL, especially LAMP. Web sites handling millions of visitors a day dealing with terabytes of data and hundreds of servers is king of applications I love the most. Peter is a DZone MVB and is not an employee of DZone and has posted 250 posts at DZone. You can read more from them at their website. View Full User Profile

Implementing SchemaSpy in your MySQL environment

06.05.2013
| 3883 views |
  • submit to reddit
This post comes from  at the MySQL Performance Blog.


SchemaSpyLately I have been working with a set of customers on a longer term basis which has given me time to explore new tools using their environments.  One tool that I am finding very helpful is called SchemaSpy.

SchemaSpy is a Java-based tool (requires Java 5 or higher) that analyzes the metadata of a schema in a database and generates a visual representation of it in a browser-displayable format. It lets you click through the hierarchy of database tables via child and parent table relationships as represented by both HTML links and entity-relationship diagrams. It’s also designed to help resolve the obtuse errors that a database sometimes gives related to failures due to constraints.

One of the things that I like about  ERD tools is that I can quickly generate a graphic representation of the database to see which tables are referenced the most, and which are candidates for dropping (why keep old data if you don’t need to?).  I get up to speed a lot faster and can contribute to the Development process a lot more effectively when I know the relationships between tables versus observing queries only.  An additional benefit is that SchemaSpy is command-line driven and builds the html after each run, so I find it convenient to set it up on crontab so that schema changes are automatically picked up.  Nice, eh?

Usage of this tool is very straightforward and it does a lot of the work for you.  Basically if your database has Foreign Keys then you’re laughing, if not don’t despair, you just have a bit more work ahead of you.  Most of the time SchemaSpy “does the right thing” but sometimes you need to give it a little help in the form of metadata files.

Installation of SchemaSpy and Dependencies
  • Download the latest copy of SchemaSpy jar file (5.0.0 at time of writing)
  • Make sure you have latest copy of Java JRE for your platform
  • You will need a Java driver for your database — I’m using the MySQL Connector/J
  • Install the graphiz package
Creating a mysql.properties file

While not explicitly required, I prefer to create this once so that the  command line is shorter and neater.  In my case here is the properties file in use for my Percona Server 5.6.10 sandbox:

description=MySQL
driver=com.mysql.jdbc.Driver
connectionSpec=jdbc:mysql://127.0.0.1:5610/schemaspy
driverPath=/usr/share/java/mysql-connector-java.jar

Example Schema
CREATE TABLE `parent` (
 `parent_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 PRIMARY KEY (`parent_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `child_A` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `parent_id` int(10) unsigned DEFAULT NULL,
 PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `child_B` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `parent_id` int(10) unsigned DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `parent_id` (`parent_id`),
 CONSTRAINT `child_B_ibfk_1` FOREIGN KEY (`parent_id`) REFERENCES `parent` (`parent_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Notice that in the case of child_A there is an implicit FK relationship to parent via parent_id, however in child_B it is explicit.  This has an impact on how SchemaSpy identifies the relationship, and whether you need to provide a hint or not.

Running SchemaSpy (with Foreign Keys)

When you run schemaSpy without a metadata file, you will see that the relationship between parent and child_B is properly displayed:

FK_only

Table child_A can be made to display if you tag the Implied relationships box, but this generally only works if you have used the exact same name (in this case, parent_id) in the child and parent tables.  Your environment may be like a lot of shops that we see, where the parent table is known as id, and on the child tables they reference it as parent_id — this might make sense to Developers but it doesn’t help SchemaSpy at all.  This is where Metadata files come in to play

implied

Creating a Metadata File

The format of the metadata file is XML-based.  I have included the metadata file I used to properly link child_A to parent table:

<?xml version="1.0" encoding="UTF-8"?>
<schemaMeta xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://schemaspy.sourceforge.net/xmlschema/2011/02/05/schemaspy.meta.xsd">
<comments>
Main Production Database. Percona, Michael Coburn
</comments>
<tables>
<table name="parent" comments="Parent table">
<column name="parent_id" primaryKey="true">
</column>
</table>
<table name="child_A" comments="Non-FK relationship, implicit relationship to parent">
<column name="id" primaryKey="true">
</column>
<column name="parent_id">
<foreignKey table="parent" column="parent_id"/>
</column>
</table>
</tables>
</schemaMeta>

If you’re looking for other examples you can find a more comprehensive example here.

Running SchemaSpy

The syntax below if specific for MySQL, but note that SchemaSpy works for pretty much all the popular RDBMS out there.

java -jar schemaSpy_5.0.0.jar -t mysql -u msandbox_ro -p msandbox -meta schemaspy.meta.xml -o /var/www/schemaspy/

The -o directive tells SchemaSpy where to write the output to, make sure this is a directory that can then be accessed by a web server.  Also note that since I created the mysql.properties file in the same directory as the jar file, I don’t need to pass any host:port or schema name information.

Viewing SchemaSpy output

A completed Relationships view of these three tables should look like this:

after_metadata_application

Final Thoughts

While I find the Relationships tab the most useful component of SchemaSpy, I don’t want to leave you with the impression that this is the only component of the tool.  There are additional tabs:

  • Tables – Names, number of children, parents, count of columns, row counts, and comments — a great way for a high level overview of the table sizes and an easy way to use the search feature of your browser to zero in on a particular table
  • Constraints – lists the explicit Foreign Key constraints in the database (this does not include constraints identified via metadata files!)
  • Anomalies – Identifies possible relationships between columns/tables based on names, tables without indexes, columns flagged ‘nullable’ and ‘must be unique’ (woops!), single column tables, incrementing column names in tables, and tables with the string NULL instead of the actual SQL NULL value. This is basically a quick sanity check of your schema for any significant errors or items requiring review.
  • Columns – a listing of all columns in the schema, really handy to sort by name to see if you have any implicit constraints that you might have missed and can then write into your metadata file.
  • Donate – This is free software and John Currier asks for donations so he can justify the time spent maintaining SchemaSpy to his wife :)

Finally don’t forget to automate SchemaSpy via crontab once you’re done.

What are some ERD tools you use and how do they compare with SchemaSpy?  Feel free to answer via the comments.  Thanks for reading!

Published at DZone with permission of Peter Zaitsev, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)