NoSQL Zone is brought to you in partnership with:

.NET developer/architect. Been coding since 1996. Jonas is a DZone MVB and is not an employee of DZone and has posted 20 posts at DZone. You can read more from them at their website. View Full User Profile

Data Layer, the Right Way

01.15.2013
| 2720 views |
  • submit to reddit

 

The goal with this post is to give you a better understand about how you can design your data layer and why it’s important to create a complete abstraction layer.

Whats wrong?

From the very beginning the applications has been data centric. That is, the database has been the very heart of the applications. Everything that was built was (and in some means still is) just a glorified administration interface for the database (especially CRUD applications). To make things worse, some still believe that stored procedures are essential and that parts (if not all) business rules should be stored in them.

To date most tables in the database are modeled according to 3NF (the third normal form). We get a clean database with very few repeated records.

The only thing that really has happened is that the database has been hidden behind an OR/M and we say that our applications is using classes modeled after the business (since the OR/M generate classes & mappings for us).

Building an application

Let’s start by examining the typical creation of an application (reminds me of a quote that I read a long time ago: The process of building application is like the making of sausages: It should never be watched).

Data Layers 

Typical problems

Here are some typical problems which can surface when you are using an OR/M without proper abstractions.

  • Have you ever struggled with a LINQ query which just doesn’t want to work?
  • Poor performance due to lazy loading
  • The generated SQL turns out to be slow
  • Breaking changes in new version of the OR/M
  • Complex and/or highly coupled models thanks to all relations in the db
What is an abstraction?

Wikipedia defines “abstraction” like this:

Abstraction is the process by which data and programs are defined with a representation similar in form to its meaning (semantics), while hiding away the implementation details. Abstraction tries to reduce and factor out details so that the programmer can focus on a few concepts at a time

So unlike what many believes, you do not create abstractions just to be able to switch OR/M or database engine. You create it to reduce complexity.

Before using an abstraction you have to remember all details for both layers when coding. The amount of information that you have to remember depends on the kind of tool/library you use in your data layer. For OR/Ms you have to remember how the data layer handles lazy loading, how to specifically eager load items and how each entity has been configured. For ADO.NET you have to remember the table structure and all data types. If you use your OR/M you have probably auto generated all entities. And by doing so you have forced yourself to use the data model as your business model. If you have used an OR/M for a while you have probably noticed that it’s not the best thing for anything but trivial applications.

So before using an abstraction you have to look at your code like this:

abstraction_before

By using an abstraction you don’t have to care about how your OR/M or database engine works. All you need to know is that the data layer has a set of classes and methods which you invoke to fetch and store information.

abstraction

Your OR/M is not an abstraction

If you use your OR/M as the data layer, your probably have a lot of code like:

public void Summarize(int userId)
{
    // loading from db
    var carts = _dbSession.Query<ShoppingCart>().Where(x => x.UserId == userId);
 
    // some business logic
    var summary = 0d;
    foreach (var cart in carts)
    {
        summary += cart.Total;
    }
 
    return summary;
}

It looks simple enough.

But how do we test it?

We have to make sure that it calls the DB with the correct set of parameters, right? Mocking LINQ providers is not fun, so that’s out of the question. We can create a fake memory DB by using SQLite or similar. But that doesn’t guarantee that everything works with the production db (SQL Server or whatever).

How about the mappings? To guarantee that everything works we have to do integration tests. That is to use real queries against a real DB.

So it might look like the method is only responsible of doing the business logic. But it’s in fact also responsible of reading the data in a correct way too. The OR/M is just a convenience layer which loads the information from the DB for us. It doesn’t abstract away the data access. Any change to the OR/M or the database WILL affect every piece of code that uses the OR/M and the database.

Our requirements will always change. Those changes will always affect your code. By using abstraction it’s easier to handle those (since you do it in one place at a time).

Using a correct abstraction

Let’s instead change the code to:

public void Summarize(int userId)
{
// loading from db
var carts = _shoppingCartRepos.FindForUser(userId);
// some business logic
var summary = 0d;
foreach (var cart in carts)
{
summary += cart.Total;
}
return summary;
}

Now we are using an abstraction. The actual data access is hidden. We do not have to know OR/M implementation details like mappings, eager/lazy loading or OR/M specific LINQ statements. We just know that we should call the FindForUser method to get the data that we need.

We can now test the method easily. The only tests we need for the repositories is integration tests to make sure that all statements are working correctly.

Summary

Complete abstractions do take a longer time to write. But ask yourself. What will you spend most time on? Maintaining the application or writing it from scratch? Let’s say that the application got an expected lifetime of four years and that it takes one year to write it. Then you got three years in which you should maintain it. Do you think that those three years will be easier or harder with abtractions? It might be worth to spend some time extra in the beginning…

The right way

Hopefully you have now understood why we need to create some kind of abstraction from the database and that the OR/M isn’t it. But that is only part of the solution. Ask yourself: Why are you creating the application? Is it to get a nicely modeled database that your application has to follow? Or is it to solve the problem that the client has?

Firstrule

Brad tells us:

0.1872147469154859

The database

The crazy thing is that we have let the database (as in RDBMS) dictate our applications way to long. And we have modeled our databases in the same way (using the normal forms) since 1971. That’s 42 years you lazy coders (I used a calculator =)). That’s INSANE!

It’s time to end the madness. Dare to say no. The database IS just storage.

saidno

If that’s not helping. Listen to Queen Elisabeth (no disrespect meant).

its_storage

0.9660675303844571

Is the database all bad then? No. But we have to mentally switch from letting it control our application to just be a utility in which we store our information. How can we do that?

Model the domain first

One thing has been mentioned before, and that is to model the domain / business first and then the database. We create a set of classes like Person, ChatMessage etc in the first iteration. We then test them and demonstrate the application to the client.

After that we add the representations of those classes in the database. That is, we model the DB iterative too.

Denormalize

Normalization is not all good. Storage is cheap today. Think twice before you normalize if normalization would increase the complexity in your application.

If you for instance only store the shipping address in the DB (i.e. you do not search or index it) there is really no need for a separate table / relation. Store the address in a ShippingAddress column of the Orders table. (The user can even have a ShippingAdress column in the User table, which you copy it from when the Order is placed).

SRP applies to the DB too

Your application may have different target audiences who do different things in the application. For instance accounting takes care of all invoices while shipping takes care of packaging and sending of the packages. And finally you have the users who places orders.

In a situation like that it would be tempting to create a Users table which contains all fields for the accounting employees, the users and the shipping employees. A more subtle approach would be to let the user table represent the application users and then create a table AccountingUsers with a FK to the Users table.

Those approaches increases complexity and indirectly makes all persons in the system dependent of changes to the Users table.

It’s far better to create three different kind of person tables with no relation between them. You might even go so far as creating different databases. One for shipping, one for accounting and one for the core application. We do use code to make the different parts communicate with each other, right? No need for a single database.

Replace it

There are several schemaless (NoSQL) databases out there today. Many of them have matured enough to be used in production. If I can choose I would in most cases pick a NoSQL db. It lowers the friction between storage and code a lot.

Mix!

Nothing says that a single database solution (be RDBMS or NoSQL) are the right tool for the job. NoSQL might be a good fit for shipping while accounting should use RDBMS so that they can do their analytics.

Summary

Since the data source(s) are abstracted away we get a lot more freedom in our applications. We don’t have to try to make our applications fit the database but can instead fit database after our application.

Building applications

So if the database is not the thing that we should start with, what should we start with? How about something like this:

On Building Applications

You have hopefully got something to think about now. You don’t have to go about and change how you build your applications, but hopefully you think twice before choosing to go down a specific path.

The data access alternatives

When working with the database in the data layer we have in fact several approaches to choose from. The most basic approach is ADO.NET and the OR/Ms are by far the most complex solution. So why is it that the OR/Ms have gained so much momentum? It’s because they generate a lot of stuff for you. You are lazy. We all are. The thing is that what we gain in the beginning may be lost during the road. That’s why you need to know about the alternatives.

This blog post became a bit larger than I intended. I’m therefore moving the alternatives to seperate posts. They are however:

  • Vanilla ADO.NET
  • Data mappers
  • OR/M
  • NoSQL

Data layer patterns

There are also some patterns which can be used for the data layer (or involves the data layer). Those that I recommend are:

  • Repository pattern
  • Query object
  • CQS (Command/query seperation)
  • CQRS (Command/Query Segregation Principle)

Those will also be discussed in a separate post.

Published at DZone with permission of Jonas Gauffin, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)