The Rat-hole of Object-oriented Mapping

Mark Seemann recently had a great post that, as most of his posts seem to do, challenges the average reader to re-think their reality a bit. The post is titled "Is Layering Worth the Mapping". In the post Mark essentially details some of the grief and friction that developers face when they need to decouple two or more concepts by way of an abstraction.

Mark takes the example of layering. Layering is a one-way coupling between two "layers" where the "higher-level" layer takes a dependency on abstractions in a "lower-level" layer. Part of his example is a UI layer that communicates with a domain layer about musical track information. The track information being communicated lives in a hand-crafted Track abstraction. Typically this abstraction would live with the lower-level layer to maintain the unidirectional coupling. Of course the UI layer needs a Track concretion for it to do its job and must map between the higher-level layer and the lower-level layer. To further complicate things, other decoupling may occur within each layer to manage dependencies. The UI may implement an MVx pattern, in which case there may be a specific "view-model" track abstraction; the data layer may employ object-relational mapping; and so on.
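To make the shape of that concrete, here is a minimal sketch of the one-way coupling in Python. It is not Mark's code; the field names (title, artist), TrackService, UiTrack and to_domain are all hypothetical names I've made up for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


# --- lower-level (domain) layer: owns the abstraction ---
class Track(Protocol):
    """Hypothetical Track abstraction; the fields are illustrative only."""
    title: str
    artist: str


class TrackService:
    """Stand-in for a domain-layer entry point (an assumption, not a real API)."""

    def __init__(self):
        self.saved = []

    def save(self, track: Track) -> None:
        # The domain layer only ever sees the abstraction it owns.
        self.saved.append((track.title, track.artist))


# --- higher-level (UI) layer: depends on the domain layer, never the reverse ---
@dataclass
class TrackViewModel:
    """The shape the view binds to."""
    display_title: str
    display_artist: str


@dataclass
class UiTrack:
    """The UI layer's concretion of the domain layer's Track abstraction."""
    title: str
    artist: str


def to_domain(vm: TrackViewModel) -> UiTrack:
    # Hand-written mapping from the view-model shape to the domain shape.
    return UiTrack(title=vm.display_title.strip(), artist=vm.display_artist.strip())


service = TrackService()
service.save(to_domain(TrackViewModel(" So What ", "Miles Davis")))
```

Even in this toy there are already three "track" types for two strings, and that is before any data layer shows up.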

Mark goes on to describe some "solutions" that often fall out of scenarios like this in a need to help manage the sheer magnitude of the classes involved: shared DTOs as cross-cutting entities, POCO classes, classes with only automatic properties, etc.

It's not just layering. Layering lives in this grey area between in-memory modules and out-of-process "tiers". Layering, I think, is an attempt to get the benefits of out-of-process decoupling without the infrastructure concerns of connecting and communicating between separate processes. Of course, over and above the module/assembly separation, the only thing enforcing this decoupling in layers is skill and discipline.

I'm convinced layering is often, or often becomes, a "speculative generality" to give some "future proofing" to the application, since layering so closely resembles "tiering" (not to be confused with the eventual response of "tearing") that it should be easy to make the application tiered should there ever be a need for it. To be clear, this is the wrong impetus to design a software solution. You're effectively setting yourself up to fail by essentially "making up" requirements that are more than likely going to be wrong. If the requirements for the design are based on fallacies, the design built on them is wrong too. But you have to continue to maintain this design until you re-write it (ever noticed that anti-pattern?).

But implementing tiers, or any sort of communication between processes, often ends up in the same state. You have internal "domain" entities within the processes (and even within logical boundaries within those processes) that end up spawning the need for "DTO" objects that live at the seams on one or both sides of the communication. Further, many times that communication is facilitated by frameworks like WCF that create their own DTOs (SOAP envelopes, for example). Except now you're mandated by the physical boundaries of processes, and you're forced to do things like shared-type assemblies to model the cross-cutting "entities" (if you choose that cross-cutting "optimization"), introducing a whole new level of effort and a massive surface area for attracting human error (you've technically introduced the need for versioning, potentially serialization, deployment issues, etc.).

Creating an object-oriented type to simply act as a one-way container to something that lives on the other side of some logical or physical boundary has appeared to me to be a smell for quite a while. For example, the UI layer in Mark's original example has this concept of a "Track" DTO-like type that, when used, is only used in one direction at a time. When moving from the UI to the domain layer it's only written to. If it gets a "track" back from the domain layer, the UI layer only reads from it. Abstracting this into an OO class seems pointless and, as Mark says, not particularly beneficial.

Let's look specifically at the in-memory representation of something like a "Track". We'll limit ourselves and say that we need four Track abstractions: one for the view model, one for the domain layer abstraction, one for the data layer abstraction, and one for the object-relational mapping. (I've assumed that the data layer may not have a track "entity" and is only responsible for pushing data around.) So, in effect we have four Track DTO classes in our system (and two or three Track "entities"). But if we look at the in-memory representation of instances of these objects, they're effectively identical; each one can't really have more data than another, otherwise there's something wrong. If we look at what's actually happening with this data, we're really writing a lot of code to copy memory around in a really inefficient way. The DTO classes in part become the way to copy memory. To be fair, this is a side-effect of the fact we're manually mapping from one abstraction to another or from one abstraction to an entity (or vice-versa).
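A rough sketch of what those four representations tend to look like, again in Python and again with made-up names (DomainTrack, DataTrack, TrackRecord, the three fields, and the mapping functions are all assumptions for illustration, not anything from Mark's post):

```python
from dataclasses import dataclass, asdict


@dataclass
class TrackViewModel:   # what the view binds to
    title: str
    artist: str
    seconds: int


@dataclass
class DomainTrack:      # domain layer abstraction
    title: str
    artist: str
    seconds: int


@dataclass
class DataTrack:        # data layer abstraction
    title: str
    artist: str
    seconds: int


@dataclass
class TrackRecord:      # what the object-relational mapper persists
    title: str
    artist: str
    seconds: int


# Each mapper is nothing but a field-by-field copy.
def vm_to_domain(vm: TrackViewModel) -> DomainTrack:
    return DomainTrack(vm.title, vm.artist, vm.seconds)


def domain_to_data(track: DomainTrack) -> DataTrack:
    return DataTrack(track.title, track.artist, track.seconds)


def data_to_record(track: DataTrack) -> TrackRecord:
    return TrackRecord(track.title, track.artist, track.seconds)


vm = TrackViewModel("So What", "Miles Davis", 562)
record = data_to_record(domain_to_data(vm_to_domain(vm)))

# Compare the plain data held by the first and last hop.
same_data = asdict(vm) == asdict(record)
```

Four types, three mappers, and same_data ends up True: the data that comes out the far end is exactly the data that went in. All that code did was copy it.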

This type of thing isn't entirely unknown; it sometimes goes by the name of ceremony.

For the most part, I think computer languages are hindering us in our ability to address this. Languages in general tend to maintain this specific way of messaging called method-calling that limits us to communicating only information that can be encapsulated by the language's or platform's type-system. But, to a certain extent, we're also hindered by our myopia of "everything must be a type in language X". Maybe this is another manifestation of Maslow's Hammer.

Imagine if you removed all the mapping code in a complex system (especially a distributed system) and were left with just the "domain" code. I've done this with one system and I was astounded that over 75% of the code in the system had nothing to do with the system's "domain" (the "value-add") and was "ceremony" to facilitate data mapping.

I sometimes hear this isn't so much of a problem with specific frameworks. I'm often told that these frameworks do all the heavy lifting like this for us. But they really don't. The frameworks really just introduce another seam. The issue of Impedance Mismatch isn't just related to object-relational mapping. It has to do with any mapping where both sides aren't constrained by the same rules. I've blogged about this before. Sure, I can use some "data framework" to generate "entities" based on a data model or even based on POCOs. Some view this as solving the problem, but it doesn't. Each side operates under different rules. The generated classes can only match the impedance of what they have to communicate with, and you have to plan for that being different from the impedance of whatever you'll end up mapping from or to. The only real solution to this is to introduce another DTO to map between your domain and the classes generated by the framework, so you are decoupled from the eventual "gotchas" where your domain has different expectations or rules than the framework you're communicating with. When people don't do this, you see all sorts of complaints like "date/time in X isn't the same as what I need", etc.
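The date/time complaint is an easy one to sketch. Assume (and this is purely an assumption for illustration) that some framework generates an entity whose timestamp is a naive value that the storage side treats as UTC, while the domain insists on timezone-aware values. GeneratedTrackEntity, DomainTrack, from_entity and to_entity are hypothetical names, not any real framework's output:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class GeneratedTrackEntity:
    """Stand-in for a framework-generated class (hypothetical)."""
    title: str
    added_on: datetime  # naive; assumed to mean UTC under the storage side's rules


@dataclass(frozen=True)
class DomainTrack:
    """The domain's own type, living under the domain's rules."""
    title: str
    added_on: datetime  # domain rule: always timezone-aware


def from_entity(entity: GeneratedTrackEntity) -> DomainTrack:
    # The seam: reconcile the two sets of rules in exactly one place.
    return DomainTrack(entity.title, entity.added_on.replace(tzinfo=timezone.utc))


def to_entity(track: DomainTrack) -> GeneratedTrackEntity:
    # Normalise to UTC, then drop the offset the storage side cannot hold.
    utc = track.added_on.astimezone(timezone.utc)
    return GeneratedTrackEntity(track.title, utc.replace(tzinfo=None))


entity = GeneratedTrackEntity("So What", datetime(2013, 1, 2, 3, 4, 5))
domain = from_entity(entity)
round_tripped = to_entity(domain)
```

Note what this buys and what it costs: the domain is shielded from the framework's date/time rules, but only by adding yet another type and yet another pair of mappers.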

Don't fall into this rut. Think about what you're doing; if you've got four DTOs to hold the same data, maybe there's a better way of doing it. Try to come up with something better and blog about it, or at least talk about the problem out in the open like Mark.