Trivial SQL ORMs considered harmful

Our team has a little “things I learned this week” tradition in our team meetings, and it just blossomed onto our mailing list (async is better!).

In one such post, Michael pointed to sqldelight, a library to automatically generate Android SQL-handling code for a typed schema and a set of queries.

I wrote a little screed advising caution, which Margaret suggested would make a good blog post… so here it is, unedited.

Note that I have nothing against automated schema and query checking, nor against saving error-prone typing; my primary objection here is to the object mapping.

Michael notes:

It’s a square library that allows you to define your tables & queries in a separate text file and it will auto-generate table creation and methods of querying. To do so, it creates Objects which represent the row of your DB.

and I reply:

At the risk of being a negative nelly: broadly speaking I find this kind of trivial ORM to be a terrible design anti-pattern, and I strongly discourage its use for anything but saving some typing before committing a v0. We implemented something like this on the iOS side of the house, and it was a huge pain in the ass to get rid of later.

If your system is simple enough that you’re putting whole objects in and getting whole objects out — that is, a simple ORM is a good fit — you should instead be not using SQLite.  Serialize your objects to a flat file in JSON and keep them in memory. Up to about 100KB of data, it’s better in almost every way. (There are some exceptions, but they’re exceptions.)

For everyone else, your inputs and outputs will differ, or you’ll need more control, and so you should run screaming from sqldelight.

There are at least five reasons why I feel this way. I’ll stop at five to avoid writing an epic.

  1. Database tables really come into their own when you join them: bookmarks against favicons, hockey players against teams and games. If you join them (particularly with left/outer/etc. joins), your ORM needs to bulk up the generated model objects with optional fields; it has to, otherwise it can’t represent the result of the join.

    Those optional fields leak throughout your app — hey, is that favicon ID supposed to be set here? Does it need to be set to -1 sometimes? — and make your life unpleasant.

  2. SELECT * is an anti-pattern in database work. You might not need all of the fields, but requesting them all limits the indices that the storage layer can use. A smart storage engine can use compound indices to make some queries with limited projections very fast indeed. Or perhaps you want to get unique values.

    To take sqldelight’s example, you should not SELECT * FROM hockey_player; if you need that, slurp a JSON file instead! When populating a list view, you probably want SELECT name, id FROM hockey_player ORDER BY position. For a name picker you want SELECT DISTINCT name FROM hockey_player UNION hockey_officials. And so on.

  3. Migrations are a reality when dealing with data storage. sqldelight doesn’t seem to address this at all.

  4. Syncability (and backup, and export, and…) are also a reality. A sync system typically has a very different viewpoint on data storage than the frontend — not only does that mean you have a set of fields that only part of the application cares about (which screws up your ORM), it also often means that two parts of the system have utterly different conceptions of seemingly straightforward actions like “delete this thing”. ORMs are (almost by definition) one size fits none.

  5. Getting SQL-based storage — hell, getting any kind of storage — right is hard. Concurrency, performance, memory usage, and correctness all involve careful attention. Take a read of the Sqlite.jsm docs or some of Firefox for iOS’s database prep code if you want a hint of this. Libraries that generate data access code can slip past this attention, and that’s a bad thing.