Category Archives: Uncategorized

Trivial SQL ORMs considered harmful

Our team has a little “things I learned this week” tradition in our team meetings, and it just blossomed onto our mailing list (async is better!).

In one such post, Michael pointed to sqldelight, a library to automatically generate Android SQL-handling code for a typed schema and a set of queries.

I wrote a little screed advising caution, which Margaret suggested would make a good blog post… so here it is, unedited.

Note that I have nothing against automated schema and query checking, nor against saving error-prone typing; my primary objection here is to the object mapping.

Michael notes:

It’s a square library that allows you to define your tables & queries in a separate text file and it will auto-generate table creation and methods of querying. To do so, it creates Objects which represent the row of your DB.

and I reply:

At the risk of being a negative nelly: broadly speaking I find this kind of trivial ORM to be a terrible design anti-pattern, and I strongly discourage its use for anything but saving some typing before committing a v0. We implemented something like this on the iOS side of the house, and it was a huge pain in the ass to get rid of later.

If your system is simple enough that you’re putting whole objects in and getting whole objects out — that is, a simple ORM is a good fit — you should instead be not using SQLite.  Serialize your objects to a flat file in JSON and keep them in memory. Up to about 100KB of data, it’s better in almost every way. (There are some exceptions, but they’re exceptions.)

For everyone else, your inputs and outputs will differ, or you’ll need more control, and so you should run screaming from sqldelight.

There are at least five reasons why I feel this way. I’ll stop at five to avoid writing an epic.

  1. Database tables really come into their own when you join them: bookmarks against favicons, hockey players against teams and games. If you join them (particularly with left/outer/etc. joins), your ORM needs to bulk up the generated model objects with optional fields; it has to, otherwise it can’t represent the result of the join.

    Those optional fields leak throughout your app — hey, is that favicon ID supposed to be set here? Does it need to be set to -1 sometimes? — and make your life unpleasant.

  2. SELECT * is an anti-pattern in database work. You might not need all of the fields, but requesting them all limits the indices that the storage layer can use. A smart storage engine can use compound indices to make some queries with limited projections very fast indeed. Or perhaps you want to get unique values.

    To take sqldelight’s example, you should not SELECT * FROM hockey_player; if you need that, slurp a JSON file instead! When populating a list view, you probably want SELECT name, id FROM hockey_player ORDER BY position. For a name picker you want SELECT DISTINCT name FROM hockey_player UNION hockey_officials. And so on.

  3. Migrations are a reality when dealing with data storage. sqldelight doesn’t seem to address this at all.

  4. Syncability (and backup, and export, and…) are also a reality. A sync system typically has a very different viewpoint on data storage than the frontend — not only does that mean you have a set of fields that only part of the application cares about (which screws up your ORM), it also often means that two parts of the system have utterly different conceptions of seemingly straightforward actions like “delete this thing”. ORMs are (almost by definition) one size fits none.

  5. Getting SQL-based storage — hell, getting any kind of storage — right is hard. Concurrency, performance, memory usage, and correctness all involve careful attention. Take a read of the Sqlite.jsm docs or some of Firefox for iOS’s database prep code if you want a hint of this. Libraries that generate data access code can slip past this attention, and that’s a bad thing.

Idle musing

An autonomous car has been built, and has driven 140,000 miles without accident.

Apparently mass production is intended within ten years.

To me — at least, to me wearing my software engineering hat — I see parallels with projects that use unrealistic test data, then blow up catastrophically in the real world.

We developers often use inadequate test harnesses, data sets that are too small, and clean test inputs. We get a false sense of confidence in our code, and this is only deflated by thorough exercise during betas. At a higher level, software companies seem to have a tendency to target our own insular demographics: we build startups that target 25–35 year olds living in the Bay Area, for example, and wonder why they fail to get broad appeal.

The equivalent for the autonomous car?

167545_186408644703713_1000000

I doubt very much that Google’s autonomous car has been tested in a Northwest blizzard, dragging its thin all-weather tires through 8 inches of snow. In the Bay Area, that’s not a problem: you might have to scrape a windshield free of ice one day each year. Outside the Bay Area, one has to cope with snowplows, whiteouts, drifts, spray, cars sliding across the road in front of you, and frequent loss of traction… then sucking mud, laddered gravel roads, and dust storms (with the consequent problems with both visibility and sensor clogging).

The parking camera on my truck, which is at chest height on a tall man, usually ends up completely iced over and useless by the end of a journey. The front of my truck looked like the inside of an old freezer after today’s 3-hour drive in a blizzard. How will the sensor array on a small autonomous car fare?

Color me skeptical.