Why spend the engineering cycles in the SQL client? For a real-world example: at Notion we have a SQLite database running in our client apps. Most of the rows come from the back-end API and serve as a local cache. There are some object properties the clients need to query, and others the clients don't care to query but still need stored locally to render the data in the UI. Over time those needs change - as does the shape of the upstream API data source.

So, we put un-queried object properties into catch-all JSON columns. When we introduce a new query pattern to the client, we can run a number of different migration strategies to extract existing data from the JSON column and put it into a newly created column, or use a virtual column/index. Having the catch-all JSON column also means we can add an object property in our backend for web clients, and then later, when we roll the feature out to native apps, the data is already there - it just needs a migration to make it efficient to query.

In the example given in the article, foreknowledge of the property does exist - there's a specific DDL statement to run before the column-y thing is usable. With the advent of schemaless databases I think there's been far too strong a reactionary movement against schema adjustment. With some trivial investment in migration-management infrastructure you can arrive at a point where changing the DB (even in drastic ways) is as easy as committing a new file to the code base.

At my company we do make use of Postgres JSON types for some limited fields related to logging and auditing, where live queries never need to be executed - the data is essentially serving as WOM (write-only memory). We've also got one instance where our DB is essentially a pass-through between two services that don't talk to each other directly; JSON works great for this usage, since those two services just coordinate on the expected format. In most other cases we'll just write quick migrations to expand our data definition as needed to fit whatever we need to put into it. Relational databases are both powerful and quite flexible - so long as you don't enshrine the schema to an extent where it feels immutable.

It's easier to evolve a schema to be strictly normalized when you have only a few (<1000) SQL databases running in an environment you control, especially if you also control the deployment of the code that talks to those databases. In that kind of setting you can manage any kind of migration easily, you know when a migration is fully rolled out, and you know when you can drop backwards compatibility. In Notion's SQLite use-case, we have many millions of instances of this schema running on hardware owned by our users, and those users ultimately control when our app updates & runs migrations on those devices. If I commit a new migration today, it'll start rolling out tomorrow, but it will take weeks to saturate the install base of our app, and we can never be 100% sure all existing clients have upgraded. Having JSON catch-all columns means we can evolve our API data types in parallel with that slow SQL schema rollout, instead of needing a strict order of "migrate first, then start writing" for any new bit of data we want to render & cache on the client.

I think that for Notion (neat company, btw) it might make a lot of sense to keep the schema pretty loose, but I have managed to keep an agile DB alive and well with a lot of flexibility.
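To make the catch-all-JSON-plus-virtual-column pattern concrete, here's a minimal SQLite sketch of the kind of migration described above. The `records` table and the `last_edited_at` property are hypothetical stand-ins, not Notion's actual schema:

    -- Day one: queried fields get real columns; everything else rides
    -- along in a catch-all JSON blob.
    CREATE TABLE records (
        id   TEXT PRIMARY KEY,
        type TEXT NOT NULL,   -- queried from the start
        data TEXT NOT NULL    -- catch-all JSON for un-queried properties
    );

    -- Later migration: the client now needs to query a property that has
    -- been accumulating inside `data` all along. SQLite (3.31+) only
    -- allows VIRTUAL generated columns via ALTER TABLE, which is fine
    -- here: the value is computed from `data` on the fly, and the index
    -- makes it cheap to query.
    ALTER TABLE records ADD COLUMN last_edited_at INTEGER
        GENERATED ALWAYS AS (json_extract(data, '$.last_edited_at')) VIRTUAL;

    CREATE INDEX records_by_last_edited_at ON records (last_edited_at);

The other strategy mentioned above - extracting into a real column - would instead add a plain column and backfill it with an UPDATE over json_extract, trading a one-time rewrite of existing rows for not recomputing the expression on reads.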
Adding bidirectional lenses to "normalize" and "denormalize" data before it enters and after it exits a SQL database means you need to maintain those lenses separately from the SQL database schema. You would have three things to keep track of: the upstream data source (the thing supplying the JSON, which might not be under your control), the lenses in your SQL client source code, and the SQL database schema. Using virtual columns like this encodes the "normalizing" lens into the SQL schema in a purely functional way. The client code can be a "dumb pipe" on the write side and just dump whatever data it receives into the SQL database. This seems like a pure win to me for clients that need to cache data but never update it. In essence, you are getting CQRS (Command Query Responsibility Segregation) for "free", built into the storage system.
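A sketch of that read/write split, continuing the hypothetical `records` table from the earlier snippet (the parameter names are placeholders):

    -- Write side, a dumb pipe: upsert whatever JSON the API handed us.
    -- json() validates (and minifies) the payload on the way in.
    INSERT INTO records (id, type, data)
    VALUES (:id, :type, json(:payload))
    ON CONFLICT (id) DO UPDATE SET
        type = excluded.type,
        data = excluded.data;

    -- Read side: query through the generated column, so the "normalizing"
    -- lens lives in the schema rather than in client code.
    SELECT id, type
      FROM records
     WHERE last_edited_at > :since
     ORDER BY last_edited_at DESC;

The write path never mentions individual properties, so new fields flow into the cache with no client changes; only the read path has to know which properties have been promoted to queryable columns.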