A History of Settings File Format, and dotset

The landscape for configuration formats is quite poor.

There is a lot of choice, but each pick has substantial flaws. There is no fairer portrait than that which Wikipedia paints:

Let me go through them one by one.

            | comments | maps in lists | simplicity |
.properties |    x     |               |     x      |
INI         |    x     |               |     x      |
Plist       |          |       x       |     x      |
XML         |    x     |       x       |            |
JSON        |          |       x       |     x      |
TOML        |    x     |               |     x      |
DotSet      |    x     |       x       |     x      |

Comments, maps in lists, simplicity: pick two.

The INI file format is a simple way to store some information. Did I mention it is simple? It is very simple. You have a dictionary map at the top level, and in there, another map. Each keys of that second map have string values.

Are the keys case-sensitive? Maybe. Does it have comments? Maybe, either hashed # or Lisp-like ;. What is the delimiter between keys and values? I don’t know, it’s usually =, but sometimes :. See the problem? It is non-standard. Even if it was, you can’t do much with it. Limited data structures are limited.

DotProperties are even simpler than INI files, if you can believe it. (I cannot.) You are left with a single map from string keys to string values, and you have comments. On the plus side, it is pretty standard…

Plists used to be a beautiful, lightweight format: simple, easy to read, it has all the data structures of JSON, or close enough. It even has easy binary data as hexadecimal sequences, and dates! Truly, if Apple had worked to make it ubiquitous, JSON would never have been needed. It has all of its power, and all of its irritating things.

Yet, plist would have lost against INI. Why? Comments. Comments actually serve as documentation. A textual file is not just great because you can use your preferred text editor, it is great because you can read it. Comments are not just for programming languages: they have their place inside data, too. Plist simply forgot to add comments.

Anyway, Apple killed the Plist format, replacing it with XML. The data structures are the same, but now the parsers need to be extra smart. The addition of boilerplate made the files so big that Apple even added an equivalent binary format. Too bad, the XML form is here to stay.

On to XML, then. Why is it bad?

So many people have eloquently discussed it. It boils down to this:

  1. Starting on an XML-based project is usually a nightmare,
  2. All by itself, XML is actually more limited than our old friend plist. It’s just text, you have to build your own format on top of it, which in turn adds complexity and cruft and makes #1 worse!
  3. Namespaces are unintuitive, hard to use, and make #1 and #2 even worse…

To support its complexity, XML has developed satellite sublanguages of its own to configure it: XPATH, SAX, XSLT, XSD, DTD… Can XML data crash a browser or worse? Of course it can. Is it Turing-complete? It can be…

In the end, it all boils down to one thing: it is too complex. It is hard to read, hard to parse, and because of draconian error handling, you’d better not have made a mistake. But remember, spotting a mistake is hard, better use a validator. Or an external GUI that will make the source ugly. A wise man once said: “it combines the space-efficiency of text-based formats, and the readability of binary.”

In a world dominated by XML, JSON was a gasp of fresh air. Its syntax is minimal. It has all the data structures you look for. Writing a parser is a matter of a couple of hours; not that you need to. Reading it is easy too. Just like asm.js, it fitted perfectly as a natural extension to the Web platform, because it was already there.

JSON gains ground. It has both a lot of acceptance and good press.

Yet, everything has bad parts, and JSON is no exception:

Aware of those small annoyances, I had privately started to work on my own replacement.

Then, one drunk night, a week ago, Tom Preston-Werner, fed up as I was about the status quo, decided to take no more, and created TOML.

TOML is meant to be solidly standardized, unlike INI. However, it is meant to be just as limited. It does have JSON arrays, and it thought to authorize trailing commas (which I like to believe I had something to do with).

I wrote about TOML’s shortcomings there, let me rewrite those here:

In short, TOML isn’t as good as the OpenStep plist format, but it has its niceties. And no, it didn’t invent first-class Dates.

It felt unfortunate to see a new format fail to be greater than its historic predecessors. They say history is repeated as farce first, then as tragedy; the plist switch to XML was the farce, switching from JSON to TOML would be tragedy.

As a response, I published DotSet, the Settings File Format I pioneered.

The data structures are so close to JSON that they are interchangeable. I had fun with that. Really.

However, remember JSON’s shortcomings that I mentioned above? I erased them. I ended up with a language as pleasant to read as Python, as easy to write as plain text, and very simple. Simpler than TOML, with more power. I made a node parser / stringifier for it. It was easy. And fun.

And now, the crystal ball.

It seems to me like there are two future directions for configuration:

Funny thought: the Lua programming language was initially meant to be a configuration file format. Ha ha ha.

Ok, not that funny.

All this to end there: TOML doesn’t fit in the picture. It will one day, but then, it will have the complexity of YAML and awkwardly combined features. You will need whiskey to ignore that.