February 18, 2011 - Justin's Linklog

It’s pretty common for apps to require “configuration” — external files which can contain settings to customise their behaviour. Ideally, apps shouldn’t require configuration, and this is always a good aim. But in some situations, it’s unavoidable.

In the abstract, it may seem attractive to express configuration in a fully-fledged programming language. However, I think this is not a good idea. Here are some reasons why configuration files should not be expressed in a programming language (and yes, I include “Ruby without parentheses” in that bucket):

Provability

If a configuration language is Turing-incomplete, configuration files written in it can be validated “offline”, i.e. without executing the program it configures. General-purpose programming languages are Turing-complete, which means that a configuration file written in one cannot, in general, be fully validated without running it: the only reliable way to find out what settings it produces, or whether it terminates at all, is to execute it.

Offline validation is a useful feature for operational usability, as we’ve found with “spamassassin --lint”.
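To make the idea concrete, here is a minimal sketch of an offline validator for a declarative, line-oriented format. It is not how SpamAssassin's lint mode is implemented; the directive names (`threshold`, `contact`, `max_workers`) and the `lint` function are invented for illustration. The point is that every line can be checked against a fixed schema without running anything the file describes.

```python
# Hypothetical schema: directive name -> function that converts/validates its value.
# These directive names are made up for this sketch.
SCHEMA = {
    "threshold": float,
    "contact": str,
    "max_workers": int,
}


def lint(text):
    """Return a list of (line_number, message) problems; an empty list means valid."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        name = parts[0]
        if name not in SCHEMA:
            problems.append((lineno, f"unknown directive {name!r}"))
            continue
        if len(parts) < 2:
            problems.append((lineno, f"{name} needs a value"))
            continue
        try:
            SCHEMA[name](parts[1])  # conversion doubles as type validation
        except ValueError:
            problems.append((lineno, f"bad value for {name}: {parts[1]!r}"))
    return problems
```

Because the format has no loops, functions or variables, the validator always terminates and its verdict covers the whole file, which is exactly what an evaluated script cannot promise.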

Security

Some configuration settings may be insecure in certain circumstances; for example, in SpamAssassin, we allow certain classes of settings, like whitelists and blacklists, to be set in a user’s ~/.spamassassin/user_prefs file, while disallowing rule definitions (which can cause poor performance if badly written).

If your configuration file is simply an evaluated chunk of code, it becomes more difficult to protect against an attacker introspecting the interpreter and overriding the security limitations. It’s not impossible, since you can, for instance, use a sandboxed interpreter, but this is typically not particularly easy to implement.
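With a declarative format, that kind of restriction is a simple lookup in the parser. The following is a rough sketch of the idea, not SpamAssassin's actual code; the directive names (`whitelist`, `blacklist`, `rule`) and the `load` function are stand-ins I've made up for the example.

```python
# Hypothetical privilege table for this sketch.
USER_ALLOWED = {"whitelist", "blacklist"}  # safe in a per-user file
ADMIN_ONLY = {"rule"}                      # only honoured in site-wide config


def load(text, privileged):
    """Parse 'name value' lines.

    Returns (settings, rejected), where rejected lists the (line_number, name)
    of every directive that this file was not allowed to set.
    """
    settings = {}
    rejected = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(" ")
        if name in USER_ALLOWED or (privileged and name in ADMIN_ONLY):
            settings.setdefault(name, []).append(value.strip())
        else:
            rejected.append((lineno, name))
    return settings, rejected
```

The file's author has no way to reach past the parser: there is no interpreter to introspect, only a list of names that are either accepted or not.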

Usability

Here’s a rather hairy configuration file I’ve concocted.

    #! /usr/bin/somelanguage
    !$ app.status load html
    !c = []
    ;c['sources'] = < >
    ;c['sources'].append(
        NewConfigurationThingy("foo_bar",
            baz="flargle"))
    ;c['builders'] = < >
    ;c['bots'] = < >
    !$ app.steps load source, shell
    ;bf_mc_generic = factory.SomethingFactory( <
        woo(source.SVN, svnurl="http://example.com/foo/bar"),
        woo(shell.Configure, command="/bar/baz start"),
        woo(shell.Test, command="/bar/baz test"),
        woo(shell.Configure, command="/bar/baz stop")
        > );
    ;b1 = < "name": "mc-fast", "slavename": "mc-fast",
                 "builddir": "mc-fast", "factory": ;bf_mc_generic >
    ;c['builders'].append(;b1)
    ;SomethingOrOther = ;c

This isn’t entirely concocted from thin air; it’s actually bits of our BuildBot configuration file, from before we switched to using Hudson. I’ve replaced the familiar Python syntax with deliberately-unfamiliar made-up syntax, to emulate the user experience I had attempting to configure BuildBot with no pre-existing Python knowledge. ;)

Compare with this re-stating of the same configuration data in a simplified, “configuration-oriented” imaginary DSL:

    add_source NewConfigurationThingy foo_bar baz=flargle

    buildfactory bf_mc_generic source.SVN http://example.com/foo/bar
    buildfactory bf_mc_generic shell.Configure /bar/baz start
    buildfactory bf_mc_generic shell.Test /bar/baz test
    buildfactory bf_mc_generic shell.Configure /bar/baz stop

    add_builder name=mc-fast slavename=mc-fast
         builddir=mc-fast factory=bf_mc_generic

Essentially, I’ve extracted the useful configuration data from the hairy example, discarded the symbology used to indicate types, function calls, and data structure construction, and let the configuration domain knowledge imply what’s necessary. Not only is this easier to comprehend for the casual reader, it also reduces the risk of syntax errors, by simply minimising the number of syntactical components.
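For what it's worth, a loader for a DSL like this doesn't need to be big. Here is a sketch in Python, assuming some rules the example only implies: an indented line continues the previous one, `add_source` takes a kind, a name and `key=value` options, and everything after the step name in a `buildfactory` line is that step's argument. The `parse` function and the shape of the resulting dictionary are my own invention.

```python
def parse(text):
    """Parse the imaginary DSL into plain dictionaries and lists."""
    # Join indented continuation lines onto the preceding logical line.
    logical = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw[0].isspace() and logical:
            logical[-1] += " " + raw.strip()
        else:
            logical.append(raw.strip())

    config = {"sources": [], "factories": {}, "builders": []}
    for line in logical:
        directive, *args = line.split()
        if directive == "add_source":
            kind, name, *opts = args
            source = {"kind": kind, "name": name}
            source.update(o.split("=", 1) for o in opts)
            config["sources"].append(source)
        elif directive == "buildfactory":
            name, step, *rest = args
            config["factories"].setdefault(name, []).append((step, " ".join(rest)))
        elif directive == "add_builder":
            config["builders"].append(dict(a.split("=", 1) for a in args))
        else:
            raise ValueError(f"unknown directive: {directive}")

    # Domain knowledge lives here: every builder must name a known factory.
    for builder in config["builders"]:
        if builder.get("factory") not in config["factories"]:
            raise ValueError(f"builder refers to unknown factory: {builder}")
    return config
```

All the types, constructors and data-structure plumbing that the user had to spell out in the hairy version now sit in the loader, written once by someone who knows the domain.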

See Also

The Wikipedia page on DSLs is quite good on the topic, with a succinct list of pros and cons.

This StackOverflow thread has some good comments — I particularly like this point:

When you need your application to be very “configurable” in ways that you cannot imagine today, then what you really need is a plugins system. You need to develop your application in a way that someone else can code a new plugin and hook it into your application in the future.

+1.

This seems to be a controversial topic — as you can see, that page has people on both sides of the issue. Maybe it fundamentally comes down to a matter of taste. Anyway — my $.02.

Update: discussions elsewhere: HackerNews

Another Update, 2012-04-06: Robey Pointer wrote a post called Why Config?, in which he describes a Scala-based configuration language in use at Twitter, which uses Scala’s runtime code evaluation, and a Scala trait, to express configuration succinctly in a Scala source file and load it at runtime. The downside? It’s a Scala source file, executed at runtime, containing configuration. :(

However, this comment in the comments section is worth a read:

At Netli (now part of Akamai) we had a configuration framework very similar in spirit and appearance to Configgy. It was in early 2000-s, we open sourced it since. (http://ncnf.sourceforge.net/). It would provide on-the-fly reload for the C-based programs (the ncnf is a C library). It also had some perks like attribute inheritance and a concept of block references. Most importantly though, it contained a separate schema language and a validator to allow configuration be checked before pushing in production. At Netli we used it to configure 1200 services on over 400 hardware boxes, the configuration becoming about 20+mb in length (assembled from several pieces by the CPP, then M4 templating library).

Naturally, it wasn’t Netli’s first attempt at doing configuration. One of the first attempts failed since it was Turing-complete. That approach was to specify the configuration as a Perl data specification. In a very short time the lure of unused expressiveness of such Turing-complete environment prevailed and people started to write for-loops around data pieces and doing other tricks to remove redundancy from the configuration. It turned out to be a disaster in the end, with configuration becoming unmaintainable and flaky.

One principle I got out of that exercise is that configuration shall not be Turing-complete. We’ve got burned specifically by that property far too many times. Yet I do agree with you that a validation facility is a must-have, which is something not usually part of the simple text-based frameworks. C-based NCNF had it almost from the very beginning though, and it proved to be a very useful harness.

+1. There’s lots more info on that system at this post at lionet.livejournal.com.

Another Update, 2017-05-09: casio_juarez on Twitter:

Dev: I'll use a declarative language for config this time.
6 months later: Let's add variables.
12 mos: And conditionals.
18 mos: Fuck.
— 0x0DEADA55 (@casio_juarez) May 8, 2017

Also related: The Configuration Complexity Clock.

(Image credit: Turn The Dial by VERY URGENT Photography)


Against The Use Of Programming Languages in Configuration Files