Against The Use Of Programming Languages in Configuration Files

It’s pretty common for apps to require “configuration” — external files which can contain settings to customise their behaviour. Ideally, apps shouldn’t require configuration, and this is always a good aim. But in some situations, it’s unavoidable.

In the abstract, it may seem attractive to use a fully-fledged programming language as the language to express configuration in. However, I think this is not a good idea. Here are some reasons why configuration files should not be expressed in a programming language (and yes, I include “Ruby without parentheses” in that bucket):

Provability

If a configuration language is Turing-incomplete, configuration files written in it can be validated “offline”, i.e. without executing the program it configures. General-purpose programming languages are Turing-complete, meaning that, in general, the configuration must be executed in full before it can be considered valid.

Offline validation is a useful feature for operational usability, as we’ve found with “spamassassin --lint”.
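
As a rough illustration of what offline validation can look like for a declarative format, here is a minimal sketch in Python. It assumes a hypothetical `key value` line format and made-up setting names (`threshold`, `max_children`, `log_file`); it is not how SpamAssassin's lint mode is implemented.

```python
# Hypothetical schema: setting name -> callable that converts/validates a value.
SCHEMA = {
    "threshold": float,
    "max_children": int,
    "log_file": str,
}

def lint(text):
    """Return (line_number, message) problems without running any app code."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, _, value = line.partition(" ")
        value = value.strip()
        if key not in SCHEMA:
            problems.append((lineno, f"unknown setting {key!r}"))
        elif not value:
            problems.append((lineno, f"missing value for {key!r}"))
        else:
            try:
                SCHEMA[key](value)
            except ValueError:
                problems.append((lineno, f"bad value {value!r} for {key!r}"))
    return problems

SAMPLE = """\
# sample configuration
threshold 5.0
max_children many
colour blue
"""

if __name__ == "__main__":
    for lineno, message in lint(SAMPLE):
        print(f"line {lineno}: {message}")
```

Because every line is plain data, the check is a single pass that always terminates, and it can run on a machine where the application itself is not installed.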

Security

Some configuration settings may be insecure in certain circumstances; for example, in SpamAssassin, we allow certain classes of settings like whitelist/blacklists to be set in a user’s ~/.spamassassin/user_prefs file, while disallowing rule definitions (which can cause poor performance if poorly written).

If your configuration file is simply an evaluated chunk of code, it becomes more difficult to protect against an attacker introspecting the interpreter and overriding the security limitations. It’s not impossible, since you can, for instance, use a sandboxed interpreter, but this is typically not particularly easy to implement.
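
With a declarative format, by contrast, restricting what an unprivileged file may set can be a simple lookup. The following Python sketch assumes a hypothetical `key value` format; `USER_ALLOWED`, `ADMIN_ONLY`, `load_settings` and the `rule` directive are illustrative names, not SpamAssassin's actual code.

```python
# Settings a per-user file may set (illustrative names).
USER_ALLOWED = {"whitelist_from", "blacklist_from"}
# Settings reserved for the administrator's file, e.g. rule definitions.
ADMIN_ONLY = {"rule"}

def load_settings(text, privileged):
    """Parse 'key value' lines, rejecting settings this file may not use."""
    accepted, rejected = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        allowed = key in USER_ALLOWED or (privileged and key in ADMIN_ONLY)
        (accepted if allowed else rejected).append((key, value.strip()))
    return accepted, rejected

USER_PREFS = """\
whitelist_from friend@example.com
rule SLOW_RULE /(a+)+$/
"""

if __name__ == "__main__":
    accepted, rejected = load_settings(USER_PREFS, privileged=False)
    print("accepted:", accepted)
    print("rejected:", rejected)
```

Nothing in the file is ever evaluated, so there is no interpreter for an attacker to introspect; the only question is whether a given key is on the list.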

Usability

Here’s a rather hairy configuration file I’ve concocted.

    #! /usr/bin/somelanguage
    !$ app.status load html
    !c = []
    ;c['sources'] = < >
    ;c['sources'].append(
        NewConfigurationThingy("foo_bar",
            baz="flargle"))
    ;c['builders'] = < >
    ;c['bots'] = < >
    !$ app.steps load source, shell
    ;bf_mc_generic = factory.SomethingFactory( <
        woo(source.SVN, svnurl="http://example.com/foo/bar"),
        woo(shell.Configure, command="/bar/baz start"),
        woo(shell.Test, command="/bar/baz test"),
        woo(shell.Configure, command="/bar/baz stop")
        > );
    ;b1 = < "name": "mc-fast", "slavename": "mc-fast",
                 "builddir": "mc-fast", "factory": ;bf_mc_generic >
    ;c['builders'].append(;b1)
    ;SomethingOrOther = ;c

This isn’t entirely concocted from thin air: it’s bits of our BuildBot configuration file, from before we switched to using Hudson. I’ve replaced the familiar Python syntax with deliberately unfamiliar made-up syntax, to emulate the user experience I had attempting to configure BuildBot with no pre-existing Python knowledge. ;)

Compare with this re-stating of the same configuration data in a simplified, “configuration-oriented” imaginary DSL:

    add_source NewConfigurationThingy foo_bar baz=flargle

    buildfactory bf_mc_generic source.SVN http://example.com/foo/bar
    buildfactory bf_mc_generic shell.Configure /bar/baz start
    buildfactory bf_mc_generic shell.Test /bar/baz test
    buildfactory bf_mc_generic shell.Configure /bar/baz stop

    add_builder name=mc-fast slavename=mc-fast
         builddir=mc-fast factory=bf_mc_generic

Essentially, I’ve extracted the useful configuration data from the hairy example, discarded the symbology used to indicate types, function calls, data structure construction, and let the configuration domain knowledge imply what’s necessary. Not only is this easier to comprehend for the casual reader, it also reduces the risk of syntax errors, by simply minimising the number of syntactical components.
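
A format like this is also cheap to parse. As a sketch only, here is a small Python reader for the imaginary DSL; the resulting dictionary layout and the rule that an indented line continues the previous directive are assumptions made for the example.

```python
import shlex

def logical_lines(text):
    """Join indented continuation lines onto the preceding directive."""
    lines = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw[0].isspace() and lines:
            lines[-1] += " " + raw.strip()
        else:
            lines.append(raw.strip())
    return lines

def parse(text):
    """Turn the DSL text into plain data; unknown directives are errors."""
    config = {"sources": [], "factories": {}, "builders": []}
    for line in logical_lines(text):
        directive, *args = shlex.split(line)
        if directive == "add_source":
            kind, name, *options = args
            source = {"kind": kind, "name": name}
            source.update(opt.split("=", 1) for opt in options)
            config["sources"].append(source)
        elif directive == "buildfactory":
            name, step, *step_args = args
            steps = config["factories"].setdefault(name, [])
            steps.append((step, " ".join(step_args)))
        elif directive == "add_builder":
            config["builders"].append(dict(a.split("=", 1) for a in args))
        else:
            raise ValueError(f"unknown directive: {directive!r}")
    return config

EXAMPLE = """\
add_source NewConfigurationThingy foo_bar baz=flargle

buildfactory bf_mc_generic source.SVN http://example.com/foo/bar
buildfactory bf_mc_generic shell.Configure /bar/baz start
buildfactory bf_mc_generic shell.Test /bar/baz test
buildfactory bf_mc_generic shell.Configure /bar/baz stop

add_builder name=mc-fast slavename=mc-fast
     builddir=mc-fast factory=bf_mc_generic
"""

if __name__ == "__main__":
    print(parse(EXAMPLE))
```

The parser knows the domain, so the file does not have to spell out lists, dictionaries or constructor calls.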

See Also

The Wikipedia page on DSLs is quite good on the topic, with a succinct list of pros and cons.

This StackOverflow thread has some good comments — I particularly like this point:

When you need your application to be very “configurable” in ways that you cannot imagine today, then what you really need is a plugins system. You need to develop your application in a way that someone else can code a new plugin and hook it into your application in the future.

+1.

This seems to be a controversial topic — as you can see, that page has people on both sides of the issue. Maybe it fundamentally comes down to a matter of taste. Anyway — my $.02.

Update: discussions elsewhere: HackerNews

Another Update, 2012-04-06: Robey Pointer wrote a post called Why Config?, in which he describes a Scala-based configuration language in use at Twitter, which uses Scala’s runtime code evaluation, and a Scala trait, to express configuration succinctly in a Scala source file and load it at runtime. The downside? It’s a Scala source file, executed at runtime, containing configuration. :(

However, this comment in the comments section is worth a read:

At Netli (now part of Akamai) we had a configuration framework very similar in spirit and appearance to Configgy. It was in the early 2000s; we have open sourced it since (http://ncnf.sourceforge.net/). It would provide on-the-fly reload for the C-based programs (the ncnf is a C library). It also had some perks like attribute inheritance and a concept of block references. Most importantly though, it contained a separate schema language and a validator to allow configuration to be checked before pushing in production. At Netli we used it to configure 1200 services on over 400 hardware boxes, the configuration becoming about 20+ MB in length (assembled from several pieces by the CPP, then M4 templating library).

Naturally, it wasn’t Netli’s first attempt at doing configuration. One of the first attempts failed since it was Turing-complete. That approach was to specify the configuration as a Perl data specification. In a very short time the lure of unused expressiveness of such Turing-complete environment prevailed and people started to write for-loops around data pieces and doing other tricks to remove redundancy from the configuration. It turned out to be a disaster in the end, with configuration becoming unmaintainable and flaky.

One principle I got out of that exercise is that configuration shall not be Turing-complete. We’ve got burned specifically by that property far too many times. Yet I do agree with you that a validation facility is a must-have, which is something not usually part of the simple text-based frameworks. C-based NCNF had it almost from the very beginning though, and it proved to be a very useful harness.

+1. There’s lots more info on that system at this post at lionet.livejournal.com.

(Image credit: Turn The Dial by VERY URGENT Photography)


15 Comments

  1. Posted February 18, 2011 at 04:35 | Permalink

    My argumentative agreement with this post is at http://fanf.livejournal.com/112006.html

  2. ben
    Posted February 18, 2011 at 11:03 | Permalink

    Well, why not just establish the provability of the configuration file? That should be easy. In ruby all you do is:

    gem install halting_oracle

    halting_oracle conf.rb

    It works about half the time.

  3. Keith Brady
    Posted February 18, 2011 at 13:11 | Permalink

    Thing is that you might have some pretty complicated decisions to make before you come up with the final config. For example, if you have multiple instances of the binary executing separate parts of a problem and you need to implement a sharding strategy then something has to calculate the work assignments.

    Obviously this can be in the binaries themselves but then you need to ship new code each time you want to change the strategy which might not fit well with responsibility splits in the team. It can also be inflexible when the sharding interacts with the job control environment.

    You can write a script to generate the configs but, arguably, this is just moving the problem to another location (though at least you have an intermediate file you can validate).

    I think the best option is a domain-specific version of a well known language (e.g. constrained python syntax with some well-known classes provided).

  4. steve
    Posted February 18, 2011 at 13:32 | Permalink

    Well, it looks all nice and pretty, but in the real world people use truck-loads of XML to do configuration. The reasons why programming languages are better than some made-up config files for configuration:

    - Shorter code
    - IDE helps you

  5. Posted February 18, 2011 at 14:52 | Permalink

    Tony — I think we’re vociferously agreeing. I need to read your post some more (and possibly improve this one in response ;).

    Keith — I’d argue that once that complex logic comes into play, that’s when plugins become useful. Extract the complex branching/looping/conditional logic into chunks of “real” code, and load and configure those in turn from the configuration files. Keep the config files simple, and the complex logic in plugins.

    Steve — re ‘truck-loads of XML to do configuration’. I’m aware that parts of the Java world (by no means all of it, cf. Guice) are married to some rather nasty XML-as-configuration monstrosities, but don’t conflate “crappy XML configuration format” with “all configuration formats”. Other parts of the real world have much saner formats.
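
The plugin approach suggested in the reply to Keith above can be sketched roughly as follows, in Python. The registry, the `modulo_shard` plugin and the one-line `<plugin-name> key=value` config format are all hypothetical, chosen only to show the split between simple config and "real" code.

```python
# Registry of plugins: "real" code that holds the complex logic.
PLUGINS = {}

def plugin(name):
    """Decorator registering a function under a name usable from config."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register

@plugin("modulo_shard")
def modulo_shard(items, shards):
    """Assign each item to a shard by position; the looping lives here."""
    count = int(shards)
    assignment = {i: [] for i in range(count)}
    for index, item in enumerate(items):
        assignment[index % count].append(item)
    return assignment

def run_from_config(line, items):
    """Assumed config line format: '<plugin-name> key=value ...'."""
    name, *pairs = line.split()
    if name not in PLUGINS:
        raise ValueError(f"unknown plugin: {name!r}")
    options = dict(pair.split("=", 1) for pair in pairs)
    return PLUGINS[name](items, **options)

if __name__ == "__main__":
    print(run_from_config("modulo_shard shards=2", ["a", "b", "c"]))
```

Changing the sharding strategy then means pointing the config at a different plugin name, while the config file itself stays free of loops and conditionals.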

  6. witek
    Posted February 18, 2011 at 17:20 | Permalink

    In Erlang one can easily parse files with Erlang syntax in them. One can use only literal constructs there, like integers, floats, tuples, strings, binaries, lists, and compile-time constants like 2+5. It can be extended as you wish, for example to allow some functions, or variables if you like. The constructs can be limited or extended to any subset you wish by a simple proxy function. But this subset is sufficient to express lots of structural, non-structural, hierarchical or just flat configuration files. Erlang syntax is also very clear, simple, and readable. It is similar to JSON, but slightly simpler (because there are no “objects” and no key/value maps, at least built in), and thus key/value maps are expressed in a slightly different way most of the time.

    This solves the problem of validating and terminating, as this subset of Erlang is not really a programming language (it is not Turing-complete), but it has the same syntax as the language, and the embedded parser makes this simple and consistent with other Erlang software. You can use it immediately without writing any parser (but you can, and there are modules for this in the official Erlang distribution). It is much better than keeping configuration in XML, or writing yet another unknown DSL. And you can easily prove and check that a config file is valid, quickly and in finite time. This is the reason most Erlang software uses Erlang configuration files with great success.

  7. Posted February 18, 2011 at 17:56 | Permalink

    witek — cool!

  8. Jonathan Rochkind
    Posted February 18, 2011 at 18:28 | Permalink

    In the ruby approach to “DSLs”, a DSL is in fact just ruby code (with certain methods defined in certain ways to give you convenient concise syntax for the domain). So you can write your simple DSL, but if you need to do something not provided for in the DSL (say, provide a value that’s the result of a computation), you can always write any other ruby you want too. This gets around the problem of DSLs ending up being too limiting for what you really want to do, and is also a lot easier for the host to implement than providing a whole new parser for a made-up language.

    Is this the best of both worlds with regard to: “Not only is this easier to comprehend for the casual reader, it also reduces the risk of syntax errors, by simply minimising the number of syntactical components.”? (The ruby DSL approach also reduces the risks of an error in your DSL itself, compared to having to write a parser for a separate toy language).

    Of course, you write this sentence in a section titled “Security” even though that sentence is not exactly about security. And providing configuration only in a toy language DSL does let you try to ensure only certain things are possible in the configuration. But in what contexts does this kind of security matter for configuration? Perhaps if the app is hosted by some third party, and the user will be uploading configuration to the third-party app, then the third party might want to make sure that what the configuration can do is limited. But for an ordinary app where the same person running it is the person configuring it, is it necessary to provide a security model for configuration? Anything wrong with just giving the user enough rope to hang themselves?

  9. Michael
    Posted February 19, 2011 at 11:57 | Permalink

    It’s just a matter of the programming language in use for configuration files. If you have something like Tcl at hand, with built-in sandboxes (safe interpreters), ultra-simple syntax AND the ability to remove any and all commands you like from the sandbox to limit your configuration files’ access to a Turing-complete language, it becomes quite viable to use a programming language for configuration files. For some examples see: http://www.kocjan.org/tclmentor/55-tcl-slave-interpreters-as-sandbox.html

    http://wiki.tcl.tk/8587

    I agree that it’s a really bad idea if you have a limited language like Python which cannot be properly secured without a major headache, been there, done that.

  10. Posted February 22, 2011 at 00:24 | Permalink

    Michael — interesting to note that Tcl allows that. thanks for the comment…

    Lua was reportedly created as a configuration-oriented language, and has been recommended elsewhere as a good fit for that use case, although I can’t see any easy way to simplify and render it Turing-incomplete so it can be validated — so I’m not sure it helps on that count.

  11. Posted February 22, 2011 at 13:16 | Permalink

    and of course Tony raised Tcl’s suitability, right in the first comment. sorry Tony!

    Offline, someone else mentioned a key guideline which is orthogonal: configuration’s effects on code should be minimal. ‘almost all code should not have an explicit concept of “config”, even if it is in some sense “configurable”. [...] You just pass parameters to stuff, or maybe call a function to find out some value that will affect how you do something. I.e., you don’t need a special “concept” of configuration.

    E.g.,

    HttpServer(hostname, port)

    not:

    HttpServer(ConfigFile("/snord/httpd.conf"))

    is how most code should be written. Push the notion of config way out to the top level.

    The “configuration” is really just a user interface (in fact possibly multiple) that you present, somehow, to someone, at some point. E.g., if it makes sense for the “interface” to be a text file, you’d go with some text-based format. But most code should not care and not be tied to any particular configuration library, or format etc.’

    This is very similar to how Guice injects properties-based configuration. It’s a good plan — although I’m wondering how to deal with configuration that then changes during the object’s lifecycle…. hmm.
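
The guideline quoted above (pass plain parameters, keep the notion of config at the top level) might look like this in Python. This is a sketch under assumptions: the JSON file format, `load_settings` and `main` are invented for the example, and `HttpServer` here is a stand-in class rather than a real server.

```python
import json

class HttpServer:
    """Knows nothing about config files; it only takes plain parameters."""

    def __init__(self, hostname, port):
        self.hostname = hostname
        self.port = port

    def address(self):
        return f"{self.hostname}:{self.port}"

def load_settings(text):
    """Top level only: turn the config 'user interface' into plain values."""
    data = json.loads(text)
    return data["hostname"], int(data["port"])

def main(config_text):
    # The only place that knows a config format exists at all.
    hostname, port = load_settings(config_text)
    return HttpServer(hostname, port)

if __name__ == "__main__":
    server = main('{"hostname": "localhost", "port": 8080}')
    print(server.address())
```

Swapping the text file for command-line flags or a GUI would only touch `load_settings` and `main`; `HttpServer` stays unchanged and is easy to construct directly in tests.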

  12. Posted February 23, 2011 at 17:18 | Permalink

    But eval() is so handy in python. For example to read this config file http://www.pixelbeat.org/programs/Tira-2/toppy.tira2 all I do is: config = eval(open(config_filename).read())

    Aha, I’ve just noticed a new safe equivalent: ast.literal_eval() http://docs.python.org/dev/whatsnew/3.2.html#ast
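
A minimal sketch of the `ast.literal_eval()` approach mentioned in this comment, assuming the config file is a single Python dict literal (the sample keys and the helper names `load_literal_config` and `try_load` are made up for illustration):

```python
import ast

CONFIG_TEXT = """
{
    "name": "example",
    "retries": 3,
    "hosts": ["a.example.com", "b.example.com"],
}
"""

def load_literal_config(text):
    """Parse a Python-literal config without executing any code."""
    config = ast.literal_eval(text.strip())
    if not isinstance(config, dict):
        raise ValueError("config must be a dict literal")
    return config

def try_load(text):
    """Return the parsed config, or a short message if it is rejected."""
    try:
        return load_literal_config(text)
    except (ValueError, SyntaxError) as exc:
        return f"rejected: {type(exc).__name__}"

if __name__ == "__main__":
    print(try_load(CONFIG_TEXT))
    # Function calls are not literals, so this is refused rather than run.
    print(try_load('__import__("os").system("echo hi")'))
```

Unlike `eval()`, this accepts only literal structures (strings, numbers, tuples, lists, dicts, sets, booleans, `None`). The Python documentation still warns that very large or deeply nested input can exhaust memory or the stack, so a size limit on the file is worth considering.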

  13. Posted February 27, 2011 at 07:19 | Permalink

    I’m not sure I buy your last argument. You seem to be saying that, faced with learning a full-blown language you’ve never seen before or a DSL you’ve never seen before, the latter is easier. I think I disagree with that conclusion a bit (since the full-blown language is more likely to have gone through reasonable language design processes, and so you won’t be banging your head against a quickly-written parser trying to figure out how to escape a quotation mark followed by a space, or whatever).

    But more relevantly, it’s pretty likely you are in fact familiar with a common language like Python or Ruby or shell, and so there’s no learning curve there. And if you’re not, the chances that good syntax documentation exists and actually matches the parser are much higher for a real language.

  14. Erik E
    Posted February 28, 2011 at 13:58 | Permalink

    I have come to a lot of the same conclusions from suddenly having the need to get certain kind of information out of a scons build script. Scons buildscripts are written in python. There is no easy way to do that. One has to be able to treat the configuration script as data and not as code, to be able to write a program that can analyze the script easily and extract information from it.

    Thus my conclusion is not that we should avoid programming languages for configuration files altogether but that they should be declarative and easy to parse. A subset of LISP, JavaScript, Lua etc. without control flow constructs would probably work well. In particular I like LISP since you can easily use a LISP interpreter to analyze LISP code like data, and extract any information you like in a configuration file written in LISP (given no control flow constructs are used).

    We already see that happening. E.g. JSON is an example of a subset of JavaScript suitable for configuration files. Qt Quick uses this to good effect.

    However that does not mean I favor XML. From being a big XML fan I have recently become strongly against it. I still think it is a good format for describing text documents with some sort of structure. However for configuration files I think it is bloated, has poor readability and is overkill. S-expressions like LISP or JSON would IMHO always be better at this. I see the XML fans go on and on about how it is standardized and how we shouldn’t invent yet another format bla bla. I think they are kidding themselves. None of the important things for configuration files are standardized. Both s-expressions and JSON e.g. have a standard for expressing strings, numbers and lists. XML doesn’t.

  15. Posted March 9, 2011 at 01:38 | Permalink

    I think this is one of those issues that requires a lot more context than a simple “foo is bad” (where, in this case, foo = “code for configs”).

    My job is being a System Administrator/Engineer. I write a decent amount of code, but the majority of it is for internal tools and applications, nearly all of which are 500 lines of code or less, and the target audience is other sysadmins, network guys, and NOC people.

    Lately, I’ve been moving towards using code for config files more often. The ability to throw into a perl script:

        require "foo_script.conf";
    

    and then put the frequently changing run-specific values that need to be stored into foo_script.conf is really useful. Your contrived example may be really hairy, but mine tend to look more like:

        # Comment about $bar
        $bar = "abcdefg";
        # Comment about $baz
        $baz = "12345";
        # Comment about %quux
        %quux = (
            abc    => '123',
            def    => '456',
            xyz    => '000',
        );
    

    Which sure looks pretty readable and manageable to me. My coworkers seem to really like it, too, based on the feedback I’ve gotten. It reduces the number of command line options they have to supply (especially for cases where there’s a lot of data needed to run the tool), and it’s significantly easier than editing the script itself.

    The fact that you can use the full power of a programming language in a code-based config file doesn’t mean that you need to do so.

    Limiting yourself to a declarative subset of your given programming language is not much different from creating a configuration DSL based on the data declaration aspects of your programming language. Well, except now you don’t have to write a bunch of code to parse your config (or load an external module to parse the config).