A new system for managing configuration

This is a proposed game plan for managing configuration information in Drupal 8. It focuses on config data: the kind of thing currently managed by the variables system, as well as more complicated items like Blocks or Views.

Thanks to the following for their input and inspiration:

  • Sam Boyer
  • Nathaniel Catchpole
  • Jeff Eaton
  • Larry Garfield
  • Earl Miles
  • Pierre Rineau
  • Bojhan Somers
  • David Strauss

What This Buys Us

  • Configuration becomes fully revisionable and manageable through code using existing deployment tools. We will also have the opportunity to offer UI-based undo of configuration changes, including multi-level undo.
  • We can access variable configuration information very early in bootstrap without having to boot the database layer, offering many gains in performance and flexibility.
  • Data can be organized more flexibly, and the variables table goes away. The myriad one-off SQL stores for configuration also go away, as the core configuration system can handle them directly.
  • By offering a standard interface, we get a lot more flexibility to do things like locking configuration (Zend has a readOnly flag that can be set dynamically), preventing changes on one server while enabling them on another.

What Is It

  • Configuration information should be stored on disk, in an easily manageable format like JSON (leave aside for the moment the question of what is and isn't configuration.) JSON has the advantage of being easily manageable in PHP in a reasonably performant way, as well as being non-executable which is advantageous from a security perspective. Drupal will interact with this JSON, reading and writing directly to and from it. The DB will not enter into the picture, except for caching and history, and even this will be pluggable to allow other options.
  • An interface will be provided for this data which will take its inspiration from Zend Config (for more information see my previous writeup). Basically, information will be stored hierarchically in a tree of configuration objects accessible through magic get/set functions for easy management (if this doesn't kill us from a performance perspective, see http://www.garfieldtech.com/blog/benchmarking-magic). These objects will be iterable and countable. We will provide a basic implementation of this interface for storing standard scalar data just as you would find in the variables table. In terms of organization, my idea is that the config information is stored in a tree, and each top branch is a module, plus a special top branch for core (possibly.) So you could have stuff like

    $config->drupal->site_information->site_name
    $config->drupal->regional->default_country
    $config->my_module->foo->bar

    Under each branch, it would be up to the module to store things as it sees fit. For more complicated stuff, we can do new implementations of the config interface (or extend the basic one). So we can have a special config object for blocks, and Views can do their own, and Panels, etc. This gives us the benefit of a standard interface for accessing and saving this data, while still allowing modules with more complicated requirements to manage those requirements as they need.

    We should also offer the option to organize this data into Features-style 'functional' structures. For instance

    $config->photo_gallery->view->gallery
    $config->photo_gallery->pathauto->gallery_path

    and the like. The functional settings could be set to save to a separate directory, and we could define an order of overrides, probably with "local" winning out in all cases where there's a collision. Later, we can add sophisticated analysis tools to display overrides, much like the CSS tools in Chromium and Firebug.

  • When changes are saved back (through changes in the admin interface or other method) old revisions are stored to a configurable location (the database, a special directory, etc.) These may be a full copy of the previous object, or just a diff. This will give us the ability to implement rollback and undo. Additionally, since this is all stored on disk, it can be versioned through whatever system you are using for code versioning. This should be more of a CRAP implementation than a CRUD one. We can offer various rules for purging as need be. If possible we should store a pretty significant number of revisions before purging off the old ones. Imagine Time Machine-like functionality where you can step through old revisions of your site configuration, reverting to the one from 8 days ago.
  • My current idea is that each 'branch' is stored as its own file in a config directory which is server-writeable but protected from arbitrary reading via .htaccess rules. So for instance

    $config->drupal->site_information->site_name lives in /config/drupal.site_information.json
    $config->drupal->regional->default_country lives in /config/drupal.regional.json
    $config->my_module->foo->bar lives in /config/my_module.foo.json

    This allows us to lazy load bits of config data without ever having to expand the entirety of Drupal configuration into memory. The performance implications around reading/writing/caching this are not really my specialty, and I am open to suggestions.

    Because it is server-writeable, we will probably have to start taking a generally less trustworthy attitude towards this data.

  • The interface should also support reasonably advanced merge operations, offering the opportunity to merge together a partial config with a full one, replacing only those items that exist in both. For instance, say you have a client with a large number of subsites (think departments at a university.) For the most part, these are all the same, but each one has little tweaks. Say that one department wants their slideshow to have 10 images instead of 5. You override the View, change the pager, and there is a way to write out only the changed items, essentially a diff. When the View is shown, you load the master view, then merge in the diffs. Huge gain: if tomorrow the university decides it wants all the captions in the slideshow to be bolded, you can make this change in the master view and it will still propagate down to the overridden ones (as long as that particular property is not actually changed.) This kind of system could also be used to manage server-specific information like database settings, Google Maps API keys, etc. Exact implementation of this TBD.

    Modules will also be able to ship with a default set of settings of some sort (perhaps a module.settings.json file or a hook.) If these settings are changed, then just like above the diff is saved and merged in. I don't know if this is realistic to do performantly, or how it messes with the opportunity to have config data available at bootstrap without spinning up the db.

  • How this whole thing is built up and accessed is up for debate. One easy thing to do would be to simply build and merge the entire config object as one of the first steps in bootstrap, making it a singleton accessible system-wide. However this has serious memory consequences, especially as we start storing more complex data in it. Another option is that when we access for instance

    $config->drupal->site_information->site_name

    We just load that one item (or branch) and then let it go when we're done. Better for memory, but it means we end up possibly re-opening and parsing the file (or the cached version) a lot more often. Or we do a mix of both (lazy load what's needed as needed, but keep it around in the singleton for later.) A lot of experimentation is necessary to fully understand the pros and cons here. Architecturally, we should try to make the API independent of the file organization so that changing the file organization is not an API break. "Use uncertainty as a driver."

  • Catch has brought up the prospect of pluggable storage backends. For instance, instead of writing to the file system we could write to apc, chdb, hidef or some other system for increased performance or other needs. This is an interesting idea although some of these systems have limitations like only being able to save scalars and not more complex data. Still I don't see why that should prevent us from offering up the ability to plug other storage backends in. Also, since each 'branch' of configuration is its own config object, there's no reason why we couldn't have pluggable storage per branch. So if you want to store the variables table in apc_define_constants, and views on disk, and everything else in the DB, then you could. David has also suggested ZooKeeper (https://github.com/andreiz/php-zookeeper) as a possible first implementation, which sounds pretty interesting. It has both REST and native PHP interfaces, and supports JSON and hierarchical organization.
  • This will replace a default settings.php and $conf. It is possible we will still need the ability to let modules set stuff in an executable file that is loaded before any work is done (Domain Access has been put forth as an example of this) but if so, we still don't have to ship with it.
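
To make the tree-of-objects and merge ideas above a little more concrete, here is a minimal sketch of a config object with magic get/set access, iteration, counting, a read-only lock, and a partial-override merge. The class name SketchConfig and all of its methods are hypothetical, loosely inspired by Zend Config rather than copied from it, and the code assumes PHP 7.4 or later. Note one assumption in merge(): keys that exist only in the override are added, which is one possible reading of the merge rules that are still TBD.

```php
<?php
// Hypothetical sketch only: SketchConfig is an illustrative name, not a real
// Drupal or Zend class.
class SketchConfig implements Countable, IteratorAggregate {
  private array $data = [];
  private bool $readOnly = false;

  public function __construct(array $values = []) {
    foreach ($values as $key => $value) {
      // Nested arrays become child config objects, forming the tree.
      $this->data[$key] = is_array($value) ? new self($value) : $value;
    }
  }

  public function __get($name) {
    return $this->data[$name] ?? null;
  }

  public function __set($name, $value) {
    if ($this->readOnly) {
      throw new RuntimeException("Config is read-only; cannot set '$name'.");
    }
    $this->data[$name] = is_array($value) ? new self($value) : $value;
  }

  public function __isset($name) {
    return isset($this->data[$name]);
  }

  // Lock this object and every child, e.g. on a server where changes are not allowed.
  public function setReadOnly(): void {
    $this->readOnly = true;
    foreach ($this->data as $value) {
      if ($value instanceof self) {
        $value->setReadOnly();
      }
    }
  }

  // Merge a partial override into this object. Values in $override win;
  // everything the override does not mention is left untouched.
  public function merge(SketchConfig $override): void {
    foreach ($override->data as $key => $value) {
      $mine = $this->data[$key] ?? null;
      if ($value instanceof self && $mine instanceof self) {
        $mine->merge($value); // Recurse into branches both sides share.
      } else {
        $this->__set($key, $value instanceof self ? $value->toArray() : $value);
      }
    }
  }

  public function toArray(): array {
    $out = [];
    foreach ($this->data as $key => $value) {
      $out[$key] = $value instanceof self ? $value->toArray() : $value;
    }
    return $out;
  }

  public function count(): int {
    return count($this->data);
  }

  public function getIterator(): Traversable {
    return new ArrayIterator($this->data);
  }
}

// The university example: the master view changes, the department's diff
// only touches the image count, and both survive the merge.
$master = new SketchConfig(['slideshow' => ['images' => 5, 'captions' => 'plain']]);
$department = new SketchConfig(['slideshow' => ['images' => 10]]);
$master->slideshow->captions = 'bold';
$master->merge($department);
echo $master->slideshow->images, ' ', $master->slideshow->captions, "\n"; // prints "10 bold"
```

Because child branches are objects rather than arrays, chained writes such as $config->drupal->site_information->site_name = 'Example' work through the magic methods. Whether that convenience is worth its cost is exactly the performance question raised above.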

How We Do It

  1. Create a system for reading/writing to disk, starting with JSON as a testbed. Make this an interface that anyone writing a format has to implement.
  2. Create the interface and an implementation for a basic config object, basically a port of Zend Config. Nothing fancy. It should offer both magic functions and explicitly created equivalents so we can performance test.
  3. Figure out a good way to interface with this for variables. Implement the variables table with it. Do a bunch of testing and iterating.
  4. Work through core replacing variable_set() and variable_get() as needed, including stuff like system_settings_form().
  5. When we have this working well, we can start to work on some more complex implementations. Block config might be a good use case here.
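
As a rough starting point for step 1, the sketch below shows a small storage interface, a JSON-on-disk implementation, and a loader that reads a branch file only when it is first requested and then keeps it in memory (the "mix of both" option discussed earlier). The names ConfigStorageSketch, JsonFileStorageSketch, and LazyConfigSketch are hypothetical, the module.branch.json naming follows the file layout suggested above, and the code assumes PHP 7.4 or later.

```php
<?php
// Sketch only: every class and method name below is hypothetical.

// Anything that can persist a named branch of config implements this.
interface ConfigStorageSketch {
  public function read(string $name): array;
  public function write(string $name, array $data): void;
}

// JSON-on-disk testbed: one file per branch, e.g. drupal.site_information.json.
class JsonFileStorageSketch implements ConfigStorageSketch {
  private string $dir;

  public function __construct(string $dir) {
    $this->dir = rtrim($dir, '/');
  }

  private function path(string $name): string {
    // The directory is server-writeable, so treat branch names as untrusted.
    if (!preg_match('/^[a-z0-9_]+(\.[a-z0-9_]+)*$/', $name)) {
      throw new InvalidArgumentException("Invalid config name: $name");
    }
    return $this->dir . '/' . $name . '.json';
  }

  public function read(string $name): array {
    $file = $this->path($name);
    if (!is_file($file)) {
      return [];
    }
    $data = json_decode(file_get_contents($file), true);
    // Missing or malformed JSON is treated as an empty branch in this sketch.
    return is_array($data) ? $data : [];
  }

  public function write(string $name, array $data): void {
    file_put_contents($this->path($name), json_encode($data, JSON_PRETTY_PRINT), LOCK_EX);
  }
}

// Loads a branch the first time it is asked for, then keeps it in memory.
class LazyConfigSketch {
  private ConfigStorageSketch $storage;
  private array $loaded = [];

  public function __construct(ConfigStorageSketch $storage) {
    $this->storage = $storage;
  }

  public function branch(string $module, string $branch): array {
    $name = $module . '.' . $branch;
    if (!array_key_exists($name, $this->loaded)) {
      $this->loaded[$name] = $this->storage->read($name);
    }
    return $this->loaded[$name];
  }
}

// Usage: write one branch to a scratch directory and read it back lazily.
$dir = sys_get_temp_dir() . '/config_sketch_' . uniqid();
mkdir($dir);
$storage = new JsonFileStorageSketch($dir);
$storage->write('drupal.site_information', ['site_name' => 'Example']);
$config = new LazyConfigSketch($storage);
echo $config->branch('drupal', 'site_information')['site_name'], "\n"; // prints "Example"
```

Keeping the loader separate from the storage interface means another backend could be swapped in per branch without touching calling code, which is one way to keep the file organization out of the API.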

There is a bit more unknown in this system than in the UUID system, which already has several implementations in Drupal-land. I suspect that after implementing the basics, we will learn a lot that will inform what our next steps will be.

What Can Go Wrong

Major areas of concern

  1. Performance - Will require an enormous amount of testing and iteration. We cannot slack off here and must start hammering things and optimizing as early as possible, and continuously throughout the development cycle, to prevent regressions. I would love to see this somehow integrated into the testbots; alternatively, I will lean heavily on Catch and others with expertise in this area (and start to develop some myself).
  2. Security - My hope is that by writing non-executable data to an isolated directory protected by .htaccess, we will mitigate most major concerns. (This model has served us well in the files directory.) The one thing I am concerned about is what we do with more sensitive data; for example, do we store our database password here? I look to the experts for advice.
  3. What do we do about content type/field config changes? - Because they can make permanent modifications to the underlying data structures (and thus the content contained within), rolling back or updating them can be incredibly tricky. This may be something we just have to punt on, or we can follow the Features model where moving to an old or new configuration may require undesirable steps to achieve (like deleting all content from a content type), but it should always be possible, and we give clear warning to people who may be attempting something destructive. I am more than open to suggestions.
  4. Multisite / Sites.php - Haven't thought this through yet to be honest.
