One of the key aspects of solving the configuration management conundrum is figuring out how to uniquely identify data across servers. To this end, I propose implementing UUIDs in core for the identification of content, and providing an API through which contrib modules can generate and use UUIDs. While we are still talking through many big-picture architectural issues, I think it is possible to get moving on this and several other tasks so the real work can start. So here is my proposed game plan, put together after some discussion with Dave Hall (skwashd), who has recently put a lot of work into porting the UUID module to D7. He has done several implementations of this now, and I think much of the current code can be reused or at least serve as inspiration (see http://drupalcode.org/project/uuid.git/tree/refs/heads/7.x-1.x). This is not set in stone by any means; it's just a proposal, and I'd love to hear what people think.
- Figure out how we want to generate UUIDs and implement an API function to generate them. I propose using the UUID module's current functionality as a starting point (http://drupalcode.org/project/uuid.git/blob/refs/heads/7.x-1.x:/uuid.inc). I'm unsure about the benefits of having multiple generators vs. just the PHP implementation. On the plus side, the PECL and Windows-specific generators are much faster than the pure PHP implementation. On the downside, I'm not sure it is appropriate for core to add platform-specific exceptions like this; it seems to set a bad precedent. Ideally we can make this pluggable: we ship the PHP implementation by default, and you can replace it if you want or need to.
- We will need some simple APIs for translating UUIDs to serial keys and vice versa.
- I propose that UUIDs should live alongside serial keys, with Drupal using the serial keys internally for most purposes while UUIDs are used where necessary. The UUID module already implements much of this through hook_schema_alter(), but in core we can simply add the fields directly. Some tables will be excluded. Off the top of my head, the variable and system tables seem like places where UUIDs will be unnecessary, because they already use unique text strings as identifiers. I'm sure there will be more and we should make a list; the general rule is that anything with a 'machine name' can be excluded. This will also involve some discussion about when and where we use UUIDs internally. For instance, should Nodereference/Relation/etc. refer to content by its serial ID or by its UUID? What about user IDs in the node object? Can we node_load() using UUIDs? We need to establish some guidelines around this and adjust as needed, paying particular attention to performance implications.
- Once all that is sorted, we can start on the patches: adding the appropriate fields to the tables in hook_schema(), update functions to add those fields to existing sites, and (probably) a batch update to generate UUIDs for all existing data.
- We will also want to update the _save() functions to treat the presence of a UUID as an indication that this is not a new user/node/etc., and to add UUID generation to the _insert() functions. We need to make sure not to regenerate UUIDs for objects that already have them, since the assumption becomes that they are coming from an external system. This will have to be done carefully, but it may be an opportunity to start working in some entity CRUD functionality at the same time.
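To make the pluggable-generator idea from the first point a bit more concrete, here is a rough sketch. It is written in Python purely for illustration, not PHP, and every name in it (default_uuid4(), set_uuid_generator(), uuid_generate(), is_uuid4()) is hypothetical rather than the UUID module's actual API. The default builds a random version 4 UUID from the RFC 4122 bit layout, and a faster generator can be swapped in at runtime.

```python
import os
import re
from typing import Callable

UuidGenerator = Callable[[], str]

def default_uuid4() -> str:
    """Hypothetical default: random (version 4) UUID per RFC 4122."""
    b = bytearray(os.urandom(16))
    b[6] = (b[6] & 0x0F) | 0x40  # set the version nibble to 4
    b[8] = (b[8] & 0x3F) | 0x80  # set the RFC 4122 variant bits (10xx)
    h = b.hex()
    return f"{h[0:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:32]}"

# The active generator; a site could replace it with a faster,
# platform-specific one (the "pluggable" part of the proposal).
_generator: UuidGenerator = default_uuid4

def set_uuid_generator(gen: UuidGenerator) -> None:
    """Swap in a different generator (hypothetical API)."""
    global _generator
    _generator = gen

def uuid_generate() -> str:
    """Single entry point core and contrib would call."""
    return _generator()

_UUID4_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
)

def is_uuid4(value: str) -> bool:
    """Check that a string is a lowercase, hyphenated version 4 UUID."""
    return bool(_UUID4_RE.match(value))
```

Callers only ever use uuid_generate(), so whether the PECL or Windows-specific generators get shipped or left to contrib becomes a question of which function is registered, not a change to every call site.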
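For the translation APIs in the second point, a minimal sketch might look like the following. It assumes SQLite and a table that keeps its serial key plus a uuid column, and the function names are hypothetical. A batch variant is included because, given the performance concerns above, loading many referenced items one UUID at a time would likely be too slow.

```python
import sqlite3
from typing import Dict, Iterable, Optional

# Hypothetical helpers. Table and column names are interpolated into the
# SQL, so they must come from the trusted schema, never from user input.

def uuid_to_id(conn: sqlite3.Connection, table: str, id_col: str,
               value: str) -> Optional[int]:
    """Return the serial key for a UUID, or None if it is unknown."""
    row = conn.execute(
        f"SELECT {id_col} FROM {table} WHERE uuid = ?", (value,)
    ).fetchone()
    return row[0] if row else None

def id_to_uuid(conn: sqlite3.Connection, table: str, id_col: str,
               entity_id: int) -> Optional[str]:
    """Return the UUID for a serial key, or None if it is unknown."""
    row = conn.execute(
        f"SELECT uuid FROM {table} WHERE {id_col} = ?", (entity_id,)
    ).fetchone()
    return row[0] if row else None

def uuids_to_ids(conn: sqlite3.Connection, table: str, id_col: str,
                 values: Iterable[str]) -> Dict[str, int]:
    """Map many UUIDs to serial keys in one query; unknown UUIDs are omitted."""
    values = list(values)
    if not values:
        return {}
    placeholders = ",".join("?" * len(values))
    rows = conn.execute(
        f"SELECT uuid, {id_col} FROM {table} WHERE uuid IN ({placeholders})",
        values,
    ).fetchall()
    return dict(rows)
```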
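The schema and update steps could be sketched roughly as follows, again against SQLite and only as an illustration. The hard-coded MACHINE_NAME_TABLES set stands in for the "anything with a machine name is excluded" rule and is an assumption, as are the function names. Each loop pass in backfill_uuids() corresponds loosely to one pass of a multi-pass hook_update_N() using $sandbox. The sketch uses Python's standard uuid module rather than a core generator so that it stays self-contained.

```python
import sqlite3
import uuid
from typing import Iterable, List

# Assumption: tables keyed by unique text strings ("machine names") are skipped.
MACHINE_NAME_TABLES = {"variable", "system"}

def add_uuid_columns(conn: sqlite3.Connection, tables: Iterable[str]) -> List[str]:
    """Add a nullable uuid column plus a unique index; return altered tables."""
    altered = []
    for table in tables:
        if table in MACHINE_NAME_TABLES:
            continue
        columns = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]
        if "uuid" in columns:
            continue  # already updated; keeps the update function re-runnable
        # SQLite cannot add a UNIQUE column directly, so index it separately.
        conn.execute(f"ALTER TABLE {table} ADD COLUMN uuid TEXT")
        conn.execute(f"CREATE UNIQUE INDEX {table}_uuid ON {table} (uuid)")
        altered.append(table)
    conn.commit()
    return altered

def backfill_uuids(conn: sqlite3.Connection, table: str, id_col: str,
                   batch_size: int = 100) -> int:
    """Give every row lacking a UUID a new one, batch_size rows at a time."""
    total = 0
    while True:
        rows = conn.execute(
            f"SELECT {id_col} FROM {table} WHERE uuid IS NULL LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return total
        conn.executemany(
            f"UPDATE {table} SET uuid = ? WHERE {id_col} = ?",
            [(str(uuid.uuid4()), r[0]) for r in rows],
        )
        conn.commit()  # one commit per batch, like one request per batch pass
        total += len(rows)
```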
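One possible reading of the _save() change is sketched below. It is a hypothetical, stripped-down node_save() in Python with SQLite, not Drupal's real function, and it assumes that every local node already has a UUID after the backfill, so a node without one is always new. A UUID that matches a local row means update; a UUID that is unknown locally is treated as content coming from another site, so the node is inserted and the incoming UUID is kept rather than regenerated. Any serial ID carried over from the other site is discarded, since serial keys are only meaningful locally.

```python
import sqlite3
import uuid
from typing import Any, Dict

def node_save(conn: sqlite3.Connection, node: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical UUID-aware save for a minimal node table (nid, uuid, title)."""
    node = dict(node)
    existing_nid = None
    if node.get("uuid"):
        row = conn.execute(
            "SELECT nid FROM node WHERE uuid = ?", (node["uuid"],)
        ).fetchone()
        existing_nid = row[0] if row else None

    if existing_nid is not None:
        # Known UUID: this is not a new node, so update it in place.
        conn.execute("UPDATE node SET title = ? WHERE nid = ?",
                     (node["title"], existing_nid))
        node["nid"] = existing_nid
    else:
        # New locally. Keep an incoming UUID; only generate one if missing.
        if not node.get("uuid"):
            node["uuid"] = str(uuid.uuid4())
        cur = conn.execute("INSERT INTO node (uuid, title) VALUES (?, ?)",
                           (node["uuid"], node["title"]))
        node["nid"] = cur.lastrowid  # replaces any foreign serial ID
    conn.commit()
    return node
```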
As we encounter bugs or roadblocks along the way, please file issues and tag them "uuid" and/or "change management" so we can keep a good list of where we're at.
What have I forgotten? What part is broken? Let me know!