I have been researching how other systems organize and manage their configuration and content deployment. Plenty of systems don't manage this well, or have problems similar to ours, but I've found a couple of places with interesting approaches that I am going to start adding here. Please note that my experience with these systems is limited to several hours of research each, which barely scratches the surface, especially since many of these systems are so different architecturally. If anyone with more experience finds inaccuracies or has more to add, I encourage you to do so. Additionally, if anyone has other interesting systems to check out, please let me know or add them yourself!
The one commonality seems to be that these systems draw a very hard line between what is content and what is configuration, which obviously reduces their challenges. Many of the systems I have looked at also use UUIDs for content references and other purposes. Neither Plone nor Alfresco keeps its content in an RDBMS (Plone uses the ZODB object database, and Alfresco stores content on the file system, with metadata in a database), so many of the concerns we have about performance don't apply, or are at least different.
For now this is somewhat of a note dump. I will try to clean it up later, but I wanted to start pushing out what I was learning as I learned it.
Plone
Architecture Diagram
http://plone.org/countries/conosur/articulos/Plone-Infrastructure.EN.svg...
Plone 4 User Manual and a reasonable overview
http://plone.org/documentation/manual/plone-4-user-manual
http://plone.org/documentation/manual/plone-4-user-manual/introduction/c...
Collections are the equivalent of Views. Portlets are the equivalent of blocks.
Usage of UUIDs for content reference
http://plone.org/documentation/manual/plone-community-developer-document...
Developer Manual
http://plone.org/documentation/manual/developer-manual
Data Models
http://plone.org/documentation/manual/plone-community-developer-document...
- Three kinds of schema (aka objects): persistent data, form data, and config data. All of them extend the interface class.
Alfresco
Some high level feature overviews
http://ecmarchitect.com/archives/2009/08/31/1038
Community Docs
http://www.alfresco.com/help/34/community/all/
Architecture Overview
http://wiki.alfresco.com/wiki/Alfresco_Repository_Architecture
By default, Alfresco has chosen to store meta-data in a database and content in a file system. Using a database immediately brings in the benefits of databases that have been developed over many years such as transaction support, scaling & administration capabilities. Content is stored in the file system to allow for very large content, random access, streaming and options for different storage devices.
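The split described above can be illustrated with a minimal sketch. This is not Alfresco code: the `SplitStore` class, the `node` table, and the `.bin` file naming are all hypothetical, and SQLite stands in for whatever database a real deployment would use. It only shows the idea of transactional metadata in a database alongside content files that can be read at an arbitrary offset.

```python
import sqlite3
import tempfile
import uuid
from pathlib import Path


class SplitStore:
    """Toy store (hypothetical): metadata rows in SQLite, content bytes on disk."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE node ("
            "uuid TEXT PRIMARY KEY, name TEXT, size INTEGER, path TEXT)"
        )

    def put(self, name: str, data: bytes) -> str:
        """Write the bytes to a file, then record the metadata in the database."""
        node_id = str(uuid.uuid4())
        path = self.root / f"{node_id}.bin"
        path.write_bytes(data)  # content lives on the file system
        with self.db:  # metadata insert is committed as one DB transaction
            self.db.execute(
                "INSERT INTO node VALUES (?, ?, ?, ?)",
                (node_id, name, len(data), str(path)),
            )
        return node_id

    def metadata(self, node_id: str):
        """Answer metadata queries from the database alone, without touching files."""
        return self.db.execute(
            "SELECT name, size FROM node WHERE uuid = ?", (node_id,)
        ).fetchone()

    def read_range(self, node_id: str, offset: int, length: int) -> bytes:
        """Random access: seek into the content file instead of loading it whole."""
        (path,) = self.db.execute(
            "SELECT path FROM node WHERE uuid = ?", (node_id,)
        ).fetchone()
        with open(path, "rb") as fh:
            fh.seek(offset)
            return fh.read(length)
```

A caller would create the store over a directory, for example `SplitStore(Path(tempfile.mkdtemp()))`, then use `put()` to add content and `read_range()` to stream a slice of it.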
A lot of good info here about how their DM deployment stuff works under the hood
http://wiki.alfresco.com/wiki/Transfer_Service
Specifics about resolution of remote content
http://wiki.alfresco.com/wiki/Transfer_Service#Where_transferred_nodes_a...
UI side of the deployment
http://www.alfresco.com/help/34/community/all/tasks/gs-wcm-publish.html
Content replication screencast
http://blogs.alfresco.com/wp/webcasts/?p=1261
- Can push content from one source to multiple 'transfer targets'
- Remote content is read only
- Can specify batches of data to go to each target, can be different per target
- All defined via UI
- Pretty similar to Deploy: can select folders of content to send and schedule the transfers
- Replication jobs run in the background
- Replicates deletions too
- The entire transfer is a single transaction; if any part fails, the whole thing rolls back
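The behaviour in the list above can be sketched roughly as follows. This is an illustration only, not the Transfer Service API: `TransferTarget`, `replicate`, and the `fail_on` test hook are hypothetical names, and the sketch assumes that "single transaction" applies per target, which the notes do not spell out.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class TransferTarget:
    """Hypothetical replica whose content only changes through receive()."""
    name: str
    nodes: dict = field(default_factory=dict)
    fail_on: frozenset = frozenset()  # node ids that simulate a failed write

    def receive(self, upserts: dict, deletions: set) -> None:
        """Apply one batch all-or-nothing by staging changes on a copy."""
        staged = copy.deepcopy(self.nodes)
        for node_id, body in upserts.items():
            if node_id in self.fail_on:
                # Raising here leaves self.nodes untouched: the "rollback".
                raise RuntimeError(f"{self.name}: cannot store {node_id}")
            staged[node_id] = body
        for node_id in deletions:
            staged.pop(node_id, None)  # deletions are replicated too
        self.nodes = staged  # commit only after every step succeeded


def replicate(source: dict, deleted: set, jobs: list) -> dict:
    """Push a (possibly different) selection of node ids to each target.

    `jobs` is a list of (target, selected_ids) pairs. Returns a status
    string per target name.
    """
    results = {}
    for target, selection in jobs:
        upserts = {nid: source[nid] for nid in selection if nid in source}
        deletions = {nid for nid in selection if nid in deleted}
        try:
            target.receive(upserts, deletions)
            results[target.name] = "ok"
        except RuntimeError:
            results[target.name] = "rolled back"
    return results
```

Staging on a copy and swapping it in at the end is the simplest way to get rollback in a toy example; a real system would rely on the transaction support of its underlying store.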
Take-aways
- Very strong line between content and config
- Implements three key foundation services on which everything is built: Node (config), Content, and Search. All share the same transaction, security, and configuration characteristics.
- They are using UUIDs plus some other identifying information, but internally they also appear to be using int IDs for primary and foreign keys
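The last take-away, UUIDs as the external identity with integer keys used internally, can be sketched as below. The table and function names (`node`, `node_ref`, `add_node`, `link`, `references`) are hypothetical and SQLite is only a stand-in; this is a guess at the general pattern, not a description of either system's actual schema.

```python
import sqlite3
import uuid

# Hypothetical schema: integer ids for joins, UUIDs for cross-site identity.
SCHEMA = """
CREATE TABLE node (
    id    INTEGER PRIMARY KEY,
    uuid  TEXT NOT NULL UNIQUE,
    title TEXT NOT NULL
);
CREATE TABLE node_ref (
    source_id INTEGER NOT NULL REFERENCES node(id),
    target_id INTEGER NOT NULL REFERENCES node(id)
);
"""


def open_store() -> sqlite3.Connection:
    """Create one in-memory 'site' with the schema above."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def add_node(db: sqlite3.Connection, title: str, node_uuid: str = None) -> str:
    """Insert a node; reuse a UUID supplied by another site, else mint one."""
    node_uuid = node_uuid or str(uuid.uuid4())
    with db:
        db.execute("INSERT INTO node (uuid, title) VALUES (?, ?)", (node_uuid, title))
    return node_uuid


def link(db: sqlite3.Connection, source_uuid: str, target_uuid: str) -> None:
    """Callers pass UUIDs; the reference is stored with local integer ids."""
    with db:
        db.execute(
            "INSERT INTO node_ref (source_id, target_id) "
            "SELECT s.id, t.id FROM node s, node t "
            "WHERE s.uuid = ? AND t.uuid = ?",
            (source_uuid, target_uuid),
        )


def references(db: sqlite3.Connection, source_uuid: str) -> list:
    """Resolve outgoing references back to UUIDs for the caller."""
    rows = db.execute(
        "SELECT t.uuid FROM node_ref r "
        "JOIN node s ON s.id = r.source_id "
        "JOIN node t ON t.id = r.target_id "
        "WHERE s.uuid = ? ORDER BY t.title",
        (source_uuid,),
    ).fetchall()
    return [row[0] for row in rows]
```

The point of the design is that two sites can assign different integer ids to the same piece of content and still agree on references, because only the UUID ever crosses the boundary.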