REST routing

Last Thursday (5 May) I had an impromptu IRC chat with a number of people about the routing logic we are going to need in Phase 3. We haven't really nailed that down much, so this was an excellent conversation to have. I've tried to capture all of the relevant details below (not necessarily in the order we arrived at them), but those who were present can correct me if I'm wrong somewhere. Further discussion welcome.

Attending:

  • Larry Garfield (Crell)
  • Earl Miles (merlinofchaos)
  • Vladimir Zlatanov (dikini)
  • Roy Scholten (yoroy)
  • Justin Randall (beejeebus)

Previous discussion: http://groups.drupal.org/node/67588

We originally were talking about the UI needed to define routing rules and how one would use context for that. We quickly drifted to the routing itself, however. Vladimir noted an Erlang-based router called webmachine. We cannot use it directly (it's Erlang) or port it directly (PHP is too different), but it did serve as a good point of comparison for the discussion.

What we came to realize is that if we really want to support arbitrarily complex routing rules, that's, well, expensive. That is especially true since some of the information on which we could want to route will require interaction with Drupal, as it is derived context (e.g., node type). However, most routing logic we want to handle based solely on request context (the raw HTTP request). It's faster, and frankly the majority case (we think).

This split exists in Drupal now, sort of. Core routes requests (via hook_menu) based solely on path. Panels adds a concept of "variants" on top of it to allow essentially secondary routing, but because it has to dance around core it is ugly, hacky, hard to understand, and no one really likes it. However, it appears that conceptually we will still need to have something like that for routing based on arbitrary criteria, so we will need to do it right from the get-go.

Essentially, routing becomes an inherently 2-step process. The first, which for the time being I'll call primary routing, takes the incoming context and looks up possible mapping information based on selected, fixed criteria in the request. The primary routing uses "build time" logic; that is, it's like hook_menu now where we do a lot of work up front to figure out our mapping table and then request time is very fast. That will return one or more possible Response Controllers (plus configuration) to handle the incoming request.

If only one response controller is found, we're done. Pass the context object off to the response controller and let it do its thing. If more than one possible response controller is found, then we trigger the secondary routing step. The secondary routing step is request-time, that is, it's PHP code that executes in the request to decide which response controller to use. The secondary routing narrows the list down to a single response controller, and then we use that and we're done.
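
The two steps described above can be sketched roughly as follows. This is a minimal illustration in Python rather than PHP, and every name in it (`Candidate`, `PRIMARY_TABLE`, `route`, the controller names) is hypothetical, not an existing or proposed Drupal API. It assumes the primary table was built ahead of time and that each candidate carries a request-time check used only by the secondary step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

RequestKey = Tuple[str, str]  # (HTTP method, path); hypothetical key shape


@dataclass
class Candidate:
    controller: str                  # name of a response controller
    applies: Callable[[dict], bool]  # request-time check, used only in step 2


# Step 1 data: assumed to be computed at build time, like hook_menu today.
PRIMARY_TABLE: Dict[RequestKey, List[Candidate]] = {
    ("GET", "node/5"): [
        Candidate("article_page", lambda ctx: ctx.get("node_type") == "article"),
        Candidate("default_node_page", lambda ctx: True),
    ],
    ("GET", "about"): [Candidate("static_page", lambda ctx: True)],
}


def route(method: str, path: str, context: dict) -> str:
    """Return the name of the single response controller for a request."""
    # Primary routing: a cheap lookup on fixed request criteria.
    candidates = PRIMARY_TABLE.get((method, path), [])
    if not candidates:
        raise LookupError(f"no route for {method} {path}")
    if len(candidates) == 1:
        return candidates[0].controller  # one match, so we are done

    # Secondary routing: request-time code narrows the list to one.
    for candidate in candidates:
        if candidate.applies(context):
            return candidate.controller
    raise LookupError(f"no candidate accepted {method} {path}")
```

In this sketch the derived context (here a `node_type` key) is only consulted when the primary lookup returns more than one candidate, which is the cost split the conversation was after.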

Primary routing

First off, we all agreed that primary routing should be pluggable. Since we're already building a plugin system, it seems like a reasonable system to use. :-) In fact, arguably the routing is simply the Mapper for the Response Controller Plugin Type. (Check the definitions page for what each of those pieces is.) Making it pluggable has a number of advantages:

  1. It forces a clean separation of concerns using a system and pattern that we're going to be using throughout Drupal.
  2. While the default implementation will almost certainly be SQL-based, there's no reason why one shouldn't be able to reimplement it using MongoDB as a backend, or some other system.
  3. In fact, one very important alternate implementation would be "hard coded". If we view the installer and updater as not one-off hacky scripts but as simply alternative routers with their own configuration, then we can vastly simplify their code. Those systems no longer need to deal with an alternate "state", they just respond to an alternate, pluggable router. It also means that, in essence, Drupal CMS, the Drupal Installer, and the Drupal Updater become three separate applications built on top of Drupal Framework. Which is just all kinds of hot and sexy.
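
To make the pluggability argument concrete, here is a rough sketch of two interchangeable routers behind one interface. It is Python for illustration only, and the `Router` protocol, `TableRouter`, `HardCodedRouter`, and `dispatch` are all made-up names; the real Mapper interface has not been defined yet.

```python
from typing import Dict, List, Protocol, Tuple


class Router(Protocol):
    """Hypothetical interface: map a request to candidate controllers."""

    def lookup(self, method: str, path: str) -> List[str]: ...


class TableRouter:
    """Stand-in for the default, table-backed (e.g. SQL) implementation."""

    def __init__(self, table: Dict[Tuple[str, str], List[str]]) -> None:
        self._table = table

    def lookup(self, method: str, path: str) -> List[str]:
        return list(self._table.get((method, path), []))


class HardCodedRouter:
    """Installer-style router: every request goes to one fixed controller."""

    def __init__(self, controller: str) -> None:
        self._controller = controller

    def lookup(self, method: str, path: str) -> List[str]:
        return [self._controller]


def dispatch(router: Router, method: str, path: str) -> List[str]:
    # The caller does not care which implementation it was handed.
    return router.lookup(method, path)


cms = TableRouter({("GET", "node/5"): ["node_page"]})
installer = HardCodedRouter("install_wizard")
```

Swapping `cms` for `installer` changes the application without touching `dispatch`, which is the sense in which the installer and updater become separate applications on the same framework.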

We went around a couple of times on what the primary routing criteria should be. We didn't come to a firm conclusion, but the closest we got to a consensus was:

  • Domain
  • Path (or rather path pattern, as we do now)
  • HTTP Method (GET, POST, PUT, etc.)
  • Content-Type (viz. text/html, text/json, image/jpeg, etc.)

Note that Language cannot be a primary routing criterion since it is derived context; language could be influenced by all kinds of things like user preferences.

All of those elements are available directly in the HTTP request itself. Together, they uniquely identify a REST resource (domain, path), what to do with it (method), and how (content type). Even just those four attributes give us vastly more flexibility than we have now in Drupal. The trivial, degenerate implementation would look like simply adding Domain, Method, and ContentType columns to the menu_router table. (We'll likely do much more than that, but you get the idea.)
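
As a rough picture of that degenerate implementation, the sketch below uses Python's built-in `sqlite3` module with an in-memory table. The table name `router`, its columns, and the controller names are assumptions for illustration, not the actual menu_router schema, and the path is matched exactly rather than by pattern to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE router (
        domain       TEXT NOT NULL,
        path         TEXT NOT NULL,
        method       TEXT NOT NULL,
        content_type TEXT NOT NULL,
        controller   TEXT NOT NULL
    )
    """
)
# Hypothetical rows; in practice these would be generated at build time.
conn.executemany(
    "INSERT INTO router VALUES (?, ?, ?, ?, ?)",
    [
        ("example.com", "node/5", "GET", "text/html", "node_html_view"),
        ("example.com", "node/5", "GET", "text/json", "node_json_view"),
        ("example.com", "node/5", "PUT", "text/json", "node_json_save"),
    ],
)


def primary_lookup(domain: str, path: str, method: str, content_type: str) -> list:
    """Return every controller registered for the four primary criteria."""
    cursor = conn.execute(
        "SELECT controller FROM router "
        "WHERE domain = ? AND path = ? AND method = ? AND content_type = ? "
        "ORDER BY controller",
        (domain, path, method, content_type),
    )
    return [row[0] for row in cursor.fetchall()]
```

The same path resolves to different controllers depending on method and content type, which is all the extra columns buy us in this simplest form.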

Most interestingly, it allows for multiple operations on a single URL.

  • GET node/5, type: text/html: Return to me the HTML page of node 5.
  • POST node/5, type: text/html: Submit this data to node 5.
  • GET node/5, type: text/json: Return to me the JSON version of node 5.
  • PUT node/5, type: text/json: Here's a JSONified version of a node, save it to node 5.
  • GET node/5, type: application/pdf: Return to me node 5 rendered as a PDF.
  • DELETE node/5, type: *: Delete node 5 (assuming proper permissions)
  • GET some/view, type: xml/atom: Return to me that view, using the Atom display plugin.
  • GET some/view, type: xml/rss: Return to me that view, using the RSS display plugin.
  • POST node/5, type: drupal/form: Submit this Drupal form at node/5 (proposed???)

And so forth. The domain part means, amusingly, that we've just put part of Domain Access into core almost for free. Ken will be very chagrined to hear that. :-)

Another important factor is that with that richer information we can return far more useful and appropriate error messages. For instance, if someone sends a PUT node/5 request, and there is no PUT handler registered but there is a GET handler registered, then an HTTP 404 is not the correct response. The correct response is HTTP 405 Method Not Allowed. If the request is for text/json, then sending back a big HTML error page is simply flat out wrong; an empty JSON response (or something sensible) should be sent back instead. There's considerable potential for performance improvements if we can send back a proper HTTP 302 (Found), 303 (See Other), or 304 (Not Modified) response very early before the rest of Drupal initializes.
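
The 404 versus 405 distinction can be sketched like this. It is a minimal Python illustration with made-up names (`ROUTES`, `status_for`, `error_body`) and assumes the router knows which methods are registered per path. The `Allow` header on the 405 response comes from the HTTP specification; the error body formats are placeholders for "something sensible".

```python
import json
from typing import Dict, Set, Tuple

# Hypothetical registry: path -> methods that have a handler.
ROUTES: Dict[str, Set[str]] = {"node/5": {"GET", "POST"}}


def status_for(method: str, path: str) -> Tuple[int, Dict[str, str]]:
    """Pick the status code and extra headers for a request."""
    methods = ROUTES.get(path)
    if methods is None:
        return 404, {}  # nothing is registered at this path at all
    if method not in methods:
        # The resource exists, but not for this method.
        return 405, {"Allow": ", ".join(sorted(methods))}
    return 200, {}


def error_body(status: int, content_type: str) -> str:
    """Format an error in the representation the client asked for."""
    if content_type == "text/json":
        return json.dumps({"status": status})
    return f"<h1>Error {status}</h1>"
```

An Ajax client asking for text/json then gets a small JSON document it can parse, rather than an HTML page dumped into an alert box.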

This would be extremely useful for REST applications, but also for our own mundane Ajax usage. If you've ever used the Views or Panels UIs and had an Ajax response come back as an incomprehensible pile of garbage in an alert box, you should be very excited about the possibilities this opens. :-) The best part is that this is all existing parts of the HTTP standard; we're just not bothering to use it yet.

One suggestion was to also include ETags in the primary routing information, but I'm not sure if that's viable since an ETag could depend on a huge number of factors that wouldn't be known at build time.

While I'm talking about nodes above in the examples, remember that at this level there are no nodes or entities; there is just data and requests to resources. Essentially, at this point we're architecting a Drupal-independent REST server.

Secondary routing

We have no idea yet what form this would take. :-) It needs to not be dog slow, and needs to be configurable from the UI, but beyond that I have no idea what this looks like. As Earl noted, however, "it's amazing what they assume will work", so we need to make sure that "anything" (not to be confused with "everything") can be used here. My thinking is that we want to make it reasonably fast for the common cases (entity bundle and language are the most common I can think of), but still allow people to shoot themselves in the foot with incredibly stupid and slow routing rules.

Other considerations

Another important question is access control. We didn't really cover that. Presumably access should be pluggable (duh), and bound to each route, not just each path. Someone could easily have access to GET node/5, but not PUT node/5. Right now that's governed by simple callback functions that are serialized into the menu_router table. We likely want something more robust than that, but it still needs to be thought through.
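
One possible shape for per-route access, purely as a sketch: the check is keyed on method plus path rather than path alone. The names `REQUIRED_PERMISSION` and `may_access` and the permission strings are hypothetical, and denying access when no rule exists is an assumption of this example, not something we decided.

```python
from typing import Dict, Set, Tuple

# Hypothetical mapping: (method, path) -> permission needed for that route.
REQUIRED_PERMISSION: Dict[Tuple[str, str], str] = {
    ("GET", "node/5"): "view content",
    ("PUT", "node/5"): "edit content",
}


def may_access(user_permissions: Set[str], method: str, path: str) -> bool:
    """Check access for one route; unknown routes are denied (assumption)."""
    required = REQUIRED_PERMISSION.get((method, path))
    return required is not None and required in user_permissions
```

A user holding only "view content" passes the check for GET node/5 and fails it for PUT node/5, even though the path is identical.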

Also relevant is error handling. While breaking up the routing information as above means that we can return a separate 4xx error type for each method or content type or domain, we still need to determine how to do so in a performant fashion that isn't butt ugly.

We also need to determine how we support wildcards. In the degenerate case, we wouldn't want to have to record the domain of the site into every record. That makes the site less portable. So at the very least we will need to make that optional and/or allow an "any" key.
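
A minimal sketch of the "any" key, assuming an exact domain match should win over the wildcard. The `ANY` marker, the table contents, and the `lookup` function are hypothetical, and the precedence rule is my assumption rather than anything we settled.

```python
from typing import Dict, Optional, Tuple

ANY = "*"  # hypothetical marker meaning "matches every domain"

TABLE: Dict[Tuple[str, str], str] = {
    (ANY, "node/5"): "node_view",
    ("admin.example.com", "node/5"): "node_admin_view",
}


def lookup(domain: str, path: str) -> Optional[str]:
    """Prefer a domain-specific record, then fall back to the wildcard."""
    for key in ((domain, path), (ANY, path)):
        if key in TABLE:
            return TABLE[key]
    return None
```

With this shape, most records never mention a domain, so the routing data stays portable when the site moves to a different host name.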

While our primary implementation will likely be SQL, as noted above we need to ensure that the semantics we define will still work on other systems (hard coded, MongoDB, Cassandra, whatever).

There are still some open questions, but I really like the direction this conversation went. It provides us with an extremely robust under-pinning to do lots of things Right(tm) with respect to HTTP, which is the lifeblood of Drupal I/O.

Discuss. :-)

