By Zed A. Shaw

Mongrel2: State Machines, HTTP->0MQ, Events

Today I'm dropping a fresh Mongrel2 that features a completely redesigned connection management algorithm that uses a bad ass Finite State Machine to keep everything straight. This state machine will make it possible for Mongrel2 to keep-alive connections no matter what kind of backend is requested and allow for developers to inject their own filters on the events that manage connections. This release also features a first (hackish) working HTTP->0MQ protocol based on SCGI and a simple demo.

I'm very excited about the connection state machine because it means that in addition to the original highly accurate Mongrel HTTP parser, I've now got a connection state system that's just as accurate. It will let Mongrel2 support bizarre proxy configurations, keep-alive state, any kind of backends, HTTP long poll operations, JSSocket, and WebSockets.

As I worked on it this last week I started stumbling into extra features I got for free with this design. HTTP long poll and keep-alive from HTTP->0MQ are probably the two sexiest. Thanks to this design, long polling isn't a special feature, it's just how things work. It also means that HTTP->0MQ has the ability to do N:M processing similar to the current chat demo but using plain HTTP.

What I'm most excited about is that, since the state machine is controlled by simple integer events through a fast Ragel FSM, people can write filters based on the events and the callbacks. In the same way people grabbed the Mongrel HTTP Parser and used it to build web servers, this new connection state machine will let you extend Mongrel2's internal connection state processing to handle whatever hairy problems you meet.

HTTP->0MQ

First up, I got HTTP->0MQ going. It's gross, but it works. If you go to this test you can see it in action. What this does is take your HTTP request, translate it to SCGI, and then hand it to a 0MQ backend. That backend is then just echoing back your headers and such, so it doesn't do much. Here's the code for the handler (which is a total hack):

import zmq
import time

sender_id = "82209006-86FF-4982-B5EA-D1E29E55D481"


ctx = zmq.Context()
reqs = ctx.socket(zmq.SUB)
reqs.setsockopt(zmq.SUBSCRIBE, "")
reqs.connect("tcp://127.0.0.1:9997")

resp = ctx.socket(zmq.PUB)
resp.connect("tcp://127.0.0.1:9996")
resp.setsockopt(zmq.IDENTITY, sender_id)

class Request(object):

    def __init__(self, ident, headers, body):
        self.ident = ident
        self.headers = headers
        self.body = body


def parse_request(msg):
    ident, rest = msg.split(' ', 1)
    length, rest = rest.split(':', 1)
    length = int(length)
    headers = rest[0:length]
    headers = headers.split('\01')[:-1]
    headers = dict(zip(headers[::2], headers[1::2]))
    body = rest[length+1:]

    return Request(ident, headers, body)


while True:
    req = parse_request(reqs.recv())

    response = "\nIDENT:%r\nHEADERS:%r\nBODY:%r" % (req.ident, req.headers, req.body)

    print response, "\n"

    resp.send(req.ident + " HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n%s" % (
        len(response), response))

As you can see, it is not doing much, and it is not doing it that well. In fact, all of this will change because what I really want is a more unified 0MQ transport that will work between different backends better. This is just to get the feature out the door and try it.
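To make the message layout that parse_request expects a little more concrete, here is a round trip written for Python 3 that needs no 0MQ sockets at all. It assumes only what the handler above implies: the connection ident, a space, the length of the header blob, a colon, the headers as key/value fields each terminated by the same '\01' character the handler splits on, one separator character, and then the body. The build_message helper and the comma used as the separator are assumptions made for illustration, not part of Mongrel2.

```python
SEP = "\x01"  # same character as the '\01' the handler above splits on


def build_message(ident, headers, body):
    """Hypothetical helper: pack a request the way parse_request expects it."""
    blob = "".join(key + SEP + value + SEP for key, value in headers.items())
    # One character (a comma here, by assumption) sits between headers and body.
    return "%s %d:%s,%s" % (ident, len(blob), blob, body)


def parse_request(msg):
    """Same steps as the handler above, returning a plain tuple."""
    ident, rest = msg.split(" ", 1)
    length, rest = rest.split(":", 1)
    length = int(length)
    fields = rest[:length].split(SEP)[:-1]  # drop the empty tail after the last SEP
    headers = dict(zip(fields[::2], fields[1::2]))
    body = rest[length + 1:]                # skip the single separator character
    return ident, headers, body


msg = build_message("5", {"PATH": "/test", "METHOD": "GET"}, "hello")
ident, headers, body = parse_request(msg)
```

Since this format is described as a hack that will change, treat it as a way to poke at the parsing logic rather than a protocol reference.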

The surprising part is that, because of the state machine design, the HTTP connection is also in keep-alive mode inside Mongrel2, and the backend can send data to any currently connected browser.

I got long polling for free. Bad ass.

I personally think all these keep-alive hacks are pathetic, considering how huge of a hack they are in almost every other web server there is. In other web servers, getting long polling up and running is like a major holiday with cake and ice cream. In Mongrel2, this kind of asynchronous multiple response keep-alive operation is like...Tuesday.

Remember though, this is a total hack right now. It's going to get cleaned up quite a lot, but the fact that the feature works better than I even planned is really fun.

To test out that your connection is in keep-alive mode, go hit the test page and then refresh real quick a few times. See how your ident number stays the same? That's actually your socket number inside Mongrel2, so when it changes your browser reconnected. I think there's a socket leak somewhere, but I'll fix that soon. The key point is that your connections are maintained as long as possible for the most speed.

Connection State Machine

When I first got proxying going I had a bug because I wasn't maintaining connection state properly. If you went to the proxy test and then hit the file serve test page you would randomly get a page from the proxy backend. The reason was keep-alives. I was just doing simple proxying where, once a request came in, I parsed the first HTTP header, figured out where it should go, and then held on for death shuttling everything between the browser and the backend.

That's just bad design because HTTP has many different states it has to be in and you need to switch between them without disrupting the browser's connection. If you proxy, and then a request comes in for some other resource, you can't use the proxy to get it, you have to get it separately. You also can't keep connections to proxies open forever, since you can overload them and cause problems. Browsers also flake out so you need to know exactly when to shut things down, and when not to send requests.

After thinking about it, I thought I'd try using a Ragel State Chart similar to how I did connection state with Utu back in the day. The way this works is instead of writing code with lots of if-statements for every possible edge case, you define three things:

  • A state machine that defines how a connection is managed.
  • Events that cause the state machine to change states.
  • Callbacks that have to do work to make the state machine do stuff.

Keeping control of complex state like in an HTTP connection is dead easy in a state machine, assuming you can get one defined. The problem with an FSM is that it forces you to sit down and really get what you want to happen straight. You can't half-ass an if-statement or switch because the FSM has to be complete or it won't work. This makes them harder to use at first, but much easier to deal with once you've defined them and simplified them.
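Those three pieces fit in a few lines if you skip Ragel and use a plain lookup table. This is only a toy sketch in Python: the three states, the event numbers, and the callback names are made up for the example and are much smaller than the real Mongrel2 machine.

```python
# States and events are just small integers, as in the Ragel version.
IDLE, RESPONDING, CLOSED = range(3)
REQ_RECV, RESP_SENT, CLOSE = 101, 102, 103  # made-up numbers for this sketch

log = []


def serve(conn):
    log.append("serving %s" % conn)


def parse(conn):
    log.append("waiting for next request on %s" % conn)


def close(conn):
    log.append("closing %s" % conn)


# (state, event) -> (callback, next state). Anything missing is an error.
TRANSITIONS = {
    (IDLE, REQ_RECV): (serve, RESPONDING),
    (RESPONDING, RESP_SENT): (parse, IDLE),
    (IDLE, CLOSE): (close, CLOSED),
    (RESPONDING, CLOSE): (close, CLOSED),
}


def step(state, event, conn):
    """Run the callback for this transition and return the new state."""
    if (state, event) not in TRANSITIONS:
        raise ValueError("no transition from state %d on event %d" % (state, event))
    callback, next_state = TRANSITIONS[(state, event)]
    callback(conn)
    return next_state


# Two keep-alive requests on one connection, then a close.
state = IDLE
for event in (REQ_RECV, RESP_SENT, REQ_RECV, RESP_SENT, CLOSE):
    state = step(state, event, "conn-1")
```

The table is the "has to be complete" part: any state and event pair that isn't listed fails loudly instead of falling through a forgotten else branch.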

The Benefits Of An FSM

Right away as I worked on this design it found bugs and gave me "magic" features for free. For example, because the connection state is designed so it continually processes requests and knows exactly when the socket is closed, Mongrel2 can do keep-alives even when the backend doesn't. This means a browser making an HTTP request doesn't have to care that it goes to a 0MQ handler, it just keeps the connection open.

When I wrote the HTTP->0MQ support, it sort of worked right away with keep-alives. A browser would connect, 0MQ would respond, and then the next request just kept using the same connection. That means... I got long polling for free. To test it out I did different tests where one browser tossed messages to others using the basic HTTP->0MQ support.

It was easy, and no special gear was needed, unlike with other implementations.

Debugging turned out to be simple too. The unit tests just look like this:

// Simulates doing a basic HTTP request then closing the connection.
RUN(http_dir,
        OPEN, ACCEPT,
        REQ_RECV, HTTP_REQ, DIRECTORY, RESP_SENT, CLOSE);

// Simulates two keep-alive handler requests then a close.
RUN(http_handler,
        OPEN, ACCEPT,
        REQ_RECV, HTTP_REQ, HANDLER, REQ_SENT,
        REQ_RECV, HTTP_REQ, HANDLER, REQ_SENT, CLOSE);

To make sure the FSM works, I just have tests that feed it the events for different situations.

Since the FSM can log every state transition and why it's doing what it does, I can also pinpoint failures and figure out what should happen next.

Overall, getting this FSM right is much easier than using other methods, and should support future changes easily.
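If you want to play with this style of test without building Mongrel2, the same idea can be modeled in Python. The table below is hand-copied from the Connection chart shown later in this post, with two simplifications that are assumptions of this sketch: "Identifying" is a made-up name for the unnamed point between REQ_RECV and the request type event, and the PROXY branch is left out because it hands off to the separate Proxy machine.

```python
# (state, event) -> next state, for the Connection machine only.
CONNECTION = {
    ("start", "OPEN"): "Accepting",
    ("Accepting", "ACCEPT"): "Idle",
    ("Idle", "REQ_RECV"): "Identifying",
    ("Idle", "CLOSE"): "final",
    ("Identifying", "HTTP_REQ"): "HTTPRouting",
    ("Identifying", "MSG_REQ"): "MSGRouting",
    ("Identifying", "SOCKET_REQ"): "Responding",
    ("MSGRouting", "HANDLER"): "Queueing",
    ("HTTPRouting", "HANDLER"): "Queueing",
    ("HTTPRouting", "DIRECTORY"): "Responding",
    ("HTTPRouting", "CLOSE"): "final",
    ("Queueing", "REQ_SENT"): "Idle",
    ("Responding", "RESP_SENT"): "Idle",
    ("Responding", "CLOSE"): "final",
}


def run(*events):
    """Feed events in order. Returns (ok, trace); ok only if we end in final."""
    state, trace = "start", []
    for event in events:
        next_state = CONNECTION.get((state, event))
        trace.append((state, event, next_state))  # every transition is logged
        if next_state is None:
            return False, trace                   # no such transition: fail here
        state = next_state
    return state == "final", trace


# The two test cases above, as event names.
http_dir, _ = run("OPEN", "ACCEPT",
                  "REQ_RECV", "HTTP_REQ", "DIRECTORY", "RESP_SENT", "CLOSE")
http_handler, _ = run("OPEN", "ACCEPT",
                      "REQ_RECV", "HTTP_REQ", "HANDLER", "REQ_SENT",
                      "REQ_RECV", "HTTP_REQ", "HANDLER", "REQ_SENT", "CLOSE")

# A broken sequence: the trace shows exactly where it went wrong.
broken, broken_trace = run("OPEN", "REQ_RECV")
```

The last entry of broken_trace is the state and event that had nowhere to go, which is the kind of pinpointing the transition log gives you.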

How The State Machine Works

To give you an idea of how the Mongrel2 state machine works, here's how it runs one:

void Connection_task(void *v)
{
    Connection *conn = (Connection *)v;
    int i = 0;
    int next = 0;

    State_init(&conn->state, &CONN_ACTIONS);

    for(i = 0, next = OPEN; next != CLOSE; i++) {
        next = State_exec(&conn->state, next, (void *)conn);
        check(next > EVENT_START && next < EVENT_END, "!!! Invalid next event[%d]: %d", i, next);
    }

    State_exec(&conn->state, CLOSE, (void *)conn);
    return;

error:
    State_exec(&conn->state, CLOSE, (void *)conn);
    return;
}

This is the whole thing in C. All this does is initialize the state machine, and then loop, feeding each event into the state machine and getting the next event back. Once it gets a CLOSE event, or an event outside the valid range (such as 0) that trips the check, it drops out and finishes up by running CLOSE one last time.
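The same control flow can be modeled in Python to see how the events chain together. Everything here is a stand-in: state_exec only imitates the "take an event, return the next event" contract of State_exec, and the event numbers, the WORK event, and the "broken" flag are invented for the sketch.

```python
EVENT_START, OPEN, WORK, CLOSE, EVENT_END = 100, 101, 102, 103, 104  # made up


def state_exec(conn, event):
    """Stand-in for State_exec: handle `event`, return the next event."""
    conn["seen"].append(event)
    if event == OPEN:
        return WORK
    if event == WORK:
        if conn["broken"]:
            return 0            # simulate a callback that fails
        conn["requests"] -= 1
        return WORK if conn["requests"] > 0 else CLOSE
    return 0                    # nothing follows CLOSE


def connection_task(conn):
    event = OPEN
    while event != CLOSE:
        event = state_exec(conn, event)
        if not (EVENT_START < event < EVENT_END):
            break               # same job as check() jumping to error:
    state_exec(conn, CLOSE)     # normal and error paths both end with CLOSE
    return conn["seen"]


good = connection_task({"seen": [], "requests": 2, "broken": False})
bad = connection_task({"seen": [], "requests": 2, "broken": True})
```

In the good run the connection sees OPEN, two WORK events, and CLOSE. In the bad run the out-of-range 0 cuts the loop short, but CLOSE still runs so the connection gets cleaned up.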

Of course what this is running is much bigger, but the end result wasn't that much bigger than the code when I started. It's under 800 lines of code:

$ wc -l src/connection.c src/state.rl src/state_machine.rl 
567 src/connection.c
129 src/state.rl
64 src/state_machine.rl
760 total

It will get bigger, but the fact that this redesign is about the same as the previous design but has full keep-alives, correct proxying, directory serving, and functioning HTTP->0MQ while the other code could barely proxy is awesome.

So What?

Alright, so apparently I've got this thing I feed integers into and it does stuff. How does this impact a developer who has to use it? My design idea is that the state machine will help developers on three levels using Mongrel2:

  1. Exact understanding and debugging of what is going on for each connection.
  2. Total control over connections by filtering or changing their events in the middle of processing.
  3. Augmenting connection management with your own logic to compensate for the "rails effect".

Bad Ass Debugging

Take a look at the image of the state machine which is generated by Ragel. Try to see if you can figure out what might happen at different conditions while Mongrel2 is running. Maybe you can see what happens when a connection to a proxy is interrupted by a request for a backend HANDLER?

Nobody is expecting you to refer to this diagram as you use Mongrel2, but imagine if you needed to find out why something is failing. Right now your web server is a total black box. You got no idea what's going on unless you turn on some obtuse insane logging mode.

This image is actually how the state machine flows, and it can be generated from the code so you know what's going on. The code however is also pretty readable and understandable, take a look:

Proxy := (
        start: ( 
           CONNECT @proxy_deliver -> Sending |
           FAILED @proxy_failed -> Closing
        ),

        Proxying: (
            HTTP_REQ @proxy_deliver -> Sending |
            PROXY @proxy_exit_routing |
            HANDLER @proxy_exit_routing |
            DIRECTORY @proxy_exit_routing |
            REMOTE_CLOSE @proxy_close -> Closing
        ),

        Sending: (
            REQ_SENT @proxy_parse -> Proxying |
            REMOTE_CLOSE @proxy_close -> Closing
        ),

        Closing: (
            CLOSE @proxy_exit_idle
        )

     )  <err(error);


Connection = (
        start: ( OPEN @open -> Accepting ),

        Accepting: ( ACCEPT @parse -> Idle ),

        Idle: (
            REQ_RECV @identify_request HTTP_REQ @route_request -> HTTPRouting |
            REQ_RECV @identify_request MSG_REQ @route_request -> MSGRouting |
            REQ_RECV @identify_request SOCKET_REQ @send_socket_response -> Responding |
            CLOSE @close -> final
        ),

        MSGRouting: ( HANDLER @msg_to_handler -> Queueing ),

        HTTPRouting: (
            HANDLER @http_to_handler -> Queueing |
            PROXY @http_to_proxy  |
            DIRECTORY @http_to_directory -> Responding |
            CLOSE @close -> final
        ),

        Queueing: ( REQ_SENT @parse -> Idle ),

        Responding: (
            RESP_SENT @parse -> Idle |
            CLOSE @close -> final
        )

        ) %eof(finish) <err(error);

Despite the slightly odd syntax, hopefully you could figure out what causes transitions, what callbacks go off, and what states the machine can get in during processing.

The first advantage is very clear: It will be possible to give people direct control and fast debugging of connections based on the events that Mongrel2 is processing. For example:

  • You could have a console that keeps track of the number of events and spits out histograms of them, or other stats for basic information. That's stats right down to the processing loop of the server.
  • You could keep track of all the connections coming in and what state they're in.
  • You could kill connections that get into states you see as dead.
  • Restarts could target specific states rather than just blanket cover all connections.
  • You can kick connections that either eat up too many events, or do too many too fast. That means a kind of per-connection CPU counter in a way, to avoid abuse.
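A couple of these ideas, the histogram and the per-connection event budget, need nothing more than counters. Here is a rough Python sketch of how that bookkeeping might look. The EventStats class, its budget parameter, and the connection ids are all hypothetical, and nothing like this exists in Mongrel2 yet.

```python
from collections import Counter


class EventStats:
    """Counts events overall and per connection; flags connections over budget."""

    def __init__(self, budget):
        self.budget = budget       # max events allowed per connection
        self.totals = Counter()    # event name -> count, for histograms
        self.per_conn = Counter()  # connection id -> events seen

    def record(self, conn_id, event):
        """Record one event; return True if the connection should be kicked."""
        self.totals[event] += 1
        self.per_conn[conn_id] += 1
        return self.per_conn[conn_id] > self.budget

    def histogram(self):
        """One text line per event, most frequent first."""
        return ["%-10s %s" % (name, "#" * count)
                for name, count in self.totals.most_common()]


stats = EventStats(budget=4)
kicked = []
feed = [(1, "OPEN"), (1, "ACCEPT"), (2, "OPEN"), (1, "REQ_RECV"),
        (1, "HTTP_REQ"), (1, "REQ_RECV"), (2, "ACCEPT")]
for conn_id, event in feed:
    if stats.record(conn_id, event) and conn_id not in kicked:
        kicked.append(conn_id)
```

Connection 1 goes over its budget on its fifth event and lands in kicked, while connection 2 stays well under. A rate limit ("too many too fast") would need timestamps on top of this.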

Right now the Mongrel2 I'm running is actually logging the hell out of everything, and I can see what connections are in keep-alive, where they're proxied, who's closed, when they get closed, the works.

This design will also hopefully end the debates about how a web server should work. In other servers they have to incrementally tweak and bolt on features to support newly found edge cases. With this design, we'll be able to pinpoint changes based on the FSM, and figure out how to support new edge cases inside it, or even exactly why we should reject the identified edge case.

Controlling Events

Because the event processing is just a sequence of integers, it'll be possible for you to manage Mongrel2's events on the fly in situations where you need emergency control. Imagine being able to say something along the lines of:

Crap! We're overloaded, filter all HTTP_REQ so that HANDLER, PROXY, and DIRECTORY are sent to the maintenance handler.

In current parlance that's setting up a maintenance page, but in other servers you have to configure some arbitrary file, then figure out the path magic that makes it happen, and then put the file there when you want, oh and on all the different servers.

With Mongrel2, I'm hoping you could just send out a command that says the above using just the event names, and have it affect all the machines.

There aren't very many events either, here's all of them so far:

    ACCEPT=101,
    CLOSE=102,
    CONNECT=103,
    DIRECTORY=104,
    FAILED=105,
    HANDLER=106,
    HTTP_REQ=107,
    MSG_REQ=108,
    OPEN=110,
    PROXY=111,
    REMOTE_CLOSE=112,
    REQ_RECV=113,
    REQ_SENT=114,
    RESP_SENT=116,
    SOCKET_REQ=117,
    TIMEOUT=118,

This is a bit of hand waving, since I'm not sure how the hell a command language with these would look, but I know they'd work great.
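Leaving the command language aside, the rewrite underneath that maintenance example could be as small as this Python sketch. The event numbers are copied from the list above, but the filter itself, the idea of pairing each event with a target name, and the "maintenance" handler name are assumptions for illustration only.

```python
from enum import IntEnum


class Event(IntEnum):
    # Only the events this sketch needs, numbered as in the list above.
    DIRECTORY = 104
    HANDLER = 106
    HTTP_REQ = 107
    MSG_REQ = 108
    PROXY = 111
    REQ_RECV = 113


ROUTING = {Event.HANDLER, Event.PROXY, Event.DIRECTORY}


def maintenance_filter(events, target="maintenance"):
    """Yield (event, target) pairs, steering every HTTP route to one handler."""
    routing_http = False
    for event in events:
        if routing_http and event in ROUTING:
            yield Event.HANDLER, target  # rewritten routing decision
        else:
            yield event, None            # passed through untouched
        routing_http = (event == Event.HTTP_REQ)


stream = [Event.REQ_RECV, Event.HTTP_REQ, Event.PROXY,
          Event.REQ_RECV, Event.HTTP_REQ, Event.DIRECTORY,
          Event.REQ_RECV, Event.MSG_REQ, Event.HANDLER]
out = list(maintenance_filter(stream))
```

Only the routing event that directly follows an HTTP_REQ is rewritten, so the MSG_REQ at the end of the stream still reaches its normal HANDLER.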

Modules, Filters, And The "Rails Effect"

Imagine you've written a web framework, and it leaks memory like the Titanic leaks ice water. Imagine also that this web framework leaks memory because the language you chose to use is written by a bunch of hack hobbyists who had many bugs in their garbage collector and arrays, but denied there were any bugs at all. There was no way you'd ever fix this memory leak, and you can't tell everyone your super popular web framework is built on a sand pit, so you need a backup plan.

You need your web server to keep your framework alive using some kind of logic like this:

  1. If a request comes in.
  2. And out of 10 backends none are available.
  3. Then hold this connection until one becomes available.
  4. And send that request to the first open one.
  5. But if the backend dies, restart it first.
  6. Oh, and check that it's not using too much ram.
  7. Oh, and like scale up 10 more if we're overloaded.

All joking aside, I'm aiming Mongrel2 at this kind of stupidity I call "the rails effect". This is where the arrogance of the framework requires that the operations people have to suffer through supporting whatever harebrained crap they think up to keep from having to actually fix their broken gear.

With Mongrel2, the plan is to allow you to create modules like, say, in Apache, except that these modules:

  • Can hook into events and filter them, change them, or alter connections based on them.
  • Can get called before/after different callbacks to change how they operate.
  • Can completely replace the callbacks in special hardcore requirements.
  • Can be either loadable .so files, or external handlers that are specially written.

The idea is that you have the events mentioned above, so you can register callbacks that are fed the events before the state machine gets them. That'll let you alter them as you need. Maybe you want to authenticate people before they go to any proxies? Filter the PROXY event and change it to redirect if they haven't authenticated.

This idea is fairly powerful because the configuration could be dead simple. You just say what events the module gets, and maybe for what handlers. It is also a basic primitive for implementing other features like page caching, memcache caching, security, conditional serving, and all using one common concept: the event.
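As a rough picture of "say what events the module gets", here is a Python sketch of a registry that runs filters before the state machine sees an event. The Filters class, the require_login filter, the connection dict, and the REDIRECT event are all hypothetical. REDIRECT in particular is not in the event list above.

```python
class Filters:
    """Registry of per-event filters that run before the state machine."""

    def __init__(self):
        self.by_event = {}

    def register(self, event, func):
        """Ask for `func` to be called whenever `event` comes through."""
        self.by_event.setdefault(event, []).append(func)

    def apply(self, event, conn):
        """Return the event the state machine should actually receive."""
        for func in self.by_event.get(event, []):
            replacement = func(event, conn)
            if replacement != event:
                return replacement  # first filter to change the event wins
        return event


def require_login(event, conn):
    # Let authenticated connections through, bounce everyone else.
    return event if conn.get("user") else "REDIRECT"


filters = Filters()
filters.register("PROXY", require_login)

anonymous = filters.apply("PROXY", {})
logged_in = filters.apply("PROXY", {"user": "zed"})
untouched = filters.apply("DIRECTORY", {})
```

The configuration for a module like this really is just a list of event names, which is what makes the idea attractive.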

Events are nice and all, but you might also need to have things that happen deeper in the server, like you want to change out the protocol used to talk to Handlers because you like BIRT. Well, here's all of the callbacks the state machine uses:

StateActions CONN_ACTIONS = {
    .open = connection_open,
    .error = connection_error,
    .finish = connection_finish,
    .close = connection_close,
    .parse = connection_parse,
    .identify_request = connection_identify_request,
    .route_request = connection_route_request,
    .send_socket_response = connection_send_socket_response,
    .msg_to_handler = connection_msg_to_handler,
    .http_to_handler = connection_http_to_handler,
    .http_to_proxy = connection_http_to_proxy,
    .http_to_directory = connection_http_to_directory,
    .proxy_deliver = connection_proxy_deliver,
    .proxy_failed = connection_proxy_failed,
    .proxy_parse = connection_proxy_parse,
    .proxy_close = connection_proxy_close
};

It would be possible to let special modules "wrap" these calls, either altering them directly, or completely swapping them out. They all have a consistent call structure and are mostly fairly small, averaging about 10-20 lines of C code. With this you could inject SSL encryption at certain stages, secure wipes, client certificate checks, whatever you need to control the server.
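Wrapping could look something like the following Python sketch, where a dict stands in for the StateActions struct. The wrap function, the before/after hooks, and the idea that callbacks return the next event name are assumptions made for the example, not the actual module API (which doesn't exist yet).

```python
calls = []


def connection_parse(conn):
    calls.append("parse")
    return "REQ_RECV"


def connection_close(conn):
    calls.append("close")
    return "CLOSE"


# Python stand-in for the StateActions struct: name -> callback.
actions = {"parse": connection_parse, "close": connection_close}


def wrap(actions, name, before=None, after=None):
    """Replace actions[name] with a version that runs hooks around the original."""
    original = actions[name]

    def wrapped(conn):
        if before:
            before(conn)
        result = original(conn)
        if after:
            after(conn, result)
        return result

    actions[name] = wrapped


wrap(actions, "parse",
     before=lambda conn: calls.append("check client cert"),
     after=lambda conn, result: calls.append("logged %s" % result))

result = actions["parse"]({})
```

Because every callback has the same call structure, one wrap function covers all of them. Swapping a callback out completely is just assigning a different function to the same slot.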

However, all of this could also be safer than in other web servers because the state machine is completely defined. You'd know right away when your module caused problems, and why it fails, and assuming you didn't nuke the process, Mongrel2 could keep on trucking. Especially if the module is using 0MQ to do unix socket communication.

The gist of it all is that by having Mongrel2's connection state and main processing easily controlled by a few callbacks and some events, I can expose that safely to deployments that need it.

What's Next

This new design is showing a lot of promise, but I need to cycle on it and see if I can simplify it more while adding the remaining features. I also need to figure out the module system and exactly how you'd configure them in the sqlite3 database. The nice thing is that configuration is mostly just using the events, so it'll be cake. The bad thing is I'm not sure how to make these events usable yet, and maybe I'll just punt for a bit.

I also need to work on the generic unixy things like forking, daemonizing, chroot, etc. That's fairly easy, but it's getting close to the time I'll need it.

Finally, I gotta work on a Python driver for this so that I can quit writing this hack job Python code to make it work.

If you've got comments, shoot me an email or come hang out in #mongrel2 on irc.freenode.org (since that will probably stay up after the chat demo dies).