Sunday, May 13, 2012

Hello World with ØMQ (ZeroMQ), Part I

Why I am interested in ØMQ

Several months ago, I stumbled across an interview with Pieter Hintjens about ØMQ (ZeroMQ) in episode 195 of FLOSS Weekly by Randal L. Schwartz. From what I took away from the interview, ØMQ is a pretty powerful "message queue" system that is nevertheless implemented in a lightweight way, as a linkable library. There are also many language bindings, covering all the fashionable and many exotic languages (but, sadly, not Common Lisp).
I had heard about message queue systems for a long time, but had never really used any, and they always seemed a little scary. The currently popular message queue system seems to be RabbitMQ, and despite the cute name, I hear that it is somewhat big. At the same time, I'm sure that message queues serve a useful purpose, and may be a great basis for distributed systems with fewer reinvented wheels and, thus, better behavior (including performance), so they probably deserve a closer look. ØMQ seems to be successful in several respects, and at the same time "lightweight" enough that I can hope to understand something about it. That makes it an attractive system to investigate.

First steps

I have finally found time to start reading the ØMQ Guide during a train ride from Geneva to Zurich.
The introduction ("Fixing the world") at first looks a little pompous, but is in fact full of very good thoughts, both original and convincing, about big problems in programming large distributed software systems. Apparently ØMQ aspires at solving an important part of these problems. Judging from the introduction, the people who wrote this seem very smart. And from the interview, I know that the designers have had a lot of practical experience building real systems, and they knew the deficiencies (but also the achievements) of other messaging systems before they started rolling their own.

Walking through the "Hello World" example

So now I'm reading through the first, "Hello World", example in the guide, trying to understand as much as possible about ØMQ's concepts. The example starts with the server side (which responds "World"). I cut & paste the complete code into a file server.c, which compiles easily enough on my Ubuntu 12.04 system with libzmq-dev installed:
    : leinen@momp2[zmq]; gcc -c server.c
    : leinen@momp2[zmq]; gcc -o server server.o -lzmq
    : leinen@momp2[zmq]; 
When trying to understand an ØMQ API call, I first guess a little what the names and arguments could mean, then I look at the respective man page to validate my guesses and to learn the parts that I was unable to guess.
    void *context = zmq_init (1);
Why is the result type void *, rather than something like ZmqContext *? I'll explain in a moment why I would strongly prefer the latter.
And wouldn't it be nice if the size of the thread pool (the 1 in the call) could be left unspecified? The ØMQ system could either optimize it dynamically, or it could be controlled by standardized external configuration. But maybe this isn't really practical anyway.
    //  Socket to talk to clients
    void *responder = zmq_socket (context, ZMQ_REP);
The socket call is simple enough, taking just one argument beyond the required context: the socket type (here ZMQ_REP), "which determines the semantics of communication over the socket." So what does ZMQ_REP mean, and what other types are available? Aha, "REP" is for "reply", and the complementary type is ZMQ_REQ, for "request". So these must be for the standard request/response pattern in classical client/server protocols.
Other types include ZMQ_PUB/ZMQ_SUB for pub/sub protocols, ZMQ_PUSH/ZMQ_PULL for processing pipelines, and a few others related to, if I understand correctly, load balancing, request routing etc.
That's a great choice of patterns to support, because they cover a huge subspace of real-world socket applications.
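I haven't gotten to the client side of the example yet, but to make the REQ/REP pairing concrete, here is roughly what the complementary ZMQ_REQ side should look like with the same 2.x C API - a sketch written from the man pages, untested, and the variable names are mine:
    #include <zmq.h>
    #include <string.h>
    #include <stdio.h>

    int main (void)
    {
        void *context = zmq_init (1);
        //  ZMQ_REQ is the counterpart of ZMQ_REP: send one request,
        //  then wait for the single reply.
        void *requester = zmq_socket (context, ZMQ_REQ);
        zmq_connect (requester, "tcp://localhost:5555");

        zmq_msg_t request;
        zmq_msg_init_size (&request, 5);
        memcpy (zmq_msg_data (&request), "Hello", 5);
        zmq_send (requester, &request, 0);
        zmq_msg_close (&request);

        zmq_msg_t reply;
        zmq_msg_init (&reply);
        zmq_recv (requester, &reply, 0);    //  blocks until "World" arrives
        printf ("Received a %zu-byte reply\n", zmq_msg_size (&reply));
        zmq_msg_close (&reply);

        zmq_close (requester);
        zmq_term (context);
        return 0;
    }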
    zmq_bind (responder, "tcp://*:5555");
In passing, I notice that this call doesn't take a "context" argument. This probably means that it gets the context from the socket (here: "responder"). Convenient. But, coming back to my previous complaint, responder is also of type void *. So what happens when someone is only slightly confused and passes context to zmq_bind? The compiler certainly has no way of catching this. In fact, when I deliberately introduce this error, I find that even the library doesn't catch this at runtime! I just get a server program that mysteriously sits there and doesn't listen on any TCP port. That is really not nice. Maybe real programmers don't make these kinds of mistakes, but I have my doubts. As an old Lisper, I'm certainly not religious about static type checking. But type checking at some point would really be beneficial.
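Just to illustrate: the confused call compiles without a peep, and even a trivial pair of wrapper structs - purely hypothetical, not part of the ØMQ API - would have let the compiler catch it:
    //  This compiles cleanly and fails only by doing nothing useful:
    //      zmq_bind (context, "tcp://*:5555");

    //  Hypothetical one-word wrappers -- these types do NOT exist in ØMQ:
    typedef struct { void *ptr; } my_zmq_ctx_t;
    typedef struct { void *ptr; } my_zmq_sock_t;

    static my_zmq_sock_t my_socket (my_zmq_ctx_t ctx, int type)
    {
        my_zmq_sock_t s;
        s.ptr = zmq_socket (ctx.ptr, type);
        return s;
    }
    static int my_bind (my_zmq_sock_t sock, const char *endpoint)
    {
        return zmq_bind (sock.ptr, endpoint);
    }
    //  With these, my_bind (ctx, "tcp://*:5555") is a compile-time error.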
OK, back to the zmq_bind call. We have a responding socket, so we need to bind it to a "listening" port. The API uses URL-like strings to specify endpoint addresses (tcp://*:5555). This is fine, although I'm slightly worried whether the API developers try to adhere to any standards here (is there a standard for "tcp" URLs?), or whether they just make things up.
The URL-like string approach is certainly superior to what people using the C socket API have to put up with: Use the right sockaddr structures, use getaddrinfo() (and NOT use gethostbyname() anymore! :-), do address resolution error handling by hand, etc.
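For comparison, here is roughly what binding "TCP port 5555 on all addresses" looks like when done by hand - a from-memory sketch; real code would loop over all results from getaddrinfo() and decode errors properly:
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>
    #include <string.h>

    //  What zmq_bind (responder, "tcp://*:5555") spares us from:
    static int bind_tcp_5555 (void)
    {
        struct addrinfo hints, *res;
        memset (&hints, 0, sizeof hints);
        hints.ai_family = AF_UNSPEC;        //  IPv4 or IPv6
        hints.ai_socktype = SOCK_STREAM;    //  TCP
        hints.ai_flags = AI_PASSIVE;        //  wildcard address, i.e. "*"

        if (getaddrinfo (NULL, "5555", &hints, &res) != 0)
            return -1;                      //  report gai_strerror() by hand
        //  (Real code would try all results, not just the first.)
        int fd = socket (res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd < 0 || bind (fd, res->ai_addr, res->ai_addrlen) != 0)
            return -1;                      //  ... and decode errno as well
        listen (fd, SOMAXCONN);
        freeaddrinfo (res);
        return fd;
    }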

ØMQ could support IPv6, but does it?

One can easily imagine that the library just Does The Right Thing (DTRT) concerning multiple network-protocol support, e.g. that the above call results in a socket that accepts connections over both IPv4 and IPv6 if those are supported.
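At the raw-socket level, "the right thing" would presumably be a dual-stack listener: an IPv6 socket with IPV6_V6ONLY switched off also accepts IPv4 connections (as v4-mapped addresses) on most systems. This is just my guess at what the library could do internally, not ØMQ code:
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <string.h>

    static int bind_dual_stack_5555 (void)
    {
        int fd = socket (AF_INET6, SOCK_STREAM, 0);
        int off = 0;
        //  Also accept IPv4 connections, presented as v4-mapped addresses:
        setsockopt (fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof off);

        struct sockaddr_in6 sin6;
        memset (&sin6, 0, sizeof sin6);
        sin6.sin6_family = AF_INET6;
        sin6.sin6_port = htons (5555);
        sin6.sin6_addr = in6addr_any;       //  "*" over both protocols
        if (bind (fd, (struct sockaddr *) &sin6, sizeof sin6) != 0)
            return -1;
        listen (fd, SOMAXCONN);
        return fd;
    }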

Not just yet, it seems.

Unfortunately this doesn't seem to be the case, however: when I run the compiled server binary under system-call tracing, I get
    : leinen@momp2[zmq]; strace -e bind ./server
    bind(16, {sa_family=AF_INET, sin_port=htons(5555), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
      C-c C-c: leinen@momp2[zmq]; 
So this is obviously an IPv4-only socket. Still, if I look inside libzmq.a (in GNU Emacs in case you are curious, though nm | grep would also work), I notice that it references getaddrinfo, but not gethostbyname. So there is at least some chance that someone thought of IPv6. Maybe one has to set special options, or maybe the people who built the Ubuntu (or Debian) package forgot to activate IPv6, or whatever?
Looking at the man page of zmq_tcp ("ØMQ unicast transport using TCP"), it only talks about "IPv4 addresses". The way I interpret this is that they only support IPv4 right now, but at least they don't ignore the existence of IPv6; otherwise they would have just said "IP addresses" and still meant IPv4 only. So there is at least a weak hope that IPv6 could one day be supported.
When I briefly had connectivity (because the train stopped at a station with WiFi), I googled for [zmq ipv6] and found a few entries that suggest that there has been some work on this. Maybe the most promising result I got from Google was this:
[zeromq-dev] IPv6 support - Grokbase
http://grokbase.com/t/zeromq/zeromq-dev/118fjjmpmc/ipv6-support
15 Aug 2011 – (5 replies) Hi all, Steven McCoy's IPv6 patches were merged into the master. The change should be completely backward compatible.

So this is fairly new. I also noticed that the libzmq in my brand-new Ubuntu 12.04 is only version 2.1.11, and some other Google hits suggest that IPv6 support is planned for ØMQ 3.0. So when I get home, I'll check whether I can find ØMQ 3 or newer and use that in preference to the Ubuntu package.

What about multi-transport support?

So we have learned that multiple network-layer support (e.g. IPv4 and IPv6) seems possible, and even planned. What about support for multiple different transports? This is certainly not possible using low-level sockets, and it seems difficult, but not impossible, to provide it transparently through a single ØMQ-level socket. For example, a server endpoint could listen for requests on both TCP and SCTP connections.
I have a hunch that ØMQ doesn't support this, because the standard interface has only a single scalar socket-type argument. In some sense this is a pity, because having a single socket that supports multiple protocols would make it easier to write programs that live in a multi-protocol world, just like support for multiple network protocols in a single socket makes it easier to write programs that live in a world with multiple such protocols, like IPv4 and IPv6 these days.
Conceptually, ØMQ looks powerful enough to support this without too much pain. It might also be possible to build "Happy Eyeballs" functionality into the library, so that in such multi-protocol contexts, the library could make sure that a reasonably well-performing substrate is always used, so that developers using the library don't have to worry about possible problems when they add multi-protocol support.
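That said, if I read the zmq_bind man page correctly, a socket can at least be bound to several endpoints, so over the transports ØMQ already has (TCP, IPC, inproc), the server's bind step could simply become two binds - a sketch I haven't tried, and of course it says nothing about SCTP:
    //  One REP socket, reachable over two different transports
    //  (IPC stands in for the hypothetical second transport here):
    void *context = zmq_init (1);
    void *responder = zmq_socket (context, ZMQ_REP);
    zmq_bind (responder, "tcp://*:5555");
    zmq_bind (responder, "ipc:///tmp/hello-service");
    //  Requests arriving via either endpoint pop out of the same socket,
    //  so the reply loop doesn't need to know which transport was used.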

Swiss trains go too fast, or Switzerland is too small

So I only got through a whopping two lines of Hello World code for now. But I hope my digressions and confused thoughts didn't turn you off, and that you still agree with me that ØMQ is something worth watching. I sure hope I'll find the time to continue exploring it.

Tuesday, February 07, 2012

Chrome Beta for Android, first impressions

So Google published a beta of Chrome for Android.  It's only available for Android 4.0 "Ice Cream Sandwich", which caused many complaints.  I find this somewhat understandable, because Chrome uses fancy graphics, e.g. for the interface for switching between multiple tabs.  What I have a harder time understanding is why they restricted it to a handful of countries in Android Market.  Fortunately the comments on a Google+ post contain hints on how this can be circumvented - thanks, +AJ Stang!
First impressions from using this for a few minutes on a Nexus S: The multiple-tabs feature seems very powerful, and the UI for changing between them seems to work really well on a small mobile device.  Being able to use the Chrome development tools (profiler, DOM inspector etc.) over USB is also quite cool.  It does seem a little slower than the standard Web browser in Android though.  As a heroic experiment on myself I'm making this my default browser for now.

Friday, February 03, 2012

The inexorable growth of bandwidth, or lack thereof

I moved offices recently, so I threw away some old posters.  One of them was an old map of the GÉANT backbone.  At first glance, I wondered how old it was - the main backbone links were all 10Gb/s, much like today (a few links were upgraded to 2-3×10Gb/s or 40Gb/s last year).  To my surprise, the poster was from 2001.  So for ten of the last eleven years, the standard backbone link capacity for large research backbones has stagnated at 10Gb/s.  (I know other things have changed... these networks have become more "hybrid" and such.)
In a similar vein, the standard network interface for a 1RU or 2RU rackmount server was Gigabit Ethernet (1Gb/s) in 2001, and it is still GigE in 2012 - although servers generally have at least two and commonly four of them.  You can get servers with 10GE interfaces, but that is not the norm.
The main reason is probably that the 2001 upgrades to 10Gb/s (backbones) and GigE (servers) came "too early", or were based on overly optimistic assumptions about future growth.  Anybody remember Sidgmore's law?
But I think that demand has finally caught up, and that we're on the brink of moving beyond these steps.  For servers, it seems clear that GigE will be replaced by 10GE.  For this to happen, it needs to be on the motherboard, and probably in the twisted-pair variety.  For backbones, the preferred option in most circles seems to be to move to 100GE.  Personally I think that 40GE or even 10GE, combined with link bundling (as is already done in the commercial world), could be an interesting alternative for research networks as well.