Wednesday, June 25, 2014

Riding the SDN Hype Wave

In case you haven't noticed, Software-Defined Networking has become the guiding meme for most innovation in networks over the past few years.  It's a great meme because it sounds cool and slightly mysterious.  The notion was certainly inspired by Software-Defined Radio, which had a relatively well-defined meaning, and has since spread to other areas such as Software-Defined Storage and, coming soon, the Software-Defined Economy.
As a networking veteran who has fought in the OSI and ATM wars, I like making fun of these fads as much as the next person (buzzword bingo, anyone?). But actually, I consider the SDN hype a really good thing.  Why? Because, to quote Cisco CTO Padmasree Warrior, "networking is cool again", and that's not just good for her company but a breath of fresh air for the industry as a whole.
What I like in particular is that SDN (because nobody knows exactly what it means and where its limits are...) legitimizes ideas that would have quickly been rejected before ("it has been known for years that this doesn't scale/that this-and-that needs to be done in hardware/...").  Of course this also means that many tired ideas will get another chance by being rebranded as SDN, but personally I think that does relatively little damage.

SDN Beyond OpenFlow

The public perception of SDN has been pretty much driven by OpenFlow's vision of separating the forwarding plane (the "hardware" function) from the control plane, and using external software to drive networks, usually with a "logically centralized" control approach.
The Open Networking Foundation attempts to codify this close association of SDN and OpenFlow by publishing its own SDN definition.  OpenFlow has huge merits as a concrete proposal that can be (and is) implemented and used in real systems.  Therefore it deserves a lot of credit for making people take the "SDN" vision seriously.  But I think the SDN meme is too beautiful to be left confined to OpenFlow-based and "logically centralized" approaches.  I much prefer JR Rivers's (Cumulus Networks) suggestion for what SDN should be: "How can I write software to do things that used to be super hard and do them super easy?" That's certainly more inclusive!

x86 as a Viable High-Speed Packet Processing Platform

Something that I definitely consider an SDN approach is to revisit generic computing hardware (mostly defined as "x86" these days) and see what interesting packet processing you can do on such platforms.  It turns out that these boxes have come a long way over the past few years! In particular, recent Intel server CPUs (Sandy Bridge/Ivy Bridge) have massively increased memory bandwidth compared to previous generations, and they have CPU cores to spare.  On the interface front, most if not all of today's 10 Gigabit Ethernet adapters have many helpful performance features such as multiple receive/transmit queues, segmentation offload, hardware virtualization support and so on.  So is it now possible to do line-rate 10Gb/s packet processing on this platform?
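A quick back-of-the-envelope calculation shows what "line rate" actually demands.  Assuming minimum-size Ethernet frames and a 3 GHz core (both numbers purely illustrative, not measurements of any particular box):
    /* budget.c - per-packet cycle budget at 10Gb/s line rate.
       Assumes 64-byte frames plus 20 bytes of preamble and
       inter-frame gap on the wire, and a (hypothetical) 3 GHz core. */
    #include <stdio.h>

    int main (void)
    {
        double line_rate_bps  = 10e9;
        double bits_per_frame = (64 + 20) * 8;             /* 672 bits */
        double pps            = line_rate_bps / bits_per_frame;
        double cpu_hz         = 3e9;

        printf ("packets per second: %.2f million\n", pps / 1e6); /* ~14.88 */
        printf ("cycles per packet : %.0f\n", cpu_hz / pps);      /* ~200   */
        return 0;
    }
That works out to roughly 200 CPU cycles per minimum-size packet, which is why per-packet system calls and cache misses are fatal at these rates, and why batching and polling the NIC directly become so attractive.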
The dirty secret is that even the leading vendors of ASIC-based backbone routers already use regular multi-core CPUs for high-performance middleboxes such as firewalls, where previous product generations had to use network processors, FPGAs and/or ASICs that were expensive to design and program.
Intel has its DPDK (Data Plane Development Kit) to support high-performance applications using their network adapters and processors, and there are several existence proofs now that you can do interesting packet processing on multiple 10Gb/s interfaces using one core or less per interface—and you can get many of those cores in fairly inexpensive boxes.

Snabb Switch

One of my favorite projects in this space is Luke Gorrie's Snabb Switch.  If CPU-based forwarding approaches are at the fringe of SDN, Snabb Switch is at the fringe of CPU-based forwarding approaches... hm, maybe I just like being different.
Snabb Switch is based on the Lua scripting language and on the excellent LuaJIT implementation.  It runs entirely in user space, which means that it can avoid all the user/kernel interface issues that make high performance difficult, but also means that it has to implement its own device drivers in user space! Fortunately Lua is a much friendlier platform for developing those, and one of Luke's not-so-secret missions for Snabb is that "writing device drivers should be fun again".
The Snabb Switch project has gained a lot of traction over the year or so since its inception.  A large operator is investigating its use in an ambitious backbone/NFV project; high-performance integration with the QEMU/KVM hypervisor has been upstreamed; and the integration into OpenStack Networking is making good progress, with some hope of significant parts being integrated for the "Juno" release.  And my colleague and long-time backbone engineering teammate Alex Gall is developing a standalone L2VPN (Ethernet over IPv6) appliance based on Snabb Switch, with the ultimate goal of removing our only business requirement for MPLS in the backbone.  Now that should convince even the curmudgeons who fought in the X.25 wars, no?
The final proof that Snabb Switch's world domination is inevitable is that it was featured in the pilot episode of Ivan Pepelnjak's new Software Gone Wild podcast.
(In fact that is the very reason for this post, because yours truly also had a (small) appearance in that episode, and I had to give up the address of my blog... and now I'm afraid that some of the venerable readers of Ivan's blog will follow the link and find that nothing has been posted here lately, even less so related to networking.  Welcome anyway!)

Come on in, the water's fine!

So turn your bullshit detectors way down, embrace the hype, and enjoy the ride! There are plenty of good ideas waiting to be implemented once we free ourselves from the rule of those ASIC-wielding vertically integrated giga-companies...

Sunday, May 13, 2012

Hello World with ØMQ (ZeroMQ), Part I

Why I am interested in ØMQ

Several months ago, I stumbled across an interview with Pieter Hintjens about ØMQ (ZeroMQ) in episode 195 of FLOSS Weekly by Randal L. Schwartz. From what I gathered from the interview, ØMQ is a pretty powerful "message queue" system that is somehow implemented in a light-weight way as a linkable library. There are also many language bindings, covering all the fashionable and many exotic languages (but, sadly, not Common Lisp).
I had heard about message queue systems for a long time, but had never really used any, and they always seemed a little scary. The currently popular message queue system seems to be RabbitMQ, and despite the cute name, I hear that it is somewhat big. At the same time, I'm sure that message queues serve a useful purpose, and may be a great basis for distributed systems with fewer reinvented wheels and, thus, better behavior (including performance), so they probably deserve a closer look. And ØMQ seems to be successful in several respects, and at the same time "lightweight" enough for me to understand something. This makes it an attractive system to investigate.

First steps

I have finally found time to start reading the ØMQ Guide during a train ride from Geneva to Zurich.
The introduction ("Fixing the world") at first looks a little pompous, but is in fact full of very good thoughts, both original and convincing, about big problems in programming large distributed software systems. Apparently ØMQ aspires to solve an important part of these problems. Judging from the introduction, the people who wrote this seem very smart. And from the interview, I know that the designers have had a lot of practical experience building real systems, and that they knew the deficiencies (but also the achievements) of other messaging systems before they started rolling their own.

Walking through the "Hello World" example

So now I'm reading through the first, "Hello World", example in the guide, trying to understand as much as possible about ØMQ's concepts. The example starts with the server side (which responds "World"). I cut & paste the complete code to a file server.c, which compiles easily enough on my Ubuntu 12.04 system with libzmq-dev installed:
    : leinen@momp2[zmq]; gcc -c server.c
    : leinen@momp2[zmq]; gcc -o server server.o -lzmq
    : leinen@momp2[zmq]; 
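For context, server.c boils down to something like the following. (This is reconstructed from memory of the guide's example against the 2.x API that Ubuntu ships, so treat it as a sketch rather than a verbatim copy of the guide.)
    //  Hello World server: expects "Hello", replies with "World".
    //  Uses the 2.x-style zmq_recv/zmq_send that operate on zmq_msg_t.
    #include <zmq.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main (void)
    {
        void *context = zmq_init (1);

        //  Socket to talk to clients
        void *responder = zmq_socket (context, ZMQ_REP);
        zmq_bind (responder, "tcp://*:5555");

        while (1) {
            //  Wait for the next request from a client
            zmq_msg_t request;
            zmq_msg_init (&request);
            zmq_recv (responder, &request, 0);
            printf ("Received Hello\n");
            zmq_msg_close (&request);

            //  Do some "work"
            sleep (1);

            //  Send the reply back to the client
            zmq_msg_t reply;
            zmq_msg_init_size (&reply, 5);
            memcpy (zmq_msg_data (&reply), "World", 5);
            zmq_send (responder, &reply, 0);
            zmq_msg_close (&reply);
        }
        //  Never reached
        zmq_close (responder);
        zmq_term (context);
        return 0;
    }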
When trying to understand an ØMQ API call, I first guess a little what the names and arguments could mean, then I look at the respective man page to validate my guesses and to learn the parts that I was unable to guess.
    void *context = zmq_init (1);
Why is the result type void *, rather than something like ZmqContext *? I'll explain in a moment why I would strongly prefer the latter.
And wouldn't it be nice if the size of the thread pool (the 1 in the call) could be left unspecified? The ØMQ system could either optimize it dynamically, or it could be controlled by standardized external configuration. But maybe this isn't really practical anyway.
    //  Socket to talk to clients
    void *responder = zmq_socket (context, ZMQ_REP);
The socket call is simple enough, taking just one argument beyond the required context - the socket type (here: ZMQ_REP), "which determines the semantics of communication over the socket." So what does ZMQ_REP mean, and what other types are available? Aha, "REP" is for "reply", and the complementary type is ZMQ_REQ, for "request". So these must be for the standard request/response pattern in classical client/server protocols.
Other types include ZMQ_PUB/ZMQ_SUB for pub/sub protocols, ZMQ_PUSH/ZMQ_PULL for processing pipelines, and a few others related to, if I understand correctly, load balancing, request routing etc.
That's a great choice of patterns to support, because they cover a huge subspace of real-world socket applications.
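For the sake of completeness, the matching ZMQ_REQ client looks roughly like this (again sketched from memory against the 2.x API, so details may differ from the guide's listing):
    //  Hello World client: sends "Hello", expects "World" back.
    #include <zmq.h>
    #include <stdio.h>
    #include <string.h>

    int main (void)
    {
        void *context = zmq_init (1);
        void *requester = zmq_socket (context, ZMQ_REQ);
        zmq_connect (requester, "tcp://localhost:5555");

        int request_nbr;
        for (request_nbr = 0; request_nbr != 10; request_nbr++) {
            //  Build and send the request
            zmq_msg_t request;
            zmq_msg_init_size (&request, 5);
            memcpy (zmq_msg_data (&request), "Hello", 5);
            printf ("Sending Hello %d...\n", request_nbr);
            zmq_send (requester, &request, 0);
            zmq_msg_close (&request);

            //  Wait for the reply
            zmq_msg_t reply;
            zmq_msg_init (&reply);
            zmq_recv (requester, &reply, 0);
            printf ("Received World %d\n", request_nbr);
            zmq_msg_close (&reply);
        }
        zmq_close (requester);
        zmq_term (context);
        return 0;
    }
But back to walking through the server code.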
    zmq_bind (responder, "tcp://*:5555");
In passing, I notice that this call doesn't take a "context" argument. This probably means that it gets the context from the socket (here: "responder"). Convenient. But, coming back to my previous complaint, responder is also of type void *. So what happens when someone is only slightly confused and passes context to zmq_bind? The compiler certainly has no way of catching this. In fact, when I deliberately introduce this error, I find that even the library doesn't catch this at runtime! I just get a server program that mysteriously sits there and doesn't listen on any TCP port. That is really not nice. Maybe real programmers don't make these kinds of mistakes, but I have my doubts. As an old Lisper, I'm certainly not religious about static type checking. But type checking at some point would really be beneficial.
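Concretely, the "error" I introduced was just changing the bind line in server.c to this (everything else unchanged):
    //  Wrong on purpose: passing the context where a socket is expected.
    //  Both are plain "void *", so the compiler cannot object, and in my
    //  test the library did not complain at runtime either - the server
    //  simply never listens on port 5555.
    zmq_bind (context, "tcp://*:5555");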
OK, back to the zmq_bind call. We have a responding socket, so we need to bind it to a "listening" port. The API uses URL-like strings to specify endpoint addresses (tcp://*:5555). This is fine, although I'm slightly worried whether the API developers try to adhere to any standards here (is there a standard for "tcp" URLs?), or whether they just make things up.
The URL-like string approach is certainly superior to what people using the C socket API have to put up with: Use the right sockaddr structures, use getaddrinfo() (and NOT use gethostbyname() anymore! :-), do address resolution error handling by hand, etc.
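Just as a reminder of what that boilerplate looks like, a plain-socket listener for the same "tcp://*:5555" endpoint is roughly this hypothetical helper (generic C, nothing ØMQ-specific, all error handling omitted):
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>

    int listen_on_5555 (void)
    {
        struct addrinfo hints, *res;
        int fd;

        memset (&hints, 0, sizeof hints);
        hints.ai_family   = AF_UNSPEC;    /* let getaddrinfo pick IPv4/IPv6 */
        hints.ai_socktype = SOCK_STREAM;
        hints.ai_flags    = AI_PASSIVE;   /* wildcard address, i.e. the "*" */
        getaddrinfo (NULL, "5555", &hints, &res);

        fd = socket (res->ai_family, res->ai_socktype, res->ai_protocol);
        bind (fd, res->ai_addr, res->ai_addrlen);
        listen (fd, SOMAXCONN);
        freeaddrinfo (res);
        return fd;
    }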

ØMQ could support IPv6, but does it?

One can easily imagine that the library just Does The Right Thing (DTRT) concerning multiple network-protocol support, e.g. that the above call results in a socket that accepts connections over both IPv4 and IPv6 if those are supported.
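In plain-socket terms, that hoped-for behavior corresponds to an AF_INET6 wildcard socket with IPV6_V6ONLY switched off, so that IPv4 clients arrive as v4-mapped addresses - something like this hypothetical helper (error handling omitted):
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int listen_dual_stack (unsigned short port)
    {
        int fd = socket (AF_INET6, SOCK_STREAM, 0);
        int off = 0;
        struct sockaddr_in6 sin6;

        /* Accept IPv4 connections too (as v4-mapped IPv6 addresses). */
        setsockopt (fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof off);

        memset (&sin6, 0, sizeof sin6);
        sin6.sin6_family = AF_INET6;
        sin6.sin6_port   = htons (port);
        sin6.sin6_addr   = in6addr_any;
        bind (fd, (struct sockaddr *) &sin6, sizeof sin6);
        listen (fd, SOMAXCONN);
        return fd;
    }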

Not just yet, it seems.

Unfortunately, this doesn't seem to be the case: when I run the compiled server.c binary under system-call tracing, I get
    : leinen@momp2[zmq]; strace -e bind ./server
    bind(16, {sa_family=AF_INET, sin_port=htons(5555), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
      C-c C-c: leinen@momp2[zmq]; 
So this is obviously an IPv4-only socket. Still, if I look inside libzmq.a (in GNU Emacs in case you are curious, though nm | grep would also work), I notice that it references getaddrinfo, but not gethostbyname. So there is at least some chance that someone thought of IPv6. Maybe one has to set special options, or maybe the people who built the Ubuntu (or Debian) package forgot to activate IPv6, or whatever?
Looking at the man page of zmq_tcp ("ØMQ unicast transport using TCP"), I see that it only talks about "IPv4 addresses". The way I interpret this is that they only support IPv4 right now, but at least they don't ignore the existence of IPv6; otherwise they would have just said "IP addresses" and still meant IPv4 only. So there is at least a weak hope that IPv6 could one day be supported.
When I briefly had connectivity (because the train stopped at a station with WiFi), I googled for [zmq ipv6] and found a few entries that suggest that there has been some work on this. Maybe the most promising result I got from Google was this:
[zeromq-dev] IPv6 support - Grokbase
http://grokbase.com/t/zeromq/zeromq-dev/118fjjmpmc/ipv6-support
15 Aug 2011 – (5 replies) Hi all, Steven McCoy's IPv6 patches were merged into the master. The change should be completely backward compatible.

So this is fairly new. I also noticed that the libzmq in my brand-new Ubuntu 12.04 is only version 2.1.11, and some other Google hits suggest that IPv6 support is planned for ØMQ 3.0. So when I get home I'll check whether I can find ØMQ 3 or newer, and use that in preference to the Ubuntu package.

What about multi-transport support?

So we learned that multiple network-layer support (e.g. IPv4 and IPv6) seems possible, and even planned. What about support for multiple different transports? This is certainly not possible using low-level sockets, and it seems difficult, but not impossible, to provide this transparently using a single ØMQ-level socket. For example, a server endpoint could listen for requests on both TCP and SCTP connections.
I have a hunch that ØMQ doesn't support this, because the standard interface has only a single scalar socket-type argument. In some sense this is a pity, because having a single socket that supports multiple transport protocols would make it easier to write programs that live in a multi-protocol world, just like support for multiple network protocols in a single socket makes it easier to write programs for a world with both IPv4 and IPv6.
Conceptually, ØMQ looks powerful enough to support this without too much pain. It might also be possible to build "Happy Eyeballs" functionality into the library, so that in such multi-protocol contexts, the library could make sure that a reasonably well-performing substrate is always used, so that developers using the library don't have to worry about possible problems when they add multi-protocol support.
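One related thing I noticed while poking around (to be verified when I have connectivity again): it looks as though a single ØMQ socket can be bound to several endpoints by calling zmq_bind repeatedly, which at least covers the transports ØMQ itself implements, though not a hypothetical TCP-plus-SCTP combination. Something like:
    //  Sketch (unverified): one REP socket reachable over several
    //  ØMQ transports at once.  Still no SCTP, of course.
    void *responder = zmq_socket (context, ZMQ_REP);
    zmq_bind (responder, "tcp://*:5555");      //  remote clients over TCP
    zmq_bind (responder, "ipc:///tmp/hello");  //  local clients via IPC
    zmq_bind (responder, "inproc://hello");    //  threads in this process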

Swiss trains go too fast, or Switzerland is too small

So I only got through a whopping two lines of Hello World code for now. But I hope my asides and confused thoughts didn't turn you off, and that you still agree with me that ØMQ is something worth watching. I sure hope I'll find the time to continue exploring it.

Tuesday, February 07, 2012

Chrome Beta for Android, first impressions

So Google published a beta of Chrome for Android.  It's only available for Android 4.0 "Ice Cream Sandwich", which caused many complaints.  I find this restriction somewhat understandable, because Chrome uses fancy graphics, e.g. for the interface for switching between multiple tabs.  What I have a harder time understanding is why they restricted it to a handful of countries in Android Market.  Fortunately the comments on a Google+ post contain hints on how this can be circumvented - thanks, +AJ Stang!
First impressions from using this for a few minutes on a Nexus S: The multiple-tabs feature seems very powerful, and the UI for switching between them seems to work really well on a small mobile device.  Being able to use the Chrome development tools (profiler, DOM inspector etc.) over USB is also quite cool.  It does seem a little slower than the standard Web browser in Android though.  As a heroic experiment on myself, I'm making this my default browser for now.

Friday, February 03, 2012

The inexorable growth of bandwidth, or lack thereof

I moved offices recently, so I threw away some old posters.  One of them was an old map of the GÉANT backbone.  At first glance, I wondered how old it was - the main backbone links were all 10Gb/s, much like today (a few links were upgraded to 2-3*10Gb/s or 40Gb/s last year).  To my surprise, the poster was from 2001.  So for ten of the last eleven years, the standard backbone link capacity for large research backbones has stagnated at 10 Gb/s.  (I know other things have changed... these networks have become more "hybrid" and stuff.)
In a similar vein, the standard network interface for a 1RU or 2RU rackmount server was Gigabit Ethernet (1Gb/s) in 2001, and it is still GigE in 2012 - although servers now generally have at least two and commonly four of them.  You can get servers with 10GE interfaces, but this is not the norm.
The main reason is probably that the upgrades to 10Gb/s or GigE in 2001 were "too early", or based on too optimistic assumptions of future growth.  Anybody remember Sidgmore's law?
But I think that demand has eventually caught up, and that we're on the brink of moving beyond these steps.  For servers, it seems clear that GigE will be replaced by 10GE.  For this to happen, it needs to be on the motherboard, and probably in the twisted-pair variety.  For backbones, the preferred option in most circles seems to be to move to 100GE.  Personally I think that 40GE or even 10GE, in connection with link bundling (as is already done in the commercial world) could be interesting alternative options, also for research networks.

Thursday, September 02, 2010

Traceroute puzzle

Which ISPs appear in the following traceroute, and where do they interconnect?

  1 swiCE3-10GE-1-4.switch.ch (130.59.36.210) 0 msec # local AS: 559
  2 bb1.tor.primus.ca (195.69.145.154) [AS 3549] 108 msec
  3 216.254.129.3 (216.254.129.3) [AS 6407] 108 msec
  4 www.primus.ca (216.254.141.10) [AS 6407] 104 msec