Tuesday, November 18, 2025

Cutting Payload aka Packet Trimming—the CRISPR-Cas9 of Network Flow Control?

One of my tasks in the (backbone) network team at Switch involves identifying and following "trends" outside our bubble that might somehow end up affecting our business. A current trend with possible impact is "AI"—maybe you have heard of it—which today means mostly machine learning using (deep) artificial neural networks, and in particular large language models (LLMs).

There are already noticeable changes in how datacenters are built to accommodate LLMs—check out what OCP has been doing in the past few years. These changes include (datacenter) networks, which must now support the needs of large GPU clusters, in particular for LLM training. To use those high-performance—and high-cost—GPUs effectively, one wants to be able to copy large amounts of data between GPUs anywhere in the cluster, which today might contain 32768 GPUs or so, with some operators aiming for a scale of millions. These transfers typically use RDMA, i.e. remote memory-to-memory transfers without CPU involvement. There are significant efforts to make this work over some sort of standardized Ethernet-based network. This could be done with RoCEv2 (RDMA over Converged Ethernet, version 2), where the "Converged" hints at some modifications beyond traditional Ethernet.

Some techniques were already considered for addition to Ethernet for datacenter networks before the LLM hype wave, for example PFC (priority flow control) to support "lossless" operation of the network for specific subsets of traffic, or novel ECMP (equal-cost multipath) approaches that don't necessarily preserve per-flow ordering (using "spraying" instead).

Recently I was excited when I heard about another technique to be added: packet "trimming". It reminded me of a great (video) presentation by Mark Handley (UCL) from SIGCOMM 2017. I'll add the link later. For now I'll refer to this work as "the NDP paper".

I remember being impressed when I saw the video, and I also found that it was well received. Just having a paper accepted at SIGCOMM is one of the highest achievements for (Internet-oriented) network researchers, and this paper was awarded best paper that year. I also remember thinking something along the lines of... "wow, that was an awesome presentation and must have been the talk of the conference... but what are the chances of this being implemented in production networks within my lifetime?". Focused on "classical" Internet backbones and enterprise networks, I thought that this was just a bit too revolutionary/"disruptive" to have much of a chance in our environment, which has shown strong resistance to seemingly much simpler changes (AQM, ECN, making addresses wider...). But 7–8 years later, and thanks to the investment craze kicked off by ChatGPT, maybe the time is ripe!

One thing I was wondering about was whether the NDP paper was the first to introduce "trimming". Upon re-watching the video after several years (and understanding a bit more than at previous attempts :-) I caught Mark referring to previous work from Tsinghua as the inspiration for the trimming. A bit of searching turned up this NSDI 2014 presentation of the paper "Catch the Whole Lot in an Action: Rapid Precise Packet Loss Notification in Data Centers" by Peng Cheng (then a Ph.D. student at Tsinghua University, now at Microsoft Research Asia) et al. It includes a nice introduction where Peng Cheng talks about cooking and his use of scissors as a multipurpose tool, suggesting that his "cut payload" approach could be similarly broadly useful. (A bit like the CRISPR "gene scissors" in bioengineering today, no?) Maybe Peng Cheng et al.'s work could be considered for a "test of time" award.

So where are we now, in 2025? Looks like concepts from the NSDI 2014 and SIGCOMM 2017 papers (trimming, spraying...) will be added to the Ultra Ethernet standard. It is not clear to me how the trimming capability will be used for congestion management. Maybe not at all in the initial standard?

Thinking a bit further: Assuming that trimming becomes a standard function in data center switches, could we use it in other network contexts? For example, to reduce "bufferbloat" for broadband users, or generally in the wide-area Internet... If people see potential for this, then I guess some work (including standardization work, presumably in the IETF) would be needed to ensure that trimmed packets can safely be sent over the Internet, as well as work on Internet-deployable transport protocols (based on something like NDP?) that can make use of them.

My intuition says that there is some potential there, and if done well, this novel idea could become a useful addition to the Internet.

But one assumption that justifies trimming is that "serialization delay" (i.e. the time it takes to put the bits of a packet on the wire) is significant relative to "propagation time" (the time it takes the packet to travel through the network). This can be true for networks that are relatively small in geographical size—for example within a data center—or for networks that have relatively "slow" (low-bitrate) links; or both, of course! (There is also a dependence on packet size, but for now let's assume that the maximum packet size is similar for all networks, between ~1500 and ~9000 bytes. This is the case in practice today, although it would be interesting to study to what extent trimming could make large-MTU networks more viable.)

So assuming that trimming makes sense in a DC network where end-to-end round trips take on the order of a microsecond (100m distance) and link speeds are on the order of 100Gb/s, it should make just as much sense on a network with one millisecond RTT and links of 100Mb/s, or one with 100ms RTT and links of 1Mb/s. OK, such delay/rate combinations aren't seen much in classical "research & education networking" these days. But some interesting link technologies used at the network "edge" may sometimes exhibit such low rates—Wi-Fi when the station is far away from the access point, or low-energy short/medium-distance wireless.
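
To make the scaling argument concrete, here is a back-of-the-envelope check in Lisp (all numbers are assumed example values, nothing measured):

    ;; Ratio of serialization delay to round-trip time.
    (defun serialization-vs-rtt (packet-bytes rate-bps rtt-seconds)
      (/ (/ (* 8 packet-bytes) rate-bps) rtt-seconds))

    ;; (serialization-vs-rtt 1500 100d9 1d-6)  => ~0.12  ; DC: 100Gb/s, ~1us RTT
    ;; (serialization-vs-rtt 1500 100d6 1d-3)  => ~0.12  ; 100Mb/s, 1ms RTT
    ;; (serialization-vs-rtt 1500 1d6 0.1d0)   => ~0.12  ; 1Mb/s, 100ms RTT

The ratio comes out the same in all three regimes, which is exactly the point.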

My gut feeling is that packet trimming may become commonplace in data center networks, provided that the AI "summer" continues for a few years. It may disappear again because the complexities (integrating it with higher-level protocols) may outweigh the gains. But it's encouraging that such "outside-the-box" ideas can gain traction in the market—when the stars align just right.

Oh yeah, here is the promised link to Mark Handley's presentation: Re-architecting datacenter networks and stacks for low latency and high performance, SIGCOMM 2017 proceedings. See the "supplementary material" section for the video.

Sunday, April 16, 2023

Google Code Jam 2023 in Lisp

Since 2009, I have occasionally participated in Google's Code Jam competitions, usually programming in Lisp. Google decided to discontinue Code Jam along with its other coding competitions. So this year there was a "Farewell Round", held on a single day (yesterday): actually four rounds (A, B, C, D) during the same long (four-hour) timeslot. Because I didn't feel very competitive or confident, I mainly tackled the problems from the "A" round, assuming that those were the easiest ones.

Thanks to the generous timeslot, I managed to solve all five problems in Round A.  You can find them on the gcj2023 project on my personal GitLab server.

Colliding Encoding: The task is to check whether there are collisions (duplicates) in a list of words after encoding with a lossy cipher function (mapping the 26 alphabetic letters to 10 decimal digits). I map the strings, sort them, and walk through the result comparing neighboring entries—iff some pair of neighbors matches, there are collisions. Code: a/1.lisp
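
The core idea fits in a few lines. Here is a minimal sketch for illustration (my reconstruction, not the actual a/1.lisp); CIPHER is assumed to be a 26-character string giving the digit for each letter A-Z:

    ;; Return true iff two words collide under the letter->digit cipher.
    (defun collisions-p (cipher words)
      (let ((encoded (sort (mapcar (lambda (w)
                                     (map 'string
                                          (lambda (c)
                                            (char cipher (- (char-code c)
                                                            (char-code #\A))))
                                          w))
                                   words)
                           #'string<)))
        ;; after sorting, any collision shows up as equal neighbors
        (loop for (a b) on encoded
              thereis (and b (string= a b)))))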

Illumination Optimization: Given a set of lampposts at positions Xi along a road, how many lightbulbs of illumination radius R are needed to light the entire stretch of road from 0 to M? You go through the lamppost positions in order and skip those that are already illuminated by the previous lamp. My solution (a/2.lisp) tries both directions, in case one is "more optimal" than the other :-)
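
For illustration, here is a sketch of one direction of such a greedy, based on my reading of the problem (bulbs may only be mounted on lampposts; XS is the sorted vector of positions); this is not the actual a/2.lisp:

    ;; Number of radius-R bulbs needed to light [0, M], or NIL if impossible.
    (defun bulbs-needed (m r xs)
      (let ((covered 0) (bulbs 0) (i 0) (n (length xs)))
        (loop while (< covered m)
              do ;; furthest lamppost whose light reaches back to COVERED
                 (let ((best nil))
                   (loop while (and (< i n) (<= (- (aref xs i) r) covered))
                         do (setf best (aref xs i)) (incf i))
                   (when (or (null best) (<= (+ best r) covered))
                     (return-from bulbs-needed nil)) ; gap that no lamp covers
                   (setf covered (+ best r))
                   (incf bulbs)))
        bulbs))

    ;; (bulbs-needed 10 3 #(2 8)) => 2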

Rainbow Sort: On first read, I found it a bit confusing, because it talks about colors, card positions, and ordering numbers, all represented as integers; so I put it off until after the others. In the end it turned out to be quite easy. My solution (a/3.lisp) is a simple recursive function using a hash table to store colors that have already been seen.

ASCII Art: I quickly figured out that there must be a solution that involves just computation and no searching or construction. But it took me quite a bit of time (easily more than an hour) and trial/error until I removed all off-by-one errors. The solution (a/4.lisp) ended up simple enough, and solved all sets on the first attempt.
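
From memory (so take this restatement with a grain of salt), the task boils down to finding the N-th character of the infinite sequence A, B, ..., Z, AA, BB, ..., ZZ, AAA, ..., where block k repeats each letter k times. A hypothetical sketch of the closed-form idea (not the actual a/4.lisp):

    ;; N-th (1-based) character of A..Z, AA..ZZ, AAA..ZZZ, ...
    (defun ascii-art-char (n)
      (loop for k from 1                  ; block k holds 26*k characters
            for total = (* 13 k (1+ k))   ; 26*k*(k+1)/2 characters so far
            when (<= n total)
              return (let* ((before (* 13 (1- k) k)) ; chars before block k
                            (offset (- n before 1))) ; 0-based within block
                       (code-char (+ (char-code #\A) (floor offset k))))))

    ;; (ascii-art-char 30) => #\B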

Untie: Here again, I quickly figured out that I can simply ignore all sub-sequences that have no repetitions. For every run of N equal symbols, you can always fix the tie by flipping ⌊N/2⌋ symbols. I convinced myself that this is always possible without creating additional collisions, thanks to the fact that you have two symbols to choose from. The trickiest part is that we're dealing with a circle rather than a sequence with a start and an end. My approach was to start at a point where the symbol changes—if there is no such point, then the whole circle is a single run of length N (the total length of the sequence), and in that special case you actually need to flip ⌈N/2⌉ symbols. Other than that, my recursive solution (a/5.lisp) looks relatively simple, and solved all cases on the first attempt.
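
Counting the required flips along those lines could look like the following iterative sketch of the same idea (not my actual, recursive a/5.lisp):

    ;; Minimal number of flips for the circular string S.
    (defun untie-flips (s)
      (let* ((n (length s))
             ;; an index whose symbol differs from its circular predecessor
             (start (loop for i below n
                          when (char/= (char s i) (char s (mod (+ i n -1) n)))
                            return i)))
        (if (null start)
            (ceiling n 2) ; the whole circle is one run: ceil(N/2) flips
            ;; walk the circle once from START, adding floor(len/2) per run
            (loop with flips = 0 and run = 1
                  for k from 1 below n
                  for cur = (char s (mod (+ start k) n))
                  for prev = (char s (mod (+ start k -1) n))
                  do (if (char= cur prev)
                         (incf run)
                         (progn (incf flips (floor run 2)) (setf run 1)))
                  finally (incf flips (floor run 2))
                          (return flips)))))

    ;; (untie-flips "RRP") => 1, (untie-flips "RRRR") => 2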

By solving all problem sets in time, I ended up at rank 734 for that round, which made me quite happy! With some (but too little) time remaining, I tried my hand at two problems from the more difficult rounds, but didn't manage to solve either of them in time.

Game Sort (Round C): This didn't look that hard, but my code consistently solved only the tiny sample set, and failed even the small set on all my attempts. Maybe I'll pick up my broken code (c/1.lisp) again and try to find and fix what's wrong with it—or, more likely, with my initial understanding of the problem and its possible solution.

Spacious Sets (Round B): My first iteration of code solved the small set, but ran into a timeout on the large set. I had to revise it to avoid quadratic complexity. I first thought of "memoization" to optimize away repeated iterations; in the process of implementing that, I discovered a nice linear initial pass (which has to be done in both directions—the problem has some structural similarities with Illumination Optimization above) that allowed the computation for a given pivot element to be done in constant time (adding two pre-computed numbers and 1). To my surprise, even with this probably very close to optimal solution, my program ran out of time! It turned out that I had another O(N²) issue in an innocuous part of the code that simply constructs a mapping from indexes in the original unordered sequence to positions in the sorted sequence that I use for the computation. I had to replace the trivial code using POSITION with my own FAST-POSITION, which trivially implements binary search on the sorted output vector. The code (b/3.lisp) doesn't look great, but it works! I only really started on this after the end of the competition, though; otherwise I might have gotten four points for the small set...
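
The FAST-POSITION replacement is just textbook binary search. A minimal sketch of the idea (the signature is my guess, not the actual b/3.lisp code):

    ;; Index of X in the sorted vector VEC, or NIL if absent;
    ;; an O(log N) drop-in for (POSITION X VEC) on sorted data.
    (defun fast-position (x vec)
      (let ((lo 0) (hi (length vec)))
        (loop while (< lo hi)
              do (let ((mid (floor (+ lo hi) 2)))
                   (if (< (aref vec mid) x)
                       (setf lo (1+ mid))
                       (setf hi mid))))
        (when (and (< lo (length vec)) (= x (aref vec lo)))
          lo)))

    ;; (fast-position 7 #(1 3 7 9)) => 2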

So that was Google Code Jam's Farewell Round! Thanks a lot to Google for organizing those events over twenty years, and especially to the many Googlers for coming up with great problems, ironing them out, and providing tons of challenging data sets! I had a lot of fun—along with quite a bit of frustration; but that goes along with the challenge, I guess.

Thursday, June 10, 2021

AI/ML Hype: If even Google doesn't get it right, then...?

I'm a happy and frequent user of Google Photos, which I use to back up, share, and edit the many photos that I take on my smartphone.

This morning, the mobile app greeted me with a new feature: It suggested that six of my photos were incorrectly oriented, and offered to fix that by rotating them.

I checked, and out of those six photos, five were totally fine as they were. (I'm a bit pedantic and usually fix any wrong orientation quickly by hand, which Google Photos makes quite easy.)

The sixth was indeed misoriented, but the rotation suggested by the tool was also wrong.

First I laughed. If Google—of all people!—sends me twelve predictions via a highly popular app, and gets eleven of those wrong, how can anybody really expect that Artificial Intelligence/Machine Learning will solve real problems and eventually pay back all the investments being made in it? And note that the problem class here is basically image classification, which is one of the few narrow domains where ML has been particularly effective.

And then I dutifully corrected Google's mispredictions by rotating the proposed images until they were correct (correct again, in five of the six cases). So maybe, if a few million other nice/gullible people help poor Google out, it will one day actually become helpful even to pedants like me.

This suddenly reminded me of a question Prof. Mireille Hildebrandt asked at a large EU event last week ("Leading the Digital Decade", video):

What is a digitally skilled population? Is it a population capable of using AI and other digital systems? That's the usual way to talk about citizens: users. Or is it, perhaps, a population that is open to be used by these systems as data engines?

Sunday, May 03, 2020

Google Code Jam 2020 in Lisp—Round 1C

Well, that was slightly embarrassing and frustrating...

After a relatively promising attempt at round 1B, I entered round 1C yesterday as the last chance to advance to round 2. Again, three nice problems, with the third one looking too hard for me at first sight, at least for the general solution. My "plan" after reading the problem descriptions was roughly: Solve problems 1 and 2 and, time permitting, submit a solution for the easy first input set of problem 3. That would have been a good plan, except I took a bit too long to code problem 1, and I got completely stuck trying to debug my Lisp solution for problem 2.

Problem 1: Overexcited Fan

The problem turned out to be even easier than I first thought. I should have spent more time designing the algorithm before starting to code it. I had an OK solution, but it was more complex than necessary and took me 53 minutes. That's much longer than it should have taken, given the overall time limit of 2 hours 30 minutes for the three tasks. After the round, I simplified this to a more elegant solution.
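
If I remember the problem correctly, the fan starts at (X, Y) and follows a known string of N/S/E/W moves, one unit per minute, while we start at the origin and can move at most one unit per minute, so we can intercept at minute T iff the fan's Manhattan distance from the origin is at most T. A hypothetical sketch along those lines (not my actual competition code):

    ;; Earliest minute at which the fan can be caught, or NIL if never.
    (defun catch-fan (x y moves)
      (if (zerop (+ (abs x) (abs y)))
          0
          (loop for move across moves
                for time from 1
                do (case move
                     (#\N (incf y)) (#\S (decf y))
                     (#\E (incf x)) (#\W (decf x)))
                when (<= (+ (abs x) (abs y)) time)
                  return time)))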

Problem 2: Overrandomized

Here I had the correct inspiration to work with letter frequencies, in particular the frequencies of letters in the leading-digit position. Unfortunately my Lisp solution would always result in "RE" (runtime error) when submitted, even though it ran fine both against the sample data set and against some additional data sets that I had generated. At this point I ran out of time.
After the round, I tried for some time to get my Lisp program working, but without any possibility of getting diagnostic output, I didn't find where the problem lay. I even started from scratch using a different implementation technique, but the result was the same: My program would run fine against the sample and my self-generated tests, but throw "RE" when faced with the competition/training data. Very frustrating. If anyone finds the error(s), please let me know!
At one point I reimplemented my solution in C++, and it successfully solved all data sets as soon as I got it to compile. I then "backported" it to C, which simplified it further (in the C++ version I used both "traditional" arrays and "library" vectors—for sorting—which makes that version quite ugly).
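
For reference, the frequency idea itself is simple. As I remember the problem, the letter that never occurs in the leading position must stand for 0, and the others, ordered by decreasing leading-position frequency, stand for 1 through 9 (the leading digits follow a Benford-like distribution). A hypothetical sketch (my reconstruction; evidently not the code that threw "RE"):

    ;; Guess the ten letters in digit order 0-9 from leading-letter counts.
    (defun decode-digits (strings)
      (let ((counts (make-hash-table))
            (letters '()))
        (dolist (s strings)
          (loop for c across s do (pushnew c letters))
          (incf (gethash (char s 0) counts 0)))
        ;; most frequent leading letter = digit 1, ..., least = digit 9;
        ;; the letter never seen in front sorts last and must be digit 0.
        (let ((ranked (sort letters #'>
                            :key (lambda (c) (gethash c counts 0)))))
          (coerce (cons (car (last ranked)) (butlast ranked)) 'string))))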

Problem 3: Oversized Pancake Choppers

After reading the problem description, I assumed (probably correctly) that a full solution would be beyond my capabilities, at least under this time pressure. But the easy set seemed quite tractable: If there are only 2 or 3 diners, then the set of solution possibilities is quite bounded. Indeed it took me only (competitive coders can laugh now) about 15 minutes to write a stupid solution that worked for this limited case. But that's academic, because I had run out of time on problem 2, so this happened after the round was over.

Conclusion

With only the first and easiest problem solved in time, I ranked in the 8000s, much below my rank in 1B—and hopelessly below the cut-off at 1500.
In problem 1 and (my constrained/stupid solution for) problem 3 I had no problems with Common Lisp as a coding language. But I failed to get my problem 2 Lisp solution working, while later attempts in C++ and C worked right away (and showed that I had analyzed the problem correctly).

Sunday, April 19, 2020

Google Code Jam 2020 in Lisp—Round 1B

Although I had quite a bit of trouble surviving this year's qualification round, I was allowed to participate in online round 1, which, as usual, is held as three different sub-rounds to accommodate participants from various time zones.  The first (1A) was in the middle of the night in my time zone, so I skipped it and didn't even get around to reading the problems.  But today I attempted round 1B.  The problems were again very cute and interesting, and while I struggled with the limited time, I ended up being quite satisfied, even though I missed qualification this time.

Problem 1: Expogo

I wrote a relatively simple implementation using search with memoization (caching) and a bit of pruning of implausible search regions, and that was sufficient for all three data sets.  According to the after-round summary, there's also a constructive solution.  I had suspected so, but couldn't find it.

Problem 2: Blindfolded Bullseye

This is a nice search problem that involves some geometry.  I think I started with a reasonable search strategy, but then ran out of time when it came to doing the geometry... So I gave up trying to find a "good" solution, and coded the trivial brute-force solution for solving the easiest data set.  That was worth only 3 points, but those 3 points improved my final ranking by more than 1'000.

Problem 3: Join the Ranks

This didn't look too impossible, but given the large number of points awarded for it, I suspected that it would be beyond my skills and decided to focus on the other two problems.

Conclusion

With a full solution for problem 1, a half(?)-done "good" solution for problem 2, and the few extra points for problem 2's trivial data set, I finally ranked #1996 [edited for final result].  As only the first 1500 contestants qualify for round 2, this was not sufficient.  But I'm quite happy with how it went, and am looking forward to round 1C, about two weeks from now.  And unlike during the qualification round, I didn't run into any issues with coding in Lisp—even though I haven't been using it professionally for several years now.