About the talk
Multi-field packet classification is a crucial component in modern software-defined data center networks. To achieve high throughput and low latency, state-of-the-art algorithms strive to fit the rule lookup data structures into on-die caches; however, they do not scale well with the number of rules.
We present a novel approach, NuevoMatch, which improves the memory scaling of existing methods. A new data structure, Range Query Recursive Model Index (RQ-RMI), is the key component that enables NuevoMatch to replace most of the accesses to main memory with model inference computations. We describe an efficient training algorithm that guarantees the correctness of the RQ-RMI-based classification. The use of RQ-RMI allows the rules to be compressed into model weights that fit into the hardware cache. Further, it takes advantage of the growing support for fast neural network processing in modern CPUs, such as wide vector instructions, achieving a rate of tens of nanoseconds per lookup.
Our evaluation using 500K multi-field rules from the standard ClassBench benchmark shows a geometric mean compression factor of 4.9x, 8x, and 82x, and average performance improvement of 2.4x, 2.6x, and 1.6x in throughput compared to CutSplit, NeuroCuts, and TupleMerge, all state-of-the-art algorithms.
00:11 A Brief about Packet Classification (2/3)
00:52 Hardware vs. Software
01:21 The Problem (2/2)
02:04 Large Rule Sets Spill Out of CPU Cache
06:03 Can We Fit in L2 Cache? (1/2)
06:39 Isn't Neural Network Inference Slow?
06:57 What About Updates? (2/2)
07:39 Evaluation End-to-end performance
Computational approach to packet classification. My name is a joint work with stains from the technion-israel. Network functions such as Access Control quality of service packet, folding fouls in mini more classified pockets in everyday life and decide what to do with it. According to set of rules it with several films such as source and destination, IP, addresses source, and destination, IP protocol, and more, for instance, OpenFlow 1.4 support, matching up to 41, distinct heatherfield.
Each field can be exact match or had wild-card such as long as prefects own America ranges, Rules May overlap. So the classifieds to let his role with the highest priority. Pacification can be performed in both hardware and software Hardware, classify as a possible. Thanks to Dedicated component named Turner accountants, addressable memory or tkam, which is the power-hungry asetek to perform specific a shin in a single cycle. We focus on the alternative soft to classify the tooth commodity Hardware, instead of a dedicated, the middle box, cuz if occasionally arboretum's
usually take part in Virtual switches, So let's look on all visual switch. We gave it some set of rules and now it's busy classifying Pockets. Everything is okay. As long as there is only a small amount of rules when we try to any rules. We get a dropping throughput and it doesn't matter which classification Arboretum water using or experienced the same phenomenon. Large rule said spill out of CPU cocash causing expensive. L3 cache offices, just to understand the impact. L3 cache access time is ten times slower than that of L1 cache. This is devastating
for packet. Certification, you may ask yourself how bad is, it will let Luke and real example. Couple marriage is a state-of-the-art algorithm that uses hash tables for pacification. It requires the same amount of computation regardless of the number of rules that holds the only thing that changes is its memory footprint which you can see on the left note that loud classified meaning 100,000 rules and more spill out of L2 cache on the right. You see the throughput of the same classified. By the way these experiments were performed using the standard class
bench Benchmark for pocket, classification of the largest classify which is the right most point in dried draft. It is twice as low as the smallest classify. The reason is the slowest time of the L3 cache. We make the following observations on pocket cuz if acacian the rule space is boss, meaning rules cover, only a tiny fraction out of all possibilities. In all Fields, L3 cache and Deidra Marcus has all the major bottleneck, is there not integrated in the CPU core and hard work friends will communicate with elevators such as wide.
Victor processing engines that make them to tations much faster. Makes us wonder whether pocket classification made benefit from Trading memory exercises for computations. Classification in the presence of wild card is similar to range value look up in which we search for the range that contains an input here. For example, the input 175 is contained within the highlighted range of 150 to 200 which corresponds to the Value City by storing the Rangers in the continued stable, we can represent them in their office in table as a function, then use
the model to approximate the function. We do that using the data structure called range, query, recursive model, index or a que hora Mi for short, water line, Motorsports fast, and efficient, train value, lookups They consist of multiple neural networks in a staged fashion, only a single network is evaluated. First stage, all networks solve the same regression problem, giving an input Detroit to approximate the matching range that contains X. The output of internal stages are used to choose a network. In the following stage that has a better chance to be
successful. What's an example of how a q are my works. We have a set of rainbow topaz and say we trained in all that you are my mother for approximating them. Now, we wish to find the corresponding value for the input 801. We feed it to the motor and get an approximation for the memory off. Debt at hat. X equals 103, say we know that approximation are always bounded by Epsilon equals one. So, we go to the table and performance, second research over the Rangers around the motor approximation, More details about the out, you are my training, 60 and
Correctional Spruce available in the paper. We also show, how to handle multi-dimensional rules with multiple Fields return. Listen, to pocket, education is Nuevo match. Its trades, memory access for computations using all QR my models. They stole between 70 to 100% of the rules while the rest is Cactus remainder in classified using any existing picnic. Will elaborate more on how we choose, which rules get into the northern and which get into the remainder in the paper. Because if occasionally is performed using real-time neural network inference,
the remainder is evaluated only when we can't find the measuring rule using the r q online mode. In the end with the both of you are my models and the remainder will always fit in the L2 cocash. Let's see it in action. He off, cut split Newark at in table match. All state-of-the-art techniques for pocket pacification. We measure the memory footprint using the standard class bench. Benchmark packet. Classification almost all methods spill out of L2 cache full out classifieds Compared to them, which
hardly reaches the bounds of L2 cache. We're and we're going to three times and each time we use a different method for the remainder, the rate of all represent out, you are my mother's the orange, represent the remainder. By combining small and shallow neural networks with wide Vector. Instructions. We can evaluate an hour to all my mother within 40 to 59 seconds. Heading the secondary research and the remainder the into and classification. Licensee varies between 100 to 300 nanoseconds What supports updates? We can easily remove rules from all
to all my morals and insert new ones into the remainder. After passing the threshold in the novel of new rules, we retrain the model or experiment, show the training with tens of law, takes about 10 to 40 minutes on CPU. However, in our ongoing work, your shift running time of between 1 to 20 seconds on the CPU using native implementation more information on wiggle match such as its scalability to the number of pacification field, musico evaluation. And then early termination optimization are described in the paper. Let's talk about evaluation
We experiment ends, when performance in detailed analysis of various aspects in this video will focus only on the end-to-end performance evaluation using traffic with uniform, temporal causality. Real World classifieds such as openvswitch inflow Kesha's to catch the temporal locality of the traffic and invoke the packet. Classify only upon cash. Mrs. We stimulate this Behavior. By testing new iPhone x use increases with uniform temporal locality. We use class managed to generate 12. Distinct process with 500,000 rules and compare against three. State-of-the-art techniques
in all routes at Nuevo, magic seeds are other methods. The GM on the right stands for Dramatical mean know that Tom has a missing. These are no. Classified. That is not converge. To conclude that introduces a new point in the design space of pocket certification, algorithms and opens up new ways to scale it on commodity processors. These impossible things to argue or am I more those that support fast? Look up of range value is it is important to understand that normal much is not operational in a real system just yet. This is the goal of our ongoing War,
more details are available in the longer version of this presentation and in the paper, thank you. All right, time zone for an awesome talk. So why we're waiting for questions over the unit to look, as well as like? Let me on some of the questions. I have machine learning based approach system that has have hard time, trusting machine learning solutions to play or I don't trust that these will perform well in the presence that excites. So what steps will you take to establish that place to live in a world? Like maybe there are some let you know I have proposed and a paper
and have this deployed in Russian settings. And this is a good question. It is important to understand that the machine learning part in Nuevo match, is, is no, the bottleneck. And it's not the cause of any problems because it's part of the CPU. We ran run the machine learning about using wide vegetable processing engines, which are available in almost any more than you. So Network, operators doesn't need don't need anything else. Okay. Cool price? I have a question from, you don't care.
And his question is, is Amulet of model structure. Fixed? That is the number of layers to notice that you're using. Or does it change based on the traffic at acoustics? This is also a good question. As a matter of fact that structure is fixed and creating a dynamic structure, is the goal of our ongoing work. Okay, so I think I have one more question, which is my question that this aren't you? Out of my tools that you're using those papers exciting? So I can see that a lot of other networking problems
with this can be used. So what is your view on what other networking problems where this tool can be useful? Let me repeat the question. You're asking whether all QR my can be used on different applications in different applications different. This is a very good question. We're currently thinking about it in current Network applications, we can use for my we were thinking about implementing its using girl, that combined with sketches to collect Network information. We don't have any specific goal just yet were in the process of thinking.
One more question and this is from home, and his question is, how long does insertion of training takes compared to adding rosin to the table? Yes. So using tensorflow, training takes a long time but in our ongoing work to achieve fast, super fast training, even at stop II training for the whole motor. So compared to a ruler in insertion. You can say, some maybe three or four orders orders of magnitude longer that we can use a remainder that support fast insertions while we train.
So it's kind of a buffering well, with training and then we do a switch on the train and motor. Okay. See I think I have, I guess somebody else is asking, but yeah, okay, there is a question. So I think that the doc said and then I think that the stock said that the rules can be added to reminder for updates. Does this mean that the added rules cannot override any previously existing rules or is the remainder always searched? Can you ask a question again.
Override any previously existing rules? They can override their there isn't a problem you when you insert new rule that overrides any and I previously had find the role you just invalidates at the previous roles and you can do it using. You don't need to retrain the model in order to do so. All right, so I'm going to lay down some. That's a long arm, I think it's overcooked and the Equator. So you show the difference in time between SFO and night version on the, why is that? Why is it
And that's a good question. I think it's because Dental flow is not optimized to small neural networks. The native implementation is just at Sea cold. So, when we're on using her native c-19, implementation is just faster. I can explain it really Are there any cases where there is pacification by tomorrow? Oh this is a very good that question. We proved that correctness of our method are so the classified will always exceed know mr. I have a following question. So you mentioned using machine learning model, I'm so
we always have some probability that the motor is not working. So why are you saying that? You know, it's kind of dirty. Yes. So the motor is probably stick but we have a, we have guarantees on its Arrow. So we call that the process is composed of a motor, neuron Network in France, and then a secondary research and the second research eliminates all and all the arrow caused by the mall. But actually I have a question. So it seems that your job is in the background that you're doing this. So
what do you want to do? Is to YouTube get more cash so it's if you can answer. So I figured all the people that happened. So will there be any kind of you and your actions? Yes, I'm going walk. We we wish to implement our method in smartnic. So as long as them for Cecil's on Smart next support, a wide Vector instructions. They can run with a match.
Buy this talk
Buy this video
Our other topics
With ConferenceCast.tv, you get access to our library of the world's best conference talks.