About the talk
This paper presents Hoyan, the first reported large-scale deployment of configuration verification in a global-scale wide area network (WAN). Hoyan has been running in production for more than two years and is currently used for all critical configuration auditing and updates on the WAN. We highlight our innovative designs and real-life experience to make Hoyan accurate and scalable in practice. For accuracy under the inconsistencies of devices' vendor-specific behaviors (VSBs), Hoyan continuously discovers the flaws in device behavior models, thus aiding the operators in fixing the models. For scalability to verify our global WAN, Hoyan introduces a "global-simulation & local formal-modeling" strategy to model uncertainties in small scales and perform aggressive pruning of possibilities during the protocol simulations. Hoyan achieves near-100% verification accuracy after it detected and fixed O(10) VSBs on our WAN. Hoyan has prevented many potential service failures resulting from misconfiguration and reduced the failure rate of updates of our WAN by more than half in 2019.
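The abstract credits Hoyan's scalability to pruning possibilities rather than exploring them one by one. For contrast, here is a minimal Python sketch of the brute-force baseline the talk says simulation-based approaches rely on: enumerate every set of at most k failed links and re-check reachability for each. It assumes reachability is plain graph connectivity over undirected links, ignoring routing policy; the function names are hypothetical and not part of Hoyan.

```python
from collections import deque
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Link = Tuple[str, str]  # undirected link between two routers (assumed model)


def reachable(nodes: Sequence[str], links: Sequence[Link], src: str, dst: str,
              failed: Set[Link]) -> bool:
    """BFS from src over the links that are not in the failed set."""
    adj = {n: [] for n in nodes}
    for a, b in links:
        if (a, b) not in failed:
            adj[a].append(b)
            adj[b].append(a)
    seen = {src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def k_failure_violations(nodes: Sequence[str], links: Sequence[Link], src: str,
                         dst: str, k: int) -> Tuple[int, List[Tuple[Link, ...]]]:
    """Brute force: re-check src->dst reachability under every set of <= k failed links.
    Returns (number of scenarios checked, failure sets that break reachability)."""
    checked, violations = 0, []
    for size in range(k + 1):
        for failed in combinations(links, size):
            checked += 1
            if not reachable(nodes, links, src, dst, set(failed)):
                violations.append(failed)
    return checked, violations


if __name__ == "__main__":
    ring = ["A", "B", "C", "D"]
    ring_links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
    print(k_failure_violations(ring, ring_links, "A", "C", 1))  # (5, [])
    checked, bad = k_failure_violations(ring, ring_links, "A", "C", 2)
    print(checked, len(bad))  # 11 4
```

The number of scenarios is the sum of C(|E|, i) for i up to k; for a hypothetical network with 1,000 links and k = 2 that is already 500,501 full re-checks, and in a real verifier each one would be a protocol simulation.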
00:21 Alibaba's infrastructures
00:50 Monkey King
01:36 Technical challenges
05:20 Key insight
08:20 Deployment experience
Hello everyone. Alibaba, where I was an intern, is one of the largest service providers, with infrastructure serving over 1 billion customers. It provides online services such as cloud, e-commerce, e-payment, online maps, and so on, over [unclear] routers and [unclear] prefixes. To prevent incidents resulting from network misconfiguration, last year we presented our [unclear] verification work. Today I'm going to talk about Hoyan, our routing configuration verification system.
Building such a system requires us to address two technical challenges. The first is to reason about properties of our global WAN, such as reachability under failure cases, that is, a k-failure-tolerance property. In the first example, k is one: if the link between D and E fails [unclear], the reachability no longer holds. In the second example, with two failures [unclear], the reachability is violated. However, reasoning about the k-failure-tolerance property in our WAN is hard. First, existing simulation-based verification approaches need to enumerate all possible failure cases, which has [high] complexity. Second, the state-of-the-art formal approaches [do not scale] to our WAN, not to mention [unclear]; the existing reduction and [unclear] approaches do not help either.
The second challenge is that the network model may not accurately represent real network behaviors. Suppose we have four routers, R1, R2, R3, and R4, where [unclear] are provided by vendor A [unclear], and R3 [unclear] generates the routing table. Unbeknownst to us, R4 is from a different vendor than R1 to R3 and has a latent behavior. In this case, when [the route] arrives, R1 attaches a community to it and sends it to R4; R4 [unclear] the community and then sends the route to R3, and R3 only accepts it when [unclear]. Eventually, you get a different routing table. None of the existing systems can detect such behaviors. So, to offer configuration verification for our WAN and address the above two challenges, we built Hoyan.
When operators need a verification, the system generates a network behavior model. The core component, called the configuration verifier, focuses on reasoning about configuration correctness at a global scale. In order to ensure the correctness of the network behavior model, Hoyan is also equipped with a behavior model validator, which is responsible for ensuring the accuracy of the generated behavior model.
First, let me show you how we address the scalability challenge. Our insight [unclear] combines simulation [unclear]. So we propose a combined solution named global simulation and local formal modeling. Our approach runs the simulation over the network model, and all possible route conditions are transmitted along with the simulation process. To avoid [unclear], we also introduce techniques to [prune] the formulas. We chose [a network] containing 80 routers [unclear]. Now, I want to compare Hoyan with the state-of-the-art verifier, Minesweeper; this table shows [unclear] when they try to reason about the k-failure-tolerance property.
Let me show you how we address the second challenge, vendor-specific behaviors. [We use the] production network as a reference for the simulation. A straw-man solution is to look at the difference between the real and simulated routing tables [unclear] and trace the path to the router [unclear] in this example. That router has a vendor-specific behavior, which could be in its receiving policy or its forwarding policy. Unfortunately, normal RIBs cannot localize the correct vendor-specific behaviors in practice [unclear]. So we use extended RIBs rather than normal RIBs. With extended RIBs, [we] can [localize the] correct [behavior]. If the extended routing tables of the real network and the simulation are different, then a vendor-specific behavior actually exists. Our model tuner detected VSBs [unclear], including [the] default ACL. This picture shows [the accuracy of] our model tuner over six months, [reaching] 100%.
Hoyan has been used at Alibaba for about two years, [and it reduced] the overall failure rate of updates. During the two years, Hoyan detected many potential configuration errors, [for example, one] on [a] network that had an IP address conflict [unclear]. In the last year, Hoyan was also used to check the configurations of [network] plans before they are committed to the network. Let me clarify the coverage [unclear]. Hoyan supports BGP, OSPF, route redistribution, and [unclear] on our WAN. What network properties can Hoyan reason about? According to our operation needs, [it covers] properties including reachability and path consistency. In conclusion, Hoyan is the first ever reported practical configuration verification system [unclear] coverage, and has been used in [Ali]Cloud for 2 years. That's [the end of] my talk. I'm happy to take any questions. Thanks. Thank you for a very nice talk.
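The talk names reachability and path consistency among the properties Hoyan checks. As a rough illustration only, the sketch below checks both on a hypothetical per-router forwarding table for one prefix. The data model (a next-hop map in which "local" marks delivery) and the reading of path consistency as "traffic follows an operator-specified expected path" are our assumptions, not Hoyan's definitions.

```python
from typing import Dict, List, Tuple

# Hypothetical data model (not Hoyan's): fib[router][prefix] is the next-hop router,
# or "local" if the router delivers the prefix itself. A missing entry means the
# router has no route, so traffic is dropped there.
Fib = Dict[str, Dict[str, str]]


def trace(fib: Fib, src: str, prefix: str) -> Tuple[str, List[str]]:
    """Follow next hops from src; outcome is "delivered", "dropped", or "loop"."""
    path = [src]
    visited = {src}
    node = src
    while True:
        next_hop = fib.get(node, {}).get(prefix)
        if next_hop is None:
            return "dropped", path
        if next_hop == "local":
            return "delivered", path
        path.append(next_hop)
        if next_hop in visited:
            return "loop", path
        visited.add(next_hop)
        node = next_hop


def check_reachability(fib: Fib, src: str, prefix: str) -> bool:
    """Reachability: traffic from src for prefix is eventually delivered."""
    return trace(fib, src, prefix)[0] == "delivered"


def check_path_consistency(fib: Fib, src: str, prefix: str, expected: List[str]) -> bool:
    """Assumed reading of path consistency: delivered along exactly the expected path."""
    outcome, path = trace(fib, src, prefix)
    return outcome == "delivered" and path == expected


if __name__ == "__main__":
    p = "10.0.0.0/24"
    fib = {
        "R1": {p: "R2"}, "R2": {p: "R3"}, "R3": {p: "local"},
        "R4": {p: "R5"}, "R5": {p: "R4"},  # a forwarding loop
    }
    print(trace(fib, "R1", p))  # ('delivered', ['R1', 'R2', 'R3'])
    print(trace(fib, "R4", p))  # ('loop', ['R4', 'R5', 'R4'])
```

In a verifier the FIB would come from the simulated network model rather than being written by hand; the checks themselves stay this simple once the forwarding state is known.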
Let's wait for questions on Slack. In the meanwhile, maybe I'll ask one question. So the question is, how does your design scale with RIB sizes? Would you see any performance problems with super-large-scale RIBs?
We actually do the simulation for each specific prefix in a parallel way, so our system can efficiently handle all the cases, rather than [simulating] the entire RIBs from all our routers at once [unclear].
So the next question is from [name unclear] from Wisconsin, about ARC [unclear]: what is unique about Alibaba's [approach] compared with [ARC]?
[unclear] the paper [unclear] wasn't [available] at that time, so I would have to try to reproduce it. After the paper got accepted, [unclear]. Yes, I think that's great work. The technical difference might be that [ARC] is designed based on [a] graph [unclear]. [unclear] So that's a key reason. We're going to try it, because I noticed it's open[-source] now.
[unclear] in the future. A [follow-up] question: [unclear] other than [a] formal-based approach [unclear]. You know, sometimes it's going to [unclear]. So you have to propose a system [unclear]. I really hope to see more work in that direction, because it's really, really hard for [operators] to write [unclear].
[unclear] have to provide a way for them to write [unclear]. Thank you. Let me see [unclear]. The first question: the idea of using a model tuner to help sort through vendor-specific behaviors seems pretty cool. How automated is it today? How much input does the person driving the model tuner need to provide in order to, say, expand the routing tables and interpret the traces that come back?
[Good] question. This is a very important thing for [us]. [The model tuner] runs all the time. It checks the [real] routing tables, compares the difference between the simulated routing table and the real routing table, and sends a report to our operators. The process is automatic [unclear]. [We] write the patch for the vendor-specific behavior [unclear] to fix [the model]. [unclear]
Oh, so [unclear] everything becomes automatic.
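The answer above describes the model tuner as continuously comparing simulated and real routing tables and reporting the differences to operators. A minimal Python sketch of that comparison step follows; the RIB representation (a map from (router, prefix) to an attribute dictionary), the report format, and the community value are hypothetical. It only flags where the tables disagree; per the talk, localizing the responsible router needs the richer extended RIBs, which this sketch does not model.

```python
from typing import Dict, List, Tuple

# Hypothetical RIB representation: (router, prefix) -> route attributes.
Rib = Dict[Tuple[str, str], Dict[str, object]]


def diff_ribs(real: Rib, simulated: Rib) -> List[dict]:
    """List every (router, prefix) whose real and simulated entries disagree."""
    report = []
    for router, prefix in sorted(set(real) | set(simulated)):
        r = real.get((router, prefix))
        s = simulated.get((router, prefix))
        if r == s:
            continue  # model and network agree on this entry
        entry = {"router": router, "prefix": prefix}
        if r is None:
            entry["kind"] = "missing_in_real"  # model predicts a route the network lacks
        elif s is None:
            entry["kind"] = "missing_in_sim"  # network has a route the model lacks
        else:
            entry["kind"] = "attr_mismatch"
            keys = sorted(set(r) | set(s))
            entry["attrs"] = {k: (r.get(k), s.get(k)) for k in keys if r.get(k) != s.get(k)}
        report.append(entry)
    return report


if __name__ == "__main__":
    p = "10.0.0.0/24"
    # Loosely mirrors the talk's example: the model expects R3 to accept the route
    # from R4, but in the real network R3 does not install it.
    real = {("R4", p): {"next_hop": "R1", "communities": ("100:1",)}}
    simulated = {
        ("R4", p): {"next_hop": "R1", "communities": ("100:1",)},
        ("R3", p): {"next_hop": "R4", "communities": ("100:1",)},
    }
    print(diff_ribs(real, simulated))
```

Note that the diff points at R3, where the symptom appears, not at R4, where the vendor-specific behavior lives; that gap is exactly why the talk moves from normal RIBs to extended RIBs.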
Great. I don't see any more questions on Slack, so folks are being shy. We have time for one more question before we have to sign off; if not, I will ask one. So, this way of partitioning the work between a simulation-based approach and an analysis-based approach seems very [pragmatic]. Is that also fully automated, or is there a human driver who has to guide the tool down those paths? [unclear]
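The question above is about how Hoyan splits work between simulation and analysis, which the abstract calls "global simulation & local formal modeling". The sketch below is a toy analogue of that split, not Hoyan's algorithm: a global propagation pass attaches to each route the set of links it depends on and prunes any route whose link set is a superset of another route's at the same router, and a small local search at one router then decides whether reachability survives every failure of at most k links. It assumes policy-free routing (any loop-free path is usable) and replaces Hoyan's formal modeling with brute-force enumeration over only the links that appear in that router's route conditions; all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Link = FrozenSet[str]        # undirected link, e.g. frozenset({"A", "B"})
Condition = FrozenSet[Link]  # links that must stay up for a route to be valid


def propagate(adj: Dict[str, List[str]], origin: str) -> Dict[str, Set[Condition]]:
    """Global pass: announce one prefix from origin; every route carries its condition.
    A route is pruned when another route at the same router needs a subset of its links."""
    conds: Dict[str, Set[Condition]] = {n: set() for n in adj}
    conds[origin] = {frozenset()}
    frontier = [(origin, frozenset(), (origin,))]  # (router, condition, path so far)
    while frontier:
        node, cond, path = frontier.pop()
        for nbr in adj[node]:
            if nbr in path:
                continue  # loop prevention, similar in spirit to BGP AS-path checks
            new = cond | {frozenset({node, nbr})}
            if any(old <= new for old in conds[nbr]):
                continue  # dominated: an existing route survives whenever this one does
            conds[nbr] = {old for old in conds[nbr] if not new <= old} | {new}
            frontier.append((nbr, new, path + (nbr,)))
    return conds


def survives_k_failures(conds: Set[Condition], k: int) -> bool:
    """Local check at one router: reachability holds under every <= k link failures
    iff no <= k links intersect every remaining condition (a small hitting-set search)."""
    if not conds:
        return False
    links = sorted(set().union(*conds), key=sorted)
    for size in range(k + 1):
        for failed in combinations(links, size):
            if all(c & set(failed) for c in conds):
                return False  # this failure set breaks every remaining route
    return True


if __name__ == "__main__":
    ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
    conds = propagate(ring, "C")  # prefix originated at C
    print(survives_k_failures(conds["A"], 1), survives_k_failures(conds["A"], 2))  # True False
```

The design point the toy tries to convey is that the expensive question ("does some small failure set break this router?") is asked only locally, over the few links that survived pruning, instead of re-simulating the whole network once per failure scenario.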
One quick question then, and then we really do need to [wrap up]. So what's next for you? It sounds like you have two years of operational experience and great performance.
More properties, richer properties, like, you know, whether [unclear] reachability [holds] for some traffic or not [unclear]. And [we also want] to catch some problems resulting from bugs in the formal verification system itself, [which] cannot detect its own bugs. So we want to combine [unclear].