About the talk
Keynote Panel: Why GitOps? - Tracy Ragan, DeployHub; Dan Garfield, Codefresh; Cornelia Davis, Weaveworks; Moderated by Dan Lorenc, Google
GitOps, Infrastructure as Code, and the DevOps pipeline, how do they fit together and how are they different? Join this amazing panel of industry leaders including Cornelia Davis, Weaveworks, Tracy Ragan, DeployHub, and Dan Garfield of Codefresh, moderated by Dan Lorenc of Google and the CDF TOC chair. Hear how they define GitOps and where it fits into CI/CD. They will explore the beginnings of GitOps and discuss how it will mature as more companies begin their journey into operations by pull request. Learn if GitOps will work for your organization and understand the challenges before you make the move.
For more Continuous Delivery Foundation content, check out our blog: https://cd.foundation/blog/
I am a technology leader and full-stack engineer specialized in evangelizing containers, Kubernetes, Helm, Istio, and related technologies. As the Chief Technology Evangelist at Codefresh, I lead communication, marketing, and forward-thinking technology initiatives. As an evangelist at Codefresh and Google Developer Expert, I've presented at Kubecon, Google Cloud Summit, DeveloperWeek, Meetups, and more. I build my own demos :)
Tracy is CEO and Co-Founder of DeployHub. She is an expert in configuration management practices, microservice management, continuous integration, continuous delivery and continuous deployment. She currently serves as a board member of the Continuous Delivery Foundation (Linux Foundation) where she is the General Member Representative. Tracy began her odyssey in the area of configuration management as a consultant to Wall Street firms building out ways to improve the software build & deploy process through automation (now called DevOps). Having begun her programming career as a mainframe developer, she quickly recognized how the new ‘distributed’ datacenter lacked the build and deploy processes that were standard on mainframe systems. In 1995, she co-founded OpenMake Software, a private software company that specializes in the automation of cross-platform and accelerated software builds (compile/link standards). OpenMake Software has maintained its customers for over 20 years. Tracy’s innovation in the area of life cycle management was recognized by IBM, leading her to a board position on the newly formed Eclipse Foundation where she served for 5 years. In 2014, Tracy and her team recognized the shift in pipeline management that the new container-based architecture would require. This led to the formation of DeployHub, a new product and company focused on bringing microservice configuration mapping, sharing and deployment to high-performing development teams.
Hello, everyone, and welcome. We are doing a panel today on GitOps. The title is "Why GitOps?", and we'll cover topics related to the practice of GitOps and the motivations for it. My name is Dan Lorenc. I'm the chair of the Continuous Delivery Foundation TOC, and I'm a software engineer at Google. I have a bunch of awesome panelists here, and I'm going to let you introduce yourselves. Dan Garfield, do you want to go first?
Sure. I'm Dan Garfield. I am the chief open source officer at Codefresh. I'm also one of the chairs and one of the founders of the GitOps Working Group. So, excited to chat about GitOps.
All right. Tracy, you're next.
I am Tracy Ragan of DeployHub and Ortelius, which is an incubating project at the CDF, and I'm super excited about GitOps, like many people.
Awesome. And Cornelia?
I am the CTO at Weaveworks. Weaveworks refers to itself as the GitOps company, and I think we'll talk a little bit more about where that came from in just a moment. My background is that I've been in the industry for about 30 years, and I've been working on developer platforms for about the last 10 years. Prior to Weaveworks, where I've been for almost a year and a half, I was at
Pivotal, where I first worked on Cloud Foundry and then later on worked on Kubernetes-based developer platforms, but always with a focus on developer enablement and developer self-service. I'm also an author: I wrote a book called Cloud Native Patterns, which is targeted at the application developer and architect. And I'm part of the CNCF Technical Oversight Committee as well.
Awesome. I'll start right off of that one. You said Weaveworks is known as the GitOps company. Where did that term originally come from?
Yeah, sure. So, a quick history
lesson. Yes, I do believe, and I don't think anybody has ever suggested otherwise, that our co-founder and CEO, Alexis Richardson, coined the term. The way the story goes, and this is before my time at Weaveworks, Weaveworks was not originally a GitOps company. It was originally about networking for container-based environments and things like that. It went through a number of evolutions, and it had an observability project. It was the founder of Cortex as well, which is another CNCF project, and we were running that as a service. And,
long story short, we had an outage, and we were able to recover from that outage. It was a complete, total outage of the SaaS service, and we were able to recover from it in about 45 minutes. The reason we were able to recover was what turned out to be these GitOps principles that we'll talk about throughout the conversation. We had a complete representation of the desired state of the system, and we had the right kind of automation that could make it so. All we needed to do was point that at new infrastructure, and it came back up. That was a big aha moment. It
was that our engineers who were running the SaaS service had put these practices into place because they saw value in them and found value in them. It was that aha moment where we realized that this combination of Git with reconciling automation, with convergent automation, really had tremendous power, and that's where GitOps came from. It's a little bit of a misnomer, because the Ops part is just as important as the Git part, and we tend to over-rotate on the Git part of GitOps. But that's where the term came from.
Interesting to hear that it came out of recovering from a potential
disaster. I was just there myself last week. It wasn't a real production system, but I ran a kubectl delete on the wrong cluster and had to scramble, and I had a moment of panic finding out which cluster I was in. I got everything restored because it was all in a Git repository. Tracy, what were your first experiences with GitOps?
So I've just begun seeing GitOps sort of show up over the course of the last few years, but my first exposure to this kind of technology
really came from infrastructure as code, when we started playing with spinning up environments and having an operator that lived outside of a cluster and was able to define your infrastructure really, really quickly. As a small company, we found that to be extremely helpful, because having somebody sit and do things in a very declarative way, or rather an imperative way, was not an option.
Reading that original blog post by Alexis, I think it was called "Operations by Pull Request", was the first one for me, where he talks about it and put it in a
really, I think, clearer way. I mean, we've been doing this stuff for years, right? We've been throwing our stuff in Git. Infrastructure as code has been huge. We've been following all these patterns consistently, so having it packaged in that way, where it all felt like part of one specific paradigm, was really valuable. So we started building on top of that and figuring out: how do you do this with, like, a canary release? How do we do this at scale? What does that look like? And I've kind of,
as part of the trip, had some discussions with people who have been in the industry a long time and have been doing similar things for a long time. They kind of came out and were like, "Oh, you young upstarts with your fancy words. You know, we've been doing this a long time. I invented this 15 years ago. Why are you talking about GitOps as if it's a new thing?" So I thought a lot about that and what the difference is. I think that, you know, Henry Ford wasn't the first person to invent the car. There were a bunch of people that
had invented the car. They were there, they built cars, but what he did was build an assembly line and make it so that cars were accessible to everybody. I think that where GitOps started, and where it's going now, especially with the work that's happening, is to make it accessible and repeatable and easy for everybody to do. That's the stage we're at, and that's why we're talking about it today: GitOps is getting to a point, and the tools are getting to a point, where it's easily accessible by everybody. It's not just, you know, here are some principles
and you've got a roadmap, so get five engineers together and go figure out how to implement all this stuff. It's really getting streamlined into something straightforward, and the tools are getting to a place where everybody can use it.
So I think you were getting charged with being a hipster. They didn't use the term, but you were being charged with being a hipster by people who did this with SVN 15 years ago.
I guess so.
I think one of the other big trends we're seeing at the foundation is people trying to integrate GitOps into a CD pipeline. Is that
the only way to do GitOps? Is GitOps continuous delivery, or are there benefits to combining them? Let's start with you, Tracy.
So, definitions are important, right? If you think about what GitOps is in relation to the continuous delivery pipeline, it's really the continuous deployment phase of the continuous delivery pipeline, and it is an amazing way to get to continuous deployment. In Dan Garfield's analogy of Henry Ford's assembly line, we
are doing what we need to do to create a repeatable process. That's the heart of having an operator and being able to check everything into a Git repository and let the GitOps operator reference the repository to determine what the state is. But it's not continuous delivery. It's not orchestrating testing. It's not orchestrating a CI build. It's not doing any of that. It's doing what it needs to do to make sure that we have a truly repeatable continuous deployment process, so that the deployment at
dev is done the same as at test and the same as at prod. We have not really achieved that in continuous delivery. Some people have, and there are tools out there that are making that happen, but in terms of really creating a repeatable process, I think the idea of GitOps has the most promise to get us to a continuous deployment state. But it's not continuous delivery. And you'll hear me talking about this all the time: one of the challenges of GitOps, I think, is that it does not fit well into a CD pipeline
because you have a human who needs to create a pull request. So we have to address that in the CD pipeline. We have to address the number of YAML files that are required to maintain a GitOps model. I talk about it being a human scaling problem, not a cluster scaling problem, because you can scale the cluster as big as you want. But we do have to look at how it's going to fit into the CD pipeline, and it's a discussion that we should be having as companies begin to embrace this kind of technology and understand that it is a doorway to a very consistent
continuous deployment method.
Yeah, if I could, I'd love to interject and add a little bit to that. There's so much to unpack; you said an awful lot there. I think of what you're referring to as continuous delivery. And again, I don't think there is one way that people use the term continuous delivery. But in the way that you were using the term, I think of the continuous delivery pipeline as this horizontal thing that says: all right, I've got various stages of my software delivery life cycle. I've got my dev stage, my testing stage,
my user acceptance testing, my staging, and then my production, and you're controlling a great deal of the gates that need to be gone through, the things that need to be satisfied, the conditions that need to be met as you go through that. I think what you're saying, Tracy, is that within each one of those we have a consistent way, kind of going downward, a consistent way across all of these different stages, to then realize things into that environment. That's where we can consistently apply these GitOps practices. Of course, some people consider
taking things from Git out into the interface of Kubernetes as part of the delivery process. So I don't think we can work through that definitional thing here on this panel, but you're right that that goes down there. The last thing I want to say is that there is, in my mind, no such thing as the continuous deployment operator, the GitOps operator, because GitOps is really about not only watching Git and then bringing things into, let's say, the Kubernetes store, the etcd
store. It's also about how you relate that to the actual running state of the things that you have expressed in that Git repository. So there you have to link it together with these other operators that are making it so. There's this kind of pairing: I've got to get things into my runtime environment, and then I've got operators that operate on that. It's not until it's a running pod, if you're doing containers, that it is actually in operation. So GitOps is about managing that too; it's not just about Git to Kubernetes.
And the
thing that I think, Dan, you brought up earlier was that you accidentally had this issue where you had deleted files in a production cluster, right? Not to pick on you. I think the point of GitOps is not to be a gatekeeper, to say, "You're not doing GitOps, so I'm going to beat you over the head with it." It's maybe a spectrum of things that you adopt, and you get more GitOps-y as you go along. But ideally, you know, you don't have people with connections to production clusters
after they're set up, maybe not even ever. Ideally, every single operation is happening through Git. I think we're going to get to that point, to the point where, no, we don't have keys to production. Keys to production on, like, laptops? That's crazy. What about those laptops? Years ago, a company I am familiar with had an issue where a developer's laptop got hacked. They had access to thousands of client websites, and suddenly all
those client websites were basically spamming and cookie-injecting and spying on people, because someone had gotten into this developer's laptop. That's the kind of stuff where, from a security perspective, we can eliminate all of it by following this process and relying on it. I'm not saying that we're a hundred percent there yet; I think we're getting close. There are certain situations where you might need to escalate, but ideally you are able to accomplish this and get to continuous delivery with GitOps,
because it is a pattern that lets you do it. And with so many deployments happening, I mean, what percentage of deployments are happening this way? It's shockingly low, very, very low compared to what we'd like it to be. So this is, I think, super critical for the productivity and sanity of everybody.
By the way, that delete case that you just talked about is something that GitOps delivers on today. If you had had your system set up with GitOps when you did that delete, it would have restored it for you. Your operator would have picked it up
and then restored it back to what's in Git. Git is the source of truth. So you wouldn't have had to scramble; GitOps would have scrambled for you.
I really like that term, and the operator term that we've talked about a bunch of times, because it really encapsulates the manual tasks that a human operator would have been doing in the past. That delete in my case was not on a production system. Nobody is silly enough to let me near one at all, and in particular not from my laptop. You started to touch
on some other important things that have been in the news a lot lately: security. Do you think GitOps plays an important role here in improving the security of how people roll things out to production?
Yeah, I think that it's critical. Security is often thought of as the things that we keep out of production, but security is just as much about getting things into production, because the software that you have deployed in production right now has vulnerabilities. I mean, it's statistically a certainty that you have vulnerabilities
in production that you don't know about. Maybe nobody knows about them; ideally nobody knows, right? But as patches come out for the software that you're relying on, you want to get that patched really quickly, right? And the best way to do that is this: if you're using GitOps and you're using a standard protocol for rolling things out, you're going to be able to easily commit those changes, get them staged, and get them into production in, you know, seconds or minutes rather than hours or days or weeks. So security is about deploying faster. If you want to be more stable and more
secure, you're going to find a way to deploy faster, for sure. It's also about auditability. Going back to the other end, drift detection is a key element here. If somebody is able to gain access to your environment and start making changes, well, with drift detection they're going to be constantly getting booted out, because it's going to be noticed that the state is not the one that's desired and defined. Now, obviously, you've got to go patch that hole so they can't keep getting in. But it's super, super critical from a security standpoint, and to be effective from the
security standpoint, you have to be effective from the productivity standpoint, which is great, because that means it's a win-win if you can figure it out.
I think we all have to keep in mind, as we think about what GitOps is, that we are removing the human element. You're not tweaking your production or your testing or your development environment by command line or some other kind of interactive experience, because you're defining a state, and that state is being managed by Git. So, as Cornelia pointed out, if you tweak it manually,
the GitOps operator is going to say, "Yeah, that's not the same; I'm going to bring it back to the state that's defined." By doing that, you eliminate the external hacker who is tweaking things they shouldn't be, even if they are somebody who works for the company. This is one of the cultural things that will change. There are so many people I talk to who are like, "I'm so used to using the command line. I'm so used to doing this manually. How do I let go of that control?" That is what is required when you move from something that's imperative
to something that's declarative. That's a shift we're going to be seeing not only in this GitOps movement; we're going to be seeing that shift in many places. The more we can define a state, and then just manage some kind of declarative script to define that state, the more careful we will be as we go fast, as you said, Dan, taking the human element out.
There's a phrase that I really like that I think captures what you just described there, Tracy,
which is that Git is the interface to operations. You're right, people used to use CLIs, or they used the vSphere console, or they used the AWS console. Those were their interfaces to operations, and now Git becomes the interface to operations. That doesn't mean that you're not going to have some continuous delivery tool that is scripting Git, but Git is the interface that it's scripting. And "scripting", you know, there are air quotes around that. You're right, you might still be eliminating the human, but the interface to operations is now
through Git.
In terms of, like, if I built a tool to simplify management, and I wanted to be able to use that UI, what should that UI actually do? Well, it should make changes in Git, or make a pull request automatically for you, or something like that. If you need that, build that UI, and it should actually go through Git so the changes can be reconciled. And one of the patterns I've seen a lot is people getting rid of manual configuration creation and using generated configuration. So, like, if I make a change to
my application repo, this kicks off an automated process that generates the manifests, which are opened as a pull request on the infrastructure repo, which then goes through the GitOps process. So you eliminate a lot of the possibilities where problems can occur and where downtime can occur. Like AWS: how many times have they gone down because somebody fat-fingered a change while they were connected to production? Too many, right? But you can get rid of that because, to this point, Git is the interface to operations.
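The generated-configuration pattern described here can be sketched in a few lines of Python. This is a minimal illustration, not any panelist's tooling: the function names `render_manifest` and `manifest_file`, the `apps/<app>/deployment.json` layout, and the choice to tag images with the commit SHA are all assumptions made for the example. It only produces the file an automated job would commit on a branch of the infrastructure repo; opening the pull request itself is left out because it depends on the Git host.

```python
import json


def render_manifest(app: str, image_repo: str, commit_sha: str, replicas: int = 2) -> dict:
    """Build a Kubernetes Deployment manifest pinned to one application commit."""
    # Reject short or malformed SHAs so nobody has to copy one by hand.
    if len(commit_sha) != 40 or any(c not in "0123456789abcdef" for c in commit_sha):
        raise ValueError("expected a full 40-character lowercase hex commit SHA")
    labels = {"app": app}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": app, "image": f"{image_repo}:{commit_sha}"}
                    ]
                },
            },
        },
    }


def manifest_file(app: str, image_repo: str, commit_sha: str) -> tuple:
    """Return (path, content) for the infrastructure repo (hypothetical layout)."""
    manifest = render_manifest(app, image_repo, commit_sha)
    # Sorted keys keep the output deterministic, so the pull request diff
    # shows only what actually changed.
    content = json.dumps(manifest, indent=2, sort_keys=True) + "\n"
    return f"apps/{app}/deployment.json", content
```

Because the manifest is generated from the commit that triggered the build, the only human step left is reviewing the pull request; the reconciling operator takes over once it is merged.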
Yeah. And automating the front end of Git is, I think, where my strongest interest is, because when we do what you just described, then we are really tightening up the process, because nobody copied the wrong SHA, right? Something as simple as that is where the weakness is. The more we can automate that and lock it down so that it just happens for you, the closer we're getting to a really clean way of managing our deployments, regardless of what environment they're going to.
Okay. Well, I'm sold at this point. Say a company is just hearing about all of this for the first time somehow and wants to get started. Are there some tips? Are there incremental approaches people can take before jumping all the way in? Do you have any kind of suggestions on where to start?
I'll go ahead and jump in; there was enough of a pause. One thing we've been talking about quite a bit, and Tracy has emphasized it several times, is this kind of declarative, non-imperative automation.
So it's declarative-state automation that is convergent and non-imperative, and it turns out that there's a really cool system out there that already has these foundational principles. It's called Kubernetes. Take the way that you're deploying containerized applications. By the way, containerized applications are just use case one for Kubernetes, but it's pretty darn mature at this point. We've got that reconciling automation happening: the ReplicaSet controller, the Deployment controller, that make
it so. And we've got the declarative configuration that expresses what you want running there. So now you add some additional reconciling automation, what Tracy was referring to as the operator, and I was saying, well, just a little, that in GitOps there's more than one operator. You've got some operators that handle getting it from Git into Kubernetes, and Kubernetes already has these reconciling controllers, these converging controllers, that actually create the pods. You get that for free, the declarative configuration and the reconciling controllers, with Kubernetes.
That's a great place to start your GitOps journey, because you don't have to build that convergent, non-imperative automation. It's already there. Then you can leverage something like Flux, which is a CNCF project that Weaveworks created, or Argo CD, which is another reconciling mechanism for bringing things from Git into Kubernetes that's also in the CNCF. You add that, and then you're there: you're doing GitOps. Now, there are some best practices, and you'll start to learn those, but that's an outstanding
place to start, because you get so much of the equation for free by just using Kubernetes.
Starting with Kubernetes and the tools that were mentioned, both CNCF projects, Flux from Weaveworks and Argo CD, which is something Codefresh works on quite a bit: those are really perfect for it. Kubernetes is ideal for it, but it's worth calling out, too, that Kubernetes is by no means the end destination of GitOps. These principles, we really think, are going to apply to pretty much all infrastructure. And
one of the reasons that AWS is involved in the GitOps Working Group, and Azure is a part of the GitOps Working Group, is specifically that they're looking for standards that are going to help them build better infrastructure tools that work on these principles. So as they come out with new ideas, new products, or releases, and they're obviously doing a lot of work, we can hold up a yardstick and say, "Hey, is this compatible with GitOps? Can you do this with GitOps tools?" If you can, that's the kind of thing we're getting behind. So
this is the right place to start, but it's by no means where we'll end up.
Going back to what Cornelia said about the containerized application: that is the place you should start. You know, when you start decomposing your containerized application, you're going to find out that breaking up is really hard to do, and you need to have a strategy for doing that. Now, where I'm starting to see some mistakes, because we work a lot in the microservices area, that's our focus, is that we are seeing a lot of application-based
deployment files, YAML files, that don't break it up into each microservice having its own. So when you move from a containerized application to decomposing your applications, you really have to sit down, get on the whiteboard, and start looking at all of the parts and pieces that have to be touched. Are we going to go to polyrepo? If you go to polyrepo, you're going to have a lot more places where you have to check in that YAML. So it is an architectural change, and you should consider it an
architectural change, address it as an architectural change, and not just rush, because you'll have a lot of pieces and parts that you didn't put together right.
Cool. Something interesting that I think a couple of people have said in the last little while is that it doesn't stop with Kubernetes. What are some of the strangest places you've seen it? I can think of a few, like how different communities are using GitOps to manage calendars now, where you can change the meeting invitations in a Git repo and it gets reconciled to the shared calendar everyone has.
It's awesome to do stuff that way. Any other strange or new or insightful spots where GitOps is popping up?
No, but we're thinking about how to do that for CD pipelines right now: declare what you want in your pipeline. You're going to have events and sequences. You're not going to have, like, the pipeline anymore; you're going to have a sequence, right, and the events. I imagine that you could declare that for your pipeline, check it in to Git, and your pipeline would run off of it. Now, that would be really
cool. I'm hoping that we see that sooner rather than later.
Some other use cases that I've seen are a lot of policy-related things. It could be the groups that you've defined in your LDAP system, where you're using LDAP as your central place to come up with your groups and who has which roles and all that, and then propagating that out into the systems that apply those policies. That's something that I'm starting to see pretty commonly. You know, doing things around OPA is kind of a more general case, like GitOps-ing your Rego rules.
Those are some of the types of things that we're definitely seeing it applied to, in many, many scenarios.
One of the conversations that we had when the GitOps Working Group started was that somebody came in and was like, "Hey, why are we using a Google Doc? We should be using Git for taking notes and stuff." And we were like, "Well, maybe we will someday, but it's probably okay for us to do a little bit of live collaboration in a tool that's meant for it; Git isn't meant for that in the same way." But there is one that I think is a really interesting use case.
Leonardo Murillo gave a really cool presentation a while ago about doing air-gapped environments. Basically, what they did is they set up essentially a thumb drive that had the Git repo on it, and then they would walk up and plug it into the device, and it would go through the reconciliation loop with a reconciler. In this case, it was Kubernetes on the edge. It was in, I think, 5G towers or something, but they were air-gapped, and it would then do the reconciliation there. Even though it requires walking a USB drive up, you know, it's
still good. And it still actually simplifies the process that they would go through, because before that, you know, if they wanted to update dozens or thousands of nodes and people had to walk out to them, they would have, like, a handbook of instructions that they really needed to follow. If you don't follow it exactly, you run into some issue, and suddenly you have divergence across all of these nodes, and you have to have really super-skilled technicians to deal with that. In this case, it's easier. They can prepare a simple thumb drive that they can just copy the code over to when it's ready, and then they can
just send it out, and everybody can just go and plug it in and they're good to go. That's a pretty wild, interesting use case.
I can only imagine how much easier it makes debugging later, when you just have to check what version your air-gapped system is at in Git, instead of having to check the state of every single system inside of there. That's a good transition, I guess, because you started mentioning the CNCF GitOps group. How is it evolving, what are some of the things going on in it, and how is it helping out? Why should people get
involved there?
Yeah, I'll start there, if that's all right. So the GitOps Working Group: we launched it in partnership, Codefresh, Weaveworks, AWS, Azure, and GitHub. We launched this just at the end of last year, and the goal was really: let's put together standards and best practices and principles for what GitOps is and how to do it, and let's make something that's inclusive and streamlined and simplified. We've seen GitOps start to come into discussions where people are saying, "No, you're not doing GitOps. This is GitOps; that's not GitOps," and so on, and there's confusion about what it means. So, for example, one of the things that many people don't realize is that GitOps has principles, and they're not completely finished. They're actually being drafted right now. So I would encourage anybody listening: please come get involved in the community. There are dozens of people working on this from all kinds of different companies. One of the principles is this idea of closed-loop reconciliation: having an operator that is looking for
changes and trying to pull them and apply them onto the infrastructure. That's something that, even if you're doing infrastructure as code, you're probably not doing yet. And there are a number of other principles in there that are being worked on that are incredibly valuable once you start using them. So: creating the standard, making it easy for people to use, showing which tools are compliant with or support that standard, like Flux, like Argo, and then helping vendors understand that if you want to make your tool GitOps-friendly, these are the principles that you're
going to follow. And even providing different patterns, community-contributed patterns, that people can look at to see: if I'm on AWS and I want to deploy service infrastructure, what's the best way to do that? And having things that people can contribute to in that way. So there is essentially the GitOps Working Group, which is under the SIG App Delivery of the CNCF, and we're working on a standard that is called OpenGitOps, which is a project under the SIG. And Cornelia is also a huge part of
the formation of that group.
So I'll phrase it by the following statement: the reason to have standards is not just for standards' sake. It's not just because we want to take some purist high horse. It's that there are certain characteristics, certain benefits, that are realized only if you do things in a certain way. So we don't want to water down GitOps to be something where somebody says: oh, well, I'm doing GitOps, but wait a minute, I can't do recovery, I can't do disaster recovery like in
the story that I told you at the very beginning, because I haven't protected against drift, and my runtime environment doesn't match what's in Git, because I wasn't constantly reconciling; I didn't have a convergent system. Well, then you're not doing GitOps. You're doing Git plus automation, but you're not doing GitOps, so you don't realize the business benefit of disaster recovery. That's what we're aiming to do with the GitOps Working Group, and these principles in particular: identify the essential principles that yield the
kind of results that we want to deliver with these practices.
So Git plus automation is just sparkling Git automation. Real GitOps only comes from a specific region: the CNCF GitOps Working Group.
And we're going to define those standards and make it easy for everybody. For companies like DeployHub, we need to see those. We need to understand the standards. We need to have standard protocols. We need to have standard integrations. So it will be essential, as GitOps is embraced by more companies and we start building it into the CD pipelines, that we
have, you know, a community we can go to and say, "Hey, can we solve this problem?"
Awesome. Thanks. I think that wraps up our time here today, so now I will switch over to live questions.
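The closed-loop reconciliation and drift detection that the panel keeps returning to (desired state in Git, an operator that converges the running state toward it) can be sketched as follows. This is a toy model under stated assumptions, not how Flux, Argo CD, or the Kubernetes controllers are implemented: both states are plain in-memory dictionaries, and the names `plan` and `reconcile` are invented for the illustration.

```python
from typing import Dict, List, Tuple

# Toy model: resource name -> spec. In a real setup, `desired` would be read
# from a Git repository and `actual` from the running environment.
State = Dict[str, dict]


def plan(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Detect drift: list the actions needed to make `actual` match `desired`."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def reconcile(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Apply the plan to `actual` in place and return what was changed."""
    actions = plan(desired, actual)
    for verb, name in actions:
        if verb == "delete":
            del actual[name]
        else:
            # Copy the spec so later manual edits to `actual` cannot
            # silently change the desired state.
            actual[name] = dict(desired[name])
    return actions
```

An operator would call `reconcile` on a timer or on every change. In this model, an accidental delete or a manual tweak shows up as a non-empty plan on the next pass and is reverted, while a second pass with no drift returns an empty list, which is the convergent behavior that separates GitOps from "Git plus automation" in the discussion above.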