About the talk
GitOps is an approach where infrastructure as code lives alongside your application in the same Git repository, and any changes are automatically deployed when they’re merged there. This session demonstrates how to implement this approach for AWS-managed backend resources together with front-end services running on Kubernetes, all out of the same Git repository. You learn how AWS resources can be created and managed in an automated way alongside workloads deployed to Kubernetes via Flux. If you’re a developer or operations team member looking for ways to leverage AWS with Kubernetes and Amazon EKS to build and run applications in a simpler and more cohesive Git-driven experience, this is the session for you.
Learn more about AWS at - https://amzn.to/30gRkKT
More AWS videos http://bit.ly/2O3zS75
More AWS events videos http://bit.ly/316g9t4
#AWS #AWSSummit #AWSEvents
I am a hands-on technology architect and leader who can craft a compelling architecture and strategy - as well as successfully sell and deliver it within organisations big and small. Through constant reading and experimentation and an extensive network I ensure that I keep informed and relevant on where the industry is going. My recent experience has focused on helping AWS’ customers to build safe, reliable and scalable cloud platforms and architectures - particularly around containers. I also have evangelised AWS and its container offerings in several blog posts and Summits.
Hi, my name is Jason, and I'm a Solutions Architect with AWS in Sydney, and I'm here to talk to you today about GitOps on AWS with Kubernetes. So, has any of this happened to you? A deployment just failed and you hear "it worked on my machine" or "it worked in our staging environment." Or perhaps you have a major incident and you frantically try to work out: has anything just changed? What changed? Who changed it? And some finger-pointing ensues. And maybe that incident means that you need to rebuild some things, so you go looking for the documentation, and if it even exists, maybe it's not up to date. Or maybe things just seem generally out of control. The good news is, if you have any of these problems, then GitOps can help.

We'll start with a brief overview of what GitOps is. In order to understand GitOps, we need to take a step back and look at modern software development practices around how software teams manage change. Traditionally there have been three key activities and systems involved in managing that change. The first is source code management, using things like Git. The second is the building and testing of these applications, which is normally done with things like Jenkins or AWS CodeBuild. And finally, we have the deployment of these changes, which might be done with AWS CodeDeploy. These systems are often stitched together into pipelines, and there are two major types of pipelines that you see within software organizations. The first is continuous integration, which monitors a Git repo for changes and triggers builds in response. And then there's continuous deployment, which, if those builds and tests are successful, triggers deployments automatically off the back of them. And if you combine those two things together, it becomes a CI/CD pipeline that takes you all the way from a committed change to a service that's been updated in production.

Also, we traditionally have had a divide between application code and infrastructure. For example, here we see Margaret Hamilton, who led the team that wrote the code that landed humans on the moon in the Apollo program. We also see the Apollo Guidance Computer, which was a very bespoke piece of equipment, and on a team like this the code had to be handed over to specialized infrastructure people who would deploy it and operate that computer for them. The good news is that this has changed, first with virtualization and then even more so with the cloud, where we can now have declarative JSON, YAML, or the CDK do this on our behalf. This abstraction has really helped, and it's let us treat infrastructure almost like code. And because we can treat infrastructure like code, we can manage it in some of the same ways that software developers manage code and get many of the same benefits.

So, for example, what GitOps means from the infrastructure perspective is that we have these infrastructure JSON, YAML, or CDK files in a Git repo, and changes to them will either trigger a CodePipeline to update a CloudFormation stack, or trigger Flux, which we'll look at in a moment, to make those changes on Kubernetes. Flux is part of the CNCF, or Cloud Native Computing Foundation, ecosystem, and it is intended to monitor Git repos and do GitOps around Kubernetes. It's what we're going to be looking at in our demo.
Also, I mentioned the AWS CDK, or Cloud Development Kit. This is a newer offering that allows you to represent what you'd like not only in JSON or YAML, but in real programming languages like Python, TypeScript, Java, and .NET. After you declare these things in a proper programming language rather than JSON or YAML, the CDK compiles them into a CloudFormation template for you and then uses that against CloudFormation. However, you would never edit that CloudFormation template directly anymore; you would instead go and update the CDK code. One of the benefits of the CDK is that it embeds many common best practices, so a few lines of CDK code can generate quite a lot of CloudFormation that represents the best practices you should be following within the AWS environment. We'll look at this a bit more in our demo in a moment.

When we talk about GitOps, a common approach is to have a branch per environment, and you merge your changes to that branch in order to trigger the deployment to that environment. So, as we see here, we might have a situation where each developer gets their own personal branch, where they can iterate quickly on any changes that they're working on. If they're happy with a change, they might then merge it to an integration environment, where they can test it against everybody else's services and changes. And then finally, if things work out well in that integration environment, they might merge it to the production branch, send it to production, and have the customers experience the change. In many organizations, though, that might be a little simplistic. You might have, for example, a main shippable trunk that you use to control change. So you might, as an example, say that after you've had successful integration tests, you submit a pull request against that trunk, and that's where we put a manual control on the change. And then from there, we could deploy to production. Branching strategies are a whole talk in and of themselves, and this is an area that you have to pay some attention to if you're going to do GitOps well.

So that's what GitOps is. Why might you want to do it? First, Git is the tool that all the developers in an organization are already using. So by bringing people like your operations and infrastructure teams along to use the same tool, you get consistency: you have people who are using the same tool and playing nicely together. It also allows management to see the changes not only within each team but across teams, all in the one place, so it can really help with making everybody more streamlined and efficient. It's also a great source of truth, because it keeps a full history of our changes, and it serves as a great auditing and reporting tool as well. It shows us not only who made each change, but what they changed, when they changed it, why they changed it, and who approved the change. On the "why they changed it," it's important to make sure that people put appropriate comments in, maybe linking to the ticket that represents the bug or feature request they were implementing. And when we talk about who approved the change, it's important to have something like an enforced peer review, via the pull request mechanism, on every change, so that you do have that approver and a final gate on the way to production. That also makes it a great place to control change, because if your production environment is your castle, Git can be the drawbridge and the door here.
You have a situation where you force all of the changes through that peer review process in Git, and you also force them through the pipelines, which enforce all of the tests that you might want, not just from a functional but maybe even from a security perspective. And only once all of that has happened is there a way to get things into production, so it's a great way to control things.

It can also help you with rollbacks, because the only thing you need to do to roll back a change is to pull a prior version and then merge that back in. So, as an example, if we wanted to roll back to the last production version, you would just do a git pull of that particular prior commit, and then merge and push it back up. It's important to note here, though, that you need to have done certain things right, with your database schemas and perhaps API versioning of dependencies, to have this work seamlessly, but it can definitely help you here.

Finally, there's an area where there's some discussion: when do you manage the cloud directly, versus when do you manage it via Kubernetes? This is because Kubernetes has this concept of operators. Operators are a way to extend Kubernetes to do one of two things. We can extend it so that it can manage containerized services on the cluster for you, perhaps a database or a Kafka queue, and it'll run those as pods for you on the cluster. You also can use a Kubernetes operator to make requests against the underlying cloud on your behalf, for example, to manage an AWS RDS database or a load balancer. And so, when you look at your options here, they form a bit of a spectrum. On one extreme, you have AWS managed services that you manage directly via AWS, with things like CloudFormation and the CDK. On the other extreme, you have everything running on the cluster, with no AWS managed services. And in the middle, you have AWS managed services, but you're managing them from within Kubernetes. The demo that you're about to see is somewhere between the left and the middle: we have our database, which is our most important piece of managed infrastructure, being handled directly in the cloud by GitOps of the CDK, and then everything else is run by operators, things like the load balancer and the secrets. We'll look at that in a moment.

So, all that said, let's have a look at the demo. In order to understand the demo, let's go through a little bit about what you're going to see. The app in the demo is called Ghost, which is an open source app for publishing personal blog posts. It is a twelve-factor app, meaning that all of its state is externalized to its MySQL database, and it gets all of its configuration from things like environment variables. It is well containerized by the community, and so it's a great candidate for running this way in Kubernetes and showing off some of these features.

What we're about to see, as far as how this all interacts between Kubernetes and the cloud, is that we have a CDK-managed RDS database. The CDK has gone and provisioned a secret in Secrets Manager that has the password for the database as well as all of its connection details, and we need those in our pods in order to be able to connect to the database. So we install an External Secrets controller, which is an operator that extends Kubernetes to integrate secrets from Secrets Manager into Kubernetes secrets. It watches these secrets and upserts them into Kubernetes for us. We can then reference those Kubernetes secrets from our pods' deployment spec. So we deploy our Ghost service, and it retrieves the secrets on the Kubernetes side and is able to connect to the database.

To expose this to the internet, we've chosen to use the ALB Ingress Controller, which is another operator that extends the cluster to manage an ALB in AWS. So we define our Ingress object, and then the Ingress controller goes and provisions an ALB for us, and it will monitor as the pods come and go, add and remove those targets from the ALB, and do other seamless management of that ALB for us from within Kubernetes. And finally, we wanted to have a real name for the site, and so we need an external DNS controller to automatically update a CNAME from the name that we want to the ALB's generated name, which is probably not a good experience for our users to be using directly.

When we look at the Git flow that we're about to see, there's basically one Git repo with two folders: an AWS app resources folder and a Kubernetes (k8s) app resources folder. In the AWS app resources folder, you're going to see a CodePipeline invoke CodeBuild, which runs the AWS CDK tools to create and manage the RDS database. And in the k8s app resources folder, we're going to see that Flux monitors that folder for changes and updates the underlying EKS cluster and objects to match those changes. The address of the repo that you're about to see the demo live in is down below, if you want to have a look after the talk.

So let's have a look at our demo. We see here Ghost running on the internet at our real domain name. One of the interesting things about this is that it's an HTTPS encrypted website. If we look at the certificate, we're going to see that it is an Amazon-issued wildcard certificate. You'll see in a moment that this is managed by Amazon's ACM certificate management service, and it's attached to the ALB here as well.

So, let's have a look at the infrastructure as code that backs this thing. As we said, there's an AWS app resources folder and a k8s app resources folder that we're concerned with here. This service has an availability issue: the database is set to a single availability zone, and the underlying pod is only running once, in one availability zone, and both of those are bad from an availability perspective. So we want to fix that. What we're going to do is create a new branch so we can make some changes to fix this availability issue. We'll start with the AWS app resources folder and have a look at our Python CDK code that manages the database. What we're going to see here is that the multi-AZ parameter is false, and we need to set that to true. So we're going to edit this file and flip that from false to true.
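To make the shape of that edit concrete, here is a rough sketch of what a Python CDK (v1-era) database definition like this could look like. This is an infrastructure definition, not runnable without an AWS account, and the construct names, instance size, and VPC wiring are illustrative assumptions, not the talk's actual file:

```python
from aws_cdk import core
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_rds as rds

class DatabaseStack(core.Stack):
    def __init__(self, scope, id, **kwargs):
        super().__init__(scope, id, **kwargs)
        vpc = ec2.Vpc(self, "Vpc", max_azs=2)
        # The availability fix from the demo: flipping multi_az to True
        # tells RDS to keep a synchronous standby in a second AZ.
        rds.DatabaseInstance(
            self, "GhostDatabase",
            engine=rds.DatabaseInstanceEngine.MYSQL,
            instance_type=ec2.InstanceType("t3.small"),
            vpc=vpc,
            master_username="ghost",  # CDK generates the password into Secrets Manager
            multi_az=True,            # was: multi_az=False
        )
```

The point of the demo is that this one-word change, merged through a pull request, is the entire operational act of making the database highly available.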
Now we're going to commit that change and leave a good comment explaining why we're doing it, because these comments, as we'll see, are important in the peer review process of our pull request. While we're in here, let's have a brief look at the buildspec. This is the file that tells CodeBuild what to do when the pipeline invokes it. As you can see here, we're installing the CDK and running some CDK commands; we'll look at that a bit more in a moment.

Now, let's go to the k8s app resources folder and Ghost's deployment spec. We're going to see here that there is only one replica being asked for of the pod, so only one pod will run in response to this, and we want to make that two, to make it highly available. So, we're going to make that change and commit it as well. And while we're in this folder, let's have a look at the Ingress object. As you can see, it has some ALB-Ingress-specific annotations that say how to configure the load balancer, and it also has an annotation for our external DNS controller saying what is the real name that we want to ensure is CNAMEd to the ALB address.

Now that we've made the changes that we want in our branch, we want to submit a pull request to get this merged so that it'll get deployed. One of the great things about pull requests is that they make it really easy to see exactly what changed as you're reviewing. So we save the pull request, and as we'll see here, it warns us that we're not allowed to approve our own changes. This is because you want a second set of eyes and a bit of control around changes, especially ones that are about to be merged to production. However, for the purposes of this demo, we're going to do it anyway.

Now that we've merged our change, as expected, CodePipeline has noticed that that change has been merged, and it's kicked off a pipeline. We can see here, with this link on CodePipeline, that it takes us right to GitHub and shows us what changes just happened that it's in the process of processing for us, which is handy. And it also links us directly to CodeBuild, where we can tail the logs of what's happening in real time.
The first bit of this buildspec that runs is installing the CDK; that installation is via npm. Then we're about to see that it also runs a pip command, because we're using the Python AWS CDK, and some of those Python modules come from pip. If I wanted to make this faster and more efficient, I could have included all the CDK tooling in a custom container image and asked CodeBuild to run that image instead. However, for the purposes of this demo, we're doing it at build time.
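The talk doesn't show the exact file, but a buildspec doing what's described would look roughly like this. The phase layout is standard CodeBuild; the specific commands and the requirements file name are assumptions:

```yaml
version: 0.2
phases:
  install:
    commands:
      # The CDK CLI itself is a Node package
      - npm install -g aws-cdk
      # The Python CDK modules the stack imports come from pip
      - pip install -r requirements.txt
  build:
    commands:
      # Synthesize the CloudFormation and deploy the database stack,
      # without pausing for interactive approval
      - cdk deploy --require-approval never
```

Baking the install phase into a custom image, as mentioned above, would remove the npm and pip steps from every run.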
It's now installed the CDK and run the cdk deploy command. As you can see in the output here, it's creating a CloudFormation change set, which it's then going to deploy for us. It's indicated that the update is in progress, and because this is CloudFormation, we should be able to pop over to the CloudFormation part of the console, and sure enough, we see that there's an update in progress on that database stack, where it's changing to a highly available multi-AZ configuration.

While we're here, we might as well also look at Secrets Manager and see the secret that the CDK has created for us. The secret includes not just the hostname, but also the password of the underlying database, everything that the service would need to connect to it.

So now let's have a look and see what is happening on the Kubernetes side of things. We'll start with a kubectl describe on the Ghost deployment, the one whose deployment spec we changed to make it 2 instead of 1. As we can see here, it has scaled up to 2, which is what we expected, which is great. We also can see that there are some environment variables getting passed into the container, and that these environment variables are coming from a Kubernetes secret called ghost-database. This is the secret on the Kubernetes side that's being managed by the External Secrets controller; we'll have a look at that in a second. Let's also look at the targets, and this is interesting because the service is running in IP target mode, and in that mode every single pod gets registered as a target, rather than the nodes.
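The wiring that the describe output shows, environment variables sourced from the controller-managed secret, corresponds to a deployment spec along these lines. The replica count and the ghost-database secret name come from the demo; the container image tag and the secret's key names are assumptions (the double-underscore variable names are Ghost's own convention for nested config):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ghost
spec:
  replicas: 2          # the change we merged: was 1
  selector:
    matchLabels:
      app: ghost
  template:
    metadata:
      labels:
        app: ghost
    spec:
      containers:
        - name: ghost
          image: ghost:latest
          env:
            # Pull the DB connection details from the Kubernetes secret
            # that the External Secrets controller upserts for us.
            - name: database__connection__host
              valueFrom:
                secretKeyRef:
                  name: ghost-database
                  key: host
            - name: database__connection__password
              valueFrom:
                secretKeyRef:
                  name: ghost-database
                  key: password
```

Because the pods only ever reference the Kubernetes secret, rotating the password in Secrets Manager flows through without touching this file.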
Because Flux is what has been doing this GitOps magic for us, let's have a look at its logs, and they'll tell us a bit about what just happened. As you can see here, it periodically checks the repo for changes, and it noticed our committed change, generated a delta against the Kubernetes spec file, and then applied it. We can see exactly when it happened, and it logs the timings for how long each step took, which is great.

Now, we're going to look at that External Secrets controller to see how it got that secret from AWS Secrets Manager into the Kubernetes secret that we referenced in our Ghost deployment spec. As we see here, it looks for changes on that secret, and if the secret changes on the AWS side, it upserts those changes into that Kubernetes secret for us. That's how that appeared.

Finally, let's have a look at the logs of our ALB Ingress Controller. What we're expecting to see here is that the Ingress controller noticed that there is another pod of Ghost, and it added it as a target on that Ghost ALB. And we can see here that that did happen: it noticed that we now have two pods, and it added that target. You can also see in the logs, from a prior situation, that it removed a pod from the target group. So one of the key things that the ALB Ingress Controller really does for us is manage the mapping of pods to the load balancer. As you can see here, these are the IP addresses of our pods that are running Ghost, and if we pop over to the ALB console, we'll see those exact same IP addresses as the targets for the ALB. This actually also shows us that they're running in separate availability zones, giving us a great high availability story. And if we go to the monitoring tab, we'll see that the healthy host count for that load balancer has increased from 1 to 2. You get all of the normal ALB metrics and tooling; all that's happening is that it's managing that for us.

So, in closing, it's time for you to try GitOps yourself: so that your app works in every environment, so that your developers and operators use the same tools and play nicely together, so that you know the who, what, when, and who-approved of every change, so that you're in control of your changes, and so that you can roll them back to a known good state whenever you need to. Go build!