

About the talk
Microservice architectures give us increased agility and scale, but as they grow, they can become complicated to coordinate and debug. In this session, we review common service coordination patterns and how AWS Step Functions can help us quickly build fully managed and resilient workflows powered by easy-to-understand state machines.
Learn more about AWS at - https://amzn.to/2ZvJn5d
About speaker
I’m an experienced multi-lingual full-stack technical lead, comfortable building client-side single-page apps or backend APIs and services. Equal parts engineering manager, mentor, and software craftsman, I’m keenly aware of the critical tension between quick and good enough versus resilient and flexible. I value working with teams built on communication and trust, rapid learning, and getting things done. In addition to building great software, I build great teams. The hardest part of software development is teamwork and cross-functional communication, not writing code. I embrace agile development principles, recognizing that it’s critical to release early and frequently to maximize learning and validate our assumptions.
Hello, and thanks for joining me for this session, The Art of the State: Coordinating Services Using AWS Step Functions. My name is Gabe Hollombe, and I'm a senior developer advocate here at Amazon Web Services. This is a bit of a long quotation, but it's a great one. Someone's talking about a reliable, automated way of orchestrating very complex queries and processes between distributed systems, saving time and money, getting more productivity and agility, and making it easier to discuss solutions with non-technical stakeholders. It's a quote from Paul Brown, a senior developer
manager at The Guardian. Here's the full quotation for you to read. As you can see, he's talking about a service called AWS Step Functions. In this session, I'm going to teach you everything you need to know to start working with AWS Step Functions to achieve these kinds of results in your own projects. We'll start with some background knowledge, discussing the lay of the land when it comes to working with distributed services, and we'll explore a common service coordination pattern called orchestration. We'll see how orchestration is a great fit for handling workflows with
complex logic, and we'll learn how AWS Step Functions lets us build and execute these workflows in the cloud using something called state machines. Finally, we'll see some examples of customers using Step Functions and share some great resources to continue learning more. Let's start with an overview of how we typically get things done in the systems that we build. It's often easier to start out with all of our application bundled together into one deployable unit. Here's a picture of a banking system's functionality, including document and
identity verification and more, all deployed as various code modules together, with a common shared data store and an analytics system underneath. But over time, what we usually experience is a need or desire to split up our monolith into a set of distributed services, or microservices, which means we now have separate compute and data layers for each service. This is great because it gives us increased agility and scalability. We're free to have different teams work on each service with the languages and tooling that are best for each service, and we can scale each service for its independent needs. Some
services might need more compute or data capacity, or multiple instances of the compute layer to scale horizontally. It's all possible. But distributed services are almost always harder to coordinate, monitor, and debug. To tackle this issue, we typically turn to a service coordination pattern called orchestration. Let's continue to explore our example banking domain and look at how we might use orchestration to coordinate some services together. In this simplified banking system, we have four distinct services, or areas of concern: an account application service, which is concerned
with accepting applications from people who want to open a bank account and is responsible for returning an approve or reject decision. Next is the data checking service, which is responsible for performing various validations of data, such as checking the identity documentation of a bank account applicant or verifying that a home address appears valid. Next is the human review service, which tracks applications that need review and allows humans to make decisions about these flagged applications. And finally, we have the account service, responsible for creating and managing a bank account
after a new account application has been approved. Here's how these services work in concert with one another. The account application service takes in new applications, performs the data checks required, flags an application for review by a human if required, and finally passes approve decisions on to the account service. If we zoom out, here's what the same system looks like. And here it becomes even more obvious that what we have is an instance of service orchestration, seen from the point of view of the account application service. After all, the account application service is the one making calls
out to several other services and reacting to their responses in a stateful manner. When a new account application is received, it calls out to the data checking service to check the identity information provided in the application. It also verifies the address of the applicant. Then, if any of those checks come back with a flag, a human reviewer will get involved to make a decision. If an approval decision is made, it calls the account service to open a new account. Let's look at these same steps required to process a new bank account application, but this time visualized a different way, as a sort of flow
chart with the logical steps in it. First, we'll verify the identity documents provided by the applicant. We'll check to make sure their home address appears valid. Then we might need to involve a human to review the data in the application. Then we'll wait for the review to happen. And finally, we can approve the application. At first glance, this seems complete, but you might be jumping out of your seats, saying, "No, this isn't enough. We can make it better." And I'd say you're right. Let's see how. First, we don't need to be doing the identity check and the
address check in a serial fashion. The result of one doesn't depend on the output of the other, so instead we could arrange to have these steps completed in parallel. Next, we can improve the steps that happen after the two checks are performed. We should explicitly encode a step to show that a human review is only required if the identity or address checks failed, so it's possible to go straight from the checks to an approved application. Finally, we should have an explicit step showing a rejected application, and show that the human review step can transition to either
an approve or reject decision. There, that's much better. Now, you might not know it, but what you're looking at here is a state machine. A state machine simply describes a collection of computational steps that we want to split into discrete states. There's always only one starting state, and only one state is active at a time. You can think of this like a workflow or an executable flowchart. As the steps of the state machine activate, the active state is going to get some input, do something useful with that input, generate some output, and indicate the next state to transition
to, passing the output from the current state into the input of the next state. This is such a powerful way to coordinate work that AWS has a specific service to work in this fashion. The service is called AWS Step Functions, and you can think of it as a service that provides state machines in the cloud. Step Functions offers serverless workflows, so you can build and update apps quickly. Using Step Functions, you can design and run workflows that stitch together services such as AWS Lambda, AWS Fargate, and Amazon SageMaker into feature-rich
applications. You can write resilient workflows with built-in error handling and retry support, and Step Functions can provide an auditable execution history with visual monitoring for your state machine executions, letting you visually trace and debug when needed. Let's see what this looks like in practice with a simple demo. So, let's start with a submission that results in a successful application decision. I'm going to interact with my account application service by invoking a Lambda function from the command line. Here, I happen to be using the open-source Serverless Framework to
invoke the submit application Lambda function. You can ignore this, but if you're wondering what the sls invoke command is, that's where it comes from. I'm going to submit an application with a valid name and address. And as we can see here, I successfully submitted my application to the Lambda function that handles new applications. The function will pass the info it receives on to an AWS Step Functions state machine designed to manage a similar application workflow to the one we reviewed just a few minutes ago. Now, here in the Step Functions management console, I can see all of my state
machine executions and drill down into this specific execution for my most recent submission. I can see my input and the final state machine output. In this case, the application was approved. I can scroll down and see the visual workflow showing exactly what happened during this specific execution. It starts with these two checks. You can see both checks returned false for their flagged output. I can also click into each of these validation states if I want to inspect each check's input and output on its own. And finally, I can click into the Approve Application
state to see the final output, indicating that we transitioned to the application approved state. Now I'll show a different flow through the state machine. Let's see what the execution flow looks like if we need to manually review a bad address that was submitted. Here, I'm submitting a valid name, Gabe, but a bad address, one which doesn't match the address validation rules. This will cause the address check to return a true flag, causing the state machine to transition into a pending review state, waiting for a review decision to be made. The current state is highlighted in blue, which means
the state is in progress. This means the state machine is still running and effectively paused, waiting for a callback from our review system to pass in a review decision. We'll simulate that now. Back on the command line, I'll make another call to submit an approve decision for the review. And if we're quick, back in the Step Functions management console, we can see in real time that the execution picks up again, ultimately resulting in an approved application and retaining the state of the steps that came before it, including the review decision to approve the application. So now you've seen Step
Functions in practice. Let me tell you a bit more about how AWS Step Functions works. First, some terminology. A workflow is represented as a state machine, and each step in your workflow is called a state. When a step function executes, each move from one state to another is called a transition. And one cool thing about Step Functions is that by splitting up your workflow into discrete steps, you can reuse those steps among multiple state machines, and even easily view and edit the sequence of steps as your needs change. Each step function is defined by a JSON document that conforms to a
specification described by the Amazon States Language, a JSON-based structured language used to define your state machine as a collection of states that can do work, like Task states; determine which states to transition to next, like Choice states; stop execution with an error; and so on. In the workflow shown here, we're using four types of states: Task states to do work, a Parallel state to do some work in parallel, two Choice states to provide branching logic, and a special kind of Task integration that waits for a callback to proceed
in order to implement our wait-for-review step. Let's dive a little deeper into each of these. Task states are how our workflow performs work. This could be calling an AWS Lambda function, which is a very common use case. But you can also use a Task state to wait for a polling worker process, running basically anywhere, to pick up work from Step Functions, perform some computation, and return results back to the workflow. A Task state can also be used to invoke APIs in other AWS services that Step Functions is integrated with. For example, in our workflow, we use
Lambda functions to do the work for all the steps in blue. Here's how we can define the first step, verifying identity documents. Each state is defined by a name and is represented as an object with various properties, which vary depending on the type of state being defined. Here, you can see the state type is Task, and it also has a Resource property, which contains the ARN of a Lambda function to execute, as well as a set of parameters to pass from the state machine state into the Lambda function invocation. The Parallel state type, unsurprisingly, lets you execute work in parallel.
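To make the Task state definition described above concrete, here is a minimal sketch that builds one as a Python dict and prints it as Amazon States Language JSON. The ARN, the state name, and the `application.name` input field are hypothetical, since the talk's slide contents are not part of this transcript.

```python
import json

# Hypothetical ARN, for illustration only; the real one is not in the transcript.
CHECK_IDENTITY_ARN = "arn:aws:lambda:us-east-1:123456789012:function:check-identity"


def task_state(resource_arn, parameters, next_state=None):
    """Build an Amazon States Language Task state as a plain dict."""
    state = {"Type": "Task", "Resource": resource_arn, "Parameters": parameters}
    if next_state is None:
        state["End"] = True  # last state of its (sub-)state machine
    else:
        state["Next"] = next_state
    return state


verify_identity = task_state(
    CHECK_IDENTITY_ARN,
    # A key ending in ".$" takes its value from the state input via a JSONPath.
    {"name.$": "$.application.name"},
)

print(json.dumps({"Verify Identity Documents": verify_identity}, indent=2))
```

Building the definition in code and serializing it is only one option; the same JSON can be written by hand.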
A Parallel state takes an array of state machine branches, which it executes in parallel, returning an array of results from each branch when they have finished executing. Here's how we could define the Parallel state for our workflow. The type is Parallel, and it has a Branches property, which provides an array of objects representing sub-state machines to execute, and a ResultPath property, which we can use to store the output of all the parallel executions, in this case under checks. It transitions to the Human Review Required state next,
when all the parallel branches are complete. The Choice state type is like a switch statement from many programming languages. You can pass in an array of choice expressions, each comparing a state variable to some value, which the Choice state will use to determine which state to transition to next, based on which choice expression evaluates to true, if any. Here's how we might define the Review Required step from our workflow. Its type is Choice, and we pass two choice expressions in, each one looking at the checks state variable that was returned from the Parallel state
at the top. If either validation check returned a flagged value of true, this state sends the application for human review; otherwise, it will go to the Approve Application state. Finally, in some cases, we want to be able to pause state machine execution and wait for some external computation to happen. For example, we want to pause at the wait-for-review state until a human can review our flagged application and make a decision. Once the human makes a decision, the review service needs to somehow tell the step function to resume execution, and it needs to
pass it whatever value the state machine was waiting for, in this case, a decision about the account application. The way that works in Step Functions is with something called a task token, which Step Functions will generate for you automatically when you use a Task type defined with a special invocation style. Then the external system calls back to Step Functions and passes this task token back using the SendTaskSuccess or SendTaskFailure API call, so that the Step Functions system can find the appropriate paused task to pass the result to.
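The callback described above might look roughly like this from the review system's side. This is a sketch under assumptions: the helper name `report_review_decision`, the `APPROVE`/`REJECT` values, and the output shape are hypothetical. The `send_task_success` and `send_task_failure` calls follow the boto3 Step Functions client, and a small recording stand-in replaces the real client so the example runs without AWS credentials.

```python
import json


def report_review_decision(sfn_client, task_token, decision):
    """Resume a paused execution by handing back the reviewer's decision.

    sfn_client is expected to behave like boto3.client("stepfunctions").
    """
    if decision not in ("APPROVE", "REJECT"):
        # Unknown input: fail the paused task rather than resume it with bad data.
        sfn_client.send_task_failure(
            taskToken=task_token,
            error="InvalidDecision",
            cause=f"Unknown decision: {decision!r}",
        )
        return False
    sfn_client.send_task_success(
        taskToken=task_token,
        output=json.dumps({"decision": decision}),  # output must be a JSON string
    )
    return True


class RecordingClient:
    """Offline stand-in for the Step Functions client; it only records calls."""

    def __init__(self):
        self.calls = []

    def send_task_success(self, **kwargs):
        self.calls.append(("send_task_success", kwargs))

    def send_task_failure(self, **kwargs):
        self.calls.append(("send_task_failure", kwargs))


client = RecordingClient()
report_review_decision(client, "example-task-token", "APPROVE")
print(client.calls)
```

Note that a reject decision is still a successful task from the workflow's point of view; the workflow branches on the returned decision afterwards.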
Here's how we might implement the wait-for-callback state in our workflow. It's a Task type, and this one invokes a Lambda function as well, but the definition of this state is a little bit different. Instead of just passing the ARN of the Lambda function we want to execute, we use a special resource type describing our intent to invoke a Lambda function and wait for a task token callback from an external system. Then we pass in the function name of the Lambda we want to invoke and the payload that we want to pass on to that Lambda. Up here is where we pass the auto-generated task token value. In
this example case, the flag-application-for-review Lambda function might go and find an application record by its ID in some database, update its status to flagged, and persist this specific task token alongside it. Then the application review system could show a human reviewer all applications with a status of flagged, and when an approve or reject decision is made by the reviewer, the review system would call back to the AWS Step Functions API, passing in the review decision, so the paused workflow could resume executing. When working with distributed systems, failures can happen for a number of reasons:
calls to services might time out, tasks could fail for intermittent reasons, or you may even run into insufficient permissions. Fortunately, AWS Step Functions has a robust error handling capability. Each Task state can be configured to retry when it encounters errors, with a configurable backoff rate, up to a configurable maximum number of attempts. Tasks can also be configured to catch specific errors and transition to other states as appropriate. Let's see this in action with one more simple demo, using a slightly updated version of our example workflow
here. To simulate an error, we're throwing a custom unprocessable data exception if a check encounters a name string of "unprocessable data". When we run this application through our Step Functions state machine, we know that our name or address check might fail due to receiving data that these states cannot process at all. In this case, we want to catch the error and transition to a Flag Application As Unprocessable state instead of submitting the application for human review. As you can see, we're able to inspect the execution and see that our step
function caught the error. We can even view the exception to see more details. Now, you've already seen most of them in the demos from before, but just to explicitly call it out, here are a few points that nicely summarize the experience of how you work with Step Functions. You define your workflows in JSON, you can visualize these workflows in the management console, and you can even monitor current executions as well as audit previous executions there, allowing you to see the precise execution history and the inputs and outputs of each state in your workflows. That
covers all the basics. Next, here's an exhaustive list of all of the state types defined in the Amazon States Language today. You've already seen examples of most of these in this talk. The first one we haven't talked about yet is the Map state type. This state type is very similar to the map function or concept from functional programming. In programming, you might use a map function to apply a function to each item in an array of values, returning a new array of the same size with the function results from each application.
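As a small illustration of the functional-programming idea just described, here is Python's built-in `map` applied to a list. The `is_blank` check and the sample values are made up for this example and are not from the talk.

```python
def is_blank(value):
    """Return True when a submitted field is empty or only whitespace."""
    return value.strip() == ""


submitted_names = ["Gabe", "", "   "]

# map applies is_blank to every item and yields one result per input item.
flags = list(map(is_blank, submitted_names))

print(flags)  # [False, True, True]
```

The result has the same length as the input, with each position holding the function's result for the matching item.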
Similarly, the Map state allows you to process each of an input array's items with its own state machine, returning an array of results. What's even cooler is that you can configure the maximum degree of parallelism for the Map state to use, which is incredibly useful for doing lots of work at once, or for limiting the simultaneous processing if you need to. Other state types we haven't talked about yet are the Succeed and Fail states. These are simple states that you can use to semantically signal a successful or failed execution, and cause the state machine to stop execution as
well. Finally, the Pass state is very simple and allows you to just take input and pass it through to output without performing any work. It's very useful when you're first beginning to sketch out a Step Functions workflow definition, as you can use Pass states like placeholders to be replaced by other state types later as you flesh things out. Earlier, I mentioned that Step Functions can integrate with the APIs of a number of other AWS services. You've seen examples of invoking AWS Lambda functions in Task states, but you can also perform work in
these other services as well. Some particularly popular ones worth calling out include our Amazon ECS and Fargate support, allowing you to get work done via containers; the Amazon SageMaker integration, which is great for statefully managing your machine learning training and deployment workflows; and the ability for Step Functions to call back to itself, which lets you reuse logic between workflows by calling out to other step functions as needed. Well, I've been speaking a lot about the great things you can do with AWS Step Functions, but I'd be remiss if I didn't share with you how
some of our customers are achieving great results using Step Functions in the real world. nib Health Funds is one of Australia's fastest-growing health insurers. They've got over a million customers across Australia and New Zealand. Now, the health insurance industry is highly regulated. Of course you want to move quickly, but you need to do so in a safe and secure manner. For example, when someone needs access into a secure computing environment, perhaps to break the glass and troubleshoot an issue, they need to use a jump box or bastion host to connect into the secure resources.
A nib operator can request permission to access the specific environment. At that point, an AWS step function begins the process, which starts by notifying the required approver to make a decision to grant or deny access, and then waits for a response. Once it receives an approval response, it will kick off provisioning a securely configured bastion host for the operator to use as needed. Also, when the operator first requests access, they specify how long they'll need access to the secure system for. The Step Functions state machine uses this information and starts a
timer when the bastion host is created. When the timer's up, the bastion host is cleaned up automatically. Another great use case that I think is really nice, and illustrates some of the non-technical value that Step Functions brings, comes from Coca-Cola. The Coca-Cola Company has many, many products that they manufacture, and these products need to have nutrition information on their labels. Of course, that means that whenever the product formulations change, or whenever labeling requirements from regulatory bodies change, they need to generate new labels. Believe it or not, it used to
take them 36 hours to manage this whole process each time a label needed to change, and through the use of AWS Step Functions, they managed to get this down to only 10 seconds. That is such a dramatic improvement that I think it's worth talking about some of the key points that they highlight here, showing how Step Functions was so transformative in helping them achieve such dramatic results. The first point is that the data validation and transformation steps that need to happen in their pipeline can be designed visually with non-technical personnel. It's a huge advantage to be able to
see your algorithm encoded as a state machine, a series of steps, and to be able to generate these flow diagrams right there in the Step Functions management console. And the fact that these validation and transformation steps can be verified in real time as data is flowing through the state machine is really useful. As you saw in the demos a few minutes ago, as Step Functions state machines are executing, we can inspect their status in real time, seeing the current state and each step's input and output. This visibility to inspect
led them to identify and implement process optimizations right there on the spot. Speaking of getting work done dramatically faster, I would be remiss if I didn't also mention a new and exciting addition called AWS Step Functions Express Workflows. Express Workflows are designed to allow you to orchestrate compute, database, and messaging services at rates up to 100,000 events per second. They're suitable for high-volume event processing workloads, like microservice orchestration and
streaming data processing and transformation. Unlike the Standard Workflows I spoke about earlier, Express Workflows are designed for short durations of less than 5 minutes, and they can be very cost-effective at scale for high-frequency data processing. Here's a quick overview of Standard Workflows versus Express Workflows. The first point to call out here is that Standard Workflows can run for up to one year, while Express Workflows can only run for 5 minutes in a single execution. Another point to call out is that you only get the execution history and
visual debugging in the management console with Standard Workflows. Express Workflows only send logs to Amazon CloudWatch Logs. Also, Express Workflows are not designed to support the use cases that require the execution to pause and resume, like you saw with the Standard Workflow examples earlier in this talk. Hopefully I have inspired you to work with Step Functions the next time you need robust, fully managed service orchestration. But before I leave you, I want to share a few developer tools that I've found useful when working with Step Functions, and some tips for where to learn more. The first is AWS Step
Functions Local. For testing and development purposes, you can install and run Step Functions on your local machine, which can be used to invoke AWS Lambda functions, both in AWS and running locally. You can also use Step Functions Local to coordinate other supported services. Next is statelint, which is a command line tool you can install to inspect state machines written in the Amazon States Language for errors while you're writing your state machine definitions. If you're like me and you're a fan of the open-source Serverless Framework tooling, you'll be pleased to know that there's a Serverless
Framework plugin for working with Step Functions. And finally, because I like working in Visual Studio Code, I really enjoy the AWS Step Functions Constructor extension for VS Code, which gives you a nice visual preview of your state machine as you're writing it, in a similar manner to the experience you get when working in the management console, but this is right in your editor. Here are the three links I recommend the most. The first is a self-guided workshop you can do at your own pace. The second recommendation I have is to spend the time to read through the
Step Functions developer guide documentation. It's not very long, and it contains a ton of useful info and examples to help you learn quickly. And finally, you can visit the Step Functions website to see a number of popular example reference architectures that solve many common problems using Step Functions. To sum up, I just want you to remember that Step Functions gives you a fully managed service with high availability and automatic scaling built right in. If you use Step Functions to orchestrate your workflows, you'll get visual monitoring, auditable execution history, and robust
built-in error handling. And of course, you only pay for what you use. With Standard Workflows, you're charged based on the number of state transitions required to execute your application. And with Express Workflows, you're charged based on the number of requests for your workflow and the time and memory it takes to run. Standard Workflows, for example, don't charge you for the time they spend in a waiting state, and Express Workflows are likely going to be more cost-effective when processing large volumes of executions that finish quickly. Well, I hope I've
inspired you all to give AWS Step Functions a try. So what are you waiting for? Step up and go build. Thank you so much for taking the time to stay with me for this session. Your time is valuable, and I appreciate the chance you've given me to share this with you today. If this talk helps you to build something, I would love to hear about it. Please keep in touch on LinkedIn and Twitter. Thanks again.
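As a recap of the state types and error handling covered in the talk, here is a minimal sketch of how the account application workflow might be written in the Amazon States Language, built as a Python dict. It is not the speaker's actual definition: every state name, function name, ARN, input path, and the `UnprocessableDataException` error name is an assumption for illustration, and Pass states stand in for the real approve, reject, and flag steps.

```python
import json


def lambda_arn(name):
    # Hypothetical account, region, and function names, for illustration only.
    return f"arn:aws:lambda:us-east-1:123456789012:function:{name}"


def check_branch(state_name, function_name):
    """One Parallel branch: a single Lambda-backed Task that retries on timeouts."""
    return {
        "StartAt": state_name,
        "States": {state_name: {
            "Type": "Task",
            "Resource": lambda_arn(function_name),
            "Retry": [{"ErrorEquals": ["States.Timeout"], "IntervalSeconds": 1,
                       "BackoffRate": 2.0, "MaxAttempts": 3}],
            "End": True,
        }},
    }


def flagged(index):
    """Choice rule: route to review when the check at `index` was flagged."""
    return {"Variable": f"$.checks[{index}].flagged", "BooleanEquals": True,
            "Next": "Pending Review"}


definition = {
    "StartAt": "Check Applicant Data",
    "States": {
        "Check Applicant Data": {
            "Type": "Parallel",
            "Branches": [check_branch("Check Name", "check-name"),
                         check_branch("Check Address", "check-address")],
            "ResultPath": "$.checks",  # array with one result per branch
            "Catch": [{"ErrorEquals": ["UnprocessableDataException"],
                       "Next": "Flag Application As Unprocessable"}],
            "Next": "Review Required?",
        },
        "Review Required?": {
            "Type": "Choice",
            "Choices": [flagged(0), flagged(1)],
            "Default": "Approve Application",
        },
        "Pending Review": {
            "Type": "Task",
            # Pauses until SendTaskSuccess or SendTaskFailure returns the token.
            "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
            "Parameters": {
                "FunctionName": "flag-application-for-review",
                "Payload": {"application.$": "$.application",
                            "taskToken.$": "$$.Task.Token"},
            },
            "ResultPath": "$.review",
            "Next": "Review Approved?",
        },
        "Review Approved?": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.review.decision", "StringEquals": "APPROVE",
                         "Next": "Approve Application"}],
            "Default": "Reject Application",
        },
        # Pass states are placeholders for the real approve/reject/flag work.
        "Approve Application": {"Type": "Pass", "End": True},
        "Reject Application": {"Type": "Pass", "End": True},
        "Flag Application As Unprocessable": {"Type": "Pass", "End": True},
    },
}


def dangling_targets(states):
    """Return transition targets that do not name a state in `states`."""
    targets = set()
    for state in states.values():
        targets.update(state[key] for key in ("Next", "Default") if key in state)
        for rule in state.get("Choices", []) + state.get("Catch", []):
            targets.add(rule["Next"])
    return sorted(targets - set(states))


print(json.dumps(definition, indent=2))
print("dangling targets:", dangling_targets(definition["States"]))
```

The `dangling_targets` helper is only a quick sanity check on transition names; a tool such as statelint, mentioned in the talk, performs fuller validation.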
