Duration 40:24

The Effortless Development of Custom Computer Vision Models - Level 300 (United States)

Sara van de Moosdijk
Sr. AI/ML Partner Solutions Architect at AWS
AWS Summit Online 2020
May 13, 2020, Online, San Francisco, CA, USA

About the talk

Do you want to use computer vision in your projects, but find the idea of training a custom neural network model daunting? Have you used pre-trained computer vision models, but find that these models don’t cover every aspect of your use case? With Amazon Rekognition Custom Labels, you can easily customise existing computer vision models without needing an expert data scientist. Come learn how to prepare your dataset, customise Amazon Rekognition models with your data, and deploy these models in an application. We also discuss the difference between training computer vision models using Amazon Rekognition Custom Labels and doing so using Amazon SageMaker.

Learn more about AWS at - https://amzn.to/2WopVpc

Subscribe:

More AWS videos http://bit.ly/2O3zS75

More AWS events videos http://bit.ly/316g9t4

#AWS #AWSSummit #AWSEvents

About speaker

Sara van de Moosdijk
Sr. AI/ML Partner Solutions Architect at AWS

Sara van de Moosdijk, simply known as Moose, is a Partner Solutions Architect for Artificial Intelligence and Machine Learning (AI/ML PSA) at Amazon Web Services (AWS) in Australia. She helps AWS partners build and scale AI/ML solutions through technical enablement, support, and architectural guidance. Drawing on experience from her previous role as an AI consultant at Accenture, implementing and deploying machine learning projects in EMEA and NAMER, Moose aims to make machine learning accessible to all levels within an organization. She holds degrees in Natural Language Processing from Charles University in Prague (CZ) and the University of Lorraine (FR). Moose spends her free time figuring out how to fit more books in her overflowing bookcase.


Transcript

Hi, my name is Sara van de Moosdijk, but most people just call me Moose. I'm a machine learning specialised Partner Solutions Architect here at AWS. Hopefully you're watching this session because you're interested in learning how to build custom computer vision models. I have a little confession to make first: I'm actually more of an expert in natural language processing than I am in computer vision. But about six months ago, I found myself working on a very different type of language.

So, were you able to understand what I said in that video? Even if not, hopefully you at least noticed that it was sign language, specifically Australian sign language. About six months ago, I found myself working on a project to build a tool which could translate from sign language into written English. I do apologise, I'm only a beginner in this language. If you don't know much about sign language, here are a couple of facts for you: there are an estimated 300 sign languages in the world.

Australian sign language is just one of those, and we actually use the term Auslan to refer to it. Auslan originated around the nineteenth century, when it split off from British Sign Language, known as BSL, and it has two major dialects, a northern dialect and a southern dialect. Auslan is a very complex language, and meaning comes from several different factors. The shape of the hand is really important, but so are the position of the hand in relation to the rest of your body, the movement of the hand, and your facial and body expression. You can also use the space around you to illustrate what's happening and to provide more meaning to what you're saying. It's a very, very complex problem, which makes it a really interesting problem to solve with machine learning.

When we started on this several months ago with a small team of people, we were very excited and we had loads of ideas for how we wanted to approach the problem. We were thinking video classification, action recognition, pose estimation, sequence-to-sequence, and we fell into the trap of starting with the more complex models, and therefore the more interesting ones for us. As the deadline approached, we still didn't have a working model. So we came to a point where we asked: would it be possible to simplify this problem? What if, instead of working on videos, we could make this work with image classification, which is much, much simpler? And that is what we did. On this slide you see examples of my image training data. We built a script which extracts nine frames from a video of somebody performing a sign and arranges those frames in a three-by-three grid, and this composite image is what we used for our image classification algorithm.
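As an aside, the pre-processing step described here could look roughly like the sketch below. This is not the team's actual script, just a minimal illustration assuming OpenCV and NumPy; the grid size, cell size, and file names are placeholders.

```python
# Hypothetical sketch of the frame-grid pre-processing described above,
# assuming OpenCV and NumPy. File names and sizes are illustrative only.
import cv2
import numpy as np

def video_to_grid(video_path, grid_size=3, cell_size=224):
    """Sample grid_size**2 evenly spaced frames and tile them into one image."""
    capture = cv2.VideoCapture(video_path)
    total_frames = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, total_frames - 1, grid_size ** 2, dtype=int)

    cells = []
    for idx in indices:
        capture.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = capture.read()
        if not ok:
            raise RuntimeError(f"Could not read frame {idx} from {video_path}")
        cells.append(cv2.resize(frame, (cell_size, cell_size)))
    capture.release()

    # Stack the sampled frames row by row into a grid_size x grid_size mosaic.
    rows = [np.hstack(cells[r * grid_size:(r + 1) * grid_size])
            for r in range(grid_size)]
    return np.vstack(rows)

# Example usage: one composite training image per recorded sign.
# cv2.imwrite("hello_01.jpg", video_to_grid("hello_01.mp4"))
```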

Luckily for me, I had 64 wonderful colleagues who agreed to help out and create training data. We ended up supporting 12 signs, short words and phrases like "hello", "how are you", and "pleased to meet you", with a total dataset of 764 images. Once we had our images, we had to look at how we were going to build our image classification model. We ended up using the default PyTorch environment provided by Amazon SageMaker, and we applied transfer learning to a pre-trained ResNet-18. If you didn't understand anything I said in the last two sentences, don't worry: you really don't have to understand the details in order to follow along with this session. But basically, it took us around four months of work, 148 training jobs, of which three were hyperparameter tuning jobs, and 49 training hours, and we finally achieved an accuracy of about 83%, which is pretty good. Four months might seem like a lot, but keep in mind that this included gathering all of the training data, and keep in mind that I'm not an expert in computer vision and hadn't really worked with these algorithms before.

Let's take a look at how we actually built this in SageMaker. I'm going to show you how we built the first version of our sign language demo. Don't worry if you don't understand all of the detail; I'm going to walk through the code very quickly, really just to give you an idea of the amount of work we put into it and the amount of code we needed to get this running. I chose to use PyTorch to apply transfer learning to the ResNet-18 model, but you can really use any deep learning framework that you prefer, such as TensorFlow, MXNet, or whichever one you're comfortable with.

What you see here is my training script, which I actually took from the PyTorch tutorial on transfer learning. What it does is relatively simple. Over here we have some methods to save the model and the classes to disk, but the most important method to look at is the train method. It does a couple of things. First, it loads our data, transforms it, and puts it into a dataset for us to use. Next, it shuffles and splits the data into training and validation sets. Then we load the pre-trained ResNet-18 model from the PyTorch library. All of the other code in this file really just sets up the neural network we want to apply transfer learning to and runs the training loop over the different epochs. So that's the training script.
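For readers following along, the kind of training script described here (adapted from the PyTorch transfer-learning tutorial) might look roughly like this. It is a hedged sketch, not the actual file from the demo: paths, batch size, learning rate, and the 80/20 split are illustrative, and the validation loop is omitted for brevity.

```python
# Hedged sketch of a transfer-learning training routine like the one described,
# assuming torchvision's ResNet-18. Paths and hyperparameters are illustrative.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

def train(data_dir, num_classes, epochs=10, lr=0.001):
    # Load the images, transform them, and put them into a dataset.
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    dataset = datasets.ImageFolder(data_dir, transform=preprocess)

    # Shuffle and split into training and validation sets.
    train_size = int(0.8 * len(dataset))
    train_set, val_set = torch.utils.data.random_split(
        dataset, [train_size, len(dataset) - train_size])
    loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

    # Load the pre-trained ResNet-18 and replace the final layer for our labels.
    model = models.resnet18(pretrained=True)
    model.fc = nn.Linear(model.fc.in_features, num_classes)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)

    # Training loop over the epochs (validation loop omitted to keep this short).
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model, dataset.classes
```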

The other file I want to show you is the serve script, which is really just the script we use during inference once the endpoint is up and running. This is a much shorter script and only has a couple of methods. First, we have the model function, which just loads the model from the location where we saved it. Then we have the input method, which takes the input from the API call and transforms it in such a way that we can feed it into the model. And down here we have the output method, which takes the answer returned by the model and formats it in a way that's a little bit easier for us to read. So that's the inference script that we use.
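A sketch of what such a serve script could look like is shown below, using the standard SageMaker PyTorch serving hooks. The talk describes a model, input, and output function; this sketch also includes a predict_fn so it is self-contained, and the saved file names (model.pth, classes.json) are assumptions rather than the demo's actual artefacts.

```python
# Hedged sketch of SageMaker PyTorch serving hooks; model.pth / classes.json
# are assumed artefact names, not necessarily those used in the actual demo.
import io, json, os
import torch
from PIL import Image
from torchvision import models, transforms

def model_fn(model_dir):
    """Load the fine-tuned model and class names saved by the training script."""
    with open(os.path.join(model_dir, "classes.json")) as f:
        classes = json.load(f)
    model = models.resnet18(pretrained=False)
    model.fc = torch.nn.Linear(model.fc.in_features, len(classes))
    model.load_state_dict(torch.load(os.path.join(model_dir, "model.pth"),
                                     map_location="cpu"))
    model.eval()
    return {"model": model, "classes": classes}

def input_fn(request_body, request_content_type="application/x-image"):
    """Turn the raw request payload into the tensor the model expects."""
    image = Image.open(io.BytesIO(request_body)).convert("RGB")
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    return preprocess(image).unsqueeze(0)

def predict_fn(input_data, model_artifacts):
    """Run the model without tracking gradients."""
    with torch.no_grad():
        return model_artifacts["model"](input_data), model_artifacts["classes"]

def output_fn(prediction, accept="application/json"):
    """Format the model output as an easy-to-read label/confidence pair."""
    scores, classes = prediction
    probs = torch.nn.functional.softmax(scores[0], dim=0)
    best = int(probs.argmax())
    return json.dumps({"label": classes[best], "confidence": float(probs[best])})
```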

Finally, I have this Jupyter notebook here to show you how I actually applied these scripts in Amazon SageMaker. The first cell shows, at a very high level, how we run hyperparameter tuning. We applied tuning to multiple hyperparameters; in this case we point it to our training script over here, and then we can run that through SageMaker, which will spin up all of the necessary resources to complete the hyperparameter tuning.

Once the hyperparameter tuning has finished, SageMaker can display this table with the results, and the top row right here, the one with a final objective value of around 80%, is our best-performing model out of the hyperparameter tuning jobs. This is the one we'll actually use to deploy the endpoint, which is what the next cell of code does down here. Once the endpoint has started up, I go ahead and apply inference to all of the images in my test set. You can see that I had an image where the correct label was "cat", and the model predicted that the sign was "cat" with 99% confidence, which is great.
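The notebook cells described here might look something like the sketch below, assuming the SageMaker Python SDK. The role ARN, S3 path, metric regex, hyperparameter ranges, and instance types are all illustrative placeholders, not the values used in the actual demo.

```python
# Hedged sketch of the tuning and deployment cells, assuming the SageMaker
# Python SDK (v2 names); role, S3 path, ranges, and instances are placeholders.
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder

estimator = PyTorch(
    entry_point="train.py",          # the training script shown earlier
    role=role,
    framework_version="1.4.0",
    py_version="py3",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    hyperparameters={"epochs": 25},
)

tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="validation:accuracy",
    metric_definitions=[{"Name": "validation:accuracy",
                         "Regex": "val_acc=([0-9\\.]+)"}],
    hyperparameter_ranges={
        "lr": ContinuousParameter(0.0001, 0.1),
        "batch-size": IntegerParameter(16, 64),
    },
    max_jobs=8,
    max_parallel_jobs=2,
)

# Run the tuning jobs against the training data in S3, then deploy the best one.
tuner.fit({"training": "s3://my-auslan-bucket/frames/train"})
predictor = tuner.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
```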

Using the results from my test set, I then generate a confusion matrix. If you know how to read this, you'll see it's a pretty good, quite positive result; if you're not sure how to read it, don't worry, I will explain things like confusion matrices, precision, recall, and F1 score a little bit later in the session. Finally, at the bottom of this notebook, I calculate the precision, recall, and F1 score for each label separately, which is what you see in this table, as well as a macro-average precision, recall, and F1 score over here at the bottom. As you can see, the results are pretty positive.
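A confusion matrix and per-label metrics like these can be produced with a few lines of scikit-learn; the sketch below is illustrative, with made-up labels standing in for the real test results.

```python
# Illustrative scikit-learn snippet for the evaluation cells; the label lists
# here are made up, standing in for the real test-set results.
from sklearn.metrics import classification_report, confusion_matrix

y_true = ["cat", "hello", "cat", "eight", "grandfather"]        # ground truth
y_pred = ["cat", "hello", "cat", "grandfather", "grandfather"]  # model output

print(confusion_matrix(y_true, y_pred, labels=sorted(set(y_true))))
# Per-label precision / recall / F1 plus the macro averages mentioned above.
print(classification_report(y_true, y_pred, digits=3))
```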

These results were good enough that we were actually able to take our demo to re:Invent last year in December and try it out with a live audience, where it performed extremely well, so we were very happy with that. Now, at that same re:Invent, the Amazon Rekognition team announced a new feature called Custom Labels, and I'd like to dig into this new feature a little bit deeper. If you're not familiar with Amazon Rekognition, it's a service which was launched in 2016 and allows developers and customers to easily add computer vision features to their applications. It works on images and video, and it allows you to detect objects, scenes, faces, inappropriate content, and text in those images and videos.

Amazon Rekognition is great, but it's sometimes not specific enough for your particular use case. Let's take the example of a car company. They might have images coming in of used cars that need to be sold on their marketplace, for example, and they might want to not only detect whether there's a car in an image, which the pre-trained version of Rekognition can do, but also detect the make and model of the car. This is where the pre-trained version of Amazon Rekognition isn't going to be specific enough. To address this customer pain point, the Amazon Rekognition team launched the Custom Labels feature. It allows you to train your own custom computer vision models on your own data, but it's still as easy to use as Amazon Rekognition always was: you really don't need any machine learning expertise. It's as simple as using the console or the SDK and providing your data, and Amazon Rekognition will take care of choosing the right algorithm, doing the hyperparameter tuning, and starting up an endpoint for you. So if we go back to the example of the car company, they could now, as well as using the pre-trained Rekognition model to detect cars, train a custom Rekognition model to detect the make and model of every single car, and all they really have to do is provide the data.

So when I heard about this new feature, I figured, well, this is interesting: what if I tried using it for my sign language demo? Let's take a look at the results.

Here are the results of applying Amazon Rekognition Custom Labels to my sign language dataset. Remember, it took me around four months and 49 training hours to build the same thing in SageMaker, keeping in mind that that included all of the work of gathering the dataset and pre-processing it from video into images, which I didn't have to redo in this case. It only took me about an hour to recreate my SageMaker model in Amazon Rekognition Custom Labels, and it actually got slightly better results, if you look at the F1 score over here, the average precision over here, and the overall recall here. Total training time was only about 45 minutes. So these are really great results, and possibly a slight dent to my ego. But if you think about it, it makes complete sense: Amazon Rekognition has a team of world-class computer vision experts building the service, so it really makes sense that the resulting model on the same dataset performs really well. Let's take a look at Amazon Rekognition Custom Labels and how you can get to this point with your own data.

To use Amazon Rekognition Custom Labels, you are going to have to provide it with a dataset, and there are a couple of different ways in which you can do this. If you already have your data stored in an S3 bucket, then you can simply import the data from that Amazon S3 bucket into Amazon Rekognition. If you happen to have used Amazon Rekognition Custom Labels before and you have an existing dataset within the service, you can also create a new dataset from that existing dataset. If you have the images stored on your local computer, there's also an option to simply upload them directly into Amazon Rekognition. And finally, if you have images that aren't yet labelled, you may want to consider using Amazon SageMaker Ground Truth to first label your images, store them in an S3 bucket, and then import that as a dataset into Amazon Rekognition.

So, let's take a look at how I did this for my sign language project. In my case, I already had my sign language data uploaded to an Amazon S3 bucket because I was using SageMaker previously, and this is what my S3 bucket looks like. Right now we're looking at a directory that contains my training data, and I divided all of the images for the training data into separate subdirectories, where the name of the subdirectory indicates the label for that particular image. If I click, let's say, on the "cat" subdirectory, you'll see that I have all of the images of my colleagues signing the sign for "cat". The advantage of storing your data like this is that Amazon Rekognition can actually read the folder structure and use the names of these folders to automatically label your data, so you don't have to go through any extra work to get the labels. This is a very common way of storing data for image classification use cases.

But if you are going to be using object detection or semantic segmentation, then this isn't going to work, and you are going to have to use Amazon SageMaker Ground Truth instead. If you're not familiar with Amazon SageMaker Ground Truth, it's a service which makes it easier, and saves a lot of time and effort, to label your data. The way it works is that you provide it with your unlabelled data and a set of human annotators that you want to label your data. As the data is being labelled by the human annotators, Amazon SageMaker Ground Truth will actually train a machine learning model in the background. This machine learning model improves over time as more and more labels come in, and finally, when its confidence is high enough, it takes over part of the labelling effort, which then no longer has to go to the human annotators. When this happens, if the confidence of the model is high enough for a particular label, it will simply send that label on its way; but if the confidence is too low, the model will send the image to a human annotator anyway, and the label generated by the human is then returned to the model so that continuous learning can happen.

Now, when you use Amazon SageMaker Ground Truth to label your data, it generates what's called a manifest file as its output. A manifest file contains all of the information on where to find each image, as well as the label you've attached to it. On the slide I've included a small portion of a manifest file as an example. A manifest file is a JSON Lines file, which means it has one JSON object per line, and the slide shows just one of these JSON objects; it represents one image with its corresponding label.

Let's take a closer look. It might seem a little bit daunting at first, but if you go through it, it's actually pretty simple. The first line is what we call the source reference, and this is really just the S3 location of the image that you're currently looking at. That's pretty simple. Next, we have what is really just an identifier. Note the key of this key-value pair: it's called "signs", and that's because this particular manifest file is for my sign language dataset. You can actually change this key to be more relevant to your particular use case; if we think back to the car company, the key could be "make" or "model", for example. So the key is really up to you, and this line is really just an identifier, that's all it is.

Next, we have the start of a sub-object within this JSON object, and here again all I need you to look at is the key. In this case it's called "signs-metadata", again because it's for my sign language project, but you can also change this name to make it more relevant to your particular use case. Everything within this sub-object relates to the label that you've assigned to that particular image. The very first line within this sub-object is simply the class name; this is your label. In this case, I had an image of one of my colleagues using the sign for "hello", so the label is simply "hello". Next, you have the confidence. If a human labelled your data, the confidence will usually be one, but if the machine learning model within Amazon SageMaker Ground Truth took over and did the labelling, the confidence can be anywhere between 0 and 1, hopefully at the higher end of that scale. Next, you have the type of labelling task that you're using in SageMaker Ground Truth. There are various different tasks that it supports; in this case we're doing image classification, because we take an image as a whole and assign a label to it, but you also have options like object detection, semantic segmentation, text classification, and more. This value will stay the same for your entire dataset, because you're really only using one type of labelling per dataset. Next, you have a flag which indicates whether this particular label was generated by a human or by the machine learning model within SageMaker Ground Truth. And finally, you have the creation date, the exact date and time at which this particular label was generated.

Now, notice that I've highlighted the top four lines in blue, and that's because these are really the most important lines. I don't want you to focus too much on the rest of the information in white; it's useful to have, but it's not crucial to training your machine learning model. Do make sure that the information in blue is correct: make sure you have the correct location for your image, make sure that your identifiers are correct, and make sure that your labels are correct. That's really the most important bit.

Now, if you are using SageMaker Ground Truth, it will generate this manifest file for you. But now that you understand how it's built, you can imagine it would be pretty simple to create a piece of code that generates it yourself if you're not using SageMaker Ground Truth. And actually, if you use the same method of uploading data that I used, by storing your data in an S3 bucket with folders that indicate the label, then Rekognition Custom Labels will go ahead and generate the manifest file for you as well.
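For illustration, a small script that writes such a manifest for a folder-per-label dataset might look like the sketch below. The "signs" key mirrors the example from the talk, but the bucket, file names, and helper are hypothetical, and the exact metadata fields should be checked against the current Ground Truth documentation.

```python
# Illustrative sketch of writing one Ground Truth style manifest line for a
# dataset laid out as s3://bucket/train/<label>/<image>; helper and names are
# hypothetical, and fields should be checked against the current documentation.
import json
from datetime import datetime, timezone

def manifest_line(bucket, label, image_key, job_name="signs"):
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return json.dumps({
        "source-ref": f"s3://{bucket}/{image_key}",   # where to find the image
        job_name: 0,                                  # the identifier line
        f"{job_name}-metadata": {
            "class-name": label,                      # the actual label, e.g. "cat"
            "confidence": 1.0,                        # 1.0 because a human assigned it
            "type": "groundtruth/image-classification",
            "human-annotated": "yes",
            "creation-date": created,
            "job-name": f"labeling-job/{job_name}",
        },
    })

with open("train.manifest", "w") as f:
    f.write(manifest_line("my-auslan-bucket", "cat", "train/cat/img_001.jpg") + "\n")
```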

well. So let's take a look at the Memphis file that is generated for my particular, use case. But here you see the Manifest file which Amazon rekognition custom labels has generated for me based on how I had stored my data into these subdirectories that I showed in the previous demo. As you can see I have lots of different lines and each line looks very similar. Each one is a Json object. So if we look at one of these lines in a little bit more detail by making it easier to look at, you can now see that we have a very similar structure to what I was trying on the

slide. The first line here is the source ref which is the location of that particular image in my S3 bucket with that has the identifiers that I was mentioning before and we have all of the information related to the actual labeled which in this case was cat You might notice that we do have one additional key value pair in this Json object, that's because they keep the keys. All your parents that I was showing on the slide are all of the keys out of repairs which are mandatory. You can actually add additional information to each Json object that something which is useful for your

application. But in general, you should really recognize the structure that I'm showing in this file generated by Amazon recognition as compared to the structure that I was showing on the slide. So once you've uploaded your data to Amazon rekognition custom labels, the next step is to actually trained the model and this is super easy. It's literally the click of a button. You pretty much Point, Amazon recognition to your training data. You point it to your test data, you give the training job name. That's all Amazon rekognition will take care of analyzing the data for mining the data,
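If you prefer the SDK over the console for this step, kicking off training might look roughly like the following boto3 sketch; the project name, bucket, manifest location, and the choice to auto-create a test split are all assumptions for illustration.

```python
# Hedged boto3 sketch of starting a Custom Labels training job from the SDK
# instead of the console; names, bucket, and manifest location are placeholders.
import boto3

rekognition = boto3.client("rekognition")

project = rekognition.create_project(ProjectName="auslan-signs")

rekognition.create_project_version(
    ProjectArn=project["ProjectArn"],
    VersionName="v1",
    OutputConfig={"S3Bucket": "my-auslan-bucket", "S3KeyPrefix": "evaluation/"},
    TrainingData={"Assets": [{"GroundTruthManifest": {"S3Object": {
        "Bucket": "my-auslan-bucket", "Name": "train.manifest"}}}]},
    # Let the service split off a test set automatically, or point it at a
    # second manifest instead.
    TestingData={"AutoCreate": True},
)
```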

Once training has finished, you'll have a couple of evaluation results that you can use to figure out how good your model actually is. These results are shown in the AWS console, but you can also access them through the SDK. Here you may need a little bit of machine learning knowledge to understand what these results mean: they're provided to you as precision, recall, and F1 score, as well as a macro-average precision, recall, and F1 score. If you're familiar with machine learning and you've been in this field for a long period of time, you'll know what these mean, but if you're new to this field, I'm going to try to demystify these terms just a little bit.

I like to use Venn diagrams to explain what these terms really mean. Let's imagine that everything within the grey box is our data, everything within the blue circle is the data that we know to be true, and everything within the orange circle is the data that our model thinks is true: when you do inference and the model says the data is true, it ends up in the orange circle. Let's take a look at how this works. Everywhere that the circles overlap, these are called our true positives, because it means we know the data is true and the model predicted that it was true. Everything within the grey area of the rectangle but outside of the circles, these are our true negatives, because it's the data that we know to be false and the model correctly predicted that it was false. If you look at the blue circle again, and this time at everything within it except the overlap, these are our false negatives, because it's data that we know to be true but the model predicted to be false, so here the model was incorrect. And similarly, if you look at everything within the orange circle but not in the overlap, these are our false positives: data that we know to be false but the model thought was true.

I can imagine that it's a little bit difficult to understand with an abstract problem of true or false, so let's take a look at this with a real-world problem. Let's imagine that you have a spam classification model.

An email comes in, it goes through the model, and if the model classifies it as spam, it gets sent to the junk folder; if it classifies it as an important email, it goes to the inbox. So let's take a look at this. Everything within the blue circle is the emails that we know to be spam, and everything in the orange circle is the emails that the model predicts to be spam. This time our true positives are where the model correctly predicted that the email was spam, and our true negatives are where the model correctly predicted that the email was not spam. Our false negatives are the emails that we know to be spam but the model predicted were not, and our false positives are the emails that we know not to be spam but the model predicted that they were. So, I'll give you a moment to think about this, but in this particular use case, would you want fewer false negatives or fewer false positives? Hopefully, if you think about it for a moment, you'll come to the conclusion that in this case you would want fewer false positives. These are the emails that we know are not spam, but the model marked them as such, which means they ended up in the junk folder even though they really shouldn't have. And I don't know about you, but I don't really check my junk folder all that much, so I could miss important emails if I had false positives. So here, because we want to decrease the number of false positives, the metric that we want to look at is precision. I'm not going to give you the exact formula, because it's really easy to look up and understand yourself, but precision is essentially a formula which measures the true positives against the false positives, and in this case we really want a high precision.

To make the next metric clear, let's imagine that we have an algorithm which analyses an X-ray from a patient and tries to determine whether the patient is ill or healthy. Everything within the blue circle is the patients that we know to be ill, and everything in the orange circle is the patients which the model predicts to be ill. So let's have a look at this.

Our true positives are everything within the overlap: these are the patients that we know to be ill, and the model predicted that correctly. Our true negatives are everything in the grey rectangle area outside of the circles: these are the patients that we know to be healthy, and again the model predicted that correctly. Our false negatives, in this case, are the patients that we know to be ill but the model predicted were healthy, and our false positives are the patients that we know to be healthy but the model predicted were ill. Hopefully it's clear that, in this case, we actually want to have fewer false negatives. These are the patients that were ill but the model predicted were healthy, which is definitely the worst outcome here and what we want to avoid. So if you want fewer false negatives, the metric that you have to look at is recall. Again, I'm not going to give you the formula, it's really easy to look up, but it's essentially a measure of the true positives against the false negatives. So in the case of spam classification we wanted a high precision, and in the case of the patient classification we wanted a high recall.

Why are we looking at these two metrics separately? Well, the reason is that these two metrics are often opposing forces: generally, if you increase the precision, your recall is going to drop, and the other way around. Now, if you do have a use case where you don't really have a preference for either recall or precision, that's when you can use the F1 score, and the F1 score is really just the harmonic mean between the two, between precision and recall. Hopefully this helps you understand a little bit better what these metrics mean.
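For completeness, here are the formulas the speaker alludes to, written as a tiny helper; the true positive, false positive, and false negative counts would come from a confusion matrix like the ones shown in the demos.

```python
# The formulas behind the metrics, as a tiny helper; tp, fp, and fn counts
# come from a confusion matrix like the ones shown in the demos.
def precision(tp, fp):
    return tp / (tp + fp)        # hurt by false positives (the spam example)

def recall(tp, fn):
    return tp / (tp + fn)        # hurt by false negatives (the X-ray example)

def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)   # harmonic mean of precision and recall
```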

Now let's take a look at how this actually reflects on my own use case of sign language. Here you see the results of the model that Amazon Rekognition Custom Labels trained on my sign language dataset, and now that you understand F1 score, precision, and recall, you'll hopefully notice that these results are not bad. At the top you see the macro-average F1 score, recall, and precision; in the table below, Amazon Rekognition shows the same F1 score, precision, and recall, but this time for each label individually. You can see that the label "cat" performed particularly well, keeping in mind that I'm only testing it on five images, which is not a massive test set.

Now, what I'd like you to look at on this page in particular is the assumed threshold column, all the way on the right of the table. This assumed threshold is a threshold which Amazon Rekognition calculates for you, and the way it works is as follows. If you send an image to the trained model, the model returns the label it thinks is most likely, as well as a confidence level. So let's say we send an image of somebody signing "cat" and the model returns the label "cat" with a confidence level of 60%; because 60% is higher than the assumed threshold of 31%, the model will go ahead and return that label. If, however, the confidence level were below 31%, which is the assumed threshold for "cat", then the model would not return the label "cat" at all. So why is this assumed threshold important? It's important because this is what you use to control your precision and recall metrics. If you increase the threshold closer to one, your precision is going to increase, but your recall will most likely drop. The thresholds can be adjusted when you make an inference call in order to find the right balance between precision and recall for your use case. The assumed thresholds calculated by Amazon Rekognition are simply what it recommends and thinks is best for each label, but you can adjust them for your own use case.

Now, Amazon Rekognition also allows you to look at the results of your model in a little more detail. Here you can see the results for my test set: we have Stephanie signing "cat", and the model returned the label "cat" as a true positive. It got "cat" correct with a confidence of 98.5%, which is far above the assumed threshold, so it was able to return this label; this is an example of a true positive. Now, if you look at the filtering box on the left, I have true positives, false positives, and false negatives, but no true negatives. This actually makes sense, because I don't have a single image in my dataset where somebody is not signing; there's really no such thing in my use case as a true negative, and therefore there's nothing to filter on.

Let's take a look at the false positives and false negatives. Here we see an example of Dave signing "eight", and you can see that it has both a false positive and a false negative. Let's understand what this actually means. When the model was fed this image, it did actually consider the label "eight" as one of the potential options, but the confidence level it returned for "eight" was 30.8%, which is below its assumed threshold, so it didn't end up returning "eight" even though this is the correct label; that is the false negative in this case. The false positive in this example is "grandfather": the model thought that "grandfather" was another potential label, and the confidence level it had for "grandfather", 25.6%, is above that label's threshold. So the model returned "grandfather", which is incorrect and therefore a false positive, and it did not return "eight", which would have been correct, and that is the false negative. Using this user interface provided by Amazon Rekognition really allows you to understand what the true positives, false positives, false negatives, and true negatives mean in the context of your own use case.

So now you know how to evaluate how good your model is from the console, but Amazon Rekognition also saves these results in two files in an S3 bucket, so if you prefer, you can use the SDK to get the results directly from those files. The two files it uploads are a summary file and an evaluation manifest snapshot, and together they contain pretty much all of the information you just saw in the console, as well as an additional confusion matrix. So let's take a quick look at the summary file that Amazon Rekognition generated for my sign language use case.

Here you see the summary file that was generated by Amazon Rekognition Custom Labels after training a model on my sign language data. Again, it contains all of the information that we just saw in the console, but in addition it also has a confusion matrix all the way at the top. This is very similar to the confusion matrix you saw me generate in the Amazon SageMaker notebook, so all you have to do is grab it from this file and visualise it in order to look at it in more detail. If we go further down in this file, you'll see that it provides the other metrics as well. Here in the middle we have the macro-average F1 score, precision, and recall for all of the labels as a whole. Below that, it shows all of the unique labels it found in your dataset, as well as the number of testing images and the number of training images that were involved in this particular model. Finally, at the bottom of the file, you'll find the individual F1 score, precision, and recall, as well as the assumed threshold calculated by Amazon Rekognition, for each label separately.
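If you want those same numbers programmatically, a sketch along these lines should work, assuming boto3; the project ARN is a placeholder, and the exact keys inside the summary JSON may differ from this outline, so treat it as a starting point rather than a reference.

```python
# Hedged boto3 sketch of fetching the evaluation summary via the SDK; the
# project ARN is a placeholder and the response keys should be double-checked.
import json
import boto3

rekognition = boto3.client("rekognition")
s3 = boto3.client("s3")

versions = rekognition.describe_project_versions(
    ProjectArn="arn:aws:rekognition:ap-southeast-2:111122223333:project/auslan-signs/1234567890123")

# The evaluation result points at the summary file stored in S3.
summary_ref = versions["ProjectVersionDescriptions"][0]["EvaluationResult"]["Summary"]["S3Object"]
summary = json.loads(
    s3.get_object(Bucket=summary_ref["Bucket"], Key=summary_ref["Name"])["Body"].read())

# Contains the confusion matrix plus the macro and per-label metrics described above.
print(json.dumps(summary, indent=2))
```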

So now you know everything you need in order to properly evaluate your model. If it's not good enough, you can simply go and edit the data and perhaps retrain it, but there are also options to build in a feedback loop once you have a trained model. The Amazon Rekognition team has provided an open-source model feedback solution which is available on GitHub; I've included the link in the extra resources attached to this session. What it allows you to do is this: once you have a trained model and you're sending images through it, you can use the model feedback solution to send the results of the inference to an Amazon SageMaker Ground Truth job, where human annotators can look at the results and provide feedback. They can edit the answers, or accept them as correct. This is just another way to ensure that the model you're using is really correct and that you're happy to keep using it. So hopefully, once you've finished evaluating your model, you'll have a model that you want to deploy to production.

Amazon Rekognition actually takes care of starting up an API endpoint for you, so you don't have to worry about any of that. But you do need to start and stop the endpoint yourself when you plan on using it; this prevents you from paying for resources that you're not actually using. So always remember to start your endpoint when you're planning to use it, and don't forget to stop it when you're finished. In terms of pricing for the endpoint, you pay a certain price per inference unit per hour. An inference unit really just represents the amount of throughput that the endpoint can accept; in this case, one inference unit represents about five transactions per second. When you start up an endpoint, you can determine the minimum TPS value that you want the endpoint to have, which is really just the minimum throughput your endpoint will have at all times. But don't worry if you do experience a peak: Amazon Rekognition actually takes care of the autoscaling for you. It will scale up the resources behind the endpoint to handle the additional load of incoming data, and once the peak has passed, it will scale back down to the minimum that you set for your endpoint. We always recommend starting with the lowest value of one inference unit; you can monitor how much traffic is actually coming in through Amazon CloudWatch and adjust the minimum based on that.
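In code, starting and stopping the model might look like the sketch below, assuming boto3; the project version ARN is a placeholder, and MinInferenceUnits corresponds to the minimum capacity discussed above.

```python
# Hedged boto3 sketch of starting and stopping the Custom Labels model;
# the project version (model) ARN is a placeholder.
import boto3

rekognition = boto3.client("rekognition")
model_arn = ("arn:aws:rekognition:ap-southeast-2:111122223333:"
             "project/auslan-signs/version/v1/1234567890123")

# Start the endpoint with the minimum capacity of one inference unit.
# (In practice, wait until the model status is RUNNING before sending traffic.)
rekognition.start_project_version(ProjectVersionArn=model_arn, MinInferenceUnits=1)

# ...and stop it when you are finished, so you stop paying for it.
rekognition.stop_project_version(ProjectVersionArn=model_arn)
```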

Once you've started up your endpoint, there are two different ways you can send an image for inference: Amazon Rekognition accepts the image either as a base64-encoded byte array, or, if the image is stored in an S3 bucket, you can simply provide its location within the bucket. And remember the assumed thresholds which Amazon Rekognition calculates for every label? When you make an inference call, you can add an additional parameter to adjust the threshold that's being used; this is how you control the precision and recall of your model, and you can change it with every single inference call. That's really all you need to know on how to use Amazon Rekognition Custom Labels.
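An inference call with an adjusted threshold could look roughly like this, again assuming boto3 with a placeholder ARN, bucket, and key; MinConfidence (0-100) is the per-call override of the assumed threshold.

```python
# Hedged boto3 sketch of a single inference call with an adjusted threshold.
import boto3

rekognition = boto3.client("rekognition")

response = rekognition.detect_custom_labels(
    ProjectVersionArn=("arn:aws:rekognition:ap-southeast-2:111122223333:"
                       "project/auslan-signs/version/v1/1234567890123"),
    Image={"S3Object": {"Bucket": "my-auslan-bucket", "Name": "test/cat/img_017.jpg"}},
    MinConfidence=60,  # per-call override of the assumed threshold (0-100)
)
for label in response["CustomLabels"]:
    print(label["Name"], label["Confidence"])
```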

So, to close off, you might be wondering: when should you use Amazon Rekognition, and when should you use Amazon SageMaker? I've used both successfully for my sign language project, so what are the differences? Let's take a look at a quick summary. Amazon SageMaker is a really powerful tool that supports advanced machine learning use cases: anything you can build in MXNet, PyTorch, TensorFlow, or whatever deep learning framework you're most comfortable with, you can build in Amazon SageMaker. Amazon Rekognition Custom Labels currently only supports images, and it supports the use cases of image classification, object detection with bounding boxes, and semantic segmentation. If your problem falls into one of these three areas, then it's actually easier to use Amazon Rekognition: it requires no machine learning expertise, and even for advanced data scientists it makes for a really fast experimentation tool to build your first baseline model. So really, use Amazon Rekognition if you have limited experience with computer vision or if you just want to get a model up and running very quickly.

Now, there are two more reasons why you might currently want to choose Amazon SageMaker over Amazon Rekognition. One is if, for any compliance or regulatory reasons, you need full control over the environment and the algorithms you're using; in that case it's probably better to choose Amazon SageMaker. Amazon Rekognition will not reveal the underlying algorithm it's using, or the hyperparameters it chooses after tuning, so if for compliance reasons you need this information, then go ahead and build your own in Amazon SageMaker. And finally, although the pre-trained version of Amazon Rekognition is currently available in the Sydney region, the Custom Labels feature is unfortunately not yet available there. If your data needs to stay within the Sydney region for data sovereignty reasons, then currently you will need to use Amazon SageMaker.

So, I hope that helps you make a choice, and I hope you choose to use one of these two wonderful services. If you're interested in the sign language demo which I've been talking about throughout this presentation, we do actually have the first version of our MVP code out on GitHub; I've included the link in the additional resources for this session. If you want to build your own version of this demo for a sign language of your choice, I'd really love to see what you do with it. Have fun!

Finally, I just want to thank you for attending the session. I really appreciate that you took the time to listen to me talk about this project. Please fill out the survey; it would really mean a lot to me to hear your feedback. And I hope you have a great day.
