I am a computer vision engineer working at Descartes Labs. I apply machine learning techniques to hard problems in the domains of energy, environment, and infrastructure.
Kyle Story is a computer vision engineer and applied scientist at Descartes Labs, where he develops analysis techniques to turn satellite imagery pixels into actionable insight and decisions. Kyle earned his PhD in physics from the University of Chicago, then worked as a postdoc at Stanford University for two years, during which he studied the cosmic microwave background from the South Pole in Antarctica. Now looking towards Earth, Kyle uses satellite imagery to learn about the natural and human systems inhabiting our globe and to build effective, efficient analysis solutions over a dynamic range of scale from your doorstep to the entire Earth.
About the talk
Petabytes of satellite imagery contain valuable insights into scientific and economic activity around the globe. In order to turn geospatial data into decisions, Descartes Labs has built an end-to-end data processing and modeling platform in Google Cloud. We leverage tools including Kubeflow Pipelines in our model building process to enable efficient experimentation, orchestrate complicated workflows, maximize repeatability and reuse, and deploy at scale. This talk will walk through implementing machine learning workflows in Kubeflow Pipelines, covering successes and challenges of using these tools in practice.
Today we're going to be telling you a little bit about understanding the Earth, machine learning, and a relatively new Google tool called Kubeflow Pipelines. I'm Kyle Story, a computer vision engineer at Descartes Labs, and this is my co-presenter, Faustine. I'm also a computer vision engineer. And with that, let's get started.
We're going to run through a lot of things during this talk, but there are three key points that we hope you can walk away from this room remembering. First, in the domain that we work in, satellite imagery, there's an incredibly large and growing amount of satellite imagery, and it provides a very powerful way of understanding the world around us. Second, machine learning is a very powerful technique for processing and understanding this satellite imagery, but it presents unique challenges in this domain. And third, we are using an array of tools within the Google Cloud suite to understand this imagery and provide value for our customers. In particular, we've been exploring Kubeflow Pipelines, and we'd like to tell you a little bit about that today.
So what do you think of when you think of the Earth?
What do you picture? Perhaps it's this iconic picture from the Apollo space mission almost 50 years ago at this point: a blue marble out in space. Well, since that time technology has allowed satellites to take increasingly large amounts of data covering our globe on a recurring and regular time interval. And at Descartes Labs we're in the business of taking that massive amount of data, using the Google Cloud as a backbone, and processing it into valuable information and decisions that customers can make.
So just to give you a sense of what we can do with satellite imagery: in this picture on the left we use a particular combination of spectral bands that pull out vegetation, which allows us to understand crop development. In the center, this synthetic aperture radar band allows us to understand infrastructure or mines and minerals. And finally, on the right, high-resolution satellite imagery allows us to study and understand, in this case, gas infrastructure. These are fracking installations.
Outside of the passive sensors in the visible bands, there are also active sensors, for example here synthetic aperture radar. This is actually a satellite that's flying over, sending down a radar signal, and receiving the reflected signal. That allows us both to see the reflections off of metal structures (you can see all the boats flowing into this harbor) and to learn about 3D information. All of this is to say that there's an incredible amount of data being produced all the time from satellites and other sources, and that gives us a really valuable
picture for understanding the world around us and driving actionable insights.
This is a picture that was taken by the Sentinel satellite shortly after the Camp Fire erupted in Northern California in November. You can see the plumes of smoke coming off of the place where the fire is burning. And by looking at different spectral bands, we can actually pick out where the fire is actively burning. This is a shortwave infrared and near infrared false-color image, and we can see where the fire is actively burning in this case. So using satellite imagery allows us to really understand what's going on in our world in a very powerful way.
What's important, then, is turning that massive amount of data into decisions or understanding that our customers can use. Here is an image of crop fields in the center of the United States, in Nebraska. We've built out a pipeline that allows us to predict the amount of corn and soy that are produced in a given growing season, and that helps our customers, in particular Cargill, understand how
they move grain around and what they can expect for prices in the agriculture market.
As I mentioned earlier, we find that computer vision and machine learning are a very powerful way of understanding these data sets and producing insight at scale, and that's what the remainder of this talk will largely focus on. Within satellite imagery and computer vision, we think about these along three axes: detect, map, and monitor. To detect is using computer vision and machine learning to find infrastructure, to find the points of interest, the things that you're interested in, all around the globe. To map is mapping those across the globe in the different regions that you are interested in. And finally, to monitor is being able to watch those in real time, understand changes, and use that to drive your business.
So for an example of detection, we built a computer vision algorithm that can detect wind turbines, and we can run this anywhere in the world. We were able to use Google Cloud to map all the wind turbines in the United States overnight.
Mapping: this is a convolutional neural net that maps out the footprints of buildings in high-resolution satellite imagery. You see the satellite imagery on the left and then the output of this neural net algorithm in the middle and on the right.
And then finally, monitoring change over time. This is a pixel-based random forest algorithm that uses the spectral information from the Sentinel satellite to make a mask of where water is at any time. What you're seeing here is running that over sequential time slices of satellite imagery, and you're seeing this reservoir in Northern California dry up during the drought of the previous two years.
With that, I would like to hand over the rest of the presentation to my co-presenter Faustine to dig into all the things that I've been presenting in a little more detail.
Thanks, Kyle. I want to preface my part of the talk by saying we're not really all that interested in digging into the algorithms behind our models. If you're interested in domain-specific algorithms, Kyle gave a great talk at last year's Next. In this talk, what we found as scientists was that you really couldn't treat the
model as a black box, even though we really wanted to. Data goes in and answers come out. Well, what happens to the data, and how does it arrive in our model? What do we do with the output? Where do we store it? And so forth. So what we found was that we should focus on our workflows and how to streamline those. Let me step through this.
There are a lot of things that happen after the imagery comes off of the satellite. We have a great ingest team that ingests data into our platform, which we built on top of Google Cloud. But of course imagery is completely useless if you can't access it, so we host microservices that allow scientists to programmatically query and access imagery, and we interact with those microservices using our Descartes Labs Python library.
Once we have training data, it's time to train our model. We build convolutional neural networks in TensorFlow, and we heavily leverage Google Compute Engine to train our models. We found that Deep Learning VMs are really fast to get us started.
Now that we're ready with our model, it's time to deploy, and for large geospatial applications that often means inference over multiple regions of interest. We have a task system that we built and host, and it allows us to scale out our models by a hundred or a thousand times. I'll talk about it a little more in depth later.
To dive in a little bit deeper: I don't want to belabor the point, but our imagery is not just pixels. In fact, it has a lot of earth science related to it. So prior to analysis we have to georegister and co-locate all
of our imagery, and we also have to care about things like atmospheric reflectance or whether clouds are obscuring our objects of interest.
Once imagery is analytics-ready, we have to access it. Because our data platform holds large multimodal data sets, we have APIs that make it really easy to access. In four lines of Python code we get imagery over New Mexico almost instantaneously. This is really great for developers, because it means that if I want imagery from a different satellite, I just change one line of code.
But deployment is where it gets tricky. Often we might want every square kilometer of a large region of interest, like the United States. So how do we do this? We have a task system that we built that allows us to scale these highly elastic workflows. The way we do it is we have a client that creates tiles over the Earth of arbitrary size, and each tile has a hash associated with it. We have a deployment script that knows how to instantiate a model, how to predict, and how to pre-process and post-process the data. We feed it into our tasks API, and that all runs on Kubernetes. The Earth is massively parallelizable: all you have to do is chip it into non-overlapping tiles, and each tile goes into a separate Docker container. This leverages all the really nice things about Kubernetes, like autoscaling, CPU and GPU node pools, things like that. But we want the results back, so we host a catalog API, and we're populating that REST endpoint as we go, asynchronously. At the end you get a map of the Earth with your inference on top of it.
So why are we using Kubeflow? Conceptually, machine learning pipelines are pretty straightforward: we get data, we train a model, and then we deploy. But we found that really that's not the case, and research projects quickly become unmaintainable. I didn't really talk about model training because it's probably the least inspiring part of what we do as scientists. Currently, we sort of manage our own VM infrastructure. We have to manage
our own GPUs, we have to make sure that CUDA is properly installed, and so on. We spend a lot of time debugging things like this instead of focusing on the science, and that's not what we really want. And this is a really awesome system, but the deploy script is pretty monolithic and hard to maintain, and we really want to be able to componentize our deployments. With those pain points, and many more (for example, we had difficulties with managing a training environment versus a deployment environment, and we ran into hard-to-debug bugs with that), Kubeflow Pipelines seemed like a great option for us. We were immediately attracted because the main points of Kubeflow are things that we were running into and wanted to solve internally.
So what is Kubeflow Pipelines? It is a machine learning system that manages machine learning workflows on Kubernetes. It aims to make machine learning reusable and reproducible and to take a product all the way from experimentation to deployment. And of course it runs natively on Kubernetes, which is great because all of our infrastructure runs in the cloud on Kubernetes, and it works really well with our APIs.
Specifically about pipelines: the core unit of work in Pipelines is an operation, or component, and each one runs in a Docker container. Each component has an input and an output, and components are connected into a graph, so a pipeline is a directed acyclic graph of work. It leverages all the wonderful things about Kubernetes, like autoscaling.
This is what a pipeline looks like. It's basically a Python function with operations inside, and we know that this is a pipeline because we have a little pipeline decorator. All this pipeline does is download a text file from a Google bucket and print it out, and I'll show you this later in the demo.
Okay, so here's the fun part. I wrote an end-to-end machine learning pipeline in Kubeflow, and I'm going to demo it really soon. The application will be detecting oil and gas sites in
the Permian Basin. These are what they look like; they're called well pads. If you don't know, a well pad is basically a large concrete rectangle that they put down while they're drilling for oil and gas. We might be interested in looking for these because we might want to monitor where fracking activity is, or, by seeing where they're being built, monitor our energy infrastructure.
So we want to turn this into this. How do we do that? Well, thankfully, I wrote an end-to-end pipeline in Kubeflow, so it goes from pulling the imagery from our platform, from just a simple label, to deploying on our task system and seeing the results. When the demo's running, you'll be able to see that this pipeline does both training and deployment.
There are some things that we really liked about pipelines. For example, each component runs in its own container, and we can control the very specific packages that we put in it, so we can make sure that our training and deployment environments always match.
Because each component is containerized, we can mix and match components, so it promotes reuse.
There are some neat things about pipelines that I hope to show you. They have native TensorBoard integration, which is really great because you can inspect how your model is performing and you can compare different runs. So it really supports the experimentation side of model building.
And there are some great things about deploying with pipelines. It leverages all the Kubernetes infrastructure that we have in house. It interoperates well with the tools that we use; we use Stackdriver, for example, so all those things work with Kubeflow. Managing identities, like different workflows with different scientists, is really easy using Google Cloud and Kubernetes authentication.
I didn't really mention this, and it's not the focus of the talk, but what we really found great is that Kubeflow pipelines are first-class citizens in AI Hub. It's a problem internally when we onboard new scientists and want to get them up to speed. How do we do that? Previously we've had to point them to a GitHub repo and say, install these packages, run this code. It's really awesome to be able to point somebody to AI Hub and say, download this pipeline and run it on a cluster. It's very, very straightforward now.
Okay. Well, I guess I will close off this talk, since the demo was not working, and contextualize what we're trying to do. Actually, I will show you this, because this is really awesome.
Okay. So this is the UI for Kubeflow Pipelines, and this is the hello-world pipeline that I showed you before. It's as easy as clicking run. Let's call this run "hello" and start it. It takes a couple of seconds for Kubernetes to schedule the pod, but it should show up very, very shortly. There we go, it's running. Looks like that completed, so let's refresh. Hello, Google Next.
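The hello-world pipeline in the demo is described in the talk as a Python function whose operations each run in their own container and together form a directed acyclic graph. The sketch below illustrates only that idea in plain Python; it does not use the Kubeflow Pipelines SDK. The `Op` class, `run_pipeline`, and the two step functions are hypothetical names invented for illustration, and the "download" step returns a fixed string instead of reading a file from a Google bucket.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List


@dataclass
class Op:
    """Hypothetical stand-in for one pipeline component (one container in Kubeflow)."""
    name: str
    func: Callable[..., str]
    inputs: List[str] = field(default_factory=list)  # names of upstream ops


def run_pipeline(ops: List[Op]) -> Dict[str, str]:
    """Run ops in dependency order, feeding each op the outputs of its inputs."""
    by_name = {op.name: op for op in ops}
    # Map each op to the set of ops it depends on; a cycle raises CycleError.
    graph = {op.name: set(op.inputs) for op in ops}
    outputs: Dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        op = by_name[name]
        outputs[name] = op.func(*(outputs[dep] for dep in op.inputs))
    return outputs


def download() -> str:
    # Assumption: stands in for fetching a text file from a bucket.
    return "Hello, Google Next"


def echo(text: str) -> str:
    # Print the upstream output and pass it through unchanged.
    print(text)
    return text


# Declared out of order on purpose: the graph, not the list, decides the order.
hello_ops = [Op("echo", echo, inputs=["download"]), Op("download", download)]

if __name__ == "__main__":
    run_pipeline(hello_ops)
```

In the real system each operation would be a container image scheduled by Kubernetes rather than a local function call, but the dependency ordering works the same way.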
But you know, that's not really what we're here for. We're here for machine learning workflows. This is the oil and gas well pad pipeline I talked about earlier, and it's basically just as easy as the last one. Let's call this "demo". I can add a description. I can add it to an experiment; I won't do that. There are some parameters that I can set on the fly. Now I can click start, and it's on its way.
The idea behind Kubeflow is that machine learning should be as painless as possible. Previously there weren't really great tools for this. Tools like TensorFlow democratized training models, but really training is only a small part of the whole machine learning workflow and cycle. In fact, training the model is just one operation of this multi-step pipeline.
All right, so let's check on what our model's doing. Here we go. Awesome. Schemas are basically a JSON parameter file that tells each operation certain characteristics about the data, like which satellite it should pull from, what date range it should look for, things like that.
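As a rough illustration of the kind of JSON parameter file described here, the sketch below loads and checks one. The field names (`product`, `start_date`, `end_date`, `bands`) and their values are assumptions made up for this example; the talk only says the file names the satellite to pull from and the date range to search.

```python
import json
from datetime import date

# Hypothetical parameter file; every key and value is an illustrative assumption.
SCHEMA_JSON = """
{
  "product": "example:satellite-product",
  "start_date": "2018-01-01",
  "end_date": "2018-12-31",
  "bands": ["red", "green", "blue"]
}
"""

REQUIRED_KEYS = {"product", "start_date", "end_date"}


def load_schema(text: str) -> dict:
    """Parse the parameter file and check the fields every operation relies on."""
    schema = json.loads(text)
    missing = REQUIRED_KEYS - schema.keys()
    if missing:
        raise ValueError(f"schema is missing keys: {sorted(missing)}")
    # Convert ISO date strings to date objects so operations can compare them.
    schema["start_date"] = date.fromisoformat(schema["start_date"])
    schema["end_date"] = date.fromisoformat(schema["end_date"])
    if schema["start_date"] > schema["end_date"]:
        raise ValueError("start_date must not be after end_date")
    return schema


if __name__ == "__main__":
    params = load_schema(SCHEMA_JSON)
    print(params["product"], params["start_date"], params["end_date"])
```

Keeping these settings in one file means every operation in a run reads the same satellite and date range, which fits the reproducibility goal described in the talk.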
And that's all we provided to our pipeline. Right now it's actually pulling imagery live from our Descartes Labs platform, and you can see it streaming. This just goes to show how fast our data access is. If you were to try to do geospatial analysis 20 years ago, you would have to work with very bespoke and oftentimes slow interfaces.
Okay, the training images step completed, and it pushed them to a Google bucket, so let's check that out. Here we go. Our images are in the bucket, and we can scroll around. Look at that.
Very briefly, the machine learning algorithm that we're using to train our oil and gas well pad detector is what we call coarse segmentation. The input images look like this; this is a well pad. And the target images look like this. It's a binary image where the ones are where the well pads are, and you'll notice that it's much, much smaller than the original image. That's because for simple shapes like these well pads, you don't really need that high fidelity. And because the target image is really small, these models train much faster, and you still get pretty good results with a limited amount of data and a limited number of epochs to train.
Let's check in on our pipeline. It will update on the fly. It looks like it's training our model right now. What we did was pull imagery down, and from that we created and produced [inaudible]. And the great thing about pipelines is that this stuff is training on the GPU.
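The coarse segmentation target described above is a small binary image in which ones mark well pads. One way such a target could be derived from a full-resolution mask is block-wise reduction, sketched below in plain Python. This is only an assumption for illustration: the talk does not say how the targets are built, and the rule used here (a coarse cell is 1 if any pixel in its block is 1) and the name `coarsen_mask` are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # rows of 0/1 pixels


def coarsen_mask(mask: Mask, factor: int) -> Mask:
    """Shrink a binary mask by `factor` per axis; a cell is 1 if its block has any 1."""
    rows, cols = len(mask), len(mask[0])
    if rows % factor or cols % factor:
        raise ValueError("mask dimensions must be divisible by factor")
    coarse: Mask = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            # Collect every pixel in the factor x factor block starting at (r, c).
            block = [
                mask[rr][cc]
                for rr in range(r, r + factor)
                for cc in range(c, c + factor)
            ]
            coarse_row.append(1 if any(block) else 0)
        coarse.append(coarse_row)
    return coarse


# A 4x4 mask with one 2x2 "well pad" in the lower-right corner.
full_mask = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

if __name__ == "__main__":
    print(coarsen_mask(full_mask, 2))
```

With a reduction factor of 2 the target has a quarter as many cells as the input, which is the property the talk credits for faster training on simple shapes.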
As long as we have GPU nodes available, it will schedule them. And what's really great is that upgrading your GPUs to TPUs is just as transparent as adding TPU nodes to your node pool. This should be done in a couple of minutes.
While that's training, I can show you very briefly what our labels look like. This is our base satellite imagery, and these are the labels. We don't have a whole lot of them, but we found that these coarse segmentation models train pretty quickly. Let's zoom in and take a look at some of these well pads. This is 60 cm imagery from NAIP, the National Agriculture Imagery Program, so a lot of this imagery is ostensibly for research. Zoomed in on a well pad, you can kind of see there is a little dark spot in the center. That's where they put the well, where they put the pump jack to pump the oil.
Okay, I think that our training should be close to done. Yeah, it looks like it's saving a model. It takes a little while to upload. Good, here we go. So let's take a look here.
And we can see that it saved our model data in the cloud. If we go back, we can see that this metrics panel is done, so take a look at that. And we can open up TensorBoard right from Kubeflow, so that's great. We see that our loss went down over time.
And it's just launched our catalog endpoints. That's where we store our data. Now it's launching our task system, which will do the actual inference. Our task system takes a little while to start up; that's just loading a task controller and making sure that our deploy script is there and distributed to the worker nodes.
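The task system launched here works on tiles: as described earlier in the talk, a client chips the Earth into non-overlapping tiles of arbitrary size, each with a hash, and each tile is processed in its own container. The sketch below shows one way to do that for a rectangular region. It is not the Descartes Labs client: `make_tiles`, the use of degrees as the tile unit, and the SHA-1 key format are all assumptions made for illustration.

```python
import hashlib
import math
from typing import Dict, Iterator


def make_tiles(west: float, south: float, east: float, north: float,
               size: float) -> Iterator[Dict]:
    """Yield non-overlapping tiles covering a lon/lat box, each with a hash key."""
    n_cols = math.ceil((east - west) / size)
    n_rows = math.ceil((north - south) / size)
    for row in range(n_rows):
        for col in range(n_cols):
            # Clip the last row/column so tiles never extend past the region.
            bounds = (
                west + col * size,
                south + row * size,
                min(west + (col + 1) * size, east),
                min(south + (row + 1) * size, north),
            )
            # Hypothetical key: a short hash of the grid origin, size, and index.
            raw = f"{west}:{south}:{size}:{row}:{col}".encode()
            key = hashlib.sha1(raw).hexdigest()[:12]
            yield {"key": key, "row": row, "col": col, "bounds": bounds}


if __name__ == "__main__":
    # Rough box around New Mexico, split into 2-degree tiles.
    for tile in make_tiles(-109, 31, -103, 37, 2.0):
        print(tile["key"], tile["bounds"])
```

Because the tiles share only edges and each carries a stable key, every tile can be handed to a separate worker, and its result can be written back under that key as workers finish.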
And it takes a little bit of time to spin up, to get the resources. We can see that our tasks are pending now. Once that's done, it should start the inference job, and it should be pretty fast. Another good part: the job logs are available here, all in one place, and you can compare the runs and things like that.
Okay, so our task system started, and we can look at our task monitor. It's running. It's already scheduling work on these nodes, and look, we already have 127 active workers, and tasks are already succeeding. As soon as this is done we can look at the results, and it's going really, really fast. Okay, so that is completed, and Kubeflow should see that it's completed very soon. All right, awesome.
Now this component is just pulling down imagery from our catalog product. That's where we deployed our model and where we store our results, and this should be very quick too. Once this is done, we can look at our final results.
So yeah, this is incredibly powerful. We just took a computer vision model from no data to deployed in less than 10 minutes, and it's completely reproducible with Kubeflow.
All right, awesome, this is done. Another nice thing about Kubeflow is that you can embed arbitrary HTML assets. Here we go: we can see a tile of what we predicted in the results. Let's take a look at our product. This is also in [inaudible]. Let me zoom out a little bit. Okay, so this is our deployed computer vision model. I can overlay the base imagery, so let me do that, and then we can compare.
It didn't train with a whole lot of data, and it didn't train over a whole lot of epochs, but it looks like it did learn something. And it's really easy to start the loop over again: add more data, retrain, try different models, because we've really reduced that development cycle exponentially with Kubeflow Pipelines. Awesome. So that was the demo. We can go back to the slides.
Okay, so I want to wrap up this talk and contextualize what we're doing in terms of the scale of the science that we're doing.
And of course everybody likes pretty pictures. These are projects that we've actually deployed at scale. For example, this is all the trees in [inaudible], and if you're interested in looking at trees, this visualization was done by our awesome creative marketing guy, Tim Wallace. He has a great Medium article available. In principle, we could map all the trees in the United States, and of course this is really useful because we can monitor where deforestation or reforestation is happening.
For something a little bit different, this is a global NO2 map. Nitrogen dioxide is a byproduct of combustion, so this map is really telling us where pollution is happening. You can see hot spots where there's coal burning, for example, and the hot spot in south-central Africa is where they do periodic vegetation burning. This is something that you normally can't see with the naked eye, but it becomes very visible with certain sensors.
And this project shows some of the turnaround and agility that we have at Descartes Labs. These are hundreds of thousands of buildings over a region in North Carolina. This was done overnight ahead of Hurricane Florence so that we could monitor and predict how many buildings were at risk.
Just to wrap up: we have a lot of exciting science that's closely related to the geospatial domain, and Kubeflow Pipelines is a great resource for us because it allows us to handle the scale and scope of our projects. Kubeflow also works really well with the tools we already built on Google Cloud, and it's becoming better and better. It's an open source project, so everybody's free to support and contribute to it, and there's a lively community already. Here's a little bit of information on how to get started with Kubeflow, and here's a little bit more information. Thank you for your participation.