Yisroel Mirsky is a Tenure-Track Lecturer at Ben-Gurion University. He received his Ph.D. from BGU in 2018 and was a Postdoctoral Fellow in Prof. Wenke Lee’s lab at Georgia Tech. His main research interests include deepfakes, adversarial machine learning, anomaly detection, and intrusion detection. Mirsky has published his work in some of the best security venues: USENIX, CCS, NDSS, Euro S&P, Black Hat, DEF CON, CSF, AISec, etc. His research has also been featured in many well-known media outlets: Popular Science, Scientific American, Wired, The Wall Street Journal, Forbes, and BBC. For example, he has exposed vulnerabilities in the US 911 emergency services and demonstrated the threat of deepfakes in medical scans, both featured in The Washington Post.
About the talk
Ben Nassi, Ph.D. Student / Security Researcher, Ben-Gurion University of the Negev
Dr. Yisroel Mirsky, Postdoctoral Fellow, Georgia Tech
This session will demonstrate how attackers can apply split-second phantom attacks, causing two commercial advanced driving-assistance systems (Tesla Model X and Mobileye 630) to trigger a sudden stop in the middle of the road, apply the brakes, and issue false notifications. A countermeasure consisting of four neural networks that assesses the authenticity of a detected object will also be presented.
Hello and welcome to our talk, Securing Tesla and Mobileye. I'm Ben Nassi, and I will present the first part of the talk. I'm a Ph.D. student at Ben-Gurion University. Okay. Now, this talk is based on these papers, which were published at CCS and the AutoSec '21 workshop, and you can find them online. First, let's discuss perception in computerized driving. Visual perception is the ability to interpret the surrounding physical environment using an algorithm [inaudible].
And how do I factor in all the years? And as you can see it on the bottom picture from leaking from the Death. Note real estate stencils. Our radar has lighters in GPS, and yes, I sold on top of this layer. I saw it just before cancelled by the Congress Street, and the reasons are usually used to provide redundancy and increase. And the combination of sensors used that I don't have any pictures of her example. Tesla includes on facing radar. Set up, video cameras, and sensors, Yandex driving cars. On the other hand, include lay down lidar. And instead
of Not on top of this layer by layer algorithms. On the top of the cell. Does the player I will speak about this bike from now. And the last daily perception of the situation refinement, which mainly consists of which intended to intentions of the nearby objects. And this is reception information. Delete search around big truck. Inspection started a house with Musical. The next time you can see the prototype for the following robots. Call Diaz later today. Commercial Samuel to start running date of commercial sending, please charge Tesla's are about to acquire
in the future. We are about to see any action as improved significantly over the last year. Now don't get me wrong. That's still open challenges for Mutual perception. One example, is acquiring 11:45 unknown or unnecessary considered the complex. Also, the science need to find a way to overcome and perceptual challenges and Tom will know they do accidents Devils place. You have in these two, well-known fatal accident drive directly and crash. RC cars with parts on the side of the road. The old hotel. The
stimulate also emphasized, the difference in the perception of challenges currently and I want some stuff about the dust and rocks in our contacts are physical objects that are perceived differently. Then buy you a human and showed that by Eddie physical artifacts and each was done by Evans speaker. It was done by a road sign on the right side. However, we are hearing about adversarial attacks for The Last 5 Years in the wild. And we tried to think about about
the disadvantage of texting, which is something that I would like to avoid. Also, consider a wired mainly limits the amount. Acquire. Automatically forensic evidence is the texting for the fastidious may help the police to find that occur. And also some of that wire and for those of you who have trained, you have left work when I just left work, somehow I got to come pick a train mod. No, I want to spend his life in picture and identify the location of instances of objects of 7. +. This can be the best you can do.
Play as in put and how to set up volume boxes with precipitation Associated option. In this light, you can see on the left, on the left side of the detector in the middle. You can see the power of and Start with the rain and it was indicated before your brain as an input and all the lives of object acted in supplies to remove duplications of several architecture. Do some other interesting architecture and how to remove the duplication of object, which is followed by.
This is another company called the most positive detection and verified with them except the object. No. Mexican restaurant in the lower a commonly used practice in the area. You can expect the desired by increasing the pressure. No, income tax return. Drive Issaquah is the object, a distance, as possible, so that the car immediately. So how to trade a phone where they feel special. But it should be short enough to detect objects and our cases, please second. Driving car.
Okay. Now it's only object. Of time greater than With this question, can we apply for the time by flying in the duration that is slightly longer than about. 2nd and Jackson from the car. Or he's going to be done by hacking and internet connected, digital keyboard and embedding the phantom, phantom object into and advertising. A, the significance of suggested or do not require. They do not rely on White Horse Road. Do you want to claim any evidence at the scene
and speaking in French? What does a commercial on a tree? In the picture on the left side. Edward Place, mobile. it's probably not that surprised because it consists only of we made it out of experiments. We took the picture and put it on the road, and we placed in front of this now. Hit it. An interesting fact, I wish was to .5%. Radar detected by you. Tesla Model, X. Repeating the same experiment. However, this time with projected on the road again, back to .500
or the picture of the speaking, from the West. no, I think that you would agree with me today that he thinks and what is action Bill and no, I want to barf. If you don't come see the face of an injection to distinguish between Why are friends of design training process and a white bird in the context of autonomous driving? It seems that the training process creates its function better than human TV light. And you can see the article about 2 years ago, 95 State Road
about musical, however, are not taken into account, or you can see how Coronavirus projected on pre-cut Collins. I'm taking into account, the road signs. Consist only of being detected by a sequence. What is the factors for basically be considered an object? That looks like to understand. How about in the Bible and the first experiment try to use to stop at 2 a long time. I forget to stop and reflect. Its valuation in front of the light side. And we found it. How
do you say 100% of the time? Greater than No, it's a beginning explained about the remote and one was an experiment and music. Splash a road. Sign, 125 on the building, on the right side of the road. Do you drive your car was equipped with the movie? Like, it's coming from the fraud detection. Balcom road sign is Green, Arrow sign and notified notified about the driver. Also conduct an experiment for a road sign. We protected The Pedestrian on the road, from the side of the road are
using it for Jetter. And we didn't choose the second projection. I mainly. Because if you can sell and you can see how the car. Text The Pedestrian and immediately and you can see it again right now. Find a second way to if I guess, I'll ask you to reply to here. Mini digital billboard advertising block and block based on how much is this area. Are basically on respect to the people basically on how distant the block from the expected to point. And we also compute a global scope in every block in the global scope computers with respect to the score
of the block. [inaudible] This is a demonstration of the output: the advertisement with the embedded stop sign. [inaudible] And as you can see, the car detects the stop sign and stops in the middle of the road. Now Yisroel will continue the talk. Thank you, Ben. Hi, everybody. My name is Yisroel Mirsky. I'm a postdoctoral fellow at the Georgia Institute of
Technology, as well as a research project manager at the Ben-Gurion Cyber Security Research Center in Israel. Now that we've seen the vulnerability and understand its implications, let's talk about what we can do to strengthen our AI systems against such attacks. In other words, how can we close this gap between the AI and the human driver? Now, even though autonomous vehicles use sensor fusion, for example radar and lidar, the system will still react to objects detected in the images alone. However, this makes sense if you think about it: if there is a person in the middle of the road based on the camera alone, you're going to react, because you don't want to make a fatal mistake. Therefore, we need a solution which can validate objects identified by the camera sensor's AI. We also need the solution to be independent and lightweight, so that we can easily integrate it into existing systems.
So how are we going to do this? Well, we thought about this a lot, and we tried to think of reasons why a human can spot a projected image in a way AI cannot. We understood that an object detection AI has been trained to do one thing only: identify objects by matching patterns of geometry and contrast, lacking a sense of texture or context. For example, on the right we can see that Google's Cloud AI is convinced that this hairy amorphous blob is a cat, just because it has four eyes and some [inaudible]. Of course, we still love deep learning models, but we just want them to be a little bit more intelligent. Until then, what will we do to validate these objects detected by the camera? Well, instead of throwing another classifier at the problem, which will fail for all the same reasons, we will take a look at the detected object and then analyze different aspects
of the object to determine if it is real or fake. We've identified seven aspects which can be extracted from an image to better capture the truth. Let's look at the attack scenario where a phantom road sign has been projected near the car. First, we can consider its size: if the sign is larger or smaller than it should be, that might be problematic. For example, a traffic sign which is not regulation size should be ignored. The size of the sign can be determined on vehicles which use multiple cameras, through stereoscopic imaging. Second, we can look at the angle of the sign. Cameras capture perspective, so if the shape of the sign does not match, then we can consider it fake, because it is angled, facing away from the car. The focal range of the camera can also be used to identify range. We can also check whether the sign is contextually valid, to understand whether the sign makes sense. Speaking of context, if the placement of the sign is impossible or simply abnormal, it indicates that it is a phantom. For example, a traffic sign that does not have a post, or a pedestrian floating over the ground. We can also consider the surface of the sign: whether it is distorted, lumpy, or has features which do not match the typical features of the detected sign. For example, think of what a stop sign may look like if it were projected on a brick wall. Another aspect is the light being emitted from the sign. Since phantoms emit their own light, via a projector or TV screen, they will be inherently different compared to the light reflected from a real sign. This can be determined passively through image analysis, or actively by shining a light source onto the object.
Finally, we can consider the depth of the scene. We can detect phantoms if the object has the wrong surface shape or its placement in the 3D scene is abnormal, for example, a traffic sign projected onto a tree, or a 3D person or car projected on the surface of the road. However, how can you get depth perception from a single camera? We use a technique where we take a look at two subsequent frames captured by the camera. If they are captured while the vehicle or projection is in motion, then we can compute the depth implicitly using optical flow, much like stereoscopic imaging. To give you a better idea of what I'm talking about, here's what it looks like. On the left is the first frame. Then, using the second frame, we can compute the image on the right. The color shows the directional shift of each pixel, and the brightness is the speed. Now imagine what a person projected on the road would look like: it would be a solid flat color, just like the rest of the road. This is a strong indication of a phantom compared to a real person. So how do you utilize multiple perspectives without falling into the same traps as before?
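To make the depth cue concrete, here is a minimal sketch of the idea, not the speakers' implementation. It assumes simple block matching as a stand-in for a real optical flow algorithm (the talk does not name one), and the function names `patch_shift` and `flow_contrast` are hypothetical. It compares how far an object's patch moves between two frames with how far the surface around it moves: a flat projection moves with its surface, while a real object standing in front of it does not.

```python
import numpy as np

def patch_shift(frame_a, frame_b, top, left, size, max_shift=8):
    """Estimate the horizontal shift (in pixels) of one square patch between
    two frames by exhaustive block matching (sum of squared differences)."""
    patch = frame_a[top:top + size, left:left + size].astype(float)
    best_shift, best_cost = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        x = left + s
        if x < 0 or x + size > frame_b.shape[1]:
            continue  # candidate window would fall outside the frame
        candidate = frame_b[top:top + size, x:x + size].astype(float)
        cost = np.sum((patch - candidate) ** 2)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift

def flow_contrast(frame_a, frame_b, object_box, background_box, size=8):
    """Absolute difference between the object's shift and the shift of the
    surface next to it. Boxes are (top, left) corners of size x size patches.
    A value near zero means the object moves exactly like its background,
    which is what a flat projection would do."""
    obj = patch_shift(frame_a, frame_b, *object_box, size)
    bg = patch_shift(frame_a, frame_b, *background_box, size)
    return abs(obj - bg)
```

In the system described in the talk, the flow image is fed to a neural network rather than compared with a hand-written rule, so the contrast value here only illustrates the signal that such a network could pick up on.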
So we suggest using a machine learning ensemble approach called the committee of experts, with several models looking at the same image. Then you can combine their predictions by considering each of their viewpoints. What is key in a committee of experts is that there be disagreements. These disagreements help the ensemble produce more robust predictions. To validate this approach, we developed a detection model called GhostBusters. Here, we only consider four of the seven aspects: context, surface, light, and depth. The way the system works is as follows: when the onboard object detector identifies an object such as a road sign, the cropped image is passed to our model for verification. Then four different aspects are extracted from the image and passed to their corresponding experts. The experts determine whether the object, from their perspective, makes sense or not. Finally, the internal representations, also known as embeddings, from each of these networks are concatenated and passed through a single network, which makes a decision based on the experts' feedback. Although this model may seem intimidating, it is important to note that it only has 1 million parameters, which is nothing. Therefore, adding this model has a negligible impact on the car's resources.
When evaluating seven state-of-the-art object detectors, we found that the models were susceptible to phantom attacks, with a 92 to 99% attack success rate. However, with the defense model in place, the attack success rate dropped to below 1%, even when tuning the model to have zero false alarms. The table shows that every expert has a useful and unique contribution, with the combination of all experts outperforming the baseline model, which in this case is a single classifier. Here are some visual examples of these disagreements, which ultimately lead to correct predictions: for example, a speed limit sign over the highway may look plausible, but the light expert flags it. Okay, so we know that the model is robust to the environment, but is it robust to the adversary? In other words, what if the attacker uses adversarial machine learning to fool our defense model? We ran experiments attacking the model with eight different adversarial machine learning attacks.
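The committee-of-experts pipeline described above (one expert per aspect, embeddings concatenated, a single combiner making the decision) can be sketched as follows. This is an illustration under stated assumptions, not the GhostBusters code: the experts in the talk are neural networks operating on images, while here each expert is a single untrained random layer on a flattened input, and the names `Expert`, `Committee`, and `threshold_with_zero_false_alarms` are hypothetical. The threshold helper also assumes that a "false alarm" means a real object being rejected as a phantom.

```python
import numpy as np

# The four aspects the talk says the model uses.
ASPECTS = ("context", "surface", "light", "depth")

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Expert:
    """Stand-in for one expert network: maps a flattened aspect input to an
    embedding. Weights are random here; a real expert would be trained."""
    def __init__(self, in_dim, emb_dim, rng):
        self.w = rng.normal(0.0, 0.1, (in_dim, emb_dim))
        self.b = np.zeros(emb_dim)

    def embed(self, x):
        return relu(np.asarray(x, dtype=float).reshape(-1) @ self.w + self.b)

class Committee:
    """Concatenates the experts' embeddings and feeds them to one combiner."""
    def __init__(self, in_dims, emb_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.experts = {n: Expert(in_dims[n], emb_dim, rng) for n in ASPECTS}
        self.w = rng.normal(0.0, 0.1, emb_dim * len(ASPECTS))
        self.b = 0.0

    def score(self, aspect_inputs):
        """Return a score in (0, 1); higher means 'more likely a real object'."""
        embeddings = [self.experts[n].embed(aspect_inputs[n]) for n in ASPECTS]
        combined = np.concatenate(embeddings)
        return float(sigmoid(combined @ self.w + self.b))

    def is_real(self, aspect_inputs, threshold=0.5):
        return self.score(aspect_inputs) >= threshold

def threshold_with_zero_false_alarms(real_scores):
    """Largest threshold that still accepts every real object in a validation
    set, so that no real object is rejected as a phantom on that set."""
    return float(np.min(real_scores))
```

Because the combiner sees all four embeddings at once, it can in principle learn to notice when one expert's view disagrees with the others, which is the property the talk relies on when discussing adversarial attacks against a single aspect.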
We found that the committee as a whole is much stronger than any single expert. This is because the combiner model can identify an outlier when only one of these aspects has been altered.
So in summary, there exists a perceptual gap between AI and human drivers. By building models that consider multiple aspects or perspectives on the data, we can mimic human intuition. So we should always consider redundancies when developing safety-critical AI systems, especially if the AI subsystem has [inaudible]. One, we should analyze the edge cases of an AI model by trying to answer the following questions: What did not appear in the training set that the AI model could encounter in the wild? And what did the model really learn? Two, we should test the robustness of our AI models by providing edge cases. And three, we should secure our AI models by retraining the model on edge cases, and we should make sure we add features or perspectives that distinguish between good and bad cases.