About the talk
Following the recent initiatives for the democratization of AI, generative models are becoming increasingly popular and accessible. The widespread use of generative adversarial networks (GANs) is positively impacting some industries, such as entertainment; however, they are also used with malicious intent. Believing a fake video of a politician, distributing fake pornographic content of celebrities, and fabricating impersonated fake videos as evidence in courts are just a few real-world consequences of deep fakes. This lack of authenticity and increasing information obfuscation pose real threats to individuals, the criminal justice system, and information integrity.
As every technology is simultaneously built with the counter-technology to neutralize its negative effects, we believe that it is the perfect time to develop a deep fake detector. Deep fakes depend on photorealism to disable our natural detectors: we cannot simply look at a video to decide if it is real. On the other hand, this realism is not yet preserved in the physiological signals of deep fakes. We present novel approaches to detect synthetic content in portrait videos, as a preventive solution for the emerging threat of deep fakes. We observe that detectors blindly utilizing deep learning are not effective in catching fake content, as generative models produce formidably realistic results. Our key assertion is that biological signals hidden in portrait videos can be used as an implicit descriptor of authenticity, because they are neither spatially nor temporally preserved in fake content. Using traditional machine learning approaches and deep learning methods, we exhaustively analyze the signals extracted from heart beats, PPG signals, eye vergence, and gaze movement of deep fake actors to create robust and accurate deep fake detectors. Moreover, we trace the source of deep fakes by exploiting their heart beats via residuals of different generative models. Achieving leading results on existing datasets and on our in-the-wild dataset justifies our observations and pioneers a new dimension in deep fake research.
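To make the idea of spatial consistency concrete, here is a minimal sketch, not the speaker's method. It assumes a PPG-like signal can be approximated by the mean green-channel value of a face region, and it uses synthetic noise frames instead of real video. The function names (`region_signal`, `spatial_coherence`) and the region boxes are hypothetical and chosen only for illustration.

```python
import numpy as np

def region_signal(frames, box):
    """Mean green-channel value per frame inside box = (top, bottom, left, right)."""
    top, bottom, left, right = box
    return frames[:, top:bottom, left:right, 1].mean(axis=(1, 2))

def spatial_coherence(frames, box_a, box_b):
    """Pearson correlation between the mean-removed signals of two regions."""
    a = region_signal(frames, box_a)
    b = region_signal(frames, box_b)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

# Synthetic stand-in for video: 150 frames of 16x16 RGB noise at 30 fps (assumption).
rng = np.random.default_rng(0)
fps, n_frames = 30, 150
t = np.arange(n_frames) / fps
left_cheek, right_cheek = (4, 12, 0, 8), (4, 12, 8, 16)  # hypothetical regions

def noisy_clip():
    return 128.0 + rng.normal(0.0, 0.5, size=(n_frames, 16, 16, 3))

# "Real" clip: one shared 1.2 Hz pulse modulates the green channel everywhere.
real = noisy_clip()
real[..., 1] += np.sin(2 * np.pi * 1.2 * t)[:, None, None]

# "Fake" clip: the two cheeks carry unrelated periodic signals.
fake = noisy_clip()
fake[:, :, :8, 1] += np.sin(2 * np.pi * 1.2 * t)[:, None, None]
fake[:, :, 8:, 1] += np.sin(2 * np.pi * 1.9 * t)[:, None, None]

real_score = spatial_coherence(real, left_cheek, right_cheek)
fake_score = spatial_coherence(fake, left_cheek, right_cheek)
print(f"real: {real_score:.2f}, fake: {fake_score:.2f}")
```

In this toy setup the shared pulse makes the two regions of the "real" clip strongly correlated, while the "fake" clip's regions are close to uncorrelated. Real footage would need face tracking and a proper PPG formulation, which this sketch deliberately leaves out.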
Dr. Ilke Demir earned her Ph.D. in Computer Science from Purdue University, focusing on 3D vision approaches for generative models, urban reconstruction and modeling, and computational geometry for synthesis and fabrication. Afterwards, she joined Facebook as a Postdoctoral Research Scientist working with Ramesh Raskar from MIT. Her research included human behavior analysis and deep learning approaches in virtual reality, geospatial machine learning, and 3D reconstruction at scale. In addition to her publications in top-tier venues (SIGGRAPH, ICCV, CVPR), she has organized workshops, competitions, and courses in the intersection of deep learning, computer vision, and graphics. She has received several awards and honors such as Jack Dangermond Award, Bilsland Dissertation Fellowship, IEEE Industry Distinguished Lecturer, and GHC Fellow, in addition to her best paper/poster/reviewer awards. Currently she is a Senior Research Scientist at Intel, leading the computer vision and deep learning research in the world’s largest volumetric capture stage.
Awesome. First of all, thank you for having me at MLconf. It was a pleasure before, and it is again this year in this virtual setting. Thanks for organizing everything perfectly; hopefully it will be a much better experience for everyone. So, my topic: I will talk about going heart to heart and eye to eye with deepfakes. What I mean by that is, first, heart to heart, for which you can see the information here, and then eye to eye, which is the eyes of different deepfakes and of the real images here. So we will see how we can use those heart and
other biological signals to detect them. Before that, a little bit about me; you can reach out to me through these links. I particularly like this photo of mine because it was actually me at MLconf 2018. It was a pleasure to share what I was working on at the time, and it is a pleasure to be a part of MLconf 2020. I am at Intel Studios now; if anyone doesn't know, this is the huge dome, 2000 square feet, doing volumetric capture, that has the world's largest volumetric capture
stage. What is this bombardment of news headlines, right? This is what you will learn today. All these news pieces, from all of these news outlets, are about our work on using pulse and heart rate to detect deepfakes. Let's start the presentation. So why are deepfakes a problem right now? There have been many advances in generative approaches, in computing power, and in open-sourcing resources, and many developments that make models train more easily
and be shared more easily. For example, the original GAN is one of the first ones, and then there are others; all of these are advancements in GAN architectures that enable deepfakes. And what are deepfakes? As you probably already know, or as you can see from those images, these are portrait videos which are synthetic and which, most importantly, impersonate other people, or may show people that don't exist. Now, besides that,
there are also the dystopian scenarios of deepfakes: they are used for political misinformation. Most of the use cases for deepfakes are fake porn of celebrities; there is also fake court evidence, impersonation, and fraud. It is foreseen that this will cause social harm, which is all the more reason for authentication and detection approaches for battling deepfakes. So, for detection of deepfakes, there are many
approaches; there are some similar approaches that have been published. Some of them look at color changes, some of them look at head poses and inconsistent head poses, another one looks at other places, and some of them look at other signals, such as blinks. I actually divide them into two groups by their priors. The first group, which I just talked about, looks at the fakeness and how
fakeness exhibits itself. We, on the other hand, look for a unique signature in real videos, and that signature is our own heart rate. Heart rate, or pulse, information has been examined in remote PPG settings for many different applications, such as health monitoring and patient diagnosis, so heart rate signals can be used. What are those signals? Our blood flow creates subtle changes: in your veins there may be some changes, color or motion changes, that are invisible to the eye. So
PPG is actually measuring that blood flow via the skin color change. That's what we are using, as no model can yet create deepfakes with consistent PPG. For FakeCatcher, we first started with this question: given any pair of fake and real videos, can we find an implicit indicator of authenticity in the biological signals? That being said, this is just the pairwise setting. So there is one fake video and one real video, and they are a pair. Can you say with high confidence which one is fake and which one is real? Visually, probably not: the fake here is generated
from the original, and yet the heart rates and the signals vary dramatically. Based on our analysis of the signals (you can check the paper for more details), we can actually find that implicit function, or not even find a function and just use the spectral densities, and that gives us 99.39% accuracy for the pairwise separation problem. And if you look at those spectral densities, they are actually separated very
nicely, with a distinct behavior for fakes, which is very intuitive, right? For fakes, the heart rate from your left cheek does not match the heart rate from your right cheek, or something like that. That brings me to the general problem, beyond the pairwise separation problem. Here we build a feature space from some of the signal features and some other transformations, and we reach 75% accuracy using an SVM. For classifying any video in the world, that is not that
good, right? So we looked for a more complex space, with more complex methods such as linear discriminant analysis and other separation approaches, and we found out that, yes, this space is more complex. A CNN with just a few layers reaches that level of complexity so that we can classify, and this gives 96% accuracy; we just use a three-layer CNN for that. For the other parts, you should check the paper. As you see here, we compared this
approach with other approaches, like Inception and other, more complex network structures. As you can see, we are better than the most complex ones by almost 9%, just because we are using biological signals. We also looked at some synthetic image processing operations to find out our robustness and how much of such operations we can accommodate. We also looked at multiple faces
or a fake video with low detection confidence, or a real video with low confidence; when we aggregate all of those features, we can actually still classify them easily. The second dataset is the Deep Fakes Dataset, which has many videos from different sources, from YouTube to other presentations, wherever we see a deepfake video. It actually tries to encompass all the variation of many different generators in one dataset, so that we can actually have a real baseline for deepfakes in the wild. That brings me to
another aspect. Is it enough to say whether a video is real or fake? There is also source detection for generated content; there are some approaches, like GAN fingerprints, that look at the artifacts. So this is our pipeline for finding the sources of deepfakes: where deepfakes come from, and what can serve as the signature. We actually want to see biological signals as the projection of the generators' residuals into the biological domain, and I'm sure there's
a difference between different deepfake generators. So this is the real one, and these are all from different deepfake generators. You can see that the signature changes dramatically. So how can we get to the ground truth? I will give a very brief overview of the approach. First, we extract frames from the video, divide them into omega-length windows, align the faces using Delaunay triangulation and a transformation for each frame, and extract the PPG
values. We have multiple values per frame, we add the power spectra of those signals, and we scale the result. This is what we call a PPG cell. Finally, the source classification is a multi-class classification on PPG cells with a simple CNN architecture, with equally probable classes for the generators. For the backbone, we use VGG19; we also tried other backbones, and VGG19 was the best trade-off. There is also the video-level aggregation of the window-level
classification. We aggregate those window predictions per video and actually classify the videos correctly based on their confidences. This is our confusion matrix for source detection. You can see that we have 93.39% detection accuracy overall. The worst one, as you can see, is Neural Textures, with 81% accuracy, and we are still improving these results. We can also extend to new models. So we looked at the dependency on the models, and the dependency on the original videos that those models are
using. So we changed the setup from before: it was based on five classes, and basing it on six classes improved our detection accuracy to 96.89%. For the other setup, we still have 92% accuracy, even though the model never saw the reals of that set. So this is a nice improvement in our generalization to any deepfake in the world. That brings me to the ablation studies; you can check the paper for the details. We checked the window length: which window is long enough to capture the heartbeat and short enough to
leave out the noise. We also looked at unseen generators. This is just at inference time: we created a new dataset from 48 real and fake videos, and we looked at how our model behaves in inference. Even though there are compression artifacts and such, you can still see that the fakes that come from Face2Face are correctly classified. Another important part here is the Deep Fakes Dataset, where you can see that it consists of many,
many different sources, and some of them come from Facebook; even so, you can actually find which model they come from. That brings me to the eyes part of my presentation. So, more priors, more biological priors to detect deepfakes. This one is very simple: when you look at something, your eyes actually converge on it. Whether it is at a distance, or near you, like the screen or some prompter that is actually close, if the two eyes meet at the same place,
you can actually say that it's a real person. We looked at several features to find whether those eyes are fake or not. Here you can see the real eyes, and these are all from different generators creating different fake eyes. You can see some of the artifacts; some of the artifacts are not quite visible. But if you carry that to the geometry domain, you can actually see that the fake cases exhibit more irregularity; as you see, it's not perfect. So
there are differences in the eyes, and some eye motions can be longer because the gaze is not updated correctly. The fakes actually have quite an irregular distribution, as you can see here, and the regularity of the real one is actually lost. Sorry, I don't have labels: this is the real 3D gaze point, and these are the gaze points of several different deepfakes. For now, it is very simple,
based on what we have, and we are working on actually exploring this gaze space, using the underlying geometry and the underlying formulation for improving the detection. So, my key takeaways: deepfakes have hearts with unique signatures, and the eyes can be used as well. We proposed a first method to detect the source of deepfake videos, and we believe that we have stronger discriminative features when they are projected onto the biological signals, as I have shown in many of the examples.
Something else is that this can adapt to new generative models in a similar way, which is very important for continuous integration and deployment, because, you know, every day there is a new generator. Whenever there is a new generator, we can actually classify it, and we can maybe use different approaches for detecting the samples of that new generator separately. So thank you. This is my contact information. Let me just check the time; if you have any questions, I can answer them.
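The windowing, power-spectrum, and video-level aggregation steps mentioned in the talk can be sketched roughly as follows. This is a minimal sketch under assumptions, not the speaker's implementation: the input is a synthetic 1-D signal rather than PPG extracted from faces, the function names (`split_windows`, `power_spectra`, `aggregate`) are hypothetical, the per-window class probabilities are hard-coded stand-ins for a classifier (the PPG cells and VGG19 backbone from the talk are not reproduced), and averaging probabilities is just one simple way to aggregate by confidence.

```python
import numpy as np

def split_windows(signal, length, hop):
    """Cut a 1-D per-frame signal into fixed-length, possibly overlapping windows."""
    starts = range(0, len(signal) - length + 1, hop)
    return np.stack([signal[s:s + length] for s in starts])

def power_spectra(windows):
    """Power spectrum of each mean-removed window (one row per window)."""
    centered = windows - windows.mean(axis=1, keepdims=True)
    return np.abs(np.fft.rfft(centered, axis=1)) ** 2

def aggregate(window_probs):
    """Average per-window class probabilities; return (label, confidence)."""
    mean_probs = np.asarray(window_probs, dtype=float).mean(axis=0)
    label = int(mean_probs.argmax())
    return label, float(mean_probs[label])

# Synthetic pulse-like signal: 10 seconds at 30 fps, 1.5 Hz plus noise (assumption).
fps = 30
t = np.arange(10 * fps) / fps
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 1.5 * t) + 0.3 * rng.normal(size=t.size)

windows = split_windows(signal, length=64, hop=32)
spectra = power_spectra(windows)
freqs = np.fft.rfftfreq(64, d=1 / fps)
dominant_hz = freqs[spectra.argmax(axis=1)]  # peak frequency of each window

# Hypothetical outputs of some per-window classifier over three source classes.
window_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.7, 0.2, 0.1]]
label, confidence = aggregate(window_probs)
print(dominant_hz, label, confidence)
```

The window length here trades frequency resolution against noise, which mirrors the point in the talk that a window should be long enough to capture the heartbeat and short enough to limit noise.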
Ilke, great presentation. This is Pamela. We do have a question from the audience. The question is: do you consider changes to biological signals (heartbeat, pupils) related to the emotional content of the video, especially when the speaker in the video talks about an emotional topic?
That's a very good question. I know that some of my colleagues actually looked at, not the emotional state, but the action units on the face, to see whether they are consistent, for finding deepfakes. But I would say, obviously, our
emotional state can be one of the priors that we depend on, because it also has a consistency in humans: we cannot be angry in one frame and happy in the next one. Of course, there are some actors that can do that, but in regular videos that is not the case. So that can actually be a nice indicator of consistency among different frames. And of course, the gaze information and heart rate information depend on the emotional content, the emotional
state of the actor. Those are changing, but they are changing consistently. Just because you are more emotional (I'm giving the same example, sorry), the heart rate on your left cheek will not be different from the heart rate on your right cheek. So that information is changing consistently for one actor; it is still consistent in the spatial and temporal domains. So we can still use it to detect deepfakes.
Thank you. And we have a double set of questions from David. He asks,
one: do you expect these biological signals to be faked in the near future? And two... do you want to answer that one first, or do you want to take them both?
Okay. I get this question a lot. I would say it's good to be prepared for that. For that to be done, the PPG extraction would need to be differentiable. And it is not differentiable, the actual way that we extract PPG from different signals in the temporal and spatial domains, so it's not easy to just say, okay, take the formulation and backpropagate through it; that
is why it can't be done easily. Faking the signals from the face is actually hard: even if you have some approximation for the PPG, you need to create a heartbeat for the whole video, for all of its duration, and that is not easy. It is also hard to learn. Even if you don't use the exact PPG formulation, to learn an approximate PPG signal you actually need a large ground-truth dataset to learn that approximation, and the actual PPG signals that you extract from videos cannot be
used as they are: remote PPG may not be your best ground truth, because the extracted signals can be noisy; they are not like the real PPG signal. And there is almost no data: there are some datasets, but there is no big dataset that can be easily used right now. I mean, there is one, but there is not much, so that you could actually learn an approximate PPG. In the long term, maybe.
Excellent.
Here's the second question from David: can we use these techniques to monitor online content?
I hope so. What we plan to do (as you see, we have the dataset link here) is to actually turn this into a website or an interface, so that everyone can upload a video that they suspect may be a deepfake, and they can actually see whether it is or not. I think such tools should be published as open source, for the general public, not just for some media representatives, so that everyone can test whether
videos are fake or not. You know, there are some fact-checking websites where people can go and check whether something is a fact or not; we want the deepfake detector to be something like that. I know that there are some companies and some nonprofits out there working on this, which is amazing, and I hope those collaborative efforts will actually give us some more leverage over the deepfake generators, so that we can actually have an ensemble of metrics: not just FakeCatcher, not just blinks, not just
any one of them, but all of them together, giving some intuition about a video, so that we can better arm ourselves and be better informed, and the dystopian scenario never happens, because people know what to trust, people know what to follow, etc. So I think it's very important to publicize tools that can do that detection.
Great. We have a question here from Carlos: do you think there are other biological signals that could also be used to help identify deepfakes, for example,
breathing or others?
Lovely question. Yeah, that is where I was trying to go from heart rate to eyes; it's also in my title, heart to heart and eye to eye. There can be more biological priors. Breathing can be one of them, or, for example, if you can see the neck, maybe the vibration in my neck can be another biological prior. Or maybe even behavioral ones; behavioral priors can be much more personalized and can be harder to detect. I don't know, maybe when I'm speaking, I'm looking around so much, or
maybe I'm smiling in a consistent way; maybe every hundred frames I have the behavioral aspect of smiling. Any such behavior can be used to detect deepfakes, because these behaviors are not easily replicable. As examples of that, there are some other detection approaches that work at the individual person level, because it is easier to model those behavioral aspects per person than to generalize them to the whole public, right? So there's a paper that is looking only at
world leaders, like Obama, Trump, etc., and they are customizing those deepfake detectors per person, so that that person's behavioral, biological, and other intrinsic information is captured in that deepfake detector. And the deepfakes cannot go to that individual level of replicating the person. So I hope this answers the question.
Thank you. Let's see, one more question here. We have a question that asks: does the effectiveness of PPG identification vary significantly
based on the skin tone and/or other racial characteristics?
That's a good question. We wanted to be a little bit further away from that discussion and assume that the PPG models and formulations that have been used for several years in different communities have already tackled those questions. So we are not reinventing the biological signals; we just want to use them to do deepfake detection. That being said, we can never be true scientists, or be
fair people, if we say the problem is someone else's: "PPG being fair to everyone is someone else's problem, not my problem; I will just use it." We haven't added that to our paper, but we did internally look at the edge cases: whether we can detect some of them easily and some of them hardly, how the confidence changes, etc. And because the chrom-PPG that we use is not actually looking at the real color, but at a combination of the reflectance model and the
color values. So we could have used other PPG variants; each of these has different aspects, and chrom-PPG, we found out, works the best for different skin tones: it is more generalizable and more robust to color changes and illumination changes. And maybe this is future work information, but we want to do a more elaborate analysis of all those deepfake detectors for different skin colors, different ages, different makeup, different,
maybe even behavioral traits, to find out if there are any edge cases that we haven't caught, that we haven't accommodated for.
Great. Okay, folks, any last questions here for Ilke? I'll give you a minute to enter your questions. Great job, Ilke; very fascinating information.
Thank you. Maybe I was a little bit too fast in my presentation, but I hope it was understandable, and I hope I was able to give all the information that I wanted to deliver.