Jess Garcia has 20+ years of Infosec/DFIR experience. Founder and Technical Lead of One eSecurity, a global DFIR firm, he has led countless complex and high-profile incidents around the globe. A Senior Instructor with the SANS Institute, he has been teaching 10+ different SANS courses all over the world for nearly 20 years and is one of the most prolific SANS instructors (Windows/Mac/Smartphone Forensics, IR/Threat Hunting, Reversing, Cyber Threat Intel, ...). In 2020 Garcia launched ds4n6.io, a community project aimed at bringing data science and artificial intelligence to DFIR, under which he is doing innovative research with real-world applications. A Space Engineer in his early days and a scuba diving lover, he is also a top-rated speaker at international Infosec/DFIR conferences.
About the talk
Jess Garcia, Technical Lead, One eSecurity
AI is changing the world, and Cybersecurity and DFIR are no exception. This pioneering, first-ever talk on the topic will share how machine/deep learning helps in real-world threat hunting and complex investigations by tackling problems perfectly suited for AI (complexity, volume, correlation, etc.) with open AI DFIR resources (aidfir.io).
Hello, I'm Jess Garcia. I am the founder of One eSecurity, a global detection and response firm, and I'm also a senior instructor with the SANS Institute. I'm very happy to be here today talking about artificial intelligence applied to DFIR, specifically to investigations. Let me start by talking about the DFIR myth. We have this red button myth, in which, in a distant future, we will have some type of elite people who just press the red button and have our investigations sorted out magically. I'm sad to say that artificial intelligence is not there yet, and I don't think it's going to be there anytime soon. This presentation is about showing you that, even though we are not there and will not be there soon, machine learning can already be a very powerful weapon.
Now, our objective is to help forensicators and forensic experts complement their already existing capabilities, their powerful commercial or open source tools, with other technologies, older and newer, such as big data and, most interestingly, data science and artificial intelligence. For that we have a project called Data Science Forensics. You can go to the ds4n6.io website and find out about it. It is a free project, a community project, and we've been doing quite a few things during the last year.
The road so far, for the Supernatural fans out there, has been pivoting around several projects that we've been creating, such as the Data Science Forensics library, which is a project that facilitates the ingestion and analysis of forensic data, from the output of your tools into your data science environment, typically Jupyter for those who are familiar with that. We've also been working on a virtual machine that facilitates all this, because setting up a data science environment, Jupyter and all the plugins, can be a little cumbersome. So we are working on a virtual machine, DAISY, which is designed specifically for data science, and secondly for data science applied to DFIR. We've also been working on defining a model, HAM, the Harmonized Artifact Model, and some other formats, which facilitate the harmonization of tool output into a format that is digestible by data science and artificial intelligence programs. And another very cool project we've been working on is ADAM, an adversary emulator specifically for data science, which I will introduce later.
Let's start today with, basically, machine learning for DFIR. Last year at the SANS DFIR Summit in July, I presented an appetizer of this project, the first step, and I was able to present how a specific type of machine learning model, called autoencoders, is very good at detecting anomalies, in that specific case malicious activity. If you want to see the whole presentation, you have the link at the bottom. So that was the first step. But during this year, from July until now, we needed to dig deeper. We know that machine learning cannot be a crystal ball, or at best it is a blurry one. So what can this technology provide? We know that machine learning, when you provide it a specific dataset, is good at understanding the big picture in that dataset. It is good in complex areas with many variables. In a very specific field, if you feed it the right data, it will do a good job on a specific task.
What type of things can machine learning do? Without going into the technical details, it can do classification, clustering, prediction, noise filtering and, most interesting for us, anomaly detection. A malicious event tends to be an anomaly in many cases, from different points of view, as we will see later. So we will be using anomaly detection in different areas: threat hunting, investigations, and possibly in the future CTI and other DFIR areas.
So what is our objective? Well, our methodology is going to be the following. The first part is going to be a traditional DFIR process: you have some data, which you collect from your computers, from your different environments; you process it and you get some output from there. We will ingest that into our data science environment and transform it into a common harmonized format, which is friendly for the data science environment and, more interestingly, for the AI environment. Once we have that, we will try to detect anomalies in the data.
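As a rough illustration of the "transform it into a common harmonized format" step, the sketch below maps the output of two tools onto one schema. Everything in it is an assumption made for illustration: the tool names, the field names and the target schema are invented, and this is not the HAM model or the Data Science Forensics library API.

```python
import csv
import io
import json

# Hypothetical raw output of two forensic tools describing scheduled task events.
# Tool names, field names and values are invented for this sketch.
TOOL_A_JSONL = """\
{"Computer": "SRV01", "TimeCreated": "2021-03-01T02:00:05", "EventID": 106, "TaskName": "Backup"}
{"Computer": "SRV02", "TimeCreated": "2021-03-01T09:15:40", "EventID": 200, "TaskName": "Updater"}
"""

TOOL_B_CSV = """\
host,timestamp,event_id,task
SRV03,2021-03-01 11:30:00,140,Cleanup
"""

# Per-tool mapping from tool-specific field names to one common schema.
FIELD_MAPS = {
    "tool_a": {"Computer": "hostname", "TimeCreated": "timestamp",
               "EventID": "event_id", "TaskName": "task_name"},
    "tool_b": {"host": "hostname", "timestamp": "timestamp",
               "event_id": "event_id", "task": "task_name"},
}


def harmonize(record, tool):
    """Rename fields to the common schema and normalize value types."""
    row = {common: record[raw] for raw, common in FIELD_MAPS[tool].items()}
    row["event_id"] = int(row["event_id"])                  # CSV gives strings
    row["timestamp"] = row["timestamp"].replace(" ", "T")   # one timestamp style
    row["source_tool"] = tool
    return row


def load_all():
    """Parse both tool outputs and return one list of harmonized rows."""
    rows = [harmonize(json.loads(line), "tool_a")
            for line in TOOL_A_JSONL.splitlines() if line.strip()]
    rows += [harmonize(rec, "tool_b")
             for rec in csv.DictReader(io.StringIO(TOOL_B_CSV))]
    return rows


if __name__ == "__main__":
    for row in load_all():
        print(row)
```

Once every tool's output shares the same column names and types, the same feature selection and anomaly detection code can run over all of it, which is the point of harmonizing before any machine learning.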
In order to make this much more interesting, what we will do is ask ourselves a big question: would we be able to detect an unknown attack without indicators of compromise, just by doing multi-artifact machine learning anomaly analysis? And to make it even more fun, we will apply it to a very specific case that all of you know about, which is SolarWinds. So, would we have been able to detect the SolarWinds attack with a methodology like the one we are going to be presenting, if we had been working back then the way we are going to be discussing today?
We'll answer this question throughout the presentation. So you can see here: the first layer is traditional threat hunting, where you're going to be hunting for some things, and then you're going to get some output from your tools that you're going to be analyzing. The second part, the second layer, the yellow layer, is going to be machine learning. And finally we want to take that back to the real world. Now that we have machine learning and have made it work, let's go for it.
So let's start with the DFIR layer. The DFIR layer is traditional, and we're going to be defining a threat hunting methodology in which we're going to be using the MITRE ATT&CK framework. For example, let's say we have a policy where we're going to be hunting for the top five most common adversary techniques. If you take a look at the Red Canary report, it says that the most common adversary techniques are process injection, scheduled task, Windows Admin Shares, PowerShell and remote file copy. And guess what? SolarWinds actually uses T1053.005, which is basically what Red Canary talks about: the creation of malicious scheduled tasks. So this means that, if we were hunting for scheduled tasks, that technique was within the scope of our analysis. At least for scheduled tasks, we could have been detecting the SolarWinds attack through this T1053.005 technique.
If you want to know how, from the forensic point of view, the creation of a malicious scheduled task works, you can go to the Mordor project by Roberto Rodriguez and his brother, which is a great place to look. What they've done is profile how specific types of techniques, or micro-techniques, are actually implemented, showing you how they work. They also explain how to detect those attacks, mostly by using detection on event logs. So if we look at this, we would be able to know how to detect the creation of scheduled tasks using event logs only. But we are going to be covering forensic artifacts as well. For that, what we can do is rely on the SANS DFIR poster. The poster is a fantastic resource for identifying different types of artifacts from the forensic point of view. In the slide, specifically for scheduled tasks on the destination machine, you can see that there are three types of artifacts that are created: event logs, registry and file system artifacts. On the event log side, you have the Security event log and, more specifically, the Task Scheduler event logs. In the registry, you have several entries. And in the file system, we're going to be focusing on a very specific directory, which is the Windows System32 Tasks folder, which is where a file is created when you create a scheduled task.
What we want to do is run our EDR or forensic applications to collect those artifacts, and then parse them as needed with other tools, such as Eric Zimmerman's tools, Plaso, or what have you. That will eventually generate a number of files in different formats, typically JSON or CSV. So we have finished the DFIR process, without analysis but with the output of our forensic tools, which is the most interesting part, obviously. What you have is the raw output, the data that your forensic tools produce, typically in JSON or CSV format, and each tool's output is going to look different. So what we need to do is harmonize. The library that I was talking about helps in harmonizing that for different tools. From there comes the machine learning process of feature selection and feature engineering, to get what we call HML format, which would be the features that are useful for that specific type of detection by the machine learning model.
Let me now go on to explaining the model I have been talking about, the one called the autoencoder: what an autoencoder is and how it works. How does it work? What you're going to be doing is presenting a number of data entries to the model, to this autoencoder. Say what we're going to be doing is presenting many, many, many different cats. The machine learning model will get the essence of what a cat is, which is what you can see in the middle. This is done through a process of dimensionality reduction, which removes all the unimportant stuff and retains the essence of what you're looking for. Then what the model tries to do is, with that essence, reconstruct the original cat. So if it's a very normal cat, it will be able to reconstruct it. If it's not a cat, it will not be able to reconstruct it properly, and that will create a huge error. The measure of the error will tell you whether it's a cat or not.
So, this maps to the discussion we're having here. Let's think about the Task Scheduler event logs. Each of the columns, each of the fields of those event logs, is what we're going to be providing to the machine learning model: fields such as the event ID, the computer name, and so on, for many computers. After seeing this, it will know what the essence of a scheduled task log looks like, and it will be able to identify what is not the same as the rest. This is unsupervised: at no point is there a human saying "this is a cat" or "this is not a cat". What the model will do is say "this is very different from what I know a cat is". You need to evaluate whether it's a cat or not, but it will give you the measure of the anomaly.
And this is precisely the measure of the anomaly, what we call the reconstruction error, or loss. If you present a normal cat, you will see that the error is very low, as you see at the bottom of the screen. If you present some cats that are cats, but weird cats, it will give you a higher error; those are the cats that you see in the middle of the screen. And if you provide the neural network with an elephant, it will say: oh my God, the reconstruction error is huge, because I cannot reconstruct an elephant with the essence of a cat. By knowing how big that loss, that error, is, we can know whether something is an anomaly or not.
We've also been doing research on another autoencoder, called the LSTM (long short-term memory) autoencoder. LSTM autoencoders have memory; they remember the past. So the model does not only know whether the cat is strange or not. It also takes into account when the cat appears. So, if I have a normal cat, a completely normal cat, that never appears at 2 a.m., and all of a sudden I see the cat appearing at 2 a.m., the LSTM autoencoder will say: hey, I have an anomaly here. Not because the cat is strange, but because it's a cat appearing at a time when it's not supposed to be appearing.
So we have selected two models, the autoencoder and the LSTM autoencoder. The next thing we're going to be doing is injecting the malicious events into our dataset. ADAM will emulate the attack by injecting the malicious artifacts in the right location in the data that is going to be ingested by the machine learning model, and then we will see if the machine learning model is able to detect them. Let's say we are happy: we have identified that the autoencoder and the LSTM autoencoder are good at detecting this type of thing, and we have fine-tuned certain parameters to make them effective. So what would we do at that point? What we would do is go to the real world.
Now we have this layer defined, the machine learning anomaly analysis layer, and you figure out what anomalies are shown. If you have multiple artifacts, you can follow the traditional investigation process of analyzing one artifact, then from the findings filtering another type of artifact, then analyzing that, pivoting to another artifact, and so on and so forth. You could use IOCs in case there are some, but remember, in this game today we are going to be assuming there are no IOCs. So we will leave that behind and just focus on what we can do. We could analyze the event logs first, then select the top 20%, the top 25% (you will decide that percentage based on your analysis in the machine learning phase), and then extract those and filter the file listing. You can generate a file listing with mactime or other tools, and then you can filter it. Just as you pivot to other forensic artifacts in the real world, you could analyze what is found, what the top anomalies are, and then dig deeper on those.
In what we will be presenting, we are restricting our analysis to only two artifacts. I think it's enough to show how it works, and we are not going to be complicating things too much. The first is the scheduled task event logs. The second is the file listing, which is not the contents of the files but the metadata for the files in the Windows System32 Tasks folder, which is where the task files are created when you create a task.
I want you to know that this is real-world data. This is not fake data. We have been partnering with a customer, a Fortune 500 company. They have provided data from 100 production servers: 220,000 events, more or less, for the scheduled task event logs, and about 23 to 24 million lines in the file system listing. That's quite a lot of data. Again, totally real-world data from production servers. This data does not contain the SolarWinds compromise, so what we've done is go to the threat intelligence sources that explain what this attack actually looks like from the point of view of artifact creation, and then we've injected it in the right location in the event log and file listing data.
And the scenario we are going to be defining is the following. From the resources point of view, we have enough analysts to analyze the top 100. We'll be able to go through the top 100 anomalies and analyze them deeper to identify whether each one is a false positive or not. Depending on your resources, you may decide it is 1,000 or 50. My goal is to detect the SolarWinds attack in the top 100 anomalies, which is what the resources I have allow.
For you to understand very well what I'm going to be doing, I'm going to divide the demo into three phases. In the first phase, I'm going to introduce you to the autoencoder. I want you to get a feeling for how an autoencoder actually works. So this first part has nothing to do with the SolarWinds attack; it is just to get the feeling, and I will explain some of the machine learning process. In the second phase we will use the pivoting methodology. I will explain later exactly what we're going to be doing, but we want to be pivoting: we're going to be analyzing the event logs, then the file listing, and then pivoting. The third phase of the demo will cover specifically the use of the LSTM autoencoder which, remember, is able to detect the time variance, the anomalies in time, and we will see whether that is actually effective.
So let's start by getting a nice feeling for the autoencoder. How does it work? Here we have a dataset with a total of 8,500 events. What I'm going to be doing is running a find-anomalies machine learning function, which I have coded. As I say, this is open source. I will be running this with a simple autoencoder, as you're seeing on the screen, in order to produce the anomalies, telling me which entries are the most anomalous and which are less anomalous by looking at the error. I have injected a first event, which is really strange. I made it up so that it's really strange. And I run the machine learning for three loops. Why three loops? Well, it doesn't need to be three, it can be several loops, but it's important because machine learning has a bit of randomness in it.
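The ranking by reconstruction error described above can be sketched in a few lines. This is not the neural autoencoder used in the demo and not the project's code: to stay dependency-free it uses the linear special case, where "compress to one number, then reconstruct" amounts to projecting each row onto the first principal component (found here by power iteration). The data and function names are made up, and unlike a trained neural network this sketch is deterministic, so it needs no repeated loops.

```python
import math


def fit_linear_autoencoder(rows, iters=100):
    """Return (mean, direction) for a 1-D linear bottleneck via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0] * d
    for _ in range(iters):
        # Multiply v by the (unnormalized) covariance matrix: X^T (X v).
        scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(s * c[j] for s, c in zip(scores, centered)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def reconstruction_error(row, mean, v):
    """Encode a row to one number, decode it, return the squared error."""
    c = [x - m for x, m in zip(row, mean)]
    code = sum(ci * vi for ci, vi in zip(c, v))   # the compressed "essence"
    recon = [code * vi for vi in v]               # the reconstruction
    return sum((ci - ri) ** 2 for ci, ri in zip(c, recon))


def rank_anomalies(rows):
    """Return row indices sorted from most to least anomalous, plus the errors."""
    mean, v = fit_linear_autoencoder(rows)
    errors = [reconstruction_error(r, mean, v) for r in rows]
    order = sorted(range(len(rows)), key=lambda i: errors[i], reverse=True)
    return order, errors


# Made-up data: ten "normal" rows close to the line y = 2x, plus one injected
# row (index 10) that does not follow that pattern.
NORMAL = [(float(x), 2.0 * x + (0.1 if x % 2 else -0.1)) for x in range(1, 11)]
ROWS = NORMAL + [(5.0, 2.0)]

if __name__ == "__main__":
    order, errors = rank_anomalies(ROWS)
    for position, index in enumerate(order, start=1):
        print(position, ROWS[index], round(errors[index], 3))
```

Rows that follow the pattern shared by most of the data reconstruct well and sink in the ranking; the injected row reconstructs badly and rises to the top, which is the behavior the demo relies on.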
So you may run it one time and it may not work properly. You can see that the machine learning model detects the event in position number 20, top anomaly number 20 out of 8,500. Well, that's pretty good. So it has detected this as very anomalous. Now let's inject a less strange event. This one is more similar to the rest of the events that are seen in those servers in terms of the different fields: user name, task name, etc. And the machine learning model, again, will basically detect this as anomaly number 271 out of 8,500, which is still high, even if it's not at the very top; let's say within a couple hundred. Now let's do something which is very average. I'm going to be injecting an event which is very, very similar to the rest of the events that are happening in those servers. My detection is 2,000: anomaly number 2,000 out of 8,500. So obviously, when I inject something which is very similar to what is already there, you will not be able to detect it; it is too similar to the other events.
So now you have a feeling for how the autoencoder works: the stranger the input, the higher it ranks as an anomaly; the more similar it is to the rest of the dataset, the lower it appears in the anomaly ranking.
All right. Let's go to the SolarWinds case, phase two. Remember, this is the pivoting phase. I'm going to be doing three things. First, I'm going to be running my machine learning model, the autoencoder, the standard autoencoder, on the event log data only. Then I will be running the same on the file listing data alone. In third place, I will filter the file listing with the top 25% anomalies of the event logs: I take the most strange events, and then, out of the task files, I keep those that appear in that top 25%. That gives me a reduced file listing, which is the kind of pivoting and filtering you do in the real world.
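The third step, pivoting from the most anomalous events to the file listing, can be sketched as follows. This is a minimal sketch under stated assumptions: the anomaly scores are taken as already computed by some model, and the field names (`task_name`, `file_name`), the task names and the scores are all invented for illustration.

```python
def top_fraction(scored, fraction):
    """Return the items of the highest-scoring `fraction` of (item, score) pairs."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return [item for item, _ in ranked[:keep]]


def pivot_file_listing(event_scores, file_scores, fraction=0.25):
    """Keep only files named after tasks seen in the most anomalous events,
    then rank the surviving files by their own anomaly score."""
    suspicious_tasks = {event["task_name"]
                        for event in top_fraction(event_scores, fraction)}
    reduced = [(f, s) for f, s in file_scores
               if f["file_name"] in suspicious_tasks]
    return sorted(reduced, key=lambda pair: pair[1], reverse=True)


# Invented (record, anomaly score) pairs; higher score means more anomalous.
EVENT_SCORES = [
    ({"task_name": "Backup"}, 0.10),
    ({"task_name": "Updater"}, 0.90),
    ({"task_name": "EvilTask"}, 0.60),   # hypothetical injected task
    ({"task_name": "Cleanup"}, 0.20),
    ({"task_name": "Defrag"}, 0.15),
    ({"task_name": "Sync"}, 0.30),
    ({"task_name": "Report"}, 0.05),
    ({"task_name": "Index"}, 0.12),
]

FILE_SCORES = [
    ({"file_name": "Backup"}, 0.80),
    ({"file_name": "Cleanup"}, 0.70),
    ({"file_name": "Sync"}, 0.65),
    ({"file_name": "EvilTask"}, 0.50),   # only 4th when ranked on its own
    ({"file_name": "Updater"}, 0.20),
    ({"file_name": "Defrag"}, 0.10),
]

if __name__ == "__main__":
    for record, score in pivot_file_listing(EVENT_SCORES, FILE_SCORES):
        print(record["file_name"], score)
```

In this toy data the hypothetical malicious file is unremarkable in either ranking on its own, but it moves to the top once the file listing is reduced to the tasks behind the most anomalous events. The 25% cutoff is a tuning choice, as the talk notes.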
So let's see it now. Remember, I am analyzing the event logs only, the 8,500 events, and I'm going to be looking for the SolarWinds attack. What you're seeing are the four entries that are associated to EventCacheManager, which is the name of the task created by the attack: two events associated to the creation of the task, and two events associated to the execution of the task. As you can see, the results are not very good; the tasks are not so anomalous. I get a detection at position 1,700 out of 8,500. That's not very good. So, from my analyst's point of view, I would not have detected it. It is not in my top 100.
Let's go to the file listing now, where I'm going to be injecting this entry, this malicious file. As you can see, I have inserted it into the file listing; you can see the different timestamps, the size of the file, the computer name, etc. I'm running the simple autoencoder, and I'm going to see how strange that SolarWinds file is: 1,800. Not very good; I'm not able to detect it that way. It is nowhere near the top anomalies.
So in the third part, as you may remember, I will be filtering the file listing with the top 25% anomalies of the event logs, and that gives me a reduced dataset. My machine learning now detects the malicious file as top anomaly number 95, number 91. According to the scenario we had originally defined, I have detected the malicious file in the top 100, so my analysts would have investigated it and could likely have detected this.
So, as a summary: if I do the analysis of the scheduled task events only, I don't detect anything notable. If I do the analysis only on the file listing, I don't detect anything notable either. But if I pivot, as I would do in a real-world investigation, on the top 25%, I get a reduced dataset, and my detection is now in the top 100, which means I could have detected this SolarWinds attack in my threat hunting process. Fantastic. That's great.
So we have the methodology. Let's go to phase three of the demo. In this case we're going to be using the LSTM autoencoder, only for the event logs, so we are going to be keeping it simple. Let's see how good this new LSTM autoencoder technology is at detecting the SolarWinds scheduled task events. We have injected four scheduled task event log entries, two for the creation and two for the execution. That's the normal thing. But what we're going to be doing is running that machine learning process on 30 days of data, 220,000 events, as you're seeing on the screen now. That's a lot of events, right? We're going to be running a machine learning model different from the previous one; it's an LSTM model, and it has other layers. For this larger dataset, it will take about 15 minutes for the data to get processed and to get the prediction that identifies the anomalies. I have obviously edited the video so you don't have to wait 15 or 20 minutes.
And guess what? What is the detection of the creation of the task now? The task creation events, which are event IDs 106 and 140, are detected in first and second place, the top two anomalies in the dataset. Individually these events were not special, but because of when they appear, at a moment in time when this type of event does not normally appear in the 100-server dataset, they are detected as extremely anomalous, as the two most anomalous events in the whole dataset. That's a pretty interesting result: being able to detect the SolarWinds attack with your standard threat hunting process, using machine learning.
Let me summarize by saying that everything is open source, so you will find all this on the DS4N6 site. I also want to mention again that this is real-world data. We didn't make anything up; the only thing we did was the injections into the real data. And we are going to continue investigating to make this more and more of a powerful weapon.
Some closing remarks: how do you apply this when you go back to the office? You can download the DAISY virtual machine, which is going to be ready for analysis. The more memory you can give it, the bigger the datasets you will be able to process. We've been playing with memory sizes of 256 GB and 1 TB, but if you have 16 GB of memory you will still be able to play with it and get on with smaller datasets. Go to the DS4N6 website to get familiar with how all this works. Then apply the workflow: decide which MITRE ATT&CK techniques you want to hunt for, identify the artifacts, collect them, parse them, feed them to the machine learning model, and analyze the anomalies. Review the results, repeat.
This is all. I hope this can be useful for you. We have created a web page on ds4n6.io, RSAC 21, to collect all the information on this conference. In general, you can go to the ds4n6.io website for all the information about this topic, or to One eSecurity if you need professional help in this area. Thank you very much. I'm around for questions. I hope I have been able to answer your questions already, but if you have more, I'm here.