Duration 39:32
Unpacking the Packed Unpacker: Reverse Engineering an Android Anti-Analysis Native Library

Maddie Stone
Security Engineer at Google
Black Hat USA 2018
August 4 2018, Las Vegas, USA

About speaker

Maddie Stone is a Security Engineer on Google's Android Security team, where she reverses all the bytes to keep malware off the phones of Android users. Maddie has previously spent many years deep in the circuitry and firmware of embedded devices including 8051, ARM, C166, MIPS, PowerPC, BlackFin, the many flavors of Renesas, and more. She is the creator of the IDAPython Embedded Toolkit. Maddie has previously spoken at international security conferences including OffensiveCon, REcon Montreal, DerbyCon, and the Women in Cybersecurity Conference.

About the talk

Topic: IT

I will discuss each of the techniques the malware author used in order to prevent reverse engineering of their Android native library including manipulating the Java Native Interface, encryption, run-time environment checks, and more. This talk discusses not only the techniques the malware author implemented to prevent analysis, but also the steps and process for a reverse engineer to proceed through the anti-analysis traps.

00:00 Who is Maddie Stone

00:30 What analysts and malware developers strive for

01:08 What is “anti-analysis” for?

02:03 What are ELF libraries

03:43 About the Chamois Android botnet family

05:17 How malware developers manipulate the Java Native Interface

06:51 Key characteristics of WeddingCake

07:51 About different CPU variants

08:43 Analysis of one of the CPU options

10:04 The key thing to remember when we start disassembling an ELF

12:40 How to look at the code that is labeled as JNI_OnLoad in IDA

15:14 The seed arrays: two arrays from 0 to 255

17:43 Do I need to notify users of malware in applications

19:38 Python's main problem

21:24 How the decryption program works

22:35 About the problem of adaptability

24:21 Hexadecimal hexagon example

25:37 How finding different types of instructions can help decrypt an encrypted array

26:48 Benefits of using the IDAPython API

27:50 How does decryption work

29:24 About using regular expressions

30:49 What goals and expectations do malware developers have

31:47 About the VXEG checks

34:38 What is the “Monkey tool” for

38:24 Report summary


So who am I? 00:00 My name is Maddie Stone. 00:00 I am a reverse engineer on the Google Play Protect team under Android Security, and I've been there for about a year. 00:02 Before that, I had about 5 years of experience doing hardware and firmware reversing and exploit dev. 00:10 So why do we even care? 00:18 What is the whole point of this? 00:19 Where are we coming from? 00:22 So here is why I'm talking about, and wanted to focus on, anti-analysis techniques. 00:24

The reason they exist is this whole dynamic between us as malware 00:30 analysts and the malware developers: we're both striving for an asymmetric advantage. 00:35 They want to be able to create malware super quickly that has the most market share and accomplishes their goal. 00:41

Well, we want to be able to detect it that much faster. So that's the mindset we're coming from: they can create anti-analysis techniques, 00:50 but can we detect them, prevent them, and get around them with less investment than it takes for them to develop them? 00:59

So what is an anti-analysis technique? Basically, it's just a way to make it harder for you to figure out what they are trying to hide, so this is going to 01:08 encompass all of anti-reverse engineering, 01:18 anti-debugging, and anti-emulation. 01:21 All of those things I'm packaging up into anti-analysis. 01:21 So let's take a step back and set the context. 01:29 What's the story? 01:32

Where are we? On the Google Play Protect team we have so many apps coming in all the time, and certain ones are flagged for a human 01:32 reviewer. 01:41 When that's escalated to me, 01:41 I want to take a look and decide as quickly as possible: is this benign or is it malware, and should we start issuing warnings? So this app came up. 01:43 It looked pretty normal, but there was one interesting thing: it had an ELF file embedded in the APK that just didn't look right. 01:53

I couldn't tell if it was actually malware or not yet, 02:03 but I also noticed that there were at least 100 other digests, or APKs, out there that also included 02:07 this ELF library. So that got me in this mindset of: one, 02:14 I need to decide very quickly whether this is malware or benign, 02:19 so that we can get protections out, 02:23 but I also need to figure out why all of these different 02:25 APKs are using it. 02:28

So if you're sort of new to Android malware analysis: 02:33 we have our APK. 02:36 That's your Android application, and in there 02:36 you'll usually see it is mostly running on Java code, 02:40 which you will find in classes.dex. 02:44 However, developers can choose to write functionality in C or C++ compiled code as well, and that's what we're talking about. 02:46

Today, it's one of these ELF libraries, or shared objects, that is embedded in the APK and has the native functionality. 02:55 So what are we talking about? We're going to talk about the WeddingCake anti-analysis library, 03:04 which is this native code. And why WeddingCake? Because 03:10 it has lots of layers. 03:14 So we're going to go over all of these different layers. 03:15 Why is it so robust? 03:19

What makes it so interesting, how can you reverse engineer it more quickly, and what would I have done instead of falling for each of 03:19 their traps along the way? 03:28 So once again, why WeddingCake? Why is this interesting? Since doing this research, 03:31 I have found at least 5000 distinct 03:37 APKs in the wild that contain WeddingCake. 03:40

None of these samples are benign; all of them are, well, 03:43 malware, and one of the most notable aspects is that the newer variants of the Chamois Android botnet family 03:47 (this links, when the slides are posted, to a blog post 03:55 we did about it before) 03:59 are using this to hide their functionality. So what WeddingCake 04:01 does is wrap 04:06

The functionality that the malware authors are trying to hide. This diagram came from the initial blog post about Chamois in late 2016. 04:06 So what we're going to focus on is Stage 3, 04:17 which is the ELF there, 04:20 so that's what they had studied and analyzed back then. 04:22 What's new is that now you see this WeddingCake-packed JAR. 04:26 But once I finally got through all of the anti-reversing and anti-analysis techniques, 04:31 the decryption, and everything 04:37

We're going to talk about today, 04:37 what I found was I had just unpacked the packed unpacker, 04:41 because that's what Stage 3 was. So I was able to then say yes, 04:44 this is a part of this family, and I now know that 04:49 these signatures of this ELF, 04:53 which I've now called WeddingCake, just wrap everything else. 04:55

So what are all these different techniques that we're going to talk about? What makes it so interesting? First, one of the things that's interesting is that previously in Android, 05:01 what we've seen is that generally, if someone was going to implement anti-analysis 05:11 or anti-debugging types of techniques, 05:15

They were usually still in Java, because that's what the malware developers were already using; it sometimes has a lower barrier to entry than C or C++ compiled code. So 05:17 the first notable thing was that all of this is in native code. First, 05:26 we're going to start with some of the JNI, 05:30

Or Java Native Interface, manipulations; 05:33 then we're going to go into some places where they've used anti-reversing techniques and in-place decryption; and finally on to about 40 different run-time environment checks that they use. So 05:35 none of these in and of themselves are super novel, 05:47 but the fact that they embedded each one in each other is what made it so complex and difficult to signature, reverse, and understand what was happening. 05:50

So what are the characteristics? 06:01 How can you notice if you've seen it or not? The very first thing is that, as we've said, it's an ELF, or .so, file in 06:01 the APK. Usually it is 3 to 8 06:11 random lowercase letters; that's how they've named it (probably not after this talk, 06:13 but you know). The other thing is the Java code that has to interact with this native library. 06:18

It always has random-lettered class names as well. What that also tells us is that this is distributed 06:25 as source code, and so it is dynamically generating the class names and the library names 06:32 every time they build the application. 06:37 Lastly, two things you can look for (but probably not in a couple of weeks anymore) are these 2 strings in the comment section of the ELF. 06:40

A few more of the key characteristics of WeddingCake: there are always 2 native methods that are declared in the Java application. 06:51 We're going to go over how the JNI works and how execution is passed from the Java code into the native code. 07:01

But what you'll see is that there are always these 2 functions, and there's sometimes 07:09 this 3rd, depending on the version and when they compiled it. The main function, or method, that we will talk about, which they implement in the 07:14 native code, is here called VXEG, again dynamically generated at compile time. 07:24

But this is going to be the function that performs all of our run-time environment checks and starts the main functionality of the ELF that the malware author was trying 07:29 to hide. In every version, 07:39 though, you will see that they have these same method signatures; for example, 07:41 VXEG returns an int and takes an Object array as its argument. 07:46

One of the other interesting things I found, as I really tried to understand all of the different variants and how all these different samples were using it, is that 07:51 there are many different CPU variants of it. 08:02 The most common in most of the Android ecosystem is 32-bit, what they call Android generic ARM, which uses the CPU keyword 08:06 armeabi, but I've also seen 32-bit ARMv7, 08:15

ARM64, as well as x86. Here is a link to VirusTotal and the digest for one of the 08:18 APK samples that includes three of these different CPU variants in it. 08:23 What's really interesting, and what we can keep in mind 08:27 going through the rest of this talk, is that every single one of these different CPU variants has the same functionality; that's not changing across any of them. 08:32
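To see which CPU variants of a native library an APK carries, one can list the entries under lib/<abi>/ in the archive. The following is a minimal Python sketch, not part of the talk's tooling. The helper names and the library name libqwerty.so are hypothetical, and the demo archive is built in memory rather than taken from a real sample.

```python
import io
import zipfile
from collections import defaultdict


def native_libs_by_abi(apk):
    """Map each ABI directory under lib/ to the set of .so names it contains."""
    libs = defaultdict(set)
    with zipfile.ZipFile(apk) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # Native libraries live at lib/<abi>/<name>.so inside an APK.
            if len(parts) == 3 and parts[0] == "lib" and parts[2].endswith(".so"):
                libs[parts[1]].add(parts[2])
    return dict(libs)


def demo_apk():
    """Build a tiny in-memory archive laid out like an APK (hypothetical names)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("classes.dex", b"")
        for abi in ("armeabi", "armeabi-v7a", "x86"):
            zf.writestr(f"lib/{abi}/libqwerty.so", b"\x7fELF")
    buf.seek(0)
    return buf


libs = native_libs_by_abi(demo_apk())
for abi in sorted(libs):
    print(abi, sorted(libs[abi]))
```

The same library name showing up under several ABI directories is what one would expect when every CPU variant carries the same functionality.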

So let's start analyzing. This is the sample that I have used as a sort of walkthrough. 08:43 If later on you're interested, 08:49 it's up on VirusTotal as well, 08:51 so you can look into it and follow along later when I post the slides. 08:53 So first, what is JNI? 09:00 How do Android apps even use native code? Basically, with the Java Native Interface, in your Java application, 09:02 where execution has to start in the Android app, 09:10

You can declare that you have methods that are implemented in your C, C++, or other compiled code. You just declare it, as you see here, with the native keyword. 09:14 There's nothing else in that method; 09:26 then you just write it in C 09:29 or C++. 09:31 But the JNI has to actually know how to pair 09:33

These 2 things, so it has to know where to look for these methods and where they might be implemented. So the very first thing you will have 09:38 to do on the Java side of your Android 09:48 application is load that native library into memory. 09:52 You have 2 options; 09:56 both basically perform the same thing: System.loadLibrary or System.load. 09:57

The key thing to remember as we get into the disassembly of the ELF is that when either of these 2 methods is called in Java, 10:04 that calls the exported function in the ELF called JNI_OnLoad. 10:12 This is going to become really important later in our analysis. 10:18 So now you've loaded this into memory, 10:24 but how is the JNI 10:27

going to understand that this Java-declared method is going to match up with and run this native method? 10:30 There has to be some way to pair them and know that these 2 things go together, 10:38 so you have 2 options. 10:43 One is discovery, 10:45 where in your compiled code the implementation of the method, so the function there, is named Java, underscore, the mangled class name, underscore, the mangled 10:45 method name. 10:58
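The discovery naming convention can be sketched in Python as follows. This is a minimal illustration of the JNI name-mangling rules (dots and slashes become underscores, and `_`, `;`, `[` are escaped as `_1`, `_2`, `_3`); the class and method names are hypothetical, and the sketch only handles characters in the Basic Multilingual Plane.

```python
def jni_mangle(name):
    """Apply JNI name mangling to a class or method name."""
    out = []
    for ch in name:
        if ch in "./":
            out.append("_")          # package separators become underscores
        elif ch == "_":
            out.append("_1")         # literal underscore is escaped
        elif ch == ";":
            out.append("_2")
        elif ch == "[":
            out.append("_3")
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append("_0%04x" % ord(ch))  # other characters as _0xxxx
    return "".join(out)


def jni_symbol(class_name, method_name):
    """Build the exported symbol name the JNI looks for during discovery."""
    return "Java_" + jni_mangle(class_name) + "_" + jni_mangle(method_name)


# Hypothetical class and method, for illustration only.
print(jni_symbol("com.example.Foo", "do_work"))
```

Searching an ELF's exports for the `Java_` prefix is therefore a quick first check when pairing native methods with their implementations.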

This is really nice because it's a really easy indicator to look for and find in your ELF if you're trying to pair and understand what's being run 10:58 when that Java native method is called from the application. 11:08 The second option that developers can use 11:12 is the RegisterNatives function. 11:16

So with this, you don't have to have any of your functions named in the ELF, 11:19 but you will still have to use a string of both the method name and the method signature, so that it knows that this 11:23 function in the compiled code is what is run when you call the Java native method. 11:32 This is what the RegisterNatives signature looks like, and the key thing 11:41

We need to remember is that it requires this string, or char array, of the name and the signature. What I mean by signature in this context: here 11:46 is one of our Java native declared methods. 11:58 If it was returning a String and taking an integer as an argument, then you'll have the I in the parentheses and then the 12:01 type that's being returned at the end. 12:11 These are really easy things to identify when you have your ELF. 12:14
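The signature strings described here follow the standard JNI type descriptor encoding. A minimal Python sketch that builds them from Java type names is shown below; the function names are hypothetical helpers, not part of any JNI tooling.

```python
# JNI descriptor letters for Java primitive types.
PRIMITIVES = {
    "void": "V", "boolean": "Z", "byte": "B", "char": "C", "short": "S",
    "int": "I", "long": "J", "float": "F", "double": "D",
}


def type_descriptor(java_type):
    """Encode one Java type, e.g. 'int' or 'java.lang.Object[]'."""
    dims = 0
    while java_type.endswith("[]"):
        java_type = java_type[:-2]
        dims += 1
    base = PRIMITIVES.get(java_type)
    if base is None:
        # Reference types are written as L<slash-separated class name>;
        base = "L" + java_type.replace(".", "/") + ";"
    return "[" * dims + base


def method_descriptor(return_type, arg_types):
    """Encode a whole method: arguments in parentheses, return type last."""
    args = "".join(type_descriptor(t) for t in arg_types)
    return "(" + args + ")" + type_descriptor(return_type)


# A method returning a String and taking an int, as in the talk's example.
print(method_descriptor("java.lang.String", ["int"]))
# A method returning an int and taking an Object array, like VXEG.
print(method_descriptor("int", ["java.lang.Object[]"]))
```

Strings of this shape are what one would normally expect to find in plain text in an ELF that calls RegisterNatives.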

But when I opened up the library that was in my sample to start, 12:20 I didn't see any of this. There were no strings. 12:25 None of the functions were named. 12:28 It didn't even have JNI_OnLoad declared as a function. 12:30 And what this was is that in every disassembler 12:34 I've opened, or every disassembler 12:38

I've tried thus far, including IDA Pro, when you try to look at the code that is labeled as JNI_OnLoad, IDA was not able to define it as a 12:40 function due to these 2 blocks of data. That is another really strong indicator and signature when I've been able to open it up, because this has been true 12:50 of every different sample that was compiled for 32-bit ARM. 13:01 So 13:05 the first thing you've got to do to figure it out is super easy: just declare it as code. 13:05

You have your function, but now where do we start? 13:12 We wanted to focus our analysis on those Java-declared methods. 13:15 They were declared for a reason. 13:19 We see them called in the Java code. 13:22 Yet 13:24 we can't find what is actually implemented to be associated with those methods, because they should either have a native function here in the ELF with that 13:24 mangled Java name, 13:36

Or they should have the strings of the signature and the name for RegisterNatives to run on. 13:36

So where I decided to start was with JNI_OnLoad, because before any of those native methods could run, 13:45 you still have to load the library into memory. When I started looking at JNI_OnLoad, it had all of these repetitive calls to the same function 13:53 at the end, and it was taking in arguments of different blocks of memory. 14:04 This is a really, 14:10 really strong signal of encryption or decryption. 14:11

Because you have to run the decryption function over different places, and then hopefully we'll have more information about how this works. In this case, all of the yellow 14:15 blocks are the calls to the same function, sub_2F30. I highlighted one of them, and that's going to be what we believe right now is our decryption subroutine. 14:27 That's the next place to start, because obviously I want to understand and be able to analyze 14:39

This library as it runs in memory. So go ahead and figure out 14:45 the different arguments in there. 14:50 It takes 4 arguments each time it's called: first, 14:52 the pointer to the encrypted bytes; then the length of those bytes that should be decrypted; and then it has 2 arguments that stay the same 14:56 the whole time. We have a word seed array, 15:05 which is an array 15:08 where each value is 4 bytes, and then a byte 15:09

Seed array. These are generated before any of the decryption calls start, and then the same things are passed each time. 15:14 This is the IDA-generated decompilation, 15:23 which I sort of cleaned up a little, of what the seed array functions were. I went through this, tried to understand it, and coded up a super 15:26 simple thing in Python to generate them, 15:38 so I could see what those values were. 15:42 And what I found is that they simply were 15:45

Allocating two arrays from 0 to 255. 15:49 So they wrote this complex algorithm, 15:53 instead of 2 lines, to allocate these arrays. 15:57 This was the first technique, and a really great use of my 6 hours as I was coding 16:00 it up and trying to understand what it was doing. So what I would suggest in the future, and what I would do 16:06 instead, is just run 16:14 it dynamically and grab them. 16:14
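The "2 lines" version of the seed arrays can be sketched in Python as below. This assumes each array simply holds the values 0 through 255 in order, as described above, and that the 4-byte values of the word array are stored little-endian; the variable names are hypothetical.

```python
import struct

# Byte seed array: 256 one-byte values, 0 through 255.
byte_seed = bytes(range(256))

# Word seed array: the same 256 values, each stored as a 4-byte
# little-endian word (the byte order is an assumption of this sketch).
word_seed = struct.pack("<256I", *range(256))

print(len(byte_seed), len(word_seed))
```

Having the arrays in this form makes it easy to compare them against whatever a dynamic run or a translated generator produces.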

I already knew the same values were passed each time; they weren't being regenerated. 16:17 But instead I stuck with static 16:23 reversing and fell for their anti-reversing trap. 16:25 So hopefully, if you see this algorithm in the future, you won't fall for the same thing; I took the bullet for all of us. 16:28 So 16:37 we now have our seed arrays, and we can move on to the decryption. 16:39 The key and the overall framework of how the decryption works, since it's in place and 16:44

It is running during JNI_OnLoad, 16:50 so before the ELF is actually there in memory, is that the decryption function is called on that encrypted array of bytes. 16:52 It does its decryption and then it actually overwrites the encrypted bytes in the same place. 17:01 So this gives us an idea of how to decrypt it in IDA and how we can start to analyze it as it would look in memory too. 17:08
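The in-place pattern (decrypt a region, then overwrite the encrypted bytes at the same location) can be sketched in Python as follows. The XOR transform here is only a stand-in so the example runs; it is not the WeddingCake algorithm, which the talk does not identify. All names are hypothetical.

```python
def decrypt_in_place(image, offset, length, decrypt):
    """Decrypt image[offset:offset+length] and overwrite it with the result."""
    plain = decrypt(bytes(image[offset:offset + length]))
    if len(plain) != length:
        raise ValueError("in-place decryption must preserve length")
    image[offset:offset + length] = plain


def xor_stand_in(data, key=0x5A):
    """Placeholder transform (NOT the real algorithm): XOR with one byte."""
    return bytes(b ^ key for b in data)


# Build a small fake image with one encrypted region in the middle.
image = bytearray(b"HDR" + xor_stand_in(b"vxeg") + b"TAIL")
decrypt_in_place(image, 3, 4, xor_stand_in)
print(bytes(image))
```

Because the plaintext lands at the same addresses as the ciphertext, a static database patched this way looks like the library does in memory after JNI_OnLoad has run.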

I personally was not able to identify it as any known encryption or decryption algorithm. 17:15 But hey, if y'all can find it, 17:21 I would be more than happy to know if you see it as something that you already know is out there. 17:23 So at this point, 17:32 the key was that I needed a solution that was going to work fast and be flexible, because again, remember, 17:36 I'm still trying to make the decision. 17:43

Do we need to start alerting users, or is this benign and I can pass? Also, I knew that there were at least 100 other samples out there, and 17:46 each of them is compiled differently. 17:55 So my key thoughts going into this were that, one, 17:57 I don't need to fully understand the decryption algorithm. 18:01 I just need something that's going to run over it and decrypt it for me, 18:05 so I can analyze the contents. 18:09

The second thing was, 18:10 I needed it to be flexible because I had so many samples. 18:12 I didn't want to have to copy and paste, rewrite it, 18:16 or adjust for different memory addresses and different registers that are used in different places to develop my solution. Those are the 2 18:18 key things that I keep in mind 18:26 whenever I'm trying to develop a quick decryption solution for these types of packed 18:28

Things. I did open source my IDAPython script, and that is available 18:33 there. You can also just Google 18:38 IDAPython Embedded Toolkit, and it's under the Android stuff. 18:40 I chose to use IDAPython because, well, 18:45 it's one of my favorite tools to use and where I'm super fast, and I also focused on translating the decryption into 18:50

Python rather than trying to create true pseudocode or a code representation of it. 18:58 What I mean by translating it, 19:04 and this is how I play into the speed and move as quickly as possible: 19:07 instead of understanding, 19:11 in the assembly, 19:11

What each of these registers does, or what the developers might have called them, or what their functions were, I just named variables in Python the same thing, 19:13 and I run through and say a move is an equals. You can now just go step by step, because that allows you to follow along instead of trying to 19:23 pattern match and figure out all the different aspects of it. 19:33
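A minimal sketch of this translation style is shown below: each Python variable is named after a register and each line mirrors one instruction. The four instructions are a made-up example, not taken from the sample. Because Python integers are unbounded, the sketch masks results to the operand size, and the small sign helper is a hypothetical stand-in for the kind of helper function a script like this might carry.

```python
MASK32 = 0xFFFFFFFF


def u32(x):
    """Truncate to an unsigned 32-bit word, as a register would."""
    return x & MASK32


def u8(x):
    """Truncate to an unsigned byte."""
    return x & 0xFF


def s8(x):
    """Interpret the low byte of x as a signed 8-bit value."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


def translated(r0, r1, r2):
    """Line-by-line translation of a hypothetical instruction sequence."""
    r3 = r0              # MOV R3, R0
    r3 = u32(r3 + r1)    # ADD R3, R3, R1
    r3 = u32(r3 << 4)    # LSL R3, R3, #4
    r0 = r3 ^ r2         # EOR R0, R3, R2
    return r0


# The addition wraps around at 32 bits, just as it would in a register.
print(hex(translated(0xFFFFFFFF, 2, 0)))
```

Keeping one line per instruction makes it easy to diff the script against the disassembly when a result looks wrong.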

One thing to keep in mind is that Python is obviously not as strongly typed a language; here, assembly instructions know exactly what size they're operating on, whether it's a 19:40 byte, a half word, or a word, 19:52 so that leads to a lot of bugs 19:53 if you don't keep it in mind. 19:55 If something's not working the way you're expecting, 19:57 that's generally where I look. 20:01 I also tend to write helper functions, 20:03 which you can find and take from the script, 20:05

Or anything else you want to use, for a lot of the signed operations, since Python won't usually know that, OK, 20:08 this is a byte that is operated on as signed or not. 20:15 So let's do some demos and talk through this 20:20 assembly. 20:25 So this is our sample library. 20:28 That's an interesting I love you. 20:31 Whoops, OK. 20:42 Well, we're not going to get into graph view, 20:48 so the first thing is, it's very small on the right, 20:50 but there is nothing named Java, as we talked about. 20:54

There are only the imported functions, 20:58 so that's where we're pulling into our 21:01 JNI_OnLoad function, and that's what I'm scrolling through right here. 21:05 Here is that decryption function that I had already shown you all in the screenshot, and each of the places that it takes bytes from 21:10 are all right after each other. 21:18 It's a block of memory, and it's just random bytes. 21:21

There's nothing that looks like a key or anything like that. IDA doesn't know, so all of them are declared as unknown data as we scroll through it. 21:24 So when we look at our decryption subroutine, 21:34 if that one will let me do it, 21:38 here is the overall graph structure. 21:42 What we have is 2 while loops. 21:44

There's one at the top and one at the bottom. When you're doing that translation to Python in order to just have a solution 21:47 that can run over it, 21:57 it's always... 21:58 I tend to find it's helpful to just have a variable that says keep looping is true or keep looping is false, and you can 21:58 just set that in the same way 22:08

As the instructions did for, like, branch less than or branch greater than and things like that. 22:11 You sort of have that translation. 22:17 So going through. 22:20 Set large enough. 22:24 Um. 22:24 The first thing I did was I coded up my decryption subroutine in Python, that translation 22:29

We've talked about, and tested it over just one of the byte arrays to see: does anything come out of it? Am I doing this right? Are bugs coming up? 22:35 And I followed along. Once I understood that that decryption subroutine was correct, 22:45 that's when I had to start thinking about the second problem, 22:51 the adaptability. 22:54 I have so many samples coming in, and I want to be able to compare them to each other. 22:54

So I don't want to have to recode anything else for each new sample. 23:01 I want something that I can run on anything and then be able to quickly analyze and check: does this decrypted library look like 23:05 these others, or does something different stand out? That means that I can't hard code where those encrypted array addresses are. I can't hard code 23:13 what their bytes are. 23:23 I can't know for sure 23:24

Where this decryption subroutine lives. So in my main script, where I start 23:26 is first just finding JNI_OnLoad. Just like as humans 23:31 that's where we started, that's where we can start with the Python script. 23:34 I then went ahead and at first just initialized my seed arrays. 23:39 Uh. 23:44

But the next job that the script needed to do for me was tell me where each array is, read its contents, and what its length is, because those 23:44 are the 2 dynamically changing arguments to the decryption subroutine. 23:55 So what I did was first write a subroutine called find decrypt sub (really creative) that goes through JNI_OnLoad. 24:00 So it starts about hex 20, 24:21

Or no, hex D0, from the beginning of JNI_OnLoad and then begins looking for repetitive calls using BL, since we're in 32-bit ARM. 24:25 I knew that in the last chunk of the JNI_OnLoad function 24:34 they just repeatedly call this. So once I found a subroutine, I used some string processing on IDA's disassembly. 24:38 All I needed to get the disassembly was GetDisasm 24:46 from the IDAPython 24:49 APIs. 24:51

I just did some string processing to see the address. 24:51 I then checked that that same subroutine was called at least 5 times, as a safety check that I found the right subroutine, and then 24:55 I recorded all of the cross references, or the addresses that called that subroutine at each place, because every time they call the subroutine 25:05 they had to allocate or assign the arguments. 25:14
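The search for the decryption subroutine described above can be sketched in plain Python over a list of (address, disassembly text) pairs, such as one might collect from a disassembler. This is not the author's script: the function name mirrors the one mentioned in the talk, but its spelling, the regular expression, and the sample listing are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Matches a direct call such as "BL sub_2F30" (but not BLX or BLE).
BL_RE = re.compile(r"^\s*BL\s+(\w+)\s*$")


def find_decrypt_sub(disasm_lines, min_calls=5):
    """Return (target, call_sites) for the most frequently called BL target.

    disasm_lines is an iterable of (address, text) pairs. Returns
    (None, []) when no target is called at least min_calls times.
    """
    calls = defaultdict(list)
    for addr, text in disasm_lines:
        code = text.split(";")[0]        # drop any trailing comment
        m = BL_RE.match(code)
        if m:
            calls[m.group(1)].append(addr)
    if not calls:
        return None, []
    target, sites = max(calls.items(), key=lambda kv: len(kv[1]))
    if len(sites) < min_calls:           # safety check from the talk
        return None, []
    return target, sites


# Hypothetical listing: five calls to sub_2F30 plus unrelated instructions.
listing = [(0x1000 + 4 * i, "BL sub_2F30") for i in range(5)]
listing += [(0x2000, "BL sub_1234"), (0x2004, "MOVS R1, #0x10")]
print(find_decrypt_sub(listing))
```

The recorded call sites are the starting points for walking backwards to find where each call's arguments were assigned.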

Earlier, they had to say R0 is going to equal this pointer, 25:17 so if I have those addresses, 25:21 I can now figure out what the encrypted array is. 25:23 Once I did that, I then iterated through each of those different cross references, 25:28 and every time I had a cross reference, 25:34

I would use regular expressions to look for the different types of instructions that could be assigning the encrypted array bytes, which are at the top of this thing, 25:37 so you can always change that 25:49 if you are looking at a different CPU of some sort. Then I pass it to this function called get array and length. 25:51

I do pass it the previously used length as well, 26:00 and the reason for that is that there are a couple of different ways 26:04 you can assign the length. 26:08 If we look back 26:11 (let's see if it's going to load, yes) at JNI_OnLoad, first 26:11 we see an example of where they assign the length here to R1 using this immediate assignment. Cool. 26:16 But later on 26:24

They start storing that value on the stack as well, 26:24 and also loading the length from the stack. 26:30 Or sometimes they are using the same previous length. 26:34 Accounting for all of those different cases with regular expressions is what I did within get array and length. 26:37
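The three length cases described above (an immediate assignment, a load from the stack, or reuse of the previous length) can be sketched with regular expressions as follows. This is a simplified illustration, not the author's implementation: the instruction formats are hypothetical, the function name only mirrors the one mentioned in the talk, and a stack load is simply treated as "reuse the previous length" rather than being traced.

```python
import re

# Hypothetical instruction shapes; real disassembly text will differ.
ADDR_RE = re.compile(r"^\s*LDR\s+R0,\s*=\s*(0x[0-9A-Fa-f]+)")
LEN_IMM_RE = re.compile(r"^\s*MOVS?\s+R1,\s*#\s*(0x[0-9A-Fa-f]+|\d+)")
LEN_STACK_RE = re.compile(r"^\s*LDR\s+R1,\s*\[SP")


def get_array_and_length(preceding, prev_length):
    """Recover (array_address, length) for one call site.

    preceding holds the instruction texts before the call, nearest first.
    prev_length is the length used by the previous call, or None.
    """
    addr = None
    length = None
    for text in preceding:
        if addr is None:
            m = ADDR_RE.match(text)
            if m:
                addr = int(m.group(1), 16)
                continue
        if length is None:
            m = LEN_IMM_RE.match(text)
            if m:
                tok = m.group(1)
                length = int(tok, 16) if tok.lower().startswith("0x") else int(tok)
            elif LEN_STACK_RE.match(text):
                length = prev_length     # simplification: reuse last length
    if length is None:
        length = prev_length             # no assignment seen at all
    return addr, length


print(get_array_and_length(["MOVS R1, #0x2C", "LDR R0, =0x4000"], None))
```

Keeping the patterns at the top of the script, as the talk describes, means only those few lines need to change for a different CPU.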

So finally, after that, we have a map of where the encrypted array starts and what its length is. 26:48 Then we can just, thankfully, use the IDAPython 26:56 APIs and call our decryption function, 27:00 which we'd already tested, and then we get back the results of the decryption. 27:02 Another reason why I like to use 27:08 IDAPython is that its API allows us to just patch 27:10

Those bytes that previously were encrypted, using the PatchByte API, 27:14 and write them over that address. So now we are able to analyze our IDA database in the same way as it looks after all 27:18 of this decryption has run in memory. 27:28 So what this looks like is, obviously, here 27:31 we still have all of these unknowns. If we look at our strings, 27:34 um, there's a lot of jumble but nothing really more. Oh, this is really small, sorry. 27:39

So a lot of jumbled things and then some of the common imports and exports in here, 27:50 but nothing that really talks about that signature that we're still looking for, or the declared Java method names that we still need to be able to do that pairing 27:56 from the beginning: our APK says I have these native methods, 28:06 so what's run in this library? 28:10 So when I run this, 28:13 File, 28:15 Script file, 28:15 and then 28:19 run our WeddingCake decrypt, what we will see 28:25

Now, back in our JNI_OnLoad, is suddenly we have all these strings. 28:28 So now we can start our analysis, because one of those key things is, look, 28:32 we have calls to DexClassLoader. 28:38 We see init. 28:41 We see right here 28:41 that class name with all those jumbled letters. 28:44 So now we can really start understanding what this is doing and getting past the decryption, and this script is going to run on all of our other samples too. 28:48

Oops, where my slides got... So again, focus just on speed and flexibility. 28:58 It's always hard for me because I like to understand everything 29:04 I'm reversing, but sometimes it's not the right choice. 29:08 If you are better at setting up ARM emulators or debuggers, 29:12 then that could probably be a faster route too, but that would have taken me longer, so yeah. 29:16

And one of the ways that I generally get around hard-coded addresses as well as registers is using regex. 29:24 So just a screenshot: at the top is the encrypted block of memory, and then the same segment decrypted after running our script. The key 29:32 thing is that now we do have that string of our function name, VXEG. 29:43 We have its signature, and now we also see, in the structure, 29:47 the subroutine that is associated with it. 29:52
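The structure mentioned here matches the shape of a JNINativeMethod entry as passed to RegisterNatives: a pointer to the name, a pointer to the signature, and a function pointer. A minimal Python sketch for reading one entry from decrypted bytes follows, assuming a 32-bit little-endian target where each pointer is 4 bytes. On ARM, a function pointer with its lowest bit set indicates Thumb code, so the sketch clears that bit to recover the subroutine address. The addresses used are made up.

```python
import struct


def parse_native_method(blob, offset=0):
    """Read one 32-bit JNINativeMethod entry from a bytes-like object."""
    name_ptr, sig_ptr, fn_ptr = struct.unpack_from("<III", blob, offset)
    return {
        "name_ptr": name_ptr,
        "sig_ptr": sig_ptr,
        "fn_addr": fn_ptr & ~1,     # clear the Thumb bit to get the address
        "thumb": bool(fn_ptr & 1),  # low bit set means Thumb mode
    }


# Hypothetical entry: name at 0x5000, signature at 0x5008, code at 0x3E5C.
entry = struct.pack("<III", 0x5000, 0x5008, 0x3E5D)
print(parse_native_method(entry))
```

The name and signature pointers lead to the strings that become visible only after decryption, and the function address is where analysis of the native method can begin.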

So this is now where we can finally get to what we actually wanted to analyze in the beginning. 29:54 The plus one 30:02 at the end is just because it's running in Thumb mode. 30:02 So we have our three declared native methods. 30:06 We know the native subroutines that are run each time they are called. 30:10 We have their signatures. Because each of them is named differently in all the samples, 30:15

I just added a function number in the leftmost column, to say that any of the native declared methods that have these same signatures would correspond to the same analysis 30:22 as what I have here for VXEG. 30:36 So that leads us into our run-time environment checks, 30:38 which I didn't know at the time; I just started my analysis on that function number one. 30:43

But the goal of the malware developers was that they wanted to understand if they're being dynamically analyzed, debugged, or emulated. Someone managed to get through, probably, the JNI manipulations, the 30:49 anti-reversing, and the decryption. 31:00

So now where are they? They want to make sure this is not in a debugger that got around those things, and that it's not some automated pipeline. One of 31:02 the interesting things that's sort of different and is changing with the evolution of Android, and how there's not really the low-hanging fruit anymore, 31:11 is that they're willing to give up some of their market share as malware developers if that means 31:20

Weighting it more heavily towards not being detected, 31:27 which is not one of those things you expect in a malware family such as Chamois, which is focused on making money through different types of fraud. 31:30 What we saw here is that they're willing to not run on every device as long as they're 31:39 not being detected. 31:44

So again, VXEG performs the run-time environment checks, and there are 45 different ones. If any single one of these 45 checks fails, 31:47 then it will stop execution of the app. 31:56 We're going to go through a couple of different groups of these that make up the 45. The first one 32:01 is going through the system properties. 32:08 All of them are sort of aimed at whether or not the hardware 32:10

you're running on is an emulator or being debugged. The first 37 checks 32:15 they do are checking to see if a system property has a specific value. 32:19 I obviously could not list them all here, 32:24 so there is a link, and it's also in that IDAPython Embedded Toolkit where the decryption is, to the list of all 32:26

37 of these checks, if you're interested in seeing what they're looking for. Most of them are based on looking for common emulators and debuggers, 32:32 but one of the interesting ones, 32:41 too, was that they will not run 32:43 if you do not have SELinux in enforcing mode. 32:44 So that is one of the places where they're generally not going to run 32:47 if your device is rooted and things like that. 32:51
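The value-based property checks described above can be sketched roughly as follows. This is a minimal model, not the sample's code: the two properties listed are common emulator or debug-build indicators picked purely for illustration (the sample's actual 37 name/value pairs are in the list the speaker links to), and `get_property` is a hypothetical stand-in for whatever lookup the native code performs.

```python
from typing import Callable, Iterable, Optional, Tuple

# Illustrative (property name, disallowed value) pairs. These are assumptions
# for the sketch, NOT the sample's real list of 37 checks.
ILLUSTRATIVE_CHECKS = [
    ("ro.kernel.qemu", "1"),
    ("ro.debuggable", "1"),
]


def environment_passes(
    get_property: Callable[[str], Optional[str]],
    checks: Iterable[Tuple[str, str]],
) -> bool:
    """Return False as soon as any property holds its disallowed value.

    This mirrors the all-or-nothing behavior described in the talk: a single
    failed check is enough to stop execution.
    """
    for name, disallowed_value in checks:
        if get_property(name) == disallowed_value:
            return False
    return True


# Usage with dictionaries standing in for the device's property store.
emulator_props = {"ro.kernel.qemu": "1"}
device_props = {"ro.kernel.qemu": "0", "ro.debuggable": "0"}
print(environment_passes(emulator_props.get, ILLUSTRATIVE_CHECKS))
print(environment_passes(device_props.get, ILLUSTRATIVE_CHECKS))
```

Passing the lookup in as a callable keeps the sketch runnable off-device; on a real device it would wrap the system property API instead of a dictionary.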

The last five that they go through just check whether any of these five system properties on the right exist, 32:54 not even what their value is, and it's 33:00 pretty clear why: they're all related to different types of emulators, or to things that emulators and debuggers set up in order to look like a real device. 33:02 The next thing they do is verify what architecture they're running on. They will not run 33:17

if you are not running on ARM, which is really, 33:25 really fascinating since we found x86 versions of this library too. Those x86 versions also include this check, even though they're running because you are on x86. They don't 33:28 do this with a clear API call. 33:40 Instead, what they do is open up /system/lib/libc.so and read the first 33:42 20 characters, or 20 bytes, from that file. 33:48

This is still part of the ELF header, being the first 20 bytes, and out of those 33:51 20 bytes 33:57 they only read three. 33:57 The first check 34:00 they do is for the byte at the e_ident[EI_CLASS] position in the header, 34:00 which determines whether you're running 32-bit or 64-bit. 34:05 The last two bytes 34:13 they check are the e_machine field, or what hardware 34:13 you are on. So in this case 34:16

they're looking for the values of either hex 28 or hex B7, to say you're either ARM or 34:18 AArch64. So the only two combinations that are acceptable are 32-bit and ARM, or 64-bit and AArch64. 34:24 If you are running anything else, then they exit and the application stops. 34:31
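A rough sketch of the header check just described, working on the first 20 bytes of an ELF file. The offsets (EI_CLASS at byte 4, e_machine at bytes 18 and 19) and the constants come from the ELF format itself; the function names, the little-endian assumption, and the test headers are illustrative choices and not taken from the sample.

```python
import struct

EI_CLASS_OFFSET = 4       # index of the class byte inside e_ident
E_MACHINE_OFFSET = 18     # e_ident (16 bytes) + e_type (2 bytes)
ELFCLASS32, ELFCLASS64 = 1, 2
EM_ARM, EM_AARCH64 = 0x28, 0xB7

ACCEPTED = {(ELFCLASS32, EM_ARM), (ELFCLASS64, EM_AARCH64)}


def architecture_accepted(header: bytes) -> bool:
    """Check the three bytes the talk describes: EI_CLASS plus e_machine."""
    if len(header) < 20:
        return False
    ei_class = header[EI_CLASS_OFFSET]
    # Assumption: little-endian encoding, as on typical Android devices.
    (e_machine,) = struct.unpack_from("<H", header, E_MACHINE_OFFSET)
    return (ei_class, e_machine) in ACCEPTED


def make_header(ei_class: int, e_machine: int) -> bytes:
    """Build a synthetic 20-byte ELF header prefix for experimentation."""
    e_ident = b"\x7fELF" + bytes([ei_class, 1, 1]) + bytes(9)  # 16 bytes
    return e_ident + struct.pack("<HH", 3, e_machine)          # e_type, e_machine


print(architecture_accepted(make_header(ELFCLASS32, EM_ARM)))      # 32-bit ARM
print(architecture_accepted(make_header(ELFCLASS64, EM_AARCH64)))  # AArch64
print(architecture_accepted(make_header(ELFCLASS32, 0x03)))        # 32-bit x86
```

On a device, the header bytes would come from reading the first 20 bytes of the system library rather than from `make_header`.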

The next thing they look for is to determine if Monkey is running. If you don't know what Monkey is, it is a tool that allows developers to emulate or pretend that 34:38 a user is interacting with their device in their application, 34:47 so performing fake touches, fake clicks, and things like that. 34:51 The way they go about trying to detect if Monkey is running is they open up /proc. 34:56 They then iterate through every directory here. 35:03

After they open up /proc, they check each file or node entry to see whether or not 35:07 it's a directory. 35:15 If it's a directory, 35:15 they check its name to determine whether or not 35:18 it's an integer, and they then construct these two paths in the PID's directory, for the comm file as well as cmdline. 35:21 They then read out up to 0x7F bytes from each and choose whichever one has more information. 35:30

They then see if that information contains the Monkey package name at all. 35:38 If it does, it means that Monkey is running and they choose to exit. 35:42 Just a note on this: 35:47 this no longer works on Android N or later, where an app can't open up /proc and iterate through the PIDs. In the case that they are not actually able 35:49 to open up 35:59 the /proc PID directories, instead of exiting, 35:59 they will just skip over that check and run. 36:04
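The /proc walk described above might look roughly like this. It is a sketch under assumptions: the package string `com.android.commands.monkey` is the usual name of the Monkey tool's package and is assumed here rather than quoted from the sample, and the `proc_root` parameter exists only so the logic can be exercised against a fake directory tree.

```python
import os
import tempfile

MONKEY_PACKAGE = b"com.android.commands.monkey"  # assumed package name
MAX_READ = 0x7F                                  # read at most 0x7F bytes


def _read_prefix(path: str) -> bytes:
    """Read up to MAX_READ bytes, treating unreadable files as empty."""
    try:
        with open(path, "rb") as handle:
            return handle.read(MAX_READ)
    except OSError:
        return b""


def monkey_is_running(proc_root: str = "/proc") -> bool:
    try:
        entries = list(os.scandir(proc_root))
    except OSError:
        # Cannot open /proc at all: the check is skipped, not failed.
        return False
    for entry in entries:
        # Only directories whose name is an integer are process entries.
        if not entry.name.isdigit() or not entry.is_dir():
            continue
        comm = _read_prefix(os.path.join(entry.path, "comm"))
        cmdline = _read_prefix(os.path.join(entry.path, "cmdline"))
        info = comm if len(comm) > len(cmdline) else cmdline
        if MONKEY_PACKAGE in info:
            return True
    return False


# Exercise the logic against a fake /proc with a single process entry.
with tempfile.TemporaryDirectory() as fake_proc:
    os.makedirs(os.path.join(fake_proc, "4242"))
    with open(os.path.join(fake_proc, "4242", "cmdline"), "wb") as handle:
        handle.write(MONKEY_PACKAGE + b"\x00")
    detected = monkey_is_running(fake_proc)
print(detected)
```

Returning False when the directory cannot be opened models the behavior the speaker points out: on newer Android versions the check quietly does nothing instead of stopping the app.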

It's only if they are able to open it that they exit if they see Monkey. 36:07 The very last runtime environment check 36:16 they do is to determine if the Xposed Framework is running as well. Xposed is a framework that allows you to hook or modify system code on your Android 36:19 device. 36:28 It's used in a lot of different forms for a lot of different reasons, 36:28 but they want to make sure that you haven't hooked their app for analysis. 36:32

So they check if these two files exist in /proc/self/maps, 36:37 meaning they have been mapped into memory, and then if that check passes, 36:41 they also check, using the JNI FindClass method, whether they're able to find either of these two Xposed classes, 36:47 because they want to be really sure Xposed isn't running. 36:55
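The two-stage check can be modeled as below. The transcript does not name the two files or the two classes, so the marker names here are hypothetical placeholders, and `find_class` is a stand-in callable for the JNI FindClass lookup, which cannot be performed from plain Python.

```python
from typing import Callable, Iterable, Set


def mapped_paths(maps_text: str) -> Set[str]:
    """Collect the pathname column from /proc/self/maps style text."""
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) == 6:
            paths.add(fields[5].strip())
    return paths


def hooking_framework_suspected(
    maps_text: str,
    marker_files: Iterable[str],
    find_class: Callable[[str], bool],
    marker_classes: Iterable[str],
) -> bool:
    """Stage 1: look for marker files among mapped paths.
    Stage 2: only if stage 1 found nothing, try loading marker classes."""
    markers = list(marker_files)
    if any(path.endswith(m) for path in mapped_paths(maps_text) for m in markers):
        return True
    return any(find_class(name) for name in marker_classes)


# Hypothetical placeholders, NOT the names checked by the sample.
MARKER_FILES = ["hypothetical_hook.jar"]
MARKER_CLASSES = ["com/example/HypotheticalHookBridge"]

SAMPLE_MAPS = (
    "7f1c000000-7f1c021000 r-xp 00000000 fd:00 1234   /system/lib/libc.so\n"
    "7f1c100000-7f1c110000 r--p 00000000 fd:00 5678   "
    "/system/framework/hypothetical_hook.jar\n"
)

print(hooking_framework_suspected(SAMPLE_MAPS, MARKER_FILES, lambda n: False, MARKER_CLASSES))
```

Checking both the memory maps and class loading gives two independent signals, which matches the speaker's point that the authors wanted to be really sure.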

But if you make it through all of those 45 checks, as well as the in-place decryption and anti-reversing, 37:01 what I found was code that I'd already analyzed before and that other team members had documented as well. 37:08 The sample 37:16 I had been looking at was a new variant of the Chamois family, and this was just another unpacker. 37:16 So I had spent a couple of days, 37:23

all my time, focused on getting through this to unpack the packed unpacker. 37:25 But what was really interesting about this, and what I had been thinking about a lot, is, one: 37:33 even though they are a large money-making 37:38 fraud botnet, and their goal has always been market share because that's how you make money, they'd evolved to a point where they were willing to miss 37:43 out on potential targets 37:52

if that means not being detected. 37:52 In addition, they were very intelligent in how they layered their anti-analysis techniques, because they targeted, one: 37:57 I'm going to frustrate the human analyst through the encryption and the anti-reverse engineering. 38:03 Then they're also going to try to prevent static analysis tools from running over it to find strings or get an understanding of what was in the ELF. 38:11

And lastly, they are also using 38:24 techniques to detect if they're being dynamically analyzed. So they're packaging all of these together to try to target each of the different types of analysis that we all, as defenders 38:28 or attackers, try to bring to the table. 38:40

So what I hope to have provided, given that 38:44 you stayed until 6:00 PM on the last day of Black Hat, is some ideas of what the current state of the art is in terms of Android anti-analysis 38:51 techniques, how you could possibly get through them faster, and things that you could look for to know that they're being used, so you don't spend the same amount of time 39:00 I did. And also how you can write decryption solutions with the goal of being fast 39:10

and sort of agnostic to the exact sample you're looking at. 39:15 And with that, 39:20 thank you. Are there any questions? 39:20
