The complexity of NDK crash reporting

About the talk

Reliably capturing NDK crashes is no easy task. What does the device do, and is it accurate? Do you really need a dedicated SDK to get the best native experience? In this talk, we explore the inherent difficulty of capturing and symbolicating native crashes and go through the evolution of the Crashlytics NDK SDK.

About speaker

Konstantin Mandrika
Engineer at Google

Konstantin is an engineer at Google working on the Fabric and Firebase app development platforms. Currently, he's focused on building crash analysis and classification tools to help app teams improve their stability. Before joining the mobile industry, Konstantin worked on database storage engines, real-time data processing, and warehousing systems. In his free time, you'll probably find him fishing, biking, or philosophizing about manual memory management.


Hey folks, my name is Konstantin. I've been working on Crashlytics for the past seven years or so, and today I'd like to tell you about the complexity of NDK crash reporting. In particular, we'll take a look at what the device does and then walk through the three major iterations of the Crashlytics NDK SDK.

Let's begin by understanding what the device does. When you build your application, libc will install a signal handler. Its purpose is to help you debug your application by providing some useful context, like a stack trace. Here we have an app whose address space is somewhere between zero and a bunch of Fs. And wouldn't you know it, this app has a bug. It's a very common bug: it's trying to write a value into memory address zero, which happens to fall outside of the app's address space. Turns out that the operating system might not be so cool with this, and it tells the app by sending it a POSIX signal. In this case it's SIGSEGV. The consequence of this is that the app's signal handler is invoked.

Depending on what version of Android the app is running on, the libc signal handler will request a tombstone from either debuggerd or crash_dump. These are two debugging processes available on Android. One of them will attach to the app in order to ptrace it and unwind the stacks. Unwinding is a fairly complex process. Luckily, libraries exist to help: on older Android it's libcorkscrew, and on newer Android it's libunwind.

This is a much different process than Java crash handling. An uncaught exception in Java does not crash the JVM, and the JVM keeps track of all the stacks for us. This is neat because inside the uncaught exception handler we can simply query for all of the stacks. In the native world, the process is in a crashed state and nothing keeps track of the stacks for us. One must manually scan the stack memory and use heuristics such as frame pointer walking to try to figure out where the frames are. Once the stack is constructed, one must then figure out what functions and symbols each address corresponds to, a process known as symbolication.

So once the debugging process has done its thing, you should see something like this in logcat. This output shows the stack of the crashed thread and a path to the full tombstone. If your app is released in the Play Store, the corresponding crash report should look very, very similar. Note that the system symbols are present. Just by having the address, one can use the dladdr function to get the symbol name. The on-device unwinder, though, reads the debug information that ships with the system libraries on Android. This debug information is known as DWARF.

As a small aside, when we talk about stripped libraries, we're usually referring to a version of the library that does not have the DWARF data. By default, the Android build process will automatically strip all binaries before packaging them into the APK. So the major difference between a stripped and an unstripped library is the presence of DWARF data. Notice that the GNU build ID, which is a unique hash that identifies a particular build of the library, is the same in both versions. This is a critical trait in linking information to a particular build of a binary. Crashlytics, for example, uses this ID to associate symbol files with their respective binaries. Okay, back to our tombstone.
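
To make the last two ideas concrete, here is a minimal Python sketch of build-ID-keyed symbolication: symbol tables are stored per build ID, and an address is resolved by finding the symbol whose range covers it. This is an illustration only, not how Crashlytics or dladdr is implemented. The class names (`Symbol`, `SymbolFile`, `SymbolStore`), the example build ID, and the function names in the usage are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    start: int  # offset of the function's first byte within the library
    size: int   # function length in bytes
    name: str


class SymbolFile:
    """Symbols for one specific build of a library (hypothetical format)."""

    def __init__(self, symbols):
        # Keep symbols sorted by start offset so lookups can use binary search.
        self._symbols = sorted(symbols, key=lambda s: s.start)
        self._starts = [s.start for s in self._symbols]

    def lookup(self, offset):
        """Return the name of the symbol covering `offset`, or None."""
        i = bisect_right(self._starts, offset) - 1
        if i < 0:
            return None  # offset lies before the first known symbol
        sym = self._symbols[i]
        if offset < sym.start + sym.size:
            return sym.name
        return None  # offset falls in a gap between symbols


class SymbolStore:
    """Maps a build ID to the symbol file uploaded for that exact build."""

    def __init__(self):
        self._by_build_id = {}

    def upload(self, build_id, symbol_file):
        self._by_build_id[build_id.lower()] = symbol_file

    def symbolicate(self, build_id, offset):
        symbol_file = self._by_build_id.get(build_id.lower())
        if symbol_file is None:
            return None  # no symbols were uploaded for this build
        return symbol_file.lookup(offset)


# Hypothetical usage: the build ID and function names are made up.
store = SymbolStore()
store.upload("ABCD1234", SymbolFile([
    Symbol(0x1000, 0x40, "init_renderer"),
    Symbol(0x1040, 0x20, "draw_frame"),
]))
print(store.symbolicate("abcd1234", 0x1048))
```

Because the stripped and unstripped builds share the same build ID, a frame captured from the stripped library on a device can be matched to symbols extracted from the unstripped library at build time.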

Although it's useful, let's take a look at what you really get. Well, you get very accurate system frames. Why? Because system libraries generally do have some DWARF data. It's important to note, though, that there's no guarantee that system libraries contain any debug info, or what type of debug info they contain. There are devices out in the wild that have no debug info in their system libraries. By convention, though, there should at least be something called call frame info. The on-device crash dump process can use this call frame info to fine-tune its unwinding heuristics to better calculate the correct frames.

What else do you get? Unfortunately, the frames within your application may not be so accurate. Why? Well, the binaries in your application are stripped of any debug info, causing the on-device crash dump process to default to a simpler set of heuristics when unwinding. Without call frame information, libunwind effectively guesses where the frames are in very ambiguous cases. The symbolication quality is not optimal either. Again, symbolication is the process of figuring out the symbol or function name that corresponds to a particular address. The on-device crash dump process only has access to the symbols that are present in the application's shared object symbol table. So this eliminates a lot of hidden-visibility functions and anything that's been inlined. In general, the more optimized the shared object is, the higher the chance of inaccuracy if debug information is absent. Lastly, there's also a lack of file and line number information, requiring the use of additional tools like addr2line. At the end of the day, however, a tombstone is much better than nothing.

In the very first implementation of the Crashlytics NDK SDK, we tried to replicate exactly what the device does. We installed our own signal handler, and at crash time it unwound the stacks, generated a report, and sent that report to our backend for processing and analysis. Back in our familiar case, after the signal has been delivered to the application, our signal handler kicked in and attached to the process for the purpose of ptracing it. It then used libunwind or libcorkscrew to unwind the stacks. Since unwinding was happening at crash time, we took advantage of dladdr in order to symbolicate the unwound stacks. A nice consequence of this is symbolicated system symbols.

This mechanism is very similar to what the device does, but what do you really get? Well, you get the same accuracy as the on-device crash dump process, including system symbols. This is almost expected because the mechanisms are very, very similar. However, you get low reliability. Why is that? Well, as it turns out, the POSIX standard says that you can't really do anything useful inside a signal handler. You can't allocate memory. You can't acquire mutexes. All of your functions need to be reentrant and async-signal-safe, and you can't even access errno or any global state. We went to great lengths to ensure that our SDK was in line with most of the POSIX requirements. However, libunwind itself was not written to be reentrant or async-signal-safe. Ultimately, our first attempt resulted in an unreliable experience for customers.

Another major downside of this approach is the various security challenges. Executing code inside of a crashed context can easily hit undefined or unspecified behavior that can lead to all sorts of exploits. The rule of thumb is to minimize the amount you do inside a signal handler, and we were doing way too much. Even if we were successful in rewriting libunwind in a reentrant and async-signal-safe manner, we'd still be doing too much in a signal handler.

So the Crashlytics team decided to explore a different approach. We took advantage of Breakpad, an open source crash capture project. Back to our familiar scenario: the application has just received word from the operating system about accessing protected memory. This time, however, it's the Breakpad client that handles the signal. Instead of trying to unwind the stack at crash time, Breakpad's goal is just to capture the state of stack memory; unwinding can be done outside of the crashed context. The state of all threads is written into the minidump format. This modus operandi greatly reduces the amount of work being done inside the signal handler. However, the minidump must now be processed, unwound, and symbolicated. Whereas before dladdr provided some way to get on-device symbols, the minidump must necessarily be paired with a symbol file in order to facilitate symbolication.

Okay, this seems better, but what do you really get? Well, unfortunately, there's potential for inaccuracy in system frames and a complete lack of system symbols. This is unlike the on-device report and the first version of Crashlytics. Since unwinding takes place on the backend, there's no way to access any of the debug information that is available for system libraries on device. Capturing that information at crash time or before crash time would be way too costly. It's also impractical for the developer to upload this information via symbol files: they'd have to do it for all versions of Android for all vendors. This is a huge amount of work.

Contrary to the on-device crash report and the first version of Crashlytics, you do get very accurate application frames when the backend minidump processing is paired with the Breakpad symbol file. The Breakpad symbol file contains very robust symbol information and also contains call frame information, which helps in figuring out where the frames are. You also get increased reliability because less is done in the signal handler. Of course, this comes at the expense of a very complex backend, but that's for us to worry about.

Although Breakpad reduced the amount of work being done in the signal handler, it was still doing way too much. The Crashlytics team decided to experiment with Crashpad. Similar to Breakpad, Crashpad is an open source crash reporting project that shares much of the Breakpad tooling. The principal difference, though, is the crash capture client. Crashpad's crash capture mechanics are fundamentally different from those of Breakpad.
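
The split described above, capture stack memory at crash time and unwind later outside the crashed context, can be sketched in Python. This is a simplified illustration using plain frame pointer walking (the heuristic mentioned earlier), not Breakpad's actual processor, which relies on call frame information from the symbol file. It assumes a 64-bit little-endian target where each frame record is laid out as saved frame pointer followed by return address; the function name and all addresses are hypothetical.

```python
import struct

WORD = 8  # assumption: 64-bit little-endian target


def walk_frame_pointers(stack_bytes, stack_base, pc, fp, max_frames=64):
    """Recover return addresses from a captured copy of stack memory.

    stack_bytes: raw bytes copied from the crashed thread's stack
    stack_base:  address that stack_bytes[0] had in the crashed process
    pc, fp:      program counter and frame pointer registers at crash time
    """
    frames = [pc]  # frame 0 is the crashing instruction itself
    stack_end = stack_base + len(stack_bytes)
    while len(frames) < max_frames:
        # A frame record is two words and must lie fully inside the capture.
        if fp % WORD != 0 or fp < stack_base or fp + 2 * WORD > stack_end:
            break
        saved_fp, return_address = struct.unpack_from(
            "<QQ", stack_bytes, fp - stack_base
        )
        if return_address == 0:
            break
        frames.append(return_address)
        # The stack grows down, so a caller's record sits at a higher address.
        # Anything else means the chain is finished or corrupted.
        if saved_fp <= fp:
            break
        fp = saved_fp
    return frames


# Hypothetical capture: 64 bytes of stack starting at address 0x7000.
stack = bytearray(64)
struct.pack_into("<QQ", stack, 0x10, 0x7030, 0x4010)  # record at 0x7010
struct.pack_into("<QQ", stack, 0x30, 0, 0x4020)       # outermost record
print([hex(a) for a in walk_frame_pointers(bytes(stack), 0x7000, 0x4000, 0x7010)])
```

Nothing in this walk needs to run inside the signal handler: the handler's only job is to copy the registers and stack bytes somewhere safe. The sanity checks also show why this kind of heuristic is fragile when code is built without frame pointers, which is where call frame information becomes valuable.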

Once again, we're back in our app that's being scolded by the operating system. In this case, a very lightweight signal handler launches a brand new, healthy process called the trampoline. This is done via the on-device linker or the app_process process, depending on which version of Android the app is running on. The trampoline then loads a shared object that contains the Crashpad client. Similar to Breakpad, the Crashpad client captures the necessary memory and writes it out to the minidump.

Okay, so this is tricky and very similar to Breakpad. But what do you really get? Well, since debug information is still missing from system libraries, there's some potential for inaccuracy within system frames, and system symbols are just completely missing. Just like with Breakpad, when paired with the Breakpad symbol file, the app frames are very accurate. Again, this is due to the robust nature of the Breakpad symbol file and the presence of call frame information.

What else do you get? Well, Crashpad is much more reliable than Breakpad. It achieves this by doing much less in the signal handler and most of its work in a fresh, brand new, healthy process. One of the more ambitious design goals for Crashpad was to reduce the amount of work done in the signal handler to just a single system call. Right now it's at three or four, but it's getting there. Due to the mechanics of Crashpad, it's also able to capture more categories of crashes. Breakpad's in-process nature creates problems for some categories of crashes on Android 10+ devices. Ultimately, with Crashpad, you get a more reliable crash capture client that prioritizes application frame accuracy.

Hopefully you all have a deeper understanding of NDK crash handling, and a better idea of what the device does, what Crashlytics does, and what the strengths and shortcomings of each are in solving this tough problem. I'd like to leave you with my top four takeaways. The first one is that the device crash report may have app frame inaccuracy stemming from the lack of app debug info on the device. We address this by collecting that information for all of your app's binaries via our symbol upload process. The second is that the more you do in the signal handler, the less reliable it'll be. The third is that capturing system symbols is a hard task. And the last is that Crashlytics has found Crashpad to be a more reliable and accurate NDK crash capture client. This is what's powering the latest version of the Crashlytics NDK SDK. I urge you to give it a try and let us know about your experience.

As far as what's next for Crashlytics, well, there are a few things that are top of mind for us. The first one, predictably, is system symbols. We're exploring ways to get this working. In tricky crashes, system symbols come in very handy. The second is easier symbol upload, especially for projects where the native code is built outside of the main application. And finally, root cause analysis. A lot of issues stem from memory corruption. We'd like to make it easier to identify the root cause of that corruption. Not only will we help you capture the crash, but we'll try to help you fix it. Again, I encourage you all to give the Crashlytics NDK SDK a shot, and let us know what you think. Thanks for listening.

Get access to all videos “Google for Games Developer Summit 2021”