About the talk
This video is part of the R/Medicine 2020 Virtual Conference.
Alright. Hey, Michael, not sure what's going on with your sound right now. I've been muted. Okay, much better. Thank you. Alright, welcome everyone to the first session for today. We're going to start the session with a 10-minute talk by Steve Master called "Reproducible data analysis: a model for clinical laboratory data use." Steve, you're on now.

Okay, thanks. What I want to do with this brief talk is to discuss, for a minute, something related to reproducible data and reproducible
analyses, which I think is of great interest, but I want to talk about it in a slightly different way: in relation to the clinical lab. Laboratory medicine is really, in many ways, an ideal venue for R. The information that comes out of the clinical lab is already datafied. It is a largely quantitative data source: the lab at a large academic medical center produces on the order of 15 to 20 million results per year. Those results are then stored in a dedicated lab information system. That information in the laboratory
information system then feeds into the larger electronic medical record, outcomes research, and things like that. Of course, laboratory data are a significant contributor to patient management. These data are attractive not only because they make their way into the larger EMR or into, let's say, a healthcare enterprise data warehouse; they are also very attractive data for modeling for others in the data analytics community, and we can imagine a large set of use cases. This overlaps somewhat with what you'll hear about later today: Dan Holmes and
Patrick will both be giving talks that relate to some of this. Of course, you can imagine using data from the lab in operational management: tasks such as turnaround time, quality control, and quality management reports. And of course, reproducible workflows in those kinds of environments reduce FTEs and help to improve result quality. There's a disturbing use of Excel that seems to be hanging on in those contexts and that needs to be supplanted by R. Additionally, of course, there's the issue of predictive analytics, which is of great interest. Several papers, for
example, came out last year using lab data in part to predict acute kidney injury. Just this past week in the lab medicine literature, there was a classifier based on gradient boosting from Weill Cornell Medicine that looked at using routine lab data to predict COVID-19. Predictive analytics is an important part of working with laboratory data. And when we think about reproducible workflows in this context, I think we typically think about what's happening on the right here, in the large light blue box, whether we're looking at turnaround time and similar
descriptive statistics, whether we're talking about moving averages to look at quality control for assay drift, or whether we're talking about machine learning diagnostics for various kinds of classification. In all cases, we sort of have an intuition for what, in the R world, will give us reproducibility. We want to audit and version-control our scripts, we want reproducible reporting using R Markdown, et cetera. And so this is the world in which we typically live and think about reproducibility. What I want to highlight here, just very briefly, is that if we only concentrate on that
part of reproducibility that we typically think of in the R/Medicine community and ignore the reproducibility of the raw data, we will do ourselves a disservice in the kinds of models we actually produce. What do I mean by that? Well, the problem is that lab data may not always be as reproducible as they appear. They appear to be the ideal data source for the kinds of analytics that we'd like to do: they're quantitative, they're already datafied. And, of course, one would assume that taking the same measurement with two different machines would give the same result. In
fact, that's not always the case. And what I'm going to argue, very briefly, is that a multidisciplinary team, like Robert Gentleman talked about yesterday, will be required to build truly reproducible models in healthcare. This is going to require the domain expertise of both laboratorians and data analytics groups. Let me just give one example to give you an idea of what I'm talking about when I say this. I'll use the example of standardization and harmonization. Typically, I think there's an assumption about what it all looks like. The next few slides are actual data from a crossover
study that we did several years ago, moving from a Beckman set of instruments to a Siemens set of instruments. And of course, I think the world typically assumes that everything works like the BUN here, where the slope is one, the intercept is very close to zero, and everything matches up very nicely with the unity line. The reality is that for many assays, however, this is not the case. Here is insulin on the same two instruments. And now, if you look at the dotted unity line, you see that, in fact, the actual relationship between the instruments has a slope that is far above one; the slope is
on the order of 1.7. And again, this is true for a number of different assays, and it's up to the person building the model to know which assays are well standardized or harmonized and which are not. Even if you were to go beyond insulin here and say, well, maybe the Siemens just runs a little higher, that's not necessarily the case. Here are two tumor markers, CA 15-3 and CA 125. You can see that there's a positive bias in one case and a negative bias, with respect to these two instruments, in the other case. And although you're seeing a lot of scatter in the points that are very high, you'll see
that the same trends continue even down in the low end of the range, suggesting that, again, if one were to build a model that didn't account for instrument type or assay type, then in fact one would not have reproducibility in the way one wants it. And why is this such a hard problem? You'd think it would have been solved years ago, but of course, depending on the assay, this can be very tricky to solve. Often there are no reference materials. If there are reference materials, those reference materials may not really mimic a patient sample. If they do mimic a patient sample,
there could be variability in what's being measured. There may be post-translational modifications that are endogenous and that affect the ability of one assay versus another to measure a given analyte. There are many different methods, of course, there are different manufacturers, and sometimes there is no gold standard. Lack of harmonization is something that's a known issue within lab medicine. Again, we know which assays are better and which assays are worse, and I would argue that this kind of information needs to work its way into the models we develop. And why would this be an issue? Well, certainly one can
imagine a classifier here. I'm just trying to separate green from red in two dimensions, and, of course, it separates them very nicely. If I start to think about the effects of lack of harmonization, or a shift, a bias with one assay versus another, very quickly what looks like a perfect classifier can start to call false positives. In fact, there can be confounding with the outcome if the prevalence differs. Let me show you one other quick example of this, just using data published several years ago
and just a subset of the data. This was using hematology analyzer information, in R, to predict myelodysplastic syndrome. If I just take the top six variables from this model and build a simple logistic regression, I get an AUC of 0.84. If I now take a single analyte, PDW in this case, and create a non-harmonized version of that (and really all I'm doing is applying the same difference that you saw in the insulin that I showed you earlier), and now ask what that does to our ability to do classification, now looking at the test set,
if I were to train on one analyzer and then test using a mix of the virtual analyzers, what happens? Well, of course, I'm now losing discriminatory ability with respect to the original assay. We can discuss the details, but I think nonetheless you get the point: when we start to talk about harmonization issues, we can start to lose power in our models. I give you this as an example to say that, although laboratory medicine remains an ideal venue for R-based reproducible data analytics, that reproducibility needs to begin prior to the analytics. This cannot be done in a sort of sandbox of data that is taken in a sort of anonymous way and just treated as if it falls from the sky. We need to understand, using domain expertise, what the limitations of those data sets are. And I would like to highlight the role of laboratory medicine professional societies such as AACC, who is sponsoring the conference, in actively promoting data analytics literacy, certainly, and coding, certainly, within the community of laboratorians, but also to highlight that there's a potential for partnership between laboratorians and data scientists. Although I definitely fall on the "physicians should code" side
of the debate, I would also say that it is certainly the case that data scientists will be needed. Laboratorians can't do it all themselves; neither group will be able to do it alone. And to just finish that off, I'd like to highlight the fact that there are venues, certainly within the laboratory medicine community. The AACC meeting is happening in December. It will be virtual, December 13th through 17th, and you can go to the link there. What I'm showing you here are a number of the data analytics and R-based things that are being held there. But I'd also like to make the
pitch that if you work in data analytics and are working with laboratory data, it may be useful to integrate yourself into this kind of community to learn more about the pitfalls of the data source, to allow us to achieve reproducibility. And I'll end with that.

All right, thanks, Steve. To try to stay on time, we're going to move on, but if you want to contact Steve with questions, you're more than welcome to do that. Thanks.

Thanks.
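
As an illustration of the harmonization point made in the talk, here is a minimal Python sketch. It is not the speaker's analysis, data, or model. The only number taken from the talk is the proportional bias of about 1.7 between two instruments. Everything else is an assumption made for illustration: the two simulated analytes, their means and spreads, the fixed linear score standing in for a model trained on a single analyzer, and the 50/50 analyzer mix. It compares the AUC when every sample is measured on one analyzer, when half the samples come from a second, non-harmonized analyzer, and when the second analyzer's values are rescaled before scoring.

```python
import random

SLOPE_B = 1.7  # proportional bias of analyzer B vs. A (order of magnitude from the talk)


def auc(scores, labels):
    """Probability that a random case scores higher than a random control."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def simulate(n_per_class=500, seed=0):
    """Hypothetical data: two analytes, cases shifted upward, random analyzer."""
    rng = random.Random(seed)
    rows = []  # (x1, x2, label, analyzer), values as analyzer A would report them
    for label, mu1, mu2 in ((0, 10.0, 5.0), (1, 13.0, 6.0)):
        for _ in range(n_per_class):
            analyzer = rng.choice("AB")
            rows.append((rng.gauss(mu1, 2.0), rng.gauss(mu2, 1.0), label, analyzer))
    return rows


def score(x1, x2):
    # Assumed fixed linear score, standing in for a model trained on analyzer A only.
    return x1 + x2


rows = simulate()
labels = [label for _, _, label, _ in rows]

# Scenario 1: every sample is measured on analyzer A (harmonized data).
auc_harmonized = auc([score(x1, x2) for x1, x2, _, _ in rows], labels)

# Scenario 2: samples run on analyzer B report x1 inflated by SLOPE_B,
# and the model is applied without knowing which analyzer was used.
observed = [(x1 * SLOPE_B if a == "B" else x1, x2, a) for x1, x2, _, a in rows]
auc_mixed = auc([score(x1, x2) for x1, x2, _ in observed], labels)

# Scenario 3: the analyzer is recorded, so B values are rescaled before scoring.
auc_recalibrated = auc(
    [score(x1 / SLOPE_B if a == "B" else x1, x2) for x1, x2, a in observed],
    labels,
)

print(f"AUC, single analyzer:        {auc_harmonized:.3f}")
print(f"AUC, mixed, not harmonized:  {auc_mixed:.3f}")
print(f"AUC, mixed, recalibrated:    {auc_recalibrated:.3f}")
```

In this simulation the mixed, non-harmonized test set gives a lower AUC than the single-analyzer set, and rescaling by the known slope recovers it. Real assays need not differ by a pure proportional bias, so a simple rescaling like this is only a stand-in for the instrument-aware modeling the talk argues for.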