Duration 53:03
16+

Caroline Uhler, Multi Domain Data Integration From Observations to Mechanistic Insights

Caroline Uhler
Associate Professor at ETH Zurich
BioC2020
31 July 2020, Online, USA

About the talk

Keynote: Multi-Domain Data Integration: From Observations to Mechanistic Insights

Caroline Uhler, PhD (ETH Zurich)

9:00 AM - 9:55 AM EDT on Friday, 31 July

TALK

Massive data collection holds the promise of a better understanding of complex phenomena and ultimately, of better decisions. An exciting opportunity in this regard stems from the growing availability of perturbation / intervention data (drugs, knockouts, overexpression, etc.) in biology. In order to obtain mechanistic insights from such data, a major challenge is the integration of different data modalities (transcriptomic, proteomic, structural, etc.). I will first discuss our recent work on coupling autoencoders to integrate and translate between data of very different modalities such as sequencing and imaging. I will then present a framework for integrating observational and interventional data for causal structure discovery and characterize the causal relationships that are identifiable from such data. We end by demonstrating how these ideas can be applied for drug repurposing in the current SARS-CoV-2 crisis.

Moderators: Levi Waldron, Charlotte Soneson, Erica Feick

About the speaker

Caroline Uhler
Associate Professor at ETH Zurich

Caroline Uhler is an Associate Professor at MIT. After completing a master’s degree in mathematics and a bachelor’s degree in biology at the University of Zurich, Prof. Uhler received a PhD in statistics from UC Berkeley in 2011. After postdoctoral appointments at the Institute for Mathematics and its Applications in Minneapolis and at ETH Zurich, Prof. Uhler joined IST Austria in 2012. In 2013 she participated in the semester program on Big Data at the Simons Institute at UC Berkeley.


like to welcome everyone, and especially to welcome our invited guest, Caroline Uhler, who was formerly an associate professor at MIT and just recently came back to Europe to join ETH Zurich as a professor of machine learning, statistics, and genomics. So we're very happy to have you with us here today to talk about the integration of observations into mechanistic insights. Thank you very much, Levi, for this introduction and for inviting me to speak. So what I thought I'll do is start with the kind of motivation

for the kinds of methods we've been developing in terms of multi-domain data integration, and then go from there. Something that has really excited me over the last, well, maybe five years or so is this question: how is it that we all have the same 1D information, the same genome, inside each one of our cells, but we have this huge variety of different cell types in different states, et cetera, that make up our body? And what I think is really exciting is that nowadays, with all the single-cell technologies, we can actually

look at these single cells inside the tissue and outside of the tissue. We can look at them in terms of imaging, in terms of expression, and with all kinds of different data modalities, and not only look at them in terms of observation and get observational data, but we can also perform interventions, say a knockout intervention or a drug intervention, et cetera, and see how the cell changes after a particular intervention. And so I think one of the big challenges when you think about this kind of data is that many of these measurements are highly

destructive to the cell, right? So, for example, you still cannot get, say, DAPI staining and RNA-seq in the same cell. Also, once you collect a cell's RNA-seq profile, then of course you cannot also look at what the cell would look like after a particular intervention. So, for example, if I do a knockout experiment, I have to decide whether I look at a particular cell before the intervention or after the intervention, but I cannot do both in the same cell, right? So you have this very big kind of data integration problem,

where you have a population of cells, and from this population I can decide: I can take some out for imaging and others out for, say, sequencing, but I still cannot do both in the same cell, right? So I need to somehow integrate these different views of cells in order to get a full picture of what these cells are. It's the same thing when you have interventional data, right? I have a population of cells; I take some out for sequencing before the intervention and some out for sequencing after. And the problem where,

you know, the kind of question I would really like to answer is: were I able to also sequence the cell before the intervention, what would this particular cell have looked like? I think the same question also arises if you think about a process that evolves over time. Again, because these measurements are highly destructive, you never get to see standard time-series measurements. You don't usually get to see a cell and how it changes over time, because once I take the image or sequence it, I'm done with this cell, right? And I have to look at a different cell at a

different time point. So I guess that's the same question where, from a population of cells at different time points, where I sequence or image different cells, I would really like to be able to answer the question: what would the cell have looked like, were I able to image it earlier? So I think all of these questions, because these measurements are so destructive, are always in essence data integration and translation questions. Those are the kinds of questions we've been working on quite a bit over the last years. And so what I thought I'll do

is first start with the kinds of questions where I'm just looking at one data modality. In this one data modality, in this case single-cell RNA-seq, I have observational and interventional data. Meaning, I have large-scale single-cell RNA-seq data, but I also have matching knockout data for a number of genes. And what I would really like to be able to do is, from this data, predict the effect of an unseen intervention. I do already have a lot of interventions, but you will not be able to knock

out any combination of genes, right? You just have a combinatorial explosion. But what you would still like to be able to do from such data is answer the question: what would happen were I able to also knock out gene 10, 11, 12? And I might actually be able to try to predict this kind of intervention effect. When you think about this question of predicting the effect of an intervention, that's of course a causal question, and so we've developed quite a few causal methods, that is, methods for actually learning from observational and

interventional data. And so that's what I wanted to just give you a brief overview of in the first part, before I go into how to integrate different data modalities. Okay, so as I said, I want to predict the effect of an intervention, a new intervention. So think of the problem like this: I have some data where in some cells I have observational data, and I have data where I did different kinds of interventions, where, for example, I knocked out different genes. What I would really like to be able to do is predict the effect of knocking

out a different gene. Okay. So that seems like quite a hard problem, and when you talk about interventions you have to take a causal viewpoint, because you care about actions, right? Okay. So a causal graph, a so-called Bayesian network, can be represented by a directed acyclic graph; this goes back to Sewall Wright in the 1920s. Every node, think of it as a gene, is a random variable, and the expression of a particular gene is a function, and it can be a highly nonlinear function, of its parents in this graph. So, in this case, X4 here is a

function of X2, X3, and some noise. Okay, because it is, of course, random; and the noise, again, doesn't have to be Gaussian, and in general, when you look at single-cell data, it's certainly not. Okay, so this is the very standard and old model that I'm going to use, but I really want to use it in all its generality, meaning nonlinear functions here and any kind of noise that you want to have. Good. Now, if you want to think about predicting the effect of an intervention, well, let's now set up a framework for

actually thinking about interventions, and again, I come from the genomics perspective. So think of a knockout experiment. That's a very invasive intervention, right? I'm actually going in and setting the value of a gene to zero. If you go in and set the value of a gene to zero, well, then it doesn't matter anymore what its parents do, right? Because it's just set to zero. So I'm actually changing the graph structure and removing all of the incoming edges to the node that I'm intervening on. There's no more effect from anyone upstream of me. So that's a

super invasive intervention. That's known as a hard intervention, where I'm actually changing the graph structure by removing the effect of all of the parents of the node I intervene on. Everything else is called a soft intervention. So think, for example, of a knockdown experiment, where I'm changing the effect of how X1 acts on me, or I am changing the noise. So both kinds of intervention appear in this area: think of a knockout as a very invasive, hard intervention and of a knockdown as a soft intervention. So in genomics

we have interventions that are hard and interventions that are soft. As I said, the question I really want to be able to answer with this framework is how to predict the effect of an unseen intervention. And I want to just give an overview of the kinds of things that we can do using this. So first of all, in terms of experimental design, I think this is an interesting insight, which was maybe not so clear before in genomics: these hard interventions are so much more invasive, right? If you think about how many knockouts you can perform in a

single cell without the cell dying, it's usually not too many, because it's so invasive. However, with knockdowns, you can actually perform quite a few, if you don't perturb the genes too much. It was kind of believed that knockout experiments provide more information than knockdown experiments with respect to the underlying causal structure, the gene regulatory network in the cell, but it turns out that's actually not the case. What we proved is that soft interventions provide just the same amount of causal information as hard interventions,

despite being less invasive. So I think this is an interesting insight from an experimental design perspective. And then we have algorithms for inferring causal networks from a mix of observational and interventional data. They're all available here, and these are all packages where you can really just put in your observational single-cell RNA-seq data and all of your knockout data, knockdown data, et cetera, and out comes the causal graph. How do we verify that whatever we're doing here is right?
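To make the setup so far concrete, here is a minimal sketch of a structural model with a hard (knockout) and a soft (knockdown) intervention. Everything in it is an illustrative assumption: the four-gene graph, the linear mechanisms, the coefficients, and the function names are made up for this example and are not the models or packages referred to in the talk, which explicitly allows nonlinear functions and arbitrary noise.

```python
import random

# Hypothetical toy graph: X1 -> X2, X1 -> X3, X2 -> X4, X3 -> X4.
# Mechanisms are linear for readability only; the noise is deliberately
# non-Gaussian (uniform) to stress that Gaussianity is not assumed.


def sample_cell(rng, knockout=frozenset(), x1_to_x2=1.0):
    """Draw expression values for one cell.

    knockout: genes under a hard intervention. Their value is set to 0 and
              their parents no longer matter (incoming edges are removed).
    x1_to_x2: strength of the edge X1 -> X2. Changing it models a soft
              intervention: the edge stays, only the mechanism changes.
    """
    def noise():
        return rng.uniform(-0.5, 0.5)

    x = {}
    x[1] = 0.0 if 1 in knockout else 2.0 + noise()
    x[2] = 0.0 if 2 in knockout else x1_to_x2 * x[1] + noise()
    x[3] = 0.0 if 3 in knockout else -0.5 * x[1] + noise()
    x[4] = 0.0 if 4 in knockout else 1.5 * x[2] + 1.0 * x[3] + noise()
    return x


def mean_expression(gene, n=20000, seed=0, **intervention):
    """Monte Carlo estimate of the mean expression of one gene."""
    rng = random.Random(seed)
    total = sum(sample_cell(rng, **intervention)[gene] for _ in range(n))
    return total / n


if __name__ == "__main__":
    print("observational     E[X4]:", round(mean_expression(4), 2))
    print("knockout of X2    E[X4]:", round(mean_expression(4, knockout={2}), 2))
    print("knockdown X1->X2  E[X4]:", round(mean_expression(4, x1_to_x2=0.5), 2))
```

With these made-up coefficients, direct calculation gives expected values of 2, -1, and 0.5 for X4 in the three settings, so the estimates should land close to those numbers. If the graph and mechanisms are known, the effect of an intervention that was never performed can be computed in the same way, which is the sense in which a causal graph is used for prediction here.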

So how are we verifying it? Of course, we don't have the true graph structure, right? That's why I always posed the question the way I did to you: by being able to predict the effect of an unseen intervention. And that's exactly how we also draw ROC curves for our models. The causal graph is something that allows us to predict the effect of an unseen intervention, and that's also how we validate or analyze our algorithms

in terms of how well they perform. Okay, so we always define understanding a gene regulatory network as being able to accurately predict the effect of an unseen intervention, and that's exactly how we validate. Also, you can use these algorithms directly for learning differences of causal graphs. That's very often actually more relevant, because you may have data from a disease state and a non-disease state, and really what you care about are the differences between them, or you have data from different cell types and you actually care about the

differences between them. And so it doesn't make sense to learn two very large-scale causal graphs just to take the difference, right? If the difference is small, you should actually be learning the difference directly. So that's what you can also do with these kinds of algorithms as well. And maybe another important thing, where I think there is still very, very little work, is this experimental design question, which I think is a very important one and really requires more work: of course, there are so many possible interventions that you

could perform, right? So which ones are the ones that would give you the most information about the underlying regulatory network? We started looking into this, and this is certainly not the end of the story. Here is just one way of doing it, by taking a Bayesian approach and a batched setting, where you have constraints on costs. Often the constraints are, say: I have a budget for how many cells I can measure, I have a budget for how many interventions I can perform, and I also have a time budget, so that

means maybe I can do four batches of experiments. And now what I would really want to know is, for every batch, what is the optimal set of interventions I should do in order to learn the most about the underlying gene regulatory network? Then you get to see the information after one batch, and you decide again on the next interventions that you should perform. Now, in general, you can prove that this is an NP-hard problem, so you would have to actually enumerate all possibilities, but this problem has very nice structure; in particular, it's submodular.
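Submodularity is a diminishing-returns property, and it is what makes a simple greedy selection rule reasonable. The sketch below only illustrates that idea. The utility `w_i * log(1 + n_i)` is a hypothetical stand-in with diminishing returns, not the Bayesian objective of the work described here, and the function names and weights are invented for the example.

```python
import math


def utility(counts, weights):
    """Toy information score: each extra sample of an intervention helps less."""
    return sum(w * math.log1p(n) for w, n in zip(weights, counts))


def greedy_allocation(weights, budget):
    """Spend a sample budget one unit at a time on the best marginal gain."""
    counts = [0] * len(weights)
    for _ in range(budget):
        # Marginal gain of adding one more sample to each candidate intervention.
        gains = [
            w * (math.log1p(n + 1) - math.log1p(n))
            for w, n in zip(weights, counts)
        ]
        best = max(range(len(weights)), key=gains.__getitem__)
        counts[best] += 1
    return counts


if __name__ == "__main__":
    weights = [3.0, 1.0, 1.0]  # hypothetical informativeness of three interventions
    allocation = greedy_allocation(weights, budget=5)
    print(allocation, round(utility(allocation, weights), 3))
```

The most informative intervention is sampled first, but its marginal gain shrinks until the neglected interventions become the better choice. For this separable toy utility the greedy allocation is in fact optimal; in the general submodular setting one instead gets a guarantee on how far greedy can be from the optimum.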

That means that if you already have a lot of data from one intervention, getting more data from that particular intervention doesn't give you as much back as getting more data from an intervention where you have very little data so far. This diminishing-returns property actually helps you: you can come up with a greedy algorithm where you get guarantees on how far away you are from the optimal strategy, and that's a strategy you can actually compute. And it does well; I mean, obviously not as well as

the strategy where you enumerate all of your different kinds of interventions, but it's not too far off from that optimal strategy. But I think there's still a whole lot to be done. In particular, let me also say something about the limitations of this: what it cannot do is, it only takes into account the setting where you do an intervention on a single node at a time. Now, obviously, nowadays we can knock out or knock down multiple genes at a time, and we would like to have a strategy that can

actually predict which ones of these multiple knockouts you should do. All of the algorithms that we have for learning can use data where you have knockouts on multiple nodes at the same time, so that works in terms of learning. Okay, so this was a bit about how you can integrate observational and interventional data in order to learn the underlying causal graph, and then, based on this causal graph, try to predict the effect of an as yet unseen intervention. So that was all on this side of the graph, on the gene

regulatory side of the problem. And now, as I said, what I really am interested in is this question of how this 1D information gives rise to different cell types, and it has become clearer and clearer that the packing of the DNA and the mechanical state of a cell matter as well, and that's usually measured with images rather than with expression. And so what we really want to be able to do is integrate these different kinds of data modalities: images on one side and gene

expression networks on the other side, or even other modalities like maybe Hi-C data, which tells you about the packing of the DNA. And so let me now switch gears a bit to talk about how to integrate different data modalities, instead of just one modality with observational and interventional data. Okay, so for this we have come to love autoencoders. So what are autoencoders? Autoencoders are just a special kind of neural network, and what is special about them is that they are not the ones you use for classification; an autoencoder goes from

some input space, say you have images of size d by d, to the same output space, so it maps images of size d by d to images of size d by d. Okay, it doesn't give you out a label. If you put in an image, it gives you out another image; whatever you put in, it spits out a structure of the same dimension. What is interesting about it is that it consists of two parts: an encoder part, which is the first part of the neural network, and a second part, which is called the decoder. This space here in the middle is the latent

space representation, and so you can see it as a nonlinear version of dimension reduction, and it is often used for that; we'll talk about that later. But how is it trained? Well, it's trained to just reconstruct the images, or whatever you put in: you have your training examples and you minimize the L2 norm of the reconstruction error on your training examples. So what that encourages is that the latent space keeps all of the information necessary in order to be able to reproduce the images, or

the RNA-seq profiles, themselves. Okay, so that's going to be a representation of the images, or of your data, that keeps the important information necessary in order to reconstruct the training images that you put in. Okay, so of course you can see: if everything is linear and you only have one layer here, then this is PCA. That's a very special case of this, but in general you have multiple layers to go in and multiple layers to go out. Okay, so autoencoders have been used all over the

place, right? In computer vision, et cetera, and also in biology. And what I want to argue is that autoencoders are super interesting if you want to do this data integration problem. Okay. So how can they be used for data integration? Say I have images and RNA-seq and I want to be able to translate between them, right? So really what I would like to be able to do is: you give me an image of a cell, and I would like to be able to generate the corresponding RNA-seq profile. Okay, so how do we do this, and how do we do this with autoencoders? Okay. So

first of all, let's look at the following problem, and this is of course an assumption of this particular method: I have a population of cells, and from the population of cells I randomly pick out some for imaging and randomly pick out some for sequencing, or some for Hi-C data, or some for ChIP-seq, whatever you want, but it's always the same population of cells. Okay, so these different modalities are really different views of the same population of cells. Okay. So what I'm going to do is I'm going to have a latent space

that kind of represents the state of the cells, okay? Before, I was talking about causal graphs, so you can think of this as the wiring pattern of all of these different cells. This is the state of the cell, which I don't get to observe directly, but I get to observe different views of it. This latent space will play an important role. So now the question is, how do I get to a latent space representation of all the cell information that I have from the images and from the RNA-seq profiles? Okay. So

how am I going to do this? I have images, I have RNA-seq, I have ChIP-seq, et cetera, and I'm going to train one autoencoder on each one of the data modalities, each going to a latent space. I'm doing this separately. So for images, it's usually a convolutional network; I'm doing it separately for ChIP-seq data, et cetera, right? For Hi-C data, maybe you can use a

graph neural network. You can use any of the networks that are specialized for that kind of data. Okay. So, as I was explaining, the latent spaces of each one of these autoencoders from these different data modalities have to match, because I have this assumption that the images that I took come from cells from the same population of cells as the RNA-seq profiles. So in the latent space, it has to be the same distribution, no matter whether I come from image land, or I come

from RNA-seq land, or I come from ChIP-seq land. And how you can do this is just how we usually do these things with neural networks: you add a discriminator in the latent space, which punishes me. So this discriminator, if it can tell whether a sample came from image land or from RNA-seq land, well, then I'm going to be punished, because I want the distributions to actually match each other. So you have a loss function that is the reconstruction loss on each one of your data modalities, but you also have this discriminative loss in the latent space

to make sure that whatever distributions I get from each one of them match. Now, the amazing thing about autoencoders is that I can actually go from one modality to another one, right? So an autoencoder is a function that goes from, say, R^d to R^d, right? So I can go to the latent space, and in fact, I can take any point in the latent space and I can get back, in this case, to images. Right? So what I can do is I can take, say, an RNA-seq profile, go to the latent space with its encoder, and then use the decoder of a different autoencoder to go to image space.
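The translation step itself is just a composition: the encoder of the source modality followed by the decoder of the target modality. The sketch below shows only that composition. It is not the trained networks from the talk: the two "autoencoders" are hypothetical fixed linear maps that share a 2-D latent state by construction, so no discriminator or training is involved, and all names and weights are invented for the example.

```python
# Hypothetical decoder weights for two modalities that share a 2-D latent state.
# The columns of each matrix are orthonormal, so the transpose is an exact encoder.
W_A = [[0.6, 0.0],
       [0.8, 0.0],
       [0.0, 1.0]]           # modality A lives in R^3
W_B = [[0.5, 0.5],
       [0.5, -0.5],
       [0.5, 0.5],
       [0.5, -0.5]]          # modality B lives in R^4


def decode(weights, z):
    """Map a latent point z to data space: x = W z."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in weights]


def encode(weights, x):
    """Map a data point x to the latent space: z = W^T x."""
    latent_dim = len(weights[0])
    return [
        sum(weights[i][j] * x[i] for i in range(len(x)))
        for j in range(latent_dim)
    ]


def translate(x, source_weights, target_weights):
    """Encode with the source modality, decode with the target modality."""
    return decode(target_weights, encode(source_weights, x))


if __name__ == "__main__":
    state = [2.0, -1.0]               # unobserved cell state
    view_a = decode(W_A, state)       # what modality A would measure
    print(translate(view_a, W_A, W_B))  # predicted modality-B view of the same cell
```

In this toy setting the translated vector equals the modality-B view of the same latent state exactly. With trained nonlinear autoencoders the same composition is used, and it is the matching of the latent distributions that makes the output meaningful.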

And so this way I can actually translate between RNA-seq profiles and images, or images and ChIP-seq, or whatever data you have from the same population of cells. Okay, so this is what we did down here, going from RNA-seq to ChIP-seq, where there is actually paired data, so that we can validate these kinds of methods. Here is a toy example where you can actually see what is happening, because it's a bit more difficult to look at RNA-seq and ChIP-seq profiles. Say my data modalities are blond-haired women, black-haired males,

et cetera. Right? So what this autoencoder can do is take in, say, a black-haired woman and then ask: what would she look like were she blond? Right? This is a generated image; this is not a real image. But what I did is I took the autoencoder to go from here to latent space, and then I took the other autoencoder's decoder, which was trained on blond-haired women, to go to this blond-haired space. Okay, so exactly this is what you can do with RNA-seq, ChIP-seq, imaging, et cetera. And so I want to show you, and maybe I should say, for RNA-seq and

ChIP-seq there are other kinds of methods. The problem, and the reason they could not be applied to imaging data, is that they need variables that match across the different data modalities. So, for example, you can do things at the gene level, where you have variables matching each other. But of course, if you want to go from images to RNA-seq, there is no variable in an image that corresponds to a gene, right? So that's where one actually needs this kind of

different approach with autoencoders. Let me show you how it works going from images to RNA-seq and how we actually validate it. So the particular problem we looked at was in T cells. This is a collaboration with G.V. Shivashankar's group, who just moved from Singapore to ETH Zurich, and everything I've been doing with imaging is always in collaboration with Shiva's lab. And so they had an earlier paper in 2012 on naive CD4+ T cells, and they observed that there are two different populations of cells: one population of cells that

has more heterochromatin in the center, as you see here, and another population of cells that has it on the outside. And in fact, this was related to a functional state. The ones with the heterochromatin inside were softer T cells; they have a higher transmigration efficiency and accelerated activation as compared to the CD4+ T cells with the heterochromatin on the outside. So we wanted to, for example, find which genes are the ones that are

maybe driving these kinds of differences in terms of activation efficiency, et cetera, or being poised for activation. Okay. So, as I said, the first thing we did was to see whether something like this also holds in humans. So here we looked at single-cell RNA-seq data in human T cells, and we found again two clusters within the naive T cells, here in blue and green. You see them here in PCA space, and besides the blue and green I'm also plotting the activated state, because then

you see that one of these is closer to the activated state. The one that is closer to the activated state we call the naive T cells that are poised for activation, and the other one, in this case the blue one here, which is farther away from the activated state, we call the quiescent one. Okay, so this is the first thing. Of course, we then wanted to look again at images and see whether we really see these two clusters, because this was done

in mice before, and now we did it here in humans. And again, you see very, very clean two different clusters: one that has more heterochromatin inside and one that has more heterochromatin on the outside. Okay, so now what we did is we used our coupled autoencoder framework, going from the images and the RNA-seq data to this latent space embedding, where we have the images and the RNA-seq overlapping, right, where the two distributions are actually the same. And using these two autoencoders you can, for example, take an image, go to the latent space, and predict

for each one of the cells the corresponding RNA-seq profile, or the other way around: I can take an RNA-seq profile, go to the latent space, and predict the corresponding image. Now, how do I validate something like this, when I cannot get the image and the single-cell RNA-seq in the same cell? So the next best thing we were able to do, a first validation in terms of computational validation, is at least to go from images to RNA-seq, right? We have these two clusters of cells, and we just look at the predicted difference in

expression of each gene between the cells that are poised for activation and the quiescent ones, and plot this versus the difference in gene expression that we actually measure. If this was perfect, you would like all of these points to be here on the diagonal, but it's actually pretty good, right? So no matter whether I go from images to RNA-seq profiles and predict what the differences are between these two clusters of cells, or whether I just directly measure the RNA-seq, you're actually not far off, right? Even

at the level of individual genes. Of course, I mean, this is some validation, but we would really like to have a biological validation. Of course, we cannot get the full single-cell RNA-seq profile paired with an image. The next best thing that we thought we could do is this: we looked at genes that are very differentially expressed in the two groups, we looked at all of them and chose the ones that work best, which, of the ones up here, was CORO1A, and of the ones down here was

[unclear], and we also have stainings of these two proteins. And of course, there is a difference between the two different clusters; they do show a difference, and the difference is going in the correct direction. So this is validated for a couple of genes, not for 20,000 plus, but I think it is already pretty nice that you can actually do this, and it actually works in terms of translating between modalities. Okay, so how am I doing in terms of time? Okay, so this is in terms

of moving around between different data modalities. As I said at the beginning, you also have another problem, right, where the destructiveness of all these different measurements is really problematic: when you want to look at a process over time. I am not able to get standard time-series datasets where I just follow a cell over time, because getting the measurement means destroying the cell. So that's a problem of lineage tracing, right? So what I would really like to be able to do is, from

measurements at different time points of the same population of cells, but of course of different cells, I would like to be able to predict: what would the cell have looked like, were I able to look at it earlier? There is this really nice paper coming out of the Broad that looked at this problem for single-cell RNA-seq and used optimal transport in order to move between populations of cells over time, between different distributions. And this could

be different time points; it could also be a different intervention. Now, for optimal transport you need to come up with a cost function for how you move in space. How do you measure a distance between two RNA-seq profiles? Well, you can use something like the L2 distance between them, right? Because you have genes as coordinates. But what about images? In particular, if you look at tissue, as I do, how do you do this for images? Right? With images, I really care about this problem because I would like to be able

to, you know, right now something that is problematic is that somehow I'll never be able to do cancer detection earlier than a pathologist, because I'm training my models on the data that I get from pathologists. Right? So how can I predict what a cell that is now classified as cancerous would have looked like at an earlier time point? Well, I need to be able to make up my own data, right? If I want to be able to do it earlier than a pathologist nowadays, I need to be able to predict, or to generate, my own data. And so that's exactly the

optimal transport problem, right? I have the normal state, and I have, say, the earliest state that the pathologist can detect. And now I want to know, for the cells in this earliest state that the pathologist can detect: what would they have looked like at earlier time points? Okay, so that's exactly what one could try to do using optimal transport. Now, the problem is exactly this: how do I measure distances between cells when I have images instead of RNA-seq? In images, pixel one doesn't mean anything, or has no

relationship with pixel one in another image. So you cannot just take the L2 difference between images, right? They're not in the same coordinate system. So now, autoencoders can actually help you to do that, right? I can embed all of my images into a joint coordinate system, which is just the latent space, and this is exactly what we did here. So we have all of these images, in this case in four different states of cells. We embed them all into this coordinate system of the latent space. Now I have a joint coordinate system, and

now I can actually do optimal transport, right? I can actually get this map of moving, you know, from the metastatic state back to the cancerous state, back to the fibrocystic state, back to the normal state of cells in the breast, and I can compute this transport map here in the latent space. And of course, the autoencoder allows me to get back to the images, right? I can move around in the latent space, but then, of course, I have a decoder, which can go from any point there back to the image space. And so in that way, I can take these; these are the only real

cell images; everything else is generated. Okay, I can take a metastatic cell, I can map it to the latent space, I can use my optimal transport map to move it backwards in time, and then I can use the autoencoder again to actually get the corresponding image, okay? So this way I can answer questions like how the cell would have looked at earlier time points. And of course we have done this here on cell lines, in a co-culture system of fibroblasts and cancer cells, where we can actually do the experiment and validate that things like this work, and then we ported

this onto tissues. So I think autoencoders are super powerful tools in order to actually move between different data modalities, move between different time points, etc., in single-cell biology, okay? So what I talked about here is actually how to move around and how to actually apply this to relevant problems in oncology. And maybe with that, since I still have a couple of minutes, I can also tell you a bit about how all of these things were useful for us when we were thinking about

drug discovery and COVID-19. So, drug discovery and COVID-19: how does it fit into causality and all of these, you know, data integration questions? Well, you know, drug discovery here, given its urgency, is mainly a drug repurposing question, right? I would like to find drugs that are able to reverse the effect of SARS-CoV-2 in cells. Now, there are all these huge drug screens that have been performed. For example, the CMap dataset is available, which

has large sample sizes, right? It has gene expression vectors for thousands of perturbations, including, say, about a thousand FDA-approved drugs, on many different cell types. And now, of course, what you would expect is that a drug can have very different effects on different cell types. And in particular, for example, for COVID-19 you care about particular cells. We'd like to know: what is the effect of these drugs on these particular cells that you care about for this particular disease? And maybe, you know, not all

of these perturbations have been measured on each one of the cell types. So it's a question of large-scale data integration, and also a causal question, because, of course, what I would want to be able to do is: I measure the effect of a drug on one cell type, and I would like to be able to predict what its effect is on a different cell type where I have not yet measured it, right? And then I can correlate that with the reverse effect of the virus, of SARS-CoV-2, and try to prioritize drugs in this way.
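The prioritization step described above (correlate a drug's expression effect with the reverse of the viral effect) can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the function name `rank_drugs_by_reversal`, the use of Pearson correlation, and the toy signatures are all assumptions, and the step that predicts a drug's effect on an unmeasured cell type is left out.

```python
import numpy as np

def rank_drugs_by_reversal(drug_signatures, disease_signature):
    """Rank drugs by how strongly their signature anti-correlates with the disease.

    drug_signatures: dict mapping drug name -> 1-D array of expression changes
                     (drug vs. control), all in the same gene order.
    disease_signature: 1-D array of expression changes (infected vs. uninfected).
    Returns (drug, correlation) pairs sorted from most negative correlation upward,
    so the strongest candidate "reversers" come first.
    """
    scores = {}
    for name, signature in drug_signatures.items():
        # Pearson correlation between the drug effect and the disease effect
        scores[name] = float(np.corrcoef(signature, disease_signature)[0, 1])
    return sorted(scores.items(), key=lambda item: item[1])

# Toy data (hypothetical): one drug reverses the disease signature,
# one mimics it, and one is unrelated.
rng = np.random.default_rng(0)
n_genes = 200
disease = rng.normal(size=n_genes)
drugs = {
    "drug_reversing": -disease + 0.1 * rng.normal(size=n_genes),
    "drug_mimicking": disease + 0.1 * rng.normal(size=n_genes),
    "drug_unrelated": rng.normal(size=n_genes),
}

ranking = rank_drugs_by_reversal(drugs, disease)
for name, corr in ranking:
    print(f"{name}: {corr:+.2f}")
```

In practice the drug signatures would come from a screen such as the one mentioned above, restricted to (or predicted for) the cell type of interest; the ranking itself is only as good as those signatures.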

And again, autoencoders can be used to do this, right? So if you think about style transfer, that's autoencoders and GANs, and what you're usually doing is, like, you know, what I'm trying to do here is add a smile to a person. So I have here a person that goes to the latent space, and this is the same person smiling. I get this vector out, and then I put in a new person. I would like to add a smile to this person. Well, how style transfer works is I take just this vector here in the latent space, I move it over to this

person. Okay, and now I look at what this point in the latent space actually corresponds to in terms of an image, and you see that this person is actually smiling. And of course, I mean, if you come from machine learning, this is probably the first kind of framework that you would like to try to apply in order to predict the effects of drugs, right? Well, and that's exactly what we did. Now, I won't go into this, but I encourage you to look at it. It works very well. It doesn't work well

if you use standard autoencoders; it works very, very well if you use over-parameterized autoencoders. And I think we also understand why. So we know now something about the inductive bias of over-parameterization. This will only work well if the two vectors are actually aligned, right? If I take another cell type where I'm also adding the same drug, then in the latent space this vector again has to be aligned. If they're not aligned, parallel to each other, pointing in the same direction, well, then I also cannot shift it over and hope that for a third cell type I'm going to get the correct effect

of the drug. And it's just that over-parameterization in autoencoders actually forces these different vectors to be better aligned with each other. And that's kind of what you see here. But maybe I don't want to go into too much detail. I just want to say that, you know, in this way we were able to actually come up with a prioritization of drugs, and you see it came out very, very clean, in the sense that we basically only found two different types of drugs, which are the serine/threonine protein kinase inhibitors and the receptor tyrosine kinase inhibitors, and some other drugs are

[inaudible], and we were not expecting it to be so clean, so it makes you very happy that you're able to get this out. And then you can also look at the mechanisms of these drugs. Now, of course, again with single-cell RNA-seq data, with the kinds of methods that we looked at at the beginning of the talk, right, I can now look at, well, you know, I have particular drugs and their targets, and I would like that the drug targets are upstream of the genes differentially expressed by the virus, right, differentially expressed with SARS-CoV-2 or without

SARS-CoV-2. And what is nice is that this RIPK1 is in fact upstream of, you know, most of the genes, whether you do it in A549 cells or in AT2 cells. It's a super interesting protein, and you can look at the paper; it has also been found to directly bind to SARS-CoV-2 proteins, which is, of course, not something that we put in when we started this analysis. And so I think it's a very interesting target that was found through this analysis, and of course, it's about validating this as well. Okay, so I think that this integration of different data modalities, and in

particular causality, also needs to play a key role in drug discovery platforms. So I think there's still quite a lot of work to do. These two papers here describe how we applied this and, you know, the thinking behind it. And with that I want to stop and thank, of course; none of this work would have been possible without an amazing group of PhD students, master's students, undergrads, postdocs, and collaborators. So thank you all very much. Thank you so much, Caroline,

for the time to think about some of these problems. There are a number of questions here that came in on the polls, so I'm going to start with the most upvoted ones. For the imaging data, does it need to be single-cell images, or can multiple cells be present in the same image? Can you have single-cell images only, or do multi-cell images need to be segmented for the training of the autoencoders? Yes, so let me go back to that little picture. So the

thing is that, you know, what we're doing, I mean, okay, so what matters is that you're doing the same thing in all of your data modalities. So if you're taking some aggregates in your RNA-seq, then you can take some aggregates, although this is harder to do, in your images, right? So it just has to be kind of the same thing that you're measuring. And what we do is we segment them, because we're looking at single-cell RNA-seq and images. You know, sometimes maybe you would want to

match multiple images to one particular point in the RNA-seq data or in some other data modality, and there are ways of doing that, but that's harder; one has to think more carefully about it. For now, it's all single-cell, what we've done here, just so that across the population, you know, you're looking at the same thing. I think this is like magic for many of us, so it might be worth asking this again: what is the advantage of reconstructing the missing modalities with the decoder

phase, rather than working directly in the latent space? This is a good question. I think it depends on what you're asking, right? So the latent space, I think it's really nice that it contains all of the information of the two modalities. So if you want to do clustering or, you know, anything like that, I would do it in the latent space, right? Because it contains all of this; it's an aggregate of the different modalities. And I also think of it, and of course I don't really have a

proof of it, as being more causal than anything else outside, because, you know, it's only containing information that is consistent across different ways of viewing the data. So for any of those kinds of questions, I would do it in the latent space, right? But, you know, in terms of interpretability, I think what this decoder really helps you with is to get an interpretation of what the latent space components mean, or what the different directions mean. That's often what we do, right? So in an image, right, say I have two

different clusters, which is what we had for the different T cells, right, and in the latent space I have the two clusters. I can walk in this particular direction, and now I can look at, well, what happens in gene expression land if I actually do this? I can take a very small step in latent space, and I just take the differences between them in RNA-seq space, and I see which of the genes are actually changing when I'm doing this kind of directional walk. That's the same as what we did. And here it was very important

when we looked at these cancer cells, right, where I wanted to walk backwards in time. So here again, it was very important that I can go back to the imaging space, because I can take a little walk in the direction of earlier time points and take the differences of the images, and that gives me biomarkers, right? That gives me biomarkers for the process that I actually care about. So it just helps you in terms of interpretation. Okay. Can you only learn shared latent spaces if you have multiple modalities for the same cell? Because there are some assays that give you multiple

modalities of data for the same tissue, but unmatched cells. Oh, sorry. So all of this was not matched cells. So yes, you can do that in tissues, right? This is important to say. So actually here, since we cannot do DAPI staining and RNA-seq in the same cell, we never measured it in the same cell, okay? So it's basically the same as what you're asking. We have a tissue, and some cells were taken out for imaging and some cells were taken out for sequencing, so we never had any paired measurements. But the distribution should be the same, because we're assuming

that they were randomly sampled, the ones that go for imaging and the ones that go for sequencing, so the distribution in the latent space needs to be the same, right? So that's what we're using. We're only using the paired data that is available, where, you know, the two things are measured in the same cell, for validation, to show that it actually works. So how would it be different if you were using data with multiple modalities on the same cells? Well, usually you don't have it. But whenever we have it, okay, if you have it, you can add it in as an

additional loss function. Because you can say: here, I know that this cell corresponds to this cell. Well, if I map them to the latent space, they should be mapping on top of each other. So whenever I have some of this, you know, data that is already paired, I can just add another term in the loss function that punishes me if they are not close by. So all of this can be used for semi-supervision, and we use all kinds of different things here. And of course, usually you have a whole lot of unpaired data, while you have just a bit that is paired;

there you can add in this additional loss function, which will of course help you to get a better embedding, because you can add in some knowledge that you already have. Also, you can add other things for semi-supervision. For example, if you take two different cell types, right, and you have imaging data and sequencing data, well, you already know that the two different cell types should match to each other. So again, you can add that in to regularize the latent space. Or here, for example, you might know a particular marker that corresponds to

differentiation; here we were looking at embryonic stem cells. So we added that, right: you know, this axis of differentiation should correspond to that axis of differentiation. And so you can say that these two things have to be correlated, in terms of semi-supervision. The next question, I had been wondering this too: what was the sample size for training the autoencoder? And further, do you think using a variational autoencoder would work even better? And I always wondered about approaches that have been successful in

image analysis, applied to biological problems where we have many fewer observations. With sequencing we have much more, right? [inaudible] And so that's a very good question. So usually, when you work with images, why do you use a variational one? It's just so that, you know, when you actually move around in the latent space, it will actually give you a nice image out, right? So you care about this decoding, because you want to have mass everywhere in this probability space, so that what you get out

actually really corresponds to a real-looking image or a real-looking expression pattern, etc. However, you know, this constrains the latent space, right? So that's a problem when you want to align different distributions, because if you make it completely Gaussian, then I can align any distributions of the datasets, right? Then I have no more structure in the latent space. So then, of course, [inaudible]. And so if you use a

variational one, just use a very small penalty. I think you've made a good case for autoencoders, but there are a number of people who are interested in what advantage they have over basic dimension reduction like PCA and so on. Yeah, so we always compare to PCA. I mean, okay, so actually, maybe I can come to this. Let me go to the drugs. PCA is, of course, linear, right? So here I have, okay, so can you see? I can do PCA if I wanted to get this drug alignment, right? So if we just use, for

example, the first two principal components, I get a perfect alignment of these drugs; it's basically all centered at 1, 1 or -1, -1. However, I got this by just getting rid of all the information: if I wanted to do the classification test, were you perturbed or were you not, well, it's basically like random guessing. However, if I use an autoencoder, here in the last one, you know, I still get this alignment property, but I also get, because it's over-parameterized, perfect

reconstruction. So that's the thing. I mean, with autoencoders, first of all, they have the inductive bias that actually wants to align the kinds of features that we would like to align, and we understand why. And the other thing is, of course, they are nonlinear, right? They can actually give an embedding that is not linear. It's not the case that, you know, everything is linear here; moving around here can correspond to something very, very nonlinear in the imaging space.
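The PCA comparison above trades off two quantities: how well the drug-effect vectors of different cell types align in the embedding, and how much of the data the embedding still reconstructs. The sketch below only shows how one might measure both for PCA with few versus all components. The data are synthetic, the names `pca_fit`, `cosine`, and `reconstruction_error` are illustrative, and nothing here reproduces the results shown in the talk or the over-parameterized autoencoder itself.

```python
import numpy as np

def pca_fit(X, k):
    """Return the data mean and the top-k principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def cosine(u, v):
    """Cosine similarity between two vectors (1 = same direction)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def reconstruction_error(X, mean, components):
    """Mean squared error after projecting onto the components and back."""
    Z = (X - mean) @ components.T       # linear "encoder"
    X_hat = Z @ components + mean       # linear "decoder"
    return float(np.mean((X - X_hat) ** 2))

# Synthetic expression data (hypothetical): two cell types, each measured
# without and with the same drug; the drug effect is partly shared.
rng = np.random.default_rng(1)
n, d = 100, 20
shared_effect = rng.normal(size=d)
control_a = rng.normal(size=(n, d))
treated_a = rng.normal(size=(n, d)) + shared_effect + 0.5 * rng.normal(size=d)
control_b = rng.normal(size=(n, d)) + 2.0
treated_b = rng.normal(size=(n, d)) + 2.0 + shared_effect + 0.5 * rng.normal(size=d)
X = np.vstack([control_a, treated_a, control_b, treated_b])

results = {}
for k in (2, d):
    mean, comps = pca_fit(X, k)

    def embed(A, mean=mean, comps=comps):
        return (A - mean) @ comps.T

    # Drug-effect vector per cell type: shift of the group mean in the embedding
    shift_a = embed(treated_a).mean(axis=0) - embed(control_a).mean(axis=0)
    shift_b = embed(treated_b).mean(axis=0) - embed(control_b).mean(axis=0)
    results[k] = (cosine(shift_a, shift_b), reconstruction_error(X, mean, comps))
    print(f"k={k}: alignment={results[k][0]:+.2f}, reconstruction MSE={results[k][1]:.3f}")
```

Keeping all components reconstructs the data exactly but leaves the alignment to whatever the raw data give; keeping few components discards information. The point made in the talk is that an over-parameterized autoencoder can achieve both alignment and reconstruction, which a linear projection cannot be expected to do.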

That's really what you're making use of. It's like, somehow, this autoencoder, what these nonlinearities do is they make structures that are, you know, as we saw, more aligned, more linear, but of course, moving here can mean something very, very complicated outside, which is not something you can get when you do PCA. I mean, you'll never be able to get that, of course. All right, thank you so much, and even though it feels like you're talking to yourself in this format, I don't know how to

[inaudible] watch this again later, and we all learned a lot from you. We appreciate you. And thank you for the introduction and the invitation; it was really fun to do this. Thank you. Thank you.
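The backward-in-time prediction discussed in the talk (embed images in a latent space, transport a later population onto an earlier one there, then decode) can be sketched for the transport step alone. This is an illustration under stated assumptions, not the method used by the speaker's group: it takes latent codes as given (the encoder and decoder are not shown), uses entropy-regularized optimal transport solved with Sinkhorn iterations as one common choice, and the names `sinkhorn_plan` and `transport` as well as the toy point clouds are hypothetical.

```python
import numpy as np

def sinkhorn_plan(source, target, eps=0.05, n_iter=500):
    """Entropy-regularized optimal transport plan between two point clouds.

    Both clouds carry uniform weights. Returns a matrix whose entry (i, j)
    is the mass moved from source point i to target point j.
    """
    # Squared Euclidean cost between every source/target pair, scaled to [0, 1]
    cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()
    K = np.exp(-cost / eps)
    a = np.full(len(source), 1.0 / len(source))
    b = np.full(len(target), 1.0 / len(target))
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def transport(target, plan):
    """Barycentric projection: map each source point to the plan-weighted
    average of the target points it sends mass to."""
    return (plan @ target) / plan.sum(axis=1, keepdims=True)

# Toy latent codes (hypothetical): a "late" population and an "early" one
rng = np.random.default_rng(2)
late = rng.normal(size=(60, 8)) + 3.0
early = rng.normal(size=(50, 8))

plan = sinkhorn_plan(late, early)
moved = transport(early, plan)   # where each late cell lands among early cells

print("distance between population means before:",
      np.linalg.norm(late.mean(axis=0) - early.mean(axis=0)))
print("distance between population means after: ",
      np.linalg.norm(moved.mean(axis=0) - early.mean(axis=0)))
```

In the setting of the talk, each row of `moved` would then be passed through the decoder to generate the image of how that cell might have looked at the earlier stage.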



