Duration: 50:37

Aedin Culhane, Workshop 200: An introduction to matrix factorization & principal component analysis

Aedin Culhane
Senior Research Scientist at Harvard T.H. Chan School of Public Health
BioC2020
29 July 2020, Online, USA

About the talk

200: An introduction to matrix factorization and principal component analysis in R

Aedin Culhane (Dana-Farber Cancer Institute)

1:00 PM - 1:55 PM EDT on Wednesday, 29 July

WORKSHOP

This workshop will provide a beginner's guide to principal component analysis (PCA), the difference between singular value decomposition, different forms of PCA and fast PCA for single-cell data. We will describe how to detect artifacts and select the optimal number of components. It will focus on SVD and PCA applied to single-cell data.

Moderator: Erica Feick

About the speaker

Aedin Culhane
Senior Research Scientist at Harvard T.H. Chan School of Public Health

My lab in the Department of Data Sciences at the Dana-Farber Cancer Institute, and Biostatistics at Harvard T.H. Chan School of Public Health, develops computational approaches to integrate and analyze large-scale cancer genomics data. We are developing multi-omics integrative approaches for bulk 'omics data and for single-cell data, to support the Human Cell Atlas project. We are working to better understand the heterogeneity and cellular complexity of the tumor microenvironment, and to discover intercellular communication between tumor, immune, stroma and other cells in tumors. We develop methods and Bioconductor/R packages for clustering, tensor matrix factorization, dimension reduction and integrative exploratory analysis of big data in genomics. Specialties: Bioconductor, R, bioinformatics, genomics, multivariate analysis, multi-omics, biostatistics, transcriptomics, proteomics, epigenomics, computational biology, mathematical oncology, tumor microenvironment, tumor immunology, neuro-immunology, kidney cancer, breast cancer, ovarian cancer.


So this is a workshop which starts with the real basics and explains how PCA is calculated, and then how we can build on that to do other types of matrix factorization. This is a workshop that I'll be presenting with Lauren Hsu, who presented the corral package yesterday. And if we have time and if there's interest, I can talk about multi-omics [unclear] analysis, which is a matrix factorization approach for multiple datasets. There are a few ways to run this workshop. The

easiest approach is literally to click on this workshop link [unclear], and I'll show you how to do that in one second. Alternatively, if you don't want to run the code and you just want to follow along with the vignettes, you can click on the website, and this is just the vignettes in HTML format. If you have R 4.0, or if you have the devel version, feel free to download the PCA workshop. The main reason why we're using the cloud is because we expected that many people may not have the most recent version of R installed, and they are

[unclear]. The cloud version has everything there, so you won't get caught with installed packages not being available. But if you just want to try and run this, or if you want to run just the corral package later on, feel free to also install it. Finally, probably the more complicated approach, but it is available if you want it, is a Docker image. You do a docker pull, which means you download it, then docker run, and that starts up a Docker instance, which is just a process on your machine. After that you open up your browser and type in localhost,

8787, because that is the port it's being broadcast to. So if you open up a browser on that port, you should see the RStudio instance, which is exactly the same as what is running on the workshop site [unclear]. I am just going to start that up and select the PCA one. The idea behind any dimension reduction is to simplify the data, or to reduce the dimensions of the data. And possibly because I spend a lot of my time cleaning up after the kids, I was attracted to this image, and the idea here

is that, given a matrix or a large set of data, we can find the [unclear] information in the data. That could be the size of the objects, it could be information such as the color or function, or potentially other hidden information in the data. [unclear] a large source of information, and therefore it's something that we can find in the data. All of these types of methods go by many names: dimension reduction, matrix factorization, ordination, factor analysis, factorization, principal component analysis, wavelet analysis, [unclear]

decomposition, spectral analysis, non-negative matrix factorization. What's really important to learn is that it's basically SVD [unclear]. There were 18 different methods compared in this paper by Sun et al. from the University of Michigan, which was published in Genome Biology. [unclear] was first described by Pearson in 1901, [unclear] 1904. These are really, really old methods, and indeed, as the dataset grows, they actually become more similar to each other. But these methods are the basis of most modern genomic data analysis, especially single-cell, whether we're

doing [unclear] neighborhood analysis or trajectory inference. The first step in all of these analyses is normally some dimension reduction. Even if you're visualizing with t-SNE or with UMAP, frequently the first step is a dimension reduction, and frequently that's PCA. So the idea behind matrix factorization or dimension reduction, and I keep using these words interchangeably on purpose, is to find the smallest number of linear factors that explain most of the variance in the data.

By and large, the way that this is done is via SVD, so it's important that we get a concept of what this is. Singular value decomposition, or SVD, is a matrix operation that, given a matrix, produces three matrices, and these are special matrices with really nice properties. U and V are called the singular vectors, the left and the right singular vectors, and D is a diagonal matrix of singular values. This matrix is all zeros except for the diagonal, so essentially it's a vector. And that vector tells

you how important each of the different components is. The eigenvalues, or singular values, are ranked, so that the first is the largest, the next is the next largest, and so forth. These eigenvalues correspond to the first column here and the first row here; this matrix is transposed. And k is the number of components that you're finding. So in this case here, the rank is four. You will see that this matrix is four by four, a k-by-k matrix. And this matrix here is [unclear] by k. The original matrix is n by p, this is n by k, and this

is [unclear] by p. And again, these vectors are ranked such that the first represents the most information, and so forth. Equally, the vectors in this matrix are also ranked. Both U and V are orthogonal matrices, and orthogonal matrices have very, very nice properties that make them very, very useful. When you multiply the transpose of the matrix by the matrix, you get the identity. I know for some people this is really basic, and for other people it's going to bring them back to, you know, high school math. So the identity matrix is a

matrix in which all of the values are zeros and the diagonals are one, so essentially it's a matrix of just ones on the diagonal. U and V are both orthogonal matrices. If you take any pair of columns and take the dot product between them, you get zero, and the squared elements of each column sum to one. So these are really nice properties. Also, if you take the sum of the squares of the matrix, you actually get the sum of the squares [of the singular values]. So there are some really, really nice properties that allow a person to use this for some very nice

approaches. There are two main ones, there are probably more than two, but there are definitely two approaches that people use for explaining SVD. One is this matrix-based approach, the other is the geometric approach, and I will give some references if you want; you may find one easier than the other. Because the sum of the squares of the columns is one and you're dealing in this unit-circle space, you can actually take a geometric approach, which is what you're looking at here. If you can remember your Pythagoras'

theorem and your trigonometry, your cosine and sine and all of your angles, you can actually represent SVD in this space as well. There's a whole school of decomposition analysis which is based on the geometric interpretation of SVD and the geometric interpretation of matrix factorization, and that's pretty much the French school. Okay, so we're going to stop at this point and go to the vignettes. I want you to actually test this; don't take

my word for it, please actually do this. I'm going to introduce you to a really, really simple dataset, which has got five variables, or you can create one with rnorm. This is a random normal, a hundred by 50: we're taking 5,000 values and creating 50 columns. So you can just make this matrix and use it, or you can use the wine dataset. You're going to run an SVD, and then you're going to test these properties here yourself. I'm not going to

give you the code, and when people have it worked out, I want you to indicate it on the polls. So we'll put up a little poll, "completed SVD, U transpose U and V transpose V", and I want you to hit it [unclear] when you actually have the solution. Okay, so let's go to the website. I have way too many things open here. So I just clicked on the link and started it up in the browser. If you want to go to the pkgdown site, or if you want to install the package, those are other approaches.

There are a couple of different ways to open up the vignettes, to get the code, and to get the HTML rendering of it. You can either browse the vignettes if you want, or you can just find the package, and I'm just going to go down to the PCA workshop here. [unclear] There are actually three or four different vignettes in this: there's one called introduction, there's one called PCA, there's one called COA, which is correspondence analysis, and a single-cell RNA-seq example,

and we're going to use the single-cell one. The other two are very much for reference, and I'll mention them briefly. So if you click on the intro one here... okay, I just clicked on the pop-out. In this example here, I'm loading a couple of packages. The dataset that I'm actually loading is in the package ade4, and we're going to look at wine tasting in Bordeaux. You get that by loading the bordeaux dataset. And this is a really simple dataset that just records how

different wines were rated in tastings by experts. I was going to give you a little bit of background on wine, because, you know, we need to take time for that. So in France, Grand Cru has the most legal restrictions on how it's produced, and it's of higher value, whereas the table wine at the bottom here has the least restrictions and is the lowest value, the cheapest. And then there's an ordering here, basically from, you know, the [unclear] wine to the regional wine, up to the different crus, which are basically, I think, the

vineyard ones. So we can have a look at the dataset. You just load in the libraries. You can also find it in the packages here, or in the Help. If you want to open up the source or the R code, that's also really useful, because when you open up the R code you will literally just have the actual segments of code. I'll just show you what the dataset looks like. [unclear] Okay, so this is basically judges rating wine as excellent, good, mediocre, or boring. And this here is just looking at the scores of the different wines, from good

to bad. This part of the vignette here is absolutely not something you have to do. Because it's wine, I thought it might be nice to look at pomology, a theme for ggplot. Pomology is the cultivation of fruits, and it's those really pretty pictures, these kinds of pictures here of grapes. So there's actually a ggpomological package which allows you to generate bar plots that look like they're on aged paper. Okay, so I'm going to not talk anymore. You can

work through the vignette and hit the poll when you complete the task of checking. Actually, I have a question in the polls: what is FA, the first value in the initial graph? Oh, that was from the Sun et al. results; it's factor analysis. So this paper compared 18 matrix factorization methods, and FA is factor analysis and PCA is principal component analysis. Factor analysis was originally described by Spearman in 1904 and PCA was originally

described by Pearson in 1901, so they're both [unclear]. And if you listen to some historians, they even try to bring that further back [unclear]. We have another question. I'm not sure if you want to address this now or later, but: recommended reading to learn linear algebra? Many biologists rarely encounter this type of mathematics in their undergraduate degree. Have we already missed the boat? I will put that up at the end of the slides, and

it is meant for ecologists, which means it's very accessible to biologists. It tries to address every type of learner, because it will introduce the theory, it will introduce the mathematical notation, and then it will take a really simple matrix and actually multiply it out, something you can calculate by hand. You've got like five numbers and you can actually do the multiplication on a calculator. So it does every single step the whole way. And it has useful tables that describe what's the

difference between one method and another method, and what the problems are with one versus the other. It's actually a really, really nice book. The other book that's absolutely fabulous, and that gives you R code and also explains the theory behind it, is Susan Holmes and Wolfgang Huber's modern statistics book. That's a green-screen problem: I'm trying to show the book, and it's green, so it's not showing up. So, Susan Holmes and Wolfgang Huber's book, and

there is actually an open-source version of that book, and for every single example in the book they have the R code, and they did a really fabulous job as well. So I think both of those are great sources to start with. What I also put up is a video series from the University of Washington, which is really good as well. Any other questions? How are we doing, where are people at? We have a question about where the link to the vignette is; the link to the vignette is in the chat. And then we also have

another question, clarifying: what is meant by the column sum of squares in the task? So, it's to sum up the squares of all of the elements of a column. This is basically verifying these two statements here. The first is that the dot product of pairs of columns of U or V equals zero; actually, we're not doing that one, I haven't asked you to do that. I've asked you to take the transpose of U by U, and the transpose of V by V, and to verify that those are orthogonal. And then the second one here is that the squared elements of the columns of U or V sum to one.

Those are two important properties of these matrices. Okay, we have another question: would it be possible to share the slides? Yes, the slides will be shared afterwards. And then: where should I communicate that the exercise is finished? How about we have everybody upvote the "where should I communicate that the exercise is finished" question if you're done. We're going to get into more detail. Okay, so 13 folks have said they are done with the exercise. Does anyone want to share their screen?

I want to try to make this interactive, having watched a lot of the workshops. So I basically just went to the vignette and used the scale function to scale all the values in the bordeaux data frame to be on the same scale; I guess it just subtracts the mean and divides by the standard deviation, or something like that. And then I tried to multiply... where's that function? Sorry, it's a little all over the place. But yeah, so I tried to multiply everything to reproduce the scaled bordeaux matrix. And

yes, it worked out. And I also tried to multiply the transposed V by the regular matrix V; this stands for the dot product in R, and I got ones across the diagonal. And I did the same thing the other way around too. That's pretty much all. Good, that's cool. Yeah. So, pretty much most matrix factorization methods [unclear]: there's linear regression analysis, SVD, factor analysis, but the most common approach used in almost all modern matrix factorization is SVD. And this is what's used in all of the single-cell [unclear] data analysis.
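
Editor's sketch of the check described in this exercise. The workshop itself works in R with svd; what follows is a NumPy equivalent written for illustration, not the vignette's code. The random 100-by-50 matrix mirrors the rnorm example mentioned in the talk, and the seed and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for reproducibility
# Random normal 100 x 50 matrix, mirroring the rnorm example from the exercise.
X = rng.standard_normal((100, 50))

# Thin SVD: X = U @ diag(d) @ Vt, singular values sorted from largest to smallest.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# Transpose of U times U, and transpose of V times V, should both be the identity.
utu_is_identity = np.allclose(U.T @ U, np.eye(U.shape[1]))
vtv_is_identity = np.allclose(V.T @ V, np.eye(V.shape[1]))

# The squared elements of every column of U and of V sum to one.
u_col_ss = (U ** 2).sum(axis=0)
v_col_ss = (V ** 2).sum(axis=0)

# The total sum of squares of X equals the sum of the squared singular values.
total_ss_matches = np.isclose((X ** 2).sum(), (d ** 2).sum())

print(utu_is_identity, vtv_is_identity, total_ss_matches)
```

The identity check covers both properties at once: the off-diagonal zeros are the pairwise dot products of columns, and the ones on the diagonal are the column sums of squares.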

When you're talking about these three matrices, when computed using any of these approaches, the actual matrices and the columns of those matrices go by different names, and that can be really confusing to the beginner. The singular vectors can also be called the principal components, the principal axes, the latent vectors, the eigenvectors, the eigengenes; if people are doing face recognition, sometimes they call them eigenfaces. Just be aware that there's a lot of terminology there. But at the end of the day, you're looking

at three matrices: one which represents the rows, one which represents the columns, and one which is a diagonal matrix. That diagonal matrix will provide you with the eigenvalues, which are ranked from largest to smallest, and you can look at those in a scree plot or a line plot like this to determine how much information was captured in each of those components. The Kaiser rule is pretty much just the elbow rule, but I'll tell you a couple of better ways to select components later on. So, what is PCA then? Well, PCA

is just an SVD, a singular value decomposition. So this is the approach where you take a matrix and you decompose it into matrices which represent uncorrelated vectors, which represent the variance in the data. But there are two forms of PCA: PCA of the covariance matrix and PCA of the correlation matrix. PCA of the covariance matrix is simply that you center each column and then you do SVD. PCA of the correlation matrix is that you z-score each column: for each column, you subtract its mean, and you divide by

a scaling factor, typically the standard deviation. There are a couple of pre-processing steps that you may take when you're standardizing or normalizing the data, and they are all things that will change the outcome, for better or for worse. So it's important to consider what the actual pre-processing steps are that you're applying to the data, and how they impact the exploratory data analysis that you do downstream. Most of these approaches that you apply are either scaling or centering,

standardizing, or transforming the data. An example of transforming the data is if you're trying to reduce the heteroscedasticity; you're basically making it look a little bit more normally distributed. We reviewed this in a recent review, and I'll give you the information about that later on. So there are two different PCA approaches, [covariance-based PCA] and correlation-based PCA, which is basically PCA of the centered and scaled data. The most common approach [unclear], and how do I know that's what it is? It's PCA of the centered data: when I looked at the actual code on GitHub, which was very kindly

provided by the authors, hence the brownie points, I noticed that they actually ran prcomp here, which was set with center equals TRUE, scale equals FALSE. So these parameters here change the type of PCA that you're using, and this is actually something that's very important. What I would like you to do is go back to the wine dataset and do the PCA. So basically do center equals TRUE, scale equals TRUE, or center equals TRUE, scale equals [FALSE], and then do it with SVD, so that you're doing correlation-

or covariance-based PCA. And when you have that done, just upvote. Lauren, is there something in the poll for them to upvote? I will activate it in a second. Are there questions? Yeah, we have one question: the d from svd is a numeric vector; shouldn't it be a diagonal matrix, as mentioned in the vignette? That's a very, very good point. Yes. So D can be represented as a matrix, but only the diagonal is filled in, and the rest of it is zero. But svd actually returns a vector; it doesn't return the matrix. If you want to generate the matrix, you actually run this function,

diag, and that's what I actually do here. I will show you what it looks like. So if you run SVD here on the bordeaux dataset, you have the singular values, the U matrix, and the V matrix here. The original bordeaux dataset has four scores and five wines. So the V matrix here represents excellent, good, mediocre, and boring, and these are the scores on principal components 1, 2, 3, and 4 that you're getting. And then the U matrix here represents the Grand Cru and so on, basically the five wines.
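
Editor's sketch of the point about d being returned as a vector. This is a NumPy illustration rather than the vignette's R code, where svd and diag play the same roles. The 5-by-4 table below is made up to match only the shape of the bordeaux data (five wines, four scores); the values are not taken from ade4.

```python
import numpy as np

# Made-up 5 x 4 table: five wines (rows) by four score categories (columns).
# Illustrative numbers only, not the ade4 bordeaux data.
X = np.array([
    [9.0, 6.0, 2.0, 1.0],
    [7.0, 8.0, 3.0, 0.0],
    [1.0, 2.0, 8.0, 9.0],
    [4.0, 6.0, 5.0, 3.0],
    [0.0, 3.0, 9.0, 6.0],
])

U, d, Vt = np.linalg.svd(X, full_matrices=False)

# The singular values come back as a 1-D vector, not as a matrix.
print(d.shape)   # (4,)

# np.diag expands that vector into the diagonal matrix D.
D = np.diag(d)
print(D.shape)   # (4, 4)

# Multiplying U by D by the transpose of V returns the original matrix.
X_rebuilt = U @ D @ Vt
print(np.allclose(X, X_rebuilt))   # True
```

Here U has one row per wine and V has one row per score category, matching the description of the U and V matrices in the talk.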

And the actual d here, if I just do a plot, this here is your scree plot, a plot of the eigenvalues. To actually see that as a matrix, you can just do what I did when I did that matrix multiplication. So this is the code for multiplying U by D by the transpose of V, which will actually return X. Okay, we have a question: how do we know which PCA to use, scale equals T or F? Does it make a big difference?
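
Editor's sketch of how much the scale choice can matter. This is a NumPy illustration, not the workshop's R code; the helper pca_svd and the toy data are hypothetical. It follows the two forms described in the talk: center each column and run SVD (covariance-based PCA), or center and divide by the standard deviation before SVD (correlation-based PCA).

```python
import numpy as np

def pca_svd(X, scale):
    """PCA via SVD of the column-centered (and optionally scaled) matrix."""
    Xc = X - X.mean(axis=0)                  # center each column
    if scale:
        Xc = Xc / Xc.std(axis=0, ddof=1)     # divide by the sample standard deviation
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * d                           # coordinates of the rows on each component
    var_explained = d ** 2 / np.sum(d ** 2)  # proportion of variance per component
    return scores, Vt.T, var_explained

rng = np.random.default_rng(1)
# Toy data (an assumption): the second column has a far larger spread than the others.
X = rng.standard_normal((30, 3)) * np.array([1.0, 100.0, 1.0])

_, load_cov, var_cov = pca_svd(X, scale=False)  # covariance-based PCA
_, load_cor, var_cor = pca_svd(X, scale=True)   # correlation-based PCA

print(np.round(var_cov, 3))
print(np.round(var_cor, 3))
```

Without scaling, the first component is dominated by the high-variance column; with scaling, each column contributes on an equal footing, so the variance is spread across components. Which form is appropriate depends on whether the differences in scale between variables are meaningful for the question at hand.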

So SVD is U by D by V transpose, and that is actually the formula for SVD. With s$v, note [unclear], you need to use the t function. You're completely right; it makes it look a little bit clearer. Okay. So although this is the math approach, most people don't run PCA using scale and svd; I just wanted to show that it could be done using that approach. What most people will actually do is run prcomp on the actual matrix. What prcomp does is give you much nicer output than svd. And then there's a package that I quite like,

which is called explor. There's also another package, [pcaExplorer?], which is also an excellent package. And if I just... I'm just going to save my [unclear], and then I'm going to run explor on it. Basically, it generates quite a nice little interface for exploring the results of a dimension reduction and at least understanding it. This also visualizes the unit circle, which we mentioned earlier, and you can play with label sizes, and you can do lasso selection of variables

if you want, and you can look at the individuals and the individual data. And you can see here in this case that the first component represented 56, [or] 53% of the variance. [unclear] Okay, I'm quite keen that we get on to single cell, so let's actually just go through the slides and start that part of the vignette. There's a potentially important question about PCA, which was just how to decide what to do for the scale parameter, like whether

to scale or not, and whether to use prcomp or princomp. So the thing about prcomp versus princomp is actually described in the other vignette. In the vignettes, there are actually three vignettes. So there's a PCA vignette, and that takes you through eigen analysis, SVD analysis, prcomp, princomp; it does ade4, and alternatives, FactoMineR [and] factoextra. So it actually has many, many different implementations of PCA, and I show how their outputs

are equivalent across all of the methods: if you want to get the loadings, if you want to get the scores, if you want to do the biplot, what the functions are. It explains all of those. Equally, we have one for correspondence analysis. [unclear] Okay. I'm going to go through the slides. We went through the difference between running SVD without processing, with centered data, and with centered and scaled data. The things

that you notice are, first, that centering is really, really important; you can't [unclear] the SVD if the data is uncentered. And the second thing we noticed was arch effects. Arch effects are really common in SVD, and they occur because there are curvilinear relationships between successive components. You can get this due to saturation effects, or if there's a gradient from one component to the next. So if you see these nice kinds of arches, like these things here, between PC1 and PC2,

it's normally an indication that you have an artifact. And that is information about your data, possibly telling you that you need to scale this data. And it's very, very typical to see this in [unclear]. Okay, so we are going to probably begin... [unclear] do you want to run through the single-cell section? Sure. Can you advance to the next slide? I think it was just explaining that there are lots of different methods, and we

reviewed this in Meng et al. 2016. Okay, there are many, many different SVD methods, or matrix decomposition methods. We have reviewed PCA, and correspondence analysis is one of these. [unclear] for single-cell RNA-seq and its relation to other methods that people have described. You can run this in the corral package, and we're going to describe this. And if you run correspondence analysis rather than PCA, you get much better batch correction in

datasets such as [unclear]. So, in the remaining few minutes that we have, we're going to look at single-cell RNA-seq data, and the dataset that we're going to look at is the Zhengmix data that comes from the DuoClustering2018 package. It includes eight types of pre-sorted cells, including various immune cells, and they go through 10x sequencing and are then mixed into these different datasets to be used for benchmarking.

[Zhengmix4eq] contains four of these types. This is kind of the easiest one, where most methods do pretty well. Zhengmix4eq takes four cell types in equal proportions, in balanced groups, and then Zhengmix8eq takes all eight of these in approximately equal proportions. However, it is also more challenging than the first one, because there are many more classes. So, if everyone wants to go to the PCA example scRNA-seq vignette, you can access that on the pkgdown website, or

simply follow the screen share. So, briefly, it walks through performing PCA on the dataset I just talked about. Essentially, this is the package loading, and then with this function we've loaded in from the package the Zhengmix4eq data into a SingleCellExperiment object. We can use these commands to pull out the counts and logcounts matrices, respectively. And then, taking a quick look at the colData that's available

on this, the things that are described for each cell, we can see that phenoid contains the cell types, which we'll use for plotting in a minute. [unclear] It also contains the rowData, which has annotations on each of the genes that are measured. So, as I mentioned, these are the cell types that appear in this dataset, and these are the quantities of each type. [unclear] There are actually other packages that take much better approaches to selection, but for the purpose of this vignette we wanted to keep it simple and try not to load too many things. So we

just did a very simple sorting by variance and selecting [unclear]. There is a mean-variance relationship in the data, meaning that genes that have higher values also tend to have higher variance, and the variance structure is correlated with the mean. But for the purpose of this, we will do a simple selection process. Okay, and then we can run prcomp on both the counts and the logcounts matrices and plot the results. Looking at the results from doing PCA on the raw counts, we can see this artifact that Aedin talked about earlier very clearly, where all the points are

lying on the left side of the origin, and it creates this distinctive horseshoe shape. This happens because in the count data, if you look at the range of the values, the scales are really different, and so some values will be huge and some very small. One remedy that has been used is a log transformation, though that is not necessarily appropriate; there are some recent papers, and we can point to those after the talk, that suggest a log transformation is not appropriate for single-cell data. However, if you do do log

transformation on the counts, if you do PCA on the logcounts, then you find that the artifact kind of resolves a little bit, and it separates the clusters, although they're still lying on one side of the origin, which is [not] ideal. Okay, so there are a few interactive examples that you could work through. It's pretty self-explanatory what needs to be done for each of those, and we only have a few minutes left. So we also wanted to ask if folks would prefer to talk through these examples and have time to look at them, or if you would

prefer to hear more about matrix factorization methods. Actually, we only have about five minutes, so I don't think we'll have time to actually do them, but I'll just explain what is in this vignette, if anyone's interested in looking at it afterwards. So in the first one, we're going to look at exploring beyond the first PC. I'm just using explor to do this, which, as Aedin showed, has a nice interactive interface. You just change this one variable in this code, and then it will

also enable you to explore the PCs. You can do that for both of them if you're interested. And then again we can repeat the thing that everyone just worked through, of performing PCA without prcomp. There's a bit on visualizing the embedding, so how to call UMAP [unclear]; we're not going to go through that now. And then one thing that we wanted to highlight is thinking about other ways to speed up PCA, because single-cell datasets are very large, and if you run prcomp on any actual data,

it will take quite a while. So one way to speed this up is by using a fast SVD approximation, and the implementation we're using here is irlba. It's an algorithm to compute the top n PCs rather than performing a full decomposition, and this approximation tends to be much faster. So there's a bit here showing the equivalence between the two and demonstrating the speed. The difference here is not so marked, but if you're running this on your laptop, you will definitely

notice the difference between them scaling pretty quickly, even if the matrices are not super large. [unclear] that they're giving us the same results, even though one is substantially faster than the other. And then another interactive activity here is how to implement this: using what we did before, and using the fact that we can use irlba as a substitute for svd to perform SVD more quickly, how to write the command [unclear] for

That is, performing PCA with the scale and irlba functions. And then there's a brief introduction to correspondence analysis and the corral package. It's implemented in other places; however, those are going to be tricky to run on single-cell data sets, as they're not using irlba; they're using the slower svd call instead. There's another interactive example here, if anyone's interested in exploring more data sets in this package. You can do that. And you can also, if

you don't want to bother with gene selection, you can also use pre-filtered genes from this package, or data sets that are pre-filtered for that. So we're almost at time; maybe we have a few questions. Sorry, I'm just answering some of the earlier ones in the chat window. So, she mentioned that SVD or some other dimension reduction technique could be used for batch correction, and I wanted to note that you use up

degrees of freedom for the downstream steps, unbeknownst to us. And wouldn't it be better to include variables in the model explicitly? Because I'm sure that, in the background, these kinds of techniques, these dimension reduction techniques, whether SVD or what have you, use up degrees of freedom to actually remove the batch effect. [unclear] Okay, I wasn't going to mention it, but is it

really fun? Where's the one that I was actually hiding? Okay, I'll unhide this slide. So with the SVD, there are two things you're kind of mentioning there. One is just the computation of a full-rank decomposition: that takes up a lot of memory, and you need to calculate the full rank. Something like irlba is an approximation, and it doesn't calculate the full rank of the matrix; by full rank I mean all the principal components. It will just compute a subset of the components, and that actually makes it faster. There are also other ways where you

can just calculate the first component using a linear regression [unclear]. SVD doesn't actually generate any model; it is just decomposing the variance of the data and trying to find projections of the data that maximize the variance. And so is PCA; PCA just considers a transformation of the matrix before you do it. And there's actually a whole, massive debate here: Leo Breiman in 2001 wrote about these two schools of thought. Do you have to fit a model of the data before you

actually do the decomposition, or do you just run an algorithm on the data? So there are two very, very contrasting statistical schools of thought on how you actually do this. The argument is that you don't necessarily have to fit the model before you go. It is an interesting article; it had tons and tons of responses to it by all the statistical leaders in the field. And I think, you know, it might be something that we need to consider: what is the most appropriate approach?
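The remark in this answer that the first component can be calculated using a linear regression can be sketched as follows. The recording is unclear at that point, so it is an assumption that the speaker means an alternating least-squares scheme in the style of NIPALS. The sketch is in Python rather than the R used in the workshop, and the function name `first_pc_nipals` and the toy data are hypothetical.

```python
import math


def first_pc_nipals(rows, tol=1e-12, max_iter=1000):
    """First principal component by alternating regressions (NIPALS-style).

    Returns (scores, loadings); the loadings vector has unit length.
    """
    n, m = len(rows), len(rows[0])
    # Center each column; PCA describes variance around the column means
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Start the scores from the column with the largest variance
    start = max(range(m), key=lambda j: sum(r[j] ** 2 for r in x))
    t = [r[start] for r in x]
    p = [0.0] * m
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Regress every column on the scores (no intercept): slope = x_j.t / t.t
        p = [sum(x[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Regress every row on the unit-length loadings: new score = x_i.p
        t_new = [sum(a * b for a, b in zip(r, p)) for r in x]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change < tol:
            break
    return t, p


# Toy data: two perfectly correlated variables, invented for illustration
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
scores, loadings = first_pc_nipals(data)
print("loadings:", loadings)
print("variance of scores:", sum(s * s for s in scores) / (len(data) - 1))
```

No model of the data is fitted here beyond the two least-squares steps; the loop simply converges to the direction along which the projected data has maximal variance, which matches the speaker's description of SVD as decomposing the variance.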




