200: MultiAssayExperiment and curatedTCGAData Bioconductor 2020 workshop
Marcel Ramos (Roswell Park Comprehensive Cancer Center)
11:00 AM - 11:55 AM EDT on Wednesday, 29 July
This workshop demonstrates the leveraging of public multi-omics databases, such as cBioPortal and The Cancer Genome Atlas (TCGA), through the use of the `cBioPortalData` and `curatedTCGAData` experiment data packages. It provides users with the basics of data management, using the `MultiAssayExperiment` data class and the `TCGAutils` utility package, and example analyses of multiple assays associated with a single set of biological specimens. In addition to providing a basic overview of key data classes, such as `MultiAssayExperiment` and `RaggedExperiment`, this workshop intends to provide an overview of `cBioPortalData` and `curatedTCGAData` experiment data packages and `TCGAutils` functionality aimed at enhancing the ease-of-use of TCGA data.
Moderator: Simone Bell
Strong research professional with a Master of Public Health focused in Biostatistics from the CUNY School of Public Health, Hunter College. Experienced #rstats #Bioconductor user and developer with a demonstrated history of curiosity. Loves to mold tough…
... within cBioPortal, and it has two different mechanisms for providing that dataset. We are still working on it to make it as user-friendly as possible, by sparing users from calling the API directly and providing helper functions so that queries are much easier. So if you're interested in exploring cBioPortal data and downloading it, you can check out that package.
I also have curatedTCGAData, a project that I worked on right after MultiAssayExperiment. It coordinates TCGA data, with the help of the Waldron lab, and the curation of subtypes; we've added that subtype information to the package. We recognize that many tools exist for downloading TCGA data, but curatedTCGAData provides an integrative representation using MultiAssayExperiment, and it's easier to analyze your data when all of it is coordinated and you can create a subset of your data quite easily. It is an experiment data package, so the user doesn't really need to know all the details about how that works. curatedTCGAData makes it really easy to download a dataset from TCGA; we provide about 33 different cancer types, and it constructs your MultiAssayExperiment on the fly.
I have a couple of reference vignettes here. If you are browsing the rendered website, you will see that there is a list of available studies on the website. You might also be able to click on this link here and open up the vignette corresponding to that.
I also have a couple of tables that give you some description of what's included in the package, and thanks to [name unclear] for providing a descriptive table of the data types in curatedTCGAData. The other package we'll touch upon is TCGAutils. It allows for easier handling of samples and metadata and separation of sample types, and it provides some useful operations for working with the barcodes, reshaping the TCGA data, and things of that nature.
It provides helper functions in three major areas. It can convert row annotations to genomic ranges, and it can identify and separate samples, so if you wanted to separate tumors versus normals within a MultiAssayExperiment, you could with the helper function. Working with identifiers can also be a bit of a hassle, so we provide a number of functions to translate TCGA barcodes and interpret them through TCGAutils.
So how does all of this fit together? For all of these packages, this is a small schematic of how that works, more or less. You can see that TCGA is the major repository, and then we have the Broad Firehose pipeline, which goes through the RTCGAToolbox package, whose maintainership I have recently taken up. Then I have the MultiAssayExperiment TCGA pipeline, which takes the data, cleans it up a bit, and generates MultiAssayExperiment objects from it that are served through the curatedTCGAData package. So we have a package that provides an interface to that data.
I'll go into a brief summary of the data classes that are important. I am assuming that some of you are new to Bioconductor, so I will go over briefly what these data classes are about. SummarizedExperiment is a data class for representing expression data. One thing to note is that the expression data across the assays has to be uniform, or in other words, of the same dimensions. You can optionally provide range data in the rows and create a RangedSummarizedExperiment. It has sample annotations, with the colData object here as a component of the class, and it allows for metadata as well. SummarizedExperiment is a major player in Bioconductor, and you can see that many other classes build on it; for example, the SingleCellExperiment class inherits from SummarizedExperiment and extends it.
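A minimal sketch of the SummarizedExperiment pieces described here (assays of equal dimension, optional row ranges, sample annotations in colData, and metadata). The toy matrix, gene names, and ranges are invented for illustration and are not from the talk.

```r
library(SummarizedExperiment)

# Toy expression matrix: 4 features (rows) by 3 samples (columns).
counts <- matrix(
    1:12, nrow = 4,
    dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Hypothetical genomic ranges, one per row, named like the matrix rows.
ranges <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(start = c(1, 101, 201, 301), width = 50)
)
names(ranges) <- rownames(counts)

# Supplying rowRanges yields a RangedSummarizedExperiment.
rse <- SummarizedExperiment(
    assays = list(counts = counts),
    rowRanges = ranges,
    colData = DataFrame(
        condition = c("tumor", "tumor", "normal"),
        row.names = colnames(counts)
    ),
    metadata = list(note = "toy data for illustration")
)

rse
colData(rse)
```

Every assay stored in the object has to share these 4 by 3 dimensions, which is the uniformity constraint mentioned above.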
SummarizedExperiment is a very important class to get familiar with if you're working in Bioconductor.
RaggedExperiment started out with the need to represent genomic ranges data in a matrix format. When we were working on MultiAssayExperiment, we designed a class that tried to do that, but we formalized it in RaggedExperiment with the help of Martin Morgan. The RaggedExperiment class, as you can see, has a GRangesList in the assay and does some magic to represent that GRangesList as a square or rectangular matrix. The schematic gives you an overview: when you have ragged ranges, or non-matching sets of ranges across the samples, you would use a RaggedExperiment. If you had matching ranges, you would be better off using a RangedSummarizedExperiment. In those cases where you don't, the RaggedExperiment class is recommended, and you have some operations to reshape the representation of the matrix. You can do a simple one by stacking, or qreduce by using a region-of-interest window and subsetting your data that way, or match the ranges based on their positions across all the samples.
Then I get to the integrative container, MultiAssayExperiment. Here we have the three components of the MultiAssayExperiment class. One is the ExperimentList, which coordinates those different datasets with different dimensions and even different numbers of observations per patient, as you can see with these two green lines. The ExperimentList puts that all together as an S4 type of list. The colData has the phenotypic information embedded as part of the object, so you can subset based on some feature of the patients, if you are using patients; you can easily subset with the colData structure here. The sampleMap structure coordinates all of the patients, the measurements, and the names in the columns of the experiments. It's sort of a graph representation of all of the data that we have in the object. You can have ID-based assays or experiments here, or you can have range-based ones; it supports things like RangedSummarizedExperiment or RaggedExperiment as well, so that's an option. You can subset by range if you have some range objects in the MultiAssayExperiment. If you want to look at what classes are supported, you can click on this little triangle to see the requirements.
The class requirements for inclusion in a MultiAssayExperiment are pretty simple. You need to have a bracket subsetting operation and some dimension names, so it can support many classes, like matrices, SummarizedExperiment, et cetera. You can have a look at this link.
Lastly, the MatchedAssayExperiment class is an extension of MultiAssayExperiment where all the samples and patients are coordinated, so they have one sample in each assay. You can coerce to a MatchedAssayExperiment from a MultiAssayExperiment using these two functions.
So that's the overview, and we'll get into the interactive part of the workshop in a few seconds. If you had time to look at the polls on the Pathable website, there is a question there that you should be able to answer now. I'll go over the polls at the end, and I'll clarify anything that needs to be clarified.
For now, we can start with our interactive portion by building a MultiAssayExperiment from scratch. First we'll look at our miniACC demo. This is a dataset from TCGA, condensed into a few assays, and it's part of the MultiAssayExperiment package. You can load it by running `data("miniACC")` and then invoking the miniACC object by highlighting and running that. I do have a Shiny demo that I want everyone to take about 10 minutes to work through. You can click on the link here.
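A short sketch of loading the miniACC example and coercing it to a MatchedAssayExperiment. The talk does not name the "two functions" on the slide; the `as()` coercion and the `MatchedAssayExperiment()` constructor used here are my assumption about which ones were meant.

```r
library(MultiAssayExperiment)

# Load the example dataset shipped with the package and print its summary.
data("miniACC")
miniACC

# Coerce so that every assay keeps only the patients present in all assays,
# with one sample per patient in each assay.
matched <- as(miniACC, "MatchedAssayExperiment")

# Assumed equivalent route through the constructor.
matched2 <- MatchedAssayExperiment(miniACC)

# Number of columns per assay after matching.
lengths(colnames(matched))
```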
It is posted on shinyapps.io, but I think it may be easier, and probably faster, to host it locally. So what we're going to do is go to our files here. We click on the Files tab on the right-hand side, go into the inst/tutorials folder, and then click on the MultiAssayExperiment exercise Rmd file. That will open up an Rmd file, and we're going to click on Run Document at the top. This is a learnr Shiny tutorial. Once we click that, it should be pretty quick to give you a pop-up. Make sure that your pop-up blocker is disabled for that. Let me do that on my end. If you're on Chrome, on the top right you'll see "pop-up blocked" with a little X on there; click on that, choose "always allow", and then try again.
All right. I prepared this Shiny demo for instruction, and we can go through it ourselves. In this tutorial, we'll learn how to extract components of the MultiAssayExperiment data structure and use the constructor function to create a MultiAssayExperiment from the individual pieces. We're going to use our miniACC example to both extract the pieces and then reconstruct a MultiAssayExperiment. What we have here is the schematic that we saw earlier. Study it for a little bit and see what functions we need to extract those component pieces of the MultiAssayExperiment; they are at the bottom, in the black text: colData, sampleMap, experiments. If you scroll down a little bit and click on Continue, you'll be able to see the instructions here.
We're going to use the `experiments` function to take the experiment data out of miniACC. So we type experiments, and a nice thing about RStudio and Shiny is that you have autocomplete available, so you can see what functions are in there. Then we type miniACC, the object, and run the code. On line two I have the printout of the MultiAssayExperiment, which is a little bit distracting, so I will comment that out with a hash and rerun the code. Let me make this a little bit bigger. When you run `experiments` on miniACC, you'll see that we have an ExperimentList class object with five experiments in there. You can see that the columns and the rows don't necessarily match across all the experiments, and that's one of the strengths of MultiAssayExperiment: you can have assays that don't exactly match, even by samples or patients. So that's our ExperimentList.
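The extraction step from the tutorial can be sketched as follows, assuming the MultiAssayExperiment package and its bundled miniACC dataset are installed. The variable name `exps` is mine.

```r
library(MultiAssayExperiment)
data("miniACC")

# Pull the ExperimentList out of the MultiAssayExperiment.
exps <- experiments(miniACC)

class(exps)   # the container class
names(exps)   # one name per experiment
exps          # printout shows each experiment's class and dimensions
```

The printed dimensions differ from one experiment to the next, which is the point made in the talk: rows and columns do not have to line up across assays.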
When you click on Continue, it gives you a bit more of a note on the list class. Now what we want to do is get the phenotypic information of the MultiAssayExperiment. I'm going to comment this out again and do `colData(miniACC)`, and then run the code. You can see what's inside a small real-world dataset and the kinds of columns you can get from a TCGA type of dataset. Okay, so you get a check mark and then we continue. Now we want to extract a column from the colData. We didn't have this in our schematic, but colData is a DataFrame, so we can use `colData(miniACC)`, then the dollar sign, and then the name of a column. For example, I can go back up and look at all of my columns. Here we're extracting the race variable, so we can type `$race` and run the code. That's how you would extract it. But MultiAssayExperiment also has a shortcut: you can do `miniACC$`, and you can see that all the columns in the colData are populated here in this helpful tooltip.
You can click on race, and that's another way to extract data from the colData. All right, now we look at our assays representation. We saw the ExperimentList, but `assays` is a function available in SummarizedExperiment, which gives you a matrix representation, a bunch of matrices in a list, as output. So you can use the `assays` function to reduce your data to matrices, as far as that is possible.
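A brief sketch of the colData and assays accessors discussed here, again assuming the bundled miniACC dataset. The variable names `pheno` and `mats` are mine.

```r
library(MultiAssayExperiment)
data("miniACC")

# Phenotype table: one row per patient.
pheno <- colData(miniACC)
dim(pheno)

# Two equivalent ways to pull one column out of the colData.
head(colData(miniACC)$race)
head(miniACC$race)

# Matrix view of every experiment, returned as a list-like object.
mats <- assays(miniACC)
class(mats)
sapply(mats, class)
```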
So we can call `class` on `assays(miniACC)`, as the instructions say, and then run our code. It's a SimpleList containing a number of elements. We'll try `sapply`, since we want to see what classes are in that list: we apply `class` over `assays(miniACC)`, just to explore a little bit of what's inside there. You'll see that these have all been reduced to matrices if they weren't already matrices. So `assays` is a good way to extract your data if you want to, say, run a correlation across all of your assays; you can use the matrix representation.
If you're interested in looking at the sample map within the MultiAssayExperiment, you use `sampleMap`. The extractor function has the same name as the component here, so you can do `sampleMap(miniACC)` and have a look at the sample map. It's a DataFrame with column names that are fixed as assay, primary, and colname, and they refer to the names of the experiments here. In this case we have TCGA data, so we have the participant ID barcode and the sample barcode.
Now we test our knowledge: what function do you use to extract experiments from miniACC? This is a little bit tricky, because all of the other functions use the same name as the component, but for the ExperimentList we have a slightly different name. Maybe your first reaction is to say `ExperimentList`, but actually it's `experiments`. That's how you would extract the ExperimentList from a MultiAssayExperiment: with the `experiments` function. `ExperimentList` is a function, but it's the constructor function for an ExperimentList, which you don't have to use if you're using our packages such as curatedTCGAData or cBioPortalData. That's a common mistake.
Now we have the MultiAssayExperiment constructor function. It has three main arguments: the experiments, the colData, and the sampleMap. Now that we've extracted these three pieces from our example, how do we construct a MultiAssayExperiment? Essentially, we type `MultiAssayExperiment` and add the parentheses. The output of `experiments(miniACC)` is our input to the `experiments` argument, our colData is `colData(miniACC)`, and the sample map is `sampleMap(miniACC)`. So we use the MultiAssayExperiment constructor function with these three component pieces. If you were reading in your own data, you would create a list type of object from your data and put that in the `experiments` argument of the constructor, and then you would read in a DataFrame type of object for your colData. The sample map you can generate on your own, or you can use a helper function, which we will look at in a second, to generate it. Also, if your patient IDs match the column IDs in the experiments, you can forgo the sampleMap input; the constructor function will recognize that they match and will generate a sample map for you. Essentially, that's how we construct a MultiAssayExperiment. You get a printout of that, and then you pat yourself on the back because we have our first MultiAssayExperiment.
All right, that is our interactive mini tutorial for building a MultiAssayExperiment. Now I'll go back to our vignette here and clear this on the left. We provide a cheat sheet for all the functions that you can use with MultiAssayExperiment. We want to have a cheat sheet that's more in the flavor of the RStudio ones, but we haven't gotten to that. You have a list of all the accessor functions that we looked at today: colData, experiments, assays and assay in the singular, which do slightly different things, sampleMap, and metadata, as well as rownames and colnames. I didn't go over rownames and colnames, but if you do `rownames(miniACC)`, you'll see what that gives you: a CharacterList representation of the IDs in the experiments. Then we have a group of functions for subsetting a MultiAssayExperiment that we will get into a bit if time permits, but you can see an overview here. We did touch upon the dollar sign extractor function to get a colData column from our MultiAssayExperiment.
Okay, so we went through the construction part of MultiAssayExperiment. I want to direct your attention to the `c` function for concatenation in MultiAssayExperiment. For example, if you already have a MultiAssayExperiment from curatedTCGAData and you wanted to add another experiment to your data, you could by using that `c` function, here with miniACC and, say, the log transformation of one of the datasets in there. You could provide this `mapFrom` argument, which tells the package to take the annotations, or the sample map structure, from the first assay in miniACC and map the new assay using those. That way it handles the sample map rearrangement on its own so you don't have to. It gives you a warning because it assumes that the column order in both assays matches.
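The extract-and-rebuild exercise and the `c` concatenation can be sketched together as below. This assumes the bundled miniACC dataset; the choice of the RNASeq2GeneNorm assay for the log transformation and the name `log2rnaseq` are illustrative, not necessarily what was shown on screen.

```r
library(MultiAssayExperiment)
data("miniACC")

# Extract the three component pieces.
exps  <- experiments(miniACC)
pheno <- colData(miniACC)
smap  <- sampleMap(miniACC)
colnames(smap)

# Rebuild the object from its pieces with the constructor.
rebuilt <- MultiAssayExperiment(
    experiments = exps,
    colData = pheno,
    sampleMap = smap
)
rebuilt

# Add a derived assay; mapFrom = 1L reuses the sample map entries of the
# first experiment, so the new matrix must keep that column order.
# A warning about the assumed column order is expected here.
miniACC2 <- c(
    miniACC,
    log2rnaseq = log2(assays(miniACC)$RNASeq2GeneNorm),
    mapFrom = 1L
)
names(miniACC2)
```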
Okay, so we went over the colData and the ExperimentList and how to extract those. I'm going to skip over that, since we're running a little bit short on time. I wanted to introduce some examples using curatedTCGAData. You can see that the interface of the package is pretty minimal: use the `curatedTCGAData` function and the TCGA cancer ID code here to get a list of all of the datasets available for that cancer. You can get more information about what these are if you refer to the tables that I mentioned earlier. What does an annotation mean in TCGA? You go to that table. Let me quickly show you that table. I have a rendered version of the website here. You can go into the Reference tab up here on the right and click on the curatedTCGAData reference, and you can see a list of all of the available datasets that we have in curatedTCGAData. You select one and download the dataset from ExperimentHub with that code. And then we have a small description of what each of these datasets contains. So that's curatedTCGAData.
For cBioPortalData, we have two main functions that come into play. `cBioDataPack` allows you to download the packaged datasets from cBioPortal as tar.gz files and represent them as a MultiAssayExperiment. I must note that we have about 70% of all the studies in cBioPortal successfully imported. For the remaining 30%, you might have to download them and clean them up a little bit, or you can open an issue on the cBioPortalData GitHub page and have us look at that. The data comes from cBioPortal, and they have curation efforts on their end. There is also a repository that I can point you to later, if you're interested, of where the data lives and how it gets curated and fixed for any issues related to importing it as a MultiAssayExperiment. When you're working with data, that is always an issue.
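A sketch of the curatedTCGAData interface mentioned earlier in this passage. The `version` argument and the value "2.0.1" are assumptions based on more recent releases of the package; the release current at the time of the talk may not have had this argument. The ACC code and the RPPA assay pattern are chosen only to keep the download small. Both calls need network access to ExperimentHub.

```r
library(curatedTCGAData)

# dry.run = TRUE only lists what is available for the cancer code.
curatedTCGAData(
    diseaseCode = "ACC", assays = "*",
    version = "2.0.1", dry.run = TRUE
)

# dry.run = FALSE downloads the selected assays and assembles them
# into a MultiAssayExperiment.
acc <- curatedTCGAData(
    diseaseCode = "ACC", assays = "RPPA*",
    version = "2.0.1", dry.run = FALSE
)
acc
```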
As many of you know, data is never clean; unless it's an example dataset, it's never clean. So there are some curation issues that prevent us from fully supporting 100 percent of the datasets provided by cBioPortal. The other thing you can do is download the dataset and have it as a tar.gz file on your computer. The other function is the `cBioPortalData` function, which allows you to obtain data from the API. This is more of an advanced way of getting the data, I guess, but we make it simple: if you provide a number of parameters, you're able to get the data quite easily. If I have some time, I'll show you how to do that here.
With the `cBioDataPack` function, you can explore what studies are available by doing `data(studiesTable)`. You can see that we are working with cancer study IDs. So if, for example, you were at Carrie's talk, you might be interested in the breast cancer dataset. I will try to do some live coding to show you how you do that. First we do `library(cBioPortalData)` and get our studies table, and then we're going to grep for brca. In our studies table there is a column that lists all of the cancer study IDs, so we're just looking for any studies that have brca in them. We can see that the study from Carrie's talk is brca_mbcproject_wagle_2017. We can download the study using this `downloadStudy` function. Currently this dataset is not working, in that we can't represent it as a MultiAssayExperiment. There are some issues with the data, but that's where we can step in and have a look at what's preventing the dataset from being loaded as a MultiAssayExperiment. In the meantime, you can download it using the `downloadStudy` function and the ID. It'll ask you if you want to cache it; you say yes, and then it will download it pretty quickly. We're on the cloud, and I think this is on AWS, so the download is super fast. Now you have that tarball available to you. You can take that location and run `untar` on it with `list = TRUE` to list the files. It's a little bit noisy, but it will show you all the stuff that's inside that tarball. What `cBioDataPack` is trying to do is automatically figure out what is data and what is metadata, load it as a dataset, and represent it as a MultiAssayExperiment. It is doing a lot of stuff in the background, and sometimes it doesn't all work. So that's what's available, and at least you have the dataset to play around with and explore.
For those that do work, you can see what it looks like. It's as simple as entering the ID in the `cBioDataPack` function, and you can see what kinds of datasets are in there: mostly SummarizedExperiment datasets and some RaggedExperiment data.
For `cBioPortalData`, it's a slightly different workflow. We first create a cBioPortal instance object with the `cBioPortal` function, which gives us a representation of the API, using some software in the background, with the help of Martin and the AnVIL package and a number of other packages we depend on. You can use the `getStudies` function to do a similar operation as for the `cBioDataPack` function, and you'll get a list of studies as a tibble that you can sort through. The important column is the study ID column, which tells you what study ID to put in the `cBioPortalData` function. When we run this function, we pass the cbio object that we used to query the API, then the study ID argument with the relevant study ID from this table, and then a gene panel from the number of gene panels available. You can see those with the get gene panel function; oh, sorry, it's `genePanels`. If you call `genePanels` and enter cbio, which is our API object, you'll see that there are a number of gene panels available for use as part of this function. Currently it only supports gene panels, but we are expanding that to include a list of genes that are of interest to you as a researcher. Currently we have only about 36 to 47 gene panels supported. The `cBioPortalData` function works in the background and returns a MultiAssayExperiment, with SummarizedExperiment or RangedSummarizedExperiment objects in which those copy number alterations and mutation data are represented.
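The two cBioPortalData routes can be sketched as below. All calls need network access. The study and profile identifiers other than brca_mbcproject_wagle_2017 (that is, laml_tcga, acc_tcga, IMPACT341, and the two molecular profile IDs) are illustrative choices of mine rather than ones used in the talk, and the search goes through `getStudies` rather than the `studiesTable` dataset, on the assumption that a current release is installed.

```r
library(cBioPortalData)

# Packaged-study route: fetch the tar.gz for one study and list its files.
tarball <- downloadStudy("brca_mbcproject_wagle_2017")
head(untar(tarball, list = TRUE))

# For a study that imports cleanly, cBioDataPack builds the
# MultiAssayExperiment directly from the study ID.
laml <- cBioDataPack("laml_tcga")
laml

# API route: create the API object, then look up studies and gene panels.
cbio <- cBioPortal()
studies <- getStudies(cbio)
grep("brca", studies$studyId, value = TRUE)
genePanels(cbio)

# Request selected molecular profiles for one study and one gene panel.
acc <- cBioPortalData(
    api = cbio,
    by = "hugoGeneSymbol",
    studyId = "acc_tcga",
    genePanelId = "IMPACT341",
    molecularProfileIds = c("acc_tcga_rppa", "acc_tcga_linear_CNA")
)
acc
```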
All right, and lastly I'll quickly go over what TCGAutils provides: a number of functions to, again, work with the rows. It only works for TCGA data; it was built specifically for TCGA data. It has a microRNA-to-ranges function, so you can convert those to ranges, and it looks up the gene symbols and converts them to ranges as well. And if you have RaggedExperiments in a MultiAssayExperiment, as we had in our previous example, it will convert those to RangedSummarizedExperiment with an appropriate window for you to summarize over. Okay, I'll skip over those examples. There are a number of helper functions in TCGAutils to see what samples are available in your TCGA data. You can call `sampleTables` and see a printout for all of the experiments in that TCGA dataset; 01s are tumors. We also have a sampleTypes table to help you make sense of that output. You have 01 as primary solid tumor, and we have 11 and 10, which are normals, down below. `splitAssays` is a recent development, based on requests from researchers who were analyzing this data, and from myself. It's pretty neat in that it allows you to separate tumors and normals, and it will return a MultiAssayExperiment with different assays based on their sample annotation. I wanted to highlight those and show you that there is also functionality for converting barcodes to UUIDs, et cetera.
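A sketch of the TCGAutils helpers mentioned above, run on the bundled miniACC object so that no download is needed. The splitting function is written as `TCGAsplitAssays`, which I understand to be the name in more recent TCGAutils releases; at the time of the talk it was referred to as `splitAssays`, so treat the name as an assumption about your installed version.

```r
library(MultiAssayExperiment)
library(TCGAutils)
data("miniACC")

# Tabulate the TCGA sample type codes found in each experiment.
sampleTables(miniACC)

# Lookup table that explains the codes (01, 10, 11, ...).
data("sampleTypes")
head(sampleTypes)

# Take one barcode from the first experiment and pull it apart.
bc <- colnames(miniACC)[[1]][1]
TCGAbarcode(bc)                                    # participant portion
TCGAbarcode(bc, participant = TRUE, sample = TRUE) # participant plus sample
TCGAbiospec(bc)                                    # barcode fields as a table

# Keep primary solid tumors (code "01") as their own set of assays.
tumors <- TCGAsplitAssays(miniACC, "01")
names(tumors)
```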
The other thing I didn't get to, because I was talking too long, I think, is the subsetting mechanisms in MultiAssayExperiment, which let you subset on the rows, the columns, or the assays based on the position here, so i, j, or k, and more. I think I'll leave it at that and have you explore on your own when you have time. I'll take any questions if time permits.
Is the R code for the Shiny app accessible? Yes. The first thing in the Shiny app is the actual source code. Let me go back to this. You can see the source code at the Waldron lab MultiAssayWorkshop link here. So the source code is this Rmd file. Other questions? If you are interested, you can quiz yourself with these polls on the Pathable website. Thank you, everyone, for attending, and I'd like to acknowledge Levi Waldron, Martin Morgan, and Ludwig Geistlinger for their amazing support.
Thank you, Marcel. It's a great tutorial. It has come so far over the last couple of years of giving it; it's so nice and simple. I think you also created a bunch of new fans of learnr. The one-hour format really makes you focus on what's really important. There was another question: could you provide the link to the PowerPoint? I'm not sure yet; it's on tinyurl.com [link unclear]. We'll see you all at the poster session next. I'm looking forward to seeing how that goes.