Contributed Talks 5
Shian Su (The Walter and Eliza Hall Institute) PhD Student
Hena R Ramay (International Microbiome Centre, University of Calgary) Consultant Bioinformatician
Jayaram Kancherla (University of Maryland, College Park)
Emma Jablonski (ETH Zürich and EPFL (Swiss Data Science Center)) Data Science Engineer
10:00 AM - 10:55 AM EDT on Friday, 31 July
Reproducible workflows with the RENKU platform
Analysing DNA methylation using nanopore long read sequencing
FemMicro16S: Advancing the science of women’s health through open data sharing
Quickly compose custom interactive genomic visualization apps in R/Bioc with epiviz components
Moderator: Matthew McCall, Charlotte Soneson, Simone Bell
Welcome to the fifth contributed talk session. We have four wonderful speakers today. As you have questions, please post them to the Pathable poll and note which speaker your question is for. We're going to have four 10-minute talks, followed by roughly 15 minutes at the end of the session where we'll try to get through as many questions as possible. So the first talk today is from Emma Jablonski.
Renku is a platform for computational reproducibility and collaboration. A quick note about the Swiss Data Science Center: it has been a joint venture between EPFL in Lausanne and ETH Zurich since 2017, and its mission is to accelerate the adoption of data science in academia and industry.
So, back to computational reproducibility. Imagine the following situation: you present your work at a conference, and soon after someone sends you an email asking for help. Great, someone wants to use your work, and it would be nice if you could spend some time helping this potential collaborator run your code. Better if it just ran, and just ran for other interested parties as well, like your current collaborators, students, business partners, decision-makers, and of course you help everyone trust your results and make it easier for them to build upon what you've already done. They have to be able to run your code, after all.
But there's more to running code than just the code itself. For instance, did you provide the data that you used as input? Was it pre-processed? Is it shareable? Did you provide the execution environment, for instance the machine specs and the complete list of installed software? What about the code itself? Is it shareable? Did you express how the input data, executed code, and outputs are related in a workflow? Did you record any manual steps?
Luckily, lots of people have run into the same problem and written software to help manage these things. That's great on paper, but this is what the tool landscape looked like as of 2019 for all the infrastructure to manage your code, data, environments, and workflows for computational reproducibility. What if someone chose for you the open-source, community-supported, best-practice tools and glued them all together for you to use in the cloud? That's where Renku comes in.
Your project, which consists of your code, data, and some customizable template files we provide, is stored in a connected GitLab. You can launch cloud-based interactive development environments like RStudio. When you push your changes, continuous integration builds and stores a new Docker image, so that the next time you launch, the changes you made to the environment configuration, for instance in the Dockerfile, requirements.txt, or install.R, come pre-installed. As for workflow capture, Renku templates also come with an installation of the Renku command-line interface for re-running and updating results as you're developing the code, without having to learn a specific workflow language or write your own workflow files. When you push changes that have resulted from renku run, a visualization of the lineage is available on Renku. And what if you want to track data added to your project or share results? There are data stores like Zenodo and Dataverse.
So, let's see it in action. This is renkulab.io, our public instance of the Renku platform. You can log in or sign up with GitHub or any of the other options. Once we're in, we're taken to our home page, where we might not have any projects yet. If you're here for the first time, you can go through the tutorial, but what we're going to do right now is create a new project. You can do that by clicking the plus button and choosing new project. Then we want to fill out some information about our project, so a Bioconductor analysis package. That's going to live in our own namespace, and we're going to choose the template that will give us some files that define our reproducible environment that we can launch later. The Bioconductor 3.11 template will let us build and launch a container with Renku dependencies and Bioconductor packages from version 3.11. Then we select its visibility: whether outsiders can access the project, whether they have to be logged in, or whether only people belonging to the project can.
Once we create this project, it's actually a repository sitting inside GitLab, which you can view in GitLab if you want by clicking this button to open up a new tab. Other people can also visit this page if you mark it as a public project, and they can fork it, which makes a copy for themselves, and they can launch environments. We're going to directly launch a new interactive environment by clicking on these buttons. To do this, the Docker image has to be built. The Docker image is being built based on the files that are inside the project: the CI file, this one inside the project, is telling GitLab to build based on this Dockerfile here, which is also in the project. Once the image is built and available, we can launch the environment. These are some default configurations, and then we can start the environment. It might take a little bit of time to launch. Once you see that the little green arrow has popped up, you can connect to your notebook, which has been running since this time, and this drops us into our RStudio interactive environment that has been built from the Dockerfile in our project. So let's take a closer look at that Dockerfile really quickly.
You can see that this is coming from this Renku parent image at a specific release, and if you want to take a closer look at what's inside, you can go to this link. Besides that, all of this is customizable. You can change which system packages are installed, and you can change which R packages are installed. When you make these changes, the environment will launch with your packages already installed. If you want to do the same thing for Python, you can do that as well. This is a regular Docker image, and anything you put in here goes into your image. But in order to make sure everything gets built, you need to git add, git commit, and git push any changes you made back to RenkuLab. Once you push, you can visit your other tab on RenkuLab and see that your changes have been saved. In the background, GitLab is building your new image, so that if you start a new environment you get the latest commit for both the environment and the code.
Now we can look at how to get data inside. There are two different ways you can add a dataset: you can create one by uploading files from your machine, or you can import a dataset that's already been created and stored in a Renku project, on Zenodo, or on Dataverse. You can also add individual files to your project without them being in a dataset, but the advantage of a dataset is that you can add extra metadata about where the data has come from and a description of what's inside. When you use workflows, you can get a high-level description of how your output files have been generated.
So now we're in this other project that already has a dataset, as we can see from its dataset card, which is just a zipped CSV with some flight data. In this particular workflow, we're just going to extract some flights, run a simple summary analysis, and then make a plot. This has already happened in this workflow, and we got this plot that's kind of all gray, so we're going to make a small change by adding color. Once we make a change, we have to commit it, because Renku relies on Git versioning, so we make a commit. Then all we have to do next is to call renku update on the output file that we want to regenerate, because now it needs to be generated with the changed code. As we can see, it's re-running the workflow, and then we get our output with different colors. One extra thing to mention is that these workflows can flow across projects: you can imagine creating a dataset from the output of your workflow that another project can use as the input for its workflow.
On packages: you can also develop your own package using Renku. Here's an example of one that has all of the template files for making an R package. And finally, you can ask questions on our Discourse, you can talk to us on Gitter, you can go through the getting-started tutorials in our documentation, and you can submit bug and feature requests in our repository. So thank you for listening, and just to remind you, Renku is still under development.
Thank you, very, very interesting. For those of you that joined a little late, just a reminder to send questions using the poll feature in Pathable and note which speaker your question is for.
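The update step from the Renku demo above (regenerate an output only when the code or data that produced it has changed) can be illustrated with a small Python sketch. This is not Renku's implementation or API: the `Lineage` class, its methods, and the file names are hypothetical, and the sketch simply compares content hashes recorded at the last run with the current ones.

```python
import hashlib


def digest(text: str) -> str:
    """Content hash used to detect changes in code or data."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Lineage:
    """Toy lineage record (hypothetical, not Renku's data model).

    For each output it remembers the hashes of the inputs that produced it.
    """

    def __init__(self):
        self.recorded = {}  # output name -> {input name: digest at run time}

    def record_run(self, output, inputs):
        self.recorded[output] = {name: digest(body) for name, body in inputs.items()}

    def is_stale(self, output, inputs):
        """True if the output was never produced or any input has changed."""
        current = {name: digest(body) for name, body in inputs.items()}
        return self.recorded.get(output) != current

    def update(self, output, inputs, step):
        """Re-run `step` only when the output is stale; report whether it ran."""
        if not self.is_stale(output, inputs):
            return False
        step(inputs)
        self.record_run(output, inputs)
        return True


# Hypothetical project contents: a plotting script and a small data file.
lineage = Lineage()
inputs = {"plot.R": "geom_point()", "flights.csv": "carrier,delay\nAA,5\n"}
make_plot = lambda files: None  # placeholder for the real plotting step

lineage.update("plot.png", inputs, make_plot)  # first run: step executes
lineage.update("plot.png", inputs, make_plot)  # nothing changed: skipped
inputs["plot.R"] = "geom_point(aes(colour = carrier))"
lineage.update("plot.png", inputs, make_plot)  # code changed: step executes again
```

The point of the design is that the person asking for an update names only the output; which steps need to run again follows from the recorded relationship between inputs and outputs.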
Our next talk is from Shian Su.
Thank you. My name is Shian Su, and I'm a PhD student at the Walter and Eliza Hall Institute of Medical Research. Today I'd like to talk to you about inferring DNA methylation using nanopore data and the software I'm developing for it. DNA methylation is an epigenetic regulator. 5mC methylation is the addition of a methyl group to a cytosine. CpG methylation in promoter regions leads to suppression of gene expression, and it plays a vital role in genomic imprinting, X-inactivation, and the suppression of repeat elements.
The technology that we're using to study DNA methylation is Oxford Nanopore sequencing, which passes DNA molecules through nanopores. Sensors in the nanopore detect disruptions in a current, which produces a signal. With prediction algorithms, that signal can be used to infer the bases that passed through the pore. It's also been discovered that the signal contains sufficient information to also infer the methylation states of the bases passing through the nanopore.
The analysis pipeline that I use to analyze my experiments is shown here. I'm using Guppy for basecalling, minimap2 for alignment, nanopolish for methylation calling, WhatsHap for haplotyping, differential methylation and differentially methylated region detection using bsseq, and I visualize using the package I'm currently developing, NanoMethViz. The green arrows indicate steps that can require quite a bit of manual data wrangling in order to get the software to work together, whereas from basecalling down to methylation calling it's quite a straightforward process, and the outputs from each piece of software feed into the next quite nicely.
NanoMethViz is currently up on GitHub if you want to have a look at it and test something; some example data is included in the package. In addition to providing visualizations, NanoMethViz also provides a bit of data wrangling to help you through this pipeline that I've used to analyze my data. It also provides a tabix-based data format for working with larger-than-memory data, which is necessary because the data is quite large.
The main way to visualize methylation in NanoMethViz is the spaghetti plot, and I provide two different interfaces for plotting: the first is plot_gene, which takes a gene name, and the other is plot_region, which allows you to specify an arbitrary genomic region that you would like to plot. This example is a plot of the gene Peg3. There are a few elements of this plot that I'll talk you through. The first are the thin lines, the spaghetti strands. They are the methylation probabilities of individual long reads; each spaghetti strand represents one long read in this region. Then the thick lines through the middle are the aggregate trends, a sort of average of all of the calls in the area, and that gives you a better overall look at what the trends are. Then you can see this gray shaded box, which is a method for annotating a region of the plot, and in this case the annotation represents a region that was found to be differentially methylated by bsseq.
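The relationship between the thin per-read lines and the thick aggregate line described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name and the input layout are hypothetical, it is not NanoMethViz code, and it takes a plain per-site mean, whereas the plot in the talk shows a smoothed trend.

```python
from collections import defaultdict


def aggregate_trend(reads):
    """Average methylation probability per position across reads.

    `reads` maps a read name to a list of (position, probability) calls,
    i.e. one "spaghetti strand" per read. Returns a position-sorted list
    of (position, mean probability), i.e. the aggregate line.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for calls in reads.values():
        for pos, prob in calls:
            sums[pos] += prob
            counts[pos] += 1
    return [(pos, sums[pos] / counts[pos]) for pos in sorted(sums)]


# Hypothetical calls for two long reads overlapping the same region.
reads = {
    "read_1": [(100, 1.0), (105, 0.25)],
    "read_2": [(100, 0.5)],
}
trend = aggregate_trend(reads)
```

Grouping the reads by haplotype before calling such a function would give one aggregate line per haplotype, which is how the two parental origins can be compared in one plot.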
Then below that we have the isoform annotation along with the exons for the Peg3 gene. So we can see in this plot where the methylation deviates between the two parents of origin, in this case the two haplotypes, and that's happening near the transcription start site of the Peg3 gene, across multiple isoforms. As it's currently implemented, only the data required for plotting is pulled into memory. The data that I have is tending to be too big for a laptop to pull fully into memory for plotting.
This data comes from a pilot project of F1 crosses between a female Black 6 mouse and a male Cast mouse, a castaneus-strain mouse. We sequenced three female samples, one from each of these mice, and we've done that using a PromethION flow cell. By choosing to work with these parent strains, it allows us to very effectively haplotype each of our reads, which enables a lot of the analysis to be done. I'm also able to study imprinted genes, which gives us a bit of ground truth when we are looking at these haplotype-level analyses. Using nanopolish, the methylation calls resulted in a file that obviously I can't load into memory. So I've converted that into a 10.2-gigabyte tabix file, and the tabix format allows me to pull out entries within a genomic region without loading all the data. Having it compressed also makes it a lot easier to move around and share.
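The idea of pulling out only the entries within a genomic region can be sketched as follows. This is a toy, in-memory stand-in written in Python, not the tabix format or its API: the class name, row layout, and example coordinates are assumptions made for illustration. A real tabix file keeps the data compressed on disk and uses its index to decide which blocks to read.

```python
import bisect


class RegionIndex:
    """Toy position-sorted table with per-chromosome binary search."""

    def __init__(self, rows):
        # rows: iterable of (chrom, pos, value); group by chromosome, sort by position
        self.by_chrom = {}
        for chrom, pos, value in rows:
            self.by_chrom.setdefault(chrom, []).append((pos, value))
        for entries in self.by_chrom.values():
            entries.sort(key=lambda entry: entry[0])
        # Separate position lists so bisect can search them directly
        self.positions = {
            chrom: [pos for pos, _ in entries]
            for chrom, entries in self.by_chrom.items()
        }

    def query(self, chrom, start, end):
        """Return (pos, value) rows with start <= pos <= end on `chrom`."""
        entries = self.by_chrom.get(chrom, [])
        positions = self.positions.get(chrom, [])
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_right(positions, end)
        return entries[lo:hi]


# Hypothetical methylation calls: (chromosome, position, probability).
index = RegionIndex([
    ("chrX", 300, 0.1),
    ("chrX", 100, 0.9),
    ("chr7", 200, 0.5),
    ("chrX", 200, 0.4),
])
hits = index.query("chrX", 150, 300)
```

Because the rows are sorted by position, a query touches only the slice it returns, which is the property that makes region-based plotting feasible on data too large to load whole.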
So the mean coverage of this data is around 46x, quite evenly spread across the genome, and that equates to around 70% of the time, which is quite a decent level of coverage for studying methylation. A look at the MDS plot of the log methylation ratio shows that the haplotypes separate quite nicely in the first dimension; it's a separation of the methylation profiles by haplotype. Then in the second dimension there's some sort of separation of the samples from before we split them by haplotype: sample one is at the top, sample three in the middle, and sample two on the bottom.
Using this data, we can have a look at methylation as it relates to X-inactivation. Initially I was a bit surprised by this result, because it's showing that the Cast X, in a green color, is a lot less methylated across the chromosome, and that is the inactivated X that would be suppressed. In my mind, methylation has a suppressive effect, so the suppressed chromosome should really have a higher level of methylation. Looking through the literature, previous work saw that the inactive X as a whole is indeed less methylated, but if you focus in on just the promoter regions, that's when you see the inactive X being more methylated. When I plot that for my own data, as seen in this box plot, that is indeed the case for my data as well. I've also plotted this aggregate of all the genes across the X chromosome, scaled to relative positions, and it's a bit more subtle than in the box plot, but you do see that near the transcription start site the Cast paternal inactive X is just a bit more methylated, while across the rest of the gene it is obviously much less methylated.
Zooming in with the spaghetti plots, it's interesting to look at this gene, which has a very complex imprinting pattern. We can see that it's imprinted in one direction in one area of the gene and imprinted in the opposite direction in another area of the gene. With the annotation below, we can also see that these areas correspond to the transcription start sites of two different isoforms of the gene. The other gene I found quite interesting to observe was Xist, which is a key controller of X-inactivation. What this is showing is that Xist is demethylated on the paternal chromosome, which is the inactive X chromosome, and that makes sense because Xist is expressed from the inactivated chromosome, and so it needs to be demethylated in order to be expressed.
The current state of NanoMethViz is such that I'm providing upstream support for nanopolish, f5c, and Megalodon, so I can import the output of any of those methylation-calling tools, and downstream I support analysis using bsseq and related functions. Any of the data from these upstream tools can be converted into the object appropriate for bsseq, and I will hopefully submit this to Bioconductor for the next release. So thank you all for listening. I would like to thank the organizers for giving me this opportunity, and obviously all the members of my lab, who've helped me immensely in the development of this package. Thank you very much.
Next, Hena will be doing her talk.
Thank you for this opportunity to talk about our work on combining vaginal microbiome data into a package called FemMicro16S for easy access for our community. I'm at the International Microbiome Centre, and my collaborators are Sahar and Nora.
They also work at the centre at the University of Calgary. Let me start by talking a little bit about the microbiome in general. I'm sure most of you have heard about the microbiome, and you have heard things like "a healthy gut, a healthy mind." Basically, we're all carrying around trillions of microorganisms all over different parts of our body. This collection has a certain degree of fluidity, in the sense that it changes with the different environmental experiences that we have, medical interventions, and some of the fad diets that we switch to sometimes. But we're beginning to appreciate how much our microbiome, just like our human genome, is inherited from our parents, predominantly from our mothers, and data supporting this notion is now emerging.
The first indications towards this idea came from the Human Microbiome Project, which included the female vagina as a body site. From their data it was evident that the vaginal microbiome is very different, and it's different because it's dominated by one of the Lactobacillus species when it's in a healthy condition. In an unhealthy condition you see an overgrowth of the other microorganisms present in that community, and this type of overgrowth is called bacterial vaginosis. It's prevalent in about 10 to 30% of the female population, and what it does is increase our risk of infection, so STI transmission, and it can also cause preterm births. It also has a huge economic burden, about 4.8 billion, and because of these reasons there are a lot of research projects and efforts in this field. From a few high-impact papers, we can see that they've detected some species that can be linked to important outcomes in women's health, and most of the data is 16S amplicon data. A problem with these studies is that they're inconsistent in terms of their classification and naming, and the reason behind this is that they use custom databases, so it makes it very hard for us to do cross-study comparisons.
My colleague Sahar was interested in this problem, so she wanted to look at these different datasets. She and I got together; she collected the datasets, and I was helping her with the bioinformatics. In the meantime I attended a workshop on the Human Microbiome Project where they introduced Bioconductor packages which had the data, and I found it very easy to access. So based on that workshop, we started thinking about our package, because we had a lot of data that we had put together, and we decided to call it FemMicro16S.
Our criteria for these datasets were that we wanted at least 20, we wanted the biomaterial to be vaginal swabs, and we decided to stick with SRA datasets, just for the sake of consistency and to make it easy for us to process all the datasets. All these datasets had to pass through our pipeline for processing, and then we used DADA2, a Bioconductor package, for doing the
required steps. In the end, after the pipeline is run, we get a sequence table, or ASV table, and then we take these sequences and classify them using another Bioconductor package called DECIPHER to get the taxonomic table, which is a consensus of the results.
So what does the data look like? We have about 5,000 samples; samples, not subjects, because there are some repeated measures, but we didn't have the information available to identify them. These are from all the datasets, and we were able to classify about forty to fifty percent at genus level and about 8 to 11% at species level. Other studies report higher classification because they had specialized databases, so they could get up to 90%. As for metadata, the variables for which we had complete information were things like location, biomaterial, or instrument, which were not the most useful ones. We have some pregnancy information and preterm birth, but for a lot of the variables we do not have all this clinical information. The samples are from four different continents, but most of them are from North America and Europe; there are small cohorts from Africa and Asia. We have mostly swab samples, with some amniotic fluid and urine samples also. For things like health status, we don't have information on a lot of the samples. We have some information on preterm birth, but that's just because on SRA people don't upload all the information.
So our package has all these datasets. It's easy to access: we have a few functions that you can use to either access individual datasets or access all of them at
once. What it does is return a phyloseq object with the tables in there, or give you a list of phyloseq objects. So this package is available. Some of the future features that we're going to add: maybe look for a different data structure for microbiome data, and include more metadata. We have already contacted some of the authors for the metadata, but there are some who still need to be contacted so we can get some more of it.
There are some lessons that we have learned through this process that I would like to share with you. As far as I'm concerned, in general, we've seen that variable names change between datasets: in one dataset it is called health status, in another it's called disease status; one has ethnicity, another has race. So when we have datasets with just ethnicity or race, it's really hard to merge them. We also saw that there are multiple runs in a dataset, but they did not have matching identifiers. We also saw samples of different read lengths within a dataset, and different runs from different times. In some datasets I did not find primers, and that was because there are different protocols that people use, and some of the primers are not sequenced, so you don't find them in your reads.
I'd just like to acknowledge my collaborators Sahar and Nora, and also acknowledge David McIntyre. He's one of the authors who shared data so we could add it to our package. I also want to thank Charlotte, because I keep on bugging her with questions about Bioconductor, and she's really kind in answering those questions straight away.
Another very interesting talk. The final talk in this session is from Jayaram Kancherla.
I'm Jayaram Kancherla. I'm at the Center for Bioinformatics and Computational Biology. These components controls located between the components and the, and the page, we have the price of diesel prices for different browsers and Technology applications for most modern processor. New ways of being attracted to make application are all available this one all the way. End of this is a last view of the navigation for 165. Cochrans are interesting. Lets you create a relationship based on
something. If you are there any ideas animation, Look at this package and it's going to track the package automatically use how to use. Add cilantro for tomorrow. I want to look at the data from the TCG. How to create a couple of ways in which you can use the first one in the entire genome. You can also specify a navigation in this region and navigation, to make sure nothing would happen. Sofia, the First. Create and then one extra step in this process is to create an assassin.
Unless you can see, no, I did the same place. Same as before we chat. I created an interactive and the navigation controls out and make requests to the website and you can do it. You can stop that. Express Employment in the library is the ITV track UPS coming from Can I bring another wolf, track a package? Package. Simpson documentary, but it is practiced as part of the body actually am. And this isn't it is a nice way to share your pain is when you have environment the element which is looking at the
navigation Navigation. And it's a nice way to free to see where interesting units of the Baker. And you can also be 2020, Mr. Please give me a call. Thank you.
I'll start trying to get through as many of these questions as I can; we've gotten a lot. Just a reminder, if we don't get to your question, please follow up with the speakers. The first question, for Emma, is: each of the BioC 2020 workshops has a GitHub repo with a Dockerfile and an image on Docker Hub with the software and data used. Would it be hard to make all these available for others to clone and use on Renku?
A Renku project is a Git project, so you just have to create an account on RenkuLab and then change the remote and push your project, or you could mirror it directly from wherever it is on GitHub.
Shian, what is the advantage of tabix over, say, HDF5 or TileDB? Is there a DelayedArray interface to tabix in the package?
Actually, it's really just a gzipped tab-separated file, so it's easy to work with. With the tabix indexing, access to data by genomic region is very efficient. I haven't compared it to HDF5, but it's very good for querying, and as far as I know there's no DelayedArray interface for that. Because it's more of a table than a matrix, I don't really think that's relevant to my use case.
Thanks for putting together this package; manual curation is a painful process for sure. A natural question is, can you talk about the technical details of the data storage? Are the objects stored in the package itself, or are they downloaded?
Yeah. So three or four variables are stored, and then when the function is called, an object is created at that time. And just to comment on the metadata: yes, it has been very painful to put it together, just because people report different things, or a lot of the time they don't report things, even though the data is public. So it's been a bit of a pain.
If you're looking at that is looking at and if you're passing a large object. Take me to the practice.
Next, if I want to use Renku for teaching, does every student need to create a new project, or would they be able to join an existing one?
You can make one project for the course or for each lesson, and then all the students can fork the project. If they fork the project, they get a copy of the project, and they can pull to get the changes if you make any. One other thing you can do is create a group on GitLab, and then everyone can have permission to access the project.
I understand each line in the spaghetti plot represents an individual strand. Does the number of thin spaghetti lines contribute to the error around the thick line?
It's actually something I need to think about. I use a subset of points to create that smoothing line, since smoothing over a hundred thousand points is actually computationally expensive. So I don't expect that the smoothed trend really carries an estimate of the variation from the individual strands, and I guess I agree that I should have an error band. This gives me an idea that I should probably smooth within each sample, and then use the variation from that to represent the amount of noise in the area.
Are you planning on adding more datasets?
Yes, we are. We have 20 right now. We have looked at, I think, 30 to 35 datasets, and we had to take some out just because there wasn't enough information or there was a problem, but Sahar is already looking at more datasets, and they can probably be added in the future.
How much can a Renku deployment scale? On RenkuLab, can I run on a large dataset?
It depends on what resources you have access to and what you mean by large. As for how it scales, basically in the back end there's a Kubernetes cluster with some pods, and you can also reach out to us if you have some specific case and we'll help you.
I actually think there were a couple of questions related to this, about whether it can be deployed on high-performance computing resources or inside another organization's network.
Yes. We actually have several instances deployed on other infrastructures and with different sets of resources, so there are some with GPUs, and we're exploring right now what the best way of interoperating with HPC is. But currently, if you have HPC resources, you can clone your project locally to the HPC and then use the workflows on HPC. We're trying to figure out better ways of doing it in a more seamless way, instead of having to push up again.
On the tabix question, it's just a table set up like a data frame, and in my object it's kind of just storing a path to a tabix file and waiting for a query. It pulls back a data frame.
All right, we were very efficient with the time and got to most of the questions, but if we didn't get to your question, please reach out to the speakers. I'd like to thank all four speakers again for very wonderful talks. The final contributed talk session is this afternoon at 3 p.m., so see you after that.