About the talk
This video is part of the R/Medicine 2020 Virtual Conference.
Okay, welcome everybody. Our next talk is by Karthik Ram: reproducible notebooks with holepunch. This talk is also recorded. Karthik will be available to answer questions in the chat as well as at the end of the talk. Thank you.

Hey there, I'm Karthik Ram from the University of California, Berkeley. Today I want to talk to you about reproducible notebooks with a package that I have developed called holepunch. So let's imagine you've completed a project and you've shared all of the code on GitHub. It's also
Imagine for a second that the code runs, and not just on your machine. Is everyone able to clone your repository and then render all of the outputs from your notebooks? Quite likely not. There are many reasons why this can happen, but a few common ones are that others don't have access to your data, or your code made API calls and those APIs are now unreachable. But the more likely reason is that the dependencies have changed over time. The R package ecosystem is continuously evolving; functions can change their behavior or simply become
deprecated, and someone else with the exact same set of packages but different versions might not be able to reproduce your exact same output. So there are two things you can do to alleviate this. First, document all of your dependencies. The next thing you want to do is maybe create a Docker container with all of them. Both of these are very time-consuming, not something everyone has time to do for every project, but there are ways to make this a little bit easier. So I'd like to introduce you to the concept of a research compendium. The original idea, proposed by Robert Gentleman, is to
ship a collection of data, code, and text together as a compendium, which can then be easily shared, managed, and updated. If you're interested in learning more about research compendia, I gave a longer talk on this topic at the 2019 RStudio conference, and I've linked to that talk in the slides. It turns out that the R package structure is ideally suited for a compendium, and this is possible because R packages contain a file called DESCRIPTION, which is composed of simple key-value pairs in a file format called the Debian control format. And if you'd like to turn a collection of code into a compendium,
the simplest thing you can do is add a DESCRIPTION file similar to the one that you've seen in many R packages. It does not have to be comprehensive; it just needs to have a few required fields. In this example, I have set the Type to Compendium, given it a name and a version number, and listed a few dependencies, including one that is available only on GitHub. A nice thing is that with this file, someone can easily install all of these dependencies using the [unclear] package. But so far
we have only listed the dependencies, but not the exact versions of the dependencies, and this is where Binder comes in. Binder is an open source project that makes it very easy to share analyses that you have in any type of notebook. It's worth noting that Binder is the open source project and mybinder.org is one instance of that project; there can be many instances of Binder. If you enable Binder on a collection of code and add a badge to your GitHub repository, anyone can click that badge and then
be dropped into an RStudio Server session in their browser, with all of the dependencies already installed and the code ready to run. It's really that simple: tapping a badge on your README launches a new instance of Binder. Binder then looks for a recent Docker image, and if it's not able to find one, it takes several minutes to build a new one. Once fully launched, it drops you into an RStudio Server session with everything ready to go.

How do you set up Binder for
your R project? There are a few different ways to do this, but I believe the simplest way is with the holepunch package, and the workflow is very simple. You start by loading the holepunch library, and then you write two files: a DESCRIPTION file and a Dockerfile. You'll then add a badge to your README. At this point, you will commit and push your code to GitHub. And then finally, you ask Binder to build a Docker image from the Dockerfile. So the workflow is: write a DESCRIPTION, write a Dockerfile, generate the badge, and build on Binder,
and these map very nicely to four functions in the holepunch package, and I'll walk through them right now. So you'll start by writing a DESCRIPTION. You can just stick to the defaults, but you might want to name your package, describe it, and add a version number. You'd then want to create a Dockerfile, and the thing you might want to change is the maintainer. Everything else you can leave at the defaults. But it's interesting to note that holepunch picks a Docker image to start from, and it chooses one that already comes with RStudio
and the tidyverse to really speed things along. holepunch also looks at the date when you last modified an R script or R Markdown file in your project and pops that date in here, and this date maps onto the version of R that is used in your repo. And of course, the last step is to generate a badge. The hub option here defaults to mybinder.org, which, as I said, is one instance of Binder. You can leave that as the default, or swap it out for any one of the many other Binder hubs that are publicly available.
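
The four steps described above can be sketched in R roughly as follows. The transcript does not name the functions; the names and arguments below are taken from the holepunch package README, and the compendium name, description, and maintainer values are placeholders. This is a minimal sketch that assumes holepunch is already installed and that the working directory is the root of a git repository with a GitHub remote.

```r
# Minimal sketch of the holepunch workflow; all string values are placeholders.
library(holepunch)

# Step 1: write a DESCRIPTION file that lists the project's dependencies.
write_compendium_description(
  package = "Your compendium name",
  description = "Your compendium description"
)

# Step 2: write a Dockerfile. By default the date of the most recently
# modified file is used to pick the matching R version; only the
# maintainer is overridden here.
write_dockerfile(maintainer = "your_name")

# Step 3: generate a Binder badge for the README.
generate_badge()

# Commit and push the new files to GitHub before continuing.

# Step 4 (optional): ask Binder to build the image ahead of time.
# Clicking the badge triggers the same build.
build_binder()
```

After the push, clicking the badge in the README should open the project in RStudio on Binder; the first launch can take several minutes while the image builds.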
The final build step is optional only because clicking the badge also runs the build. So those are the steps to going from a collection of R code on GitHub to having a live, executable notebook that anybody can run. This can be great for showcasing examples from your paper, code examples for teaching, and also to show off use cases and tutorials. There are some limitations of Binder, though. It only has one gigabyte of RAM, so you're not going to run computationally intense
examples. Sessions time out after 10 minutes of being inactive, and you still have to find a way to have all of your data ready at analysis time, so you can either commit small data to GitHub or read the data in from elsewhere. So this is a really nice way for you to get more visibility for your work, and if you're interested in learning more, here are links to the documentation and to the slides. Thank you very much.

Thank you. So we have a few questions. The first question is: will writing a Dockerfile make launching
Binder instances faster? Packages like tidyverse are installed from source on Linux and take forever to install.

Thank you. Yes, it will actually make it go faster, and that's part of the reason that I wrote the holepunch package. You can skip holepunch entirely and pop your GitHub URL into Binder, but it's going to try and build, as you say, all of the tidyverse from source. The image that I launch from already includes all of the tidyverse, all of the tidyverse dependencies, RStudio Server, Binder, and Binder
dependencies. So at most, it's going to install a few extra packages, which really speeds things up. And build_binder does this sort of asynchronously, so it won't hold up your R terminal. You can just continue doing other work, and when it's ready, it'll just launch the server for you.

And another question that we had: does holepunch apply to any R project, or only R packages?

So you would normally not do this for R packages, because you're not trying to create a virtual instance
of your R package for someone to run through. You'd actually do this for code examples. I don't know of anybody that's turned on Binder for a package directly, but if you've got examples in a GitHub repo, that's where you would turn this on.

My repository already has a Dockerfile for IPython notebooks. Should I create a separate Dockerfile for running the Rmds?

So this is an interesting question. I'm not sure. If your Dockerfile can be modified to install
some Binder elements, and you make sure you install the RStudio elements, then it's probably fine to use the same Dockerfile. I would say that you can have multiple Dockerfiles in the same repo and let Binder only look at one, and leave a second Dockerfile for local work or other work that you might do. The package installs all of the Binder-specific configuration in a hidden folder called .binder, which will not interfere with your standard Dockerfile. That's the only issue: it is possible that with just a standard
Dockerfile without all the other elements, you might end up on Binder but not be able to run RStudio.

Wonderful. Thank you so much, Karthik. This is a really cool package, and we're going to move to the next session. Thanks.