TensorFlow World 2019
October 30, 2019, Santa Clara, USA
Great TensorFlow Research Cloud projects from around the world (TF World '19)

About speaker

Zak Stone
Product Manager for Cloud TPUs at Google

Zak Stone is the product manager for TensorFlow and Cloud TPUs (Tensor Processing Units) on the Google Brain team. He is interested in making hardware acceleration for machine learning universally accessible and useful. He also enjoys interacting with TensorFlow's vibrant open-source community. Prior to joining Google, Zak earned a PhD in Computer Vision and founded a mobile-focused deep learning startup that was acquired by Apple. While at Apple, Zak contributed to the on-device face identification technology.


About the talk

The TensorFlow Research Cloud (TFRC) has made more than 1,000 Cloud TPUs available for free to machine learning researchers all over the world. This talk presents a small sample of the exciting work that these researchers have accomplished with TFRC compute. You'll also learn how to request TFRC access to accelerate your own research projects.

Presented by: Zak Stone


Thank you very much. I'm delighted to be here today to talk to you about some of the fantastic TensorFlow Research Cloud projects that we've seen around the world, and to invite you to start your own. Whether you're here in the room, watching the livestream, or watching this online afterwards, any of you are welcome to get involved with TFRC. Let me briefly describe the context, since I'm sure you've heard this all day: there has been a massive improvement in computational capability driven by deep learning, and specifically these deep neural networks are enabling many new

applications that are exciting in all sorts of ways, touching all kinds of different data, ranging from images to speech to text, even full scenes. The challenge that many of you are probably grappling with is that these new capabilities come with profound increases in compute requirements. A while back, OpenAI did a study where they measured the total amount of compute required to train some of these famous machine learning models over the past several years, and the important thing to notice about this plot is that it's actually a log scale on the

compute axis. So there are tremendous increases in the total amount of compute required to train the state-of-the-art deep learning models, and there's this consistent trend up and to the right: these new capabilities are being unlocked by the additional compute power, as well as lots of hard work by many researchers all around the world in this open community. Unfortunately, these tremendous demands for compute to meet the new opportunities opened up by deep learning are coming to us just as Moore's law is ending. We benefited for decades upon decades

from consistent increases in single-threaded CPU performance, but all of a sudden we're down to maybe 3% per year. Who knows, there could always be a breakthrough, but we're not expecting the extraordinary year-over-year gains from single-threaded performance that we've enjoyed in the past. In response to that, we believe that specialized hardware for machine learning is the path forward for major performance wins, cost savings, and new breakthroughs across all the research domains that I mentioned earlier. Now, at Google we've developed a family

of special-purpose machine learning accelerators called Cloud TPUs. We're on our third generation now, and two of these generations are available in the cloud, the second and the third. Just to give you a brief overview of the hardware that I'm going to be talking about for the rest of the session: we have these individual devices here, Cloud TPU v2 and v3, and as you can see, we've made tremendous progress generation over generation, from 180 teraflops to 420 teraflops. We've also increased the memory from 64 gigabytes of high-bandwidth memory to 128, which matters a lot if you care about

these cutting-edge natural language processing models like BERT or XLNet or GPT-2. But the most important thing about Cloud TPUs isn't just these individual devices, the boards that you see here with the four TPU chips connected to a host that is not shown. It's the fact that these devices are designed to be connected together into multi-rack machine learning supercomputers that let you scale much further and program the whole supercomputer, across as many racks as you like, as if it were a single machine.
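For a sense of what that single-machine programming model looks like in practice, here is a minimal sketch of connecting to a Cloud TPU with TPUStrategy in recent TensorFlow 2.x. The TPU name "my-tpu" and the toy model are placeholders, not code from any of the projects discussed below.

```python
import tensorflow as tf

# Connect to a Cloud TPU by name; "my-tpu" is a placeholder for your own TPU.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # A model built inside the strategy scope is replicated across every TPU core,
    # whether that is a single 8-core device or a larger Pod slice.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```

The same code path covers an individual device and a Pod slice, which is the point being made here about scaling.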

At the top here you can see the Cloud TPU v2 Pod spanning four racks. The TPUs are in those two center columns and the CPUs are on the outside. That machine totals 11.5 petaflops, which you can also subdivide any way you wish, and the TPU chips in particular are connected by a two-dimensional toroidal mesh network. That enables ultra-fast communication, much faster than standard data center networking, which is a big factor in performance, especially if you care about things like model parallelism and partitioning. The v3 Pod, which is actually liquid-cooled, is even bigger; the picture wasn't big enough to hold all the racks, which extend out to the

side, and it gets you up over a hundred petaflops. If you're using the entire machine simultaneously, on a raw FLOPs basis that's competitive with the largest supercomputers in the world, although these TPU supercomputers use lower precision, which is appropriate for deep learning. Now, I've mentioned performance, and I just wanted to quantify that briefly. In the most recent MLPerf Training v0.6 competition, Cloud TPUs were able to outperform on-premise infrastructure. What you can see here is the TPU results in blue compared with the largest on-premise cluster results that were

submitted to the competition. In three of the five categories that we entered, Cloud TPUs delivered the best top-line results, including 84% increases over the next entry in machine translation, which is based on the Transformer, and in object detection, which used an SSD architecture. Obviously these numbers are evolving all the time; there's tremendous investment and progress in the field, but I just wanted to assure you that these TPUs can really deliver when it comes to high performance at scale. But today we're here to talk about research, and about expanding access to this

tremendous computing power, to enable researchers all over the world to benefit from it, explore the machine learning frontier, and make their own contributions to expand it. In order to increase access to cutting-edge machine learning compute, we're thrilled to have been able to create the TensorFlow Research Cloud to accelerate open machine learning research and hopefully to drive this feedback cycle, where more people than ever before have access to state-of-the-art tools. They make new breakthroughs, they publish papers and blog posts and open-source code and give talks

and share the results with others, and that helps even more people gain access to the frontier and benefit from it. So we're trying to drive this positive feedback loop, and as part of that, we've actually made well over a thousand of these Cloud TPU devices available for free to support this open machine learning research. If you're interested in learning more right now, there will also be more information at the end of the talk. This pool of compute, the TFRC cluster, involves not just the original devices that we included, but

we recently added some of the v3 devices of the latest generation. And if you're really pushing limits, there's the potential for Cloud TPU Pod access: if you've gone as far as you can with these individual devices, please email us, let us know, and we'll do our best to get you some access to TPU Pods. The underlying motivation of all this is a simple observation, which is that talent is equally distributed throughout the world, but opportunity is not, and we're trying to change that balance to make more opportunities available to talented people all around the world, wherever they

might be. We've had tremendous interest in the TFRC program so far: more than 26,000 people have contacted us interested in TFRC, and we're thrilled that we've already been able to onboard 1,250 researchers, and we're adding more researchers all the time. So if you haven't heard from us yet, please ping us again; we really want to support your research. The feedback loop is just starting to turn, but already I'm happy to announce that more than 30 papers in the academic community have been enabled by

TFRC, and many of these researchers tell us that without the TFRC compute they couldn't possibly have afforded to carry out this research. So I feel like, in a small way, we've helped turn up the rate of progress, so the whole field is moving just a little bit faster, and we really thank you all for being part of that. I'm most excited, though, to share some of the stories directly from the individual researchers and the projects they've been carrying out on the TFRC Cloud TPUs. These researchers come from all over the world. I only have time to

highlight four projects today, but the fantastic thing is that three of these researchers have been able to travel here to be with us in person, so you'll get to hear about their projects in their own words. We'll start with Victor Dibia, here in the upper left. Welcome, Victor, come on up. Hi, thank you, I'm really excited to be here. My name is Victor Dibia. I'm originally from Nigeria and currently I'm a research engineer with Cloudera Fast Forward Labs in Brooklyn, New York. About a year ago, I got really fascinated with

this whole area at the intersection of art and AI, and given my background and my interest in human-computer interaction and applied artificial intelligence, it was something I really wanted to do. Right about that time, I got the opportunity to have access to TFRC, and today I'm going to talk to you about the results of those experiments and some of the research I've been working on in this space. So why did I work on this project? As a little kid growing up in eastern Nigeria, my extended family and I would travel to the village once a year, and one of the

most interesting activities I got to see as part of those trips was something called the Eastern masquerade dances of Africa. What would always happen is that there were these dancers with these complex, elaborate masks, and as a kid I was really fascinated. So this project was a great way to bridge my interests in technology and the arts, and as a research engineer, a way to express my identity through a project like this. In addition, there's this growing area of AI-inspired art, or AI-generated art, and one thing you notice in that space is that most of the datasets that are used for this sort

of exploration are mainly classical European art, Rembrandt, Picasso. So a project like this is a way to diversify the conversations in that area. And finally, for researchers working in the generative modeling domain for images, it can be a really interesting way of examining how generative models behave today. So what did I do? As we all know, the best results for most machine learning projects come from the data collection phase, so I started out collecting about 20,000 images

and then culled that down to about 9,300 high-quality images. At this point I was ready to train my model. The beautiful thing is that the TensorFlow team has made available a couple of reference models, so I started my experiments using a DCGAN implementation built with the TensorFlow TPUEstimator. The picture you see on the right is just a visualization of the training process for a deep convolutional GAN: it starts out from random noise, and as training progresses, it learns to generate images that are really similar to the input data distribution.

Starting from the reference implementation, there were two interesting things that I did. The first was to modify the configuration of the network, the generator and discriminator parameters, to let it generate larger images: 64px and 128px. The second was to write a custom data input pipeline that lets me feed my dataset into the model and get it trained. The thing you should watch out for here is to ensure that your data input pipeline matches what the reference model implementation is expecting.
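As one illustration of that point, here is a minimal sketch, not Victor's actual code, of a tf.data input pipeline producing the fixed-shape, large batches a TPU-based DCGAN reference model typically expects; the TFRecord schema, image size, and batch size are assumptions.

```python
import tensorflow as tf

IMAGE_SIZE = 64   # the talk's stable models generated 64px images
BATCH_SIZE = 256  # placeholder; TPUs prefer large, fixed batch sizes

def parse_example(serialized):
    # Assumes TFRecords with a single JPEG-encoded "image" feature (an assumption,
    # not necessarily the schema used in the project).
    features = tf.io.parse_single_example(
        serialized, {"image": tf.io.FixedLenFeature([], tf.string)})
    image = tf.io.decode_jpeg(features["image"], channels=3)
    image = tf.image.resize(image, [IMAGE_SIZE, IMAGE_SIZE])
    # Scale to [-1, 1], the range a DCGAN generator with a tanh output expects.
    return (tf.cast(image, tf.float32) / 127.5) - 1.0

def input_fn(file_pattern):
    files = tf.data.Dataset.list_files(file_pattern)
    ds = tf.data.TFRecordDataset(files)
    ds = ds.map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.shuffle(10_000).repeat()
    # drop_remainder=True gives the static batch shapes TPU execution requires.
    ds = ds.batch(BATCH_SIZE, drop_remainder=True)
    return ds.prefetch(tf.data.AUTOTUNE)
```

The fixed image size and `drop_remainder=True` are the details most likely to trip up a pipeline feeding a TPU reference model.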

It took me a number of experiments to track down an error there and a couple of days to fix it, and in all I ran about 200 experiments. And this is where the TensorFlow Research Cloud really makes a difference: ordinarily something like this would take a couple of weeks to get done, but once all the bugs were fixed I was able to run most of these experiments in a matter of days. At this point, all the images you see here look like masks, but the interesting thing is that none of them are real; none of them exist in the real world. These are all interesting artistic interpretations of what an African mask could look like. So what could I do next?

At this point, I have a model that does pretty well, but the question is: are these images actually new, or has the model just memorized my input data and regurgitated it? To answer this question, I took a semantic search approach. I used a pretrained VGG16 model to extract features from all of my dataset and all of my generated images, and I built an interface that allows a sort of visual, grid-based inspection: for every generated image, I can find the top 20 images in the dataset that are most similar to that image.
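A minimal sketch of that kind of nearest-neighbor check, using the pretrained VGG16 that ships with Keras as the feature extractor; the input size, pooling choice, and helper names are illustrative assumptions rather than details from the talk.

```python
import numpy as np
import tensorflow as tf

# VGG16 without its classification head; average-pooled activations act as embeddings.
extractor = tf.keras.applications.VGG16(
    include_top=False, pooling="avg", weights="imagenet", input_shape=(224, 224, 3))

def embed(images):
    # `images` is a float32 array of shape [n, 224, 224, 3] with values in [0, 255].
    x = tf.keras.applications.vgg16.preprocess_input(images.copy())
    feats = extractor.predict(x, verbose=0)
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)  # unit-normalize

def top_k_similar(generated_image, dataset_embeddings, k=20):
    # Cosine similarity on unit vectors reduces to a dot product.
    query = embed(generated_image[None, ...])[0]
    scores = dataset_embeddings @ query
    return np.argsort(-scores)[:k]  # indices of the k most similar dataset images
```

If the generated images' nearest neighbors are not near-duplicates, that is evidence the model is producing genuinely new masks rather than copying the training set.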

So this is one way to actually inspect the results from a model like this. Going forward: the best stable model I was able to train could only generate 64px images, but can we do better? It turns out that you can use super-resolution GANs. This is one of my favorite results, where we have a super-resolution output from the Topaz Gigapixel AI model, and here is another interesting result: you probably can't see it very clearly here, but there is detail in the super-resolved image that just does not

exist in the low-resolution image. So if you're a researcher, or you're an artist or a software engineer interested in this sort of work, go ahead: all of the code I used for this is available online, and there's a blog post that goes along with it. Thank you. Next up we have Wisdom. Come on up, Wisdom. Thank you, thank you, I'm glad to be here. I'm Wisdom. I'm from Togo, I grew up there, and I'm currently a visiting

researcher at Mila in Montreal, doing research in grounded language learning and natural language understanding under the supervision of Yoshua Bengio. For the past year, I've been interested in medical report generation. Basically, when you go to see a radiologist, you get your chest X-ray taken, and the radiologist takes a short amount of time to interpret the X-ray and produce a radiology report that has mainly two sections, findings and impression. The findings contain observations from different regions of the chest, basically saying whether there's an abnormality in

that region or not, and the impression section highlights the key findings. Because this happens very fast and radiologists can make mistakes, the community has been thinking of ways to augment radiologists with AI capabilities to provide a third eye, and we have chest X-ray classifiers that work well for that. The problem with this classification setting is that you are going from the image to labels. So where's the step where we generate the radiology report? That's what I've been interested in. Naively, you could try something like image captioning to

generate a report: basically, condition a language model on the input image and maximize the log-likelihood. But this doesn't work very well on medical reports, because there is nothing in this formulation that ensures the clinical accuracy of the reports being generated, and this is a big problem that I thought would be interesting to solve. I found inspiration in grounded language learning. In this setting, you have a model that receives natural language instructions to achieve a task, and to correctly achieve this task the model needs good natural language instructions. I thought we

could do the same thing for medical report generation. So rather than, or on top of, maximizing the log-likelihood as we do in image captioning, we can also reward the language model based on how well its output was useful for a medical task, let's say classification, for instance. Here, the classifier takes the generated radiology report as input, and we can also add the image for superior accuracy. What is interesting here is that in the backward pass, we are updating the language model parameters based on how well the output was useful for classification.
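A rough sketch of how such a classifier-in-the-loop objective could be wired up, assuming a REINFORCE-style reward; the tensor names, shapes, and the exact way the reward combines with the likelihood term are assumptions, not necessarily the formulation used in this work.

```python
import tensorflow as tf

def report_generation_loss(token_logprobs, clf_logits, labels, nll, reward_weight=1.0):
    """Combine the usual captioning loss with a clinical-usefulness reward.

    token_logprobs: [batch, seq_len] log-probs of the tokens in a sampled report.
    clf_logits:     [batch, num_findings] classifier outputs on that sampled report.
    labels:         [batch, num_findings] ground-truth finding labels.
    nll:            scalar teacher-forced negative log-likelihood of the reference report.
    """
    # Reward: how well the generated report supports the correct findings.
    reward = -tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=clf_logits),
        axis=-1)                                  # [batch], higher is better
    reward = tf.stop_gradient(reward)             # the reward acts as a fixed signal
    # REINFORCE-style term: raise the log-probability of reports that score well.
    policy_loss = -tf.reduce_mean(reward * tf.reduce_sum(token_logprobs, axis=-1))
    return nll + reward_weight * policy_loss
```

The key idea is that the language model is penalized when its sampled reports fail to carry the clues the classifier needs, which is exactly the grounding described above.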

That's a good starting point for accuracy, because we are forcing the language model to output text that has enough pertinent medical clues to ensure an accurate diagnosis. I trained this model on the MIMIC-CXR dataset, which is the largest to date with chest X-rays and free-text radiology reports. To train this I needed an extensive amount of compute, and I started this project as a master's student in India, training on my laptop; you know how that went. So I applied to TFRC to have access to TPUs, and I got it. So I had

suddenly many TPUs for free to do a lot of experiments at the same time, and that was useful to iterate fast in my research, because I needed to reproduce the baselines as well as optimize my proposed approach. Overall, my usage was more than 8,000 hours; I used v2 devices and also started to try the Pod devices. I couldn't leave you without some results from the model. This is a case of hiatal hernia, and if you speak to a radiologist, they will tell you that the

main evidence for hiatal hernia is the retrocardiac opacity you can observe with that green arrow. In the red box, you see the ground-truth radiology report for this case. The green one is a baseline I reproduced and optimized as best I could, but you can see that the model completely misses out on the key findings, and that's a consequence of training on log-likelihood: because you are maximizing the confidence of the model, it will avoid taking risks and will tell you most of the

time that your X-ray is fine. The blue box is my approach, and you can see that by forcing the language model to output things that are useful for classification, the model finds the right words to use to justify this case of hiatal hernia. I'm very excited to have presented this work very recently at Stanford, at a scientific symposium organized with the Stanford School of Medicine, and I'm excited for what's next for this project. I'm thankful to the TFRC team for providing the resources that we used for this

work. Thanks for having me. Thank you very much, Wisdom. Next up we have Jade. Hi everyone, I'm Jade Abbott. I'm from South Africa, so I've come a very long way to be here. I work at Retro Rabbit, but I'm not here today to talk about what I do at work, which these days is a lot of BERT stuff. I'm here to talk about my side project; I kind of picked up research as a hobby. What we're trying to do is work on African languages. There are a lot of African languages, which I'll speak about a little bit later,

and very little research. So what we did here is look at five of the Southern African languages and build baseline machine translation models. This feeds into a kind of greater project called Masakhane; Masakhane means "we build together" in isiZulu, and this project is trying to change the NLP footprint on the continent. So the problem: we have over 2,000 languages, which is quite insane. Many of these languages are exceptionally complex, some of the most complex in the world, and in contrast we've got almost no data.

If we do have data, we have to dig to find it, and what's even worse is there's almost no research. So if you are a beginner NLP practitioner on the continent, currently learning about machine translation or NLP, and you do a search trying to find something in your language, there's nothing, right? You can look in some obscure journals and you'll find maybe some old linguistic publications, and that's kind of the extent of it. That just makes it hard if you're trying to build models and you're trying to spur this research. In this graph you

can see the normalized paper count by country at the 2018 NLP conferences. The more orange a country is, the more papers there are, and you can see the African continent is largely empty. Even at Widening NLP, a workshop at ACL that's meant to be inclusive of more people from around the world, the picture isn't that much different. So what did we do? I like to say we took some existing data that we scrounged around to find, and we took the state-of-the-art models, and we smashed them

together. They had never seen each other, this model and this data, and then we did a side analysis to try to optimize the NMT algorithm's hyperparameters and see how well it would do on these Southern African languages. The goal of this is just to spur additional research, because right now there's nothing, and to provide these baselines. And this is where TFRC came in. Like I said, this is my side project, so it's not as if I have lots of money to do this. I tried renting GPUs from a cloud provider, and it would have

cost me an arm and a leg. I reached out to TFRC and they were super happy to lend us TPUs. We basically used the Tensor2Tensor framework to train up these models, using government data and parallel corpora that we managed to find. One of the things that we found out, which was simultaneously presented at ACL on a different language pair, English to German, is that optimizing the byte-pair encoding tokenization helps these very complex, agglutinative languages; it allows the model to handle their

agglutinative nature, which means you keep attaching more pieces onto a word, each of which changes the meaning a little bit. Optimizing that hyperparameter can make a really significant difference in the BLEU score. We did this in something like two or three weeks before a workshop, and instead of taking days to run these experiments, it would take a couple of hours to train these models. So yes, thank you to TFRC for that.
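To make that subword tokenization point concrete, here is a small sketch of sweeping the subword vocabulary size with the sentencepiece library; the talk itself used Tensor2Tensor's built-in subword tokenizer, so the library, file names, and vocabulary sizes below are stand-ins.

```python
import sentencepiece as spm

# Train BPE models with different vocabulary sizes, then compare downstream BLEU.
# "train.en-xx.xx" is a placeholder for a target-language training corpus.
for vocab_size in (4000, 8000, 16000, 32000):
    spm.SentencePieceTrainer.train(
        input="train.en-xx.xx",
        model_prefix=f"bpe_{vocab_size}",
        vocab_size=vocab_size,
        model_type="bpe",
    )

# Smaller vocabularies break agglutinative words into more, reusable pieces.
sp = spm.SentencePieceProcessor(model_file="bpe_8000.model")
print(sp.encode("Ek kan nie vandag kom nie", out_type=str))
```

Treating the vocabulary size as a hyperparameter to tune per language, rather than a fixed default, is the practical takeaway here.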

Here is a quick overview of the results for the five languages we had. In almost all of the cases we improved substantially; for this one we almost doubled the score (bigger is better with BLEU). Afrikaans is actually a European-based language, and it preferred the older statistical machine translation architecture, which did better there but unfortunately runs on CPU and actually takes a lot longer. For one language we had very few sentences, and the sentences were also very messy, and in that case Google Translate performed better. But what was really cool, for English to Afrikaans, as you can see, is that

the attention actually captured some of the language structures. Here we've got a particular instance where "cannot" in Afrikaans becomes "kan nie", and at the end of the sentence you have to say "nie" again for the double negative, which is quite tricky, and you can kind of see that it actually solved it. If it still shows on this screen, there's a yellow highlight where you can see the attention matches "cannot" to "kan nie" and to the final "nie", and those are the right words. Here we've got one of the novel translations: the source sentence, the reference translation, and the translation our

model generated. Very few of you in the audience are likely to speak the language, so we asked a speaker to translate the model's output back to English, and you can see it talks about sunflower fields and lands and flowering periods, and it picked up "blossoming". So you can see that it has done really, really well despite having so little data. That brings me to my call to action, the Masakhane project. You can go check it out, and the idea is to basically change this map. This map

is our current representation of where the researchers across the African continent who are currently working on these languages come from, and like I said, the idea is to spur this research. So if you know a language from Africa, or even if you don't, and you want to contribute time or resources or advice (a lot of us are very junior teams who don't have supervisors or people who work on machine translation), or even if you'd just like to come and say hi, drop us a message. And yeah, thank you very much to TFRC for helping us see what else we can actually build.

Thanks so much, Jade. Thank you. Let me get the picture too. Thank you so much. Let's have a round of applause for Victor, Wisdom, and Jade for coming to present their research. Thank you so much, it's really a pleasure to have you here. There's one more project I want to show. Jonathan wasn't able to be here in person, but this is just fantastic work and I wanted to showcase it. Jonathan and his colleagues at MIT won the Best Paper award at ICLR with a paper called "The Lottery Ticket Hypothesis", where they're looking for

sparse, trainable neural networks within larger networks. To dig into this really interesting idea: many of the neural networks that we're used to have a tremendous number of parameters. They're very large, and one thing that the OpenAI graph earlier didn't show is that these neural networks are getting larger and larger over time. There's generally a correlation between larger model sizes and higher accuracy, as long as you have enough training data, but it takes more and more compute power again to train these larger and larger networks. So Jonathan is asking this

question: what if you could find just the right subnetwork within this much larger network, one that could perform the same task as the larger network and lead to the same accuracy? Those subnetworks are, somewhat whimsically, called lottery tickets. At ICLR, Jonathan used small networks, because that's what he could afford, to show some initial encouraging evidence for this hypothesis. But the really interesting part of this research, at least from my perspective in big compute, is: does it work at scale? Right? And

so to find that out, Jonathan got in touch with us here at TFRC to try to scale up this work, and he was kind enough to state, at the bottom here, that for his group, research at this scale would be impossible without TPUs. Let me share a little bit more about his work and then about his findings. The lottery ticket hypothesis, as I mentioned before, is related to this broader category of techniques called pruning, and to be clear, there are many approaches to pruning neural networks, but most of these approaches take place after the network has been trained. So

you've already spent the compute time and cost to get the network trained, and then you modify the trained model to try to set weights to zero or reduce the size of the model, or distill it, which is another approach, into a smaller model. But it's interesting to ask: could you just train a smaller network from the start? Could you prune connections early on, maybe at the very beginning or at least early in the training process, without affecting the learning too much? Like I said, this initial paper showed some very promising results on small networks and small datasets. So with the TFRC Cloud TPUs,

Jonathan took this to models we're all familiar with, ResNet-50 trained on ImageNet, and he found slightly different behavior but was able to validate the hypothesis. You can't go all the way back to the beginning; you can't prune all the way back, at least with current understanding, and there may be further breakthroughs, but you can go almost back to the first epoch, cut the network down, and then train from there with a much smaller network without any harm to accuracy. In particular, with ResNet-50 you could remove 80% of the parameters at epoch 4

and not hurt the accuracy at all, and you're training to something like ninety epochs or further. So this is a real compute savings, and there are plots down below, with ResNet-50 and Inception, showing this rewind epoch and showing that the test error stays low until you rewind too far back.
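A minimal sketch of the pruning-with-rewind idea in Keras terms: magnitude pruning per layer, a snapshot taken at the rewind epoch, and a callback that keeps pruned weights at zero. This is an illustration of the technique, not Jonathan's actual code, and the 80% sparsity is simply the figure quoted above.

```python
import numpy as np
import tensorflow as tf

def magnitude_masks(model, sparsity=0.8):
    """Binary masks keeping the largest-magnitude (1 - sparsity) fraction of each kernel."""
    masks = {}
    for w in model.trainable_weights:
        if "kernel" in w.name:
            values = np.abs(w.numpy())
            masks[w.name] = (values > np.quantile(values, sparsity)).astype(np.float32)
    return masks

class ApplyMasks(tf.keras.callbacks.Callback):
    """Re-zero pruned weights after every batch so they stay pruned while retraining."""
    def __init__(self, masks):
        super().__init__()
        self.masks = masks

    def on_train_batch_end(self, batch, logs=None):
        for w in self.model.trainable_weights:
            if w.name in self.masks:
                w.assign(w * self.masks[w.name])

# Sketch of the workflow:
# 1. Train the full network briefly and snapshot the weights at the rewind epoch (e.g. epoch 4).
# 2. Keep training to convergence, then compute masks from the final weights.
# 3. Restore the rewind snapshot and retrain with ApplyMasks, so ~80% of the weights stay at zero.
```

The rewind step is what distinguishes this from ordinary post-training pruning: the surviving weights are reset to their early-training values before the sparse network is retrained.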

One thing I really appreciated about Jonathan's work is that, in addition to carrying out these experiments, publishing them, sharing them with the community, and inspiring other research, he built some interesting tools to help manage all the compute. What you're seeing here is actually a Google Sheet, a spreadsheet that Jonathan wired up with scripts to orchestrate all of his experiments. This was a fully declarative system: he could add a row to the spreadsheet, and behind the scenes his scripts would kick off a new experiment, monitor the results, bring them back into the spreadsheet, and flag error conditions if anything had gone wrong.
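In the same spirit, a rough sketch of what spreadsheet-driven orchestration can look like with the gspread library; the sheet name, column names, and training command are hypothetical, since Jonathan's actual scripts are not shown here.

```python
import subprocess
import gspread

gc = gspread.service_account()  # authenticates with a service-account credentials file
sheet = gc.open("lottery-ticket-experiments").sheet1

# get_all_records() returns one dict per data row, keyed by the header row.
for row_idx, row in enumerate(sheet.get_all_records(), start=2):
    if row["status"] != "pending":
        continue
    # Launch one experiment per pending row; the row's columns carry its hyperparameters.
    cmd = ["python", "train.py",
           "--sparsity", str(row["sparsity"]),
           "--rewind_epoch", str(row["rewind_epoch"])]
    result = subprocess.run(cmd, capture_output=True, text=True)
    status = "done" if result.returncode == 0 else "error"
    sheet.update_cell(row_idx, 1, status)  # write the outcome back to column A
```

The spreadsheet doubles as the experiment queue and the results log, which is exactly the "old technology meets new" appeal mentioned below.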

At the end of the day, this spreadsheet had thousands upon thousands of rows showcasing all the different experiments, which were then searchable and shareable and usable in all the ways that a spreadsheet is. I thought this was a great mix of old technology and new, and this was a serious amount of compute: Jonathan estimates that he used at least 40,000 hours of Cloud TPU compute on TFRC. I hope that underscores that we're really serious about providing a large amount of compute for you to do things that you couldn't do otherwise and then share them with the research community. It's not just about the projects that you've heard today; these are just samples from the thousands of researchers working on TFRC, and I'd really like to personally encourage all of you to think about your

next research project happening on TFRC. It can be an academic project, or it can be a side project, or it can be an art project, as long as it's intended to benefit the community, as long as you're going to share your work with others, make it open, and help accelerate progress in the field. We'd love to hear from you. If you're interested in getting started right now, you can visit the link below, g.co/tputalk, and enter the code TFWORLD. That will go straight to us and the organizers of this event, and we're happy to make available as a starting point five regular

Cloud TPUs and 20 preemptible Cloud TPUs for several months for free. The rest of Google Cloud services still cost money, so this is not completely free, but the TPUs are the overwhelming majority of the compute cost for most of these compute-intensive projects, so we really hope that this enables things that you couldn't do otherwise. And if you get to the limits of what you can do with this initial quota, please reach out. Let us know, tell us about what you're doing and what you'd like to do. We can't promise anything, but we'll do our best to help with more, maybe even a lot

more compute capacity, including the access to Pods that I mentioned earlier. So thanks again to all of you for being here today and for joining the livestream online. Thanks to our speakers for sharing their work in person; we'll all be happy to hang out here afterwards and answer any questions you might have. For those of you who are here in the room, please rate this session in the O'Reilly app, and thank you all very much. Hope you're enjoying TF World.
