John L. Hennessy, Professor of Electrical Engineering and Computer Science, served as President of Stanford University from September 2000 until August 2016. In 2017, he initiated the Knight-Hennessy Scholars Program, the largest fully endowed graduate-level scholarship program in the world, and he currently serves as Director of the program. Hennessy, a pioneer in computer architecture, joined Stanford’s faculty in 1977 as an assistant professor of electrical engineering.
About the talk
In this Keynote Session, John Hennessy, pioneering computer scientist, distinguished engineer, and joint winner of the Turing Award for his work developing modern day computer chip architecture, shares his thoughts on the future of computing in an era of artificial intelligence.
Boy, I'm delighted to be here today and have a chance to talk to you about what is one of the biggest challenges we've faced in computing in 40 years, but also a tremendous opportunity to rethink how we build computers and how we move forward. As Gordon Moore said, all exponentials come to an end. It's just a question of when, and that's what's happening with Moore's Law. If we look at DRAMs, that's probably a good place to start, because we all depend on the incredible
growth in memory capacity. And if you look at what's happened in DRAMs, for many years we were achieving increases of about 50% a year; in other words, going slightly faster even than Moore's Law. Then things slowed down, and if you look at what's happened in the last seven years, this technology we were used to seeing boom, the number of megabits per chip more than doubling every two years, is now going up at about 10% a year, and it's going to take about seven years to double. Now, DRAMs are a particularly odd technology because they use deep trench capacitors, so they require a very particular kind of fabrication
technology. What's happening in processors, though? If you look at the data on processors, you'll see a similar slowdown. Moore's Law is that red line going up there on a nice logarithmic plot. Notice the blue line: that's the typical Intel microprocessor at that date. It begins diverging, slowly at first, but look what's happened in roughly the last 10 years: the gap has grown. In fact, if you look at where we are in 2015, 2016, we're more than a factor of 10 off where we would be had we stayed on that Moore's Law curve. Now, the thing to remember
is that there's also a cost factor in here. Fabs are getting a lot more expensive, and the cost of chips is actually not going down as fast. So the result of that is that the cost per transistor is actually increasing at a worse rate. So we're beginning to see the effects of that as we think about architecture. But if the slowdown of Moore's Law, which is what you see all the press about, is one thing, the big issue is the end of what we call Dennard scaling. Bob Dennard was an IBM employee; he was the guy who invented the one-transistor DRAM, and he made a
prediction many years ago that the power per square millimeter of silicon would stay constant, because voltage levels would come down. What does that mean? If the power stays constant and the number of transistors increases exponentially, then the energy per transistor is actually going down, and in terms of energy consumption it's cheaper and cheaper and cheaper to compute. Well, what happened? Look at technology improving on a standard Moore's Law curve, and then at the blue
line, which shows what's happening to power. And you all know this. I mean, you've seen microprocessors now that slow their clock down, that turn off cores; they do all kinds of things because otherwise they're going to burn up. I never thought we'd see a processor actually slow itself down to prevent itself from overheating, but that's where we are. And so what happened is that Dennard scaling began to slow down starting about '97, and then since 2007 it has essentially halted. The result is a big change: all of a sudden energy, power, becomes the key limiter,
not the number of transistors available to designers; it is their power consumption that becomes the key limiter. That requires you to think completely differently about architecture, about how you design machines. It means inefficiency in the use of transistors in computing, inefficiency in how an architecture computes, is penalized much more heavily than it was in that earlier time. And of course, guess what: all the devices we carry around, all the devices we use, are running off batteries. So all of a sudden energy is a critical
resource. What's the worst thing that can happen? Your cell phone runs out of power, your smartphone runs out of power. But think about the devices we're going to have that are always on and permanently on, which are expected to last 10 years on a single battery by using energy-harvesting techniques. Energy becomes the key resource in making those things work efficiently. And with always-on devices, with things like Google Assistant, you're going to want your device on all the time, or
at least you want it on all the time, even if the screen is not. So we're going to have to worry more about power. But the thing that many people are surprised by is that energy efficiency is a giant issue in large cloud configurations. You notice that green slice there: those are the servers. But look at the size of that red slice: that red slice is the cost of the power plus cooling infrastructure, on the scale of what you're spending on processors. So energy efficiency becomes a really critical issue as we go forward. And with the end of Dennard scaling, there's no more free lunch
for us. For a lot of years we had a free lunch: it was pretty easy to figure out how to make computation more energy efficient. Now it's a lot harder. And you can see the impact of this. This just shows you 40 years of processor performance: what's happened to uniprocessor, single-processor, performance and then multiprocessor performance. In the early years of computing, the beginning of the microprocessor era, we were seeing about 22% improvement per year. Then came the creation of RISC in the mid-1980s and a dramatic use of instruction-level parallelism: pipelining and multiple issue.
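The growth rates quoted so far translate directly into doubling times. The sketch below is an illustration added to this transcript, not something shown in the talk; it assumes smooth compound annual growth, and the function name `doubling_time_years` is hypothetical.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate.

    annual_growth is a fraction, e.g. 0.10 for 10% per year.
    """
    if annual_growth <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # Rates quoted in the talk; the labels are only for display.
    for label, rate in [
        ("DRAM capacity, earlier era", 0.50),
        ("DRAM capacity, last seven years", 0.10),
        ("early microprocessor performance", 0.22),
    ]:
        years = doubling_time_years(rate)
        print(f"{label}: {rate:.0%}/year doubles in {years:.1f} years")
```

At 10% a year the doubling time comes out a little over seven years, which matches the figure given for DRAM above.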
We saw this incredible period of about 20 years where we got roughly 50% performance improvement per year. 50%! That was amazing. Then came the beginning of the end of Dennard scaling. That caused everybody to move to multicore. What did multicore do? Multicore shoved the efficiency problem from the hardware designer to the software people. Now the software people had to figure out how to make use of those multicore processors efficiently. But Amdahl's Law came along and reared its ugly head; I'll show you some data on that. And
now we're in this late stage, where it looks like we're getting about 3% performance improvement per year. Doubling could take 20 years. That's the end of general-purpose processor performance as we know it, as we've been used to for so many years. Why did this happen? Why did it grind to a halt so fast? Well, think about what was happening during that RISC era, where we were building these deeply pipelined machines: 15, 16, 17 stages deep. A machine like that needs to have 60
instructions that it's working on at once. Sixty instructions! How does it possibly get 60 instructions? It uses speculation: it guesses and tries to execute them. But nobody can predict branches perfectly. Every time you predict a branch incorrectly, you have to undo all the work associated with that misprediction. You've got to back it out; you've got to restore the state of the machine. And if you look inside a typical Intel Core i7 today, on integer code roughly 25% of the instructions that get
executed end up being thrown away. Guess what: the energy still got burnt to execute all those instructions, and then I threw the results away and had to restore the state of the machine. A lot of wasted energy. That's why the single-processor performance curve ended, basically. But we see similar challenges when you begin to look at multicore. Amdahl's Law, formulated more than 40 years ago, is still true today. Even if you take large data centers with heavily parallel workloads, it's very hard
to write a big, complicated piece of software and not have small sections of it be sequential, whether it's synchronization or coordination or something else. So think about what happens: you've got a 64-processor multicore in the future, and suppose 1% of the code is sequential. Then that 64-processor multicore only runs at the speed of a 40-processor one. But guess what: you paid all the energy for a 64-processor chip executing all the time, and you only got 40 processors out of that, slightly more than half. That's the problem. We've got to break through this efficiency barrier. We've got to rethink how we
design machines. So what's left? Well, there are software-centric approaches. Can we make our systems more efficient? It's great that we have these modern scripting languages. They're interpreted and dynamically typed, and they've really liberated programmers to get a lot more code written and create incredible functionality. They're efficient for programmers. They're very inefficient for execution, and I'll show you that in a second. And then there are hardware-centric approaches, what Dave Patterson and I call domain-specific
architectures: design an architecture which isn't fully general purpose, but which does a set of domains, a set of applications, really well, much more efficiently. So let's look at what the opportunity is. This is a chart that comes out of a paper by Charles Leiserson and a group of colleagues at MIT, called "There's Plenty of Room at the Top." They take a very simple example, admittedly: matrix multiply. They write it in Python. They run it on an 18-core Intel processor. And then they proceed to optimize it. First they rewrite it in C; that speeds it up
47 times. 47! Even a speedup of 20 would be really remarkable. Then they parallelize it across the cores and get a factor of 9 out of that. Then they rewrite it by doing memory optimization: they block the matrix and allocate it to the caches properly, and that gives them a factor of 20. And then, finally, they rewrite it using Intel AVX instructions, the vector instructions in the Intel Core, domain-specific instructions that do vector operations efficiently. That gives them
another factor of 10. The end result is that the final version runs 62,000 times faster than the initial version. It is a simple example, but it shows the potential of rethinking how we write the software and making it better. So what about these domain-specific architectures? Really, what we're going to try to do is make a breakthrough in how efficiently we build the hardware. And by domain-specific, we're referring to a class of processors which do a range of applications. They're not like, for example,
a piece of hardware that does one thing and never anything else. Think instead of a set of processors which do a range of applications that are related to a particular application domain. They're programmable; they're useful in that domain; they take advantage of specific knowledge about that domain when they run, so they can run much more efficiently. Obvious examples: neural network processors, doing things that focus on machine learning, are one example. GPUs are another example of this kind of thinking, right? They're programmable in the
context of doing graphics processing. So for any of you who have ever seen any of the books that Dave Patterson and I wrote, you know that we like quantitative approaches to understanding things, and we like to analyze why things work. So the key about domain-specific architectures is that there is no black magic here. Going to a more limited range of architectures doesn't automatically make things faster. We have to make some specific architectural changes that win, and there are three big ones. The first is that we make more effective use of parallelism. We go from a
multiple instruction, multiple data world, which you'd see on a multicore today, to a single instruction, multiple data one. So instead of separate instruction streams, I have one set of instructions, and they're going to a whole set of functional units. It's much more efficient. What do I give up? I give up some flexibility when I do that. I absolutely give up flexibility, but the efficiency gain is dramatic. I go from speculative, out-of-order machines, what a typical high-end processor from Arm or Intel looks like today, to something that's more like a VLIW, which uses a set of operations where the compiler has decided
that a set of operations can occur in parallel. So I shift work from run time to compile time. Again, it's less flexible, but for applications where it works, it's much more efficient. I move away from caches. Caches are one of the great inventions of computer science, one of the truly great inventions. The problem is that when there's low spatial and low temporal locality, caches not only don't work, they actually slow programs down. So we move away from that to user-controlled local memories. What's the trade-off? Now somebody has to figure out how to map
their application into a user-controlled memory structure; I have to do the mapping of things myself. And then finally, I focus on only the amount of accuracy I need. I move from IEEE floating point to lower-precision floating point, or from 32- and 64-bit integers to 8-bit and 16-bit integers. I can do eight 8-bit integer operations in the same amount of time that I can do one 64-bit operation, so that's considerably faster. But to go along with that, I also need a domain-specific language. I need a language that will match up to that
hardware configuration. We're not going to be able to take code written in Python or C, for example, and extract the kind of information we need to map to a domain-specific architecture. We've got to rethink how we program these machines, and that's going to mean high-level operations. It's going to be vector-vector multiply, or vector-matrix multiply, or sparse matrix organization, so that I get the high-level information that I need and can compile it down into the architecture. The key in doing these domain-specific languages will be to retain enough machine
independence that I don't have to recode things: that a compiler can come along, take a domain-specific language, and map it to maybe one architecture that's running in the cloud, maybe another architecture that's running on my smartphone. That's going to be the challenge. Things like TensorFlow and OpenGL are a step in this direction, but it's really a new space. We're just beginning to understand it and understand how to design in this space. You know, I built my first computer almost 50 years ago, believe it or not.
I've seen a lot of revolutions in this incredible IT industry since then: the creation of the internet, the creation of the World Wide Web, the magic of the microprocessor, smartphones, personal computers. But the one I think is really going to change our lives is the breakthrough in machine learning and artificial intelligence. This is a technology which people have worked on for 50 years. And finally, finally, we made the breakthrough. And the basis of that breakthrough? We needed about a million times more computational power than we thought we
needed to make the technology work. But we finally got to the point where we could apply that kind of computing power. This is some data that Jeff Dean, Dave Patterson, and Cliff Young collected that shows there's one thing growing just as fast as Moore's Law: the number of papers being published in machine learning. It is a revolution. It's going to change our world. And I'm sure some of you saw the Duplex demo the other day. In the domain of making appointments, it passes the Turing test, which is an extraordinary breakthrough. It may not pass it in general terms,
but it passes it in a limited domain, and that's really an indication of what's coming. So how do you think about building a domain-specific architecture to do deep neural networks? Well, this is a picture of what's inside a Tensor Processing Unit. The point I want to make about this is that if you look at what uses up the silicon area, notice it's not used for a lot of control, and it's not used for a lot of caching. It's used to do things that are directly relevant to the computation. So this processor can do 256 x 256, that is,
roughly 64,000 8-bit multiply-accumulates every single clock. Every single clock! So it can really crunch through, for inference, enormous amounts of computation. You're not going to run general-purpose C code on this; you're going to run something that's a neural network inference problem. And if you look at the performance, here we've shown performance per watt. Again, energy is the key limitation, whether it's for your cell phone and you're doing some
kind of machine learning on your cell phone, or it's in the cloud: energy is the key limitation. So what we've plotted here is performance per watt, and you see that the first-generation Tensor Processing Unit gets more than 30 times the performance per watt of a general-purpose processor, partly by moving from floating point to lower-precision integer, which is much faster. So again, this notion of tailoring the architecture to the specific domain becomes really crucial.
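The arithmetic behind a 256 x 256 array of 8-bit multiply-accumulate units can be sketched in a few lines. This is an illustration added to the transcript, not the TPU's actual design: the function names, the symmetric quantization scheme, and the 1 GHz clock in the demo are all assumptions chosen for the example.

```python
def macs_per_clock(rows: int, cols: int) -> int:
    """Multiply-accumulate units in a rows x cols array, one MAC each per clock."""
    return rows * cols


def peak_ops_per_second(rows: int, cols: int, clock_hz: int) -> int:
    """Peak operations per second, counting each MAC as a multiply plus an add."""
    return 2 * macs_per_clock(rows, cols) * clock_hz


def quantize_int8(values, scale):
    """Map floats to signed 8-bit integers with a symmetric scale (assumed scheme)."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int8_dot(a_q, b_q):
    """Dot product of 8-bit operands; Python ints stand in for a wider accumulator."""
    return sum(x * y for x, y in zip(a_q, b_q))


if __name__ == "__main__":
    print(macs_per_clock(256, 256))  # size of the array described in the talk

    # Assumed round-number clock, only to show how a peak rate would be derived.
    print(peak_ops_per_second(256, 256, 1_000_000_000))

    scale = 1 / 64  # hypothetical quantization step
    a = quantize_int8([0.5, -1.0, 0.25], scale)
    b = quantize_int8([1.0, 0.5, -0.5], scale)
    # Rescale the integer result back to the original units.
    print(int8_dot(a, b) * scale * scale)
```

The point of the sketch is the trade described above: the operands are narrow integers, so each multiply is cheap, and only the final result is rescaled.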
So this is a new era. In some sense, it's a return to the past. In the early days of computing, as computers were just being developed, we often had teams of people working together: people who were early applications experts, working with people who were doing the beginning of the software environment, building the first compilers and the first software environments, and people doing the architecture. And they were working as a vertical team. That kind of integration, where we get a design team that understands how to go from application to representation in some domain-specific language to
architecture, and can think about how to rebuild machines in new ways to get this: it's an enormous opportunity, and it's a new kind of challenge for the industry going forward. But I think there are enough interesting application domains like this where we can get incredible performance advantages by tailoring our machines in a new way. And I think if we can do that, maybe it will free up some time to worry about another small problem, namely cybersecurity, and whether or not the hardware designers can finally help the software designers improve the security of our systems, and
that would be a great problem to focus on. Thanks.
Can you talk about some of the advances in quantum and neuromorphic computing?
So we've got to build a bridge from where we are today to post-silicon. The possibilities: there are a couple. I mean, there's organic, there's quantum, there's carbon nanofiber; there are a few different possibilities out there. I characterize them as technology of the future. The reason is that the people working on them are still physicists. They're not computer scientists yet,
or electrical engineers; they're physicists. So they're still in the lab. On the other hand, consider quantum, if it works: the computational power from a reasonably modest number of qubits, let's say 128 corrected qubits. Corrected qubits means they're accurate; it might take you a thousand qubits to get to that level of accuracy. But the computational power of 128 qubits, for things that make sense, protein folding, cryptography, is phenomenal, so we could get an enormous jump forward there. We need something post-silicon.
We've got maybe, you know, as Moore's Law slows down, maybe another decade or so before it comes to a real halt, and we've got to get an alternative technology out there, because I think there's lots of creative software to be written that wants to run on faster machines.
At the end of your presentation you briefly mentioned how we could start using hardware to increase security. Would you mind elaborating on that?
With security, everybody knows about Meltdown and Spectre. The first thing about
Meltdown and Spectre is to understand that what happened is an attack that basically undermined architecture in a way that we never anticipated. I worked on out-of-order machines in the mid-1990s. That's how long that bug has been in those machines, since the 1990s, and we didn't even realize it. And the reason is that, basically, our definition of architecture was: there's an instruction set, and programs run. I don't tell you how fast they run. All I tell you is what the right answer is.
Side-channel attacks that use performance to leak information basically go around our definition of architecture. So we need to rethink architecture. In the 1960s and 1970s, there was a lot of thought about how to do a better job of protection: rings and domains and capabilities. They all got dropped, and they got dropped because of two things. First of all, we became convinced that people were going to verify their software and it was always going to be perfect. Well, the problem is that the amount of software we write is far bigger than the amount of software we ever verify,
so that's not going to help. I think it's time for architects to begin to think about how they can help software people build systems which are more secure. What's the right architectural support to make more secure systems? How do we build those? How do we make sure they get used effectively? And how do we together, architects and software people working together, create a more secure environment? I think it's going to mean thinking back about some of those old ideas and bringing them back in some cases.
After I took my processor architecture class, which used your book...
I hope it didn't hurt
you.
I had a real appreciation for the simplicity of a RISC system. It seems like we've gone towards more complexity with domain-specific languages and things. Is that just because of performance, or has your philosophy changed? What do you think?
I actually think they're not necessarily more complicated. They have a narrower range of applicability, but they're not more complicated, in the sense that they are a better match for what the application is. The key thing to understand about RISC, the key
insight, was that we weren't targeting people writing assembly language anymore. That was the old way of doing things, right? In the 1980s, the move was on. Unix was the first operating system ever written in a high-level language, the first ever. The move was on from assembly language to high-level languages, and what you needed to target was the compiler output. So it's the same thing here: you're targeting the output of a domain-specific language that works well for a range of domains, and you design the architecture to match that environment. Make it as simple as possible, but no simpler.
With domain-specific architectures, what are examples of what might be the most promising areas for future domain-specific architectures?
An obvious one is things related to machine learning. I mean, they're computationally extremely intensive, both training as well as inference, so that's one big field. Virtual reality, augmented reality environments: if we really want to construct a high-quality environment that's augmented reality, we're going to need enormous amounts of computational power. But again, it's well-structured kinds of computations
that could match to those kinds of applications. We're not going to do everything with domain-specific architectures. They're going to give us a lift on some of the more computationally intensive problems. We still need general purpose, because the general-purpose machines are going to drive these domain-specific machines; the domain-specific machines will not do everything for us. So we're going to have to figure out ways to go forward on that front as well.
How do you think about some emerging memory technologies? How will they impact the future of computer architecture?
Thank
you. I think some of the more innovative memory technologies are beginning to appear: so-called phase-change technologies, which have the advantage that they can probably scale better than DRAM and probably even better than flash technologies. They have the advantage that lifetimes are better than flash too; the problem with flash is that it wears out. Some of these phase-change memory or memristor technologies have the ability to scale longer, and what you'll get is probably not a replacement for DRAM. You'll probably get a
replacement for flash and a replacement for disks. And I think that technology is coming very fast, and it'll change the way we think about the memory hierarchy and the I/O hierarchy, because you'll have a device that's not quite as fast as DRAM but a lot faster than the other alternatives, and that will change the way we want to build machines.
As a person who thinks about education quite often: we all saw Zuckerberg having a conversation with Congress, and I'm
excited to see children getting general education around computing and coding, which is something that a lot of us didn't have the opportunity to have. Where do you see education, not only for K-12, grad, post-grad, et cetera, but also for existing people in policy-making decisions, et cetera?
Nobody has one job for a lifetime anymore. They change what they're doing, and education becomes constant. I mean, you think about the stuff you learned as an undergrad, and you think how much technology has already changed,
right? So we have to do more there. I think we also have to make society more technology-savvy. Computing is changing every single part of the world we live in. To not have some understanding of that technology, I think, limits your ability to lead an organization and to make important decisions. So we're going to have to educate our young people at the beginning, and we're going to have to make an investment in education so that, as people's careers change over their lifetime, they can go back and engage in education, not necessarily going
back to college. It's going to have to be online in some way, but it's going to have to be engaging; it's going to have to be something that really works well for people.
Hi, [inaudible]. I just wondered what your view was on the amount of energy being used on Bitcoin mining and other cryptocurrencies.
Yeah, so I could build a special-purpose architecture to mine Bitcoins. That's another obvious example of a domain-specific architecture, for sure. So I'm a long-term believer in cryptocurrency
as an important part of our space, and what we're going to have to do is figure out how to make it work: how to make it work efficiently, how to make it work seamlessly, how to make it work inexpensively. I think those are all problems that can be conquered, and I think you'll see a bunch of people who have both the algorithmic heft and the ability to rethink how we do that and really make cryptocurrencies go quite quickly. And then we can also build machines which accelerate that even further, so that a cryptocurrency transaction should be
faster than a cash transaction and certainly no slower than a credit card transaction. We're not there yet, but we could get there. We can get there with enough work, and I think that's where we ought to be moving to.
What do you think the future operating system has to have?
Operating systems are really crucial. You know, way back when, in the 1980s, we thought we were going to solve all our operating system problems by going to kernel-based operating systems. And the kernel would be this
really small little thing that just did the core functions of protection and memory management, and then everything else around it would be protected, basically. And what happened was that kernels started out really small, and then they got bigger, and then they got bigger, in the name of making them performance-efficient. The same thing happened with hypervisors: they started really small in the very beginning, and then they got bigger. We're going to have to figure out how we structure complex
operating systems so that they can deal with the protection issues, they can deal with efficiency issues, and they can work well. We should be building operating systems which from the beginning realize that they're going to run on large numbers of processors, and organize them in such a way that they can do that efficiently, because that's the future. We're going to have to rely on that.
In your intro video, you mentioned this chasm between concepts and practice, and also in your talk you mentioned that hardware is vital to the future of computing. Given that most investors are
very hardware-averse, especially in this day and age, where do you expect that money to come from? Is that something that will come from governments or private investing? How are we going to fund the future of computing, is really what my question is.
For investments in a lot of these technologies, from quantum to other things, I think government remains a player. If you look at how many of the innovations we're used to, the internet, RISC, the rise of VLSI, modern computer-aided design tools, they all had funding basically coming from the government at some point. So I think the
government should still remain a player. And think about it: what's the one area the government has probably funded longer than anybody else? Artificial intelligence. They funded it for 50 years before we really saw the breakthrough that came, right? So they're big believers. They should be funding things long term. They should fund things that are out over the horizon, where we don't yet really understand what their practical implications may be. That means government playing a big role, and we're going to have to make universities work well with industry, because they complement one another, right? They do two different kinds
of things, but they're complementary, and if we can get them to work well together, then we can have the best of both worlds.
You talked a little bit about the difference between the memory hierarchy and storage that is coming up with these new memory technologies. Have you seen any applications where the compute and the storage get combined, kind of more like the brain?
We're moving towards that direction, where the software takes care of the difference between what is in memory and what is in "storage," quote-unquote, right?
Because it may actually be flash or some kind of next-generation memory technology. Rather than what's in DRAM, what you need to tell me is what's volatile, and when I have to ensure that a particular operation is committed to non-volatile storage. But if you know that, you know, we've got log-based file systems, we've got other ideas which move in the direction of trying to take advantage of a greatly different memory hierarchy, a greatly different storage hierarchy, than we're used to, and we may want to continue to move in that direction, particularly when you begin to
think about things like networking or I/O, and they become major bottlenecks in applications, which they often do. Then it's about rethinking how we could do those efficiently and optimize the hardware but also the software, because the minute you stick an operating system transaction in there, you've added a lot of weight to what it costs to get to that storage facility. So if we can make that work better and make it more transparent, without giving up protection, without giving up a guarantee that once something is written to a certain storage unit it is permanently
recorded, then I think we can make much faster systems.
So do you see a domain-specific architecture being implemented as heterogeneous on-chip designs, or as off-chip implementations, or both?
I think it's a time of great change. The rise of FPGAs, for example, gives you the opportunity to implement these machines, try them out, implement them in an FPGA before you commit to designing a custom silicon chip. Unleash it on the world, try it out, see how it works, see how the applications map to it, and then perhaps
decide whether or not you want to freeze the architecture. Or you may just want to build another next-generation FPGA. We'll see lots of different implementation approaches. The one thing we have to do: you know, there was a big breakthrough in how hard it was to design chips that occurred from about the mid-eighties to about 1995 or 2000. Things have kind of ground to a halt since then; we haven't had another big breakthrough. We need a big breakthrough, because we're going to need many more people designing processors targeting particular application domains, and that's going to mean we need to make it much easier and much
cheaper to design a processor.
I'm wondering, as a deep learning engineer for private enterprise, what is my role in pushing forward DSAs?
We need people who really understand the application space, and that's really critical, and this is a change. I mean, if you think about how much architects and computer designers, hardware designers, have had to think about the applications: they haven't had to think about them. All of a sudden they're going to have to develop a bunch of new friends that they can interact with
and talk to, and colleagues they can work with, to really get the insights they need in order to push forward the technology. And that's going to be a big change for us. But I think it's beginning to happen: software domain experts talking to hardware people. That's a terrific thing.
You mentioned the performance enhancements of domain-specific languages over, like, Python, for instance, but they're also much harder to use. So do you think software engineering talent can keep up in the future?
I think the challenge will
be this: the gain we've gotten in software productivity in the last twenty or thirty years is absolutely stunning. It is absolutely stunning. I mean, a programmer now can probably write 10 to 100 times more code than they could 30 years ago, in terms of functionality. That's why we have such incredible applications. What we need to do is figure out, all of a sudden: we need a new generation of compiler people to think about how we make those run efficiently. And by the way, if the gap is a factor of 25 between C and
Python, for example, and you get only half of that, that's a factor of 12 times faster. Any compiler writer who can produce code that runs 12 times faster is a hero in my book. So we just have to think about new ways to approach the problem, and the opportunity is tremendous.
Are there any opportunities still left in x86, as far as, like, lifting the complexity of the ISA into software and exposing more of the microarchitecture to the compiler?
Tough. I mean, I think the Intel people have spent more time implementing x86s than anybody's ever spent
implementing any one instruction set. They've mined out almost all the performance. And in fact, if you look at the tweaks that occur: for example, they do aggressive prefetching in the i7, but if you look at what happens with prefetching, some programs actually slow down. Now, on balance they get a little bit of speedup from it, but they actually slow down other programs. And the problem right now is that it's very hard to turn that dial in such a way that we don't get overwhelmed with negative things. And I see my producer telling me it's the end of the session.