Dale McHerron is a senior manager at IBM Research and oversees heterogeneous integration research. In his presentation, Dale spoke about the need to architect the next generation of AI systems, and what it means for the infrastructure to run AI systems at this level of technology.
Nolan Johnson: Dale, you just finished your presentation and I’d love to get a quick summary.
Dale McHerron: Sure. My focus is on high performance computing, with a special interest in AI and AI workloads, and how we need to architect next generation systems to support what we see happening in AI. I talked about where we are today from a traditional IBM high performance computing standpoint, and some of the aspects of our Z Systems mainframe. Then I started talking about the emerging confluence of the need for much more compute power, while simultaneously silicon scaling is starting to slow down. There is the need for architecting for new workloads, which drives a huge amount of memory. How do you bring all that together? That all has an impact on what we will need going forward from the packaging.
Johnson: You had an interesting statistic that AI compute power requirements are doubling every three and a half months. Can you talk about that?
McHerron: The algorithms are getting so sophisticated. As you can see, AI is raising some eyebrows. People are really doing things with it. Those algorithms are getting so complex, and so computationally complex in terms of training, which is much more complex than inferencing. Training is how you train the model; inferencing is actually using it in the real world. You look at the training requirements for these very sophisticated algorithms, and they need to double the amount of compute every three and a half months because they’re getting so complex. Our friends in the software and algorithm worlds are really making a lot of leaps and gains in terms of the complexity and the ability of these new algorithms.
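For a sense of scale, the doubling figure quoted above can be converted into a yearly growth factor. The short sketch below is illustrative arithmetic only and was not part of the presentation. The function name is hypothetical, and the 24-month doubling period used as a point of comparison is an assumption based on the commonly quoted reading of Moore's Law.

```python
def annual_growth_factor(doubling_period_months: float) -> float:
    """Growth multiple over 12 months for a quantity that doubles
    once every `doubling_period_months` months."""
    return 2.0 ** (12.0 / doubling_period_months)


AI_DOUBLING_MONTHS = 3.5       # figure quoted in the interview
MOORE_DOUBLING_MONTHS = 24.0   # assumption: commonly quoted Moore's Law cadence

ai_per_year = annual_growth_factor(AI_DOUBLING_MONTHS)
moore_per_year = annual_growth_factor(MOORE_DOUBLING_MONTHS)

print(f"AI training compute: ~{ai_per_year:.1f}x per year")
print(f"Transistor count:    ~{moore_per_year:.2f}x per year")
```

Under these assumptions, training compute demand grows by roughly an order of magnitude per year, while a 24-month doubling cadence yields about 1.4x per year.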
Johnson: That feels like a pace well beyond what you’d expect from Moore’s Law. That’s driving the hardware even harder than we traditionally think, even at a time when some people are saying Moore’s Law is broken.
McHerron: It is. There’s been a lot in the industry lately about how much energy data centers use. I read that something like 3% of the world’s energy gets directed toward data centers because we’re not as efficient as we need to be for managing these new workloads.
Johnson: You talked about the constraints that are holding this computational development back, power among them. But what was interesting to me from a packaging point of view was that you also discussed the resurgence of the multichip module (MCM). The MCM was something we saw 20 to 25 years ago and then it faded away. What’s bringing it back?
McHerron: With scaling slowing, you’re not getting as many transistors on a piece of silicon with each new generation as we did in the past. As computation workloads increase, you need more and more transistors. The system architects will ask, “How can you get me more acres of silicon into my package to make this work?” We’re already at the limits of reticle sizes. As I mentioned, with the next generation of lithography for silicon, which will probably be online for manufacturing before the end of the decade, the reticle size will drop in half. Now, to get the hardware needed to enable the compute intensity required, you must put more silicon in each volume. You can’t have it spread out across the board. It will drive your power requirements through the roof. So, the more you can bring these things together, the more silicon you can get in each volume. You’re not getting it from transistor scaling like we used to, so it must come down to the packaging.
Johnson: Compare the difference between a multichip module, what we’re seeing in chiplets, and the heterogeneous integration type packaging.
McHerron: In the past, MCMs were largely just putting more of the same chip into a package. In chiplet architectures, each one of these pieces of silicon will now have its own function and its own personality. You need the package to interconnect all these different functions in a very efficient way, so you get good performance and energy efficiency.
Johnson: Great. Thanks for the clarification on that, and thanks for your time.
This conversation originally appeared in the November 2022 issue of SMT007 Magazine.