Cooling Needs to Start at the Chip Level



By Paul Magill, Ph.D., Nextreme Thermal Solutions

The discussion taking place in the U.S. today concerns the drive toward green, or energy-efficient, computing and IT. Energy consumption is rampant in the U.S., and theories vary on how to curtail its impact. In data centers, a large portion of this energy consumption is driven by the requirement for human beings to work in proximity to servers. Electronics thermal management can cool electronics at the first level, the chip, substantially improving operating temperatures throughout the server farm.

At first glance, it makes sense to focus on building more energy-efficient computing structures, but skeptics would argue that making them more energy-efficient and therefore more cost-efficient (in terms of a total cost of ownership model) only drives their increased usage. From an EPA study published in 2007:

  • Data centers consumed about 60 billion kilowatt-hours (kWh) in 2006, roughly 1.5% of total U.S. electricity consumption.
  • The energy consumption of servers and data centers has doubled in the past five years and is expected to almost double again in the next five years to more than 100 billion kWh, costing about $7.4 billion annually.
  • Federal servers and data centers alone account for approximately 6 billion kWh (10%) of this electricity use, at a total electricity cost of about $450 million per year.
  • Existing technologies and strategies could reduce typical server energy use by an estimated 25%, with even greater energy savings possible with advanced technologies.
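As a quick consistency check on the EPA figures above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and do not come from the EPA study, and the inputs are simply the numbers quoted in the list.

```python
"""Back-of-the-envelope check of the EPA figures quoted above.

All function names are hypothetical; the inputs are the article's numbers.
"""


def implied_rate(annual_cost_usd: float, annual_kwh: float) -> float:
    """Average electricity price (USD per kWh) implied by a cost and a usage."""
    return annual_cost_usd / annual_kwh


def share(part_kwh: float, whole_kwh: float) -> float:
    """Fraction of the whole that the part represents."""
    return part_kwh / whole_kwh


def implied_total(part_kwh: float, fraction: float) -> float:
    """Total consumption implied when part_kwh is the given fraction of it."""
    return part_kwh / fraction


if __name__ == "__main__":
    # Projected: more than 100 billion kWh costing about $7.4 billion.
    print(f"Projected rate: ${implied_rate(7.4e9, 100e9):.3f}/kWh")
    # Federal: about 6 billion kWh costing about $450 million.
    print(f"Federal rate:   ${implied_rate(450e6, 6e9):.3f}/kWh")
    # Federal share of the 2006 data center total of 60 billion kWh.
    print(f"Federal share:  {share(6e9, 60e9):.0%}")
    # Total U.S. consumption implied by 60 billion kWh being 1.5%.
    print(f"Implied U.S. total: {implied_total(60e9, 0.015) / 1e9:.0f} billion kWh")
```

Both cost figures imply an average price of roughly 7.5 cents per kWh, and the federal share works out to the 10% quoted, so the numbers in the list are internally consistent.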

Since the release of this study, the Uptime Institute has reported that energy consumption in the top third of sites grew at 20% to 30% annually in 2006 and 2007, exceeding the EPA's predicted 9% growth for the period 2006 to 2010.

Energy costs for data centers revolve around two poles: the energy to perform a calculation and the energy to maintain a functioning work environment. Most of the discussion in the literature centers on the first pole and the need for systems that offer ever-more-efficient devices for calculations or searches. However, according to publicly released numbers, anywhere from one-third to half the energy consumed in data centers goes into cooling the building (maintaining a viable human work environment). While it is possible to build more efficient air-conditioning systems, we might be better served to look at why we have to pump so much heat out of the building.
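To put the one-third-to-half cooling share in perspective, the following minimal Python sketch converts a cooling fraction into energy terms. It rests on stated assumptions: the function names are hypothetical, all non-cooling energy is lumped together as the computing load, and applying the range to the EPA's 2006 total of 60 billion kWh is for illustration only.

```python
"""Rough sizing of the cooling share of data center energy.

Assumptions (not from the source): hypothetical function names, and
everything that is not cooling is treated as computing load.
"""


def cooling_energy(total_kwh: float, cooling_fraction: float) -> float:
    """Energy spent on cooling, given the total and the cooling fraction."""
    if not 0.0 <= cooling_fraction < 1.0:
        raise ValueError("cooling_fraction must be in [0, 1)")
    return total_kwh * cooling_fraction


def overhead_multiplier(cooling_fraction: float) -> float:
    """Total energy drawn per unit of non-cooling (computing) energy."""
    if not 0.0 <= cooling_fraction < 1.0:
        raise ValueError("cooling_fraction must be in [0, 1)")
    return 1.0 / (1.0 - cooling_fraction)


if __name__ == "__main__":
    total_2006_kwh = 60e9  # EPA estimate for 2006, used here for illustration
    for fraction in (1 / 3, 1 / 2):
        cooling = cooling_energy(total_2006_kwh, fraction)
        multiplier = overhead_multiplier(fraction)
        print(
            f"cooling share {fraction:.0%}: "
            f"{cooling / 1e9:.0f} billion kWh for cooling, "
            f"{multiplier:.2f} kWh drawn per kWh of computing"
        )
```

Under these assumptions, every kilowatt-hour spent on computing costs between 1.5 and 2 kilowatt-hours at the meter, which is why reducing the heat that reaches the building matters as much as device efficiency.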

Electronics Thermal Management

Thermal management has long been the red-headed stepchild of the electronics industry. With data centers consuming an ever-greater share of the power produced and consumed in the U.S., we can no longer afford to let thermal management take a back seat. A large portion of this energy consumption is driven by the requirement for human beings to work in proximity to the servers.

In traditional thermal management solutions, the entire chip is cooled using a heatsink and fan, and a fan is then used to blow the waste heat into the next-level system environment. This leads to a cascading effect of throwing more heat than is necessary to the next level, where the maximum allowable temperature is always lower. Finally, we reach the building level and have to cool the entire structure, the most expensive level to manage. A more cost-effective and efficient approach would be to cool only what is necessary and then manage the removal of this heat in a more controlled fashion.

Passive and Active Thermal Management

Thermal management solutions for electronics may be divided into two types of systems: active and passive. With the exception of fans, which are usually combined with heatsinks, most of today's thermal management systems are passive types. These systems have served the industry well and generally are cheap to apply. Conduction-based thermal management systems, such as thermal interface materials (TIMs) used to improve the flow of heat from one location to another, greatly enhance the efficiency of the overall thermal management system. However, convection-based systems have the drawback of allowing heat to flow in an uncontrolled manner from one level to the next.

One aspect of this convective heat flow (as it is usually applied) is that as the heat flows from its generation point (usually the IC die) to the next level, each succeeding level that the heat passes through tends to have a lower maximum allowable temperature. For instance, while a die may be able to operate with temperatures nominally around 85°C, with peak temperatures reaching 100°C or slightly higher, most systems have temperature maximums of around 40°C to 70°C; any environment where human beings would work on a regular basis typically has temperature maximums of around 25°C.

One solution is to apply localized thermal management deep inside electronic components using thin-film thermoelectric (eTEC) structures known as thermal bumps. The thermal bump is made from a thin-film thermally active material that is embedded into flip-chip interconnects (in particular, copper pillar solder bumps) for use in electronics packaging.

To reduce the cooling needed at the building level, we need to reduce the amount of heat extracted from the die. Die are typically cooled to keep their operating frequency near its peak. However, this peak frequency is usually limited by the temperature within one of the hot spots on the die.

An example of hot spots and localized heat extraction is shown in Figure 2. The left image shows the thermal characteristics of a Merom chip when one core is operating. The center image shows two cores operating. In either case, there are heat issues that emanate across the chip. In the right image, thin-film TECs have been fabricated on top of the likely hot spot locations. As a result, we are able to extract the heat from the hot spots instead of extracting the heat from the die as a whole. In this case we have a much smaller system-level problem to manage and, subsequently, a much smaller thermal management problem at the data center level.

Conclusion

Power consumption in data centers is large and expected to increase. The primary drivers for power consumption within a data center are the servers providing the calculations and the need to cool the building due to the heat removed from the servers. Today, most of the effort in energy efficiency is focused on improving device efficiency, but a larger return may be had by focusing resources on the fundamental thermal management systems that bring unnecessary heat into the building and into close proximity with human beings. The primary issue is that the individuals working in these server farms are the most sensitive to temperature extremes in the hierarchy described above.

Paul Magill, Ph.D., VP of marketing and business development, Nextreme Thermal Solutions, can be reached at (919) 597-7300; info@nextreme.com.



Copyright © 2019 I-Connect007. All rights reserved.