AI and Thermal Cooling

May 05, 2023

AI applications are accelerating the shift of data centers toward high density. With AI driving explosive growth in data and computing demand, and with data center resources increasingly scarce, especially in first-tier cities, the value of a data center can only be maximized by raising the computing, storage, and transmission capacity per unit of floor area. The introduction of high-performance AI chips will further accelerate the trend toward higher server power density.

Since ChatGPT ignited a new wave of enthusiasm for artificial intelligence applications, domestic and overseas data center operators and cloud vendors have begun building out AI infrastructure, and AI servers account for a gradually growing share of all server shipments. According to TrendForce, AI servers equipped with GPGPUs accounted for nearly 1% of all server shipments in 2022. In 2023, driven by AI applications such as ChatGPT, AI server shipments are expected to grow 8% year-on-year, and the shipment CAGR from 2022 to 2026 is expected to reach 10.8%. AI servers mainly use GPUs, chiefly the NVIDIA H100, A100, and A800 (the A800 shipping mainly to China) and the AMD MI250 and MI250X series. The ratio of NVIDIA to AMD is approximately 8:2.
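The two growth figures above can be reconciled with a little arithmetic. The following is a minimal sketch, assuming the 10.8% CAGR covers four compounding years (2022 to 2026) and that the 8% figure is the first of those years; the function name is hypothetical and the result is only an implied average, not a TrendForce forecast.

```python
def implied_later_growth(cagr: float, first_year_growth: float, years: int) -> float:
    """Average annual growth needed in the remaining years to reach the overall CAGR."""
    total_multiple = (1 + cagr) ** years            # overall growth over the whole period
    remaining_multiple = total_multiple / (1 + first_year_growth)
    return remaining_multiple ** (1 / (years - 1)) - 1


# Figures quoted in the text: 10.8% CAGR for 2022-2026, 8% growth in 2023.
total = (1 + 0.108) ** 4
later = implied_later_growth(cagr=0.108, first_year_growth=0.08, years=4)

print(f"2026 shipments relative to 2022: {total:.2f}x")
print(f"Implied average growth for 2024-2026: {later:.1%}")
```

Under these assumptions, shipments in 2026 would be roughly 1.5 times the 2022 level, which requires growth after 2023 to average somewhat above the 10.8% headline rate.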

Raising fan speed beyond 4000 r/min has only a limited effect on thermal resistance. According to CNKI, as the fan speed in an air-cooled system increases from 1000 r/min to 4000 r/min, convection dominates chip heat dissipation: the convective heat transfer coefficient rises significantly with flow rate, and air cooling effectively improves chip heat dissipation. Above 4000 r/min, the heat transfer resistance falls only gradually, because higher speed improves only the heat transfer to the air, so each further increase yields a smaller cooling benefit.

Chip-level liquid cooling is the future development trend. In a 2U server, about 250 W is the limit for air cooling; at 4U and above, air cooling can handle 400-600 W. AI chips generally have a TDP above 400 W and mostly use 4-8U chassis, so traditional air cooling has reached its limit. Controlling chip temperature is essential for stable, continuous operation: the maximum temperature should not exceed 85 ℃, and excessive temperature can damage the chip. In the 70-80 ℃ range, every 10 ℃ rise in the temperature of a single electronic component reduces system reliability by 50%. As power rises, cooling systems will therefore be upgraded to chip-level liquid cooling.
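The limits quoted above can be turned into a quick feasibility check. The sketch below is illustrative only: the wattage limits and the 85 ℃ ceiling come from the text, the steady-state relation T_chip = T_inlet + P × R is the standard thermal-resistance model, and the 35 ℃ inlet temperature, the 700 W example chip, and all function names are assumptions. The source states the 50%-per-10 ℃ rule only for the 70-80 ℃ range, so applying it elsewhere is an extrapolation.

```python
# Air-cooling limits quoted in the text (upper bounds, in watts).
AIR_COOLING_LIMIT_W = {"2U": 250, "4U+": 600}
T_MAX_C = 85.0  # maximum chip temperature quoted in the text


def exceeds_air_cooling(tdp_w: float, chassis: str) -> bool:
    """True if the chip TDP is above the quoted air-cooling limit for the chassis."""
    return tdp_w > AIR_COOLING_LIMIT_W[chassis]


def max_thermal_resistance(tdp_w: float, inlet_c: float, t_max_c: float = T_MAX_C) -> float:
    """Largest thermal resistance (deg C per W) that keeps the chip at or below
    t_max_c, from the steady-state model T_chip = T_inlet + P * R."""
    return (t_max_c - inlet_c) / tdp_w


def relative_reliability(temp_rise_c: float) -> float:
    """Reliability relative to baseline, halving for every 10 deg C of rise."""
    return 0.5 ** (temp_rise_c / 10.0)


# Hypothetical inputs: a 700 W chip in a 4U+ chassis, a 400 W chip with 35 deg C inlet air.
print(exceeds_air_cooling(700, "4U+"))
print(max_thermal_resistance(400, inlet_c=35.0))
print(relative_reliability(10))
```

With these assumed inputs, a 400 W chip leaves a thermal budget of only 0.125 ℃/W between inlet air and chip, which shows why the total thermal resistance becomes the binding constraint as TDP rises.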

Compared with air cooling, liquid cooling not only meets the heat dissipation requirements of high-power-density cabinets but also achieves a lower PUE and higher power output (GUE). Cold plate liquid cooling generally has a PUE of 1.1x with a GUE above 75%, while immersion liquid cooling can reach a PUE as low as 1.0x with a GUE above 80%. Liquid cooling also allows some or even all of the fans in the IT equipment to be removed (fan energy consumption is usually counted as part of server energy consumption). With immersion liquid cooling, removing the server fans can reduce server energy consumption by about 4%-15%.
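To see what these PUE figures mean in energy terms, recall that PUE is total facility energy divided by IT equipment energy. The sketch below is a rough comparison under stated assumptions: the air-cooled baseline PUE of 1.5, the concrete values 1.10 and 1.05 standing in for "1.1x" and "1.0x", the 10% fan saving (picked from the 4%-15% range), and the 1000 kWh load are all illustrative, not figures from the source.

```python
def facility_energy(it_energy_kwh: float, pue: float, fan_saving: float = 0.0) -> float:
    """Total facility energy, given IT energy, PUE, and the fraction of
    server energy saved by removing fans (0.0 if fans are kept)."""
    it_after_fans = it_energy_kwh * (1.0 - fan_saving)
    return it_after_fans * pue  # PUE = total facility energy / IT energy


IT_KWH = 1000.0  # hypothetical IT load

scenarios = {
    "air (assumed PUE 1.5)": facility_energy(IT_KWH, pue=1.5),
    "cold plate (PUE 1.10)": facility_energy(IT_KWH, pue=1.10),
    "immersion (PUE 1.05, fans removed)": facility_energy(IT_KWH, pue=1.05, fan_saving=0.10),
}

for name, kwh in scenarios.items():
    print(f"{name}: {kwh:.0f} kWh")
```

Note that removing fans lowers the IT energy in the denominator of PUE, so the fan saving shows up in total energy rather than in the PUE figure itself.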

Cold plate liquid cooling is currently the more mature technology and the mainstream liquid cooling route; we assume its current share is 80%. As immersion liquid cooling matures, its share is expected to gradually increase. Based on comprehensive calculations, AI large-model training and inference will create a liquid cooling market of 4 billion RMB. As model parameter counts grow and usage spreads, the liquid cooling market is expected to grow at a compound annual rate of 60% over the next four years.
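The market figures can be worked through as a simple projection. This is a minimal sketch, assuming the 4 billion RMB figure is the starting-year market size (the text does not say which year it refers to), that the 60% rate compounds annually, and that the 80% cold plate share applies to the starting year only; the function name is hypothetical.

```python
def project_market(base: float, cagr: float, years: int) -> list[float]:
    """Market size for year 0 through `years`, compounding at `cagr` per year."""
    return [base * (1 + cagr) ** n for n in range(years + 1)]


BASE_BILLION_RMB = 4.0   # market size quoted in the text
COLD_PLATE_SHARE = 0.80  # assumed current share quoted in the text

cold_plate = BASE_BILLION_RMB * COLD_PLATE_SHARE
immersion = BASE_BILLION_RMB - cold_plate
series = project_market(BASE_BILLION_RMB, cagr=0.60, years=4)

print(f"Starting split: cold plate {cold_plate:.1f}, immersion {immersion:.1f} (billion RMB)")
for year, size in enumerate(series):
    print(f"Year {year}: {size:.1f} billion RMB")
```

Under these assumptions, four years of 60% growth multiplies the market by about 6.6, taking it from 4 billion to roughly 26 billion RMB.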

We believe that large AI models will drive an upgrade in computing power demand, spur the construction of high-power-density intelligent computing and supercomputing centers, and accelerate the market introduction of supporting facilities such as liquid cooling systems. As new data centers are built and existing ones are retrofitted, the overall penetration rate is expected to rise rapidly. The liquid cooling industry is still at an early stage of development, and we are optimistic about manufacturers with a leading position in technology and production capacity.
