Analysis of liquid cooling and heat dissipation technology in AI data centers

    Generative AI and large models bring brand-new application experiences, and they also place higher demands on computing power. Because the power density of GPU servers has increased significantly, data center operators face higher requirements for cooling equipment and technology. In addition to computing power itself, they are therefore paying more attention to the power consumption and cooling issues that come with it.

    Driven by strong demand for AI computing power, the number of GPU servers in data centers has increased significantly, making power consumption an increasingly prominent issue. The maximum total power of a single air-cooled cabinet in a data center is 15kW. At the same rack occupancy rate, the power drawn by GPU servers is already approaching this single-cabinet limit, and GPU power consumption is still rising. In high-power, high-density scenarios, traditional air cooling clearly cannot meet energy consumption and heat dissipation needs. Liquid cooling technology, with its very high energy efficiency and its ability to handle very high heat density, has become a necessary option for temperature control in intelligent computing centers.
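
    The cabinet power limit can be illustrated with a small calculation. The sketch below is only an illustration: the per-server power figures and the names in it are hypothetical values chosen for the example, not measurements or figures from this article. Only the 15kW air-cooled limit and the 30kW liquid-cooled density come from the text.

```python
# Cabinet power budgets in watts, taken from the figures cited in the article.
AIR_COOLED_CABINET_W = 15_000
LIQUID_COOLED_CABINET_W = 30_000

# HYPOTHETICAL per-server power draws in watts, for illustration only.
HYPOTHETICAL_SERVER_W = {
    "cpu_server": 800,
    "gpu_server_4_cards": 3_500,
    "gpu_server_8_cards": 10_000,
}


def servers_per_cabinet(cabinet_w: int, server_w: int) -> int:
    """Return how many servers fit inside a cabinet power budget."""
    return cabinet_w // server_w


for name, watts in HYPOTHETICAL_SERVER_W.items():
    air = servers_per_cabinet(AIR_COOLED_CABINET_W, watts)
    liquid = servers_per_cabinet(LIQUID_COOLED_CABINET_W, watts)
    print(f"{name}: {air} per air-cooled cabinet, {liquid} per liquid-cooled cabinet")
```

    Under these assumed numbers, a single high-end GPU server already uses most of an air-cooled cabinet's budget, which is the situation the paragraph above describes.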

    In traditional air-cooled data centers, cooling and heat dissipation account for as much as 40% of total energy consumption, and heat dissipation efficiency is low. Because of these limitations, conventional air-cooled data centers are generally designed for a single-cabinet density of 8-10kW. Liquid has a thermal conductivity 25 times that of air and carries nearly 3000 times more heat than the same volume of air, so liquid cooling can easily achieve a single-cabinet density of over 30kW. It therefore saves a great deal of space, increases the cabinet deployment density of a single data center, and improves utilization per unit of floor area.
    However, the liquid cooling industry currently has no unified technology and construction standards, and the construction cost of liquid-cooled data centers is still much higher than that of traditional air-cooled ones. Rapid development of the technology combined with this lack of unified standards poses significant challenges for later management and maintenance.
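
    The thermal comparison between liquid and air quoted above can be checked with a rough calculation. The sketch below assumes plain water as the liquid and uses approximate room-temperature textbook property values. These values, the 10 K coolant temperature rise, and all function names are assumptions made for the example and do not come from this article. It relies on the relation Q = density x specific heat x volume flow x temperature rise.

```python
# Approximate room-temperature properties (ASSUMED textbook-style values).
# density in kg/m^3, cp in J/(kg*K), k in W/(m*K)
WATER = {"density": 997.0, "cp": 4180.0, "k": 0.60}
AIR = {"density": 1.2, "cp": 1005.0, "k": 0.026}


def volumetric_heat_capacity(fluid: dict) -> float:
    """Heat absorbed per cubic metre per kelvin, in J/(m^3*K)."""
    return fluid["density"] * fluid["cp"]


def required_flow_m3_per_s(heat_w: float, fluid: dict, delta_t_k: float) -> float:
    """Volume flow needed to carry heat_w watts with a delta_t_k temperature rise."""
    return heat_w / (volumetric_heat_capacity(fluid) * delta_t_k)


heat_ratio = volumetric_heat_capacity(WATER) / volumetric_heat_capacity(AIR)
k_ratio = WATER["k"] / AIR["k"]

CABINET_W = 30_000   # single-cabinet density cited in the article
DELTA_T_K = 10.0     # HYPOTHETICAL coolant temperature rise

water_flow = required_flow_m3_per_s(CABINET_W, WATER, DELTA_T_K)
air_flow = required_flow_m3_per_s(CABINET_W, AIR, DELTA_T_K)

print(f"heat carried per unit volume, water vs air: {heat_ratio:.0f}x")
print(f"thermal conductivity, water vs air: {k_ratio:.0f}x")
print(f"flow for 30kW: water {water_flow * 1000:.2f} L/s, air {air_flow:.2f} m^3/s")
```

    With these assumed properties the ratios come out at roughly 23x for conductivity and roughly 3500x for heat carried per unit volume, the same order of magnitude as the figures quoted above. Actual coolants, such as ethylene glycol solutions or dielectric fluids, have different properties.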

    At present, the main liquid cooling technologies are indirect liquid cooling, represented by cold plate systems, and direct liquid cooling, represented by immersion systems. Because the two differ in heat dissipation design, they also differ significantly in heat dissipation efficiency.

    In indirect liquid cooling, media such as cold plates are placed in contact with the surfaces of CPUs, memory, GPUs, hard drives, and other components, and the flow of coolant carries the heat away. Besides the cold plates, an indirect liquid cooling system includes heat exchangers, pipelines, pumps, coolant, and a control system. The cold plate liquid cooling system has become the main form of indirect liquid cooling. Its main advantages are that it does not require changing the form factor of existing servers and that design, deployment, and later operation and maintenance are all relatively easy. In addition, because it uses an aqueous ethylene glycol solution as the cooling medium, its cost is lower.
    The disadvantages are that heat dissipation efficiency is relatively low and that, with so many components, the failure rate is relatively high. At present, the cold plate liquid cooling system is the preferred solution for most data centers.

    Direct liquid cooling means that the CPU, GPU, motherboard, memory, and other components are in direct contact with the coolant, which flows over the hardware surfaces to absorb and carry away heat. At present, direct liquid cooling includes immersion systems and spray systems. Immersion cooling is further divided into single-phase and phase-change immersion, depending on whether the cooling medium undergoes a phase change.
    Compared with indirect liquid cooling, direct liquid cooling has no intermediate conductive medium between the liquid and the heat source, so heat passes more directly into the liquid and heat dissipation efficiency is higher. However, it is more difficult and costly to deploy because the entire data center must be redesigned and retrofitted. At present, direct liquid cooling is mainly used in scenarios that require high heat dissipation efficiency.

    As cold plate liquid cooling matures, it is set to be the first mainstream liquid cooling technology to enter data centers. The cost, operation and maintenance, and safety issues that hold back its adoption should also be resolved as the technology develops and standards are established.
    As technology continues to develop, immersion liquid cooling systems will also be widely used in new high-density data centers, further improving heat dissipation efficiency and significantly raising computing power levels.
