Intel’s Xe-HP ‘Arctic Sound’ GPU Pictured: Up to 960 EUs & 32GB HBM2E


Intel’s datacenter GPUs based on the Xe-HP architecture are due for release in the coming months, so it isn’t surprising to see the first leaks emerge. On Wednesday, Igor’s Lab published images of Intel’s compute cards codenamed Arctic Sound, which are based on the Xe-HP GPU, and disclosed some of their preliminary specifications.

(Image credit: Igor’s Lab)

The single-tile Intel Arctic Sound 1T features an Xe-HP GPU with 384 EUs and 16GB of HBM2E memory offering a peak bandwidth of up to 716 GB/s (which probably means that we are dealing with two stacks of HBM2E on a combined 2048-bit interface). The accelerator is a short, single-slot, full-height card rated for a 150W TDP.
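
The two-stack guess can be sanity-checked with a little arithmetic. The Python sketch below assumes a 1024-bit interface per HBM2E stack and a per-pin data rate of 2.8 Gb/s; the leak states neither figure, and the pin rate is simply the value that makes a 2048-bit bus land on the quoted number. The function name is ours, for illustration only.

```python
def hbm_peak_bandwidth_gb_s(stacks: int, bus_bits_per_stack: int, pin_rate_gbit_s: float) -> float:
    """Peak bandwidth in GB/s: total bus width (bits) x per-pin rate (Gb/s) / 8 bits per byte."""
    total_bus_bits = stacks * bus_bits_per_stack
    return total_bus_bits * pin_rate_gbit_s / 8


# Assumed configuration (not confirmed by the leak):
# two stacks, 1024 bits each, 2.8 Gb/s per pin.
bandwidth = hbm_peak_bandwidth_gb_s(stacks=2, bus_bits_per_stack=1024, pin_rate_gbit_s=2.8)
print(f"{bandwidth:.1f} GB/s")  # 716.8 GB/s
```

Under those assumptions the result is 716.8 GB/s, in line with the "up to 716 GB/s" figure, which is why a two-stack configuration looks plausible.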

Intel’s Arctic Sound 2T card carries an Xe-HP GPU with two tiles, 960 EUs (480 per tile, to be more accurate), and 32GB of HBM2E DRAM. The accelerator uses a full-length full-height (FLFH) form factor and is rated for a 300W TDP, which is delivered using one eight-pin power connector. One thing to keep in mind is that Igor’s Lab edited the images of the cards to protect the source.

(Image credit: Intel)

Intel’s Xe-HP architecture is a far cry from the company’s Xe-LP architecture we know from the Iris Xe consumer-grade GPUs. Xe-HP supports more floating-point formats (e.g., FP16, FP32, and FP64 for general-purpose compute, plus bfloat16 for AI/ML workloads), more compute-specific instructions, the DP4A convolution instruction for deep learning, and Intel’s XMX extensions.
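
For readers unfamiliar with DP4A, the operation is a four-element dot product of 8-bit integers accumulated into a 32-bit integer. The Python sketch below is only a software model of that arithmetic, assuming the signed variant; it does not model 32-bit wraparound and says nothing about how Intel implements the instruction in hardware. The function name `dp4a` is illustrative.

```python
def dp4a(a, b, acc=0):
    """Software model of DP4A: dot product of four signed 8-bit values, added to an accumulator.

    Assumption: signed int8 inputs. Overflow of the 32-bit accumulator is not modelled.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("DP4A operates on exactly four 8-bit elements per operand")
    for value in (*a, *b):
        if not -128 <= value <= 127:
            raise ValueError("each element must fit in a signed 8-bit integer")
    return acc + sum(x * y for x, y in zip(a, b))


# 1*5 + 2*6 + 3*7 + 4*8 = 70, plus an accumulator of 10
print(dp4a([1, 2, 3, 4], [5, 6, 7, 8], acc=10))  # 80
```

Convolutions in quantized neural networks reduce to long runs of such multiply-accumulate steps, which is why a single instruction covering four of them at once matters for deep learning inference.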

The datacenter-oriented Xe-HP GPUs use all-new execution units (EUs) with various IPC improvements, feature HBM2E memory support, and are made using Intel’s performance-optimized 10nm SuperFin process technology. In short, the Xe-HP is not the Xe-LP or Xe-HPG on steroids, but something completely different. 

(Image credit: Intel)

Intel now allows some of its customers to preview its Arctic Sound compute cards carrying single-tile and dual-tile Xe-HP implementations. Although Intel announced a quad-tile Xe-HP implementation last year and even demonstrated one such accelerator delivering over 42 FP32 TFLOPS, the company is either not ready to sample it right now or is sampling it with select customers only.


