My deep learning build — work in progress :).
This story provides a guide on how to build a multi-GPU system for deep learning and will hopefully save you some research time and experimentation.
Target
Build a multi-GPU system for training computer vision models and LLMs without breaking the bank! 🏦
Step 1. GPUs
Let’s start with the fun (and expensive 💸💸💸) part!
The H100 beast! Image from NVIDIA.
The main considerations when buying a GPU are:
memory (VRAM)
performance (Tensor cores, clock speed)
slot width
power (TDP)
Memory
For deep learning tasks nowadays we need a loooot of memory. LLMs are huge even to fine-tune, and computer vision tasks can get memory-intensive, especially with 3D networks. Naturally, the most important aspect to look for is the GPU VRAM. For LLMs I recommend at least 24 GB of memory, and for computer vision tasks I wouldn’t go below 12 GB.
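To get a feel for where those numbers come from, here is a rough back-of-the-envelope sketch in Python. The assumptions are mine (full fine-tuning with mixed precision and Adam, activations not counted), not measurements from this build:

```python
def finetune_vram_gb(num_params_billion: float) -> float:
    """Rough VRAM estimate for full fine-tuning with Adam (activations not included).

    Assumed rules of thumb:
      - fp16 weights:         2 bytes / param
      - fp16 gradients:       2 bytes / param
      - Adam states (fp32):   8 bytes / param (momentum + variance)
      - fp32 master weights:  4 bytes / param (mixed-precision copy)
    """
    bytes_per_param = 2 + 2 + 8 + 4  # = 16 bytes per parameter
    return num_params_billion * 1e9 * bytes_per_param / 1024**3

# Under these assumptions, a 7B-parameter LLM needs ~104 GB just for weights,
# gradients and optimizer state, which is why a single 24 GB card usually means
# parameter-efficient fine-tuning (e.g. LoRA) or sharding across several GPUs.
print(f"7B model: ~{finetune_vram_gb(7):.0f} GB")
```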
Performance
The second criterion is performance, which can be estimated with FLOPS (floating-point operations per second):
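A common rule of thumb (my formulation, not an official NVIDIA spec sheet formula) is to estimate peak throughput from the core count and clock speed:

$$
\text{peak FLOPS} \;\approx\; \text{number of cores} \times \text{clock frequency (Hz)} \times \text{FLOPs per core per cycle}
$$

For plain CUDA cores the last factor is 2 (one fused multiply-add per cycle), while tensor cores execute many more operations per cycle, which is exactly why they dominate the numbers below.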
The crucial number in the past was the number of CUDA cores on the chip. However, with the emergence of deep learning, NVIDIA has introduced specialized tensor cores that can perform many more FMA (Fused Multiply-Add) operations per clock. These are already supported by the main deep learning frameworks and are what you should look for in 2023.
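As an illustration of that framework support, here is a minimal PyTorch sketch (standard public APIs, nothing specific to this build) showing the two usual ways tensor cores get engaged: TF32 matmuls and mixed-precision training:

```python
import torch
import torch.nn.functional as F

# TF32 lets Ampere-and-newer tensor cores accelerate fp32 matmuls.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 underflow

x = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

with torch.cuda.amp.autocast():  # runs eligible ops in fp16/bf16 on tensor cores
    loss = F.mse_loss(model(x), target)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```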
Below is a chart of the raw performance of GPUs, grouped by memory, that I compiled after quite some manual work:
Raw performance of GPUs based on CUDA and tensor cores (TFLOPS)