The page has been translated by Gen AI.

Using NVSwitch on GPU Server

After creating a GPU Server, you can enable the NVSwitch feature in the GPU Server's VM (Guest OS) to use high-speed P2P (GPU-to-GPU) communication between GPUs.

Exploring NVIDIA NVSwitch for Multi GPU

The NVIDIA A100 GPU server is a multi-GPU system based on the NVIDIA Ampere architecture, with eight 80 GB A100 GPUs installed on the baseboard. The GPUs on the baseboard are connected to 6 NVSwitches through their NVLink ports, so GPU-to-GPU communication on the baseboard can use the full 600 GBps bandwidth. As a result, the 8 GPUs in the A100 GPU server can be connected and operated like a single GPU, maximizing GPU-to-GPU throughput.

Figure. NVLink(25 GBps) 12 lanes 8 GPU configuration diagram

Figure. NVSwitch(600 GBps) 6 units 8 GPU configuration diagram
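
The 600 GBps figure can be reproduced from the per-link numbers in the first diagram. The sketch below assumes that 25 GBps is the rate of one NVLink link in one direction, so each link carries 50 GBps when both directions are counted; the variable names are illustrative only.

```shell
#!/bin/sh
# Assumption: 25 GBps is the per-direction rate of a single NVLink link.
links_per_gpu=12
gbps_per_direction=25

# Count both directions of every link to get the aggregate per-GPU bandwidth.
total=$((links_per_gpu * gbps_per_direction * 2))
echo "Aggregate NVLink bandwidth per GPU: ${total} GBps"
```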

Create GPU NVSwitch

To use the GPU NVSwitch feature, create a GPU Server service on the Samsung Cloud Platform, create a VM instance (Guest OS) with 8 A100 GPUs assigned, and activate Fabric Manager.

Caution

NVSwitch can be activated and used only on the product that assigns 8 A100 GPUs to a single GPU server (g1v128a8 (vCPU 128 | Memory 1920G | A100(80GB)*8)).
GPU Servers created with Windows OS do not currently support NVSwitch (Fabric Manager).

NVSwitch Installation and Operation Check (Fabric Manager Activation)

To operate NVSwitch, install Fabric Manager on the GPU instance by following the procedure below.

Install the NVIDIA GPU Driver (version 470.52.02) on the GPU server.

$ add-apt-repository ppa:graphics-drivers/ppa
$ apt-get update
$ apt-get install nvidia-driver-470-server

Code Block. NVIDIA GPU Driver Installation
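
Once the driver is installed (a reboot may be needed before it loads), it can help to confirm that all 8 GPUs are visible before moving on to Fabric Manager. The following is a minimal sketch, not part of the official procedure; `count_gpus` and `expected_gpus` are hypothetical names, and the check is skipped when `nvidia-smi` is not available.

```shell
#!/bin/sh
# Expected GPU count, taken from the product spec above (A100 x 8).
expected_gpus=8

# count_gpus reads `nvidia-smi -L` style output on stdin
# (one "GPU n: ..." line per device) and prints the number of devices.
count_gpus() {
    grep -c '^GPU [0-9]'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    found=$(nvidia-smi -L | count_gpus)
    echo "GPUs detected: ${found} (expected ${expected_gpus})"
    # Print the model name and driver version reported for each GPU.
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
else
    echo "nvidia-smi not found; the driver may not be installed yet"
fi
```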

Install and start NVIDIA Fabric Manager (version 470), which is required for NVSwitch, on the GPU server.

$ apt-get install cuda-drivers-fabricmanager-470
$ systemctl enable nvidia-fabricmanager
$ systemctl start nvidia-fabricmanager

Code Block. NVIDIA Fabric Manager Installation and Operation
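
Fabric Manager has to come from the same driver branch as the installed GPU driver (470 in this procedure). The sketch below compares the two; `branch_of` and `same_branch` are hypothetical helpers, and it assumes the package version string begins with the driver-style version number, which may not hold for every repository.

```shell
#!/bin/sh
# branch_of prints the leading branch number of a version string,
# e.g. "470.52.02" -> "470".
branch_of() {
    printf '%s\n' "${1%%.*}"
}

# same_branch succeeds when both version strings share the same branch.
same_branch() {
    [ "$(branch_of "$1")" = "$(branch_of "$2")" ]
}

if command -v nvidia-smi >/dev/null 2>&1 && command -v dpkg-query >/dev/null 2>&1; then
    driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    # Assumption: the package version starts with the driver version number.
    fm=$(dpkg-query -W -f='${Version}' cuda-drivers-fabricmanager-470 2>/dev/null)
    if same_branch "$driver" "$fm"; then
        echo "Driver ${driver} and Fabric Manager ${fm} are on the same branch"
    else
        echo "Branch mismatch: driver=${driver} fabricmanager=${fm}"
    fi
fi
```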

Check the status of NVIDIA Fabric Manager running on the GPU server.
- In normal operation, the status shows active (running).
  $ systemctl status nvidia-fabricmanager
  Code Block. Check NVIDIA Fabric Manager Operation Status

Figure. NVSwitch installation - Checking the operation status of Fabric Manager
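
For scripted checks, the same status can be read without parsing the full `systemctl status` output. This is a minimal sketch; `fm_state` is a hypothetical helper that turns the result of `systemctl is-active` into a short message.

```shell
#!/bin/sh
# fm_state maps a `systemctl is-active` result to a short message.
fm_state() {
    case "$1" in
        active) echo "Fabric Manager is running" ;;
        *)      echo "Fabric Manager is not running (state: $1)" ;;
    esac
}

if command -v systemctl >/dev/null 2>&1; then
    # is-active prints the unit state, for example active, inactive, or failed.
    state=$(systemctl is-active nvidia-fabricmanager 2>/dev/null)
    fm_state "${state:-unknown}"
fi
```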

Check the NVSwitch operation status on the GPU server.
- In normal operation, each GPU-to-GPU entry in the matrix shows NV12.
  $ nvidia-smi topo --matrix
  Code Block. Check NVSwitch Operation Status

Figure. NVSwitch Installation - Checking NVSwitch Operation Status
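
The matrix check can also be automated by counting NV12 entries. The sketch below assumes the usual layout of the topology matrix, where each data row starts with a GPU label such as GPU0, and that with 8 GPUs every row shows NV12 for the other seven (8 x 7 = 56 entries). `count_nv12` and `expected_links` are hypothetical names.

```shell
#!/bin/sh
# Expected NV12 entries for 8 GPUs: each GPU row lists the other 7 GPUs.
expected_links=$((8 * 7))

# count_nv12 reads a topology matrix on stdin and prints the number of
# NV12 fields found in rows that begin with a GPU label (GPU0, GPU1, ...).
count_nv12() {
    awk '/^GPU[0-9]/ { for (i = 2; i <= NF; i++) if ($i == "NV12") n++ }
         END { print n + 0 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    found=$(nvidia-smi topo --matrix | count_nv12)
    if [ "$found" -eq "$expected_links" ]; then
        echo "NVSwitch OK: ${found} NV12 entries"
    else
        echo "Unexpected topology: ${found} NV12 entries, expected ${expected_links}"
    fi
fi
```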
