GPU Server에서 Multi-instance GPU 사용하기
GPU Server를 생성한 후 GPU Server의 VM(Guest OS)에서 MIG (Multi-instance GPU) 기능을 활성화하고 Instance를 생성해 사용할 수 있습니다.
Multi-instance GPU (NVIDIA A100) 살펴보기
NVIDIA A100은 NVIDIA 암페어(Ampere) 아키텍처를 기반으로 하는 Multi-instance GPU(MIG)로, 최대 7개의 독립된 GPU Instance로 안전하게 분할되어 CUDA (Compute Unified Device Architecture, 연산통합 장치설계) Application을 운용할 수 있습니다. NVIDIA A100은 고대역폭 메모리(HBM: high bandwidth memory)와 캐시를 활용하는 동시에 GPU 사용에 최적화된 방식으로 컴퓨팅 자원을 할당함으로써 다수의 사용자들에게 독립적인 GPU 자원을 제공할 수 있습니다. 사용자는 각 워크로드의 병렬 실행을 통해 GPU 최대 연산 용량에 도달하지 않은 워크로드를 활용할 수 있으므로, GPU 사용율을 극대화할 수 있습니다.
Multi-instance GPU 기능 사용하기
Multi-instance GPU 기능을 사용하려면 Samsung Cloud Platform에서 GPU Server 서비스를 생성한 후 A100 GPU가 할당된 VM Instance(GuestOS)를 생성해야 합니다. GPU Server 생성 완료 후, 아래의 MIG 적용 순서와 MIG 해제 순서를 따라 적용해볼 수 있습니다.
- MIG 기능을 사용하기 위한 시스템 요구사항은 다음과 같습니다(NVIDIA - Supported GPUs 참고).
- CUDA toolkit 11, NVIDIA driver 450.80.02 또는 이후 버전
- CUDA toolkit 11을 지원하는 리눅스 배포 운영체제
- 컨테이너 또는 쿠버네티스 서비스 운용 시 MIG 기능을 사용하기 위한 요구사항은 다음과 같습니다.
- NVIDIA Container Toolkit(nvidia-docker2) v 2.5.0 또는 이후 버전
- NVIDIA K8s Device Plugin v 0.7.0 또는 이후 버전
- NVIDIA gpu-feature-discovery v 0.2.0 또는 이후 버전
MIG 적용 및 사용하기
MIG를 활성화하고 Instance를 생성해 작업을 할당하려면 다음 절차를 따르세요.
MIG 활성화
MIG를 적용하기 전 VM Instance(GuestOS)에서 GPU 상태를 확인하세요.
- MIG mode가 Disabled 상태인지 확인하세요.배경색 변경
$ nvidia-smi Mon Sep 27 08:37:08 2021 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 | |-------------------------------+----------------------+----------------------| | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 NVDIA A100-SXM... Off | 00000000:05:00.0 Off | 0 | | N/A 32C P0 59W / 400W | 0MiB / 81251MiB | 0% Default | | | | Disabled | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+$ nvidia-smi Mon Sep 27 08:37:08 2021 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 | |-------------------------------+----------------------+----------------------| | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 NVDIA A100-SXM... Off | 00000000:05:00.0 Off | 0 | | N/A 32C P0 59W / 400W | 0MiB / 81251MiB | 0% Default | | | | Disabled | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+코드블록. nvidia-smi 명령어 - GPU 비활성화 상태 확인 (1) 배경색 변경$ nvidia-smi –L GPU 0: NVIDIA A100-SXM-80GB (UUID: GPU-c956838f-494a-92b2-6818-56eb28fe25e0)$ nvidia-smi –L GPU 0: NVIDIA A100-SXM-80GB (UUID: GPU-c956838f-494a-92b2-6818-56eb28fe25e0)코드블록. nvidia-smi 명령어 - GPU 비활성화 상태 확인 (2)
- MIG mode가 Disabled 상태인지 확인하세요.
VM Instance(GuestOS)에서 GPU별로 MIG를 활성화(Enable)하고 VM Instance를 재부팅하세요.
배경색 변경$ nvidia-smi –I 0 –mig 1 Enabled MIG mode for GPU 00000000:05:00.0 All done. # reboot$ nvidia-smi –I 0 –mig 1 Enabled MIG mode for GPU 00000000:05:00.0 All done. # reboot코드블록. nvidia-smi 명령어 - MIG 활성화
GPU 모니터링 에이전트가 다음과 같은 경고 메시지를 표시하는 경우, MIG를 활성화하기 전에 nvsm 및 dcgm 서비스를 중단하세요.
Warning: MIG mode is in pending enable state for GPU 00000000:05:00.0: In use by another client. 00000000:05:00.0 is currently being used by one or more other processes (e.g. CUDA application or a monitoring application such as another instance of nvidia-smi).
# systemctl stop nvsm
# systemctl stop dcgm
- MIG 작업을 마친 후 nvsm 및 dcgm 서비스를 다시 시작하세요.
- VM Instance(GuestOS)에서 MIG를 적용한 후 GPU 상태를 확인하세요.
- MIG mode가 Enabled 상태인지 확인하세요.배경색 변경
$ nvidia-smi Mon Sep 27 09:44:33 2021 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 | |-------------------------------+----------------------+----------------------| | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 NVDIA A100-SXM... Off | 00000000:05:00.0 Off | On | | N/A 32C P0 59W / 400W | 0MiB / 81251MiB | 0% Default | | | | Enabled | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | MIG devices: | +-----------------------------------------------------------------------------+ | GPU GI CI MIG | Memory-Usage | Vol| Shared | | ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG| | | | ECC| | |=============================================================================| | No MIG devices found | +-----------------------------------------------------------------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+$ nvidia-smi Mon Sep 27 09:44:33 2021 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 | |-------------------------------+----------------------+----------------------| | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 NVDIA A100-SXM... Off | 00000000:05:00.0 Off | On | | N/A 32C P0 59W / 400W | 0MiB / 81251MiB | 0% Default | | | | Enabled | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | MIG devices: | +-----------------------------------------------------------------------------+ | GPU GI CI MIG | Memory-Usage | Vol| Shared | | ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG| | | | ECC| | |=============================================================================| | No MIG devices found | +-----------------------------------------------------------------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+코드블록. nvidia-smi 명령어 - GPU 활성화 상태 확인 (1) 배경색 변경$ nvidia-smi –L GPU 0: NVIDIA A100-SXM-80GB (UUID: GPU-c956838f-494a-92b2-6818-56eb28fe25e0)$ nvidia-smi –L GPU 0: NVIDIA A100-SXM-80GB (UUID: GPU-c956838f-494a-92b2-6818-56eb28fe25e0)코드블록. nvidia-smi 명령어 - GPU 활성화 상태 확인 (2)
- MIG mode가 Enabled 상태인지 확인하세요.
GPU Instance 생성
MIG 활성화하고 상태를 확인하였다면, GPU Instance를 생성할 수 있습니다.
생성할 수 있는 MIG GPU Instance 프로파일 목록을 확인하세요.
배경색 변경$ nvidia-smi mig -i [GPU ID] -lgip$ nvidia-smi mig -i [GPU ID] -lgip코드블록. nvidia-smi 명령어 - MIG GPU Instance 프로파일 목록 확인 배경색 변경$ nvidia-smi mig -i 0 -lgip +-----------------------------------------------------------------------------+ | GPU instance profiles: | | GPU Name ID Instances Memory P2P SM DEC ENC | | Free/Total GiB CE JPEG OFA | |=============================================================================| | 0 MIG 1g.10gb 19 7/7 9.50 No 14 0 0 | | 1 0 0 | +-----------------------------------------------------------------------------+ | 0 MIG 1g.10gb+me 20 1/1 9.50 No 14 0 0 | | 1 1 1 | +-----------------------------------------------------------------------------+ | 0 MIG 2g.20gb 14 3/3 19.50 No 28 1 0 | | 2 0 0 | +-----------------------------------------------------------------------------+ | 0 MIG 3g.40gb 9 2/2 39.50 No 42 2 0 | | 3 0 0 | +-----------------------------------------------------------------------------+ | 0 MIG 4g.40gb 5 1/1 39.50 No 56 2 0 | | 4 0 0 | +-----------------------------------------------------------------------------+ | 0 MIG 7g.80gb 0 1/1 79.25 No 98 0 0 | | 7 1 1 | +-----------------------------------------------------------------------------+$ nvidia-smi mig -i 0 -lgip +-----------------------------------------------------------------------------+ | GPU instance profiles: | | GPU Name ID Instances Memory P2P SM DEC ENC | | Free/Total GiB CE JPEG OFA | |=============================================================================| | 0 MIG 1g.10gb 19 7/7 9.50 No 14 0 0 | | 1 0 0 | +-----------------------------------------------------------------------------+ | 0 MIG 1g.10gb+me 20 1/1 9.50 No 14 0 0 | | 1 1 1 | +-----------------------------------------------------------------------------+ | 0 MIG 2g.20gb 14 3/3 19.50 No 28 1 0 | | 2 0 0 | +-----------------------------------------------------------------------------+ | 0 MIG 3g.40gb 9 2/2 39.50 No 42 2 0 | | 3 0 0 | +-----------------------------------------------------------------------------+ | 0 MIG 4g.40gb 5 1/1 39.50 No 56 2 0 | | 4 0 0 | +-----------------------------------------------------------------------------+ | 0 MIG 7g.80gb 0 1/1 79.25 No 98 0 0 | | 7 1 1 | +-----------------------------------------------------------------------------+코드블록. MIG GPU Instance 프로파일 목록
| Profile Name | Fraction of Memory | Fraction of SMs | Hardware Units | L2 Cache Size | Number of Instances Available |
|---|---|---|---|---|---|
| MIG 1g.10gb | 1/8 | 1/7 | 0 NVDECs /0 JPEG /0 OFA | 1/8 | 7 |
| MIG 1g.10gb+me | 1/8 | 1/7 | 1 NVDEC /1 JPEG /1 OFA | 1/8 | 1 (A single 1g profile can include media extensions) |
| MIG 2g.20gb | 2/8 | 2/7 | 1 NVDECs /0 JPEG /0 OFA | 2/8 | 3 |
| MIG 3g.40gb | 4/8 | 3/7 | 2 NVDECs /0 JPEG /0 OFA | 4/8 | 2 |
| MIG 4g.40gb | 4/8 | 4/7 | 2 NVDECs /0 JPEG /0 OFA | 4/8 | 1 |
| MIG 7g.80gb | Full | 7/7 | 5 NVDECs /1 JPEG /1 OFA | Full | 1 |
- MIG GPU Instance를 생성한 후 확인하세요.
GPU Instance 생성
배경색 변경$ nvidia-smi mig -i [GPU ID] -cgi [Profile ID]$ nvidia-smi mig -i [GPU ID] -cgi [Profile ID]코드블록. nvidia-smi 명령어 - GPU Instance 생성 배경색 변경$ nvidia-smi mig -i 0 -cgi 0 Successfully created GPU instance ID 0 on GPU 0 using profile MIG 7g.80gb (ID 0)$ nvidia-smi mig -i 0 -cgi 0 Successfully created GPU instance ID 0 on GPU 0 using profile MIG 7g.80gb (ID 0)코드블록. nvidia-smi 명령어 - GPU Instance 생성 예시 GPU Instance 확인
배경색 변경$ nvidia-smi mig -i [GPU ID] -lgi$ nvidia-smi mig -i [GPU ID] -lgi코드블록. nvidia-smi 명령어 - GPU Instance 확인 배경색 변경$ nvidia-smi mig -i 0 -lgi +--------------------------------------------------------+ | GPU instances: | | GPU Name Profile Instance Placement | | ID ID Start:Size | |========================================================| | 0 MIG 7g.80gb 0 0 0:8 | +--------------------------------------------------------+$ nvidia-smi mig -i 0 -lgi +--------------------------------------------------------+ | GPU instances: | | GPU Name Profile Instance Placement | | ID ID Start:Size | |========================================================| | 0 MIG 7g.80gb 0 0 0:8 | +--------------------------------------------------------+코드블록. nvidia-smi 명령어 - GPU Instance 확인 예시
Compute Instance 생성
GPU Instance를 생성하였다면, Compute Instance를 생성할 수 있습니다.
생성할 수 있는 MIG Compute Instance 프로파일을 확인하세요.
배경색 변경$ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -lcip$ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -lcip코드블록. nvidia-smi 명령어 - MIG Compute Instance 프로파일 확인 배경색 변경$ nvidia-smi mig -i 0 -gi 0 -lcip +---------------------------------------------------------------------------------+ | Compute instance profiles: | | GPU GPU Name Profile Instances Exclusive Shared | | GPU Instance ID Free/Total SM DEC ENC OFA | | ID CE JPEG | |=================================================================================| | 0 0 MIG 1c.7g.80gb 0 7/7 14 5 0 1 | | 7 1 | +---------------------------------------------------------------------------------+ | 0 0 MIG 2c.7g.80gb 1 3/3 28 5 0 1 | | 7 1 | +---------------------------------------------------------------------------------+ | 0 0 MIG 3c.7g.80gb 2 2/2 42 5 0 1 | | 7 1 | +---------------------------------------------------------------------------------+ | 0 0 MIG 4c.7g.80gb 3 1/1 56 5 0 1 | | 7 1 | +---------------------------------------------------------------------------------+ | 0 0 MIG 7g.80gb 4* 1/1 98 5 0 1 | | 7 1 | +---------------------------------------------------------------------------------+$ nvidia-smi mig -i 0 -gi 0 -lcip +---------------------------------------------------------------------------------+ | Compute instance profiles: | | GPU GPU Name Profile Instances Exclusive Shared | | GPU Instance ID Free/Total SM DEC ENC OFA | | ID CE JPEG | |=================================================================================| | 0 0 MIG 1c.7g.80gb 0 7/7 14 5 0 1 | | 7 1 | +---------------------------------------------------------------------------------+ | 0 0 MIG 2c.7g.80gb 1 3/3 28 5 0 1 | | 7 1 | +---------------------------------------------------------------------------------+ | 0 0 MIG 3c.7g.80gb 2 2/2 42 5 0 1 | | 7 1 | +---------------------------------------------------------------------------------+ | 0 0 MIG 4c.7g.80gb 3 1/1 56 5 0 1 | | 7 1 | +---------------------------------------------------------------------------------+ | 0 0 MIG 7g.80gb 4* 1/1 98 5 0 1 | | 7 1 | +---------------------------------------------------------------------------------+코드블록. MIG Compute Instance 프로파일 목록 예시 MIG Compute Instance를 생성하고 확인하세요.
- MIG Compute Instance 생성배경색 변경
$ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -cci [Compute Profile ID]$ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -cci [Compute Profile ID]코드블록. nvidia-smi 명령어 - MIG Compute Instance 생성 배경색 변경$ nvidia-smi mig -i 0 -gi 0 -cci 4 Successfully created compute instance ID 0 on GPU instance ID 0 using profile MIG 7g.80gb(ID 4)$ nvidia-smi mig -i 0 -gi 0 -cci 4 Successfully created compute instance ID 0 on GPU instance ID 0 using profile MIG 7g.80gb(ID 4)코드블록. nvidia-smi 명령어 - MIG Compute Instance 생성 예시 - MIG Compute Instance 확인배경색 변경
$ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] –lci$ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] –lci코드블록. nvidia-smi 명령어 - MIG Compute Instance 확인 배경색 변경$ nvidia-smi mig -i 0 -gi 0 –lci +-----------------------------------------------------------------+ | Compute instance profiles: | | GPU GPU Name Profile Instances Placement | | GPU Instance ID ID Start:Size | | ID | |=================================================================| | 0 0 MIG 7g.80gb 4 0 0:7 | +-----------------------------------------------------------------+$ nvidia-smi mig -i 0 -gi 0 –lci +-----------------------------------------------------------------+ | Compute instance profiles: | | GPU GPU Name Profile Instances Placement | | GPU Instance ID ID Start:Size | | ID | |=================================================================| | 0 0 MIG 7g.80gb 4 0 0:7 | +-----------------------------------------------------------------+코드블록. MIG Compute Instance 확인 예시 배경색 변경$ nvidia-smi –L GPU 0: NVIDIA A100-SXM-80GB (UUID: GPU-c956838f-494a-92b2-6818-56eb28fe25e0) MIG 7g.80gb Device 0: (UUID: MIG-53e20040-758b-5ecb-948e-c626d03a9a32)$ nvidia-smi –L GPU 0: NVIDIA A100-SXM-80GB (UUID: GPU-c956838f-494a-92b2-6818-56eb28fe25e0) MIG 7g.80gb Device 0: (UUID: MIG-53e20040-758b-5ecb-948e-c626d03a9a32)코드블록. nvidia-smi 명령어 - GPU 상태 확인 (1) 배경색 변경$ nvidia-smi Mon Sep 27 09:52:17 2021 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 | |-------------------------------+----------------------+----------------------| | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 NVDIA A100-SXM... Off | 00000000:05:00.0 Off | On | | N/A 32C P0 49W / 400W | 0MiB / 81251MiB | N/A Default | | | | Enabled | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | MIG devices: | +-----------------------------------------------------------------------------+ | GPU GI CI MIG | Memory-Usage | Vol| Shared | | ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG| | | | ECC| | |=============================================================================| | 0 0 0 0 | 0MiB / 81251MiB | 98 0 | 7 0 5 1 1 | | | 1MiB / 13107... | | | +-----------------------------------------------------------------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+$ nvidia-smi Mon Sep 27 09:52:17 2021 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 | |-------------------------------+----------------------+----------------------| | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 NVDIA A100-SXM... Off | 00000000:05:00.0 Off | On | | N/A 32C P0 49W / 400W | 0MiB / 81251MiB | N/A Default | | | | Enabled | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | MIG devices: | +-----------------------------------------------------------------------------+ | GPU GI CI MIG | Memory-Usage | Vol| Shared | | ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG| | | | ECC| | |=============================================================================| | 0 0 0 0 | 0MiB / 81251MiB | 98 0 | 7 0 5 1 1 | | | 1MiB / 13107... | | | +-----------------------------------------------------------------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+코드블록. nvidia-smi 명령어 - GPU 상태 확인 (2)
- MIG Compute Instance 생성
MIG 사용
- MIG Instance를 사용해 Job을 수행하세요.
- 작업 수행 예시배경색 변경
$ docker run --gpus '"device=[GPU ID]:[MIG ID]"' -rm nvcr.io/nvidia/cuda nvidia-smi$ docker run --gpus '"device=[GPU ID]:[MIG ID]"' -rm nvcr.io/nvidia/cuda nvidia-smi코드블록. 작업 수행 예시 - 아래와 같이 작업을 수행한 예시를 확인해볼 수 있습니다.배경색 변경
$ docker run --gpus '"device=0:0"' -rm -it --network=host --shm-size=1g --ipc=host -v /root/.ssh/:/root/.ssh ================ == TensorFlow == ================ NVIDIA Release 21.08-tf1 (build 26012104) TensorFlow Version 1.15.5 Container image Copyright (c) 2021, NVIDIA CORPORATION. All right reserved. ... # Python 프로세스 실행 root@d622a93c9281:/workspace# python /workspace/nvidia-examples/cnn/resnet.py --num_iter 100 ... PY 3.8.10 (default, Jun 2 2021, 10:49:15) [GCC 9.4.0] TF 1.15.5 ...$ docker run --gpus '"device=0:0"' -rm -it --network=host --shm-size=1g --ipc=host -v /root/.ssh/:/root/.ssh ================ == TensorFlow == ================ NVIDIA Release 21.08-tf1 (build 26012104) TensorFlow Version 1.15.5 Container image Copyright (c) 2021, NVIDIA CORPORATION. All right reserved. ... # Python 프로세스 실행 root@d622a93c9281:/workspace# python /workspace/nvidia-examples/cnn/resnet.py --num_iter 100 ... PY 3.8.10 (default, Jun 2 2021, 10:49:15) [GCC 9.4.0] TF 1.15.5 ...코드블록. 작업 수행 결과
- 작업 수행 예시
- GPU 사용률을 확인하세요. (JOB 프로세스 생성)
- Job이 구동될 때 MIG 디바이스에 프로세스가 할당되고 사용률이 증가하는 것을 확인할 수 있습니다.배경색 변경
$ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -lcip$ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -lcip코드블록. nvidia-smi 명령어 - GPU 사용률 확인 - 아래와 같이 GPU 사용률을 확인할 수 있습니다.배경색 변경
+-----------------------------------------------------------------------------+ | MIG devices: | +-----------------------------------------------------------------------------+ | GPU GI CI MIG | Memory-Usage | Vol| Shared | | ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG| | | | ECC| | |=============================================================================| | 0 0 0 0 | 66562MiB / 81251MiB | 98 0 | 7 0 5 1 1 | | | 5MiB / 13107... | | | +-----------------------------------------------------------------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | 0 0 0 17483 C python 66559MiB | +-----------------------------------------------------------------------------++-----------------------------------------------------------------------------+ | MIG devices: | +-----------------------------------------------------------------------------+ | GPU GI CI MIG | Memory-Usage | Vol| Shared | | ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG| | | | ECC| | |=============================================================================| | 0 0 0 0 | 66562MiB / 81251MiB | 98 0 | 7 0 5 1 1 | | | 5MiB / 13107... | | | +-----------------------------------------------------------------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | 0 0 0 17483 C python 66559MiB | +-----------------------------------------------------------------------------+코드블록. GPU 사용률 확인 예시
- Job이 구동될 때 MIG 디바이스에 프로세스가 할당되고 사용률이 증가하는 것을 확인할 수 있습니다.
MIG Instance 삭제 및 해제하기
MIG Instance를 삭제하고 MIG를 해제하려면 다음 절차를 따르세요.
Compute Instance 삭제
- Compute Instance를 삭제하세요.배경색 변경
$ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] –dci $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -ci [Compute Instance] –dci$ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] –dci $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -ci [Compute Instance] –dci코드블록. nvidia-smi 명령어 - Compute Instance 삭제 배경색 변경$ nvidia-smi mig -i 0 -gi 0 –lci +-----------------------------------------------------------------+ | Compute instance profiles: | | GPU GPU Name Profile Instances Placement | | GPU Instance ID ID Start:Size | | ID | |=================================================================| | 0 0 MIG 7g.80gb 4 0 0:7 | +-----------------------------------------------------------------+$ nvidia-smi mig -i 0 -gi 0 –lci +-----------------------------------------------------------------+ | Compute instance profiles: | | GPU GPU Name Profile Instances Placement | | GPU Instance ID ID Start:Size | | ID | |=================================================================| | 0 0 MIG 7g.80gb 4 0 0:7 | +-----------------------------------------------------------------+코드블록. MIG Compute Instance 확인 예시 배경색 변경$ nvidia-smi mig -i 0 -gi 0 –dci Successfully destroyed compute instance ID 0 from GPU instance ID 0$ nvidia-smi mig -i 0 -gi 0 –dci Successfully destroyed compute instance ID 0 from GPU instance ID 0코드블록. Compute Instance 삭제 예시 배경색 변경$ nvidia-smi mig -i 0 -gi 0 –lci No compute instances found: Not found$ nvidia-smi mig -i 0 -gi 0 –lci No compute instances found: Not found코드블록. Compute Instance 삭제 확인
GPU Instance 삭제
- GPU Instance를 삭제하세요.배경색 변경
$ nvidia-smi mig -i [GPU ID] –dgi $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] –dgi$ nvidia-smi mig -i [GPU ID] –dgi $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] –dgi코드블록. nvidia-smi 명령어 - GPU Instance 삭제 배경색 변경$ nvidia-smi mig -i 0 -lgi +--------------------------------------------------------+ | GPU instances: | | GPU Name Profile Instance Placement | | ID ID Start:Size | |========================================================| | 0 MIG 7g.80gb 0 0 0:8 | +--------------------------------------------------------+$ nvidia-smi mig -i 0 -lgi +--------------------------------------------------------+ | GPU instances: | | GPU Name Profile Instance Placement | | ID ID Start:Size | |========================================================| | 0 MIG 7g.80gb 0 0 0:8 | +--------------------------------------------------------+코드블록. nvidia-smi 명령어 - GPU Instance 확인 예시 배경색 변경$ nvidia-smi mig -i 0 -dgi Successfully destroyed GPU instance ID 0 from GPU 0$ nvidia-smi mig -i 0 -dgi Successfully destroyed GPU instance ID 0 from GPU 0코드블록. nvidia-smi 명령어 - GPU Instance 삭제 예시 배경색 변경$ nvidia-smi mig -i 0 -lgi No GPU instances found: Not found$ nvidia-smi mig -i 0 -lgi No GPU instances found: Not found코드블록. nvidia-smi 명령어 - GPU Instance 삭제 예시
MIG 기능 해제(비활성화)
- MIG를 비활성화(Disable)한 후 재부팅하세요.배경색 변경
$ nvidia-smi -mig 0 Disabled MIG Mode for GPU 00000000:05:00.0 All done.$ nvidia-smi -mig 0 Disabled MIG Mode for GPU 00000000:05:00.0 All done.코드블록. nvidia-smi 명령어 - MIG 비활성화 배경색 변경$ nvidia-smi Mon Sep 30 05:18:28 2021 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 | |-------------------------------+----------------------+----------------------| | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 NVDIA A100-SXM... Off | 00000000:05:00.0 Off | 0 | | N/A 33C P0 60W / 400W | 0MiB / 81251MiB | 0% Default | | | | Disabled | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | MIG devices: | +-----------------------------------------------------------------------------+ | GPU GI CI MIG | Memory-Usage | Vol| Shared | | ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG| | | | ECC| | |=============================================================================| | No MIG devices found | +-----------------------------------------------------------------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+$ nvidia-smi Mon Sep 30 05:18:28 2021 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 | |-------------------------------+----------------------+----------------------| | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 NVDIA A100-SXM... Off | 00000000:05:00.0 Off | 0 | | N/A 33C P0 60W / 400W | 0MiB / 81251MiB | 0% Default | | | | Disabled | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | MIG devices: | +-----------------------------------------------------------------------------+ | GPU GI CI MIG | Memory-Usage | Vol| Shared | | ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG| | | | ECC| | |=============================================================================| | No MIG devices found | +-----------------------------------------------------------------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+코드블록. nvidia-smi 명령어 - GPU 상태 확인


