Using Multi-instance GPU on a GPU Server

After you create a GPU Server, you can enable MIG (Multi-instance GPU) in the GPU Server's VM (Guest OS), then create and use instances.

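Introduction to NVIDIA Multi-instance GPU

NVIDIA Multi-instance GPU (hereafter MIG), available starting with the NVIDIA Ampere architecture, securely partitions a GPU into GPU instances on which CUDA applications can run.
This lets multiple users each work with separate GPU resources, which improves overall GPU utilization.
The feature is especially useful for workloads that do not fully use a GPU's compute capacity, because users can run several such workloads in parallel to maximize utilization.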

Using the Multi-instance GPU Feature

To use MIG, create an NVIDIA GPU Server on Samsung Cloud Platform, then apply MIG and, when it is no longer needed, release it.
The order for applying and releasing MIG is as follows.

MIG apply order
Enable MIG → Create GPU Instance → Create Compute Instance → Use MIG
MIG release order
Delete Compute Instance → Delete GPU Instance → Disable MIG
Note
  • On Samsung Cloud Platform, MIG is available on g-generation GPU Servers and on MNGC (Multi-node GPU Cluster).
  • For the system requirements for MIG, see the NVIDIA Multi-Instance GPU User Guide.
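
The apply and release orders above can be written down as ordered command lists. The following is a minimal sketch, not an official tool: the function names are hypothetical, `gi_id=0` assumes the GPU Instance receives ID 0 as in the A100 example, and the `-dgi` and `-mig 0` forms are the standard nvidia-smi counterparts of the create and enable commands. The sketch only builds the command lines and does not run them.

```python
# Sketch only: builds nvidia-smi command lines in the documented order.
# Nothing is executed here; pass each list to subprocess.run() yourself if desired.

def mig_apply_commands(gpu_id, gi_profile_id, ci_profile_id, gi_id=0):
    """Enable MIG -> create GPU Instance -> create Compute Instance."""
    gpu = str(gpu_id)
    return [
        # Enabling MIG is followed by a VM reboot before the next commands.
        ["nvidia-smi", "-i", gpu, "-mig", "1"],
        ["nvidia-smi", "mig", "-i", gpu, "-cgi", str(gi_profile_id)],
        # gi_id is an assumption: use the ID reported when the GPU Instance is created.
        ["nvidia-smi", "mig", "-i", gpu, "-gi", str(gi_id), "-cci", str(ci_profile_id)],
    ]


def mig_release_commands(gpu_id, gi_id=0):
    """Delete Compute Instance -> delete GPU Instance -> disable MIG."""
    gpu = str(gpu_id)
    return [
        ["nvidia-smi", "mig", "-i", gpu, "-gi", str(gi_id), "-dci"],
        ["nvidia-smi", "mig", "-i", gpu, "-gi", str(gi_id), "-dgi"],
        ["nvidia-smi", "-i", gpu, "-mig", "0"],
    ]


if __name__ == "__main__":
    # Values taken from the A100 example: GPU 0, GI profile 0, CI profile 4.
    for cmd in mig_apply_commands(0, 0, 4) + mig_release_commands(0):
        print(" ".join(cmd))
```

Keeping the two lists separate mirrors the rule that release runs in the reverse order of apply: Compute Instances must be removed before their GPU Instance, and the GPU Instance before MIG mode is disabled.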

Applying and Using MIG

Enabling MIG, creating instances, and assigning work to them proceeds in the following order.

MIG apply order
Enable MIG → Create GPU Instance → Create Compute Instance → Use MIG
Note
The examples of applying MIG are based on an A100 GPU Server.

Enabling MIG

  1. Before applying MIG, check the GPU state in the VM Instance (Guest OS).

    • Confirm that MIG mode is Disabled.
      $ nvidia-smi
      Mon Sep 27 08:37:08 2021
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
      |-------------------------------+----------------------+----------------------|
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |                               |                      |               MIG M. |
      |===============================+======================+======================|
      |   0  NVIDIA A100-SXM...  Off  | 00000000:05:00.0 Off |                    0 |
      | N/A   32C   P0    59W / 400W  |      0MiB / 81251MiB |      0%      Default |
      |                               |                      |             Disabled |
      +-------------------------------+----------------------+----------------------+
      
      +-----------------------------------------------------------------------------+
      | Processes:                                                                  |
      |  GPU   GI   CI       PID   Type   Process name                   GPU Memory |
      |        ID   ID                                                   Usage      |
      |=============================================================================|
      | No running processes found                                                  |
      +-----------------------------------------------------------------------------+
      Code block. nvidia-smi command - check the GPU state with MIG disabled (1)
      $ nvidia-smi -L
      GPU 0: NVIDIA A100-SXM-80GB (UUID: GPU-c956838f-494a-92b2-6818-56eb28fe25e0)
      Code block. nvidia-smi command - check the GPU state with MIG disabled (2)
  2. In the VM Instance (Guest OS), enable MIG for each GPU, then reboot the VM Instance.

    $ nvidia-smi -i 0 -mig 1
    Enabled MIG mode for GPU 00000000:05:00.0
    All done.
    
    # reboot
    Code block. nvidia-smi command - enable MIG

Note

If you configure MIG while the GPU is in use, the following warning may appear. If it does, check whether any program is using the GPU.

Warning: MIG mode is in pending enable state for GPU 00000000:05:00.0: In use by another client. 00000000:05:00.0 is currently being used by one or more other processes (e.g. CUDA application or a monitoring application such as another instance of nvidia-smi).
  3. After applying MIG, check the GPU state in the VM Instance (Guest OS).
    • Confirm that MIG mode is Enabled.
      $ nvidia-smi
      Mon Sep 27 09:44:33 2021
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
      |-------------------------------+----------------------+----------------------|
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |                               |                      |               MIG M. |
      |===============================+======================+======================|
      |   0  NVIDIA A100-SXM...  Off  | 00000000:05:00.0 Off |                   On |
      | N/A   32C   P0    59W / 400W  |      0MiB / 81251MiB |      0%      Default |
      |                               |                      |              Enabled |
      +-------------------------------+----------------------+----------------------+
      +-----------------------------------------------------------------------------+
      | MIG devices:                                                                |
      +-----------------------------------------------------------------------------+
      |  GPU  GI  CI  MIG |        Memory-Usage |        Vol|        Shared         |
      |       ID  ID  Dev |          BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
      |                   |                     |        ECC|                       |
      |=============================================================================|
      | No MIG devices found                                                        |
      +-----------------------------------------------------------------------------+
      +-----------------------------------------------------------------------------+
      | Processes:                                                                  |
      |  GPU   GI   CI       PID   Type   Process name                   GPU Memory |
      |        ID   ID                                                   Usage      |
      |=============================================================================|
      | No running processes found                                                  |
      +-----------------------------------------------------------------------------+
      Code block. nvidia-smi command - check the GPU state with MIG enabled (1)
      $ nvidia-smi -L
      GPU 0: NVIDIA A100-SXM-80GB (UUID: GPU-c956838f-494a-92b2-6818-56eb28fe25e0)
      Code block. nvidia-smi command - check the GPU state with MIG enabled (2)
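
Reading the MIG mode from the full nvidia-smi table works for a manual check; for scripting, the query interface is easier to parse. The sketch below assumes the `mig.mode.current` and `mig.mode.pending` query fields of `nvidia-smi --query-gpu` and a CSV output without a header; the function names are hypothetical. A pending value that differs from the current value corresponds to the "pending enable state" mentioned in the warning above.

```python
import subprocess

# Assumed query: one CSV line per GPU, e.g. "0, Disabled, Enabled".
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,mig.mode.current,mig.mode.pending",
    "--format=csv,noheader",
]


def parse_mig_modes(csv_text):
    """Map GPU index -> (current mode, pending mode) from the CSV text."""
    modes = {}
    for line in csv_text.strip().splitlines():
        index, current, pending = [field.strip() for field in line.split(",")]
        modes[int(index)] = (current, pending)
    return modes


def read_mig_modes():
    """Run the query on a machine with an NVIDIA driver and parse the result."""
    result = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_mig_modes(result.stdout)
```

For example, `parse_mig_modes("0, Disabled, Enabled")` yields `{0: ("Disabled", "Enabled")}`, which indicates that MIG was requested but has not taken effect yet.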

GPU Instance Creation

Once MIG is enabled and you have verified its state, you can create a GPU Instance.

  1. Check the list of MIG GPU Instance profiles that can be created.

    $ nvidia-smi mig -i [GPU ID] -lgip
    Code block. nvidia-smi command - list MIG GPU Instance profiles

    $ nvidia-smi mig -i 0 -lgip
    +-----------------------------------------------------------------------------+
    | GPU instance profiles:                                                      |
    | GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
    |                              Free/Total   GiB              CE    JPEG  OFA  |
    |=============================================================================|
    |   0 MIG 1g.10gb        19    7/7         9.50       No     14     0     0   |
    |                                                             1     0     0   |
    +-----------------------------------------------------------------------------+
    |   0 MIG 1g.10gb+me     20    1/1         9.50       No     14     0     0   |
    |                                                             1     1     1   |
    +-----------------------------------------------------------------------------+
    |   0 MIG 2g.20gb        14    3/3         19.50      No     28     1     0   |
    |                                                             2     0     0   |
    +-----------------------------------------------------------------------------+
    |   0 MIG 3g.40gb         9    2/2         39.50      No     42     2     0   |
    |                                                             3     0     0   |
    +-----------------------------------------------------------------------------+
    |   0 MIG 4g.40gb         5    1/1         39.50      No     56     2     0   |
    |                                                             4     0     0   |
    +-----------------------------------------------------------------------------+
    |   0 MIG 7g.80gb         0    1/1         79.25      No     98     0     0   |
    |                                                             7     1     1   |
    +-----------------------------------------------------------------------------+
    Code block. MIG GPU Instance profile list
Note
For details on GPU Instance profiles, see the NVIDIA Multi-Instance GPU User Guide.
  2. Create a MIG GPU Instance, then verify it.
    • Create the GPU Instance

      $ nvidia-smi mig -i [GPU ID] -cgi [Profile ID]
      Code block. nvidia-smi command - create a GPU Instance
      $ nvidia-smi mig -i 0 -cgi 0
      Successfully created GPU instance ID 0 on GPU 0 using profile MIG 7g.80gb (ID 0)
      Code block. nvidia-smi command - GPU Instance creation example

    • Verify the GPU Instance

      $ nvidia-smi mig -i [GPU ID] -lgi
      Code block. nvidia-smi command - verify the GPU Instance
      $ nvidia-smi mig -i 0 -lgi
      +--------------------------------------------------------+
      | GPU instances:                                         |
      | GPU   Name               Profile  Instance  Placement  |
      |                            ID       ID      Start:Size |
      |========================================================|
      |   0  MIG 7g.80gb            0        0         0:8     |
      +--------------------------------------------------------+
      Code block. nvidia-smi command - GPU Instance verification example
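
Choosing the Profile ID for `-cgi` means reading it off the `-lgip` table. The following sketch shows one way to extract the name, ID, and free/total counts from that table so a script can look up an ID by profile name. It assumes the column layout shown in the A100 example (which may differ between driver versions), and `parse_gi_profiles` and `SAMPLE` are hypothetical names introduced for illustration.

```python
import re

# Matches data rows such as:
# |   0 MIG 1g.10gb        19    7/7         9.50       No     14     0     0   |
ROW = re.compile(r"^\|\s+(\d+)\s+MIG\s+(\S+)\s+(\d+)\s+(\d+)/(\d+)\s")


def parse_gi_profiles(table_text):
    """Map profile name -> GPU index, profile ID, and free/total instance counts."""
    profiles = {}
    for line in table_text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header, separator, and continuation rows do not match
            gpu, name, profile_id, free, total = match.groups()
            profiles[name] = {
                "gpu": int(gpu),
                "id": int(profile_id),
                "free": int(free),
                "total": int(total),
            }
    return profiles


# Rows copied from the profile list example above.
SAMPLE = """\
|   0 MIG 1g.10gb        19    7/7         9.50       No     14     0     0   |
|                                                             1     0     0   |
|   0 MIG 7g.80gb         0    1/1         79.25      No     98     0     0   |
|                                                             7     1     1   |
"""

if __name__ == "__main__":
    profile = parse_gi_profiles(SAMPLE)["7g.80gb"]
    print(f"nvidia-smi mig -i {profile['gpu']} -cgi {profile['id']}")
```

Checking the `free` count before creating an instance avoids issuing a `-cgi` request for a profile that has no remaining capacity.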

Compute Instance Creation

Once a GPU Instance exists, you can create a Compute Instance.

  1. Check the MIG Compute Instance profiles that can be created.

    $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -lcip
    Code block. nvidia-smi command - list MIG Compute Instance profiles
    $ nvidia-smi mig -i 0 -gi 0 -lcip
    +---------------------------------------------------------------------------------+
    | Compute instance profiles:                                                      |
    | GPU     GPU     Name            Profile  Instances   Exclusive      Shared      |
    | GPU   Instance                     ID    Free/Total     SM       DEC  ENC  OFA  |
    |         ID                                                       CE   JPEG      |
    |=================================================================================|
    |   0      0      MIG 1c.7g.80gb     0      7/7           14       5    0    1    |
    |                                                                  7    1         |
    +---------------------------------------------------------------------------------+
    |   0      0      MIG 2c.7g.80gb     1      3/3           28       5    0    1    |
    |                                                                  7    1         |
    +---------------------------------------------------------------------------------+
    |   0      0      MIG 3c.7g.80gb     2      2/2           42       5    0    1    |
    |                                                                  7    1         |
    +---------------------------------------------------------------------------------+
    |   0      0      MIG 4c.7g.80gb     3      1/1           56       5    0    1    |
    |                                                                  7    1         |
    +---------------------------------------------------------------------------------+
    |   0      0      MIG 7g.80gb        4*     1/1           98       5    0    1    |
    |                                                                  7    1         |
    +---------------------------------------------------------------------------------+
    Code block. MIG Compute Instance profile list example

  2. Create a MIG Compute Instance, then verify it.

    • Create the MIG Compute Instance
      $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -cci [Compute Profile ID]
      Code block. nvidia-smi command - create a MIG Compute Instance
      $ nvidia-smi mig -i 0 -gi 0 -cci 4
      Successfully created compute instance ID 0 on GPU instance ID 0 using profile MIG 7g.80gb(ID 4)
      Code block. nvidia-smi command - MIG Compute Instance creation example
    • Verify the MIG Compute Instance
      $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -lci
      Code block. nvidia-smi command - verify the MIG Compute Instance
      $ nvidia-smi mig -i 0 -gi 0 -lci
      +-----------------------------------------------------------------+
      | Compute instance profiles:                                      |
      | GPU     GPU     Name            Profile  Instances   Placement  |
      | GPU   Instance                     ID      ID        Start:Size |
      |         ID                                                      |
      |=================================================================|
      |   0      0      MIG 7g.80gb         4       0            0:7    |
      +-----------------------------------------------------------------+
      Code block. MIG Compute Instance verification example
      $ nvidia-smi -L
      GPU 0: NVIDIA A100-SXM-80GB (UUID: GPU-c956838f-494a-92b2-6818-56eb28fe25e0)
        MIG 7g.80gb     Device  0: (UUID: MIG-53e20040-758b-5ecb-948e-c626d03a9a32)
      Code block. nvidia-smi command - check the GPU state (1)
      $ nvidia-smi
      Mon Sep 27 09:52:17 2021
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
      |-------------------------------+----------------------+----------------------|
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |                               |                      |               MIG M. |
      |===============================+======================+======================|
      |   0  NVIDIA A100-SXM...  Off  | 00000000:05:00.0 Off |                   On |
      | N/A   32C   P0    49W / 400W  |      0MiB / 81251MiB |     N/A      Default |
      |                               |                      |              Enabled |
      +-------------------------------+----------------------+----------------------+
      
      +-----------------------------------------------------------------------------+
      | MIG devices:                                                                |
      +-----------------------------------------------------------------------------+
      |  GPU  GI  CI  MIG |        Memory-Usage |        Vol|        Shared         |
      |       ID  ID  Dev |          BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
      |                   |                     |        ECC|                       |
      |=============================================================================|
      |   0    0   0    0 |     0MiB / 81251MiB | 98      0 |  7   0    5    1    1 |
      |                   |     1MiB / 13107... |           |                       |
      +-----------------------------------------------------------------------------+
      +-----------------------------------------------------------------------------+
      | Processes:                                                                  |
      |  GPU   GI   CI       PID   Type   Process name                   GPU Memory |
      |        ID   ID                                                   Usage      |
      |=============================================================================|
      | No running processes found                                                  |
      +-----------------------------------------------------------------------------+
      Code block. nvidia-smi command - check the GPU state (2)
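
Once a Compute Instance exists, the `nvidia-smi -L` listing shows the MIG device under its parent GPU. The sketch below shows one way to collect those entries from the listing, for example to confirm that the expected device was created. It assumes the line format shown in the example above (MIG UUID formats can vary between driver versions), and `parse_device_list` and `SAMPLE` are hypothetical names.

```python
import re

GPU_LINE = re.compile(r"^GPU\s+(\d+):\s+(.+?)\s+\(UUID:\s+(GPU-[^)]+)\)")
MIG_LINE = re.compile(r"^\s+MIG\s+(\S+)\s+Device\s+(\d+):\s+\(UUID:\s+(MIG-[^)]+)\)")


def parse_device_list(text):
    """Return one dict per MIG device, tagged with the index of its parent GPU."""
    devices = []
    current_gpu = None
    for line in text.splitlines():
        gpu = GPU_LINE.match(line)
        if gpu:
            current_gpu = int(gpu.group(1))
            continue
        mig = MIG_LINE.match(line)
        if mig and current_gpu is not None:
            devices.append({
                "gpu": current_gpu,
                "profile": mig.group(1),
                "device": int(mig.group(2)),
                "uuid": mig.group(3),
            })
    return devices


# Listing copied from the example above.
SAMPLE = """\
GPU 0: NVIDIA A100-SXM-80GB (UUID: GPU-c956838f-494a-92b2-6818-56eb28fe25e0)
  MIG 7g.80gb     Device  0: (UUID: MIG-53e20040-758b-5ecb-948e-c626d03a9a32)
"""

if __name__ == "__main__":
    for device in parse_device_list(SAMPLE):
        print(device["gpu"], device["device"], device["profile"], device["uuid"])
```

An empty result on a GPU with MIG enabled suggests that no Compute Instance has been created yet, which matches the "No MIG devices found" state shown earlier.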

Using MIG

  1. Run a job on the MIG Instance.
    • Example job launch
      $ docker run --gpus '"device=[GPU ID]:[MIG ID]"' --rm nvcr.io/nvidia/cuda nvidia-smi
      Code block. Example job launch
    • The following shows an example of running a job.
      $ docker run --gpus '"device=0:0"' --rm -it --network=host --shm-size=1g --ipc=host -v /root/.ssh/:/root/.ssh
      
      ================
      == TensorFlow ==
      ================
      
      NVIDIA Release 21.08-tf1 (build 26012104)
      TensorFlow Version 1.15.5
      
      Container image Copyright (c) 2021, NVIDIA CORPORATION. All right reserved.
      ...
      
      # Run the Python process
      root@d622a93c9281:/workspace# python /workspace/nvidia-examples/cnn/resnet.py --num_iter 100 
      ...
      PY 3.8.10 (default, Jun 2 2021, 10:49:15)
      [GCC 9.4.0]
      TF 1.15.5
      ...
      Code block. Job execution result
  2. Check GPU utilization. (The job process is created.)
    • While the job is running, you can see that a process is assigned to the MIG device and that its utilization increases.
      $ nvidia-smi
      Code block. nvidia-smi command - check GPU utilization
    • GPU utilization appears as shown below.
      +-----------------------------------------------------------------------------+
      | MIG devices:                                                                |
      +-----------------------------------------------------------------------------+
      |  GPU  GI  CI  MIG |        Memory-Usage |        Vol|        Shared         |
      |       ID  ID  Dev |          BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
      |                   |                     |        ECC|                       |
      |=============================================================================|
      |   0    0   0    0 | 66562MiB / 81251MiB | 98      0 |  7   0    5    1    1 |
      |                   |     5MiB / 13107... |           |                       |
      +-----------------------------------------------------------------------------+
      +-----------------------------------------------------------------------------+
      | Processes:                                                                  |
      |  GPU   GI   CI       PID   Type   Process name                   GPU Memory |
      |        ID   ID                                                   Usage      |
      |=============================================================================|
      |   0     0    0     17483      C   python                           66559MiB |
      +-----------------------------------------------------------------------------+
      Code block. Example of checking GPU utilization
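
The Processes table in the output above can also be read by a script. The following is a minimal sketch, not part of the official procedure: `list_mig_processes` is a hypothetical helper name, and it assumes the table layout matches the example shown above (columns GPU, GI ID, CI ID, PID, Type, Process name, GPU Memory Usage). Rows whose GI or CI column is not numeric are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an nvidia-smi feature): reads nvidia-smi output
# from stdin and prints "GPU GI CI PID MEMORY" for each process row.
# Assumes the Processes table layout shown in the example above.
list_mig_processes() {
  awk '
    # Everything after the "Processes:" header belongs to the process table.
    /Processes:/ { in_procs = 1; next }

    # A data row starts with "|" followed by four numeric columns:
    # GPU index, GPU Instance ID, Compute Instance ID, PID.
    in_procs && $1 == "|" &&
    $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ &&
    $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
      # The memory usage is the last field before the closing "|".
      print $2, $3, $4, $5, $(NF - 1)
    }
  '
}
```

Typical usage would be `nvidia-smi | list_mig_processes`, which prints one line per process running on a MIG device.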

Deleting MIG Instances and Disabling MIG

To delete the MIG Instances and disable MIG, follow the procedure below.

MIG disabling order
Delete Compute Instance → Delete GPU Instance → Disable MIG

Deleting a Compute Instance

  • Delete the Compute Instance.
    $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -dci
    $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -ci [Compute Instance ID] -dci
    Code block. nvidia-smi command - deleting a Compute Instance
    $ nvidia-smi mig -i 0 -gi 0 -lci
    +-----------------------------------------------------------------+
    | Compute instance profiles:                                      |
    | GPU     GPU     Name            Profile  Instances   Placement  |
    | GPU   Instance                     ID      ID        Start:Size |
    |         ID                                                      |
    |=================================================================|
    |   0      0      MIG 7g.80gb         4       0            0:7    |
    +-----------------------------------------------------------------+
    Code block. Example of listing MIG Compute Instances
    $ nvidia-smi mig -i 0 -gi 0 -dci
    Successfully destroyed compute instance ID  0 from GPU instance ID  0
    Code block. Example of deleting a Compute Instance
    $ nvidia-smi mig -i 0 -gi 0 -lci
    No compute instances found: Not found
    Code block. Verifying Compute Instance deletion

Deleting a GPU Instance

  • Delete the GPU Instance.
    $ nvidia-smi mig -i [GPU ID] -dgi
    $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -dgi
    Code block. nvidia-smi command - deleting a GPU Instance
    $ nvidia-smi mig -i 0 -lgi
    +--------------------------------------------------------+
    | GPU instances:                                         |
    | GPU   Name               Profile  Instance  Placement  |
    |                            ID       ID      Start:Size |
    |========================================================|
    |   0  MIG 7g.80gb            0        0         0:8     |
    +--------------------------------------------------------+
    Code block. nvidia-smi command - example of listing GPU Instances
    $ nvidia-smi mig -i 0 -dgi
    Successfully destroyed GPU instance ID  0 from GPU  0
    Code block. nvidia-smi command - example of deleting a GPU Instance
    $ nvidia-smi mig -i 0 -lgi
    No GPU instances found: Not found
    Code block. nvidia-smi command - verifying GPU Instance deletion
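
The deletion checks above can be automated by matching the "No ... instances found" messages shown in the examples. This is a minimal sketch: `mig_list_is_empty` is a hypothetical helper name, and the message text is taken from the example output in this guide, so it may differ on other driver versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) when the listing text on
# stdin reports that no Compute Instances or GPU Instances remain.
# The matched messages are copied from the example output in this guide.
mig_list_is_empty() {
  grep -Eq 'No (compute|GPU) instances found'
}
```

For example, `nvidia-smi mig -i 0 -lgi 2>&1 | mig_list_is_empty && echo "GPU Instances removed"`. The `2>&1` redirect is a precaution in case the message is written to standard error.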

Disabling MIG

  • Disable MIG, then reboot.
    $ nvidia-smi -mig 0
    Disabled MIG Mode for GPU 00000000:05:00.0
    
    All done.
    Code block. nvidia-smi command - disabling MIG
    $ nvidia-smi
    Mon Sep 30 05:18:28 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------|
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVDIA A100-SXM...  Off   | 00000000:05:00.0 Off |                    0 |
    | N/A   33C   P0    60W / 400W  |      0MiB / 81251MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    +-----------------------------------------------------------------------------+
    | MIG devices:                                                                |
    +-----------------------------------------------------------------------------+
    |  GPU  GI  CI  MIG |        Memory-Usage |        Vol|        Shared         |
    |       ID  ID  Dev |          BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
    |                   |                     |        ECC|                       |
    |=============================================================================|
    | No MIG devices found                                                        |
    +-----------------------------------------------------------------------------+
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI       PID   Type   Process name                   GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    | No running processes found                                                  |
    +-----------------------------------------------------------------------------+
    Code block. nvidia-smi command - checking GPU status
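
The full disabling order (Compute Instance, then GPU Instance, then MIG mode) can be wrapped in one script. The sketch below makes several assumptions: the names `run`, `mig_teardown`, and `DRY_RUN` are hypothetical, only a single GPU Instance is being removed, and `-i` is added to the `-mig 0` command so that only the selected GPU is affected. By default it only prints the commands, so the sequence can be reviewed before it is run.

```shell
#!/usr/bin/env bash
# Sketch: tear down MIG on one GPU in the documented order.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: mig_teardown <GPU ID> <GPU Instance ID>
mig_teardown() {
  if [ "$#" -ne 2 ]; then
    echo "usage: mig_teardown <GPU ID> <GPU Instance ID>" >&2
    return 2
  fi
  local gpu_id="$1" gi_id="$2"

  # 1. Delete the Compute Instances inside the GPU Instance.
  run nvidia-smi mig -i "$gpu_id" -gi "$gi_id" -dci || return 1
  # 2. Delete the GPU Instance itself.
  run nvidia-smi mig -i "$gpu_id" -gi "$gi_id" -dgi || return 1
  # 3. Disable MIG mode on the selected GPU.
  run nvidia-smi -i "$gpu_id" -mig 0 || return 1

  echo "Reboot to complete the change."
}
```

Calling `mig_teardown 0 0` in dry-run mode lists the three commands in order; running with `DRY_RUN=0` executes them and stops at the first failure, so a GPU Instance is never deleted while its Compute Instances still exist.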