The page has been translated by Gen AI.

Using Multi-instance GPU in GPU Server

After creating a GPU Server, you can enable the MIG (Multi-instance GPU) feature on the GPU Server’s VM (Guest OS) and create an instance to use it.

Multi-instance GPU (NVIDIA A100) Overview

The NVIDIA A100 is a GPU based on the NVIDIA Ampere architecture that supports Multi-instance GPU (MIG). It can be securely partitioned into up to 7 independent GPU instances for running CUDA (Compute Unified Device Architecture) applications. Each instance receives its own share of computing resources, high-bandwidth memory (HBM), and cache, so a single A100 can provide independent GPU resources to multiple users. Workloads that do not reach the maximum computing capacity of the GPU can run in parallel on separate instances, which maximizes GPU utilization.

Figure. Multi-instance GPU configuration diagram

Using Multi-instance GPU Feature

To use the Multi-instance GPU feature, create a GPU Server service on the Samsung Cloud Platform and then create a VM Instance (GuestOS) with an A100 GPU assigned. After the GPU Server is created, apply or release MIG by following the MIG application order and MIG removal order below.

Figure. Multi-instance GPU creation

MIG Application Order
MIG activation → GPU Instance creation → Compute Instance creation → MIG usage
MIG Removal Order
Compute Instance deletion → GPU Instance deletion → MIG deactivation
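
The two orders above can be sketched as a dry-run script that only prints the commands in sequence. This is a minimal sketch under stated assumptions: the function names are hypothetical, nothing is executed, and the deletion flags (`-dci`, `-dgi`) and `-mig 0` are taken from the `nvidia-smi` command-line options rather than from this page, so verify them with `nvidia-smi mig -h` on your driver version.

```shell
#!/bin/sh
# Minimal sketch: print the MIG apply and release command sequences in order.
# Nothing is executed; the commands are only echoed so the order can be reviewed.
# The function names are hypothetical.

# $1: GPU ID, $2: GPU Instance profile ID, $3: Compute Instance profile ID,
# $4: GPU Instance ID (defaults to 0)
print_mig_apply_order() {
    echo "nvidia-smi -i $1 -mig 1"                    # MIG activation (reboot the VM afterwards)
    echo "nvidia-smi mig -i $1 -cgi $2"               # GPU Instance creation
    echo "nvidia-smi mig -i $1 -gi ${4:-0} -cci $3"   # Compute Instance creation
}

# $1: GPU ID, $2: GPU Instance ID (defaults to 0), $3: Compute Instance ID (defaults to 0)
print_mig_release_order() {
    echo "nvidia-smi mig -i $1 -gi ${2:-0} -ci ${3:-0} -dci"   # Compute Instance deletion
    echo "nvidia-smi mig -i $1 -gi ${2:-0} -dgi"               # GPU Instance deletion
    echo "nvidia-smi -i $1 -mig 0"                             # MIG deactivation
}

print_mig_apply_order 0 0 4
print_mig_release_order 0
```

Printing the sequence first makes it easier to confirm the IDs before running each command by hand.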

Reference
  • The system requirements for using the MIG feature are as follows (refer to NVIDIA - Supported GPUs).
    • CUDA Toolkit 11 and NVIDIA driver 450.80.02 or later
    • A Linux distribution that supports CUDA Toolkit 11
  • When operating a container or Kubernetes service, the requirements for using the MIG feature are as follows.
    • NVIDIA Container Toolkit (nvidia-docker2) v2.5.0 or later
    • NVIDIA K8s Device Plugin v0.7.0 or later
    • NVIDIA gpu-feature-discovery v0.2.0 or later
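
The driver requirement above could be checked with a short script. The following is a minimal sketch, assuming a `sort` that supports `-V` (version sort); `version_ge` is a hypothetical helper name, and the live query runs only when `nvidia-smi` is installed.

```shell
#!/bin/sh
# Minimal sketch: compare an NVIDIA driver version against the MIG minimum.
# version_ge is a hypothetical helper; it relies on `sort -V` (version sort).

MIN_DRIVER="450.80.02"

# Returns 0 (true) when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Query the installed driver only when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if version_ge "$installed" "$MIN_DRIVER"; then
        echo "Driver $installed meets the MIG minimum ($MIN_DRIVER)."
    else
        echo "Driver ${installed:-unknown} is older than $MIN_DRIVER."
    fi
fi
```

For example, `version_ge 470.57.02 450.80.02` succeeds for the driver version shown in the `nvidia-smi` output on this page.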

MIG Application and Usage

To activate MIG and create an instance to assign a task, follow these steps.

MIG Application Order
MIG activation → GPU Instance creation → Compute Instance creation → MIG usage

MIG Activation

  1. Check the GPU status on the VM Instance (GuestOS) before applying MIG.

    • Confirm that MIG mode is in the Disabled state.
      $ nvidia-smi
      Mon Sep 27 08:37:08 2021
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
      |-------------------------------+----------------------+----------------------|
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |                               |                      |               MIG M. |
      |===============================+======================+======================|
      |   0  NVIDIA A100-SXM...  Off  | 00000000:05:00.0 Off |                    0 |
      | N/A   32C   P0    59W / 400W  |      0MiB / 81251MiB |      0%      Default |
      |                               |                      |             Disabled |
      +-------------------------------+----------------------+----------------------+
      
      +-----------------------------------------------------------------------------+
      | Processes:                                                                  |
      |  GPU   GI   CI       PID   Type   Process name                   GPU Memory |
      |        ID   ID                                                   Usage      |
      |=============================================================================|
      | No running processes found                                                  |
      +-----------------------------------------------------------------------------+
      Code block. nvidia-smi command - Check GPU inactive state (1)
      $ nvidia-smi -L
      GPU 0: NVIDIA A100-SXM-80GB (UUID: GPU-c956838f-494a-92b2-6818-56eb28fe25e0)
      Code block. nvidia-smi command - Check GPU inactive state (2)
  2. In the VM Instance (GuestOS), enable MIG for each GPU and reboot the VM Instance.

    $ nvidia-smi -i 0 -mig 1
    Enabled MIG mode for GPU 00000000:05:00.0
    All done.
    
    # reboot
    Code Block. nvidia-smi Command - MIG Activation

Note

If enabling MIG displays the following warning because a GPU monitoring agent is using the GPU, stop the nvsm and dcgm services and then enable MIG again.

Warning: MIG mode is in pending enable state for GPU 00000000:05:00.0: In use by another client. 00000000:05:00.0 is currently being used by one or more other processes (e.g. CUDA application or a monitoring application such as another instance of nvidia-smi).

# systemctl stop nvsm
# systemctl stop dcgm
  • After completing the MIG work, restart the nvsm and dcgm services.
  3. Check the GPU status after applying MIG in the VM Instance (GuestOS).
    • Confirm that MIG mode is in the Enabled state.
      $ nvidia-smi
      Mon Sep 27 09:44:33 2021
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
      |-------------------------------+----------------------+----------------------|
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |                               |                      |               MIG M. |
      |===============================+======================+======================|
      |   0  NVIDIA A100-SXM...  Off  | 00000000:05:00.0 Off |                   On |
      | N/A   32C   P0    59W / 400W  |      0MiB / 81251MiB |      0%      Default |
      |                               |                      |              Enabled |
      +-------------------------------+----------------------+----------------------+
      +-----------------------------------------------------------------------------+
      | MIG devices:                                                                |
      +-----------------------------------------------------------------------------+
      |  GPU  GI  CI  MIG |        Memory-Usage |        Vol|        Shared         |
      |       ID  ID  Dev |          BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
      |                   |                     |        ECC|                       |
      |=============================================================================|
      | No MIG devices found                                                        |
      +-----------------------------------------------------------------------------+
      +-----------------------------------------------------------------------------+
      | Processes:                                                                  |
      |  GPU   GI   CI       PID   Type   Process name                   GPU Memory |
      |        ID   ID                                                   Usage      |
      |=============================================================================|
      | No running processes found                                                  |
      +-----------------------------------------------------------------------------+
      Code block. nvidia-smi command - Check GPU activation status (1)
      $ nvidia-smi -L
      GPU 0: NVIDIA A100-SXM-80GB (UUID: GPU-c956838f-494a-92b2-6818-56eb28fe25e0)
      Code block. nvidia-smi command - Check GPU activation status (2)
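
The MIG mode check can also be scripted instead of read from the table output. The following is a minimal sketch, assuming the `mig.mode.current` field of `nvidia-smi --query-gpu` is available on your driver (see `nvidia-smi --help-query-gpu`); `mig_mode_is_enabled` is a hypothetical helper that only inspects the string it is given, and GPU ID 0 is used as in the examples above.

```shell
#!/bin/sh
# Minimal sketch: report whether MIG mode is enabled, based on nvidia-smi's CSV query output.
# mig_mode_is_enabled is a hypothetical helper; it only inspects the string passed to it.

# Returns 0 when the given mode string is "Enabled" (surrounding whitespace ignored).
mig_mode_is_enabled() {
    mode="$(printf '%s' "$1" | tr -d '[:space:]')"
    [ "$mode" = "Enabled" ]
}

# Run the live query only when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    current="$(nvidia-smi -i 0 --query-gpu=mig.mode.current --format=csv,noheader)"
    if mig_mode_is_enabled "$current"; then
        echo "MIG mode is Enabled on GPU 0."
    else
        echo "MIG mode is not enabled on GPU 0 (reported: ${current:-unknown})."
    fi
fi
```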

GPU Instance Creation

After activating MIG and checking the status, you can create a GPU Instance.

  1. Check the list of MIG GPU instance profiles that can be created.

    $ nvidia-smi mig -i [GPU ID] -lgip
    Code block. nvidia-smi command - MIG GPU Instance profile list check

    $ nvidia-smi mig -i 0 -lgip
    +-----------------------------------------------------------------------------+
    | GPU instance profiles:                                                      |
    | GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
    |                              Free/Total   GiB              CE    JPEG  OFA  |
    |=============================================================================|
    |   0 MIG 1g.10gb        19    7/7         9.50       No     14     0     0   |
    |                                                             1     0     0   |
    +-----------------------------------------------------------------------------+
    |   0 MIG 1g.10gb+me     20    1/1         9.50       No     14     0     0   |
    |                                                             1     1     1   |
    +-----------------------------------------------------------------------------+
    |   0 MIG 2g.20gb        14    3/3         19.50      No     28     1     0   |
    |                                                             2     0     0   |
    +-----------------------------------------------------------------------------+
    |   0 MIG 3g.40gb         9    2/2         39.50      No     42     2     0   |
    |                                                             3     0     0   |
    +-----------------------------------------------------------------------------+
    |   0 MIG 4g.40gb         5    1/1         39.50      No     56     2     0   |
    |                                                             4     0     0   |
    +-----------------------------------------------------------------------------+
    |   0 MIG 7g.80gb         0    1/1         79.25      No     98     0     0   |
    |                                                             7     1     1   |
    +-----------------------------------------------------------------------------+
    Code Block. MIG GPU Instance Profile List
Note
For the A100 GPU Instance profiles, refer to the NVIDIA A100 MIG Profile table below.
Figure. MIG Device Naming
Profile Name   | Fraction of Memory | Fraction of SMs | Hardware Units            | L2 Cache Size | Number of Instances Available
MIG 1g.10gb    | 1/8                | 1/7             | 0 NVDECs / 0 JPEG / 0 OFA | 1/8           | 7
MIG 1g.10gb+me | 1/8                | 1/7             | 1 NVDEC / 1 JPEG / 1 OFA  | 1/8           | 1 (A single 1g profile can include media extensions)
MIG 2g.20gb    | 2/8                | 2/7             | 1 NVDEC / 0 JPEG / 0 OFA  | 2/8           | 3
MIG 3g.40gb    | 4/8                | 3/7             | 2 NVDECs / 0 JPEG / 0 OFA | 4/8           | 2
MIG 4g.40gb    | 4/8                | 4/7             | 2 NVDECs / 0 JPEG / 0 OFA | 4/8           | 1
MIG 7g.80gb    | Full               | 7/7             | 5 NVDECs / 1 JPEG / 1 OFA | Full          | 1
Table. NVIDIA A100 MIG Profile
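
The profile names listed above follow a `<slices>g.<memory>gb` pattern, which can be split programmatically. The following is a minimal sketch; `parse_mig_profile` is a hypothetical helper, and the output field names `gpu_slices` and `memory_gb` are chosen here for illustration only.

```shell
#!/bin/sh
# Minimal sketch: split a MIG profile name such as "3g.40gb" into its parts.
# parse_mig_profile is a hypothetical helper; it accepts an optional "MIG " prefix
# and an optional "+me" (media extensions) suffix.

parse_mig_profile() {
    name="${1#MIG }"     # drop an optional "MIG " prefix
    name="${name%+me}"   # drop an optional "+me" suffix
    case "$name" in
        *g.*gb) ;;
        *) echo "unrecognized profile: $1" >&2; return 1 ;;
    esac
    slices="${name%%g.*}"   # text before "g."
    memory="${name#*g.}"    # text after "g."
    memory="${memory%gb}"   # drop the trailing "gb"
    # Both parts must be non-empty and contain digits only.
    for part in "$slices" "$memory"; do
        case "$part" in
            ''|*[!0-9]*) echo "unrecognized profile: $1" >&2; return 1 ;;
        esac
    done
    echo "gpu_slices=${slices} memory_gb=${memory}"
}

parse_mig_profile "MIG 3g.40gb"
parse_mig_profile "1g.10gb+me"
```

A name that does not match the pattern returns a non-zero status, so the helper can guard scripts that build `-cgi` commands from profile names.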
Note
The MIG 1g.10gb+me profile is available only with the R470 driver or later.
  2. Create the MIG GPU Instance and check it.
    • GPU Instance creation

      $ nvidia-smi mig -i [GPU ID] -cgi [Profile ID]
      Code Block. nvidia-smi command - GPU Instance creation
      $ nvidia-smi mig -i 0 -cgi 0
      Successfully created GPU instance ID 0 on GPU 0 using profile MIG 7g.80gb (ID 0)
      Code block. nvidia-smi command - GPU Instance creation example

    • GPU Instance check

      $ nvidia-smi mig -i [GPU ID] -lgi
      Code Block. nvidia-smi Command - GPU Instance Check
      $ nvidia-smi mig -i 0 -lgi
      +--------------------------------------------------------+
      | GPU instances:                                         |
      | GPU   Name               Profile  Instance  Placement  |
      |                            ID       ID      Start:Size |
      |========================================================|
      |   0  MIG 7g.80gb            0        0         0:8     |
      +--------------------------------------------------------+
      Code block. nvidia-smi command - GPU Instance check example

Compute Instance Creation

After creating a GPU Instance, you can create a Compute Instance.

  1. Check the MIG Compute Instance profile that can be created.

    $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -lcip
    Code Block. nvidia-smi command - MIG Compute Instance profile check
    $ nvidia-smi mig -i 0 -gi 0 -lcip
    +---------------------------------------------------------------------------------+
    | Compute instance profiles:                                                      |
    | GPU     GPU     Name            Profile  Instances   Exclusive      Shared      |
    | GPU   Instance                     ID    Free/Total     SM       DEC  ENC  OFA  |
    |         ID                                                       CE   JPEG      |
    |=================================================================================|
    |   0      0      MIG 1c.7g.80gb     0      7/7           14       5    0    1    |
    |                                                                  7    1         |
    +---------------------------------------------------------------------------------+
    |   0      0      MIG 2c.7g.80gb     1      3/3           28       5    0    1    |
    |                                                                  7    1         |
    +---------------------------------------------------------------------------------+
    |   0      0      MIG 3c.7g.80gb     2      2/2           42       5    0    1    |
    |                                                                  7    1         |
    +---------------------------------------------------------------------------------+
    |   0      0      MIG 4c.7g.80gb     3      1/1           56       5    0    1    |
    |                                                                  7    1         |
    +---------------------------------------------------------------------------------+
    |   0      0      MIG 7g.80gb        4*     1/1           98       5    0    1    |
    |                                                                  7    1         |
    +---------------------------------------------------------------------------------+
    Code block. MIG Compute Instance profile list example

  2. Create and check the MIG Compute Instance.

    • MIG Compute Instance creation
      $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -cci [Compute Profile ID]
      Code Block. nvidia-smi command - MIG Compute Instance creation
      $ nvidia-smi mig -i 0 -gi 0 -cci 4
      Successfully created compute instance ID 0 on GPU instance ID 0 using profile MIG 7g.80gb(ID 4)
      Code block. nvidia-smi command - MIG Compute Instance creation example
    • MIG Compute Instance check
      $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -lci
      Code block. nvidia-smi command - MIG Compute Instance check
      $ nvidia-smi mig -i 0 -gi 0 -lci
      +-----------------------------------------------------------------+
      | Compute instance profiles:                                      |
      | GPU     GPU     Name            Profile  Instances   Placement  |
      | GPU   Instance                     ID      ID        Start:Size |
      |         ID                                                      |
      |=================================================================|
      |   0      0      MIG 7g.80gb         4       0            0:7    |
      +-----------------------------------------------------------------+
      Code block. MIG Compute Instance confirmation example
      $ nvidia-smi -L
      GPU 0: NVIDIA A100-SXM-80GB (UUID: GPU-c956838f-494a-92b2-6818-56eb28fe25e0)
        MIG 7g.80gb     Device  0: (UUID: MIG-53e20040-758b-5ecb-948e-c626d03a9a32)
      Code block. nvidia-smi command - Check GPU status (1)
      $ nvidia-smi
      Mon Sep 27 09:52:17 2021
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
      |-------------------------------+----------------------+----------------------|
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |                               |                      |               MIG M. |
      |===============================+======================+======================|
      |   0  NVIDIA A100-SXM...  Off  | 00000000:05:00.0 Off |                   On |
      | N/A   32C   P0    49W / 400W  |      0MiB / 81251MiB |     N/A      Default |
      |                               |                      |              Enabled |
      +-------------------------------+----------------------+----------------------+
      
      +-----------------------------------------------------------------------------+
      | MIG devices:                                                                |
      +-----------------------------------------------------------------------------+
      |  GPU  GI  CI  MIG |        Memory-Usage |        Vol|        Shared         |
      |       ID  ID  Dev |          BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
      |                   |                     |        ECC|                       |
      |=============================================================================|
      |   0    0   0    0 |     0MiB / 81251MiB | 98      0 |  7   0    5    1    1 |
      |                   |     1MiB / 13107... |           |                       |
      +-----------------------------------------------------------------------------+
      +-----------------------------------------------------------------------------+
      | Processes:                                                                  |
      |  GPU   GI   CI       PID   Type   Process name                   GPU Memory |
      |        ID   ID                                                   Usage      |
      |=============================================================================|
      | No running processes found                                                  |
      +-----------------------------------------------------------------------------+
      Code block. nvidia-smi command - Check GPU status (2)
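
The MIG device UUID shown in the `nvidia-smi -L` listing can be extracted with a small filter. The following is a minimal sketch, assuming the `MIG-<hex>` UUID format shown in this document (other driver versions may print a different format); `list_mig_uuids` is a hypothetical helper, and the sample text is copied from the listing above.

```shell
#!/bin/sh
# Minimal sketch: extract MIG device UUIDs from `nvidia-smi -L` output.
# list_mig_uuids is a hypothetical helper that reads the listing from stdin.

list_mig_uuids() {
    # Print the "MIG-..." value from each "(UUID: MIG-...)" entry; other lines are skipped.
    sed -n 's/.*(UUID: \(MIG-[0-9a-fA-F-]*\)).*/\1/p'
}

# Sample listing copied from the example output in this document.
sample='GPU 0: NVIDIA A100-SXM-80GB (UUID: GPU-c956838f-494a-92b2-6818-56eb28fe25e0)
  MIG 7g.80gb     Device  0: (UUID: MIG-53e20040-758b-5ecb-948e-c626d03a9a32)'

printf '%s\n' "$sample" | list_mig_uuids
```

On the VM, `nvidia-smi -L | list_mig_uuids` would apply the same filter to the live listing.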

Using MIG

  1. Use the MIG Instance to run a Job.
    • Job execution example
      $ docker run --gpus '"device=[GPU ID]:[MIG ID]"' --rm nvcr.io/nvidia/cuda nvidia-smi
      Code Block. Job execution example
      The following is an example of running a Job.
      $ docker run --gpus '"device=0:0"' --rm -it --network=host --shm-size=1g --ipc=host -v /root/.ssh/:/root/.ssh
      
      ================
      == TensorFlow ==
      ================
      
      NVIDIA Release 21.08-tf1 (build 26012104)
      TensorFlow Version 1.15.5
      
      Container image Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
      ...
      
      # Python process execution
      root@d622a93c9281:/workspace# python /workspace/nvidia-examples/cnn/resnet.py --num_iter 100 
      ...
      PY 3.8.10 (default, Jun 2 2021, 10:49:15)
      [GCC 9.4.0]
      TF 1.15.5
      ...
      Code Block. Job Execution Result
  2. Check the GPU usage. (When a Job process is created)
    • When the Job runs, its process is assigned to the MIG device and the usage increases.
      $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -lcip
      Code Block. nvidia-smi command - Check GPU usage
      The following is an example of the GPU usage output.
      +-----------------------------------------------------------------------------+
      | MIG devices:                                                                |
      +-----------------------------------------------------------------------------+
      |  GPU  GI  CI  MIG |        Memory-Usage |        Vol|        Shared         |
      |       ID  ID  Dev |          BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
      |                   |                     |        ECC|                       |
      |=============================================================================|
      |   0    0   0    0 | 66562MiB / 81251MiB | 98      0 |  7   0    5    1    1 |
      |                   |     5MiB / 13107... |           |                       |
      +-----------------------------------------------------------------------------+
      +-----------------------------------------------------------------------------+
      | Processes:                                                                  |
      |  GPU   GI   CI       PID   Type   Process name                   GPU Memory |
      |        ID   ID                                                   Usage      |
      |=============================================================================|
      |   0     0    0     17483      C   python                           66559MiB |
      +-----------------------------------------------------------------------------+
      Code block. Example of checking GPU usage
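
The two steps above can be wrapped in small shell helpers. The following is a minimal sketch rather than part of the official procedure: the function names mig_device_arg and mig_mem_used are hypothetical, and the parser assumes the nvidia-smi table layout shown in the example above, which may differ between driver versions.

```shell
# Sketch only: helper names are hypothetical, not part of docker or nvidia-smi.

# Build the value for `docker run --gpus` from a GPU ID and a MIG device ID.
# The inner double quotes are kept, matching the quoting used in this guide.
mig_device_arg() {
  printf '"device=%s:%s"' "$1" "$2"
}

# Read `nvidia-smi` output on stdin and print the used memory (in MiB) of the
# first row of the "MIG devices" table. Assumes the layout shown in this guide.
mig_mem_used() {
  awk '
    /MIG devices:/ { in_mig = 1; next }
    /Processes:/   { in_mig = 0 }
    in_mig && match($0, /[0-9]+MiB \/ [0-9]+MiB/) {
      # The matched text looks like "66562MiB / 81251MiB"; keep the first number.
      split(substr($0, RSTART, RLENGTH), parts, "MiB")
      print parts[1]
      exit
    }
  '
}
```

For example, `docker run --gpus "$(mig_device_arg 0 0)" --rm nvcr.io/nvidia/cuda nvidia-smi` starts the container on MIG device 0 of GPU 0, and `nvidia-smi | mig_mem_used` prints the used memory of the first MIG device. For the example output above, it prints 66562.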

MIG Instance deletion and release

To delete a MIG instance and release the MIG, follow these procedures.

MIG Removal Order
Compute Instance deletion → GPU Instance deletion → MIG feature disablement (deactivation)
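
The removal order can be sketched as a single shell function. This is an illustrative outline under stated assumptions, not an official tool: the names run, mig_teardown, and DRY_RUN are hypothetical, and the function assumes one GPU Instance per call. With DRY_RUN=1 it only prints the commands, so the order can be reviewed before anything is deleted. The reboot after disabling MIG is not included.

```shell
# Sketch only: run, mig_teardown, and DRY_RUN are hypothetical names.

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Release MIG in the documented order:
# Compute Instance deletion -> GPU Instance deletion -> MIG disablement.
# Arguments: GPU ID, GPU Instance ID. Stops at the first failing step.
mig_teardown() {
  run nvidia-smi mig -i "$1" -gi "$2" -dci &&
  run nvidia-smi mig -i "$1" -gi "$2" -dgi &&
  run nvidia-smi -i "$1" -mig 0
}
```

Running `DRY_RUN=1 mig_teardown 0 0` prints the three commands for GPU 0 and GPU Instance 0 without executing them. Each step is described in detail in the sections below.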

Compute Instance deletion

  • Delete the Compute Instance.
    $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -dci
    $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -ci [Compute Instance ID] -dci
    Code Block. nvidia-smi command - Compute Instance deletion
    $ nvidia-smi mig -i 0 -gi 0 -lci
    +-----------------------------------------------------------------+
    | Compute instance profiles:                                      |
    | GPU     GPU     Name            Profile  Instances   Placement  |
    | GPU   Instance                     ID      ID        Start:Size |
    |         ID                                                      |
    |=================================================================|
    |   0      0      MIG 7g.80gb         4       0            0:7    |
    +-----------------------------------------------------------------+
    Code Block. MIG Compute Instance Check Example
    $ nvidia-smi mig -i 0 -gi 0 -dci
    Successfully destroyed compute instance ID  0 from GPU instance ID  0
    Code Block. Compute Instance deletion example
    $ nvidia-smi mig -i 0 -gi 0 -lci
    No compute instances found: Not found
    Code Block. Compute Instance deletion confirmation

GPU Instance deletion

  • Delete the GPU Instance.
    $ nvidia-smi mig -i [GPU ID] -dgi
    $ nvidia-smi mig -i [GPU ID] -gi [GPU Instance ID] -dgi
    Code block. nvidia-smi command - GPU Instance deletion
    $ nvidia-smi mig -i 0 -lgi
    +--------------------------------------------------------+
    | GPU instances:                                         |
    | GPU   Name               Profile  Instance  Placement  |
    |                            ID       ID      Start:Size |
    |========================================================|
    |   0  MIG 7g.80gb            0        0         0:8     |
    +--------------------------------------------------------+
    Code block. nvidia-smi command - GPU Instance check example
    $ nvidia-smi mig -i 0 -dgi
    Successfully destroyed GPU instance ID  0 from GPU  0
    Code block. nvidia-smi command - GPU Instance deletion example
    $ nvidia-smi mig -i 0 -lgi
    No GPU instances found: Not found
    Code block. nvidia-smi command - GPU Instance deletion confirmation

MIG Function Disablement (Deactivation)

  • Disable MIG and then reboot.
    $ nvidia-smi -mig 0
    Disabled MIG Mode for GPU 00000000:05:00.0
    
    All done.
    Code Block. nvidia-smi command - MIG disable
    $ nvidia-smi
    Mon Sep 30 05:18:28 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------|
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA A100-SXM...  Off  | 00000000:05:00.0 Off |                    0 |
    | N/A   33C   P0    60W / 400W  |      0MiB / 81251MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    +-----------------------------------------------------------------------------+
    | MIG devices:                                                                |
    +-----------------------------------------------------------------------------+
    |  GPU  GI  CI  MIG |        Memory-Usage |        Vol|        Shared         |
    |       ID  ID  Dev |          BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
    |                   |                     |        ECC|                       |
    |=============================================================================|
    | No MIG devices found                                                        |
    +-----------------------------------------------------------------------------+
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI       PID   Type   Process name                   GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    | No running processes found                                                  |
    +-----------------------------------------------------------------------------+
    Code Block. nvidia-smi command - Check GPU status
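
Instead of reading the full table, the MIG mode can also be checked in a script. The sketch below assumes the mig.mode.current field of nvidia-smi --query-gpu and its Enabled/Disabled values. The function name mig_all_disabled is hypothetical. It reads the query output on stdin and succeeds only when every GPU reports Disabled.

```shell
# Sketch only: mig_all_disabled is a hypothetical helper name.

# Read one MIG mode value per line on stdin (for example, the output of
# `nvidia-smi --query-gpu=mig.mode.current --format=csv,noheader`).
# Exit status 0 if there is at least one line and every line is "Disabled".
mig_all_disabled() {
  awk '
    {
      n++
      gsub(/^[ \t]+|[ \t\r]+$/, "")   # trim surrounding whitespace
      if ($0 != "Disabled") bad = 1
    }
    END {
      if (n > 0 && !bad) exit 0
      exit 1
    }
  '
}
```

A possible use is `nvidia-smi --query-gpu=mig.mode.current --format=csv,noheader | mig_all_disabled && echo "MIG is disabled"`.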