NVIDIA Visual Profiler Quickstart Guide

NVIDIA Visual Profiler is installed on both the GPU machines and the workstations. The following guide will show you how to use the NVIDIA Visual Profiler to profile your CUDA code. For more details, please refer to the official documentation.

Generate a profiling report for your kernel

First, generate a profiling report on a machine with a GPU. Replace the current contents of benchmark.sh for Lab 3 with the script below. Note that this setup is specific to the lab in ERB 125. If you’re running on your own machine with a GPU more recent than Pascal, it is highly recommended that you profile your code with Nsight Compute instead.

The script first loads CUDA Toolkit 11.5, which is compatible with the version installed on the workstations. It then compiles the code and runs the benchmark on a 1024x1024x1024 matrix multiplication. The nvprof command writes a profiling report to a .nvvp file, which can be opened with NVIDIA Visual Profiler.

#!/bin/bash
#SBATCH --export=/usr/local/cuda-11.5/bin
#SBATCH --gres=gpu:1

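# Load the CUDA 11.5 toolkit, which provides nvcc and nvprof.
module load cuda/11.5

# Build the benchmark binary.
make benchmark

# Profile the benchmark. --analysis-metrics collects the metrics that the
# Visual Profiler's guided analysis needs, --export-profile writes the report
# to the given file, and -f overwrites an existing report of the same name.
nvprof --analysis-metrics --export-profile matmul_benchmark.nvvp -f ./build/main/benchmark 1024 1024 1024

module unload cuda/11.5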
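
Once the job has finished, it can help to confirm that the report was actually written before copying it anywhere. The following is a minimal sketch and not part of the lab materials: the function name check_report is hypothetical, and it only tests that the file exists and is non-empty, not that its contents are valid.

```shell
#!/bin/bash
# Hypothetical helper (not part of the lab materials): succeed only if the
# profiling report exists and is non-empty.
check_report() {
    local report="${1:-matmul_benchmark.nvvp}"
    if [ -s "$report" ]; then
        echo "Found report: $report"
    else
        echo "Report missing or empty: $report" >&2
        return 1
    fi
}

# Example: check the report name used by the script above.
check_report matmul_benchmark.nvvp || echo "Re-run the job and check its output for nvprof errors."
```

Run it from the directory where the job was submitted, since that is where the script above writes matmul_benchmark.nvvp.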

Open the profiling report

After the script finishes, you should have a file called matmul_benchmark.nvvp. Copy this file from the GPU machine to your local workstation, then open it with the following command:

nvvp matmul_benchmark.nvvp
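
The copy step mentioned above could be done with scp, for example. This is a sketch under assumptions: the host name gpu-machine.example.edu and the remote path lab3/matmul_benchmark.nvvp are placeholders, not real lab values, so substitute whatever you use to log in to the GPU machine. By default the snippet only prints the command; set DRY_RUN=0 to actually copy.

```shell
#!/bin/bash
# Placeholders (assumptions, not real lab values): replace with your own.
remote_user="${REMOTE_USER:-${USER:-student}}"
remote_host="${REMOTE_HOST:-gpu-machine.example.edu}"
remote_path="${REMOTE_PATH:-lab3/matmul_benchmark.nvvp}"

# Build the scp command as an array so the arguments stay correctly quoted.
copy_cmd=(scp "${remote_user}@${remote_host}:${remote_path}" .)

if [ "${DRY_RUN:-1}" = "0" ]; then
    "${copy_cmd[@]}"                    # really copy the report
else
    echo "Would run: ${copy_cmd[*]}"    # default: just show the command
fi
```

Run it on the workstation from the directory where you want the report to land, then open the file with the nvvp command shown above.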

We will review the metrics in the report in class. You can use the profiler's guided analysis to get a feel for the metrics and how to interpret them.