Dynamic Parallelism is an extension to CUDA that enables kernels to launch other kernels directly. Earlier versions of CUDA only allowed kernels to be launched from host code. In an earlier topic, the segmented approach required multiple kernel calls.
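To make the idea of a kernel launching another kernel concrete, here is a minimal sketch. The kernel names (`parentKernel`, `childKernel`), the helper `runDynamicDemo`, and the segment sizes are hypothetical and chosen only for illustration; they are not taken from the course material. It assumes a GPU with compute capability 3.5 or higher and compilation with relocatable device code, for example `nvcc -rdc=true demo.cu -o demo -lcudadevrt`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Child kernel (hypothetical): fills one segment of the array so that
// every element ends up holding its own global index.
__global__ void childKernel(int *data, int segment, int segLen) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < segLen) {
        data[segment * segLen + i] = segment * segLen + i;
    }
}

// Parent kernel (hypothetical): one thread per segment. Each thread
// launches its own child grid from device code, with no host involvement.
__global__ void parentKernel(int *data, int numSegments, int segLen) {
    int segment = blockIdx.x * blockDim.x + threadIdx.x;
    if (segment < numSegments) {
        int threads = 128;
        int blocks = (segLen + threads - 1) / threads;
        childKernel<<<blocks, threads>>>(data, segment, segLen);
    }
}

// Host helper (hypothetical): issues a single launch from the host and
// copies the result back once the parent and all child grids are done.
std::vector<int> runDynamicDemo(int numSegments, int segLen) {
    int n = numSegments * segLen;
    int *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemset(d_data, 0, n * sizeof(int));

    int threads = 32;
    int blocks = (numSegments + threads - 1) / threads;
    parentKernel<<<blocks, threads>>>(d_data, numSegments, segLen);

    // A parent grid is not complete until its child grids have finished,
    // so this host-side synchronization covers the children as well.
    cudaDeviceSynchronize();

    std::vector<int> out(n);
    cudaMemcpy(out.data(), d_data, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return out;
}

int main() {
    std::vector<int> result = runDynamicDemo(4, 8);
    bool ok = true;
    for (int i = 0; i < (int)result.size(); ++i) {
        if (result[i] != i) ok = false;
    }
    printf("%s\n", ok ? "all segments filled" : "mismatch");
    return ok ? 0 : 1;
}
```

The host issues only one launch here; without dynamic parallelism, the same per-segment work would need either one host launch per segment or a single kernel that handles all segments itself. Error checking on the CUDA calls is omitted to keep the sketch short.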
Table of Contents: Overview of Nsight; Getting Started with Nsight; Case Study: Matrix Multiplication; Tips and Best Practices; OCL Notes

Overview of Nsight

NVIDIA Nsight Compute is a profiling tool for CUDA kernels. It features an expert system that can help you identify performance bottlenecks in your code. It is essential for methodically optimizing your code. These notes will cover the basics of using Nsight Compute to profile your CUDA applications.
Table of Contents: Structure of the Course; Heterogeneous Parallel Computing; Measuring Speedup; GPU Programming History; Applications; What to expect from this course

Structure of the Course

The primary goal of this course is to learn how to program GPUs. A key skill that will be developed is the ability to think in parallel. We will start with simple problems that are embarrassingly parallel and then move on to more complex problems that require synchronization. One of the biggest challenges will be converting processes that are simple to reason about serially into parallel ones.