Nvidia’s CUDA 12.0 Will Usher in a Paradigm Shift in Programming
Nvidia’s CUDA 12.0 exposes programmable functionality for many features of NVIDIA’s Hopper and Ada Lovelace architectures
NVIDIA has released CUDA 12.0, the latest major feature update to its proprietary compute API. CUDA 12.0 brings many changes, including new capabilities for the latest Hopper and Ada Lovelace GPUs, updated C++ dialect support, official JIT LTO support, new and improved APIs, and an assortment of other features.
Nvidia’s CUDA 12.0 exposes programmable functionality for many features of NVIDIA’s Hopper and Ada Lovelace architectures. Among the new CUDA 12.0 features for Hopper and Ada are tensor operations now supported in the public PTX intermediate representation, C intrinsics for cooperative grid array (CGA) relaxed barrier support, programmatic L2 cache to SM multicast, genomics/DPX instructions, and other additions.
Other additions include support for using the virtual memory management APIs with GPUs restricted via CUDA_VISIBLE_DEVICES, and the ability for application and library developers to programmatically update the priority of CUDA streams.
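Stream priorities themselves predate 12.0; the sketch below uses the long-standing creation-time priority APIs for context (the post-creation update mentioned above is new in 12.0, and its exact entry point is not named here, so consult the 12.0 runtime API reference for it):

```cuda
#include <cuda_runtime.h>

int main() {
    // Query the legal priority range for this device.
    // Numerically lower values mean higher priority.
    int leastPriority = 0, greatestPriority = 0;
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);

    // Create a stream at the highest available priority.
    cudaStream_t stream;
    cudaStreamCreateWithPriority(&stream, cudaStreamNonBlocking,
                                 greatestPriority);

    // ... launch work on `stream` here ...

    cudaStreamDestroy(stream);
    return 0;
}
```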
NVIDIA’s parallel computing platform CUDA (Compute Unified Device Architecture) is the primary foundation for general-purpose computing on graphics processing units. It’s a layer of software that lets you execute compute kernels in parallel using the GPU’s virtual instruction set.
In July, riding on the coattails of CUDA’s success, NVIDIA announced the release of its unified quantum computing platform, QODA (Quantum Optimized Device Architecture), intending to hasten advances in quantum research and development across artificial intelligence (AI), high-performance computing (HPC), and other fields.
CUDA minor version compatibility is a feature introduced in 11.x that gives you the flexibility to dynamically link your application against any minor version of the CUDA Toolkit within the same major release. Compile your code one time, and you can dynamically link against libraries, the CUDA runtime, and the user-mode driver from any minor version within the same major version of CUDA Toolkit.
For example, an application built against CUDA 11.6 can link against the 11.8 runtime, and vice versa. This is accomplished through API and ABI consistency within the library files. For more information, see CUDA Compatibility.
Minor version compatibility continues into CUDA 12.x. However, as 12.0 is a new major release, the compatibility guarantees are reset. Applications that used minor version compatibility in 11.x may have issues when linking against 12.0. Either recompile your application against 12.0 or statically link to the needed libraries within 11.x to ensure the continuity of your development. Likewise, applications recompiled or built in 12.0 will link to future versions of 12.x but will not link against components of CUDA Toolkit 11.x.
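A quick way to see which versions are in play is to query them at run time. This host-only sketch uses the standard `cudaRuntimeGetVersion` and `cudaDriverGetVersion` calls; the printed values (e.g. 11060 for 11.6) make version mismatches across the 11.x/12.x boundary easy to spot:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // The runtime version the binary linked against and the version the
    // installed driver supports can differ; mixes within one major
    // release are legal under minor version compatibility, but an 11.x
    // build will not link against 12.x components.
    int runtimeVersion = 0, driverVersion = 0;
    cudaRuntimeGetVersion(&runtimeVersion);
    cudaDriverGetVersion(&driverVersion);
    std::printf("runtime %d, driver supports up to %d\n",
                runtimeVersion, driverVersion);
    return 0;
}
```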
Coroutines are resumable functions. Execution can be suspended, in which case control is returned to the caller. Subsequent invocations of the coroutine resume at the point where it was suspended. Coroutines are supported in host code but are not supported in device code. Uses of the co_await, co_yield, and co_return keywords in the scope of a device function are diagnosed as errors during device compilation.
Three-way comparison operator
The three-way comparison operator <=> is a new kind of relational operator that enables the compiler to synthesize the other relational operators.
Because it is tightly coupled with utility functions from the Standard Template Library, its use is restricted in device code whenever a host function is implicitly called.
Uses where the operator is called directly and no implicit host call is required remain enabled in device code.