CUTLASS: A CUDA C++ Template Library for Accelerating Deep Learning... Aniket Shivam & Vijay Thakkar
The Linux Foundation
@linuxfoundationorg
About
The Linux Foundation is a nonprofit consortium dedicated to fostering the growth of Linux and collaborative software development. Founded in 2000, the organization sponsors the work of Linux creator Linus Torvalds and promotes, protects and advances the Linux operating system and collaborative software development by marshaling the resources of its members and the open source community. The Linux Foundation provides a neutral forum for collaboration and education by hosting Collaborative Projects, Linux conferences including LinuxCon, and generating original research and content that advances the understanding of Linux and collaborative software development. More information can be found at www.linuxfoundation.org.
Video Description
CUTLASS: A CUDA C++ Template Library for Accelerating Deep Learning Computations - Aniket Shivam & Vijay Thakkar, NVIDIA

At the core of machine learning and deep learning lie different flavors of linear algebra computations, such as matrix multiplications and convolutions. Over the last decade, GPU computing solutions from NVIDIA have accelerated AI compute by an overall factor of 50X to 200X through architectural innovations. While this has helped applications like ChatGPT and GitHub Copilot become a reality, developers still have to learn how to optimally use and customize GPU compute for their applications.

In this talk we present CUTLASS, an open-source, header-only CUDA C++ template library that has helped programmers implement high-performance CUDA kernels across several generations of NVIDIA GPU architectures since 2017. CUTLASS contains optimized, production-quality implementations of AI computations and has been the go-to source for Tensor Core programming details. It provides modular abstractions and building blocks for CUDA programmers who want to write their own CUDA C++ kernels for deep learning computations such as matrix multiplication and convolution.

We expect audience members to gain actionable knowledge and insights about Tensor Core programming and about developing custom CUDA C++ kernels with CUTLASS that push the limits of performance on NVIDIA GPUs.