Lecture 31: Optimizing Reduction Kernels (Contd.) - Detailed Analysis & Overview
Sorting, Sorting Networks, Bitonic Sort Serial Implementation, Recursion. Comparator, Sorting Subproblem, Bitonic Sort Parallel Implementation. Sorting a Bitonic Sequence, All Prefix Sums, Inclusive and Exclusive Scan. Hillis-Steele Inclusive Scan, Prefix Sum Implementation, Blelloch Scan Algorithm and Implementation. Transpose Operation: Naive Row and Naive Column Implementations. Transpose: Resolving Shared Memory Bank Conflicts, Memory Padding.
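The bitonic sort and scan topics above can be sketched in plain C++ as a host-side reference. This is a minimal sketch, not the lecture's code: it assumes power-of-two input sizes, and the names compareSwap, bitonicMerge, bitonicSort, hillisSteeleInclusiveScan and blellochExclusiveScan are hypothetical. Each inner loop marks work that a CUDA version would typically hand to one thread per element, with a barrier between steps.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Comparator: put a[i], a[j] in ascending order if up, descending otherwise.
void compareSwap(std::vector<int>& a, std::size_t i, std::size_t j, bool up) {
    if ((a[i] > a[j]) == up) std::swap(a[i], a[j]);
}

// Sorting subproblem: turn the bitonic sequence a[lo, lo + n) into sorted order.
// Assumes n is a power of two and the input range is bitonic.
void bitonicMerge(std::vector<int>& a, std::size_t lo, std::size_t n, bool up) {
    if (n < 2) return;
    std::size_t half = n / 2;
    for (std::size_t i = lo; i < lo + half; ++i)  // independent comparators
        compareSwap(a, i, i + half, up);
    bitonicMerge(a, lo, half, up);
    bitonicMerge(a, lo + half, half, up);
}

// Recursion: sort the first half ascending and the second half descending,
// which makes the whole range bitonic, then merge it in direction `up`.
void bitonicSort(std::vector<int>& a, std::size_t lo, std::size_t n, bool up) {
    if (n < 2) return;
    std::size_t half = n / 2;
    bitonicSort(a, lo, half, true);
    bitonicSort(a, lo + half, half, false);
    bitonicMerge(a, lo, n, up);
}

// Hillis-Steele inclusive scan: log2(n) steps, O(n log n) additions.
// Two buffers are swapped each step (double buffering).
std::vector<int> hillisSteeleInclusiveScan(std::vector<int> x) {
    std::vector<int> next(x.size());
    for (std::size_t offset = 1; offset < x.size(); offset *= 2) {
        for (std::size_t i = 0; i < x.size(); ++i)  // one "thread" per i
            next[i] = (i >= offset) ? x[i] + x[i - offset] : x[i];
        std::swap(x, next);
    }
    return x;
}

// Blelloch exclusive scan: up-sweep (reduce) then down-sweep, O(n) additions.
// Assumes x.size() is a power of two.
std::vector<int> blellochExclusiveScan(std::vector<int> x) {
    const std::size_t n = x.size();
    for (std::size_t d = 1; d < n; d *= 2)  // up-sweep: build partial sums in place
        for (std::size_t i = 2 * d - 1; i < n; i += 2 * d)
            x[i] += x[i - d];
    if (n > 0) x[n - 1] = 0;  // clear the root (identity for +)
    for (std::size_t d = n / 2; d >= 1; d /= 2)  // down-sweep: push prefixes down
        for (std::size_t i = 2 * d - 1; i < n; i += 2 * d) {
            int left = x[i - d];
            x[i - d] = x[i];
            x[i] += left;
        }
    return x;
}
```

Called on {3, 1, 7, 0, 4, 1, 6, 3}, the Hillis-Steele version returns the inclusive scan {3, 4, 11, 11, 15, 16, 22, 25} and the Blelloch version returns the exclusive scan {0, 3, 4, 11, 11, 15, 16, 22}. Hillis-Steele takes fewer steps but does O(n log n) additions, while Blelloch does only O(n) additions in about 2 log2(n) steps.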
Profiling Analysis Using nvprof: Load Transactions, Store Transactions. Inner- and Inter-Block Fusion: Example, Advantages and Disadvantages. Transpose Using Shared Memory: Shared Memory Load and Store Transactions. CUDA Event Profiling, Analysis of Memory Accesses, Shared Memory Basics.
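The shared-memory transpose and the padding fix can be modeled on the host to show why padding removes bank conflicts. This C++ sketch assumes 32 banks of 4-byte words and 32 x 32 tiles (common values, but hardware-dependent); columnReadConflictDegree and tiledTranspose are hypothetical names, and the nested loops stand in for the threads of one block rather than reproducing the lecture's kernel.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kBanks = 32;  // assumption: 32 shared memory banks, 4-byte words
constexpr int kTile = 32;   // assumption: tile edge equals the warp size

// Largest number of warp threads that land in the same bank when thread t
// reads tile[t][col] from a kTile x (kTile + pad) float tile (a column read).
int columnReadConflictDegree(int pad, int col) {
    std::array<int, kBanks> hits{};
    for (int t = 0; t < kTile; ++t) {
        int wordIndex = t * (kTile + pad) + col;  // row-major offset in words
        ++hits[wordIndex % kBanks];
    }
    return *std::max_element(hits.begin(), hits.end());
}

// Host model of a tiled transpose: each tile is read row-wise from `in`,
// staged in a padded tile, then written row-wise to `out` (cols x rows).
std::vector<float> tiledTranspose(const std::vector<float>& in, int rows, int cols) {
    std::vector<float> out(static_cast<std::size_t>(rows) * cols);
    for (int by = 0; by < rows; by += kTile)
        for (int bx = 0; bx < cols; bx += kTile) {
            float tile[kTile][kTile + 1] = {};  // +1 column of padding
            // Load phase: "thread" (ty, tx) reads in[by + ty][bx + tx].
            for (int ty = 0; ty < kTile; ++ty)
                for (int tx = 0; tx < kTile; ++tx)
                    if (by + ty < rows && bx + tx < cols)
                        tile[ty][tx] = in[(by + ty) * cols + (bx + tx)];
            // A CUDA kernel would need a __syncthreads() barrier here.
            // Store phase: consecutive tx read down a tile column (the
            // access the padding protects) and write out[bx + ty][by + tx].
            for (int ty = 0; ty < kTile; ++ty)
                for (int tx = 0; tx < kTile; ++tx)
                    if (bx + ty < cols && by + tx < rows)
                        out[(bx + ty) * rows + (by + tx)] = tile[tx][ty];
        }
    return out;
}
```

With pad = 0 every thread reading a tile column hits the same bank (a 32-way conflict), while pad = 1 spreads the 32 reads across all 32 banks. The shared memory load and store transaction counts reported by the profiler are one way to confirm this on real hardware.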