Posts
- GPU Histogram: From Global Atomics to Shared Memory Privatization
- GPU Prefix Sum: From Multi-Kernel to Single-Pass Decoupled Lookback
- Optimizing GPU Matrix Transpose: From 14% to 88% of Peak Bandwidth
- GPU Parallel Reduction: Algorithm and Optimization Strategies
- How Data Type Width Affects GPU Memory Throughput