CUDA Programming Figures
CUDA C++ Programming Guide
Figure 7. Matrix Multiplication without Shared Memory
Figure 8. Matrix Multiplication with Shared Memory
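Figures 7 and 8 contrast two ways of reading the operands of C = A * B. The following is a minimal host-side sketch in plain C++, not a CUDA kernel, that models only the number of element reads from global memory. The names `kTile`, `Result`, `matmul_naive`, and `matmul_tiled` are hypothetical, the local buffers `as` and `bs` stand in for shared memory, and the matrices are assumed square with a size that is a multiple of `kTile`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the thread-block edge (BLOCK_SIZE in the guide).
constexpr std::size_t kTile = 4;

struct Result {
    std::vector<float> c;      // n x n product, row-major
    std::size_t global_loads;  // modeled element reads of A and B
};

// Figure 7 pattern: every output element reads a full row of A and a
// full column of B directly from "global" memory.
Result matmul_naive(const std::vector<float>& a, const std::vector<float>& b,
                    std::size_t n) {
    Result r{std::vector<float>(n * n, 0.0f), 0};
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                acc += a[row * n + k] * b[k * n + col];
                r.global_loads += 2;  // one read of A, one read of B
            }
            r.c[row * n + col] = acc;
        }
    }
    return r;
}

// Figure 8 pattern: each kTile x kTile output block first stages one tile
// of A and one tile of B into local buffers, then computes from the buffers.
// Assumes n is a multiple of kTile.
Result matmul_tiled(const std::vector<float>& a, const std::vector<float>& b,
                    std::size_t n) {
    Result r{std::vector<float>(n * n, 0.0f), 0};
    std::vector<float> as(kTile * kTile), bs(kTile * kTile);
    for (std::size_t br = 0; br < n; br += kTile) {
        for (std::size_t bc = 0; bc < n; bc += kTile) {
            for (std::size_t t = 0; t < n; t += kTile) {
                // Staging step: each modeled thread loads one element per tile.
                for (std::size_t i = 0; i < kTile; ++i) {
                    for (std::size_t j = 0; j < kTile; ++j) {
                        as[i * kTile + j] = a[(br + i) * n + (t + j)];
                        bs[i * kTile + j] = b[(t + i) * n + (bc + j)];
                        r.global_loads += 2;
                    }
                }
                // Compute step: reads come from the staged tiles only.
                for (std::size_t i = 0; i < kTile; ++i) {
                    for (std::size_t j = 0; j < kTile; ++j) {
                        float acc = 0.0f;
                        for (std::size_t k = 0; k < kTile; ++k) {
                            acc += as[i * kTile + k] * bs[k * kTile + j];
                        }
                        r.c[(br + i) * n + (bc + j)] += acc;
                    }
                }
            }
        }
    }
    return r;
}
```

In this model the naive version performs 2n³ element reads and the tiled version 2n³ / kTile, which mirrors the guide's point that staging tiles in shared memory cuts global memory traffic by a factor of the block size. Real kernels also need synchronization between the staging and compute steps, which this sequential sketch does not model.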
Figure 20. Examples of Global Memory Accesses by a Warp, 4-Byte Word per Thread, and Associated Memory Transactions for Compute Capabilities 3.x and Beyond
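The idea behind Figure 20 can be approximated on the host by counting how many aligned memory segments a warp touches. The sketch below is plain C++ and rests on stated assumptions: a warp of 32 threads, one 4-byte word per thread, and a 32-byte segment granularity. The names `warp_addresses` and `count_segments` are hypothetical, and the actual number of transactions on a device depends on its compute capability and cache configuration.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

constexpr int kWarpSize = 32;
constexpr std::uint64_t kWordBytes = 4;
constexpr std::uint64_t kSegmentBytes = 32;  // assumed transaction granularity

// Byte address read by each thread: thread i reads the 4-byte word at
// base + i * stride_words * 4.
std::vector<std::uint64_t> warp_addresses(std::uint64_t base,
                                          std::uint64_t stride_words) {
    std::vector<std::uint64_t> addrs;
    for (int i = 0; i < kWarpSize; ++i) {
        addrs.push_back(base + static_cast<std::uint64_t>(i) * stride_words * kWordBytes);
    }
    return addrs;
}

// Number of distinct aligned segments covered by the warp's 4-byte reads.
std::size_t count_segments(const std::vector<std::uint64_t>& addrs) {
    std::set<std::uint64_t> segments;
    for (std::uint64_t a : addrs) {
        segments.insert(a / kSegmentBytes);                     // first byte
        segments.insert((a + kWordBytes - 1) / kSegmentBytes);  // last byte
    }
    return segments.size();
}
```

Under these assumptions, an aligned sequential access starting at byte 128 covers 4 segments, the same access shifted by one word to byte 132 covers 5, and a stride of two words starting at byte 128 covers 8.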
Figure 21. Strided Shared Memory Accesses. Examples for devices of compute capability 3.x (in 32-bit mode) or compute capability 5.x and 6.x
Figure 22. Irregular Shared Memory Accesses. Examples for devices of compute capability 3.x, 5.x, or 6.x.
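The bank conflicts illustrated in Figures 21 and 22 can be estimated with a small host-side model. This plain C++ sketch assumes 32 banks with successive 32-bit words mapped to successive banks, and treats threads that read the same word as a broadcast rather than a conflict. The names `conflict_degree` and `strided` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

constexpr int kWarpSize = 32;
constexpr std::uint32_t kBanks = 32;  // assumed bank count, 32-bit wide banks

// Conflict degree: the largest number of distinct 32-bit words requested
// from any single bank. 1 means conflict-free, 2 means a 2-way conflict.
std::size_t conflict_degree(const std::vector<std::uint32_t>& word_index) {
    std::map<std::uint32_t, std::set<std::uint32_t>> words_per_bank;
    for (std::uint32_t w : word_index) {
        words_per_bank[w % kBanks].insert(w);  // identical words collapse
    }
    std::size_t worst = 0;
    for (const auto& entry : words_per_bank) {
        worst = std::max(worst, entry.second.size());
    }
    return worst;
}

// Word indices for a warp in which thread i accesses word i * stride.
std::vector<std::uint32_t> strided(std::uint32_t stride) {
    std::vector<std::uint32_t> words;
    for (int i = 0; i < kWarpSize; ++i) {
        words.push_back(static_cast<std::uint32_t>(i) * stride);
    }
    return words;
}
```

With this model, strides of one and three words are conflict-free, a stride of two words gives a 2-way conflict, and a warp in which every thread reads the same word is conflict-free because the word is broadcast.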
References:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions__throughput-native-arithmetic-instructions
Summary