GPU Architectures and Programming
Semester: SoSe 24 (summer semester 2024)
Course type: Lecture
Course number: lv3039_s24
Lecturer: Prof. Dr. Sohan Lal
Description:
In this module, you will study the architecture and programming of GPUs. Please find below a brief outline of the lectures:
- Review of computer architecture basics - measuring performance, benchmarks, five-stage RISC pipeline, caches
- GPU basics - the evolution of GPU computing, a high-level overview of a GPU architecture
- GPU programming with CUDA - program structure, CUDA thread organization, warp/thread-block scheduling
- GPU (micro) architecture - streaming multiprocessors, single instruction multiple threads (SIMT) core design, tensor cores for deep learning, RT cores for ray tracing, mixed-precision support
- GPU memory hierarchy - banked register file and operand collectors, shared memory, GPU caches (differences w.r.t. CPU caches), global memory
- Branch and memory divergence - branch handling, stack-based reconvergence, memory coalescing, coalescer design
- Barriers and synchronization
- Temporal and spatial locality exploitation challenges in GPU caches
- Global memory - high-throughput requirements, GDDR/HBM, memory bandwidth optimization techniques
- GPU research issues - performance bottlenecks, GPU power modeling, high power consumption/energy efficiency, GPU security
- Application case study - deep learning
- Cycle-accurate simulators for GPUs
In addition to the lectures, a semester-long, problem-based project complements the lecture material. Several GPU-related topics will be proposed; you choose one and work on it, either individually or in a group. Weekly or biweekly meetings are held to discuss progress and problems.
Alongside the semester-long project, there will be assignments that teach CUDA programming.
Course evaluation: Oral examination
Duration: 30 minutes
Prerequisites:
- Basic course on computer architecture and C/C++ programming
Learning organisation:
- Weekly lecture
- Weekly lab
Performance accreditation: Oral exam + lab assignments
Area classification: Studiendekanat Elektrotechnik, Informatik und Mathematik
ECTS credit points: 6
Stud.IP information about this course:
Home institute: Institut für Massively Parallel Systems (E-EXK5)
Registered participants in Stud.IP: 81