Parallel Programming Systems (IN2365)

Lecturer (assistant)
  • Michael Klemm
Dates: See TUMonline


Dr.-Ing. Michael Klemm is a Principal Member of Technical Staff and Senior Field Application Engineer in the HPC Center of Excellence at AMD. His focus is on High Performance and Throughput Computing, and his responsibilities include performance analysis and optimization, software development and porting, benchmarking, customer support and training, and platform evangelization.

He is also the Chief Executive Officer of the OpenMP Architecture Review Board. The OpenMP Architecture Review Board is a non-profit organization that oversees the OpenMP API specification and develops future versions of the OpenMP language and API. It also provides funding for conferences, workshops, tutorials, and other OpenMP-related events.

His background is in Computer Science. He obtained an M.Sc. (Dipl.-Inf.) and a Doctor of Engineering degree (Dr.-Ing.) in Computer Science from the Friedrich Alexander University, Erlangen, Germany. His research focus was on compilers and runtime systems for distributed systems. His areas of interest include compiler construction, design of programming languages, and parallel programming, as well as performance analysis and tuning.


This lecture focuses on the implementation aspects of parallel programming systems. Parallel programming models need compiler and runtime support to map their rich feature sets to actual parallel hardware. To obtain high performance and high efficiency, this mapping needs to take into account the specific characteristics of the underlying computer architecture. The lecture briefly reviews key concepts presented in the lectures "Parallel Programming" and "Microprocessors". It then turns to the fundamental algorithms used to implement the concepts of parallel programming models and how they interact with modern processors. While the lecture focuses on the general mechanisms, we will use the Intel processor architecture to exemplify the implementation concepts discussed.

Recommended Literature

  • John Hennessy and David Patterson: Computer Architecture: A Quantitative Approach (Morgan Kaufmann Series in Computer Architecture and Design). Morgan Kaufmann, 6th edition, ISBN-13 978-0128119051.
  • William Stallings: Computer Organization and Architecture: Designing for Performance. Prentice Hall, 7th edition, ISBN 978-0131856448.
  • Yan Solihin: Fundamentals of Parallel Multicore Architecture. Apple Academic Press, ISBN 978-1482211184.
  • Sources of the LLVM OpenMP Runtime Implementation.
  • Sources of the Threading Building Blocks.
  • Select research papers regarding barrier implementation, lock implementation, scheduling task graphs, etc.
  • Intel Corporation: Intel® 64 and IA-32 Architectures Optimization Reference Manual, document ID 248966-040.
  • Barbara Chapman, Gabriele Jost, and Ruud van der Pas: Using OpenMP - Portable Shared Memory Parallel Programming. MIT Press, ISBN-13 978-0262533027.
  • Ruud van der Pas, Eric Stotzer, and Christian Terboven: Using OpenMP - The Next Step: Affinity, Accelerators, Tasking, and SIMD. MIT Press, ISBN-13 978-0262534789.
  • James Reinders: Threading Building Blocks. O'Reilly, ISBN 978-0596514808.
  • Alexander Supalov: Inside the Message Passing Interface - Creating Fast Communication Libraries. De|G Press, ISBN 978-1501515545.