Accelerating Convolutional Neural Networks using Programmable Logic


Monday 10:00-12:00 (Lecture)

Friday 14:00-16:00 (Lab/Question Session)

First meeting:

Monday 17.04 at 10:00-12:00
ECTS: 10
Language: English
Type: Bachelor/Master lab course (IN0012, IN2106)
Moodle course:  


Registration is handled through the matching system.


Contact dirk.stober(at)


This course is part of the BB-KI (Brandenburg / Bayern Aktion für KI-Hardware) chips project, which aims to offer practical courses in the area of dedicated AI hardware.

It is offered separately for bachelor and master students but carried out as a single event.


The course consists of a weekly lecture that teaches the required concepts, introduces the practical exercises, and hosts the student presentations. In addition, a weekly lab slot is offered where students can ask questions and get help with the practical exercises. The course will cover the following:

  • Introduction to Convolutional Neural Networks (CNNs), which are widely used for image classification and object detection
  • Implementation and optimization of CNN inference on a resource-constrained ARM CPU, including analysis of bottlenecks
  • Introduction to state-of-the-art dedicated ML accelerators in the form of student presentations
  • A project in simulation and synthesis, co-designing your own CNN accelerator using HLS and RTL
  • Implementation of the accelerator on an FPGA and integration with a CPU on the Pynq-Z2 board
  • Evaluation of key performance metrics and comparison of SW/HW implementations
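To give a flavor of the first task, CNN inference on a CPU typically starts from a direct convolution like the sketch below, which is then profiled and optimized. The function name, data layout, and dimensions here are illustrative assumptions, not taken from the course material:

```c
#include <stddef.h>

/* Naive direct 2D convolution for one layer (no padding, stride 1).
 * Assumed (illustrative) memory layout, channel-major:
 *   in:  c_in  x h x w
 *   wts: c_out x c_in x k x k
 *   out: c_out x (h-k+1) x (w-k+1)
 */
static void conv2d_naive(const float *in, const float *wts, float *out,
                         int c_in, int c_out, int h, int w, int k)
{
    int oh = h - k + 1, ow = w - k + 1;
    for (int co = 0; co < c_out; co++)
        for (int y = 0; y < oh; y++)
            for (int x = 0; x < ow; x++) {
                float acc = 0.0f;
                /* accumulate over all input channels and the k x k window */
                for (int ci = 0; ci < c_in; ci++)
                    for (int ky = 0; ky < k; ky++)
                        for (int kx = 0; kx < k; kx++)
                            acc += in[(ci * h + y + ky) * w + (x + kx)] *
                                   wts[((co * c_in + ci) * k + ky) * k + kx];
                out[(co * oh + y) * ow + x] = acc;
            }
}
```

The six nested loops make the memory-access pattern explicit, which is exactly what the bottleneck analysis and the later loop-level optimizations operate on.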



The lab will be done in small groups (max. three students) and consists of three non-graded tasks:

  • CNN inference on the CPU (Pass/Fail)
  • Presentation of an existing ML accelerator (Pass/Fail)
  • Proposal of an accelerator design (Pass/Fail)

and a final graded project (HW/SW co-design of CNN inference) consisting of a report, a presentation, and a discussion of the implementation.

Learning Outcomes

  • Basic understanding of Convolutional Neural Networks, focusing mainly on inference
  • Understanding of common optimization schemes for convolution
  • Basic knowledge of existing AI accelerators
  • Understanding of the challenges of using programmable logic (PL) to accelerate workloads
  • Ability to design simple digital circuits using RTL and HLS languages
  • Implementation and integration of both SW and PL on an SoC platform (Pynq-Z2)
  • Co-design of SW and HW
  • Ability to reason about the performance of different implementations
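As one concrete instance of the common optimization schemes for convolution mentioned above, loop tiling restructures the loop nest so that a small block of the output (and the input window it reuses) stays in cache. The sketch below tiles the spatial output loops of a direct convolution; the tile size, data layout, and names are illustrative assumptions:

```c
#include <stddef.h>

#define TILE 8  /* illustrative tile size; tuned per cache size in practice */

/* Spatially tiled direct 2D convolution (no padding, stride 1), using the
 * same assumed channel-major layout as a naive implementation:
 *   in:  c_in  x h x w
 *   wts: c_out x c_in x k x k
 *   out: c_out x (h-k+1) x (w-k+1)
 */
static void conv2d_tiled(const float *in, const float *wts, float *out,
                         int c_in, int c_out, int h, int w, int k)
{
    int oh = h - k + 1, ow = w - k + 1;
    for (int ty = 0; ty < oh; ty += TILE)       /* tile over output rows */
        for (int tx = 0; tx < ow; tx += TILE)   /* tile over output cols */
            for (int co = 0; co < c_out; co++)
                for (int y = ty; y < oh && y < ty + TILE; y++)
                    for (int x = tx; x < ow && x < tx + TILE; x++) {
                        float acc = 0.0f;
                        for (int ci = 0; ci < c_in; ci++)
                            for (int ky = 0; ky < k; ky++)
                                for (int kx = 0; kx < k; kx++)
                                    acc += in[(ci * h + y + ky) * w + (x + kx)] *
                                           wts[((co * c_in + ci) * k + ky) * k + kx];
                        out[(co * oh + y) * ow + x] = acc;
                    }
}
```

The arithmetic is unchanged; only the iteration order differs, which is why such transformations can be evaluated purely by measuring runtime and cache behavior against the untiled baseline.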


Prerequisites

  • Experience in C/C++ programming (required)
  • Basic knowledge of microcontrollers
  • Basic knowledge of an RTL language (Verilog/VHDL) recommended, or a willingness to learn one on your own
  • Knowledge of machine learning not required

BB-KI chips

Artificial Intelligence (AI) is a domain of research that has already disrupted many parts of our digital lives. To improve the energy efficiency and performance of AI algorithms, dedicated hardware is becoming more and more important, and while industry already offers some solutions, there is a lack of education on the topic at German universities.

Today, a lot of AI classification is performed in the cloud, which gives big companies access to the private data of end users. A focus on designing dedicated AI accelerators for end-user devices (on the "edge") could improve data security and privacy.

This project aims to bring multiple teams from TUM and Uni Potsdam together to offer hybrid practical courses on the development of dedicated AI chips. Through its partnership with the Leibniz Institute for High Performance Microelectronics (IHP), the project offers students the unique opportunity to fabricate their own AI chips.