Accelerating Convolutional Neural Networks using Programmable Logic

Dates:	Monday 10:00-12:00 (Lecture), Friday 14:00-16:00 (Lab/Question Session)
First meeting:	Monday 17.04 at 10:00-12:00
ECTS:	10
Language:	English
Type:	Bachelor/Master lab course (IN0012, IN2106)
Moodle course:
Registration:	Registration is through the matching system
Questions?	Contact dirk.stober(at)tum.de

This course is part of the BB-KI (Brandenburg / Bayern Aktion für KI-Hardware) chips project, which aims to offer practical courses on dedicated AI hardware.

It is offered separately for bachelor and master students but held as a single event.

Content

The course consists of a weekly lecture, which teaches the required concepts, introduces the practical exercises and hosts the student presentations. In addition, a weekly lab slot is offered where students can ask questions and get help with the practical exercises. The course covers the following:

Introduction to Convolutional Neural Networks (CNNs), which are widely used for image classification and object detection
You will implement and optimize CNN inference on a resource-constrained ARM CPU and analyze its bottlenecks
Introduction to state-of-the-art dedicated ML accelerators in the form of student presentations
A project covering simulation and synthesis, in which you co-design your own CNN accelerator using HLS and RTL
You will implement the accelerator on an FPGA and integrate it with a CPU using the Pynq Z2 board
Evaluation of key performance metrics and comparison of SW/HW implementations
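The CPU exercise revolves around the convolution loop nest at the heart of CNN inference. As a rough illustration only (not course-provided material), a minimal direct convolution in C might look as follows. It assumes a CHW memory layout, stride 1, no padding and float data; the function name conv2d_direct is hypothetical.

```c
/* Direct (naive) 2D convolution as used in CNN inference.
 * Hypothetical helper for illustration, not part of any course framework.
 * Assumptions: CHW layout, stride 1, no padding, float data.
 *   in:  C x H x W input feature map (row-major)
 *   w:   K x C x R x S filter weights
 *   out: K x (H-R+1) x (W-S+1) output feature map
 */
void conv2d_direct(const float *in, const float *w, float *out,
                   int C, int H, int W, int K, int R, int S)
{
    int OH = H - R + 1; /* output height */
    int OW = W - S + 1; /* output width  */

    for (int k = 0; k < K; k++)             /* output channel */
        for (int y = 0; y < OH; y++)        /* output row */
            for (int x = 0; x < OW; x++) {  /* output column */
                float acc = 0.0f;
                for (int c = 0; c < C; c++)         /* input channel */
                    for (int r = 0; r < R; r++)     /* filter row */
                        for (int s = 0; s < S; s++) /* filter column */
                            acc += in[(c * H + y + r) * W + x + s]
                                 * w[((k * C + c) * R + r) * S + s];
                out[(k * OH + y) * OW + x] = acc;
            }
}
```

For example, calling it with C = K = 1, a 4x4 input holding the values 0 to 15 in row-major order and a 3x3 filter of ones yields the 2x2 output 45, 54, 81, 90. The six nested loops are the usual starting point for the optimizations typically applied to convolution, such as loop reordering and tiling.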

Grading

The lab will be done in small groups (max. 3 students) and consists of 3 non-graded tasks:

CNN inference on the CPU (Pass/Fail)
Presentation of an existing ML accelerator (Pass/Fail)
Proposal of an Accelerator Design (Pass/Fail)

and a final graded project (HW/SW co-design of CNN inference) consisting of a report, a presentation and a discussion of the implementation.

Learning Outcomes

Basic understanding of Convolutional Neural Networks (mainly Inference)
Understanding of common optimization schemes for convolution
Basic knowledge of existing AI accelerators
Understanding of the challenges of using programmable logic (PL) to accelerate workloads
Ability to design simple digital circuits using RTL and HLS languages
Implementation and integration of both SW and PL on an SoC platform (Pynq Z2)
Co-design of SW and HW
Ability to reason about the performance of different implementations

Prerequisites

Experience programming in C/C++ (required)
Basic knowledge of microcontrollers
Basic knowledge of an RTL language (Verilog/VHDL) recommended, or willingness to learn it on your own
Knowledge of Machine Learning not required

BB-KI chips

Artificial Intelligence (AI) is a research domain that has already disrupted many parts of our digital lives. Dedicated hardware is becoming increasingly important for improving the energy efficiency and performance of AI algorithms. While industry already offers some solutions, education in this area is still lacking at German universities.

Today, much AI classification is performed in the cloud, which gives large companies access to the private data of end users. Focusing on dedicated AI accelerators for end-user devices (at the "edge") could improve data security and privacy.

This project brings together multiple teams from TUM and Uni Potsdam to offer hybrid practical courses on the development of dedicated AI chips. Through its partnership with the Leibniz Institute for High Performance Microelectronics (IHP), the project offers students the unique opportunity to fabricate their own AI chips.