Wissenschaftliches Seminar VLSI-Entwurfsverfahren (Scientific Seminar on VLSI Design Methods)

Lecturer (contributors)
Number: 0820073263
Type: Seminar
Scope: 3 SWS
Semester: Summer semester 2024
Language of instruction: German
Position in curricula: See TUMonline

Dates

Participation criteria

See TUMonline
Note: Students choose a topic BEFORE the introductory session. To do so, they get in contact with the respective supervisor. Topics are assigned on a first-come, first-served basis. A student only counts as registered once the supervisor has confirmed the chosen topic. A list of topics can be found at the following link: https://www.ce.cit.tum.de/eda/lehrveranstaltungen/seminare/wissenschaftliches-seminar-vlsi-entwurfsverfahren/

Learning outcomes

After successful completion of the seminar, students are able to present a new idea or an existing approach in the field of computer-aided circuit and system design in a comprehensible and convincing way. To this end, the following skills in particular are acquired:
  • Participants can independently familiarize themselves with a scientific topic from the field of computer-aided circuit and system design.
  • Participants are able to present a topic in a structure covering problem statement, state of the art, goals, methods, and results.
  • Participants are able to present a topic in this structure orally, to visualize it in a set of slides, and to describe it in writing in a scientific report.
  • Participants are familiar with the basics of constructive reviewing and can apply them to another author's work.

Description

Specific seminar topics from the field of design automation for electronic circuits and systems are offered. Examples include analog design methodology, design methodology for digital circuits, layout synthesis, and system-level design methodology. Participants work independently on a scientific topic and write a 4-page paper. In addition, participants prepare a review of another participant's written report in a peer-review process. Finally, participants present their topic in a talk; in the subsequent discussion, their topic is examined in detail.

Prerequisites

No specific prerequisites.

Teaching and learning methods

Learning method: Students independently work out a scientific topic, advised by a research assistant. Teaching method: In introductory sessions, participants receive guidance on the technical work, the written report, and the preparation of the slides and the oral presentation. In an additional interactive presentation training, students can learn and rehearse techniques for a successful talk. Further details are discussed between students and research assistants on an individual basis. All common techniques for preparing and presenting papers and talks are used, e.g.: classical blackboard and whiteboard, electronic slides and projector, electronic word processing, and electronic slide editing.

Assessment and examination

The examination takes the form of a scientific report. It consists of a written part (50%), comprising a paper (4 pages) and a review (approx. 2,000-3,000 characters) prepared in a peer-review process, and an oral part (50%) in the form of a presentation of about 30 minutes (including the subsequent discussion). With the scientific report, students demonstrate that they can prepare, structure, and present, for example, the scientific state of the art, a new idea, or an existing approach in the field of computer-aided circuit and system design for an expert audience.

Recommended literature

A set of topics and the associated literature is provided at the beginning of the course. Students choose their topic themselves.

Links

Topic selection - open

The list of topics for the 2024 summer semester can be found below.

Topics are assigned on a first-come, first-served (FCFS) basis. Please contact the respective supervisor directly by e-mail. Once you have decided on a topic, please make sure that you receive a confirmation from your supervisor.

Seminar Topics

Tensor program optimization for machine learning models

Short Description:
In this seminar, the student will review state-of-the-art tools for tensor program optimization of machine learning (ML) models, with a special focus on TVM-related approaches.

Description

The widespread use of ML in many real-life applications is enabled by deep learning models that are optimized for and deployed to specific hardware platforms and devices. Typically, deep learning frameworks depend on libraries that have been manually optimized. Engineers need to choose from many tensor programs that are logically equivalent but differ significantly in performance due to memory access, threading, and the use of specialized hardware primitives. This selection and optimization process is quite labor-intensive. With the growing number of model architectures, their increasing size, and the variety of hardware targets and backends, tensor program optimization has become a key factor for the efficient deployment of ML models. In this seminar paper, an extensive survey of state-of-the-art tools for tensor program optimization shall be conducted, with a special focus on tools leveraging the ML compiler TVM, such as AutoTVM [1], Ansor [2], and MetaSchedule [3], or tools comparing against TVM, such as TensorIR [4]. Key areas of focus are:
  • Understanding the core ideas of the reviewed methods
  • Examining and comparing reported results
  • Assessing the advantages and potential drawbacks of those methods

References:
[1] Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. Learning to optimize tensor programs (NIPS'18). https://proceedings.neurips.cc/paper_files/paper/2018/hash/8b5700012be65c9da25f49408d959ca0-Abstract.html
[2] Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, and Ion Stoica. 2020. Ansor: Generating high-performance tensor programs for deep learning (OSDI'20). https://www.usenix.org/conference/osdi20/presentation/zheng
[3] Shao, Junru, et al. "Tensor program optimization with probabilistic programs." Advances in Neural Information Processing Systems 35 (2022): 35783-35796. https://proceedings.neurips.cc/paper_files/paper/2022/hash/e894eafae43e68b4c8dfdacf742bcbf3-Abstract-Conference.html
[4] Siyuan Feng, Bohan Hou, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, and Tianqi Chen. 2023. TensorIR: An Abstraction for Automatic Tensorized Program Optimization (ASPLOS 2023). https://dl.acm.org/doi/abs/10.1145/3575693.3576933
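As a small, self-contained illustration of the search problem described above (deliberately library-free rather than using TVM itself; the matrix size and candidate tile sizes are arbitrary assumptions), the following sketch times several logically equivalent blocked matrix multiplications and keeps the fastest one, mimicking at toy scale what auto-tuners such as AutoTVM or Ansor automate:

```python
# Minimal sketch of schedule auto-tuning (toy setup, not TVM itself): the same
# matrix multiplication is computed with different tile sizes and the fastest
# variant is kept, mimicking the search loop of tools such as AutoTVM/Ansor.
import time
import numpy as np

N = 512
A = np.random.rand(N, N).astype(np.float32)
B = np.random.rand(N, N).astype(np.float32)
reference = A @ B

def tiled_matmul(A, B, tile):
    """Blocked matrix multiplication; all tilings are logically equivalent."""
    n = A.shape[0]
    C = np.zeros((n, n), dtype=A.dtype)
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            for k in range(0, n, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

best = None
for tile in (32, 64, 128, 256, 512):          # candidate "schedules"
    start = time.perf_counter()
    C = tiled_matmul(A, B, tile)
    elapsed = time.perf_counter() - start
    assert np.allclose(C, reference, rtol=1e-3)  # same result, different speed
    print(f"tile={tile:4d}: {elapsed*1e3:7.1f} ms")
    if best is None or elapsed < best[1]:
        best = (tile, elapsed)
print("best tile size:", best[0])
```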

Contact

  • Samira.ahmadifarsani@tum.de
  • Daniela.sanchezlopera@infineon.com

Supervisor:

Samira Ahmadifarsani

Pre-training Network Pruning

Short Description:
In this seminar, the student will review state-of-the-art pruning techniques applied before training, such as SNIP.

Description

“Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. Conventionally, pruning is done within an iterative optimization procedure, with either heuristically designed pruning schedules or additional hyperparameters during training, or using statistical heuristics after training. However, using suitable heuristic criteria inspired by the “Lottery Ticket” hypothesis, networks can also be pruned before training. This eliminates the need for both pretraining and complex pruning schedules, and is well suited for use in combination with neural architecture search, making it robust to architecture variations. The canonical method SNIP [1] introduces a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task. These methods can obtain extremely sparse networks and are claimed to retain the same accuracy as the reference network on benchmark classification tasks.” As such, pre-training pruning methods are potentially a highly attractive alternative to post-training and training-time co-optimization methods for use in automated industrial machine learning deployment toolchains.

References:
[1] Lee, Namhoon, Thalaiyasingam Ajanthan, and Philip H. S. Torr. "SNIP: Single-shot network pruning based on connection sensitivity." arXiv, 2018. https://arxiv.org/abs/1810.02340
[2] Artem Vysogorets and Julia Kempe. "Connectivity Matters: Neural Network Pruning Through the Lens of Effective Sparsity." Journal of Machine Learning Research, vol. 24. https://www.jmlr.org/papers/volume24/22-0415/22-0415.pdf
[3] Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin. "Pruning Neural Networks at Initialization: Why Are We Missing the Mark?" ICLR 2021. https://arxiv.org/abs/2009.08576
[4] Pau de Jorge, Amartya Sanyal, Harkirat S. Behl, Philip H. S. Torr, Gregory Rogez, Puneet K. Dokania. "Progressive Skeletonization: Trimming more fat from a network at initialization." https://arxiv.org/abs/2006.09081
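A minimal sketch of the SNIP-style connection-sensitivity criterion (assuming PyTorch is available; the toy model, random data, and 10% keep ratio are illustrative assumptions, not taken from [1]): saliency is approximated by |weight * gradient| on a single batch at initialization, and only the most salient connections are kept before any training.

```python
# Minimal SNIP-style sketch (assumptions: PyTorch installed, toy model and random
# data stand in for a real task). Saliency of a connection is approximated by
# |w * dL/dw| on one batch, evaluated before training; only the top fraction is kept.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
x, y = torch.randn(128, 20), torch.randint(0, 2, (128,))   # one "sensitivity" batch

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()                                             # gradients at initialization

# Connection sensitivity per weight, collected over all prunable layers.
scores = torch.cat([(p.grad * p).abs().flatten()
                    for p in model.parameters() if p.dim() > 1])
keep_ratio = 0.10                                           # keep 10% of connections
threshold = torch.topk(scores, int(keep_ratio * scores.numel())).values.min()

masks = {}
for name, p in model.named_parameters():
    if p.dim() > 1:                                         # prune weight matrices only
        masks[name] = ((p.grad * p).abs() >= threshold).float()
        p.data.mul_(masks[name])                            # apply mask before training

kept = sum(m.sum().item() for m in masks.values())
total = sum(m.numel() for m in masks.values())
print(f"kept {kept:.0f} of {total} connections ({kept/total:.1%})")
```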

Contact

Andrew.stevens@infineon.com

Daniela.sanchezlopera@infineon.com 

Supervisor:

Daniela Sanchez Lopera - Andrew Stevens (Infineon Technologies AG)

Tensor program/graph rewriting-based optimization techniques

Description

 

In machine learning (ML), tensor kernels often translate into pure mathematical expressions. This presents an interesting prospect for optimization through term rewriting [1]. A fundamental optimization technique used by deep learning frameworks is graph rewriting [2] [3]. Within production frameworks, the decision to apply rewrite rules and in what sequence rests heavily on heuristics. Research indicates that seeking a more optimal sequence of substitutions, rather than relying solely on heuristics, can lead to the discovery of better tensor computation graphs.
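A toy sketch of the rewriting idea (the expression encoding, the two rules, and the node-count cost model are invented for illustration; systems such as TASO or equality-saturation engines use far richer IRs, rule sets, and cost models):

```python
# Toy graph-rewriting sketch (illustrative only). Expressions are nested tuples,
# e.g. ("matmul", ("transpose", "A"), ("transpose", "B")); rules are applied
# bottom-up until a fixpoint, keeping cheaper equivalent expressions.
RULES = [
    # transpose(A) @ transpose(B)  ->  transpose(B @ A)
    lambda e: ("transpose", ("matmul", e[2][1], e[1][1]))
        if e[0] == "matmul" and e[1][0] == "transpose" and e[2][0] == "transpose" else None,
    # transpose(transpose(A))  ->  A
    lambda e: e[1][1] if e[0] == "transpose" and e[1][0] == "transpose" else None,
]

def cost(e):
    """Toy cost model: count operator nodes in the expression tree."""
    return 1 + sum(cost(c) for c in e[1:]) if isinstance(e, tuple) else 0

def rewrite(e):
    """Rewrite children first, then try each rule; keep results that are not worse."""
    if not isinstance(e, tuple):
        return e
    e = (e[0],) + tuple(rewrite(c) for c in e[1:])
    for rule in RULES:
        new = rule(e)
        if new is not None and cost(new) <= cost(e):
            return rewrite(new)
    return e

expr = ("matmul", ("transpose", "A"), ("transpose", "B"))
print(rewrite(expr))   # -> ('transpose', ('matmul', 'B', 'A')), one operator fewer
```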

 

Moreover, term rewriting techniques prove beneficial in optimizing low-level tensor programs [4] alongside tensor graphs. Traditionally, application programmers manually add hardware function calls, or compilers incorporate them through handcrafted accelerator-specific extensions. Integrating domain-specific instruction or operation support into an existing compiler typically involves custom pattern matching to map resource-intensive tensor operations from applications to hardware-specific invocations. Despite these modifications related to pattern matching, users may still need to manually adjust their applications to aid the compiler in identifying opportunities for dispatching operations to target hardware, such as by altering data types or optimizing loops.

 

Leveraging term rewriting techniques offers a promising approach for streamlining various transformation and mapping tasks, both for tensor graphs and for tensor programs. This approach not only enhances efficiency but also holds the potential to simplify the deployment of domain-specific languages (DSLs), thus advancing the field of machine learning and computational optimization.

 

This seminar topic should cover a literature review of existing rewriting techniques for tensor programs and graphs, which includes:

 

1. Research on existing rewriting techniques.
2. Their application to tensor programs and graphs.
3. Challenges and the relations between different rewriting techniques.
4. Applications in and with existing machine learning compiler frameworks.

References:

[1] Franz Baader et al. 1998. Term Rewriting and All That. Cambridge University Press. https://doi.org/10.1017/CBO9781139172752
[2] Zhihao Jia et al. 2019. TASO: Optimizing deep learning computation with automatic generation of graph substitutions. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP '19). Association for Computing Machinery, New York, NY, USA, 47–62. https://doi.org/10.1145/3341301.3359630
[3] Yang, Y., et al. 2021. Equality Saturation for Tensor Graph Superoptimization. ArXiv, abs/2101.01332.
[4] Gus Henry Smith et al. 2021. Pure Tensor Program Rewriting via Access Patterns (Representation Pearl). In Proceedings of the 5th ACM SIGPLAN International Symposium on Machine Programming (MAPS 2021). Association for Computing Machinery, New York, NY, USA, 21–31. https://doi.org/10.1145/3460945.3464953

Contact

Supervisor:

Philipp van Kempen - Mayuri Bhadra (Infineon Technologies AG)

Checksum-based Error Detection for Reliable Computing

Description

In safety-critical systems, random hardware faults, such as transient soft errors (e.g., due to radiation) or permanent circuit faults, can lead to disastrous failures. Detecting these errors is, therefore, one major design goal. A state-of-the-art solution is redundancy, where a computation is performed multiple times and the respective results are compared. This can be achieved either sequentially (temporal redundancy) or at the same time, e.g., through lock-stepped computational units (spatial redundancy). The underlying assumption is that a second fault does not occur in close vicinity to a former one.

However, this redundancy method introduces a significant overhead to the system: the required multiplication of computational resources, be it execution time or processing nodes. Checksum-based computation aims to reduce this computational overhead by introducing redundancy into the algorithms themselves, e.g., filter and input checksums for convolution algorithms.
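The idea can be illustrated with the classic algorithm-based fault tolerance (ABFT) scheme for matrix multiplication (a minimal sketch; matrix sizes and the injected fault are arbitrary assumptions): a column-checksum row is appended to one operand and a row-checksum column to the other, so checksum mismatches in the product reveal, and even localize, a corrupted result element.

```python
# Minimal ABFT / checksum-based error detection sketch for matrix multiplication
# (illustrative toy example; sizes and the injected fault are arbitrary).
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 4))
B = rng.random((4, 4))

# Encode: append a column-checksum row to A and a row-checksum column to B.
A_c = np.vstack([A, A.sum(axis=0)])                  # (5, 4)
B_r = np.hstack([B, B.sum(axis=1, keepdims=True)])   # (4, 5)

C_full = A_c @ B_r                                   # (5, 5): data block plus checksums
C = C_full[:4, :4].copy()

C[2, 1] += 0.5                                       # inject a fault into one element

# Detect: recompute checksums of the (possibly faulty) data block and compare.
row_ok = np.isclose(C.sum(axis=1), C_full[:4, 4])
col_ok = np.isclose(C.sum(axis=0), C_full[4, :4])
if row_ok.all() and col_ok.all():
    print("no error detected")
else:
    # The failing row/column checksums localize the faulty element.
    print("error detected at", (int(np.argmin(row_ok)), int(np.argmin(col_ok))))
```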

Supervisor:

Johannes Geier

Arithmetic Code-based Error Detection for Reliable Computing

Description

In safety-critical systems, random hardware faults, such as transient soft errors (e.g., due to radiation) or permanent circuit faults, can lead to disastrous failures. Detecting these errors is, therefore, one major design goal. A state-of-the-art solution is redundancy, where a computation is performed multiple times and the respective results are compared. This can be achieved either sequentially (temporal redundancy) or at the same time, e.g., through lock-stepped computational units (spatial redundancy). The underlying assumption is that a second fault does not occur in close vicinity to a former one. However, this redundancy method introduces a significant overhead to the system: the required multiplication of computational resources, be it execution time or processing nodes.

Code-based computation aims to reduce this computational overhead without loss in error-detection sensitivity. Notable examples are AN codes and residue-based error detection.
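A minimal sketch of the AN-code idea (the constant A and the injected bit flip are illustrative assumptions; real designs select A much more carefully): operands are encoded as A·x, arithmetic is performed on the code words, and a result that is no longer divisible by A indicates an error.

```python
# Minimal AN-code sketch (constant and injected bit flip are illustrative).
# An operand x is encoded as A*x; sums of code words stay multiples of A, so a
# corrupted result is detected by a simple divisibility (residue) check.
A = 59                      # small example constant; detection strength depends on A

def encode(x):  return A * x
def decode(cx): return cx // A
def check(cx):  return cx % A == 0   # residue 0 <=> no detectable error

a, b = 1234, 5678
c = encode(a) + encode(b)            # addition works directly on code words
assert check(c) and decode(c) == a + b

c_faulty = c ^ (1 << 7)              # single bit flip, e.g. a soft error in memory
print("fault detected:", not check(c_faulty))   # this particular flip is detected
```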

 

Supervisor:

Johannes Geier

Compression Techniques for Floating-Point Weights in Machine Learning Models

Description

Deep Neural Networks (DNNs) offer possibilities for tackling practical challenges and broadening the scope of Artificial Intelligence (AI) applications. The considerable computational and memory needs of current neural networks are attributed to the increasing complexity of network structures, which involve numerous layers containing millions of parameters. The energy consumption during DNN inference is predominantly attributed to the access and processing of these parameters. To tackle the significant size of models integrated into Internet of Things (IoT) devices, a promising strategy involves reducing the bit-width of the weights.

 

The objective of this seminar is to conduct a comprehensive literature survey of compression techniques available for floating-point weights and to gather the advantages and disadvantages of the available solutions. Depending on the time and the reviewed content, the survey can be extended to identify a hardware-efficient technique for the compression of floating-point weights.
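As a small illustration of one simple direction mentioned above (a sketch on assumed toy data; the schemes in the bibliography are considerably more sophisticated), the following snippet reduces the effective bit-width of float32 weights by truncating mantissa bits and reports the resulting error:

```python
# Minimal sketch: lossy compression of float32 weights by zeroing low mantissa
# bits (a crude stand-in for reduced-precision formats; the weight tensor and
# bit counts are illustrative). Fewer mantissa bits -> more compressible, larger error.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=10000).astype(np.float32)   # toy weight tensor

def truncate_mantissa(x, keep_bits):
    """Keep sign, exponent, and the top `keep_bits` of the 23-bit float32 mantissa."""
    drop = 23 - keep_bits
    mask = np.uint32((0xFFFFFFFF >> drop) << drop)       # clears the low `drop` bits
    return (x.view(np.uint32) & mask).view(np.float32)

for keep in (23, 10, 7, 4):
    w_c = truncate_mantissa(w, keep)
    err = np.abs(w - w_c).max()
    print(f"mantissa bits kept: {keep:2d}, max abs error: {err:.2e}")
```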

 

Bibliography:

[1] Y. He, J. Lin, Z. Liu, H. Wang, L.-J. Li, and S. Han, “AMC: AutoML for model compression and acceleration on mobile devices,” 2019.
[2] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding,” 2016.
[3] G. C. Marinò, G. Ghidoli, M. Frasca, and D. Malchiodi, “Compression strategies and space-conscious representations for deep neural networks,” in 2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 9835–9842.
[4] G. C. Marinò, A. Petrini, D. Malchiodi, and M. Frasca, “Compact representations of convolutional neural networks via weight pruning and quantization,” CoRR, vol. abs/2108.12704, 2021. [Online]. Available: https://arxiv.org/abs/2108.12704

 

 

 

Contact

Supervisor:

Conrad Foik

RTL Generation with SpinalHDL

Description

 


 

Advances in technological developments necessitate increased productivity during the design phase. Traditional Hardware Description Languages (HDL) like (System)Verilog and VHDL have raised the level of abstraction from gate-level to the register transfer level (RTL). However, due to their limited flexibility, hardware designers are increasingly moving from writing parameterizable RTL code to creating generators embedded in high-level programming languages such as Scala or Python.
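To make the generator idea concrete (a hypothetical toy example in Python, not SpinalHDL itself, which is embedded in Scala): a small parameterized function can emit different Verilog RTL from the same description, which is the productivity gain that hardware generation languages build on.

```python
# Minimal sketch of the "hardware generator" idea mentioned above (illustrative
# only; SpinalHDL itself is a Scala-embedded language with a far richer model):
# a parameterized Python function emits plain Verilog RTL for a register pipeline.
def gen_pipeline_reg(name: str, width: int, stages: int) -> str:
    regs = "\n".join(
        f"  reg [{width-1}:0] stage{i};" for i in range(stages))
    chain = "\n".join(
        f"      stage{i} <= " + (f"stage{i-1};" if i else "din;")
        for i in range(stages))
    return f"""module {name} (
  input  wire clk,
  input  wire [{width-1}:0] din,
  output wire [{width-1}:0] dout
);
{regs}
  always @(posedge clk) begin
{chain}
  end
  assign dout = stage{stages-1};
endmodule
"""

# Different parameters yield different RTL from the same generator.
print(gen_pipeline_reg("delay_line_8x3", width=8, stages=3))
```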

 

This seminar will delve into SpinalHDL, an open-source Hardware Generation Language (HGL) embedded in Scala, providing an overview of its concepts and benefits. Key areas of focus will include:

 

 

 

  • Understanding the syntax and semantics of SpinalHDL, and how it differentiates from other HDLs.
  • Exploring core concepts like components, interfaces, and generics in SpinalHDL.
  • Examining extension libraries such as Stream and Flow, which simplify designs with FIFO semantics.
  • Assessing the advantages of using SpinalHDL and potential drawbacks.

 

References:

 

[1]: https://spinalhdl.github.io/SpinalDoc-RTD/master/SpinalHDL/Foreword/index.html

 

Contact

Supervisor:

Conrad Foik

On using LLM Technologies in writing a Seminar Paper

Description

Large Language Models (LLMs) such as ChatGPT are present everywhere. Looking at the press, ChatGPT and other LLM technologies have been in use for a while and boost the productivity of authors. In science as well, i.e. in thesis or paper writing, LLMs are becoming more and more prominent. On the other hand, using LLMs while claiming a self-written document is ethically questionable, may cause problems with plagiarism, and has led to the rejection of theses and even expulsion from the university.

 

However, technology is there to be used. Whether a technology is good or bad is always linked to how it is used. Just as we write documents on computers rather than typewriters, and just as we use spelling and grammar checkers, it is quite obvious that we will use LLMs as well.

 

The seminar paper shall address the challenges and opportunities of using LLM technologies in writing a seminar paper. As the scope of the topic is very broad, this seminar paper should focus on using LLMs to write another seminar paper. The task is to analyze and report on the usage of ChatGPT and to make proposals for how ChatGPT usage can be referenced and marked in the seminar paper.

 

  • The starting point is a literature search on ChatGPT and the writing of technical papers: How can it be used, are there recommended tutorials, and are there papers on the challenges?
  • As a first step, the use of ChatGPT and the setup of the writing environment shall be described.
  • As a next step, the LLM-supported writing of a technical seminar paper shall be analyzed: Which code pieces can be used without modification, which require minor or major modifications, and which parts need to be handwritten?
  • A similar analysis should be done for tables and figures.
  • Lastly, a proposal should be made on how to “cite” LLM usage. This may include general phrases, markers in the code, or other means.

 

The findings should be summarized in a 4-page paper which might be submitted to a workshop or conference with RISC-V scope and should be presented in an EDA seminar.

 

It is strongly encouraged to use LLM technologies for writing the paper and for making the presentation. This seminar is closely coupled with the Seminar called “Comparison of ARM and RISC-V ISA Bit-Manipulation Instructions”. A close cooperation and tandem work are essential.

 

Bibliography

 

Google reports about 53,900,000 results when searching for “chatGPT paper writing”. It is therefore expected that well-fitting papers can be found.

 

 

Contact

Supervisor:

Conrad Foik

Comparison of ARM and RISC-V ISA Bit-Manipulation Instructions

Description

ARM processors are the de-facto standard in embedded SoCs and are on their way to being established in data centers and high-performance computing applications. For about 15 years, however, a competitor has been emerging that follows a completely different approach: an open, free-to-use instruction set definition that leaves room for various implementations. ISA subset support even allows special instruction extensions and both open-source and proprietary solutions. Only one thing must be guaranteed: support of the base integer instruction set (RV32I or RV64I, respectively).

 

The scope of this seminar is the comparison of the RV Zb* instruction set with instructions supported by ARM.

 

  • Starting points are the ISA descriptions in [1], [2], and [3]. In addition, a literature search shall be performed for publications comparing the two ISAs.
  • As a first step, the system states relevant to the instruction definitions (e.g., program counter, register files) shall be enumerated and compared.
  • As a next step, a summary of the RISC-V B (Zb*) and related ARM bit-manipulation instructions should be made. An important aspect is the size of the instructions.
  • Next, a comparison strategy shall be set up, e.g., by identifying instructions with the same, a similar, or a different behavior, or by the registers involved.
  • Further, it should be elaborated how instructions supported by ARM can be mimicked by a sequence of RISC-V instructions, as in the small sketch below.
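As a toy illustration of such a comparison strategy (the chosen ARM instruction UBFX, which is not available on every Cortex-M profile, and the modeled RISC-V SLLI/SRLI sequence are assumptions for this example, not part of the topic description):

```python
# Toy comparison sketch (illustrative; ARM UBFX and a RISC-V SLLI/SRLI pair are
# modeled as 32-bit operations in Python to check behavioral equivalence).
MASK32 = 0xFFFFFFFF

def arm_ubfx(rn, lsb, width):
    """ARM UBFX Rd, Rn, #lsb, #width: unsigned bit-field extract."""
    return (rn >> lsb) & ((1 << width) - 1)

def riscv_ubfx(rn, lsb, width):
    """Same effect with two base RISC-V instructions: SLLI then SRLI."""
    t = (rn << (32 - lsb - width)) & MASK32   # slli t, rn, 32-lsb-width
    return (t >> (32 - width)) & MASK32       # srli rd, t, 32-width

import random
random.seed(0)
for _ in range(10_000):
    rn = random.getrandbits(32)
    lsb = random.randrange(0, 32)
    width = random.randrange(1, 33 - lsb)     # bit field must fit into 32 bits
    assert arm_ubfx(rn, lsb, width) == riscv_ubfx(rn, lsb, width)
print("UBFX and the SLLI/SRLI sequence agree on all sampled cases")
```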

 

The findings should be summarized in a 4-page paper which might be submitted to a workshop or conference with RISC-V scope and should be presented in an EDA seminar.

 

It is explicitly required to use LLM technologies for writing the paper and for making the presentation. As LLMs are of general interest in writing technical documents a second closely coupled Seminar called “On using LLM Technologies in writing a Seminar Paper” should be done in parallel. A close cooperation and tandem work are essential.

 

Bibliography

 

[1] Cortex-M0 Devices Generic User Guide Version 1.0, Chapter 3. The Cortex-M0 Instruction Set https://developer.arm.com/documentation/dui0497/a/the-cortex-m0-instruction-set

 

[2] RISC-V Specifications  https://riscv.org/technical/specifications/

 

[3] RISC-V Extension Specifications https://wiki.riscv.org/display/HOME/Ratified+Extensions

 

Contact

Supervisor:

Conrad Foik

Fast, multi-target NAS without supernets

Description

Transitioning machine learning models into production environments comes with many challenges. MLOps offers guidelines and principles to navigate these challenges, advocating for iterative improvements as well as continuous integration and deployment. Such practices are imperative for post-deployment refinement and continuous updates of models. However, when deploying models on edge devices, this process becomes more challenging: the limited resources of edge devices require complex architecture optimization to allow efficient execution. Moreover, both the heterogeneity across different devices and the ever-increasing number of devices exacerbate the optimization challenge.

Neural Architecture Search (NAS) has emerged as a powerful technique to tailor models for multiple targets, optimizing for the constraints of many devices. The most successful approaches in the domain involve training large supernets, which can be adapted to various accelerators with relative ease after a large initial training investment. Nonetheless, the upfront investment in training these supernets is substantial and goes against the rapid, iterative updates required in a production setting.

Hence, the objective of this topic is to survey approaches that are capable of finding efficient solutions for multiple edge targets without the large upfront training investment of supernet techniques, which makes iterative updates and continuous integration infeasible.

Papers explaining the problem and supernet approaches:

  •  S. Li et al., “Hyperscale Hardware Optimized Neural Architecture Search,” in Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, Vancouver BC Canada: ACM, Mar. 2023, pp. 343–358. doi: 10.1145/3582016.3582049.
  •  H. Cai, C. Gan, T. Wang, Z. Zhang, and S. Han, “Once-for-All: Train One Network and Specialize it for Efficient Deployment.” arXiv, Apr. 29, 2020. doi: 10.48550/arXiv.1908.09791.

 

 

Contact

Moritz.Thoma@bmw.de

Supervisor:

Conrad Foik

Adaptive pruning for adaptive latency at runtime

Description

As machine learning models are increasingly deployed on resource-constrained edge devices, the need for efficient model execution is paramount. Traditional model compression techniques, such as pruning and quantization, offer some relief from the constraints of limited resources but typically lack the flexibility to adapt to changing runtime conditions. However, the ability to adjust model complexity in response to fluctuating workloads on shared accelerators is crucial: it allows for the optimization of model performance when resources are abundant, and the preservation of essential functionality when they are scarce.

 

 

The objective of this topic is to find and investigate strategies for enabling on-the-fly network reconfiguration, aiming to enable more flexible AI applications on edge devices. This entails identifying pruning methods that maintain model effectiveness while providing the agility to meet varying runtime requirements.

 

 

Papers showing the general idea for quantization approaches:

  • Shen, Mingzhu, et al. "Once quantization-aware training: High performance extremely low-bit architecture search." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
  • Mori, Pierpaolo, et al. "MATAR: Multi-Quantization-Aware Training for Accurate and Fast Hardware Retargeting." Proceedings of DATE 2024.

 

Contact

Moritz.Thoma@bmw.de

Supervisor:

Conrad Foik

Model Compression Methods for Vision Transformer

Description

In recent times, Vision-Transformers (ViTs) [1] have shown high predictive performance on computer vision tasks such as image classification, and have outperformed Convolutional Neural Networks (CNNs). However, the high memory footprint and large compute requirements of common ViTs-based models restrict their potential deployment, especially on devices with computation constraints. Model compression techniques aim to improve the efficiency of deep learning models by reducing their memory and computational costs. These techniques include quantization, pruning, knowledge distillation and efficient architecture design.

Different from CNNs, the architecture of ViTs is mainly based on self-attention and feed-forward modules. This specific architecture presents challenges and opportunities for tailored compression strategies. For example, ViTs exhibit high variance in their weight and activation distributions, which can lead to severe performance drops when quantization is applied [2]. Existing quantization methods for ViTs improve the predictive performance in three ways: 1) by retaining the self-attention rank, 2) by rectifying the heavy-tailed activation distribution, and 3) by addressing the weight oscillation in quantization-aware training [3]. Moreover, ViT architectures tend to use more nonlinear operations than CNNs, such as GELU and Softmax. There is a need to address the low hardware efficiency of such operations to improve the efficiency of ViT-based models [4].

This seminar topic should cover a literature review of model compression methods tailored for ViT architectures, including the challenges and the relations between different compression methods to improve inference efficiency.
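A small numpy sketch of the variance/outlier issue mentioned above (the synthetic Gaussian and heavy-tailed tensors and the int8 setting are illustrative stand-ins for real ViT statistics): per-tensor uniform quantization loses far more accuracy on an outlier-rich distribution.

```python
# Minimal sketch of why heavy-tailed (high-variance, outlier-rich) tensors hurt
# uniform quantization (synthetic data; real ViT activations are the motivation).
import numpy as np

rng = np.random.default_rng(0)

def quantize_int8(x):
    """Per-tensor symmetric uniform quantization to int8 and back."""
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).clip(-127, 127) * scale

def rel_error(x):
    xq = quantize_int8(x)
    return np.linalg.norm(x - xq) / np.linalg.norm(x)

well_behaved = rng.normal(0, 1, 100_000)            # Gaussian-like tensor
heavy_tailed = rng.standard_t(df=2, size=100_000)   # outlier-rich tensor

print(f"relative error, Gaussian     : {rel_error(well_behaved):.4f}")
print(f"relative error, heavy-tailed : {rel_error(heavy_tailed):.4f}")
```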

 

References:

[1] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S. and Uszkoreit, J., 2020. An image is worth 16x16 words: Transformers for image recognition at scale.

[2] Huang, X., Shen, Z. and Cheng, K.T., 2023. Variation-aware vision transformer quantization.

[3] Tang, Y., Wang, Y., Guo, J., Tu, Z., Han, K., Hu, H. and Tao, D., 2024. A Survey on Transformer Compression.

[4] Chitty-Venkata, K.T., Mittal, S., Emani, M., Vishwanath, V. and Somani, A.K., 2023. A survey of techniques for optimizing transformer inference.

Supervisor:

Mikhael Djajapermana

Silicon Photonic Microring Resonators: A Comprehensive Design-Space Exploration and Optimization Under Fabrication-Process Variations

Keywords:
Design-space exploration, fabrication-process variations (FPVs), microring resonators (MRRs), silicon photonics.

Description

Silicon photonic microring resonators (MRRs) offer many advantages (e.g., compactness) and are often considered the fundamental building block in optical interconnects and emerging photonic nanoprocessors and accelerators. Such devices are, however, sensitive to inevitable fabrication-process variations (FPVs) stemming from optical lithography imperfections. Consequently, silicon photonic integrated circuits (PICs) integrating MRRs often suffer from the high power overhead required to compensate for the impact of FPVs on MRRs and, hence, to realize reliable operation. On the other hand, the design space of MRRs is complex, including several correlated design parameters, thereby further exacerbating the design optimization of MRRs under FPVs. In this article, we present, for the first time, a comprehensive design-space exploration in passive and active MRRs under FPVs. In addition, we present design optimization in MRRs under FPVs while considering different performance metrics, such as tolerance to FPVs, quality factor, and 3-dB bandwidth in MRRs. Simulation and fabrication results obtained by measuring multiple fabricated MRRs designed using our design-space exploration demonstrate a significant 70% improvement on average in the MRRs’ tolerance to different FPVs. Furthermore, we apply the proposed design optimization to a case study of a wavelength-selective MRR-based demultiplexer, where we show considerable channel-spacing accuracy within 0.5 nm even when the MRRs are placed 500 µm apart on a chip. Such improvements indicate the efficiency of the proposed design-space exploration and optimization to enable power-efficient and variation-resilient PICs and optical interconnects integrating MRRs.

Contact

liaoyuan.cheng@tum.de

Supervisor:

Liaoyuan Cheng

Reliability-Aware Design Flow for Silicon Photonics On-Chip Interconnect

Description

Intercore communication in many-core processors presently faces scalability issues similar to those that plagued intracity telecommunications in the 1960s. Optical communication promises to address these challenges now, as then, by providing low latency, high bandwidth, and low power communication. Silicon photonic devices presently are vulnerable to fabrication and temperature-induced variability. Our fabrication and measurement results indicate that such variations degrade interconnection performance and, in extreme cases, the interconnection may fail to function at all. In this paper, we propose a reliability-aware design flow to address variation-induced reliability issues. To mitigate effects of variations, limits of device design techniques are analyzed and requirements from architecture-level design are revealed. Based on this flow, a multilevel reliability management solution is proposed, which includes athermal coating at fabrication-level, voltage tuning at device-level, as well as channel hopping at architecture-level. Simulation results indicate that our solution can fully compensate variations thereby sustaining reliable on-chip optical communication with power efficiency.

Contact

zhidan.zheng@tum.de

Supervisor:

Zhidan Zheng

Comparison of RISC-V based ASIP Co-Design Frameworks

Short Description:
Application-Specific Instruction Set Processors (ASIPs) are used to build highly efficient yet performant SoCs for specialized tasks such as machine learning or graphics processing. Especially the RISC-V Instruction Set Architecture (ISA) allows hardware vendors to easily add their own custom instructions to RISC-V-based processors. Conventional ASIP development is time-intensive, as a lot of manual effort (and potentially human error) is involved. Hence, numerous methodologies and frameworks exist to generate ASIPs in an automated fashion using a HW/SW co-design driven flow, see e.g. high-level synthesis.

Description

In this seminar paper, an extensive survey of state-of-the-art tools for ASIP co-design in the RISC-V ecosystem shall be conducted, based on publications from academia (and industry).

 

Related work/Literature:

  • https://ieeexplore.ieee.org/document/9912050
  • https://www.synopsys.com/dw/doc.php/ds/cc/asip-brochure
  • https://codasip.com/products/codasip-studio
  • https://www.andestech.com
  • https://www.researchgate.net/publication/3980325_From_ASIC_to_ASIP_the_next_design_discontinuity
  • https://github.com/stevehoover/warp-v

Contact

philipp.van-kempen@tum.de

Supervisor:

Philipp van Kempen

Survey: Methods for Worst-Case-Execution-Time Analysis of Embedded Software

Description

To meet the strict requirements of modern embedded real-time systems, software designers rely on efficient methods to determine the worst-case-execution-time (WCET) of their programs. The literature has proposed both static and dynamic methods.

The goal of this project is to identify the most prominent approaches and to compare them.
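A minimal sketch of the dynamic (measurement-based) side of WCET analysis (the toy task, input generator, and sample count are assumptions): the maximum observed execution time is only an optimistic estimate of the true WCET, which is why static methods are needed for hard guarantees.

```python
# Minimal sketch of measurement-based (dynamic) WCET estimation (toy task and
# inputs). The maximum observed time may underestimate the true WCET, which is
# why static analysis is required for hard real-time guarantees.
import random
import time

def task(data):
    """Toy workload whose execution time depends on the input."""
    return sorted(data)

random.seed(0)
samples = []
for _ in range(1000):
    data = [random.random() for _ in range(random.randint(10, 2000))]
    start = time.perf_counter()
    task(data)
    samples.append(time.perf_counter() - start)

print(f"max observed execution time: {max(samples)*1e6:.1f} us "
      f"(a lower bound on the true WCET)")
```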

Contact

conrad.foik@tum.de

Supervisor:

Conrad Foik

Simultaneously Tolerate Thermal and Process Variations Through Indirect Feedback Tuning for Silicon Photonic Networks

Keywords:
thermal tolerant; process variations; optical networks-on-chip

Description

Silicon photonics is the leading candidate technology for high-speed and low-energy-consumption networks. Thermal and process variations are the two main challenges in achieving high-reliability photonic networks. Thermal variation is due to heat issues created by the application, floorplan, and environment, while process variation is caused by fabrication variability in the deposition, masking, exposition, etching, and doping. Tuning techniques are then required to overcome the impact of the variations and efficiently stabilize the performance of silicon photonic networks. We extend our previous optical switch integration model, BOSIM, to support the variation and thermal analyses. Based on device properties, we propose indirect feedback tuning (IFT) to simultaneously alleviate thermal and process variations. IFT can improve the bit-error rate (BER) of silicon photonic networks to 10⁻⁹ under different variation situations. Compared to state-of-the-art techniques, IFT can achieve an up to 1.52×10⁸ times BER improvement and 4.11× better heater energy efficiency. Indirect feedback does not require high-speed optical signal detection, and thus the circuit design of IFT saves up to 61.4% of the power and 51.2% of the area compared to state-of-the-art designs.

Contact

zhidan.zheng@tum.de

Supervisor:

Zhidan Zheng

Percolation on complex networks: Theory and application

Description

In the last two decades, network science has blossomed and influenced various fields, such as statistical physics, computer science, biology, and sociology, from the perspective of the heterogeneous interaction patterns of the components composing complex systems. As a paradigm for random and semi-random connectivity, the percolation model plays a key role in the development of network science and its applications. On the one hand, concepts and analytical methods such as the emergence of the giant cluster, finite-size scaling, and the mean-field method, which are intimately related to percolation theory, are employed to quantify and solve some core problems of networks. On the other hand, insights into percolation theory also facilitate the understanding of networked systems, such as robustness, epidemic spreading, vital node identification, and community detection. Meanwhile, network science also brings new issues to percolation theory itself, such as percolation of strongly heterogeneous systems, topological transitions of networks beyond pairwise interactions, and the emergence of a giant cluster with mutual connections. So far, percolation theory has already percolated into research on structure analysis and dynamic modeling in network science. Understanding percolation theory should help the study of many fields in network science, including the still open questions at the frontiers of networks, such as networks beyond pairwise interactions, temporal networks, and networks of networks. The intention of this paper is to offer an overview of these applications, as well as the basic theory of the percolation transition on network systems.
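A minimal sketch of the giant-cluster emergence mentioned above (assuming the networkx package is available; graph size and mean degrees are illustrative): on an Erdős–Rényi random graph, the largest connected cluster jumps in relative size once the mean degree crosses 1.

```python
# Minimal sketch of the percolation transition on a random graph (assumes the
# networkx package is installed; sizes and probabilities are illustrative).
import networkx as nx

n = 2000
for mean_degree in (0.5, 0.9, 1.0, 1.5, 3.0):
    p = mean_degree / n                       # edge probability of the ER graph
    G = nx.erdos_renyi_graph(n, p, seed=42)
    giant = max(len(c) for c in nx.connected_components(G))
    print(f"mean degree {mean_degree:3.1f}: largest cluster = {giant:5d} nodes "
          f"({giant/n:.1%} of the graph)")
```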

Contact

m.lian@tum.de

Supervisor:

Meng Lian

Design of pressure-driven microfluidic networks using electric circuit analogy

Description

This article reviews the application of electric circuit methods for the analysis of pressure-driven microfluidic networks with an emphasis on concentration- and flow-dependent systems. The application of circuit methods to microfluidics is based on the analogous behaviour of hydraulic and electric circuits with correlations of pressure to voltage, volumetric flow rate to current, and hydraulic to electric resistance. Circuit analysis enables rapid predictions of pressure-driven laminar flow in microchannels and is very useful for designing complex microfluidic networks in advance of fabrication. This article provides a comprehensive overview of the physics of pressure-driven laminar flow, the formal analogy between electric and hydraulic circuits, applications of circuit theory to microfluidic network-based devices, recent development and applications of concentration- and flow-dependent microfluidic networks, and promising future applications. The lab-on-a-chip (LOC) and microfluidics community will gain insightful ideas and practical design strategies for developing unique microfluidic network-based devices to address a broad range of biological, chemical, pharmaceutical, and other scientific and technical challenges.
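A small worked example of the analogy (channel dimensions, viscosity, and pressures are invented for illustration): each channel is modeled as a hydraulic resistance via Hagen–Poiseuille, and the pressure of an internal node follows from flow conservation, exactly as in nodal analysis of a resistor network.

```python
# Minimal sketch of the electric-circuit analogy for a pressure-driven network
# (illustrative dimensions). Flow rate Q = dP / R_hyd, with the resistance of a
# circular channel from Hagen-Poiseuille: R = 8*mu*L / (pi*r^4). The internal
# node pressure follows from flow conservation (Kirchhoff's current law).
import numpy as np

mu = 1.0e-3                                   # water viscosity [Pa*s]

def r_hyd(length, radius):
    return 8 * mu * length / (np.pi * radius**4)

# Network: inlet1 --R1--> node --R3--> outlet, inlet2 --R2--> node
R1 = r_hyd(10e-3, 50e-6)
R2 = r_hyd(20e-3, 50e-6)
R3 = r_hyd(15e-3, 75e-6)
P1, P2, Pout = 2000.0, 1500.0, 0.0            # applied pressures [Pa]

# KCL at the node: (P1-Pn)/R1 + (P2-Pn)/R2 = (Pn-Pout)/R3  ->  solve for Pn
Pn = (P1 / R1 + P2 / R2 + Pout / R3) / (1 / R1 + 1 / R2 + 1 / R3)

for name, (Pa, Pb, R) in {"channel 1": (P1, Pn, R1),
                          "channel 2": (P2, Pn, R2),
                          "channel 3": (Pn, Pout, R3)}.items():
    Q = (Pa - Pb) / R                         # hydraulic "Ohm's law"
    print(f"{name}: Q = {Q*1e9*60:8.2f} uL/min")
```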

Contact

m.lian@tum.de

Supervisor:

Meng Lian

A polynomial time optimal diode insertion/routing algorithm for fixing antenna problem

Description

The antenna problem is a phenomenon of plasma-induced gate oxide degradation. It directly affects the manufacturability of VLSI circuits, especially in deep-submicron technology using high-density plasma. Diode insertion is a very effective way to solve this problem. Ideally, diodes are inserted directly under the wires that violate antenna rules. But in today's high-density VLSI layouts, there is simply not enough room for "under-the-wire" diode insertion for all wires. Thus it is necessary to insert many diodes at legal "off-wire" locations and extend the antenna-rule violating wires to connect to their respective diodes. Previously, only simple heuristic algorithms were available for this diode insertion and routing problem. In this paper we show that the diode insertion and routing problem for an arbitrary given number of routing layers can be optimally solved in polynomial time. Our algorithm is guaranteed to find a feasible diode insertion and routing solution whenever one exists. Moreover, it is guaranteed to find a feasible solution that minimizes a cost function of the form α·L + β·N, where L is the total length of the extension wires and N is the total number of vias on the extension wires. Experimental results show that our algorithm is very efficient.

Contact

alex.truppel@tum.de

Supervisor:

Alexandre Truppel

A general multi-layer area router

Description

This paper presents a general multi-layer area router based on a novel grid construction scheme. The grid construction scheme produces more wiring tracks than the normal uniform grid scheme and accounts for differing design rules of the layers involved. Initial routing performed on the varying-capacity grid is followed by a layer assignment stage. Routing completion is ensured by iterating local and global modifications in the layer assignment stage. Our router has been incorporated into the Custom Cell Synthesis project at MCC and has shown improved results for cell synthesis problems when compared with the router Mighty, which was used in earlier versions of the project.

Contact

alex.truppel@tum.de

Supervisor:

Alexandre Truppel