Prefetching the Translation Path: MMU-Prefetch Co-Design for I/O Devices and Accelerators
Description
Modern I/O devices and accelerators increasingly rely on virtual memory support to simplify programming, improve isolation, and enable shared virtual address spaces with CPUs. However, address translation on the device side is often expensive: when an IOMMU or device TLB misses, the accelerator may stall for a long time on a multi-level page table walk, and limited translation locality makes such misses frequent. Prior work has shown that translation overhead can become a major bottleneck for accelerators and that hiding or restructuring translation latency is an important architectural problem.
At the same time, traditional prefetching research focuses mainly on data accesses, even though the translation path itself is a natural target for latency hiding. This raises an interesting question: instead of prefetching only data, can we also prefetch translation-related information, such as page-table entries, or redesign the translation path so that address translation and data access overlap more effectively? Both recent and earlier studies suggest that this direction is promising for accelerator-centric systems.
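To make the idea concrete, the following minimal Python sketch models a device-side TLB in front of a simplified two-level page table: while a device streams through a buffer, it issues the page-table walk for the next page ahead of the access that will need it. This is purely illustrative and not taken from any particular paper; all names and parameters (PAGE_SIZE, LEVEL_BITS, walk_page_table, and so on) are assumptions made for the example.

PAGE_SIZE = 4096
LEVEL_BITS = 9                      # 512 entries per level, x86-64-style

tlb = {}                            # virtual page number -> physical frame

def walk_page_table(page_tables, vpn):
    # Simulated two-level walk; each level costs one memory access.
    l1 = vpn >> LEVEL_BITS                       # first-level index
    l2 = vpn & ((1 << LEVEL_BITS) - 1)           # leaf index
    return page_tables[l1][l2]                   # physical frame number

def translate(page_tables, vaddr):
    vpn, off = divmod(vaddr, PAGE_SIZE)
    if vpn not in tlb:                           # device-TLB miss: full walk
        tlb[vpn] = walk_page_table(page_tables, vpn)
    return tlb[vpn] * PAGE_SIZE + off

def prefetch_translation(page_tables, vaddr):
    # Warm the TLB for an address the device expects to touch soon.
    vpn = vaddr // PAGE_SIZE
    if vpn not in tlb:
        tlb[vpn] = walk_page_table(page_tables, vpn)

def stream_buffer(page_tables, base, nbytes, stride):
    for off in range(0, nbytes, stride):
        # Issue the walk for the next page before it is needed.
        prefetch_translation(page_tables, base + off + PAGE_SIZE)
        paddr = translate(page_tables, base + off)   # now likely a TLB hit
        # ... the device would access memory at paddr here ...

# Hypothetical identity-mapped tables, just to make the sketch runnable:
page_tables = [[(i << LEVEL_BITS) | j for j in range(1 << LEVEL_BITS)]
               for i in range(4)]
stream_buffer(page_tables, base=0, nbytes=64 * PAGE_SIZE, stride=256)

In real hardware the prefetched walk would proceed asynchronously alongside the current data access; this sequential sketch only shows where such a prefetch would be issued.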
The goal of this seminar is to build a clear architectural understanding of how prefetching and address translation can be combined for accelerators or I/O systems. The seminar will compare different approaches, identify their main design trade-offs, and discuss whether translation-path prefetching could become an important design direction for future heterogeneous systems.
Prerequisites
- Basic Knowledge of Computer Architecture
- Good English Skills
Contact
Yuanji Ye
yuanji.ye@tum.de
Prefetching for LLM Inference: KV Cache Movement
Description
Large language model (LLM) inference is increasingly limited by memory access rather than by pure computation, especially in long-context and decoding scenarios. A major reason is the high cost of moving model weights and KV cache data across the memory hierarchy. Recent work shows that prefetching can be used to overlap this data movement with ongoing computation or communication.
Compared with traditional hardware prefetching, LLM inference presents a different setting: the prefetched objects are no longer small cache lines but larger units such as KV blocks, and prefetch timing depends on token generation, layer execution order, and runtime scheduling. Some recent studies also suggest that temporal patterns in attention behavior can be exploited to guide KV cache management and cross-token prefetching more effectively.
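As a rough illustration of this overlap, the following Python sketch double-buffers KV blocks: while attention for one layer runs, a background worker already fetches the KV block for the next layer from slower memory. This is a hypothetical stand-in, not the API of any real serving system; fetch_kv_block and attend are stub functions invented for the example.

import time
from concurrent.futures import ThreadPoolExecutor

def fetch_kv_block(layer):
    # Stand-in for moving one KV block from host DRAM or SSD to the GPU.
    time.sleep(0.01)                  # simulated transfer latency
    return f"kv-block-{layer}"

def attend(layer, kv_block):
    # Stand-in for the attention computation of one layer.
    time.sleep(0.01)                  # simulated compute time

def decode_step(num_layers):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_kv_block, 0)    # load layer 0 up front
        for layer in range(num_layers):
            kv = future.result()                   # wait for current block
            if layer + 1 < num_layers:
                # Prefetch the next layer's block while this layer computes.
                future = pool.submit(fetch_kv_block, layer + 1)
            attend(layer, kv)

decode_step(num_layers=4)

When transfers are faster than compute, each decode step fully hides the movement cost; when they are slower, the result() call exposes the residual stall, which is one of the trade-offs the compared systems must manage.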
The goal of this seminar is to build a clear understanding of how prefetching concepts can be extended to AI inference systems. The student will compare recent approaches to KV cache and weight prefetching, summarize their architectural trade-offs, and assess whether LLM-serving workloads require new prefetching principles beyond those established for CPU memory hierarchies.
Prerequisites
- Basic Knowledge of Computer Architecture
- Good English Skills
Contact
Yuanji Ye
yuanji.ye@tum.de