### M.Sc. Frederik Walter

Technische Universität München

Professur für Codierung und Kryptographie (Prof. Wachter-Zeh)

###### Postadresse

Postal:

Theresienstr. 90

80333 München

- Tel.: +49 (89) 289 - 23492
- Raum: 0104.04.403
- frederik.walter@tum.de

## Biography

Frederik is currently a doctoral candidate in the Associate Professorship of Coding and Cryptography (Prof. Wachter-Zeh) with the Institute for Communications Engineering, School of Computation, Information and Technology, TUM, Munich, Germany.

Before joining the institute, he was part of a medical technology startup that was funded by the eXist program of the German Ministry of Economic Affairs. From 2017 to 2020, he worked in a management consultancy, advising clients on complex strategic transformations.

Frederik received his M.Sc degree with high distinction (1.0) in Electrical Engineering and Information Technology from TUM in 2016. During this time, he spent a semester at the National University of Singapore. He received his B.Sc. also with high distinction (1.0) in Electrical Engineering from Ulm University in 2014.

## Research Interests

Frederik's research interests focus on coding and information theory applied to the area of DNA data storage within the DiDAX project. In particular, he studies improvements in the synthesis of DNA strands.

One field is the use of composite DNA as a tool to increase the alphabet size and, therefore, the information density. Instead of synthesizing only one nucleotide in each cycle, mixing several nucleotides and encoding information in the occurring ratio is possible. Therefore, many interesting coding-related challenges arise when the strands are exposed to substitution, deletion, and/or insertion errors.

Another area in his research is the efficient synthesis for gene expression analysis. He applies coding theoretic concepts to minimize the synthesis time while ensuring all genes are uniquely represented on a testing array.

## Teaching

#### Teaching Assistant at TUM

#### Teaching Assistant an der Universität Ulm

- Grundlagen der Elektrotechnik I
- Grundlagen der Elektrotechnik II
- Signale und Systeme

#### Available Theses

#### Theses in Progress

## Bounds for irregular channels

#### Beschreibung

#### Abstract

The Gilbert–Varshamov bound and the sphere packing bound are fundamental results in coding theory, providing limits on the size of error-correcting codes. The Gilbert–Varshamov bound gives a lower bound on the size of a code for a given code length and minimum distance, while the sphere packing bound offers an upper bound based on how codewords can be arranged in space, accounting for the volume of error spheres.

In regular channels, error balls—regions in which decoding errors can occur—are uniform across all codewords. However, in *irregular channels*, these error balls may vary for different codewords, making the analysis more complex. Understanding how these bounds generalize to such irregular settings is crucial for advancing coding theory.

There are two papers which aim at generalising the classical bounds to irregular channels:

[1]** "The Generalized Gilbert–Varshamov Bound is Implied by Tura?n’s Theorem"**: This paper explores how Tura?n’s theorem, a result from extremal graph theory, can be used to generalize the Gilbert–Varshamov bound to irregular settings.

[2]** "Generalized Sphere Packing Bound"**: This paper extends the classical sphere packing bound to cases where error balls are not identical across codewords, considering more complex decoding conditions.

#### Task

The target of this seminar is to:

- Understand the classical Gilbert–Varshamov and sphere packing bounds. (Recap from Channel Coding lecture)
- Analyze how these papers generalize these bounds for irregular channels.
- Compare the approaches taken in both papers and describe the implications of their results for error-correcting code design in irregular communication environments.

**References: **

[1] A. Fazeli, A. Vardy, and E. Yaakobi, “Generalized Sphere Packing Bound,” *IEEE Transactions on Information Theory*, vol. 61, no. 5, pp. 2313–2334, May 2015, doi: 10.1109/TIT.2015.2413418.

[2] L. M. G. M. Tolhuizen, “The generalized Gilbert-Varshamov bound is implied by Turan’s theorem [code construction],” *IEEE Transactions on Information Theory*, vol. 43, no. 5, pp. 1605–1606, Sep. 1997, doi: 10.1109/18.623158.

#### Voraussetzungen

**Prerequisites:**

- Channel coding lecture
- Interest in descrete mathematics and graph theory

**Tags: **

Graph theory, sphere packing bound, linear programming, asymmetric errors, Z channel, deletion channel, subspace codes, fractional transversals, Clique, Gilbert–Varshamov bound, Turan’s theorem.

#### Kontakt

frederik.walter@tum.de

#### Betreuer:

## Coding for Composite DNA

#### Beschreibung

**DNA Data Storage**

Data storage on DNA molecules is a promising approach for archiving massive data.

In classical DNA storage systems, binary information is encoded into sequences consisting of the four DNA bases {A, C, G, T}. The encoded sequences are used to generate DNA molecules called *strands* using the biochemical process of DNA synthesis. The synthesized strands are stored together in a tube. To retrieve the binary information, the strand must be read via *DNA sequencing* and decoded back into the binary representation.

The synthesis and the sequencing procedures are error-prone, and with the natural degradation of DNA they introduce errors to the DNA strands. To ensure data reliability, the errors have to be corrected by algorithms and error-correcting codes (ECCs).

A 5min video with an overview of DNA storage: https://youtu.be/r8qWc9X4f6k?si=Yzm5sOW-a6VDnBu3

**Composite DNA**

Recently, to allow higher potential information capacity, [1,2] introduced the *DNA composite* synthesis method. In this method, the multiple copies created by the standard DNA synthesis method are utilized to create *composite DNA symbols*, defined by a mixture of DNA bases and their ratios in a specific position of the strands. By defining different mixtures and ratios, the alphabet can be extended to have more than 4 symbols. More formally, a composite DNA symbol in a specific position can be abstracted as a quartet of probabilities {p_A, p_C, p_G, p_T}, in which p_X, 0 ≤ p_X ≤ 1, is the fraction of the base X in {A, C, G, T} in the mixture and p_A+p_C+ p_G+ p_T =1. Thus, to identify composite symbols it is required to sequence multiple reads and then to estimate p_A, p_C, p_G, p_T in each position.

**Problem description**

ECCs for DNA data storage differ in many aspects from classical error correction codes. In this model, new error type gain relevance, like deletions and insertions which affect the synchronization of the sequences. Especially for composite DNA data storage, these error types received only little attention.

The most related work to this problem was recently studied by Zhang et al. in [6]. The authors initiated the study of error-correcting codes for DNA composite. They considered an error model for composite symbols, which assumes that errors occur in at most t symbols, and their magnitude is limited by l. They presented several code constructions as well as bounds for this model. In this thesis, we want to analyse a different way to model the composite synthesis method and studies additional error models. We already have some results for substitution and single deletion errors. This thesis should focus on evaluating more error models in the channel model.

This should only roughly introduce the problem. No need to review all references. If you are interested, please reach out to me, and we can discuss a suitable direction for you.

**References**

[1] L. Anavy, I. Vaknin, O. Atar, R. Amit, and Z. Yakhini, “Data storage in DNA with fewer synthesis cycles using composite DNA letters,” *Nat Biotechnol*, vol. 37, no. 10, pp. 1229–1236, Oct. 2019, doi: 10.1038/s41587-019-0240-x.

[2] Y. Choi *et al.*, “High information capacity DNA-based data storage with augmented encoding characters using degenerate bases,” *Sci Rep*, vol. 9, no. 1, Art. no. 1, Apr. 2019, doi: 10.1038/s41598-019-43105-w.

[3] V. Guruswami and J. Håstad, “Explicit Two-Deletion Codes With Redundancy Matching the Existential Bound,” *IEEE Transactions on Information Theory*, vol. 67, no. 10, pp. 6384–6394, Oct. 2021, doi: 10.1109/TIT.2021.3069446.

[4] J. Sima, N. Raviv, and J. Bruck, “Two Deletion Correcting Codes From Indicator Vectors,” *IEEE Trans. Inform. Theory*, vol. 66, no. 4, pp. 2375–2391, Apr. 2020, doi: 10.1109/TIT.2019.2950290.

[5] I. Smagloy, L. Welter, A. Wachter-Zeh, and E. Yaakobi, “Single-Deletion Single-Substitution Correcting Codes,” *IEEE Transactions on Information Theory*, pp. 1–1, 2023, doi: 10.1109/TIT.2023.3319088.

[6] W. Zhang, Z. Chen, and Z. Wang, “Limited-Magnitude Error Correction for Probability Vectors in DNA Storage,” in *ICC 2022 - IEEE International Conference on Communications*, Seoul, Korea, Republic of: IEEE, May 2022, pp. 3460–3465. doi: 10.1109/ICC45855.2022.9838471.

#### Betreuer:

## Publications

- Coding for Composite DNA to Correct Substitutions, Strand Losses and Deletions. Summer Doctoral Seminar 2024, 2024 mehr…
- Towards a Cryptographic Framework for DNA Data Storage. Coding Theory and Algorithms for DNA-based Data Storage - ISIT2024 Satellite Workshop, 2024 mehr…
- Coding for Synthesis of Composite DNA. Meeting on DNA Data Applications, 2023 mehr…